~ubuntu-branches/ubuntu/precise/wget/precise-proposed

Viewing changes to .pc/CVE-2010-2252/doc/wget.texi

Committer: Bazaar Package Importer
Author(s): Steve Langasek
Date: 2011-10-19 00:00:09 UTC
mfrom: (2.1.13 sid)
Revision ID: james.westby@ubuntu.com-20111019000009-8p33w3wz4b1rdri0

Tags: 1.13-1ubuntu1

* Merge from Debian unstable, remaining changes:
  - Add wget-udeb to ship wget.gnu as alternative to busybox wget
    implementation.
  - Depend on libssl-dev 0.9.8k-7ubuntu4 (LP: #503339)
* Dropped changes, superseded in Debian:
  - Keep build dependencies in main:
    + debian/control: remove info2man build-dep
    + debian/patches/series: disable wget-infopod_generated_manpage
  - Mark wget Multi-Arch: foreign, so packages that aren't of the same arch
    can depend on it.
* Pass --with-ssl=openssl; we don't want to use gnutls, there's no udeb for
  it.
* Add a second build pass for the udeb, so we can build without libidn.

files added:
.pc/debian-changes-1.13-1

.pc/debian-changes-1.13-1/po

.pc/debian-changes-1.13-1/po/de.po

.tarball-version

.version

build-aux/snippet

build-aux/snippet/_Noreturn.h

build-aux/snippet/arg-nonnull.h

build-aux/snippet/c++defs.h

build-aux/snippet/warn-on-use.h

debian/patches/debian-changes-1.13-1

lib/accept.c

lib/alignof.h

lib/arpa_inet.in.h

lib/asnprintf.c

lib/asprintf.c

lib/basename-lgpl.c

lib/binary-io.h

lib/bind.c

lib/cloexec.c

lib/cloexec.h

lib/close.c

lib/connect.c

lib/dirname-lgpl.c

lib/dirname.h

lib/dosname.h

lib/dup-safer-flag.c

lib/dup-safer.c

lib/dup2.c

lib/fatal-signal.c

lib/fatal-signal.h

lib/fcntl.c

lib/fcntl.in.h

lib/fd-hook.c

lib/fd-hook.h

lib/fd-safer-flag.c

lib/fd-safer.c

lib/float+.h

lib/float.c

lib/float.in.h

lib/fseek.c

lib/futimens.c

lib/gai_strerror.c

lib/getaddrinfo.c

lib/getdtablesize.c

lib/getpeername.c

lib/getsockname.c

lib/gettime.c

lib/gettimeofday.c

lib/glthread

lib/glthread/lock.c

lib/glthread/lock.h

lib/glthread/threadlib.c

lib/gnulib.mk

lib/iconv.in.h

lib/inet_ntop.c

lib/ioctl.c

lib/listen.c

lib/lstat.c

lib/malloc.c

lib/mbtowc-impl.h

lib/mbtowc.c

lib/md5.c

lib/md5.h

lib/mkdir.c

lib/netdb.in.h

lib/netinet_in.in.h

lib/open.c

lib/pipe-safer.c

lib/pipe.h

lib/pipe2-safer.c

lib/pipe2.c

lib/printf-args.c

lib/printf-args.h

lib/printf-parse.c

lib/printf-parse.h

lib/rawmemchr.c

lib/rawmemchr.valgrind

lib/recv.c

lib/sched.in.h

lib/select.c

lib/send.c

lib/setsockopt.c

lib/sig-handler.h

lib/sigaction.c

lib/signal.in.h

lib/sigprocmask.c

lib/size_max.h

lib/snprintf.c

lib/socket.c

lib/sockets.c

lib/sockets.h

lib/spawn-pipe.c

lib/spawn-pipe.h

lib/spawn.in.h

lib/spawn_faction_addclose.c

lib/spawn_faction_adddup2.c

lib/spawn_faction_addopen.c

lib/spawn_faction_destroy.c

lib/spawn_faction_init.c

lib/spawn_int.h

lib/spawnattr_destroy.c

lib/spawnattr_init.c

lib/spawnattr_setflags.c

lib/spawnattr_setsigmask.c

lib/spawni.c

lib/spawnp.c

lib/stat-time.h

lib/stat.c

lib/strchrnul.c

lib/strchrnul.valgrind

lib/strerror-override.c

lib/strerror-override.h

lib/strerror_r.c

lib/stripslash.c

lib/sys_ioctl.in.h

lib/sys_select.in.h

lib/sys_socket.in.h

lib/sys_stat.in.h

lib/sys_time.in.h

lib/sys_uio.in.h

lib/sys_wait.in.h

lib/time.in.h

lib/timespec.h

lib/unistd--.h

lib/unistd-safer.h

lib/unlocked-io.h

lib/utimens.c

lib/utimens.h

lib/vasnprintf.c

lib/vasnprintf.h

lib/vasprintf.c

lib/w32sock.h

lib/w32spawn.h

lib/wait-process.c

lib/wait-process.h

lib/waitpid.c

lib/write.c

lib/xalloc-oversized.h

lib/xsize.h

m4/arpa_inet_h.m4

m4/asm-underscore.m4

m4/clock_time.m4

m4/close.m4

m4/configmake.m4

m4/dirname.m4

m4/double-slash-root.m4

m4/dup2.m4

m4/environ.m4

m4/fatal-signal.m4

m4/fcntl-o.m4

m4/fcntl.m4

m4/fcntl_h.m4

m4/float_h.m4

m4/fseek.m4

m4/futimens.m4

m4/getaddrinfo.m4

m4/getdtablesize.m4

m4/gettime.m4

m4/gettimeofday.m4

m4/hostent.m4

m4/iconv_h.m4

m4/inet_ntop.m4

m4/intlmacosx.m4

m4/intmax_t.m4

m4/inttypes_h.m4

m4/ioctl.m4

m4/largefile.m4

m4/lock.m4

m4/lstat.m4

m4/mbtowc.m4

m4/md5.m4

m4/mkdir.m4

m4/mode_t.m4

m4/netdb_h.m4

m4/netinet_in_h.m4

m4/nocrash.m4

m4/open.m4

m4/pipe2.m4

m4/posix_spawn.m4

m4/printf.m4

m4/rawmemchr.m4

m4/sched_h.m4

m4/select.m4

m4/servent.m4

m4/sig_atomic_t.m4

m4/sigaction.m4

m4/signal_h.m4

m4/signalblocking.m4

m4/sigpipe.m4

m4/size_max.m4

m4/snprintf.m4

m4/socketlib.m4

m4/sockets.m4

m4/socklen.m4

m4/sockpfaf.m4

m4/spawn-pipe.m4

m4/spawn_h.m4

m4/stat-time.m4

m4/stat.m4

m4/stdint_h.m4

m4/strchrnul.m4

m4/strerror_r.m4

m4/sys_ioctl_h.m4

m4/sys_select_h.m4

m4/sys_socket_h.m4

m4/sys_stat_h.m4

m4/sys_time_h.m4

m4/sys_uio_h.m4

m4/sys_wait_h.m4

m4/threadlib.m4

m4/time_h.m4

m4/timespec.m4

m4/unistd-safer.m4

m4/unlocked-io.m4

m4/utimbuf.m4

m4/utimens.m4

m4/utimes.m4

m4/vasnprintf.m4

m4/vasprintf.m4

m4/wait-process.m4

m4/waitpid.m4

m4/warn-on-use.m4

m4/wchar_h.m4

m4/wctype_h.m4

m4/write.m4

m4/xsize.m4

po/LINGUAS

tests/Test-auth-retcode.px

tests/Test-i-ftp.px

tests/Test-i-http.px

tests/Test-idn-cmd-utf8.px

tests/Test-idn-robots-utf8.px

files removed:
.pc/CVE-2010-2252

.pc/CVE-2010-2252/doc

.pc/CVE-2010-2252/doc/wget.texi

.pc/CVE-2010-2252/src

.pc/CVE-2010-2252/src/http.c

.pc/CVE-2010-2252/src/http.h

.pc/CVE-2010-2252/src/init.c

.pc/CVE-2010-2252/src/main.c

.pc/CVE-2010-2252/src/options.h

.pc/CVE-2010-2252/src/retr.c

.pc/disable-SSLv2

.pc/disable-SSLv2/doc

.pc/disable-SSLv2/doc/wget.texi

.pc/disable-SSLv2/po

.pc/disable-SSLv2/po/ca.po

.pc/disable-SSLv2/po/cs.po

.pc/disable-SSLv2/po/de.po

.pc/disable-SSLv2/po/es.po

.pc/disable-SSLv2/po/et.po

.pc/disable-SSLv2/po/fi.po

.pc/disable-SSLv2/po/fr.po

.pc/disable-SSLv2/po/ga.po

.pc/disable-SSLv2/po/hr.po

.pc/disable-SSLv2/po/hu.po

.pc/disable-SSLv2/po/id.po

.pc/disable-SSLv2/po/it.po

.pc/disable-SSLv2/po/ja.po

.pc/disable-SSLv2/po/lt.po

.pc/disable-SSLv2/po/nl.po

.pc/disable-SSLv2/po/pl.po

.pc/disable-SSLv2/po/pt.po

.pc/disable-SSLv2/po/pt_BR.po

.pc/disable-SSLv2/po/ru.po

.pc/disable-SSLv2/po/sk.po

.pc/disable-SSLv2/po/sl.po

.pc/disable-SSLv2/po/sv.po

.pc/disable-SSLv2/po/tr.po

.pc/disable-SSLv2/po/vi.po

.pc/disable-SSLv2/po/zh_CN.po

.pc/disable-SSLv2/po/zh_TW.po

.pc/disable-SSLv2/src

.pc/disable-SSLv2/src/init.c

.pc/disable-SSLv2/src/main.c

.pc/disable-SSLv2/src/openssl.c

.pc/fix-paramter-spelling-error-in-wget.texi

.pc/fix-paramter-spelling-error-in-wget.texi/doc

.pc/fix-paramter-spelling-error-in-wget.texi/doc/wget.texi

.pc/refresh-pofiles

.pc/refresh-pofiles/po

.pc/refresh-pofiles/po/be.po

.pc/refresh-pofiles/po/bg.po

.pc/refresh-pofiles/po/ca.po

.pc/refresh-pofiles/po/cs.po

.pc/refresh-pofiles/po/da.po

.pc/refresh-pofiles/po/de.po

.pc/refresh-pofiles/po/el.po

.pc/refresh-pofiles/po/en_GB.po

.pc/refresh-pofiles/po/en_US.po

.pc/refresh-pofiles/po/eo.po

.pc/refresh-pofiles/po/es.po

.pc/refresh-pofiles/po/et.po

.pc/refresh-pofiles/po/eu.po

.pc/refresh-pofiles/po/fi.po

.pc/refresh-pofiles/po/fr.po

.pc/refresh-pofiles/po/ga.po

.pc/refresh-pofiles/po/gl.po

.pc/refresh-pofiles/po/he.po

.pc/refresh-pofiles/po/hr.po

.pc/refresh-pofiles/po/hu.po

.pc/refresh-pofiles/po/id.po

.pc/refresh-pofiles/po/it.po

.pc/refresh-pofiles/po/ja.po

.pc/refresh-pofiles/po/lt.po

.pc/refresh-pofiles/po/nb.po

.pc/refresh-pofiles/po/nl.po

.pc/refresh-pofiles/po/pl.po

.pc/refresh-pofiles/po/pt.po

.pc/refresh-pofiles/po/pt_BR.po

.pc/refresh-pofiles/po/ro.po

.pc/refresh-pofiles/po/ru.po

.pc/refresh-pofiles/po/sk.po

.pc/refresh-pofiles/po/sl.po

.pc/refresh-pofiles/po/sr.po

.pc/refresh-pofiles/po/sv.po

.pc/refresh-pofiles/po/tr.po

.pc/refresh-pofiles/po/uk.po

.pc/refresh-pofiles/po/vi.po

.pc/refresh-pofiles/po/zh_CN.po

.pc/refresh-pofiles/po/zh_TW.po

.pc/wget-de.po-remove-double-quote-signs

.pc/wget-de.po-remove-double-quote-signs/po

.pc/wget-de.po-remove-double-quote-signs/po/de.po

.pc/wget-zh_CN.po-translation-correction

.pc/wget-zh_CN.po-translation-correction/po

.pc/wget-zh_CN.po-translation-correction/po/zh_CN.po

autogen.sh

build-aux/link-warning.h

build-aux/mkinstalldirs

configure.bat

debian/patches/CVE-2010-2252

debian/patches/fix-paramter-spelling-error-in-wget.texi

debian/patches/refresh-pofiles

debian/patches/wget-de.po-remove-double-quote-signs

debian/patches/wget-infopod_generated_manpage

debian/patches/wget-zh_CN.po-translation-correction

lib/getpagesize.c

lib/strcasecmp.c

lib/strings.in.h

lib/strncasecmp.c

m4/exitfail.m4

m4/getpagesize.m4

m4/strcase.m4

m4/strings_h.m4

m4/wchar.m4

m4/wctype.m4

md5/Makefile.am

md5/Makefile.in

md5/dummy.c

md5/m4

md5/m4/gnulib-cache.m4

md5/m4/gnulib-comp.m4

md5/m4/md5.m4

md5/md5.c

md5/md5.h

md5/stddef.in.h

md5/stdint.in.h

md5/wchar.in.h

po/en@boldquot.gmo

po/en@boldquot.po

po/en@quot.gmo

po/en@quot.po

po/en_US.gmo

po/en_US.po

src/gen-md5.c

src/gen-md5.h

src/snprintf.c

windows

windows/ChangeLog

windows/Makefile.am

windows/Makefile.doc

windows/Makefile.in

windows/Makefile.src

windows/Makefile.src.bor

windows/Makefile.src.mingw

windows/Makefile.top

windows/Makefile.top.bor

windows/Makefile.top.mingw

windows/README

windows/config-compiler.h

windows/config.h

files modified:
.pc/applied-patches

.pc/wget-doc-remove-usr-local-in-wget.texi/doc/wget.texi

.pc/wget-fr.po-spelling-correction/po/fr.po

AUTHORS

ChangeLog

GNUmakefile

INSTALL

Makefile.am

Makefile.in

NEWS

README

aclocal.m4

build-aux/announce-gen

build-aux/build_info.pl

build-aux/compile

build-aux/config.guess

build-aux/config.rpath

build-aux/config.sub

build-aux/depcomp

build-aux/gnupload

build-aux/install-sh

build-aux/mdate-sh

build-aux/missing

build-aux/texinfo.tex

build-aux/update-copyright

build-aux/useless-if-before-free

build-aux/vc-list-files

build-aux/ylwrap

configure

configure.ac

debian/changelog

debian/control

debian/copyright

debian/patches/series

debian/patches/wget-doc-remove-usr-local-in-wget.texi

debian/patches/wget-fr.po-spelling-correction

debian/rules

doc/ChangeLog

doc/Makefile.am

doc/Makefile.in

doc/fdl.texi

doc/stamp-vti

doc/texi2pod.pl

doc/version.texi

doc/wget.info

doc/wget.texi

lib/Makefile.am

lib/Makefile.in

lib/alloca.c

lib/alloca.in.h

lib/c-ctype.c

lib/c-ctype.h

lib/config.charset *

lib/errno.in.h

lib/error.c

lib/error.h

lib/exitfail.c

lib/exitfail.h

lib/fseeko.c

lib/getdelim.c

lib/getline.c

lib/getopt.c

lib/getopt.in.h

lib/getopt1.c

lib/getopt_int.h

lib/getpass.c

lib/getpass.h

lib/gettext.h

lib/intprops.h

lib/localcharset.c

lib/localcharset.h

lib/lseek.c

lib/mbrtowc.c

lib/mbsinit.c

lib/memchr.c

lib/quote.c

lib/quote.h

lib/quotearg.c

lib/quotearg.h

lib/realloc.c

lib/ref-add.sin

lib/ref-del.sin

lib/stdbool.in.h

lib/stddef.in.h

lib/stdint.in.h

lib/stdio-impl.h

lib/stdio-write.c

lib/stdio.in.h

lib/stdlib.in.h

lib/str-two-way.h

lib/strcasestr.c

lib/streq.h

lib/strerror.c

lib/string.in.h

lib/unistd.in.h

lib/verify.h

lib/wchar.in.h

lib/wctype.in.h

lib/xalloc-die.c

lib/xalloc.h

lib/xmalloc.c

m4/00gnulib.m4

m4/alloca.m4

m4/codeset.m4

m4/errno_h.m4

m4/error.m4

m4/extensions.m4

m4/fseeko.m4

m4/getdelim.m4

m4/getline.m4

m4/getopt.m4

m4/getpass.m4

m4/gettext.m4

m4/glibc21.m4

m4/gnulib-common.m4

m4/gnulib-comp.m4

m4/iconv.m4

m4/include_next.m4

m4/inline.m4

m4/lib-ld.m4

m4/lib-link.m4

m4/lib-prefix.m4

m4/localcharset.m4

m4/locale-fr.m4

m4/locale-ja.m4

m4/locale-zh.m4

m4/longlong.m4

m4/lseek.m4

m4/malloc.m4

m4/mbrtowc.m4

m4/mbsinit.m4

m4/mbstate_t.m4

m4/memchr.m4

m4/mmap-anon.m4

m4/multiarch.m4

m4/nls.m4

m4/po.m4

m4/quote.m4

m4/quotearg.m4

m4/realloc.m4

m4/stdbool.m4

m4/stddef_h.m4

m4/stdint.m4

m4/stdio_h.m4

m4/stdlib_h.m4

m4/strcasestr.m4

m4/strerror.m4

m4/string_h.m4

m4/unistd_h.m4

m4/wchar_t.m4

m4/wget.m4

m4/wint_t.m4

m4/xalloc.m4

maint.mk

msdos/config.h

po/Makefile.in.in

po/Makevars

po/POTFILES.in

po/Rules-quot

po/be.gmo

po/be.po

po/bg.gmo

po/bg.po

po/boldquot.sed

po/ca.gmo

po/ca.po

po/cs.gmo

po/cs.po

po/da.gmo

po/da.po

po/de.gmo

po/de.po

po/el.gmo

po/el.po

po/en_GB.gmo

po/en_GB.po

po/eo.gmo

po/eo.po

po/es.gmo

po/es.po

po/et.gmo

po/et.po

po/eu.gmo

po/eu.po

po/fi.gmo

po/fi.po

po/fr.gmo

po/fr.po

po/ga.gmo

po/ga.po

po/gl.gmo

po/gl.po

po/he.gmo

po/he.po

po/hr.gmo

po/hr.po

po/hu.gmo

po/hu.po

po/id.gmo

po/id.po

po/it.gmo

po/it.po

po/ja.gmo

po/ja.po

po/lt.gmo

po/lt.po

po/nb.gmo

po/nb.po

po/nl.gmo

po/nl.po

po/pl.gmo

po/pl.po

po/pt.gmo

po/pt.po

po/pt_BR.gmo

po/pt_BR.po

po/quot.sed

po/ro.gmo

po/ro.po

po/ru.gmo

po/ru.po

po/sk.gmo

po/sk.po

po/sl.gmo

po/sl.po

po/sr.gmo

po/sr.po

po/sv.gmo

po/sv.po

po/tr.gmo

po/tr.po

po/uk.gmo

po/uk.po

po/vi.gmo

po/vi.po

po/wget.pot

po/zh_CN.gmo

po/zh_CN.po

po/zh_TW.gmo

po/zh_TW.po

src/ChangeLog

src/Makefile.am

src/Makefile.in

src/build_info.c

src/build_info.c.in

src/cmpt.c

src/config.h.in

src/connect.c

src/connect.h

src/convert.c

src/convert.h

src/cookies.c

src/cookies.h

src/css-tokens.h

src/css-url.c

src/css-url.h

src/css.c

src/css.l

src/exits.c

src/exits.h

src/ftp-basic.c

src/ftp-ls.c

src/ftp-opie.c

src/ftp.c

src/ftp.h

src/gettext.h

src/gnutls.c

src/hash.c

src/hash.h

src/host.c

src/host.h

src/html-parse.c

src/html-parse.h

src/html-url.c

src/html-url.h

src/http-ntlm.c

src/http-ntlm.h

src/http.c

src/http.h

src/init.c

src/init.h

src/iri.c

src/iri.h

src/log.c

src/log.h

src/main.c

src/mswindows.c

src/mswindows.h

src/netrc.c

src/netrc.h

src/openssl.c

src/options.h

src/progress.c

src/progress.h

src/ptimer.c

src/ptimer.h

src/recur.c

src/recur.h

src/res.c

src/res.h

src/retr.c

src/retr.h

src/spider.c

src/spider.h

src/ssl.h

src/sysdep.h

src/test.c

src/test.h

src/url.c

src/url.h

src/utils.c

src/utils.h

src/wget.h

tests/ChangeLog

tests/FTPServer.pm

tests/Makefile.am

tests/Makefile.in

tests/Test--no-content-disposition-trivial.px

tests/Test--no-content-disposition.px

tests/Test--spider-fail.px

tests/Test--spider-r--no-content-disposition-trivial.px

tests/Test--spider-r--no-content-disposition.px

tests/Test--spider-r-HTTP-Content-Disposition.px

tests/Test--spider-r.px

tests/Test--spider.px

tests/Test-E-k-K.px

tests/Test-E-k.px

tests/Test-HTTP-Content-Disposition-1.px

tests/Test-HTTP-Content-Disposition-2.px

tests/Test-HTTP-Content-Disposition.px

tests/Test-N--no-content-disposition-trivial.px

tests/Test-N--no-content-disposition.px

tests/Test-N-HTTP-Content-Disposition.px

tests/Test-N-current.px

tests/Test-N-no-info.px

tests/Test-N-old.px

tests/Test-N-smaller.px

tests/Test-N.px

tests/Test-O--no-content-disposition-trivial.px

tests/Test-O--no-content-disposition.px

tests/Test-O-HTTP-Content-Disposition.px

tests/Test-O-nc.px

tests/Test-O-nonexisting.px

tests/Test-O.px

tests/Test-Restrict-Lowercase.px

tests/Test-Restrict-Uppercase.px

tests/Test-auth-basic.px

tests/Test-auth-no-challenge-url.px

tests/Test-auth-no-challenge.px

tests/Test-auth-with-content-disposition.px

tests/Test-c-full.px

tests/Test-c-partial.px

tests/Test-c-shorter.px

tests/Test-c.px

tests/Test-cookies-401.px

tests/Test-cookies.px

tests/Test-ftp-bad-list.px

tests/Test-ftp-iri-disabled.px

tests/Test-ftp-iri-fallback.px

tests/Test-ftp-iri-recursive.px

tests/Test-ftp-iri.px

tests/Test-ftp-pasv-fail.px

tests/Test-ftp-recursive.px

tests/Test-ftp.px

tests/Test-idn-cmd.px

tests/Test-idn-headers.px

tests/Test-idn-meta.px

tests/Test-idn-robots.px

tests/Test-iri-disabled.px

tests/Test-iri-forced-remote.px

tests/Test-iri-list.px

tests/Test-iri-percent.px

tests/Test-iri.px

tests/Test-k.px

tests/Test-meta-robots.px

tests/Test-nonexisting-quiet.px

tests/Test-noop.px

tests/Test-np.px

tests/Test-proxied-https-auth.px

tests/Test-proxy-auth-basic.px

tests/Test-restrict-ascii.px

tests/run-px

util/Makefile.am

util/Makefile.in

util/rmold.pl


.pc/CVE-2010-2252/doc/wget.texi

\input texinfo @c -*-texinfo-*-

@c %**start of header

@setfilename wget.info

@include version.texi

@settitle GNU Wget @value{VERSION} Manual

@c Disable the monstrous rectangles beside overfull hbox-es.

@finalout

@c Use `odd' to print double-sided.

@setchapternewpage on

@c %**end of header

@iftex

@c Remove this if you don't use A4 paper.

@afourpaper

@end iftex

@c Title for man page. The weird way texi2pod.pl is written requires

@c the preceding @set.

@set Wget Wget

@c man title Wget The non-interactive network downloader.

@dircategory Network Applications

@direntry

* Wget: (wget). The non-interactive network downloader.

@end direntry

@copying

This file documents the GNU Wget utility for downloading network

data.

@c man begin COPYRIGHT

2004, 2005, 2006, 2007, 2008, 2009 Free Software Foundation, Inc.

@iftex

Permission is granted to make and distribute verbatim copies of

this manual provided the copyright notice and this permission notice

are preserved on all copies.

@end iftex

@ignore

Permission is granted to process this file through TeX and print the

results, provided the printed document carries a copying permission

notice identical to this one except for the removal of this paragraph

(this paragraph not being relevant to the printed manual).

@end ignore

Permission is granted to copy, distribute and/or modify this document

under the terms of the GNU Free Documentation License, Version 1.2 or

any later version published by the Free Software Foundation; with no

Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A

copy of the license is included in the section entitled ``GNU Free

Documentation License''.

@c man end

@end copying

@titlepage

@title GNU Wget @value{VERSION}

@subtitle The non-interactive download utility

@subtitle Updated for Wget @value{VERSION}, @value{UPDATED}

@author by Hrvoje Nik@v{s}i@'{c} and others

@ignore

@c man begin AUTHOR

Originally written by Hrvoje Niksic <hniksic@xemacs.org>.

Currently maintained by Micah Cowan <micah@cowan.name>.

@c man end

@c man begin SEEALSO

This is @strong{not} the complete manual for GNU Wget.

For more complete information, including more detailed explanations of

some of the options, and a number of commands available

for use with @file{.wgetrc} files and the @samp{-e} option, see the GNU

Info entry for @file{wget}.

@c man end

@end ignore

@page

@vskip 0pt plus 1filll

@insertcopying

@end titlepage

@contents

@ifnottex

@node Top, Overview, (dir), (dir)

@top Wget @value{VERSION}

@insertcopying

@end ifnottex

@menu
* Overview:: Features of Wget.
* Invoking:: Wget command-line arguments.
* Recursive Download:: Downloading interlinked pages.
* Following Links:: The available methods of chasing links.
* Time-Stamping:: Mirroring according to time-stamps.
* Startup File:: Wget's initialization file.
* Examples:: Examples of usage.
* Various:: The stuff that doesn't fit anywhere else.
* Appendices:: Some useful references.
* Copying this manual:: You may give out copies of this manual.
* Concept Index:: Topics covered by this manual.
@end menu

@node Overview, Invoking, Top, Top
@chapter Overview
@cindex overview
@cindex features

@c man begin DESCRIPTION
GNU Wget is a free utility for non-interactive download of files from
the Web. It supports @sc{http}, @sc{https}, and @sc{ftp} protocols, as
well as retrieval through @sc{http} proxies.

@c man end
This chapter is a partial overview of Wget's features.

@itemize @bullet
@item
@c man begin DESCRIPTION
Wget is non-interactive, meaning that it can work in the background,
while the user is not logged on. This allows you to start a retrieval
and disconnect from the system, letting Wget finish the work. By
contrast, most Web browsers require the user's constant presence,
which can be a great hindrance when transferring a lot of data.
@c man end

@item
@ignore
@c man begin DESCRIPTION

@c man end
@end ignore
@c man begin DESCRIPTION
Wget can follow links in @sc{html}, @sc{xhtml}, and @sc{css} pages, to
create local versions of remote web sites, fully recreating the
directory structure of the original site. This is sometimes referred to
as ``recursive downloading.'' While doing that, Wget respects the Robot
Exclusion Standard (@file{/robots.txt}). Wget can be instructed to
convert the links in downloaded files to point at the local files, for
offline viewing.
@c man end

@item
File name wildcard matching and recursive mirroring of directories are
available when retrieving via @sc{ftp}. Wget can read the time-stamp
information given by both @sc{http} and @sc{ftp} servers, and store it
locally. Thus Wget can see if the remote file has changed since last
retrieval, and automatically retrieve the new version if it has. This
makes Wget suitable for mirroring of @sc{ftp} sites, as well as home
pages.

@item
@ignore
@c man begin DESCRIPTION

@c man end
@end ignore
@c man begin DESCRIPTION
Wget has been designed for robustness over slow or unstable network
connections; if a download fails due to a network problem, it will
keep retrying until the whole file has been retrieved. If the server
supports regetting, it will instruct the server to continue the
download from where it left off.
@c man end

@item
Wget supports proxy servers, which can lighten the network load, speed
up retrieval and provide access behind firewalls. Wget uses the passive
@sc{ftp} downloading by default, active @sc{ftp} being an option.

@item
Wget supports IP version 6, the next generation of IP. IPv6 is
autodetected at compile-time, and can be disabled at either build or
run time. Binaries built with IPv6 support work well in both
IPv4-only and dual family environments.

@item
Built-in features offer mechanisms to tune which links you wish to follow
(@pxref{Following Links}).

@item
The progress of individual downloads is traced using a progress gauge.
Interactive downloads are tracked using a ``thermometer''-style gauge,
whereas non-interactive ones are traced with dots, each dot
representing a fixed amount of data received (1KB by default). Either
gauge can be customized to your preferences.

@item
Most of the features are fully configurable, either through command line
options, or via the initialization file @file{.wgetrc} (@pxref{Startup
File}). Wget allows you to define @dfn{global} startup files
(@file{/etc/wgetrc} by default) for site settings.

@ignore
@c man begin FILES
@table @samp
@item /etc/wgetrc
Default location of the @dfn{global} startup file.

@item .wgetrc
User startup file.
@end table
@c man end
@end ignore

@item
Finally, GNU Wget is free software. This means that everyone may use
it, redistribute it and/or modify it under the terms of the GNU General
Public License, as published by the Free Software Foundation (see the
file @file{COPYING} that came with GNU Wget, for details).
@end itemize

@node Invoking, Recursive Download, Overview, Top
@chapter Invoking
@cindex invoking
@cindex command line
@cindex arguments
@cindex nohup

By default, Wget is very simple to invoke. The basic syntax is:

@example
@c man begin SYNOPSIS
wget [@var{option}]@dots{} [@var{URL}]@dots{}
@c man end
@end example

Wget will simply download all the @sc{url}s specified on the command
line. @var{URL} is a @dfn{Uniform Resource Locator}, as defined below.

However, you may wish to change some of the default parameters of
Wget. You can do it two ways: permanently, adding the appropriate
command to @file{.wgetrc} (@pxref{Startup File}), or specifying it on
the command line.

@menu
* URL Format::
* Option Syntax::
* Basic Startup Options::
* Logging and Input File Options::
* Download Options::
* Directory Options::
* HTTP Options::
* HTTPS (SSL/TLS) Options::
* FTP Options::
* Recursive Retrieval Options::
* Recursive Accept/Reject Options::
* Exit Status::
@end menu

@node URL Format, Option Syntax, Invoking, Invoking
@section URL Format
@cindex URL
@cindex URL syntax

@dfn{URL} is an acronym for Uniform Resource Locator. A uniform
resource locator is a compact string representation for a resource
available via the Internet. Wget recognizes the @sc{url} syntax as per
@sc{rfc1738}. This is the most widely used form (square brackets denote
optional parts):

@example
http://host[:port]/directory/file
ftp://host[:port]/directory/file
@end example

You can also encode your username and password within a @sc{url}:

@example
ftp://user:password@@host/path
http://user:password@@host/path
@end example

Either @var{user} or @var{password}, or both, may be left out. If you
leave out either the @sc{http} username or password, no authentication
will be sent. If you leave out the @sc{ftp} username, @samp{anonymous}
will be used. If you leave out the @sc{ftp} password, your email
address will be supplied as a default password.@footnote{If you have a
@file{.netrc} file in your home directory, password will also be
searched for there.}

@strong{Important Note}: if you specify a password-containing @sc{url}
on the command line, the username and password will be plainly visible
to all users on the system, by way of @code{ps}. On multi-user systems,
this is a big security risk. To work around it, use @code{wget -i -}
and feed the @sc{url}s to Wget's standard input, each on a separate
line, terminated by @kbd{C-d}.
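
The workaround in the note above can be sketched as a small shell helper that pipes the URL into @code{wget -i -}, so the credentials never appear in Wget's argument list. The function name, host, and credentials are hypothetical. The sketch also assumes that @code{printf} is a shell builtin (as it is in common shells such as bash and dash), so the URL is not exposed through a second process either.

```shell
# Hypothetical helper: pass a password-bearing URL on stdin, not in argv.
fetch_private() {
    # printf is a builtin in common shells, so the URL does not show up
    # as the argument of a separate process.
    printf '%s\n' "$1" | wget -i -
}

# Usage (hypothetical host and credentials):
#   fetch_private 'ftp://user:password@host/path'
```

Several URLs can be fed the same way, one per line.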

You can encode unsafe characters in a @sc{url} as @samp{%xy}, @code{xy}
being the hexadecimal representation of the character's @sc{ascii}
value. Some common unsafe characters include @samp{%} (quoted as
@samp{%25}), @samp{:} (quoted as @samp{%3A}), and @samp{@@} (quoted as
@samp{%40}). Refer to @sc{rfc1738} for a comprehensive list of unsafe
characters.
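
As an illustration of the quoting rule above, here is a minimal POSIX shell sketch that percent-encodes a string before it is placed in a URL. The helper name is hypothetical and not part of Wget. By assumption it is deliberately conservative: it leaves only ASCII letters, digits, dot, underscore and hyphen untouched, which encodes more characters than strictly required, and it is meant for ASCII input only.

```shell
# Hypothetical helper: percent-encode a string for use inside a URL.
# Conservative by assumption: only ASCII letters, digits, '.', '_' and '-'
# are kept as-is; every other ASCII character becomes %XY.
urlencode() (
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}        # everything after the first character
        c=${s%"$rest"}     # the first character itself
        case $c in
            [A-Za-z0-9._-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)

# Usage (hypothetical password): encode it before building the URL.
#   pass=$(urlencode 'p@ss:word')
#   url="ftp://user:$pass@host/path"
```

The function body runs in a subshell, so its working variables do not leak into the calling shell.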

Wget also supports the @code{type} feature for @sc{ftp} @sc{url}s. By
default, @sc{ftp} documents are retrieved in the binary mode (type
@samp{i}), which means that they are downloaded unchanged. Another
useful mode is the @samp{a} (@dfn{ASCII}) mode, which converts the line
delimiters between the different operating systems, and is thus useful
for text files. Here is an example:

@example
ftp://host/directory/file;type=a
@end example

Two alternative variants of @sc{url} specification are also supported,
because of historical (hysterical?) reasons and their widespread use.

@sc{ftp}-only syntax (supported by @code{NcFTP}):
@example
host:/dir/file
@end example

@sc{http}-only syntax (introduced by @code{Netscape}):
@example
host[:port]/dir/file
@end example

These two alternative forms are deprecated, and may cease being
supported in the future.

If you do not understand the difference between these notations, or do
not know which one to use, just use the plain ordinary format you use
with your favorite browser, like @code{Lynx} or @code{Netscape}.

@c man begin OPTIONS

@node Option Syntax, Basic Startup Options, URL Format, Invoking
@section Option Syntax
@cindex option syntax
@cindex syntax of options

Since Wget uses GNU getopt to process command-line arguments, every
option has a long form along with the short one. Long options are
more convenient to remember, but take time to type. You may freely
mix different option styles, or specify options after the command-line
arguments. Thus you may write:

@example
wget -r --tries=10 http://fly.srk.fer.hr/ -o log
@end example

The space between the option accepting an argument and the argument may
be omitted. Instead of @samp{-o log} you can write @samp{-olog}.

You may put several options that do not require arguments together,
like:

@example
wget -drc @var{URL}
@end example

This is completely equivalent to:

@example
wget -d -r -c @var{URL}
@end example

Since the options can be specified after the arguments, you may
terminate them with @samp{--}. So the following will try to download
@sc{url} @samp{-x}, reporting failure to @file{log}:

@example
wget -o log -- -x
@end example

The options that accept comma-separated lists all respect the convention
that specifying an empty list clears its value. This can be useful to
clear the @file{.wgetrc} settings. For instance, if your @file{.wgetrc}
sets @code{exclude_directories} to @file{/cgi-bin}, the following
example will first reset it, and then set it to exclude @file{/~nobody}
and @file{/~somebody}. You can also clear the lists in @file{.wgetrc}
(@pxref{Wgetrc Syntax}).

@example
wget -X '' -X /~nobody,/~somebody
@end example

Most options that do not accept arguments are @dfn{boolean} options,
so named because their state can be captured with a yes-or-no
(``boolean'') variable. For example, @samp{--follow-ftp} tells Wget
to follow FTP links from HTML files and, on the other hand,
@samp{--no-glob} tells it not to perform file globbing on FTP URLs. A
boolean option is either @dfn{affirmative} or @dfn{negative}
(beginning with @samp{--no}). All such options share several
properties.

Unless stated otherwise, it is assumed that the default behavior is
the opposite of what the option accomplishes. For example, the
documented existence of @samp{--follow-ftp} assumes that the default
is to @emph{not} follow FTP links from HTML pages.

Affirmative options can be negated by prepending the @samp{--no-} to
the option name; negative options can be negated by omitting the
@samp{--no-} prefix. This might seem superfluous---if the default for
an affirmative option is to not do something, then why provide a way
to explicitly turn it off? But the startup file may in fact change
the default. For instance, using @code{follow_ftp = on} in
@file{.wgetrc} makes Wget @emph{follow} FTP links by default, and
using @samp{--no-follow-ftp} is the only way to restore the factory
default from the command line.
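
The interaction between a startup file and a negated option can be sketched as follows. The helper names, file path, and URL are hypothetical. The sketch assumes Wget reads the startup file named by the @code{WGETRC} environment variable, which the Startup File chapter of the Wget manual describes but this section does not.

```shell
# Hypothetical helpers: a startup file changes a default, and the negated
# option restores the factory default for one invocation.

# Write a minimal startup file that turns follow_ftp on.
make_rc() {
    printf 'follow_ftp = on\n' > "$1"
}

# Run Wget with that startup file (assumption: WGETRC selects it),
# but do not follow FTP links for this run.
fetch_without_ftp_links() {
    WGETRC=$1 wget --no-follow-ftp -r "$2"
}

# Usage (hypothetical path and URL):
#   make_rc /tmp/demo-wgetrc
#   fetch_without_ftp_links /tmp/demo-wgetrc http://example.com/
```

Without @samp{--no-follow-ftp}, the same run would follow FTP links because the startup file says so.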

@node Basic Startup Options, Logging and Input File Options, Option Syntax, Invoking
@section Basic Startup Options

@table @samp
@item -V
@itemx --version
Display the version of Wget.

@item -h
@itemx --help
Print a help message describing all of Wget's command-line options.

@item -b
@itemx --background
Go to background immediately after startup. If no output file is
specified via the @samp{-o}, output is redirected to @file{wget-log}.

@cindex execute wgetrc command
@item -e @var{command}
@itemx --execute @var{command}
Execute @var{command} as if it were a part of @file{.wgetrc}
(@pxref{Startup File}). A command thus invoked will be executed
@emph{after} the commands in @file{.wgetrc}, thus taking precedence over
them. If you need to specify more than one wgetrc command, use multiple
instances of @samp{-e}.

@end table

@node Logging and Input File Options, Download Options, Basic Startup Options, Invoking
@section Logging and Input File Options

@table @samp
@cindex output file
@cindex log file
@item -o @var{logfile}
@itemx --output-file=@var{logfile}
Log all messages to @var{logfile}. The messages are normally reported
to standard error.

@cindex append to log
@item -a @var{logfile}
@itemx --append-output=@var{logfile}
Append to @var{logfile}. This is the same as @samp{-o}, only it appends
to @var{logfile} instead of overwriting the old log file. If
@var{logfile} does not exist, a new file is created.

@cindex debug
@item -d
@itemx --debug
Turn on debug output, meaning various information important to the
developers of Wget if it does not work properly. Your system
administrator may have chosen to compile Wget without debug support, in
which case @samp{-d} will not work. Please note that compiling with
debug support is always safe---Wget compiled with the debug support will
@emph{not} print any debug info unless requested with @samp{-d}.
@xref{Reporting Bugs}, for more information on how to use @samp{-d} for
sending bug reports.

@cindex quiet
@item -q
@itemx --quiet
Turn off Wget's output.

@cindex verbose
@item -v
@itemx --verbose
Turn on verbose output, with all the available data. The default output
is verbose.

@item -nv
@itemx --no-verbose
Turn off verbose without being completely quiet (use @samp{-q} for
that), which means that error messages and basic information still get
printed.

@cindex input-file
@item -i @var{file}
@itemx --input-file=@var{file}
Read @sc{url}s from a local or external @var{file}. If @samp{-} is
specified as @var{file}, @sc{url}s are read from the standard input.
(Use @samp{./-} to read from a file literally named @samp{-}.)

If this option is used, no @sc{url}s need be present on the command
line. If there are @sc{url}s both on the command line and in an input
file, those on the command line will be the first ones to be
retrieved. If @samp{--force-html} is not specified, then @var{file}
should consist of a series of URLs, one per line.

However, if you specify @samp{--force-html}, the document will be
regarded as @samp{html}. In that case you may have problems with
relative links, which you can solve either by adding @code{<base
href="@var{url}">} to the documents or by specifying
@samp{--base=@var{url}} on the command line.

If the @var{file} is an external one, the document will be automatically
treated as @samp{html} if the Content-Type matches @samp{text/html}.
Furthermore, the @var{file}'s location will be implicitly used as base
href if none was specified.

@cindex force html
@item -F
@itemx --force-html
When input is read from a file, force it to be treated as an @sc{html}
file. This enables you to retrieve relative links from existing
@sc{html} files on your local disk, by adding @code{<base
href="@var{url}">} to @sc{html}, or using the @samp{--base} command-line
option.

@cindex base for relative links in input file
@item -B @var{URL}
@itemx --base=@var{URL}
Resolves relative links using @var{URL} as the point of reference,
when reading links from an HTML file specified via the
@samp{-i}/@samp{--input-file} option (together with
@samp{--force-html}, or when the input file was fetched remotely from
a server describing it as @sc{html}). This is equivalent to the
presence of a @code{BASE} tag in the @sc{html} input file, with
@var{URL} as the value for the @code{href} attribute.

For instance, if you specify @samp{http://foo/bar/a.html} for
@var{URL}, and Wget reads @samp{../baz/b.html} from the input file, it
would be resolved to @samp{http://foo/baz/b.html}.
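
The resolution in the example above follows the usual rules for relative
URL references. As a rough illustration only (this uses Python's standard
library, not Wget's own resolver), the same result can be reproduced like
this:

```python
from urllib.parse import urljoin

# Value that would be passed to --base (taken from the example above).
base = "http://foo/bar/a.html"
# Relative link as it would appear in the input file.
link = "../baz/b.html"

# urljoin applies standard relative-reference resolution.
resolved = urljoin(base, link)
print(resolved)
```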

@end table

@node Download Options, Directory Options, Logging and Input File Options, Invoking
@section Download Options

@table @samp
@cindex bind address
@cindex client IP address
@cindex IP address, client
@item --bind-address=@var{ADDRESS}
When making client TCP/IP connections, bind to @var{ADDRESS} on
the local machine. @var{ADDRESS} may be specified as a hostname or IP
address. This option can be useful if your machine is bound to multiple
IPs.

@cindex retries
@cindex tries
@cindex number of retries
@item -t @var{number}
@itemx --tries=@var{number}
Set number of retries to @var{number}. Specify 0 or @samp{inf} for
infinite retrying. The default is to retry 20 times, with the exception
of fatal errors like ``connection refused'' or ``not found'' (404),
which are not retried.

@item -O @var{file}
@itemx --output-document=@var{file}
The documents will not be written to the appropriate files, but all
will be concatenated together and written to @var{file}. If @samp{-}
is used as @var{file}, documents will be printed to standard output,
disabling link conversion. (Use @samp{./-} to print to a file
literally named @samp{-}.)

Use of @samp{-O} is @emph{not} intended to mean simply ``use the name
@var{file} instead of the one in the URL;'' rather, it is
analogous to shell redirection:
@samp{wget -O file http://foo} is intended to work like
@samp{wget -O - http://foo > file}; @file{file} will be truncated
immediately, and @emph{all} downloaded content will be written there.

For this reason, @samp{-N} (for timestamp-checking) is not supported
in combination with @samp{-O}: since @var{file} is always newly
created, it will always have a very new timestamp. A warning will be
issued if this combination is used.

Similarly, using @samp{-r} or @samp{-p} with @samp{-O} may not work as
you expect: Wget won't just download the first file to @var{file} and
then download the rest to their normal names: @emph{all} downloaded
content will be placed in @var{file}. This was disabled in version
1.11, but has been reinstated (with a warning) in 1.11.2, as there are
some cases where this behavior can actually have some use.

Note that a combination with @samp{-k} is only permitted when
downloading a single document, as in that case it will just convert
all relative URIs to external ones; @samp{-k} makes no sense for
multiple URIs when they're all being downloaded to a single file.

@cindex clobbering, file
@cindex downloading multiple times
@cindex no-clobber
@item -nc
@itemx --no-clobber
If a file is downloaded more than once in the same directory, Wget's
behavior depends on a few options, including @samp{-nc}. In certain
cases, the local file will be @dfn{clobbered}, or overwritten, upon
repeated download. In other cases it will be preserved.

When running Wget without @samp{-N}, @samp{-nc}, @samp{-r}, or
@samp{-p}, downloading the same file in the same directory will result
in the original copy of @var{file} being preserved and the second copy
being named @samp{@var{file}.1}. If that file is downloaded yet
again, the third copy will be named @samp{@var{file}.2}, and so on.
(This is also the behavior with @samp{-nd}, even if @samp{-r} or
@samp{-p} are in effect.) When @samp{-nc} is specified, this behavior
is suppressed, and Wget will refuse to download newer copies of
@samp{@var{file}}. Therefore, ``@code{no-clobber}'' is actually a
misnomer in this mode---it's not clobbering that's prevented (as the
numeric suffixes were already preventing clobbering), but rather the
multiple version saving that's prevented.
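
The naming rule in the paragraph above can be sketched in a few lines.
This is an illustrative model only, not Wget's implementation; the
function names and the @code{exists} parameter are hypothetical and exist
just so the logic can be tried without touching the disk.

```python
import os


def next_free_name(path, exists=os.path.exists):
    """Return path if unused, else the first unused path.N (N = 1, 2, ...)."""
    if not exists(path):
        return path
    n = 1
    while exists(f"{path}.{n}"):
        n += 1
    return f"{path}.{n}"


def target_for_download(path, no_clobber=False, exists=os.path.exists):
    """Return the file name to write, or None when -nc skips the download."""
    if no_clobber and exists(path):
        return None  # -nc: keep the existing copy, do not fetch again
    return next_free_name(path, exists)


# Pretend these two files are already present in the directory.
taken = {"index.html", "index.html.1"}
print(target_for_download("index.html", exists=taken.__contains__))
print(target_for_download("index.html", no_clobber=True, exists=taken.__contains__))
```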

When running Wget with @samp{-r} or @samp{-p}, but without @samp{-N},
@samp{-nd}, or @samp{-nc}, re-downloading a file will result in the
new copy simply overwriting the old. Adding @samp{-nc} will prevent
this behavior, instead causing the original version to be preserved
and any newer copies on the server to be ignored.

When running Wget with @samp{-N}, with or without @samp{-r} or
@samp{-p}, the decision as to whether or not to download a newer copy
of a file depends on the local and remote timestamp and size of the
file (@pxref{Time-Stamping}). @samp{-nc} may not be specified at the
same time as @samp{-N}.

Note that when @samp{-nc} is specified, files with the suffixes
@samp{.html} or @samp{.htm} will be loaded from the local disk and
parsed as if they had been retrieved from the Web.

@cindex continue retrieval
@cindex incomplete downloads
@cindex resume download
@item -c
@itemx --continue
Continue getting a partially-downloaded file. This is useful when you
want to finish up a download started by a previous instance of Wget, or
by another program. For instance:

@example
wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z
@end example

If there is a file named @file{ls-lR.Z} in the current directory, Wget
will assume that it is the first portion of the remote file, and will
ask the server to continue the retrieval from an offset equal to the
length of the local file.

Note that you don't need to specify this option if you just want the
current invocation of Wget to retry downloading a file should the
connection be lost midway through. This is the default behavior.
@samp{-c} only affects resumption of downloads started @emph{prior} to
this invocation of Wget, and whose local files are still sitting around.

Without @samp{-c}, the previous example would just download the remote
file to @file{ls-lR.Z.1}, leaving the truncated @file{ls-lR.Z} file
alone.

Beginning with Wget 1.7, if you use @samp{-c} on a non-empty file, and
it turns out that the server does not support continued downloading,
Wget will refuse to start the download from scratch, which would
effectively ruin existing contents. If you really want the download to
start from scratch, remove the file.

Also beginning with Wget 1.7, if you use @samp{-c} on a file which is
the same size as the one on the server, Wget will refuse to download the
file and print an explanatory message. The same happens when the file
is smaller on the server than locally (presumably because it was changed
on the server since your last download attempt)---because ``continuing''
is not meaningful, no download occurs.

On the other side of the coin, while using @samp{-c}, any file that's
bigger on the server than locally will be considered an incomplete
download and only @code{(length(remote) - length(local))} bytes will be
downloaded and tacked onto the end of the local file. This behavior can
be desirable in certain cases---for instance, you can use @samp{wget -c}
to download just the new portion that's been appended to a data
collection or log file.

However, if the file is bigger on the server because it's been
@emph{changed}, as opposed to just @emph{appended} to, you'll end up
with a garbled file. Wget has no way of verifying that the local file
is really a valid prefix of the remote file. You need to be especially
careful of this when using @samp{-c} in conjunction with @samp{-r},
since every file will be considered as an ``incomplete download'' candidate.

Another instance where you'll get a garbled file if you try to use
@samp{-c} is if you have a misbehaving @sc{http} proxy that inserts a
``transfer interrupted'' string into the local file. In the future a
``rollback'' option may be added to deal with this case.

Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
servers that support the @code{Range} header.
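
The size comparisons described for @samp{-c} can be summarized as a
small decision function. The sketch below is a simplified model with
hypothetical names (@code{resume_plan}, @code{range_header}); it covers
only the local-versus-remote size logic and the shape of an @sc{http}
@code{Range} request, not FTP or error handling.

```python
def resume_plan(local_size, remote_size):
    """Return (action, offset, bytes_to_fetch) for a -c style resume.

    "skip"  : local file is as large as, or larger than, the remote one,
              so continuing is not meaningful.
    "fetch" : request the remaining bytes starting at the local length.
    """
    if local_size > 0 and remote_size <= local_size:
        return ("skip", local_size, 0)
    return ("fetch", local_size, remote_size - local_size)


def range_header(offset):
    """HTTP request header asking for everything from offset onward."""
    return {"Range": f"bytes={offset}-"}


print(resume_plan(400, 1000))
print(range_header(400))
```

As the text warns, nothing here checks that the local bytes really are a
prefix of the remote file; the plan is based on lengths alone.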

@cindex progress indicator
@cindex dot style
@item --progress=@var{type}
Select the type of the progress indicator you wish to use. Legal
indicators are ``dot'' and ``bar''.

The ``bar'' indicator is used by default. It draws an @sc{ascii}
progress bar graphic (also known as a ``thermometer'' display)
indicating the status of retrieval. If the output is not a TTY, the
``dot'' indicator will be used by default.

Use @samp{--progress=dot} to switch to the ``dot'' display. It traces
the retrieval by printing dots on the screen, each dot representing a
fixed amount of downloaded data.

When using the dotted retrieval, you may also set the @dfn{style} by
specifying the type as @samp{dot:@var{style}}. Different styles assign
different meaning to one dot. With the @code{default} style each dot
represents 1K, there are ten dots in a cluster and 50 dots in a line.
The @code{binary} style has a more ``computer''-like orientation---8K
dots, 16-dot clusters and 48 dots per line (which makes for 384K
lines). The @code{mega} style is suitable for downloading very large
files---each dot represents 64K retrieved, there are eight dots in a
cluster, and 48 dots on each line (so each line contains 3M).
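
The per-line totals quoted above follow directly from multiplying the
dot size by the number of dots per line. A quick check, assuming that
``K'' means 1024 bytes (the table and function names are illustrative,
not part of Wget):

```python
# name: (bytes per dot, dots per cluster, dots per line), as listed above.
STYLES = {
    "default": (1 * 1024, 10, 50),
    "binary": (8 * 1024, 16, 48),
    "mega": (64 * 1024, 8, 48),
}


def bytes_per_line(style):
    """Amount of data represented by one full line of dots."""
    per_dot, _dots_per_cluster, dots_per_line = STYLES[style]
    return per_dot * dots_per_line


for name in STYLES:
    print(name, bytes_per_line(name) // 1024, "K per line")
```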

Note that you can set the default style using the @code{progress}
command in @file{.wgetrc}. That setting may be overridden from the
command line. The exception is that, when the output is not a TTY, the
``dot'' progress will be favored over ``bar''. To force the bar output,
use @samp{--progress=bar:force}.

@item -N
@itemx --timestamping
Turn on time-stamping. @xref{Time-Stamping}, for details.

@cindex server response, print
@item -S
@itemx --server-response
Print the headers sent by @sc{http} servers and responses sent by
@sc{ftp} servers.

@cindex Wget as spider
@cindex spider
@item --spider
When invoked with this option, Wget will behave as a Web @dfn{spider},
which means that it will not download the pages, just check that they
are there. For example, you can use Wget to check your bookmarks:

@example
wget --spider --force-html -i bookmarks.html
@end example

This feature needs much more work for Wget to get close to the
functionality of real web spiders.

@cindex timeout
@item -T @var{seconds}
@itemx --timeout=@var{seconds}
Set the network timeout to @var{seconds} seconds. This is equivalent
to specifying @samp{--dns-timeout}, @samp{--connect-timeout}, and
@samp{--read-timeout}, all at the same time.

When interacting with the network, Wget can check for timeout and
abort the operation if it takes too long. This prevents anomalies
like hanging reads and infinite connects. The only timeout enabled by
default is a 900-second read timeout. Setting a timeout to 0 disables
it altogether. Unless you know what you are doing, it is best not to
change the default timeout settings.

All timeout-related options accept decimal values, as well as
subsecond values. For example, @samp{0.1} seconds is a legal (though
unwise) choice of timeout. Subsecond timeouts are useful for checking
server response times or for testing network latency.

@cindex DNS timeout
@cindex timeout, DNS
@item --dns-timeout=@var{seconds}
Set the DNS lookup timeout to @var{seconds} seconds. DNS lookups that
don't complete within the specified time will fail. By default, there
is no timeout on DNS lookups, other than that implemented by system
libraries.

@cindex connect timeout
@cindex timeout, connect
@item --connect-timeout=@var{seconds}
Set the connect timeout to @var{seconds} seconds. TCP connections that
take longer to establish will be aborted. By default, there is no
connect timeout, other than that implemented by system libraries.

@cindex read timeout
@cindex timeout, read
@item --read-timeout=@var{seconds}
Set the read (and write) timeout to @var{seconds} seconds. The
``time'' of this timeout refers to @dfn{idle time}: if, at any point in
the download, no data is received for more than the specified number
of seconds, reading fails and the download is restarted. This option
does not directly affect the duration of the entire download.

Of course, the remote server may choose to terminate the connection
sooner than this option requires. The default read timeout is 900
seconds.

@cindex bandwidth, limit
@cindex rate, limit
@cindex limit bandwidth
@item --limit-rate=@var{amount}
Limit the download speed to @var{amount} bytes per second. Amount may
be expressed in bytes, kilobytes with the @samp{k} suffix, or megabytes
with the @samp{m} suffix. For example, @samp{--limit-rate=20k} will
limit the retrieval rate to 20KB/s. This is useful when, for whatever
reason, you don't want Wget to consume the entire available bandwidth.

This option allows the use of decimal numbers, usually in conjunction
with power suffixes; for example, @samp{--limit-rate=2.5k} is a legal
value.

Note that Wget implements the limiting by sleeping the appropriate
amount of time after a network read that took less time than specified
by the rate. Eventually this strategy causes the TCP transfer to slow
down to approximately the specified rate. However, it may take some
time for this balance to be achieved, so don't be surprised if limiting
the rate doesn't work well with very small files.
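
The sleep-after-read strategy described above can be modeled roughly as
follows. This is a sketch under stated assumptions, not Wget's code: the
function names are hypothetical, and @samp{k} and @samp{m} are assumed to
be multiples of 1024.

```python
def parse_rate(amount):
    """Parse '20k', '2.5k', '1m' or a plain number into bytes per second.

    Assumes k = 1024 bytes and m = 1024 * 1024 bytes.
    """
    units = {"k": 1024, "m": 1024 * 1024}
    suffix = amount[-1].lower()
    if suffix in units:
        return float(amount[:-1]) * units[suffix]
    return float(amount)


def sleep_needed(bytes_read, elapsed, rate):
    """Seconds to sleep after a read so the average stays at or below rate.

    bytes_read / rate is how long the read should have taken; if it
    finished sooner, the difference is slept away.
    """
    expected = bytes_read / rate
    return max(0.0, expected - elapsed)


rate = parse_rate("20k")
print(rate)
print(sleep_needed(20480, 0.25, rate))
```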

@cindex pause
@cindex wait
@item -w @var{seconds}
@itemx --wait=@var{seconds}
Wait the specified number of seconds between the retrievals. Use of
this option is recommended, as it lightens the server load by making the
requests less frequent. Instead of in seconds, the time can be
specified in minutes using the @code{m} suffix, in hours using the
@code{h} suffix, or in days using the @code{d} suffix.
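
For illustration, the suffixes listed above map to seconds as shown in
this small sketch. The function name is hypothetical and only the three
suffixes named in the text are handled.

```python
def parse_wait(value):
    """Convert a --wait style value such as '30', '2m', '1h' or '1d' to seconds."""
    units = {"m": 60, "h": 60 * 60, "d": 24 * 60 * 60}
    if value and value[-1] in units:
        return float(value[:-1]) * units[value[-1]]
    return float(value)  # no suffix: the value is already in seconds


for example in ("30", "2m", "1h", "1d"):
    print(example, "->", parse_wait(example), "seconds")
```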

Specifying a large value for this option is useful if the network or the
destination host is down, so that Wget can wait long enough to
reasonably expect the network error to be fixed before the retry. The
waiting interval specified by this option is influenced by
@code{--random-wait} (described below).

@cindex retries, waiting between
@cindex waiting between retries
@item --waitretry=@var{seconds}
If you don't want Wget to wait between @emph{every} retrieval, but only
between retries of failed downloads, you can use this option. Wget will
use @dfn{linear backoff}, waiting 1 second after the first failure on a
given file, then waiting 2 seconds after the second failure on that
file, up to the maximum number of @var{seconds} you specify. Therefore,
a value of 10 will actually make Wget wait up to (1 + 2 + ... + 10) = 55
seconds per file.
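
The arithmetic behind the 55-second figure can be checked with a short
sketch of the linear backoff schedule. The function name is hypothetical;
the schedule simply grows by one second per failure and is capped at the
@samp{--waitretry} value.

```python
def backoff_schedule(waitretry, failures):
    """Wait (in seconds) applied after each of the first `failures` failures."""
    return [min(n, waitretry) for n in range(1, failures + 1)]


schedule = backoff_schedule(10, 10)
print(schedule)
print(sum(schedule))  # total wait across ten failures
```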

By default, Wget will assume a value of 10 seconds.

@cindex wait, random
@cindex random wait
@item --random-wait
Some web sites may perform log analysis to identify retrieval programs
such as Wget by looking for statistically significant similarities in
the time between requests. This option causes the time between requests
to vary between 0.5 and 1.5 * @var{wait} seconds, where @var{wait} was
specified using the @samp{--wait} option, in order to mask Wget's
presence from such analysis.
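
A minimal sketch of the interval described above, assuming a uniform
draw between the two bounds (the text gives only the range, so the
distribution and the function name are assumptions):

```python
import random


def random_wait(wait, rng=random):
    """Pick a delay between 0.5 * wait and 1.5 * wait seconds."""
    return rng.uniform(0.5 * wait, 1.5 * wait)


rng = random.Random(0)  # seeded only so repeated runs give the same numbers
delays = [random_wait(2.0, rng) for _ in range(5)]
print(delays)
```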

A 2001 article in a publication devoted to development on a popular
consumer platform provided code to perform this analysis on the fly.
Its author suggested blocking at the class C address level to ensure
automated retrieval programs were blocked despite changing DHCP-supplied
addresses.

The @samp{--random-wait} option was inspired by this ill-advised
recommendation to block many unrelated users from a web site due to the
actions of one.

@cindex proxy
@item --no-proxy
Don't use proxies, even if the appropriate @code{*_proxy} environment
variable is defined.

@c man end
For more information about the use of proxies with Wget, see @ref{Proxies}.
@c man begin OPTIONS

@cindex quota
@item -Q @var{quota}
@itemx --quota=@var{quota}
Specify download quota for automatic retrievals. The value can be
specified in bytes (default), kilobytes (with @samp{k} suffix), or
megabytes (with @samp{m} suffix).

Note that quota will never affect downloading a single file. So if you
specify @samp{wget -Q10k ftp://wuarchive.wustl.edu/ls-lR.gz}, all of
@file{ls-lR.gz} will be downloaded. The same goes even when several
@sc{url}s are specified on the command line. However, quota is
respected when retrieving either recursively, or from an input file.
Thus you may safely type @samp{wget -Q2m -i sites}---download will be
aborted when the quota is exceeded.

Setting quota to 0 or to @samp{inf} removes the download quota.
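
One way to picture why a single file is never cut short is to treat the
quota as a check made between files rather than during a transfer. The
sketch below models only the input-file or recursive case under that
assumption; the function name and the sample sizes are made up for
illustration.

```python
def fetch_until_quota(sizes, quota):
    """Return the sizes of the files fetched before the quota stops the run.

    The quota is checked before each file starts, so a file that has begun
    downloading always completes. A quota of 0 means unlimited.
    """
    fetched, total = [], 0
    for size in sizes:
        if quota and total > quota:
            break  # quota already exceeded: abort the remaining downloads
        fetched.append(size)
        total += size
    return fetched


two_megabytes = 2 * 1024 * 1024
print(fetch_until_quota([1_500_000, 1_000_000, 700_000], two_megabytes))
print(fetch_until_quota([50_000], 10 * 1024))
```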

@cindex DNS cache
@cindex caching of DNS lookups
@item --no-dns-cache
Turn off caching of DNS lookups. Normally, Wget remembers the IP
addresses it looked up from DNS so it doesn't have to repeatedly
contact the DNS server for the same (typically small) set of hosts it
retrieves from. This cache exists in memory only; a new Wget run will
contact DNS again.

However, it has been reported that in some situations it is not
desirable to cache host names, even for the duration of a
short-running application like Wget. With this option Wget issues a
new DNS lookup (more precisely, a new call to @code{gethostbyname} or
@code{getaddrinfo}) each time it makes a new connection. Please note
that this option will @emph{not} affect caching that might be
performed by the resolving library or by an external caching layer,
such as NSCD.

If you don't understand exactly what this option does, you probably
won't need it.

@cindex file names, restrict
@cindex Windows file names
@item --restrict-file-names=@var{modes}
Change which characters found in remote URLs must be escaped during
generation of local filenames. Characters that are @dfn{restricted}
by this option are escaped, i.e. replaced with @samp{%HH}, where
@samp{HH} is the hexadecimal number that corresponds to the restricted
character. This option may also be used to force all alphabetical
cases to be either lower- or uppercase.

By default, Wget escapes the characters that are not valid or safe as
part of file names on your operating system, as well as control
characters that are typically unprintable. This option is useful for
changing these defaults, perhaps because you are downloading to a
non-native partition, or because you want to disable escaping of the
control characters, or you want to further restrict characters to only
those in the @sc{ascii} range of values.

The @var{modes} are a comma-separated set of text values. The
acceptable values are @samp{unix}, @samp{windows}, @samp{nocontrol},
@samp{ascii}, @samp{lowercase}, and @samp{uppercase}. The values
@samp{unix} and @samp{windows} are mutually exclusive (one will
override the other), as are @samp{lowercase} and
@samp{uppercase}. Those last are special cases, as they do not change
the set of characters that would be escaped, but rather force local
file paths to be converted either to lower- or uppercase.

When ``unix'' is specified, Wget escapes the character @samp{/} and
the control characters in the ranges 0--31 and 128--159. This is the
default on Unix-like operating systems.

When ``windows'' is given, Wget escapes the characters @samp{\},
@samp{|}, @samp{/}, @samp{:}, @samp{?}, @samp{"}, @samp{*}, @samp{<},
@samp{>}, and the control characters in the ranges 0--31 and 128--159.
In addition to this, Wget in Windows mode uses @samp{+} instead of
@samp{:} to separate host and port in local file names, and uses
@samp{@@} instead of @samp{?} to separate the query portion of the file
name from the rest. Therefore, a URL that would be saved as
@samp{www.xemacs.org:4300/search.pl?input=blah} in Unix mode would be
saved as @samp{www.xemacs.org+4300/search.pl@@input=blah} in Windows
mode. This mode is the default on Windows.

If you specify @samp{nocontrol}, then the escaping of the control
characters is also switched off. This option may make sense
when you are downloading URLs whose names contain UTF-8 characters, on
a system which can save and display filenames in UTF-8 (some possible
byte values used in UTF-8 byte sequences fall in the range of values
designated by Wget as ``controls'').

The @samp{ascii} mode is used to specify that any bytes whose values
are outside the range of @sc{ascii} characters (that is, greater than
127) shall be escaped. This can be useful when saving filenames
whose encoding does not match the one used locally.
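
The escaping rules above can be approximated with the following sketch.
It is a simplified model, not Wget's implementation: the names are
hypothetical, it works on the raw bytes of a single path component, and
it leaves out the @samp{lowercase} and @samp{uppercase} modes as well as
the Windows-mode substitutions of @samp{+} and @samp{@@} for the
host/port and query separators.

```python
# Byte values restricted in each mode, per the lists in the text above.
UNIX_CHARS = set(b"/")
WINDOWS_CHARS = set(b'\\|/:?"*<>')


def is_control(byte):
    """Control ranges named in the text: 0-31 and 128-159."""
    return byte <= 31 or 128 <= byte <= 159


def escape_component(name, modes=("unix",)):
    """Escape one path component (bytes) as %HH according to modes."""
    restricted = WINDOWS_CHARS if "windows" in modes else UNIX_CHARS
    out = bytearray()
    for byte in name:
        if (
            byte in restricted
            or (is_control(byte) and "nocontrol" not in modes)
            or (byte > 127 and "ascii" in modes)
        ):
            out += b"%%%02X" % byte  # replace the byte with %HH
        else:
            out.append(byte)
    return bytes(out)


print(escape_component(b"a:b*c", ("windows",)))
# U+00C0 encodes to the bytes C3 80; 0x80 falls in the 128-159 range.
print(escape_component("\u00c0".encode("utf-8")))
print(escape_component("\u00c0".encode("utf-8"), ("unix", "nocontrol")))
```

The last two calls show why @samp{nocontrol} matters for UTF-8 names:
without it, the second byte of the encoded character is treated as a
control byte and escaped.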

@cindex IPv6
@item -4
@itemx --inet4-only
@itemx -6
@itemx --inet6-only
Force connecting to IPv4 or IPv6 addresses. With @samp{--inet4-only}
or @samp{-4}, Wget will only connect to IPv4 hosts, ignoring AAAA
records in DNS, and refusing to connect to IPv6 addresses specified in
URLs. Conversely, with @samp{--inet6-only} or @samp{-6}, Wget will
only connect to IPv6 hosts and ignore A records and IPv4 addresses.

Neither option should normally be needed. By default, an IPv6-aware
Wget will use the address family specified by the host's DNS record.
If the DNS responds with both IPv4 and IPv6 addresses, Wget will try
them in sequence until it finds one it can connect to. (Also see the
@code{--prefer-family} option described below.)

These options can be used to deliberately force the use of IPv4 or
IPv6 address families on dual family systems, usually to aid debugging
or to deal with broken network configuration. Only one of
@samp{--inet6-only} and @samp{--inet4-only} may be specified at the
same time. Neither option is available in Wget compiled without IPv6
support.

@item --prefer-family=none/IPv4/IPv6
When given a choice of several addresses, connect to the addresses
with the specified address family first. The address order returned by
DNS is used without change by default.

This avoids spurious errors and connect attempts when accessing hosts
that resolve to both IPv6 and IPv4 addresses from IPv4 networks. For
example, @samp{www.kame.net} resolves to
@samp{2001:200:0:8002:203:47ff:fea5:3085} and to
@samp{203.178.141.194}. When the preferred family is @code{IPv4}, the
IPv4 address is used first; when the preferred family is @code{IPv6},
the IPv6 address is used first; if the specified value is @code{none},
the address order returned by DNS is used without change.

Unlike @samp{-4} and @samp{-6}, this option doesn't inhibit access to
any address family; it only changes the @emph{order} in which the
addresses are accessed. Also note that the reordering performed by
this option is @dfn{stable}---it doesn't affect the order of addresses
of the same family. That is, the relative order of all IPv4 addresses
and of all IPv6 addresses remains intact in all cases.
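
A stable reordering of this kind can be illustrated with a stable sort
keyed on whether an address belongs to the preferred family. This is a
sketch, not Wget's code: the function name is hypothetical, and the
addresses @samp{2001:db8::1} and @samp{192.0.2.1} are documentation
addresses added here only to show that order within a family is kept.

```python
import ipaddress


def order_addresses(addresses, prefer="none"):
    """Move addresses of the preferred family first, keeping relative order."""
    if prefer == "none":
        return list(addresses)  # DNS order is used unchanged
    wanted = 4 if prefer == "IPv4" else 6
    # sorted() is stable: entries with equal keys keep their input order.
    return sorted(
        addresses,
        key=lambda addr: ipaddress.ip_address(addr).version != wanted,
    )


dns_order = [
    "2001:200:0:8002:203:47ff:fea5:3085",
    "203.178.141.194",
    "2001:db8::1",
    "192.0.2.1",
]
print(order_addresses(dns_order, "IPv4"))
```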

@item --retry-connrefused
Consider ``connection refused'' a transient error and try again.
Normally Wget gives up on a URL when it is unable to connect to the
site because failure to connect is taken as a sign that the server is
not running at all and that retries would not help. This option is
for mirroring unreliable sites whose servers tend to disappear for
short periods of time.

@cindex user
@cindex password
@cindex authentication
@item --user=@var{user}
@itemx --password=@var{password}
Specify the username @var{user} and password @var{password} for both
@sc{ftp} and @sc{http} file retrieval. These parameters can be overridden
using the @samp{--ftp-user} and @samp{--ftp-password} options for
@sc{ftp} connections and the @samp{--http-user} and @samp{--http-password}
options for @sc{http} connections.

@item --ask-password
Prompt for a password for each connection established. Cannot be specified
when @samp{--password} is being used, because they are mutually exclusive.

@cindex iri support
@cindex idn support
@item --no-iri

Turn off internationalized URI (IRI) support. Use @samp{--iri} to
turn it on. IRI support is activated by default.

You can set the default state of IRI support using the @code{iri}
command in @file{.wgetrc}. That setting may be overridden from the
command line.

@cindex local encoding
@item --local-encoding=@var{encoding}

Force Wget to use @var{encoding} as the default system encoding. That affects
how Wget converts URLs specified as arguments from locale to @sc{utf-8} for
IRI support.

Wget uses the function @code{nl_langinfo()} and then the @code{CHARSET}
environment variable to get the locale. If it fails, @sc{ascii} is used.

You can set the default local encoding using the @code{local_encoding}
command in @file{.wgetrc}. That setting may be overridden from the
command line.

@cindex remote encoding
@item --remote-encoding=@var{encoding}

Force Wget to use @var{encoding} as the default remote server encoding.
That affects how Wget converts URIs found in files from remote encoding
to @sc{utf-8} during a recursive fetch. This option is only useful for
IRI support, for the interpretation of non-@sc{ascii} characters.

For HTTP, remote encoding can be found in HTTP @code{Content-Type}
header and in HTML @code{Content-Type http-equiv} meta tag.

You can set the default encoding using the @code{remoteencoding}
command in @file{.wgetrc}. That setting may be overridden from the
command line.
@end table

@node Directory Options, HTTP Options, Download Options, Invoking
@section Directory Options

@table @samp
@item -nd
@itemx --no-directories
Do not create a hierarchy of directories when retrieving recursively.
With this option turned on, all files will get saved to the current
directory, without clobbering (if a name shows up more than once, the
filenames will get extensions @samp{.n}).

@item -x
@itemx --force-directories
The opposite of @samp{-nd}---create a hierarchy of directories, even if
one would not have been created otherwise. E.g. @samp{wget -x
http://fly.srk.fer.hr/robots.txt} will save the downloaded file to
@file{fly.srk.fer.hr/robots.txt}.

1086

1087

@item -nH

1088

@itemx --no-host-directories

1089

Disable generation of host-prefixed directories. By default, invoking

1090

Wget with @samp{-r http://fly.srk.fer.hr/} will create a structure of

1091

directories beginning with @file{fly.srk.fer.hr/}. This option disables

1092

such behavior.

1093

1094

@item --protocol-directories

1095

Use the protocol name as a directory component of local file names. For

1096

example, with this option, @samp{wget -r http://@var{host}} will save to

1097

@samp{http/@var{host}/...} rather than just to @samp{@var{host}/...}.

1098

1099

@cindex cut directories

1100

@item --cut-dirs=@var{number}

1101

Ignore @var{number} directory components. This is useful for getting a

1102

fine-grained control over the directory where recursive retrieval will

1103

be saved.

1104

1105

Take, for example, the directory at

1106

@samp{ftp://ftp.xemacs.org/pub/xemacs/}. If you retrieve it with

1107

@samp{-r}, it will be saved locally under

1108

@file{ftp.xemacs.org/pub/xemacs/}. While the @samp{-nH} option can

1109

remove the @file{ftp.xemacs.org/} part, you are still stuck with

1110

@file{pub/xemacs}. This is where @samp{--cut-dirs} comes in handy; it

1111

makes Wget not ``see'' @var{number} remote directory components. Here
are several examples of how the @samp{--cut-dirs} option works.

1113

1114

@example
@group
No options        -> ftp.xemacs.org/pub/xemacs/
-nH               -> pub/xemacs/
-nH --cut-dirs=1  -> xemacs/
-nH --cut-dirs=2  -> .

--cut-dirs=1      -> ftp.xemacs.org/xemacs/
...
@end group
@end example

1125

1126

If you just want to get rid of the directory structure, this option is
similar to a combination of @samp{-nd} and @samp{-P}. However, unlike
@samp{-nd}, @samp{--cut-dirs} does not discard subdirectories. For
instance, with @samp{-nH --cut-dirs=1}, a @file{beta/} subdirectory will
be placed in @file{xemacs/beta}, as one would expect.
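
The mapping shown in the examples for @samp{-nH} and @samp{--cut-dirs}
can be mimicked with a small shell function. This is only a sketch of
the documented behavior, not Wget's own code; @code{local_dir} and its
argument order are made up for illustration.

```shell
# local_dir HOST REMOTE_DIR CUT NO_HOST
#   CUT      number of leading remote directory components to drop
#   NO_HOST  1 to mimic -nH, 0 otherwise
# Hypothetical helper: prints the local directory the examples predict.
local_dir() {
    host=$1 path=$2 cut=$3 no_host=$4
    path=${path#/}              # drop one leading slash
    path=${path%/}              # drop one trailing slash
    while [ "$cut" -gt 0 ] && [ -n "$path" ]; do
        case $path in
            */*) path=${path#*/} ;;   # remove the first component
            *)   path= ;;             # the last component is removed
        esac
        cut=$((cut - 1))
    done
    if [ "$no_host" = 1 ]; then
        result=$path
    else
        result=$host${path:+/$path}
    fi
    if [ -n "$result" ]; then
        printf '%s/\n' "$result"
    else
        printf '.\n'
    fi
}

local_dir ftp.xemacs.org /pub/xemacs/ 0 0
local_dir ftp.xemacs.org /pub/xemacs/ 1 1
local_dir ftp.xemacs.org /pub/xemacs/ 2 1
```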

1131

1132

@cindex directory prefix

1133

@item -P @var{prefix}

1134

@itemx --directory-prefix=@var{prefix}

1135

Set directory prefix to @var{prefix}. The @dfn{directory prefix} is the

1136

directory where all other files and subdirectories will be saved to,

1137

i.e. the top of the retrieval tree. The default is @samp{.} (the

1138

current directory).

1139

@end table

1140

1141

@node HTTP Options, HTTPS (SSL/TLS) Options, Directory Options, Invoking

1142

@section HTTP Options

1143

1144

@table @samp

1145

@cindex default page name

1146

@cindex index.html

1147

@item --default-page=@var{name}

1148

Use @var{name} as the default file name when it isn't known (i.e., for

1149

URLs that end in a slash), instead of @file{index.html}.

1150

1151

@cindex .html extension

1152

@cindex .css extension

1153

@item -E

1154

@itemx --adjust-extension

1155

If a file of type @samp{application/xhtml+xml} or @samp{text/html} is

1156

downloaded and the URL does not end with the regexp

1157

@samp{\.[Hh][Tt][Mm][Ll]?}, this option will cause the suffix @samp{.html}

1158

to be appended to the local filename. This is useful, for instance, when

1159

you're mirroring a remote site that uses @samp{.asp} pages, but you want

1160

the mirrored pages to be viewable on your stock Apache server. Another

1161

good use for this is when you're downloading CGI-generated materials. A URL

1162

like @samp{http://site.com/article.cgi?25} will be saved as

1163

@file{article.cgi?25.html}.
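
The naming rule can be sketched as a shell pattern match. The helper
below is hypothetical and covers only the @samp{.html} case; it assumes
the response has already been identified as @samp{text/html} or
@samp{application/xhtml+xml}, which Wget learns from the server.

```shell
# Hypothetical helper mirroring the rule described above: append ".html"
# unless the name already ends in .htm or .html (any letter case).
adjust_name() {
    case $1 in
        *.[Hh][Tt][Mm] | *.[Hh][Tt][Mm][Ll]) printf '%s\n' "$1" ;;
        *) printf '%s.html\n' "$1" ;;
    esac
}

adjust_name 'article.cgi?25'
adjust_name 'index.HTM'
```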

1164

1165

Note that filenames changed in this way will be re-downloaded every time

1166

you re-mirror a site, because Wget can't tell that the local

1167

@file{@var{X}.html} file corresponds to remote URL @samp{@var{X}} (since
it doesn't yet know that the URL produces output of type
@samp{text/html} or @samp{application/xhtml+xml}). To prevent this

1170

re-downloading, you must use @samp{-k} and @samp{-K} so that the original

1171

version of the file will be saved as @file{@var{X}.orig} (@pxref{Recursive

1172

Retrieval Options}).

1173

1174

As of version 1.12, Wget will also ensure that any downloaded files of

1175

type @samp{text/css} end in the suffix @samp{.css}, and the option was

1176

renamed from @samp{--html-extension}, to better reflect its new

1177

behavior. The old option name is still acceptable, but should now be

1178

considered deprecated.

1179

1180

At some point in the future, this option may well be expanded to

1181

include suffixes for other types of content, including content types

1182

that are not parsed by Wget.

1183

1184

@cindex http user

1185

@cindex http password

1186

@cindex authentication

1187

@item --http-user=@var{user}

1188

@itemx --http-password=@var{password}

1189

Specify the username @var{user} and password @var{password} on an

1190

@sc{http} server. According to the type of the challenge, Wget will

1191

encode them using either the @code{basic} (insecure),

1192

the @code{digest}, or the Windows @code{NTLM} authentication scheme.

1193

1194

Another way to specify username and password is in the @sc{url} itself

1195

(@pxref{URL Format}). Either method reveals your password to anyone who

1196

bothers to run @code{ps}. To prevent the passwords from being seen,

1197

store them in @file{.wgetrc} or @file{.netrc}, and make sure to protect

1198

those files from other users with @code{chmod}. If the passwords are

1199

really important, do not leave them lying in those files either---edit

1200

the files and delete them after Wget has started the download.
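
A minimal sketch of the @file{.netrc} approach follows. It writes the
file into a scratch directory rather than a real home directory, and the
host name and credentials are placeholders; the one-line
@samp{machine ... login ... password ...} layout is the conventional
@file{.netrc} syntax, assumed here.

```shell
# Sketch: create the credentials file so that only its owner can read it.
home=$(mktemp -d)
(
    umask 077    # files created in this subshell get mode 600
    printf 'machine server.com login alice password not-a-real-password\n' \
        > "$home/.netrc"
)

# The first ten characters of the listing show the permission bits.
ls -l "$home/.netrc" | cut -c1-10
# A run such as the following would then find the file (not executed here):
#   HOME="$home" wget http://server.com/protected/
```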

1201

1202

@iftex

1203

@xref{Security Considerations}, for more information about security
issues with Wget.

1205

@end iftex

1206

1207

@cindex Keep-Alive, turning off

1208

@cindex Persistent Connections, disabling

1209

@item --no-http-keep-alive

1210

Turn off the ``keep-alive'' feature for HTTP downloads. Normally, Wget

1211

asks the server to keep the connection open so that, when you download

1212

more than one document from the same server, they get transferred over

1213

the same TCP connection. This saves time and at the same time reduces

1214

the load on the server.

1215

1216

This option is useful when, for some reason, persistent (keep-alive)

1217

connections don't work for you, for example due to a server bug or due

1218

to the inability of server-side scripts to cope with the connections.

1219

1220

@cindex proxy

1221

@cindex cache

1222

@item --no-cache

1223

Disable server-side cache. In this case, Wget will send the remote

1224

server an appropriate directive (@samp{Pragma: no-cache}) to get the

1225

file from the remote service, rather than returning the cached version.

1226

This is especially useful for retrieving and flushing out-of-date

1227

documents on proxy servers.

1228

1229

Caching is allowed by default.

1230

1231

@cindex cookies

1232

@item --no-cookies

1233

Disable the use of cookies. Cookies are a mechanism for maintaining

1234

server-side state. The server sends the client a cookie using the

1235

@code{Set-Cookie} header, and the client responds with the same cookie

1236

upon further requests. Since cookies allow the server owners to keep

1237

track of visitors and for sites to exchange this information, some

1238

consider them a breach of privacy. The default is to use cookies;

1239

however, @emph{storing} cookies is not on by default.

1240

1241

@cindex loading cookies

1242

@cindex cookies, loading

1243

@item --load-cookies @var{file}

1244

Load cookies from @var{file} before the first HTTP retrieval.

1245

@var{file} is a textual file in the format originally used by Netscape's

1246

@file{cookies.txt} file.

1247

1248

You will typically use this option when mirroring sites that require

1249

that you be logged in to access some or all of their content. The login

1250

process typically works by the web server issuing an @sc{http} cookie

1251

upon receiving and verifying your credentials. The cookie is then

1252

resent by the browser when accessing that part of the site, and so

1253

proves your identity.

1254

1255

Mirroring such a site requires Wget to send the same cookies your

1256

browser sends when communicating with the site. This is achieved by

1257

@samp{--load-cookies}---simply point Wget to the location of the

1258

@file{cookies.txt} file, and it will send the same cookies your browser

1259

would send in the same situation. Different browsers keep textual

1260

cookie files in different locations:

1261

1262

@table @asis

1263

@item Netscape 4.x.

1264

The cookies are in @file{~/.netscape/cookies.txt}.

1265

1266

@item Mozilla and Netscape 6.x.

1267

Mozilla's cookie file is also named @file{cookies.txt}, located

1268

somewhere under @file{~/.mozilla}, in the directory of your profile.

1269

The full path usually ends up looking somewhat like

1270

@file{~/.mozilla/default/@var{some-weird-string}/cookies.txt}.

1271

1272

@item Internet Explorer.

1273

You can produce a cookie file Wget can use by using the File menu,

1274

Import and Export, Export Cookies. This has been tested with Internet

1275

Explorer 5; it is not guaranteed to work with earlier versions.

1276

1277

@item Other browsers.

1278

If you are using a different browser to create your cookies,

1279

@samp{--load-cookies} will only work if you can locate or produce a

1280

cookie file in the Netscape format that Wget expects.

1281

@end table
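
For browsers that cannot export such a file, one can be written by hand.
The sketch below assumes the commonly described Netscape layout of seven
tab-separated fields (domain, subdomain flag, path, secure flag, expiry
in seconds since the epoch, name, value); the host, cookie name, and
value are placeholders.

```shell
cookies=$(mktemp)
{
    printf '# Netscape HTTP Cookie File\n'
    # domain  subdomains  path  secure  expiry  name  value
    printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
        server.com FALSE / FALSE 2147483647 session_id abc123
} > "$cookies"

# List each cookie as name=value, skipping comment lines.
awk -F '\t' '!/^#/ && NF == 7 { print $6 "=" $7 }' "$cookies"
# The file would then be passed as: wget --load-cookies "$cookies" ...
```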

1282

1283

If you cannot use @samp{--load-cookies}, there might still be an

1284

alternative. If your browser supports a ``cookie manager'', you can use

1285

it to view the cookies used when accessing the site you're mirroring.

1286

Write down the name and value of the cookie, and manually instruct Wget

1287

to send those cookies, bypassing the ``official'' cookie support:

1288

1289

@example
wget --no-cookies --header "Cookie: @var{name}=@var{value}"
@end example

1292

1293

@cindex saving cookies

1294

@cindex cookies, saving

1295

@item --save-cookies @var{file}

1296

Save cookies to @var{file} before exiting. This will not save cookies

1297

that have expired or that have no expiry time (so-called ``session

1298

cookies''), but also see @samp{--keep-session-cookies}.

1299

1300

@cindex cookies, session

1301

@cindex session cookies

1302

@item --keep-session-cookies

1303

When specified, causes @samp{--save-cookies} to also save session

1304

cookies. Session cookies are normally not saved because they are

1305

meant to be kept in memory and forgotten when you exit the browser.

1306

Saving them is useful on sites that require you to log in or to visit

1307

the home page before you can access some pages. With this option,

1308

multiple Wget runs are considered a single browser session as far as

1309

the site is concerned.

1310

1311

Since the cookie file format does not normally carry session cookies,

1312

Wget marks them with an expiry timestamp of 0. Wget's

1313

@samp{--load-cookies} recognizes those as session cookies, but it might

1314

confuse other browsers. Also note that cookies so loaded will be

1315

treated as other session cookies, which means that if you want

1316

@samp{--save-cookies} to preserve them again, you must use

1317

@samp{--keep-session-cookies} again.
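
To see which entries in a saved file are session cookies, one can filter
on the expiry field. This sketch builds its own sample file and assumes
the tab-separated Netscape layout in which the fifth field holds the
expiry timestamp; the cookie names and values are invented.

```shell
cookies=$(mktemp)
# printf repeats its format for each group of seven arguments.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    server.com FALSE / FALSE 0          login_token t0ken \
    server.com FALSE / FALSE 2147483647 theme       dark  > "$cookies"

# An expiry of 0 is how session cookies are marked, as described above.
awk -F '\t' '$5 == 0 { print $6 }' "$cookies"
```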

1318

1319

@cindex Content-Length, ignore

1320

@cindex ignore length

1321

@item --ignore-length

1322

Unfortunately, some @sc{http} servers (@sc{cgi} programs, to be more

1323

precise) send out bogus @code{Content-Length} headers, which makes Wget

1324

go wild, as it thinks not all the document was retrieved. You can spot

1325

this syndrome if Wget retries getting the same document again and again,

1326

each time claiming that the (otherwise normal) connection has closed on

1327

the very same byte.

1328

1329

With this option, Wget will ignore the @code{Content-Length} header---as

1330

if it never existed.

1331

1332

@cindex header, add

1333

@item --header=@var{header-line}

1334

Send @var{header-line} along with the rest of the headers in each

1335

@sc{http} request. The supplied header is sent as-is, which means it
must contain a name and a value separated by a colon, and must not
contain newlines.

1338

1339

You may define more than one additional header by specifying

1340

@samp{--header} more than once.

1341

1342

@example
@group
wget --header='Accept-Charset: iso-8859-2' \
     --header='Accept-Language: hr'        \
       http://fly.srk.fer.hr/
@end group
@end example

1349

1350

Specification of an empty string as the header value will clear all

1351

previous user-defined headers.

1352

1353

As of Wget 1.10, this option can be used to override headers otherwise

1354

generated automatically. This example instructs Wget to connect to

1355

localhost, but to specify @samp{foo.bar} in the @code{Host} header:

1356

1357

@example
wget --header="Host: foo.bar" http://localhost/
@end example

1360

1361

In versions of Wget prior to 1.10 such use of @samp{--header} caused

1362

sending of duplicate headers.

1363

1364

@cindex redirect

1365

@item --max-redirect=@var{number}

1366

Specifies the maximum number of redirections to follow for a resource.

1367

The default is 20, which is usually far more than necessary. However, on

1368

those occasions where you want to allow more (or fewer), this is the

1369

option to use.

1370

1371

@cindex proxy user

1372

@cindex proxy password

1373

@cindex proxy authentication

1374

@item --proxy-user=@var{user}

1375

@itemx --proxy-password=@var{password}

1376

Specify the username @var{user} and password @var{password} for

1377

authentication on a proxy server. Wget will encode them using the

1378

@code{basic} authentication scheme.

1379

1380

Security considerations similar to those with @samp{--http-password}

1381

pertain here as well.

1382

1383

@cindex http referer

1384

@cindex referer, http

1385

@item --referer=@var{url}

1386

Include `Referer: @var{url}' header in HTTP request. Useful for

1387

retrieving documents with server-side processing that assume they are

1388

always being retrieved by interactive web browsers and only come out

1389

properly when Referer is set to one of the pages that point to them.

1390

1391

@cindex server response, save

1392

@item --save-headers

1393

Save the headers sent by the @sc{http} server to the file, preceding the

1394

actual contents, with an empty line as the separator.
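
A file saved this way can be split back into headers and body at the
first empty line. The sketch below fabricates a small sample file so
that it runs on its own; it also tolerates a trailing carriage return on
header lines, on the assumption that headers may be stored with the line
endings the server sent.

```shell
saved=$(mktemp)
printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hello</html>\n' > "$saved"

# Everything before the first empty line is header text ...
headers=$(awk '/^\r?$/ { exit } { sub(/\r$/, ""); print }' "$saved")
# ... and everything after it is the document itself.
body=$(awk 'seen { print; next } /^\r?$/ { seen = 1 }' "$saved")

printf '%s\n' "$headers"
printf '%s\n' "$body"
```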

1395

1396

@cindex user-agent

1397

@item -U @var{agent-string}

1398

@itemx --user-agent=@var{agent-string}

1399

Identify as @var{agent-string} to the @sc{http} server.

1400

1401

The @sc{http} protocol allows the clients to identify themselves using a

1402

@code{User-Agent} header field. This enables distinguishing the

1403

@sc{www} software, usually for statistical purposes or for tracing of

1404

protocol violations. Wget normally identifies as

1405

@samp{Wget/@var{version}}, @var{version} being the current version

1406

number of Wget.

1407

1408

However, some sites have been known to impose the policy of tailoring

1409

the output according to the @code{User-Agent}-supplied information.

1410

While this is not such a bad idea in theory, it has been abused by

1411

servers denying information to clients other than (historically)

1412

Netscape or, more frequently, Microsoft Internet Explorer. This

1413

option allows you to change the @code{User-Agent} line issued by Wget.

1414

Use of this option is discouraged, unless you really know what you are

1415

doing.

1416

1417

Specifying an empty user agent with @samp{--user-agent=""} instructs Wget
not to send the @code{User-Agent} header in @sc{http} requests.

1419

1420

@cindex POST

1421

@item --post-data=@var{string}

1422

@itemx --post-file=@var{file}

1423

Use POST as the method for all HTTP requests and send the specified

1424

data in the request body. @samp{--post-data} sends @var{string} as

1425

data, whereas @samp{--post-file} sends the contents of @var{file}.

1426

Other than that, they work in exactly the same way. In particular,

1427

they @emph{both} expect content of the form @code{key1=value1&key2=value2},

1428

with percent-encoding for special characters; the only difference is
that one expects its content as a command-line parameter and the other
accepts its content from a file. In particular, @samp{--post-file} is

1431

@emph{not} for transmitting files as form attachments: those must

1432

appear as @code{key=value} data (with appropriate percent-coding) just

1433

like everything else. Wget does not currently support

1434

@code{multipart/form-data} for transmitting POST data; only

1435

@code{application/x-www-form-urlencoded}. Only one of

1436

@samp{--post-data} and @samp{--post-file} should be specified.
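
Because the data must already be percent-encoded, values containing
spaces or ampersands have to be escaped before they are handed to
@samp{--post-data}. A minimal sketch in portable shell follows;
@code{urlencode} is a hypothetical helper, and it handles @sc{ascii}
input only.

```shell
# Percent-encode every character except ASCII letters, digits and . ~ _ -
urlencode() {
    s=$1 out=
    while [ -n "$s" ]; do
        c=${s%"${s#?}"}          # first character of the remaining input
        s=${s#?}                 # the rest
        case $c in
            [a-zA-Z0-9.~_-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s\n' "$out"
}

data="user=$(urlencode 'foo bar')&password=$(urlencode 'p&ss=1')"
printf '%s\n' "$data"
# Would be passed as: wget --post-data "$data" http://server.com/auth.php
```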

1437

1438

Please be aware that Wget needs to know the size of the POST data in

1439

advance. Therefore the argument to @code{--post-file} must be a regular

1440

file; specifying a FIFO or something like @file{/dev/stdin} won't work.

1441

It's not quite clear how to work around this limitation inherent in

1442

HTTP/1.0. Although HTTP/1.1 introduces @dfn{chunked} transfer that

1443

doesn't require knowing the request length in advance, a client can't

1444

use chunked unless it knows it's talking to an HTTP/1.1 server. And it

1445

can't know that until it receives a response, which in turn requires the

1446

request to have been completed -- a chicken-and-egg problem.
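
One practical way to satisfy the regular-file requirement is to copy
piped data into a temporary file first, so that its size is known before
the request starts. A sketch, with @code{stage_body} as a made-up helper
name:

```shell
# Copy standard input into a fresh temporary file and print its path.
stage_body() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"
    printf '%s\n' "$tmp"
}

body=$(printf 'user=foo&password=bar' | stage_body)
[ -f "$body" ] && echo "staged $(wc -c < "$body" | tr -d ' ') bytes"
# Would be passed as: wget --post-file="$body" http://server.com/auth.php
rm -f "$body"
```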

1447

1448

Note: if Wget is redirected after the POST request is completed, it

1449

will not send the POST data to the redirected URL. This is because

1450

URLs that process POST often respond with a redirection to a regular

1451

page, which does not desire or accept POST. It is not completely

1452

clear that this behavior is optimal; if it doesn't work out, it might

1453

be changed in the future.

1454

1455

This example shows how to log in to a server using POST and then proceed to
download the desired pages, presumably only accessible to authorized
users:

1458

1459

@example
@group
# @r{Log in to the server. This only needs to be done once.}
wget --save-cookies cookies.txt \
     --post-data 'user=foo&password=bar' \
     http://server.com/auth.php

# @r{Now grab the page or pages we care about.}
wget --load-cookies cookies.txt \
     -p http://server.com/interesting/article.php
@end group
@end example

1471

1472

If the server is using session cookies to track user authentication,

1473

the above will not work because @samp{--save-cookies} will not save

1474

them (and neither will browsers) and the @file{cookies.txt} file will

1475

be empty. In that case use @samp{--keep-session-cookies} along with

1476

@samp{--save-cookies} to force saving of session cookies.

1477

1478

@cindex Content-Disposition

1479

@item --content-disposition

1480

1481

If this is set to on, experimental (not fully-functional) support for

1482

@code{Content-Disposition} headers is enabled. This can currently result in

1483

extra round-trips to the server for a @code{HEAD} request, and is known

1484

to suffer from a few bugs, which is why it is not currently enabled by default.

1485

1486

This option is useful for some file-downloading CGI programs that use

1487

@code{Content-Disposition} headers to describe what the name of a

1488

downloaded file should be.

1489

1490

@cindex authentication

1491

@item --auth-no-challenge

1492

1493

If this option is given, Wget will send Basic HTTP authentication

1494

information (plaintext username and password) for all requests, just

1495

like Wget 1.10.2 and prior did by default.

1496

1497

Use of this option is not recommended, and is intended only to support
a few obscure servers that never send HTTP authentication challenges
but accept unsolicited auth info, say, in addition to form-based
authentication.

1501

1502

@end table

1503

1504

@node HTTPS (SSL/TLS) Options, FTP Options, HTTP Options, Invoking

1505

@section HTTPS (SSL/TLS) Options

1506

1507

@cindex SSL

1508

To support encrypted HTTP (HTTPS) downloads, Wget must be compiled

1509

with an external SSL library, currently OpenSSL. If Wget is compiled

1510

without SSL support, none of these options are available.

1511

1512

@table @samp

1513

@cindex SSL protocol, choose

1514

@item --secure-protocol=@var{protocol}

1515

Choose the secure protocol to be used. Legal values are @samp{auto},

1516

@samp{SSLv2}, @samp{SSLv3}, and @samp{TLSv1}. If @samp{auto} is used,

1517

the SSL library is given the liberty of choosing the appropriate

1518

protocol automatically, which is achieved by sending an SSLv2 greeting

1519

and announcing support for SSLv3 and TLSv1. This is the default.

1520

1521

Specifying @samp{SSLv2}, @samp{SSLv3}, or @samp{TLSv1} forces the use

1522

of the corresponding protocol. This is useful when talking to old and

1523

buggy SSL server implementations that make it hard for OpenSSL to

1524

choose the correct protocol version. Fortunately, such servers are

1525

quite rare.

1526

1527

@cindex SSL certificate, check

1528

@item --no-check-certificate

1529

Don't check the server certificate against the available certificate

1530

authorities. Also don't require the URL host name to match the common

1531

name presented by the certificate.

1532

1533

As of Wget 1.10, the default is to verify the server's certificate

1534

against the recognized certificate authorities, breaking the SSL

1535

handshake and aborting the download if the verification fails.

1536

Although this provides more secure downloads, it does break

1537

interoperability with some sites that worked with previous Wget

1538

versions, particularly those using self-signed, expired, or otherwise

1539

invalid certificates. This option forces an ``insecure'' mode of

1540

operation that turns the certificate verification errors into warnings

1541

and allows you to proceed.

1542

1543

If you encounter ``certificate verification'' errors or ones saying

1544

that ``common name doesn't match requested host name'', you can use

1545

this option to bypass the verification and proceed with the download.

1546

@emph{Only use this option if you are otherwise convinced of the

1547

site's authenticity, or if you really don't care about the validity of

1548

its certificate.} It is almost always a bad idea not to check the

1549

certificates when transmitting confidential or important data.

1550

1551

@cindex SSL certificate

1552

@item --certificate=@var{file}

1553

Use the client certificate stored in @var{file}. This is needed for

1554

servers that are configured to require certificates from the clients

1555

that connect to them. Normally a certificate is not required and this

1556

switch is optional.

1557

1558

@cindex SSL certificate type, specify

1559

@item --certificate-type=@var{type}

1560

Specify the type of the client certificate. Legal values are

1561

@samp{PEM} (assumed by default) and @samp{DER}, also known as

1562

@samp{ASN1}.

1563

1564

@item --private-key=@var{file}

1565

Read the private key from @var{file}. This allows you to provide the

1566

private key in a file separate from the certificate.

1567

1568

@item --private-key-type=@var{type}

1569

Specify the type of the private key. Accepted values are @samp{PEM}

1570

(the default) and @samp{DER}.

1571

1572

@item --ca-certificate=@var{file}

1573

Use @var{file} as the file with the bundle of certificate authorities

1574

(``CA'') to verify the peers. The certificates must be in PEM format.

1575

1576

Without this option Wget looks for CA certificates at the

1577

system-specified locations, chosen at OpenSSL installation time.

1578

1579

@cindex SSL certificate authority

1580

@item --ca-directory=@var{directory}

1581

Specifies directory containing CA certificates in PEM format. Each

1582

file contains one CA certificate, and the file name is based on a hash

1583

value derived from the certificate. This is achieved by processing a

1584

certificate directory with the @code{c_rehash} utility supplied with

1585

OpenSSL. Using @samp{--ca-directory} is more efficient than

1586

@samp{--ca-certificate} when many certificates are installed because

1587

it allows Wget to fetch certificates on demand.

1588

1589

Without this option Wget looks for CA certificates at the

1590

system-specified locations, chosen at OpenSSL installation time.

1591

1592

@cindex entropy, specifying source of

1593

@cindex randomness, specifying source of

1594

@item --random-file=@var{file}

1595

Use @var{file} as the source of random data for seeding the

1596

pseudo-random number generator on systems without @file{/dev/random}.

1597

1598

On such systems the SSL library needs an external source of randomness

1599

to initialize. Randomness may be provided by EGD (see

1600

@samp{--egd-file} below) or read from an external source specified by

1601

the user. If this option is not specified, Wget looks for random data

1602

in @code{$RANDFILE} or, if that is unset, in @file{$HOME/.rnd}. If

1603

none of those are available, it is likely that SSL encryption will not

1604

be usable.
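
The lookup order described above can be written out as a short shell
check. It only illustrates the documented order and is not how Wget or
the SSL library implements it; @code{seed_source} and its optional
device argument are hypothetical.

```shell
# seed_source [DEVICE]
# Print where random seed data would be looked for: the random device if
# it exists, else the file named by $RANDFILE, else $HOME/.rnd.
seed_source() {
    dev=${1:-/dev/random}
    if [ -e "$dev" ]; then
        printf '%s\n' "$dev"
    elif [ -n "${RANDFILE:-}" ]; then
        printf '%s\n' "$RANDFILE"
    else
        printf '%s\n' "$HOME/.rnd"
    fi
}

seed_source
```

Passing a path that does not exist as the argument shows which fallback
would apply on a system without @file{/dev/random}.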

1605

1606

If you're getting the ``Could not seed OpenSSL PRNG; disabling SSL.''

1607

error, you should provide random data using some of the methods

1608

described above.

1609

1610

@cindex EGD

1611

@item --egd-file=@var{file}

1612

Use @var{file} as the EGD socket. EGD stands for @dfn{Entropy

1613

Gathering Daemon}, a user-space program that collects data from

1614

various unpredictable system sources and makes it available to other

1615

programs that might need it. Encryption software, such as the SSL

1616

library, needs sources of non-repeating randomness to seed the random

1617

number generator used to produce cryptographically strong keys.

1618

1619

OpenSSL allows users to specify their own source of entropy using the
@code{RANDFILE} environment variable. If this variable is unset, or
if the specified file does not produce enough randomness, OpenSSL will
read random data from the EGD socket specified using this option.

1623

1624

If this option is not specified (and the equivalent startup command is

1625

not used), EGD is never contacted. EGD is not needed on modern Unix

1626

systems that support @file{/dev/random}.

1627

@end table

1628

1629

@node FTP Options, Recursive Retrieval Options, HTTPS (SSL/TLS) Options, Invoking

1630

@section FTP Options

1631

1632

@table @samp

1633

@cindex ftp user

1634

@cindex ftp password

1635

@cindex ftp authentication

1636

@item --ftp-user=@var{user}

1637

@itemx --ftp-password=@var{password}

1638

Specify the username @var{user} and password @var{password} on an

1639

@sc{ftp} server. Without this, or the corresponding startup option,

1640

the password defaults to @samp{-wget@@}, normally used for anonymous

1641

FTP.

1642

1643

Another way to specify username and password is in the @sc{url} itself

1644

(@pxref{URL Format}). Either method reveals your password to anyone who

1645

bothers to run @code{ps}. To prevent the passwords from being seen,

1646

store them in @file{.wgetrc} or @file{.netrc}, and make sure to protect

1647

those files from other users with @code{chmod}. If the passwords are

1648

really important, do not leave them lying in those files either---edit

1649

the files and delete them after Wget has started the download.

1650

1651

@iftex

1652

@xref{Security Considerations}, for more information about security
issues with Wget.

1654

@end iftex

1655

1656

@cindex .listing files, removing

1657

@item --no-remove-listing

1658

Don't remove the temporary @file{.listing} files generated by @sc{ftp}

1659

retrievals. Normally, these files contain the raw directory listings

1660

received from @sc{ftp} servers. Not removing them can be useful for

1661

debugging purposes, or when you want to be able to easily check on the

1662

contents of remote server directories (e.g. to verify that a mirror

1663

you're running is complete).

1664

1665

Note that even though Wget writes to a known filename for this file,

1666

this is not a security hole in the scenario of a user making

1667

@file{.listing} a symbolic link to @file{/etc/passwd} or something and

1668

asking @code{root} to run Wget in his or her directory. Depending on

1669

the options used, either Wget will refuse to write to @file{.listing},

1670

making the globbing/recursion/time-stamping operation fail, or the

1671

symbolic link will be deleted and replaced with the actual

1672

@file{.listing} file, or the listing will be written to a

1673

@file{.listing.@var{number}} file.

1674

1675

Even though this situation isn't a problem, @code{root} should
never run Wget in a non-trusted user's directory. A user could do

1677

something as simple as linking @file{index.html} to @file{/etc/passwd}

1678

and asking @code{root} to run Wget with @samp{-N} or @samp{-r} so the file

1679

will be overwritten.
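
Before running Wget in a directory owned by someone else, it can help to
look for symbolic links that a download might follow. A rough sketch
using the standard @code{find} utility; the directory layout is made up
for the demonstration.

```shell
dir=$(mktemp -d)
touch "$dir/real.html"
ln -s /etc/passwd "$dir/index.html"   # the trap described above

# -type l selects symbolic links; any output deserves a closer look.
links=$(find "$dir" -type l)
printf '%s\n' "$links"
rm -r "$dir"
```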

1680

1681

@cindex globbing, toggle

1682

@item --no-glob

1683

Turn off @sc{ftp} globbing. Globbing refers to the use of shell-like

1684

special characters (@dfn{wildcards}), like @samp{*}, @samp{?}, @samp{[}

1685

and @samp{]} to retrieve more than one file from the same directory at

1686

once, like:

1687

1688

@example
wget ftp://gnjilux.srk.fer.hr/*.msg
@end example

1691

1692

By default, globbing will be turned on if the @sc{url} contains a

1693

globbing character. This option may be used to turn globbing on or off

1694

permanently.

1695

1696

You may have to quote the @sc{url} to protect it from being expanded by

1697

your shell. Globbing makes Wget look for a directory listing, which is

1698

system-specific. This is why it currently works only with Unix @sc{ftp}

1699

servers (and the ones emulating Unix @code{ls} output).
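
The effect of quoting can be seen without contacting any server. In the
sketch below a scratch directory stands in for a working directory that
happens to contain matching files; the file names are invented, and the
demonstration shows general shell behavior rather than anything specific
to Wget.

```shell
scratch=$(mktemp -d)
touch "$scratch/a.msg" "$scratch/b.msg"

# Unquoted, the shell replaces the pattern with local file names before
# the command ever sees it; quoted, the pattern is passed through intact.
unquoted=$(cd "$scratch" && echo *.msg)
quoted=$(cd "$scratch" && echo '*.msg')

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
rm -r "$scratch"
# So the URL is safest written as: wget 'ftp://gnjilux.srk.fer.hr/*.msg'
```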

1700

1701

@cindex passive ftp

1702

@item --no-passive-ftp

1703

Disable the use of the @dfn{passive} FTP transfer mode. Passive FTP

1704

mandates that the client connect to the server to establish the data

1705

connection rather than the other way around.

1706

1707

If the machine is connected to the Internet directly, both passive and

1708

active FTP should work equally well. Behind most firewall and NAT

1709

configurations passive FTP has a better chance of working. However,

1710

in some rare firewall configurations, active FTP actually works when

1711

passive FTP doesn't. If you suspect this to be the case, use this

1712

option, or set @code{passive_ftp=off} in your init file.

1713

1714

@cindex symbolic links, retrieving

1715

@item --retr-symlinks

1716

Usually, when retrieving @sc{ftp} directories recursively and a symbolic

1717

link is encountered, the linked-to file is not downloaded. Instead, a

1718

matching symbolic link is created on the local filesystem. The

1719

pointed-to file will not be downloaded unless this recursive retrieval

1720

would have encountered it separately and downloaded it anyway.

1721

1722

When @samp{--retr-symlinks} is specified, however, symbolic links are

1723

traversed and the pointed-to files are retrieved. At this time, this

1724

option does not cause Wget to traverse symlinks to directories and

1725

recurse through them, but in the future it should be enhanced to do

1726

this.

1727

1728

Note that when retrieving a file (not a directory) because it was

1729

specified on the command-line, rather than because it was recursed to,

1730

this option has no effect. Symbolic links are always traversed in this

1731

case.

1732

@end table

1733

1734

@node Recursive Retrieval Options, Recursive Accept/Reject Options, FTP Options, Invoking

1735

@section Recursive Retrieval Options

1736

1737

@table @samp

1738

@item -r

1739

@itemx --recursive

1740

Turn on recursive retrieving. @xref{Recursive Download}, for more

1741

details.

1742

1743

@item -l @var{depth}

1744

@itemx --level=@var{depth}

1745

Specify recursion maximum depth level @var{depth} (@pxref{Recursive

1746

Download}). The default maximum depth is 5.

1747

1748

@cindex proxy filling

1749

@cindex delete after retrieval

1750

@cindex filling proxy cache

1751

@item --delete-after

1752

This option tells Wget to delete every single file it downloads,

1753

@emph{after} having done so. It is useful for pre-fetching popular

1754

pages through a proxy, e.g.:

1755

1756

@example
wget -r -nd --delete-after http://whatever.com/~popular/page/
@end example

1759

1760

The @samp{-r} option is to retrieve recursively, and @samp{-nd} to not

1761

create directories.

1762

1763

Note that @samp{--delete-after} deletes files on the local machine. It

1764

does not issue the @samp{DELE} command to remote FTP sites, for

1765

instance. Also note that when @samp{--delete-after} is specified,

1766

@samp{--convert-links} is ignored, so @samp{.orig} files are simply not

1767

created in the first place.

1768

1769

@cindex conversion of links

1770

@cindex link conversion

1771

@item -k

1772

@itemx --convert-links

1773

After the download is complete, convert the links in the document to

1774

make them suitable for local viewing. This affects not only the visible

1775

hyperlinks, but any part of the document that links to external content,

1776

such as embedded images, links to style sheets, hyperlinks to non-@sc{html}

1777

content, etc.

1778

1779

Each link will be changed in one of two ways:

1780

1781

@itemize @bullet

1782

@item

1783

The links to files that have been downloaded by Wget will be changed to

1784

refer to the file they point to as a relative link.

1785

1786

Example: if the downloaded file @file{/foo/doc.html} links to

1787

@file{/bar/img.gif}, also downloaded, then the link in @file{doc.html}

1788

will be modified to point to @samp{../bar/img.gif}. This kind of

1789

transformation works reliably for arbitrary combinations of directories.

1790

1791

@item

1792

The links to files that have not been downloaded by Wget will be changed

1793

to include host name and absolute path of the location they point to.

1794

1795

Example: if the downloaded file @file{/foo/doc.html} links to

1796

@file{/bar/img.gif} (or to @file{../bar/img.gif}), then the link in

1797

@file{doc.html} will be modified to point to

1798

@file{http://@var{hostname}/bar/img.gif}.

1799

@end itemize

1800

1801

Because of this, local browsing works reliably: if a linked file was

1802

downloaded, the link will refer to its local name; if it was not

1803

downloaded, the link will refer to its full Internet address rather than

1804

presenting a broken link. The fact that the former links are converted

1805

to relative links ensures that you can move the downloaded hierarchy to

1806

another directory.
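
The two conversion rules can be sketched in a few lines of Python. This is
an illustration under simplifying assumptions, not Wget's implementation:
the function name @code{convert_link}, the set of downloaded paths and the
host name are all hypothetical, and only absolute server paths are handled.

```python
import posixpath


def convert_link(doc_path, target_path, downloaded, hostname):
    """Return the rewritten link to target_path as seen from doc_path.

    doc_path and target_path are absolute server paths; downloaded is the
    set of paths that were fetched; hostname is the (assumed) remote host.
    """
    if target_path in downloaded:
        # Downloaded: make the link relative to the directory of the document.
        return posixpath.relpath(target_path, start=posixpath.dirname(doc_path))
    # Not downloaded: point at the full remote address instead.
    return "http://" + hostname + target_path
```

With @file{/foo/doc.html} linking to a downloaded @file{/bar/img.gif}, the
sketch yields @samp{../bar/img.gif}; if the image was not downloaded, it
yields the absolute address on the given host.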

1807

1808

Note that only at the end of the download can Wget know which links have

1809

been downloaded. Because of that, the work done by @samp{-k} will be

1810

performed at the end of all the downloads.

1811

1812

@cindex backing up converted files

1813

@item -K

1814

@itemx --backup-converted

1815

When converting a file, back up the original version with a @samp{.orig}

1816

suffix. Affects the behavior of @samp{-N} (@pxref{HTTP Time-Stamping

1817

Internals}).

1818

1819

@item -m

1820

@itemx --mirror

1821

Turn on options suitable for mirroring. This option turns on recursion

1822

and time-stamping, sets infinite recursion depth and keeps @sc{ftp}

1823

directory listings. It is currently equivalent to

1824

@samp{-r -N -l inf --no-remove-listing}.

1825

1826

@cindex page requisites

1827

@cindex required images, downloading

1828

@item -p

1829

@itemx --page-requisites

1830

This option causes Wget to download all the files that are necessary to

1831

properly display a given @sc{html} page. This includes such things as

1832

inlined images, sounds, and referenced stylesheets.

1833

1834

Ordinarily, when downloading a single @sc{html} page, any requisite documents

1835

that may be needed to display it properly are not downloaded. Using

1836

@samp{-r} together with @samp{-l} can help, but since Wget does not

1837

ordinarily distinguish between external and inlined documents, one is

1838

generally left with ``leaf documents'' that are missing their

1839

requisites.

1840

1841

For instance, say document @file{1.html} contains an @code{<IMG>} tag

1842

referencing @file{1.gif} and an @code{<A>} tag pointing to external

1843

document @file{2.html}. Say that @file{2.html} is similar but that its

1844

image is @file{2.gif} and it links to @file{3.html}. Say this

1845

continues up to some arbitrarily high number.

1846

1847

If one executes the command:

1848

1849

@example
wget -r -l 2 http://@var{site}/1.html
@end example

1852

1853

then @file{1.html}, @file{1.gif}, @file{2.html}, @file{2.gif}, and

1854

@file{3.html} will be downloaded. As you can see, @file{3.html} is

1855

without its requisite @file{3.gif} because Wget is simply counting the

1856

number of hops (up to 2) away from @file{1.html} in order to determine

1857

where to stop the recursion. However, with this command:

1858

1859

@example
wget -r -l 2 -p http://@var{site}/1.html
@end example

1862

1863

all the above files @emph{and} @file{3.html}'s requisite @file{3.gif}

1864

will be downloaded. Similarly,

1865

1866

@example
wget -r -l 1 -p http://@var{site}/1.html
@end example

1869

1870

will cause @file{1.html}, @file{1.gif}, @file{2.html}, and @file{2.gif}

1871

to be downloaded. One might think that:

1872

1873

@example
wget -r -l 0 -p http://@var{site}/1.html
@end example

1876

1877

would download just @file{1.html} and @file{1.gif}, but unfortunately

1878

this is not the case, because @samp{-l 0} is equivalent to

1879

@samp{-l inf}---that is, infinite recursion. To download a single @sc{html}

1880

page (or a handful of them, all specified on the command-line or in a

1881

@samp{-i} @sc{url} input file) and its (or their) requisites, simply leave off

1882

@samp{-r} and @samp{-l}:

1883

1884

@example
wget -p http://@var{site}/1.html
@end example

1887

1888

Note that Wget will behave as if @samp{-r} had been specified, but only

1889

that single page and its requisites will be downloaded. Links from that

1890

page to external documents will not be followed. Actually, to download

1891

a single page and all its requisites (even if they exist on separate

1892

websites), and make sure the lot displays properly locally, this author

1893

likes to use a few options in addition to @samp{-p}:

1894

1895

@example
wget -E -H -k -K -p http://@var{site}/@var{document}
@end example

1898

1899

To finish off this topic, it's worth knowing that Wget's idea of an

1900

external document link is any URL specified in an @code{<A>} tag, an

1901

@code{<AREA>} tag, or a @code{<LINK>} tag other than @code{<LINK

1902

REL="stylesheet">}.
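
The hop counting in the @file{1.html}, @file{2.html}, @dots{} example above
can be reproduced with a small simulation. The Python sketch below is only a
model of that example (each page @var{n} has one inline image and one link
to page @var{n}+1); the function name @code{simulate} is hypothetical, and
the sketch does not model @samp{-l 0}, which means infinite depth.

```python
def simulate(max_depth, page_requisites=False):
    """List the files fetched from the page chain, starting at 1.html.

    The starting page is at depth 0. An inline image is one hop away from
    its page, so without page_requisites it is skipped when that hop would
    exceed max_depth.
    """
    fetched = []
    page = 1
    depth = 0
    while depth <= max_depth:
        fetched.append("%d.html" % page)
        # With page_requisites the image is fetched regardless of depth.
        if depth + 1 <= max_depth or page_requisites:
            fetched.append("%d.gif" % page)
        page += 1
        depth += 1
    return fetched
```

Here @code{simulate(2)} stops at @file{3.html} without @file{3.gif}, while
@code{simulate(2, page_requisites=True)} also includes @file{3.gif},
matching the behavior described for @samp{-r -l 2} with and without
@samp{-p}.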

1903

1904

@cindex @sc{html} comments

1905

@cindex comments, @sc{html}

1906

@item --strict-comments

1907

Turn on strict parsing of @sc{html} comments. The default is to terminate

1908

comments at the first occurrence of @samp{-->}.

1909

1910

According to specifications, @sc{html} comments are expressed as @sc{sgml}
@dfn{declarations}. A declaration is special markup that begins with
@samp{<!} and ends with @samp{>}, such as @samp{<!DOCTYPE ...>}, that
may contain comments between a pair of @samp{--} delimiters. @sc{html}
comments are ``empty declarations'', @sc{sgml} declarations without any
non-comment text. Therefore, @samp{<!--foo-->} is a valid comment, and
so is @samp{<!--one-- --two-->}, but @samp{<!--1--2-->} is not.

1917

1918

On the other hand, most @sc{html} writers don't perceive comments as anything
other than text delimited with @samp{<!--} and @samp{-->}, which is not
quite the same. For example, something like @samp{<!------------>}
works as a valid comment as long as the number of dashes is a multiple
of four (!). If not, the comment technically lasts until the next
@samp{--}, which may be at the other end of the document. Because of
this, many popular browsers completely ignore the specification and
implement what users have come to expect: comments delimited with
@samp{<!--} and @samp{-->}.

1927

1928

Until version 1.9, Wget interpreted comments strictly, which resulted in

1929

missing links in many web pages that displayed fine in browsers, but had

1930

the misfortune of containing non-compliant comments. Beginning with

1931

version 1.9, Wget has joined the ranks of clients that implement

1932

``naive'' comments, terminating each comment at the first occurrence of

1933

@samp{-->}.

1934

1935

If, for whatever reason, you want strict comment parsing, use this

1936

option to turn it on.

1937

@end table

1938

1939

@node Recursive Accept/Reject Options, Exit Status, Recursive Retrieval Options, Invoking

1940

@section Recursive Accept/Reject Options

1941

1942

@table @samp

1943

@item -A @var{acclist} --accept @var{acclist}

1944

@itemx -R @var{rejlist} --reject @var{rejlist}

1945

Specify comma-separated lists of file name suffixes or patterns to

1946

accept or reject (@pxref{Types of Files}). Note that if

1947

any of the wildcard characters, @samp{*}, @samp{?}, @samp{[} or

1948

@samp{]}, appear in an element of @var{acclist} or @var{rejlist},

1949

it will be treated as a pattern, rather than a suffix.

1950

1951

@item -D @var{domain-list}

1952

@itemx --domains=@var{domain-list}

1953

Set domains to be followed. @var{domain-list} is a comma-separated list

1954

of domains. Note that it does @emph{not} turn on @samp{-H}.

1955

1956

@item --exclude-domains @var{domain-list}

1957

Specify the domains that are @emph{not} to be followed.

1958

(@pxref{Spanning Hosts}).

1959

1960

@cindex follow FTP links

1961

@item --follow-ftp

1962

Follow @sc{ftp} links from @sc{html} documents. Without this option,

1963

Wget will ignore all the @sc{ftp} links.

1964

1965

@cindex tag-based recursive pruning

1966

@item --follow-tags=@var{list}

1967

Wget has an internal table of @sc{html} tag / attribute pairs that it

1968

considers when looking for linked documents during a recursive

1969

retrieval. If a user wants only a subset of those tags to be

1970

considered, however, he or she should specify such tags in a

1971

comma-separated @var{list} with this option.

1972

1973

@item --ignore-tags=@var{list}

1974

This is the opposite of the @samp{--follow-tags} option. To skip

1975

certain @sc{html} tags when recursively looking for documents to download,

1976

specify them in a comma-separated @var{list}.

1977

1978

In the past, this option was the best bet for downloading a single page

1979

and its requisites, using a command-line like:

1980

1981

@example
wget --ignore-tags=a,area -H -k -K -r http://@var{site}/@var{document}
@end example

1984

1985

However, the author of this option came across a page with tags like

1986

@code{<LINK REL="home" HREF="/">} and came to the realization that

1987

specifying tags to ignore was not enough. One can't just tell Wget to

1988

ignore @code{<LINK>}, because then stylesheets will not be downloaded.

1989

Now the best bet for downloading a single page and its requisites is the

1990

dedicated @samp{--page-requisites} option.

1991

1992

@cindex case fold

1993

@cindex ignore case

1994

@item --ignore-case

1995

Ignore case when matching files and directories. This influences the

1996

behavior of -R, -A, -I, and -X options, as well as globbing

1997

implemented when downloading from FTP sites. For example, with this

1998

option, @samp{-A *.txt} will match @samp{file1.txt}, but also

1999

@samp{file2.TXT}, @samp{file3.TxT}, and so on.

2000

2001

@item -H

2002

@itemx --span-hosts

2003

Enable spanning across hosts when doing recursive retrieving

2004

(@pxref{Spanning Hosts}).

2005

2006

@item -L

2007

@itemx --relative

2008

Follow relative links only. Useful for retrieving a specific home page

2009

without any distractions, not even those from the same hosts

2010

(@pxref{Relative Links}).

2011

2012

@item -I @var{list}

2013

@itemx --include-directories=@var{list}

2014

Specify a comma-separated list of directories you wish to follow when

2015

downloading (@pxref{Directory-Based Limits}). Elements

2016

of @var{list} may contain wildcards.

2017

2018

@item -X @var{list}

2019

@itemx --exclude-directories=@var{list}

2020

Specify a comma-separated list of directories you wish to exclude from

2021

download (@pxref{Directory-Based Limits}). Elements of

2022

@var{list} may contain wildcards.

2023

2024

@item -np

2025

@itemx --no-parent

2026

Do not ever ascend to the parent directory when retrieving recursively.

2027

This is a useful option, since it guarantees that only the files

2028

@emph{below} a certain hierarchy will be downloaded.

2029

@xref{Directory-Based Limits}, for more details.

2030

@end table

2031

2032

@c man end

2033

2034

@node Exit Status, , Recursive Accept/Reject Options, Invoking

2035

@section Exit Status

2036

2037

@c man begin EXITSTATUS

2038

2039

Wget may return one of several error codes if it encounters problems.

2040

2041

2042

@table @asis

2043

@item 0

2044

No problems occurred.

2045

2046

@item 1

2047

Generic error code.

2048

2049

@item 2

2050

Parse error---for instance, when parsing command-line options, the

2051

@samp{.wgetrc} or @samp{.netrc}...

2052

2053

@item 3

2054

File I/O error.

2055

2056

@item 4

2057

Network failure.

2058

2059

@item 5

2060

SSL verification failure.

2061

2062

@item 6

2063

Username/password authentication failure.

2064

2065

@item 7

2066

Protocol errors.

2067

2068

@item 8

2069

Server issued an error response.

2070

@end table

2071

2072

2073

With the exceptions of 0 and 1, the lower-numbered exit codes take

2074

precedence over higher-numbered ones, when multiple types of errors

2075

are encountered.
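
One reading of this precedence rule can be written down as a short Python
sketch. It is an interpretation of the paragraph above rather than Wget's
source: the function name @code{final_exit_status} is hypothetical, and the
treatment of code 1 (reported only when no more specific error occurred) is
an assumption.

```python
def final_exit_status(codes):
    """Combine per-download status codes into a single exit status.

    Assumed rule: among the specific error codes (2 and above) the lowest
    one wins; 1 is reported only if no specific error was seen; otherwise 0.
    """
    specific = [code for code in codes if code >= 2]
    if specific:
        return min(specific)  # lowest specific error code takes precedence
    if 1 in codes:
        return 1  # only generic errors occurred
    return 0  # no problems occurred
```

For instance, a run that saw a server error response (8) and a network
failure (4) would report 4 under this reading.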

2076

2077

In versions of Wget prior to 1.12, Wget's exit status tended to be

2078

unhelpful and inconsistent. Recursive downloads would virtually always

2079

return 0 (success), regardless of any issues encountered, and

2080

non-recursive fetches only returned the status corresponding to the

2081

most recently-attempted download.

2082

2083

@c man end

2084

2085

@node Recursive Download, Following Links, Invoking, Top

2086

@chapter Recursive Download

2087

@cindex recursion

2088

@cindex retrieving

2089

@cindex recursive download

2090

2091

GNU Wget is capable of traversing parts of the Web (or a single

2092

@sc{http} or @sc{ftp} server), following links and directory structure.

2093

We refer to this as @dfn{recursive retrieval}, or @dfn{recursion}.

2094

2095

With @sc{http} @sc{url}s, Wget retrieves and parses the @sc{html} or

2096

@sc{css} from the given @sc{url}, retrieving the files the document

2097

refers to, through markup like @code{href} or @code{src}, or @sc{css}

2098

@sc{uri} values specified using the @samp{url()} functional notation.

2099

If the freshly downloaded file is also of type @code{text/html},

2100

@code{application/xhtml+xml}, or @code{text/css}, it will be parsed

2101

and followed further.

2102

2103

Recursive retrieval of @sc{http} and @sc{html}/@sc{css} content is

2104

@dfn{breadth-first}. This means that Wget first downloads the requested

2105

document, then the documents linked from that document, then the

2106

documents linked by them, and so on. In other words, Wget first

2107

downloads the documents at depth 1, then those at depth 2, and so on

2108

until the specified maximum depth.
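
The breadth-first order can be illustrated with a queue-based traversal of a
link graph. The Python sketch below is a generic breadth-first search with a
depth limit, not Wget's code; the function name @code{breadth_first} and the
@code{links} mapping (document name to the documents it links to) are
hypothetical, and the starting document is counted as depth 0.

```python
from collections import deque


def breadth_first(start, links, max_depth):
    """Return documents in the order a depth-limited breadth-first crawl visits them."""
    order = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        doc, depth = queue.popleft()
        order.append(doc)
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(doc, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order
```

All documents one link away from the start are visited before any document
two links away, which is the property the paragraph above describes.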

2109

2110

The maximum @dfn{depth} to which the retrieval may descend is specified

2111

with the @samp{-l} option. The default maximum depth is five layers.

2112

2113

When retrieving an @sc{ftp} @sc{url} recursively, Wget will retrieve all

2114

the data from the given directory tree (including the subdirectories up

2115

to the specified depth) on the remote server, creating its mirror image

2116

locally. @sc{ftp} retrieval is also limited by the @code{depth}

2117

parameter. Unlike @sc{http} recursion, @sc{ftp} recursion is performed

2118

depth-first.

2119

2120

By default, Wget will create a local directory tree, corresponding to

2121

the one found on the remote server.

2122

2123

Recursive retrieving can find a number of applications, the most

2124

important of which is mirroring. It is also useful for @sc{www}

2125

presentations, and any other opportunities where slow network

2126

connections should be bypassed by storing the files locally.

2127

2128

You should be warned that recursive downloads can overload the remote

2129

servers. Because of that, many administrators frown upon them and may

2130

ban access from your site if they detect very fast downloads of big

2131

amounts of content. When downloading from Internet servers, consider

2132

using the @samp{-w} option to introduce a delay between accesses to the

2133

server. The download will take a while longer, but the server

2134

administrator will not be alarmed by your rudeness.

2135

2136

Of course, recursive download may cause problems on your machine. If

2137

left to run unchecked, it can easily fill up the disk. If downloading

2138

from a local network, it can also take bandwidth on the system, as well as

2139

consume memory and CPU.

2140

2141

Try to specify the criteria that match the kind of download you are

2142

trying to achieve. If you want to download only one page, use

2143

@samp{--page-requisites} without any additional recursion. If you want

2144

to download things under one directory, use @samp{-np} to avoid

2145

downloading things from other directories. If you want to download all

2146

the files from one directory, use @samp{-l 1} to make sure the recursion

2147

depth never exceeds one. @xref{Following Links}, for more information

2148

about this.

2149

2150

Recursive retrieval should be used with care. Don't say you were not

2151

warned.

2152

2153

@node Following Links, Time-Stamping, Recursive Download, Top

2154

@chapter Following Links

2155

@cindex links

2156

@cindex following links

2157

2158

When retrieving recursively, one does not wish to retrieve loads of

2159

unnecessary data. Most of the time the users bear in mind exactly what

2160

they want to download, and want Wget to follow only specific links.

2161

2162

For example, if you wish to download the music archive from

2163

@samp{fly.srk.fer.hr}, you will not want to download all the home pages

2164

that happen to be referenced by an obscure part of the archive.

2165

2166

Wget possesses several mechanisms that allow you to fine-tune which

2167

links it will follow.

2168

2169

@menu

2170

* Spanning Hosts:: (Un)limiting retrieval based on host name.

2171

* Types of Files:: Getting only certain files.

2172

* Directory-Based Limits:: Getting only certain directories.

2173

* Relative Links:: Follow relative links only.

2174

* FTP Links:: Following FTP links.

2175

@end menu

2176

2177

@node Spanning Hosts, Types of Files, Following Links, Following Links

2178

@section Spanning Hosts

2179

@cindex spanning hosts

2180

@cindex hosts, spanning

2181

2182

Wget's recursive retrieval normally refuses to visit hosts different

2183

than the one you specified on the command line. This is a reasonable

2184

default; without it, every retrieval would have the potential to turn

2185

your Wget into a small version of Google.

2186

2187

However, visiting different hosts, or @dfn{host spanning,} is sometimes

2188

a useful option. Maybe the images are served from a different server.

2189

Maybe you're mirroring a site that consists of pages interlinked between

2190

three servers. Maybe the server has two equivalent names, and the @sc{html}

2191

pages refer to both interchangeably.

2192

2193

@table @asis

2194

@item Span to any host---@samp{-H}

2195

2196

The @samp{-H} option turns on host spanning, thus allowing Wget's

2197

recursive run to visit any host referenced by a link. Unless sufficient

2198

recursion-limiting criteria are applied, these foreign hosts will

2199

typically link to yet more hosts, and so on until Wget ends up sucking

2200

up much more data than you have intended.

2201

2202

@item Limit spanning to certain domains---@samp{-D}

2203

2204

The @samp{-D} option allows you to specify the domains that will be

2205

followed, thus limiting the recursion only to the hosts that belong to

2206

these domains. Obviously, this makes sense only in conjunction with

2207

@samp{-H}. A typical example would be downloading the contents of

2208

@samp{www.server.com}, but allowing downloads from

2209

@samp{images.server.com}, etc.:

2210

2211

@example
wget -rH -Dserver.com http://www.server.com/
@end example

2214

2215

You can specify more than one address by separating them with a comma,

2216

e.g. @samp{-Ddomain1.com,domain2.com}.

2217

2218

@item Keep download off certain domains---@samp{--exclude-domains}

2219

2220

If there are domains you want to exclude specifically, you can do it

2221

with @samp{--exclude-domains}, which accepts the same type of arguments
as @samp{-D}, but will @emph{exclude} all the listed domains. For
example, if you want to download from all the hosts in the @samp{foo.edu}
domain, with the exception of @samp{sunsite.foo.edu}, you can do it like

2225

this:

2226

2227

@example
wget -rH -Dfoo.edu --exclude-domains sunsite.foo.edu \
    http://www.foo.edu/
@end example

2231

2232

@end table
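
How @samp{-D} and @samp{--exclude-domains} combine can be sketched as a
simple host filter. The Python below is an illustration only: the function
name @code{host_allowed} is hypothetical, and it assumes that a host belongs
to a domain when it equals the domain or ends with a dot followed by the
domain, which may differ in detail from Wget's own matching.

```python
def host_allowed(host, domains, exclude_domains=()):
    """Decide whether a host may be visited given accept and exclude domain lists."""

    def in_domain(name, domain):
        # Assumption: match the whole host or a dot-separated suffix of it.
        return name == domain or name.endswith("." + domain)

    host = host.lower()
    if any(in_domain(host, d.lower()) for d in exclude_domains):
        return False  # excluded domains always lose
    return any(in_domain(host, d.lower()) for d in domains)
```

With @code{domains} set to @samp{foo.edu} and @code{exclude_domains} set to
@samp{sunsite.foo.edu}, the sketch accepts @samp{www.foo.edu} and rejects
@samp{sunsite.foo.edu}, as in the example above.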

2233

2234

@node Types of Files, Directory-Based Limits, Spanning Hosts, Following Links

2235

@section Types of Files

2236

@cindex types of files

2237

2238

When downloading material from the web, you will often want to restrict

2239

the retrieval to only certain file types. For example, if you are

2240

interested in downloading @sc{gif}s, you will not be overjoyed to get

2241

loads of PostScript documents, and vice versa.

2242

2243

Wget offers two options to deal with this problem. Each option

2244

description lists a short name, a long name, and the equivalent command

2245

in @file{.wgetrc}.

2246

2247

@cindex accept wildcards

2248

@cindex accept suffixes

2249

@cindex wildcards, accept

2250

@cindex suffixes, accept

2251

@table @samp

2252

@item -A @var{acclist}

2253

@itemx --accept @var{acclist}

2254

@itemx accept = @var{acclist}

2255

The argument to the @samp{--accept} option is a list of file suffixes or
patterns that Wget will download during recursive retrieval. A suffix
is the ending part of a file name, and consists of ``normal'' letters,

2258

e.g. @samp{gif} or @samp{.jpg}. A matching pattern contains shell-like

2259

wildcards, e.g. @samp{books*} or @samp{zelazny*196[0-9]*}.

2260

2261

So, specifying @samp{wget -A gif,jpg} will make Wget download only the

2262

files ending with @samp{gif} or @samp{jpg}, i.e. @sc{gif}s and

2263

@sc{jpeg}s. On the other hand, @samp{wget -A "zelazny*196[0-9]*"} will

2264

download only files beginning with @samp{zelazny} and containing numbers

2265

from 1960 to 1969 anywhere within. Look up the manual of your shell for

2266

a description of how pattern matching works.

2267

2268

Of course, any number of suffixes and patterns can be combined into a

2269

comma-separated list, and given as an argument to @samp{-A}.

2270

2271

@cindex reject wildcards

2272

@cindex reject suffixes

2273

@cindex wildcards, reject

2274

@cindex suffixes, reject

2275

@item -R @var{rejlist}

2276

@itemx --reject @var{rejlist}

2277

@itemx reject = @var{rejlist}

2278

The @samp{--reject} option works the same way as @samp{--accept}, only

2279

its logic is the reverse; Wget will download all files @emph{except} the

2280

ones matching the suffixes (or patterns) in the list.

2281

2282

So, if you want to download a whole page except for the cumbersome

2283

@sc{mpeg}s and @sc{.au} files, you can use @samp{wget -R mpg,mpeg,au}.

2284

Analogously, to download all files except the ones beginning with

2285

@samp{bjork}, use @samp{wget -R "bjork*"}. The quotes are to prevent

2286

expansion by the shell.

2287

@end table

2288

2289

@noindent

2290

The @samp{-A} and @samp{-R} options may be combined to achieve even

2291

better fine-tuning of which files to retrieve. E.g. @samp{wget -A

2292

"*zelazny*" -R .ps} will download all the files having @samp{zelazny} as

2293

a part of their name, but @emph{not} the PostScript files.
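
The suffix-versus-pattern distinction and the combination of the two lists
can be sketched as follows. This Python is a simplified model, not Wget's
implementation: the names @code{matches} and @code{wanted} are hypothetical,
patterns are matched with Python's @code{fnmatch} module, and the special
handling of @sc{html} files is left out.

```python
from fnmatch import fnmatchcase

WILDCARDS = "*?[]"


def matches(name, element):
    """Match one list element: a pattern if it contains a wildcard, else a suffix."""
    if any(ch in element for ch in WILDCARDS):
        return fnmatchcase(name, element)
    return name.endswith(element)


def wanted(name, accept=(), reject=()):
    """Apply accept and reject lists to a file name."""
    if accept and not any(matches(name, e) for e in accept):
        return False  # an accept list is given and nothing in it matches
    return not any(matches(name, e) for e in reject)
```

For example, @code{wanted("zelazny.ps", accept=["*zelazny*"], reject=[".ps"])}
is false, while the same call for @file{zelazny.txt} is true.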

2294

2295

Note that these two options do not affect the downloading of @sc{html}

2296

files (as determined by a @samp{.htm} or @samp{.html} filename

2297

suffix). This behavior may not be desirable for all users, and may be

2298

changed for future versions of Wget.

2299

2300

Note, too, that query strings (strings at the end of a URL beginning

2301

with a question mark, @samp{?}) are not included as part of the

2302

filename for accept/reject rules, even though these will actually

2303

contribute to the name chosen for the local file. It is expected that

2304

a future version of Wget will provide an option to allow matching

2305

against query strings.

2306

2307

Finally, it's worth noting that the accept/reject lists are matched

2308

@emph{twice} against downloaded files: once against the URL's filename

2309

portion, to determine if the file should be downloaded in the first

2310

place; then, after it has been accepted and successfully downloaded,

2311

the local file's name is also checked against the accept/reject lists

2312

to see if it should be removed. The rationale was that, since

2313

@samp{.htm} and @samp{.html} files are always downloaded regardless of

2314

accept/reject rules, they should be removed @emph{after} being

2315

downloaded and scanned for links, if they did match the accept/reject

2316

lists. However, this can lead to unexpected results, since the local

2317

filenames can differ from the original URL filenames in the following

2318

ways, all of which can change whether an accept/reject rule matches:

2319

2320

@itemize @bullet

2321

@item

2322

If the local file already exists and @samp{--no-directories} was

2323

specified, a numeric suffix will be appended to the original name.

2324

@item

2325

If @samp{--adjust-extension} was specified, the local filename might have

2326

@samp{.html} appended to it. If Wget is invoked with @samp{-E -A.php},

2327

a filename such as @samp{index.php} will be accepted, but upon

2328

download will be named @samp{index.php.html}, which no longer matches,

2329

and so the file will be deleted.

2330

@item

2331

Query strings do not contribute to URL matching, but are included in

2332

local filenames, and so @emph{do} contribute to filename matching.

2333

@end itemize

2334

2335

@noindent

2336

This behavior, too, is considered less-than-desirable, and may change

2337

in a future version of Wget.
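
The effect of the double check can be seen in a minimal Python sketch. It
models only suffix-style accept lists and ignores the special treatment of
@sc{html} files; the function name @code{survives_both_checks} is
hypothetical and the code is not taken from Wget.

```python
def survives_both_checks(url_filename, local_filename, accept_suffixes):
    """Return (downloaded, kept) for a suffix-only accept list.

    downloaded: result of the first check, against the URL's file name.
    kept: result of the second check, against the local file name.
    """

    def accepted(name):
        return any(name.endswith(suffix) for suffix in accept_suffixes)

    downloaded = accepted(url_filename)
    kept = downloaded and accepted(local_filename)
    return downloaded, kept
```

For the @samp{-E -A.php} case above,
@code{survives_both_checks("index.php", "index.php.html", [".php"])} reports
that the file is downloaded but not kept.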

2338

2339

@node Directory-Based Limits, Relative Links, Types of Files, Following Links

2340

@section Directory-Based Limits

2341

@cindex directories

2342

@cindex directory limits

2343

2344

Regardless of other link-following facilities, it is often useful to
restrict which files are retrieved based on the directories
those files are placed in. There can be many reasons for this---the

2347

home pages may be organized in a reasonable directory structure; or some

2348

directories may contain useless information, e.g. @file{/cgi-bin} or

2349

@file{/dev} directories.

2350

2351

Wget offers three different options to deal with this requirement. Each

2352

option description lists a short name, a long name, and the equivalent

2353

command in @file{.wgetrc}.

2354

2355

@cindex directories, include

2356

@cindex include directories

2357

@cindex accept directories

2358

@table @samp

2359

@item -I @var{list}

2360

@itemx --include @var{list}

2361

@itemx include_directories = @var{list}

2362

The @samp{-I} option accepts a comma-separated list of directories included

2363

in the retrieval. Any other directories will simply be ignored. The

2364

directories are absolute paths.

2365

2366

So, if you wish to download from @samp{http://host/people/bozo/}

2367

following only links to bozo's colleagues in the @file{/people}

2368

directory and the bogus scripts in @file{/cgi-bin}, you can specify:

2369

2370

@example
wget -I /people,/cgi-bin http://host/people/bozo/
@end example

2373

2374

@cindex directories, exclude

2375

@cindex exclude directories

2376

@cindex reject directories

2377

@item -X @var{list}

2378

@itemx --exclude @var{list}

2379

@itemx exclude_directories = @var{list}

2380

The @samp{-X} option is exactly the reverse of @samp{-I}---this is a list of

2381

directories @emph{excluded} from the download. E.g. if you do not want

2382

Wget to download things from @file{/cgi-bin} directory, specify @samp{-X

2383

/cgi-bin} on the command line.

2384

2385

The same as with @samp{-A}/@samp{-R}, these two options can be combined

2386

to get a better fine-tuning of downloading subdirectories. E.g. if you

2387

want to load all the files from @file{/pub} hierarchy except for

2388

@file{/pub/worthless}, specify @samp{-I/pub -X/pub/worthless}.
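
A simplified model of how an include list and an exclude list interact is
shown below. It is a Python sketch under stated assumptions, not Wget's
code: the names @code{under} and @code{directory_allowed} are hypothetical,
directories are compared as plain path prefixes on a @samp{/} boundary, and
wildcards in list elements are not handled.

```python
def under(path, directory):
    """True if path is the directory itself or lies somewhere below it."""
    directory = directory.rstrip("/")
    return path == directory or path.startswith(directory + "/")


def directory_allowed(path, include=(), exclude=()):
    """Apply include and exclude directory lists to an absolute path."""
    if include and not any(under(path, d) for d in include):
        return False  # an include list is given and the path is outside it
    return not any(under(path, d) for d in exclude)
```

With @code{include=["/pub"]} and @code{exclude=["/pub/worthless"]}, the path
@file{/pub/gnu} is allowed and @file{/pub/worthless/x} is not.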

2389

2390

@cindex no parent

2391

@item -np

2392

@itemx --no-parent

2393

@itemx no_parent = on

2394

The simplest, and often very useful way of limiting directories is

2395

disallowing retrieval of the links that refer to the hierarchy

2396

@dfn{above} the beginning directory, i.e. disallowing ascent to the

2397

parent directory/directories.

2398

2399

The @samp{--no-parent} option (short @samp{-np}) is useful in this case.

2400

Using it guarantees that you will never leave the existing hierarchy.

2401

Supposing you issue Wget with:

2402

2403

@example
wget -r --no-parent http://somehost/~luzer/my-archive/
@end example

2406

2407

You may rest assured that none of the references to

2408

@file{/~his-girls-homepage/} or @file{/~luzer/all-my-mpegs/} will be

2409

followed. Only the archive you are interested in will be downloaded.

2410

Essentially, @samp{--no-parent} is similar to

2411

@samp{-I/~luzer/my-archive}, only it handles redirections in a more

2412

intelligent fashion.

2413

2414

@strong{Note} that, for HTTP (and HTTPS), the trailing slash is very

2415

important to @samp{--no-parent}. HTTP has no concept of a ``directory''---Wget

2416

relies on you to indicate what's a directory and what isn't. In

2417

@samp{http://foo/bar/}, Wget will consider @samp{bar} to be a

2418

directory, while in @samp{http://foo/bar} (no trailing slash),

2419

@samp{bar} will be considered a filename (so @samp{--no-parent} would be

2420

meaningless, as its parent is @samp{/}).

2421

@end table
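
The directory limits described above can also be made permanent in a
startup file. As a minimal sketch, assuming the @file{/pub} and
@file{/pub/worthless} directories used in the example above, the
following wgetrc lines correspond to
@samp{-I/pub -X/pub/worthless --no-parent}:

@example
# Follow only /pub, but skip /pub/worthless beneath it.
include_directories = /pub
exclude_directories = /pub/worthless
# Never ascend above the starting directory.
no_parent = on
@end example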

2422

2423

@node Relative Links, FTP Links, Directory-Based Limits, Following Links

2424

@section Relative Links

2425

@cindex relative links

2426

2427

When @samp{-L} is turned on, only the relative links are ever followed.

2428

Relative links are here defined as those that do not refer to the web

2429

server root. For example, these links are relative:

2430

2431

@example
<a href="foo.gif">
<a href="foo/bar.gif">
<a href="../foo/bar.gif">
@end example

2436

2437

These links are not relative:

2438

2439

@example
<a href="/foo.gif">
<a href="/foo/bar.gif">
<a href="http://www.server.com/foo/bar.gif">
@end example

2444

2445

Using this option guarantees that recursive retrieval will not span

2446

hosts, even without @samp{-H}. In simple cases it also allows downloads

2447

to ``just work'' without having to convert links.

2448

2449

This option is probably not very useful and might be removed in a future

2450

release.

2451

2452

@node FTP Links, , Relative Links, Following Links

2453

@section Following FTP Links

2454

@cindex following ftp links

2455

2456

The rules for @sc{ftp} are somewhat specific, as it is necessary for

2457

them to be. @sc{ftp} links in @sc{html} documents are often included

2458

for purposes of reference, and it is often inconvenient to download them

2459

by default.

2460

2461

To have @sc{ftp} links followed from @sc{html} documents, you need to

2462

specify the @samp{--follow-ftp} option. Having done that, @sc{ftp}

2463

links will span hosts regardless of @samp{-H} setting. This is logical,

2464

as @sc{ftp} links rarely point to the same host where the @sc{http}

2465

server resides. For similar reasons, the @samp{-L} option has no

2466

effect on such downloads. On the other hand, domain acceptance

2467

(@samp{-D}) and suffix rules (@samp{-A} and @samp{-R}) apply normally.

2468

2469

Also note that followed links to @sc{ftp} directories will not be

2470

retrieved recursively further.

2471

2472

@node Time-Stamping, Startup File, Following Links, Top

2473

@chapter Time-Stamping

2474

@cindex time-stamping

2475

@cindex timestamping

2476

@cindex updating the archives

2477

@cindex incremental updating

2478

2479

One of the most important aspects of mirroring information from the

2480

Internet is updating your archives.

2481

2482

Downloading the whole archive again and again, just to replace a few

2483

changed files is expensive, both in terms of wasted bandwidth and money,

2484

and the time to do the update. This is why all the mirroring tools

2485

offer the option of incremental updating.

2486

2487

Such an updating mechanism means that the remote server is scanned in

2488

search of @dfn{new} files. Only those new files will be downloaded in

2489

the place of the old ones.

2490

2491

A file is considered new if one of these two conditions is met:

2492

2493

@enumerate

2494

@item

2495

A file of that name does not already exist locally.

2496

2497

@item

2498

A file of that name does exist, but the remote file was modified more

2499

recently than the local file.

2500

@end enumerate

2501

2502

To implement this, the program needs to be aware of the time of last

2503

modification of both local and remote files. We call this information the

2504

@dfn{time-stamp} of a file.

2505

2506

Time-stamping in GNU Wget is turned on using the @samp{--timestamping}
(@samp{-N}) option, or through the @code{timestamping = on} directive in

2508

@file{.wgetrc}. With this option, for each file it intends to download,

2509

Wget will check whether a local file of the same name exists. If it

2510

does, and the remote file is not newer, Wget will not download it.

2511

2512

If the local file does not exist, or the sizes of the files do not

2513

match, Wget will download the remote file no matter what the time-stamps

2514

say.

2515

2516

@menu

2517

* Time-Stamping Usage::

2518

* HTTP Time-Stamping Internals::

2519

* FTP Time-Stamping Internals::

2520

@end menu

2521

2522

@node Time-Stamping Usage, HTTP Time-Stamping Internals, Time-Stamping, Time-Stamping

2523

@section Time-Stamping Usage

2524

@cindex time-stamping usage

2525

@cindex usage, time-stamping

2526

2527

The usage of time-stamping is simple. Say you would like to download a

2528

file so that it keeps its date of modification.

2529

2530

@example

2531

wget -S http://www.gnu.ai.mit.edu/

2532

@end example

2533

2534

A simple @code{ls -l} shows that the time stamp on the local file equals

2535

the state of the @code{Last-Modified} header, as returned by the server.

2536

As you can see, the time-stamping info is preserved locally, even

2537

without @samp{-N} (at least for @sc{http}).

2538

2539

Several days later, you would like Wget to check if the remote file has

2540

changed, and download it if it has.

2541

2542

@example

2543

wget -N http://www.gnu.ai.mit.edu/

2544

@end example

2545

2546

Wget will ask the server for the last-modified date. If the local file

2547

has the same timestamp as the server, or a newer one, the remote file

2548

will not be re-fetched. However, if the remote file is more recent,

2549

Wget will proceed to fetch it.

2550

2551

The same goes for @sc{ftp}. For example:

2552

2553

@example

2554

wget "ftp://ftp.ifi.uio.no/pub/emacs/gnus/*"

2555

@end example

2556

2557

(The quotes around that URL are to prevent the shell from trying to

2558

interpret the @samp{*}.)

2559

2560

After download, a local directory listing will show that the timestamps

2561

match those on the remote server. Reissuing the command with @samp{-N}

2562

will make Wget re-fetch @emph{only} the files that have been modified

2563

since the last download.

2564

2565

If you wished to mirror the GNU archive every week, you would use a

2566

command like the following, weekly:

2567

2568

@example

2569

wget --timestamping -r ftp://ftp.gnu.org/pub/gnu/

2570

@end example

2571

2572

Note that time-stamping will only work for files for which the server

2573

gives a timestamp. For @sc{http}, this depends on getting a

2574

@code{Last-Modified} header. For @sc{ftp}, this depends on getting a

2575

directory listing with dates in a format that Wget can parse

2576

(@pxref{FTP Time-Stamping Internals}).
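
If you run such a mirror regularly, the options can be kept in a
dedicated startup file instead of being typed each time. The fragment
below is only a sketch; the file name @file{mirror.wgetrc} is a
hypothetical choice, and the file would be selected by pointing the
@code{WGETRC} environment variable at it.

@example
# mirror.wgetrc (hypothetical name): settings for the weekly mirror
timestamping = on
recursive = on
@end example

With these two commands in effect, invoking Wget with just the
@sc{url} behaves like @samp{wget --timestamping -r} on that @sc{url}.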

2577

2578

@node HTTP Time-Stamping Internals, FTP Time-Stamping Internals, Time-Stamping Usage, Time-Stamping

2579

@section HTTP Time-Stamping Internals

2580

@cindex http time-stamping

2581

2582

Time-stamping in @sc{http} is implemented by checking the

2583

@code{Last-Modified} header. If you wish to retrieve the file

2584

@file{foo.html} through @sc{http}, Wget will check whether

2585

@file{foo.html} exists locally. If it doesn't, @file{foo.html} will be

2586

retrieved unconditionally.

2587

2588

If the file does exist locally, Wget will first check its local

2589

time-stamp (similar to the way @code{ls -l} checks it), and then send a

2590

@code{HEAD} request to the remote server, demanding the information on

2591

the remote file.

2592

2593

The @code{Last-Modified} header is examined to find which file was

2594

modified more recently (which makes it ``newer''). If the remote file

2595

is newer, it will be downloaded; if it is older, Wget will give

2596

up.@footnote{As an additional check, Wget will look at the

2597

@code{Content-Length} header, and compare the sizes; if they are not the

2598

same, the remote file will be downloaded no matter what the time-stamp

2599

says.}

2600

2601

When @samp{--backup-converted} (@samp{-K}) is specified in conjunction

2602

with @samp{-N}, server file @samp{@var{X}} is compared to local file

2603

@samp{@var{X}.orig}, if extant, rather than being compared to local file

2604

@samp{@var{X}}, which will always differ if it's been converted by

2605

@samp{--convert-links} (@samp{-k}).

2606

2607

Arguably, @sc{http} time-stamping should be implemented using the

2608

@code{If-Modified-Since} request.

2609

2610

@node FTP Time-Stamping Internals, , HTTP Time-Stamping Internals, Time-Stamping

2611

@section FTP Time-Stamping Internals

2612

@cindex ftp time-stamping

2613

2614

In theory, @sc{ftp} time-stamping works much the same as @sc{http}, only

2615

@sc{ftp} has no headers---time-stamps must be ferreted out of directory

2616

listings.

2617

2618

If an @sc{ftp} download is recursive or uses globbing, Wget will use the

2619

@sc{ftp} @code{LIST} command to get a file listing for the directory

2620

containing the desired file(s). It will try to analyze the listing,

2621

treating it like Unix @code{ls -l} output, extracting the time-stamps.

2622

The rest is exactly the same as for @sc{http}. Note that when

2623

retrieving individual files from an @sc{ftp} server without using

2624

globbing or recursion, listing files will not be downloaded (and thus

2625

files will not be time-stamped) unless @samp{-N} is specified.

2626

2627

The assumption that every directory listing is a Unix-style listing may

2628

sound extremely constraining, but in practice it is not, as many

2629

non-Unix @sc{ftp} servers use the Unixoid listing format because most

2630

(all?) of the clients understand it. Bear in mind that @sc{rfc959}

2631

defines no standard way to get a file list, let alone the time-stamps.

2632

We can only hope that a future standard will define this.

2633

2634

Another non-standard solution is the use of the @code{MDTM} command

2635

that is supported by some @sc{ftp} servers (including the popular

2636

@code{wu-ftpd}), which returns the exact time of the specified file.

2637

Wget may support this command in the future.

2638

2639

@node Startup File, Examples, Time-Stamping, Top

2640

@chapter Startup File

2641

@cindex startup file

2642

@cindex wgetrc

2643

@cindex .wgetrc

2644

@cindex startup

2645

@cindex .netrc

2646

2647

Once you know how to change default settings of Wget through command

2648

line arguments, you may wish to make some of those settings permanent.

2649

You can do that in a convenient way by creating the Wget startup

2650

file---@file{.wgetrc}.

2651

2652

Although @file{.wgetrc} is the ``main'' initialization file, it is
convenient to have a special facility for storing passwords. Thus Wget
reads and interprets the contents of @file{$HOME/.netrc}, if it finds
it. You can find the @file{.netrc} format in your system manuals.

2656

2657

Wget reads @file{.wgetrc} upon startup, recognizing a limited set of

2658

commands.

2659

2660

@menu

2661

* Wgetrc Location:: Location of various wgetrc files.

2662

* Wgetrc Syntax:: Syntax of wgetrc.

2663

* Wgetrc Commands:: List of available commands.

2664

* Sample Wgetrc:: A wgetrc example.

2665

@end menu

2666

2667

@node Wgetrc Location, Wgetrc Syntax, Startup File, Startup File

2668

@section Wgetrc Location

2669

@cindex wgetrc location

2670

@cindex location of wgetrc

2671

2672

When initializing, Wget will look for a @dfn{global} startup file,

2673

@file{/etc/wgetrc} by default and read commands from there, if it exists.

2674

2675

Then it will look for the user's file. If the environment variable

2676

@code{WGETRC} is set, Wget will try to load that file. Failing that, no

2677

further attempts will be made.

2678

2679

If @code{WGETRC} is not set, Wget will try to load @file{$HOME/.wgetrc}.

2680

2681

The fact that the user's settings are loaded after the system-wide ones
means that in case of collision the user's wgetrc @emph{overrides} the
system-wide wgetrc (in @file{/etc/wgetrc} by default).

2684

Fascist admins, away!
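
As an illustration of this precedence, suppose (hypothetically) that
both files set the same command:

@example
# In /etc/wgetrc (system-wide)
tries = 5

# In $HOME/.wgetrc (user)
tries = 20
@end example

Because the user's file is read last, Wget would run with
@samp{tries = 20}. Commands that appear only in the system-wide file
remain in effect.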

2685

2686

@node Wgetrc Syntax, Wgetrc Commands, Wgetrc Location, Startup File

2687

@section Wgetrc Syntax

2688

@cindex wgetrc syntax

2689

@cindex syntax of wgetrc

2690

2691

The syntax of a wgetrc command is simple:

2692

2693

@example

2694

variable = value

2695

@end example

2696

2697

The @dfn{variable} will also be called @dfn{command}. Valid

2698

@dfn{values} are different for different commands.

2699

2700

The commands are case-insensitive and underscore-insensitive. Thus

2701

@samp{DIr__PrefiX} is the same as @samp{dirprefix}. Empty lines, lines

2702

beginning with @samp{#} and lines containing white-space only are

2703

discarded.

2704

2705

Commands that expect a comma-separated list will clear the list on an

2706

empty command. So, if you wish to reset the rejection list specified in

2707

global @file{wgetrc}, you can do it with:

2708

2709

@example

2710

reject =

2711

@end example
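
The rules above can be seen together in the following sketch of a
small wgetrc file. The directory name is an arbitrary placeholder.

@example
# Lines beginning with a hash mark are comments and are discarded,
# as are empty lines.

# Each of the following spellings names the same command.
dir_prefix = downloads
dirprefix = downloads
DIr__PrefiX = downloads

# An empty value clears a comma-separated list set earlier,
# for example in the global wgetrc.
reject =
@end example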

2712

2713

@node Wgetrc Commands, Sample Wgetrc, Wgetrc Syntax, Startup File

2714

@section Wgetrc Commands

2715

@cindex wgetrc commands

2716

2717

The complete set of commands is listed below. Legal values are listed

2718

after the @samp{=}. Simple Boolean values can be set or unset using

2719

@samp{on} and @samp{off} or @samp{1} and @samp{0}.

2720

2721

Some commands take pseudo-arbitrary values. @var{address} values can be

2722

hostnames or dotted-quad IP addresses. @var{n} can be any positive

2723

integer, or @samp{inf} for infinity, where appropriate. @var{string}

2724

values can be any non-empty string.

2725

2726

Most of these commands have direct command-line equivalents. Also, any

2727

wgetrc command can be specified on the command line using the

2728

@samp{--execute} switch (@pxref{Basic Startup Options}).
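
The value forms described above might look as follows in a wgetrc
file. This is only a sketch; the address is a placeholder and the
numbers are arbitrary.

@example
# Boolean commands accept on/off as well as 1/0.
timestamping = on
verbose = 0

# An address may be a hostname or a dotted-quad IP address.
bind_address = 192.0.2.10

# Positive integers.
wait = 2
tries = 20
@end example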

2729

2730

@table @asis

2731

@item accept/reject = @var{string}

2732

Same as @samp{-A}/@samp{-R} (@pxref{Types of Files}).

2733

2734

@item add_hostdir = on/off

2735

Enable/disable host-prefixed file names. @samp{-nH} disables it.

2736

2737

@item ask_password = on/off

2738

Prompt for a password for each connection established. Cannot be specified

2739

when @samp{--password} is being used, because they are mutually

2740

exclusive. Equivalent to @samp{--ask-password}.

2741

2742

@item auth_no_challenge = on/off

2743

If this option is given, Wget will send Basic HTTP authentication

2744

information (plaintext username and password) for all requests. See

2745

@samp{--auth-no-challenge}.

2746

2747

@item background = on/off

2748

Enable/disable going to background---the same as @samp{-b} (which

2749

enables it).

2750

2751

@item backup_converted = on/off

2752

Enable/disable saving pre-converted files with the suffix

2753

@samp{.orig}---the same as @samp{-K} (which enables it).

2754

2755

@c @item backups = @var{number}

2756

@c #### Document me!

2757

2758

@item base = @var{string}

2759

Consider relative @sc{url}s in input files (specified via the

2760

@samp{input} command or the @samp{--input-file}/@samp{-i} option,

2761

together with @samp{force_html} or @samp{--force-html})

2762

as being relative to @var{string}---the same as @samp{--base=@var{string}}.

2763

2764

@item bind_address = @var{address}

2765

Bind to @var{address}, like @samp{--bind-address=@var{address}}.

2766

2767

@item ca_certificate = @var{file}

2768

Set the certificate authority bundle file to @var{file}. The same

2769

as @samp{--ca-certificate=@var{file}}.

2770

2771

@item ca_directory = @var{directory}

2772

Set the directory used for certificate authorities. The same as

2773

@samp{--ca-directory=@var{directory}}.

2774

2775

@item cache = on/off

2776

When set to off, disallow server-caching. See the @samp{--no-cache}

2777

option.

2778

2779

@item certificate = @var{file}

2780

Set the client certificate file name to @var{file}. The same as

2781

@samp{--certificate=@var{file}}.

2782

2783

@item certificate_type = @var{string}

2784

Specify the type of the client certificate, legal values being

2785

@samp{PEM} (the default) and @samp{DER} (aka ASN1). The same as

2786

@samp{--certificate-type=@var{string}}.

2787

2788

@item check_certificate = on/off

2789

If this is set to off, the server certificate is not checked against
the specified certificate authorities. The default is ``on''. The same as
@samp{--check-certificate}.

2792

2793

@item connect_timeout = @var{n}

2794

Set the connect timeout---the same as @samp{--connect-timeout}.

2795

2796

@item content_disposition = on/off

2797

Turn on recognition of the (non-standard) @samp{Content-Disposition}

2798

HTTP header---if set to @samp{on}, the same as @samp{--content-disposition}.

2799

2800

@item continue = on/off

2801

If set to on, force continuation of preexistent partially retrieved

2802

files. See @samp{-c} before setting it.

2803

2804

@item convert_links = on/off

2805

Convert non-relative links locally. The same as @samp{-k}.

2806

2807

@item cookies = on/off

2808

When set to off, disallow cookies. See the @samp{--no-cookies} option.

2809

2810

@item cut_dirs = @var{n}

2811

Ignore @var{n} remote directory components. Equivalent to

2812

@samp{--cut-dirs=@var{n}}.

2813

2814

@item debug = on/off

2815

Debug mode, same as @samp{-d}.

2816

2817

@item default_page = @var{string}

2818

Default page name---the same as @samp{--default-page=@var{string}}.

2819

2820

@item delete_after = on/off

2821

Delete after download---the same as @samp{--delete-after}.

2822

2823

@item dir_prefix = @var{string}

2824

Top of directory tree---the same as @samp{-P @var{string}}.

2825

2826

@item dirstruct = on/off

2827

Turning dirstruct on or off---the same as @samp{-x} or @samp{-nd},

2828

respectively.

2829

2830

@item dns_cache = on/off

2831

Turn DNS caching on/off. Since DNS caching is on by default, this

2832

option is normally used to turn it off and is equivalent to

2833

@samp{--no-dns-cache}.

2834

2835

@item dns_timeout = @var{n}

2836

Set the DNS timeout---the same as @samp{--dns-timeout}.

2837

2838

@item domains = @var{string}

2839

Same as @samp{-D} (@pxref{Spanning Hosts}).

2840

2841

@item dot_bytes = @var{n}

2842

Specify the number of bytes ``contained'' in a dot, as seen throughout

2843

the retrieval (1024 by default). You can postfix the value with

2844

@samp{k} or @samp{m}, representing kilobytes and megabytes,

2845

respectively. With dot settings you can tailor the dot retrieval to

2846

suit your needs, or you can use the predefined @dfn{styles}

2847

(@pxref{Download Options}).

2848

2849

@item dot_spacing = @var{n}

2850

Specify the number of dots in a single cluster (10 by default).

2851

2852

@item dots_in_line = @var{n}

2853

Specify the number of dots that will be printed in each line throughout

2854

the retrieval (50 by default).

2855

2856

@item egd_file = @var{file}

2857

Use @var{file} as the EGD socket file name. The same as

2858

@samp{--egd-file=@var{file}}.

2859

2860

@item exclude_directories = @var{string}

2861

Specify a comma-separated list of directories you wish to exclude from

2862

download---the same as @samp{-X @var{string}} (@pxref{Directory-Based

2863

Limits}).

2864

2865

@item exclude_domains = @var{string}

2866

Same as @samp{--exclude-domains=@var{string}} (@pxref{Spanning

2867

Hosts}).

2868

2869

@item follow_ftp = on/off

2870

Follow @sc{ftp} links from @sc{html} documents---the same as

2871

@samp{--follow-ftp}.

2872

2873

@item follow_tags = @var{string}

2874

Only follow certain @sc{html} tags when doing a recursive retrieval,

2875

just like @samp{--follow-tags=@var{string}}.

2876

2877

@item force_html = on/off

2878

If set to on, force the input filename to be regarded as an @sc{html}

2879

document---the same as @samp{-F}.

2880

2881

@item ftp_password = @var{string}

2882

Set your @sc{ftp} password to @var{string}. Without this setting, the

2883

password defaults to @samp{-wget@@}, which is a useful default for

2884

anonymous @sc{ftp} access.

2885

2886

This command used to be named @code{passwd} prior to Wget 1.10.

2887

2888

@item ftp_proxy = @var{string}

2889

Use @var{string} as @sc{ftp} proxy, instead of the one specified in

2890

environment.

2891

2892

@item ftp_user = @var{string}

2893

Set @sc{ftp} user to @var{string}.

2894

2895

This command used to be named @code{login} prior to Wget 1.10.

2896

2897

@item glob = on/off

2898

Turn globbing on/off---the same as @samp{--glob} and @samp{--no-glob}.

2899

2900

@item header = @var{string}

2901

Define a header for HTTP downloads, like using

2902

@samp{--header=@var{string}}.

2903

2904

@item adjust_extension = on/off

2905

Add a @samp{.html} extension to @samp{text/html} or

2906

@samp{application/xhtml+xml} files that lack one, or a @samp{.css}

2907

extension to @samp{text/css} files that lack one, like

2908

@samp{-E}. Previously named @samp{html_extension} (still acceptable,

2909

but deprecated).

2910

2911

@item http_keep_alive = on/off

2912

Turn the keep-alive feature on or off (defaults to on). Turning it

2913

off is equivalent to @samp{--no-http-keep-alive}.

2914

2915

@item http_password = @var{string}

2916

Set @sc{http} password, equivalent to

2917

@samp{--http-password=@var{string}}.

2918

2919

@item http_proxy = @var{string}

2920

Use @var{string} as @sc{http} proxy, instead of the one specified in

2921

environment.

2922

2923

@item http_user = @var{string}

2924

Set @sc{http} user to @var{string}, equivalent to

2925

@samp{--http-user=@var{string}}.

2926

2927

@item https_proxy = @var{string}

2928

Use @var{string} as @sc{https} proxy, instead of the one specified in

2929

environment.

2930

2931

@item ignore_case = on/off

2932

When set to on, match files and directories case insensitively; the

2933

same as @samp{--ignore-case}.

2934

2935

@item ignore_length = on/off

2936

When set to on, ignore @code{Content-Length} header; the same as

2937

@samp{--ignore-length}.

2938

2939

@item ignore_tags = @var{string}

2940

Ignore certain @sc{html} tags when doing a recursive retrieval, like

2941

@samp{--ignore-tags=@var{string}}.

2942

2943

@item include_directories = @var{string}

2944

Specify a comma-separated list of directories you wish to follow when

2945

downloading---the same as @samp{-I @var{string}}.

2946

2947

@item iri = on/off

2948

When set to on, enable internationalized URI (IRI) support; the same as

2949

@samp{--iri}.

2950

2951

@item inet4_only = on/off

2952

Force connecting to IPv4 addresses, off by default. You can put this

2953

in the global init file to disable Wget's attempts to resolve and

2954

connect to IPv6 hosts. Available only if Wget was compiled with IPv6

2955

support. The same as @samp{--inet4-only} or @samp{-4}.

2956

2957

@item inet6_only = on/off

2958

Force connecting to IPv6 addresses, off by default. Available only if

2959

Wget was compiled with IPv6 support. The same as @samp{--inet6-only}

2960

or @samp{-6}.

2961

2962

@item input = @var{file}

2963

Read the @sc{url}s from @var{file}, like @samp{-i @var{file}}.

2964

2965

@item keep_session_cookies = on/off

2966

When specified, causes @samp{save_cookies} to also save session

2967

cookies. See @samp{--keep-session-cookies}.

2968

2969

@item limit_rate = @var{rate}

2970

Limit the download speed to no more than @var{rate} bytes per second.

2971

The same as @samp{--limit-rate=@var{rate}}.

2972

2973

@item load_cookies = @var{file}

2974

Load cookies from @var{file}. See @samp{--load-cookies @var{file}}.

2975

2976

@item local_encoding = @var{encoding}

2977

Force Wget to use @var{encoding} as the default system encoding. See

2978

@samp{--local-encoding}.

2979

2980

@item logfile = @var{file}

2981

Set logfile to @var{file}, the same as @samp{-o @var{file}}.

2982

2983

@item max_redirect = @var{number}

2984

Specifies the maximum number of redirections to follow for a resource.

2985

See @samp{--max-redirect=@var{number}}.

2986

2987

@item mirror = on/off

2988

Turn mirroring on/off. The same as @samp{-m}.

2989

2990

@item netrc = on/off

2991

Turn reading netrc on or off.

2992

2993

@item no_clobber = on/off

2994

Same as @samp{-nc}.

2995

2996

@item no_parent = on/off

2997

Disallow retrieving outside the directory hierarchy, like

2998

@samp{--no-parent} (@pxref{Directory-Based Limits}).

2999

3000

@item no_proxy = @var{string}

3001

Use @var{string} as the comma-separated list of domains to avoid in

3002

proxy loading, instead of the one specified in environment.

3003

3004

@item output_document = @var{file}

3005

Set the output filename---the same as @samp{-O @var{file}}.

3006

3007

@item page_requisites = on/off

3008

Download all ancillary documents necessary for a single @sc{html} page to

3009

display properly---the same as @samp{-p}.

3010

3011

@item passive_ftp = on/off

3012

Change setting of passive @sc{ftp}, equivalent to the

3013

@samp{--passive-ftp} option.

3014

3015

@item password = @var{string}
Specify password @var{string} for both @sc{ftp} and @sc{http} file retrieval.
This command can be overridden using the @samp{ftp_password} and
@samp{http_password} commands for @sc{ftp} and @sc{http} respectively.

3019

3020

@item post_data = @var{string}

3021

Use POST as the method for all HTTP requests and send @var{string} in

3022

the request body. The same as @samp{--post-data=@var{string}}.

3023

3024

@item post_file = @var{file}

3025

Use POST as the method for all HTTP requests and send the contents of

3026

@var{file} in the request body. The same as

3027

@samp{--post-file=@var{file}}.

3028

3029

@item prefer_family = none/IPv4/IPv6

3030

When given a choice of several addresses, connect to the addresses

3031

with specified address family first. The address order returned by

3032

DNS is used without change by default. The same as @samp{--prefer-family},

3033

which see for a detailed discussion of why this is useful.

3034

3035

@item private_key = @var{file}

3036

Set the private key file to @var{file}. The same as

3037

@samp{--private-key=@var{file}}.

3038

3039

@item private_key_type = @var{string}

3040

Specify the type of the private key, legal values being @samp{PEM}

3041

(the default) and @samp{DER} (aka ASN1). The same as

3042

@samp{--private-key-type=@var{string}}.

3043

3044

@item progress = @var{string}

3045

Set the type of the progress indicator. Legal types are @samp{dot}

3046

and @samp{bar}. Equivalent to @samp{--progress=@var{string}}.

3047

3048

@item protocol_directories = on/off

3049

When set, use the protocol name as a directory component of local file

3050

names. The same as @samp{--protocol-directories}.

3051

3052

@item proxy_password = @var{string}

3053

Set proxy authentication password to @var{string}, like

3054

@samp{--proxy-password=@var{string}}.

3055

3056

@item proxy_user = @var{string}

3057

Set proxy authentication user name to @var{string}, like

3058

@samp{--proxy-user=@var{string}}.

3059

3060

@item quiet = on/off

3061

Quiet mode---the same as @samp{-q}.

3062

3063

@item quota = @var{quota}

3064

Specify the download quota, which is useful to put in the global

3065

@file{wgetrc}. When download quota is specified, Wget will stop

3066

retrieving after the download sum has become greater than quota. The

3067

quota can be specified in bytes (default), kbytes (@samp{k} appended) or

3068

mbytes (@samp{m} appended). Thus @samp{quota = 5m} will set the quota

3069

to 5 megabytes. Note that the user's startup file overrides system

3070

settings.

3071

3072

@item random_file = @var{file}

3073

Use @var{file} as a source of randomness on systems lacking

3074

@file{/dev/random}.

3075

3076

@item random_wait = on/off

3077

Turn random between-request wait times on or off. The same as

3078

@samp{--random-wait}.

3079

3080

@item read_timeout = @var{n}

3081

Set the read (and write) timeout---the same as

3082

@samp{--read-timeout=@var{n}}.

3083

3084

@item reclevel = @var{n}

3085

Recursion level (depth)---the same as @samp{-l @var{n}}.

3086

3087

@item recursive = on/off

3088

Recursive on/off---the same as @samp{-r}.

3089

3090

@item referer = @var{string}

3091

Set HTTP @samp{Referer:} header just like

3092

@samp{--referer=@var{string}}. (Note that it was the folks who wrote

3093

the @sc{http} spec who got the spelling of ``referrer'' wrong.)

3094

3095

@item relative_only = on/off

3096

Follow only relative links---the same as @samp{-L} (@pxref{Relative

3097

Links}).

3098

3099

@item remote_encoding = @var{encoding}

3100

Force Wget to use @var{encoding} as the default remote server encoding.

3101

See @samp{--remote-encoding}.

3102

3103

@item remove_listing = on/off

3104

If set to on, remove @sc{ftp} listings downloaded by Wget. Setting it

3105

to off is the same as @samp{--no-remove-listing}.

3106

3107

@item restrict_file_names = unix/windows

3108

Restrict the file names generated by Wget from URLs. See

3109

@samp{--restrict-file-names} for a more detailed description.

3110

3111

@item retr_symlinks = on/off

3112

When set to on, retrieve symbolic links as if they were plain files; the

3113

same as @samp{--retr-symlinks}.

3114

3115

@item retry_connrefused = on/off

3116

When set to on, consider ``connection refused'' a transient

3117

error---the same as @samp{--retry-connrefused}.

3118

3119

@item robots = on/off

3120

Specify whether the norobots convention is respected by Wget, ``on'' by

3121

default. This switch controls both the @file{/robots.txt} and the

3122

@samp{nofollow} aspect of the spec. @xref{Robot Exclusion}, for more

3123

details about this. Be sure you know what you are doing before turning

3124

this off.

3125

3126

@item save_cookies = @var{file}

3127

Save cookies to @var{file}. The same as @samp{--save-cookies

3128

@var{file}}.

3129

3130

@item save_headers = on/off

3131

Same as @samp{--save-headers}.

3132

3133

@item secure_protocol = @var{string}

3134

Choose the secure protocol to be used. Legal values are @samp{auto}

3135

(the default), @samp{SSLv2}, @samp{SSLv3}, and @samp{TLSv1}. The same

3136

as @samp{--secure-protocol=@var{string}}.

3137

3138

@item server_response = on/off

3139

Choose whether or not to print the @sc{http} and @sc{ftp} server

3140

responses---the same as @samp{-S}.

3141

3142

@item span_hosts = on/off

3143

Same as @samp{-H}.

3144

3145

@item spider = on/off

3146

Same as @samp{--spider}.

3147

3148

@item strict_comments = on/off

3149

Same as @samp{--strict-comments}.

3150

3151

@item timeout = @var{n}

3152

Set all applicable timeout values to @var{n}, the same as @samp{-T

3153

@var{n}}.

3154

3155

@item timestamping = on/off

3156

Turn timestamping on/off. The same as @samp{-N} (@pxref{Time-Stamping}).

3157

3158

@item tries = @var{n}

3159

Set number of retries per @sc{url}---the same as @samp{-t @var{n}}.

3160

3161

@item use_proxy = on/off

3162

When set to off, don't use proxy even when proxy-related environment

3163

variables are set. In that case it is the same as using

3164

@samp{--no-proxy}.

3165

3166

@item user = @var{string}

3167

Specify username @var{string} for both @sc{ftp} and @sc{http} file retrieval.

3168

This command can be overridden using the @samp{ftp_user} and
@samp{http_user} commands for @sc{ftp} and @sc{http} respectively.

3170

3171

@item user_agent = @var{string}

3172

User agent identification sent to the HTTP Server---the same as

3173

@samp{--user-agent=@var{string}}.

3174

3175

@item verbose = on/off

3176

Turn verbose on/off---the same as @samp{-v}/@samp{-nv}.

3177

3178

@item wait = @var{n}

3179

Wait @var{n} seconds between retrievals---the same as @samp{-w

3180

@var{n}}.

3181

3182

@item wait_retry = @var{n}

3183

Wait up to @var{n} seconds between retries of failed retrievals

3184

only---the same as @samp{--waitretry=@var{n}}. Note that this is

3185

turned on by default in the global @file{wgetrc}.

3186

@end table
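As a rough illustration of how several of the commands above combine, a
personal @file{.wgetrc} might contain lines like the following. The
values are arbitrary and chosen only for this sketch; they are not
recommended defaults.

@example
# Retry each URL up to 5 times, waiting up to 10 seconds between
# retries of failed retrievals.
tries = 5
wait_retry = 10
# Wait 2 seconds between retrievals.
wait = 2
# Set all applicable timeout values to 30 seconds.
timeout = 30
# Turn timestamping on (the same as -N).
timestamping = on
@end example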

@node Sample Wgetrc, , Wgetrc Commands, Startup File
@section Sample Wgetrc
@cindex sample wgetrc

This is the sample initialization file, as given in the distribution.
It is divided into two sections---one for global usage (suitable for a
global startup file), and one for local usage (suitable for
@file{$HOME/.wgetrc}). Be careful about the things you change.

Note that almost all the lines are commented out. For a command to have
any effect, you must remove the @samp{#} character at the beginning of
its line.

@example
@include sample.wgetrc.munged_for_texi_inclusion
@end example

@node Examples, Various, Startup File, Top
@chapter Examples
@cindex examples

@c man begin EXAMPLES
The examples are divided into three sections loosely based on their
complexity.

@menu
* Simple Usage:: Simple, basic usage of the program.
* Advanced Usage:: Advanced tips.
* Very Advanced Usage:: The hairy stuff.
@end menu

@node Simple Usage, Advanced Usage, Examples, Examples
@section Simple Usage

@itemize @bullet
@item
Say you want to download a @sc{url}. Just type:

@example
wget http://fly.srk.fer.hr/
@end example

@item
But what will happen if the connection is slow, and the file is lengthy?
The connection will probably fail, perhaps more than once, before the
whole file is retrieved. In this case, Wget will try getting the file
until it either gets the whole of it, or exceeds the default number of
retries (this being 20). It is easy to change the number of tries to
45, to ensure that the whole file will arrive safely:

@example
wget --tries=45 http://fly.srk.fer.hr/jpg/flyweb.jpg
@end example

@item
Now let's leave Wget to work in the background, and write its progress
to log file @file{log}. It is tiring to type @samp{--tries}, so we
shall use @samp{-t}.

@example
wget -t 45 -o log http://fly.srk.fer.hr/jpg/flyweb.jpg &
@end example

The ampersand at the end of the line makes sure that Wget works in the
background. To allow an unlimited number of retries, use @samp{-t inf}.

@item
Using @sc{ftp} is just as simple. Wget will take care of login and
password.

@example
wget ftp://gnjilux.srk.fer.hr/welcome.msg
@end example

@item
If you specify a directory, Wget will retrieve the directory listing,
parse it and convert it to @sc{html}. Try:

@example
wget ftp://ftp.gnu.org/pub/gnu/
links index.html
@end example
@end itemize

@node Advanced Usage, Very Advanced Usage, Simple Usage, Examples
@section Advanced Usage

@itemize @bullet
@item
You have a file that contains the URLs you want to download? Use the
@samp{-i} switch:

@example
wget -i @var{file}
@end example

If you specify @samp{-} as file name, the @sc{url}s will be read from
standard input.

@item
Create a mirror image of the GNU web site five levels deep (the default
recursion depth), with the same directory structure as the original,
with only one try per document, saving the log of the activities to
@file{gnulog}:

@example
wget -r -t 1 http://www.gnu.org/ -o gnulog
@end example

@item
The same as the above, but convert the links in the downloaded files to
point to local files, so you can view the documents off-line:

@example
wget --convert-links -r -t 1 http://www.gnu.org/ -o gnulog
@end example

@item
Retrieve only one @sc{html} page, but make sure that all the elements needed
for the page to be displayed, such as inline images and external style
sheets, are also downloaded. Also make sure the downloaded page
references the downloaded links.

@example
wget -p --convert-links http://www.server.com/dir/page.html
@end example

The @sc{html} page will be saved to @file{www.server.com/dir/page.html}, and
the images, stylesheets, etc., somewhere under @file{www.server.com/},
depending on where they were on the remote server.

@item
The same as the above, but without the @file{www.server.com/} directory.
In fact, I don't want to have all those random server directories
anyway---just save @emph{all} those files under a @file{download/}
subdirectory of the current directory.

@example
wget -p --convert-links -nH -nd -Pdownload \
http://www.server.com/dir/page.html
@end example

@item
Retrieve the @file{index.html} of @samp{www.lycos.com}, showing the
original server headers:

@example
wget -S http://www.lycos.com/
@end example

@item
Save the server headers with the file, perhaps for post-processing.

@example
wget --save-headers http://www.lycos.com/
more index.html
@end example

@item
Retrieve the first two levels of @samp{wuarchive.wustl.edu}, saving them
to @file{/tmp}.

@example
wget -r -l2 -P/tmp ftp://wuarchive.wustl.edu/
@end example

@item
You want to download all the @sc{gif}s from a directory on an @sc{http}
server. You tried @samp{wget http://www.server.com/dir/*.gif}, but that
didn't work because @sc{http} retrieval does not support globbing. In
that case, use:

@example
wget -r -l1 --no-parent -A.gif http://www.server.com/dir/
@end example

More verbose, but the effect is the same. @samp{-r -l1} means to
retrieve recursively (@pxref{Recursive Download}), with maximum depth
of 1. @samp{--no-parent} means that references to the parent directory
are ignored (@pxref{Directory-Based Limits}), and @samp{-A.gif} means to
download only the @sc{gif} files. @samp{-A "*.gif"} would have worked
too.

@item
Suppose you were in the middle of downloading, when Wget was
interrupted. Now you do not want to clobber the files already present.
In that case, use:

@example
wget -nc -r http://www.gnu.org/
@end example

@item
If you want to encode your own username and password to @sc{http} or
@sc{ftp}, use the appropriate @sc{url} syntax (@pxref{URL Format}).

@example
wget ftp://hniksic:mypassword@@unix.server.com/.emacs
@end example

Note, however, that this usage is not advisable on multi-user systems
because it reveals your password to anyone who looks at the output of
@code{ps}.

@cindex redirecting output
@item
You would like the output documents to go to standard output instead of
to files?

@example
wget -O - http://jagor.srce.hr/ http://www.srce.hr/
@end example

You can also combine @samp{-O -} with @samp{-i -} and build pipelines to
retrieve the documents from remote hotlists:

@example
wget -O - http://cool.list.com/ | wget --force-html -i -
@end example
@end itemize

@node Very Advanced Usage, , Advanced Usage, Examples
@section Very Advanced Usage

@cindex mirroring
@itemize @bullet
@item
If you wish Wget to keep a mirror of a page (or @sc{ftp}
subdirectories), use @samp{--mirror} (@samp{-m}), which is the shorthand
for @samp{-r -l inf -N}. You can put Wget in the crontab file asking it
to recheck a site each Sunday:

@example
crontab
0 0 * * 0 wget --mirror http://www.gnu.org/ -o /home/me/weeklog
@end example

@item
In addition to the above, you want the links to be converted for local
viewing. But, after having read this manual, you know that link
conversion doesn't play well with timestamping, so you also want Wget to
back up the original @sc{html} files before the conversion. The Wget
invocation would look like this:

@example
wget --mirror --convert-links --backup-converted \
http://www.gnu.org/ -o /home/me/weeklog
@end example

@item
But you've also noticed that local viewing doesn't work all that well
when @sc{html} files are saved under extensions other than @samp{.html},
perhaps because they were served as @file{index.cgi}. So you'd like
Wget to rename all the files served with content-type @samp{text/html}
or @samp{application/xhtml+xml} to @file{@var{name}.html}.

@example
wget --mirror --convert-links --backup-converted \
--html-extension -o /home/me/weeklog \
http://www.gnu.org/
@end example

Or, with less typing:

@example
wget -m -k -K -E http://www.gnu.org/ -o /home/me/weeklog
@end example
@end itemize
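The options used in the last mirroring example can probably also be set
from a startup file. The fragment below is only a sketch: it assumes
that your Wget version accepts these command names, so check them
against the command list (@pxref{Wgetrc Commands}) before relying on
them.

@example
# Assumed wgetrc counterparts of -m -k -K -E.
mirror = on
convert_links = on
backup_converted = on
html_extension = on
@end example

Because such settings would apply to every invocation, they are better
kept in a startup file dedicated to the mirroring job than in your
everyday @file{.wgetrc}.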

@c man end

@node Various, Appendices, Examples, Top
@chapter Various
@cindex various

This chapter contains all the stuff that could not fit anywhere else.

@menu
* Proxies:: Support for proxy servers.
* Distribution:: Getting the latest version.
* Web Site:: GNU Wget's presence on the World Wide Web.
* Mailing Lists:: Wget mailing list for announcements and discussion.
* Internet Relay Chat:: Wget's presence on IRC.
* Reporting Bugs:: How and where to report bugs.
* Portability:: The systems Wget works on.
* Signals:: Signal-handling performed by Wget.
@end menu

@node Proxies, Distribution, Various, Various
@section Proxies
@cindex proxies

@dfn{Proxies} are special-purpose @sc{http} servers designed to transfer
data from remote servers to local clients. One typical use of proxies
is lightening network load for users behind a slow connection. This is
achieved by channeling all @sc{http} and @sc{ftp} requests through the
proxy, which caches the transferred data. When a cached resource is
requested again, the proxy returns the data from its cache. Another use
for proxies is for companies that separate (for security reasons) their
internal networks from the rest of the Internet. In order to obtain
information from the Web, their users connect and retrieve remote data
using an authorized proxy.

Wget supports proxies for both @sc{http} and @sc{ftp} retrievals. The
standard way to specify proxy location, which Wget recognizes, is using
the following environment variables:

@table @code
@item http_proxy
@itemx https_proxy
If set, the @code{http_proxy} and @code{https_proxy} variables should
contain the @sc{url}s of the proxies for @sc{http} and @sc{https}
connections respectively.

@item ftp_proxy
This variable should contain the @sc{url} of the proxy for @sc{ftp}
connections. It is quite common that @code{http_proxy} and
@code{ftp_proxy} are set to the same @sc{url}.

@item no_proxy
This variable should contain a comma-separated list of domain extensions
for which the proxy should @emph{not} be used. For instance, if the
value of @code{no_proxy} is @samp{.mit.edu}, the proxy will not be used
to retrieve documents from MIT.
@end table
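A minimal sketch of setting these variables in a Bourne-style shell
startup file follows. The proxy host @samp{proxy.example.com}, the port
8001, and the excluded domain are placeholders chosen for the example.

@example
# Hypothetical proxy address; substitute your own.
http_proxy=http://proxy.example.com:8001/
https_proxy=http://proxy.example.com:8001/
ftp_proxy=http://proxy.example.com:8001/
# Do not use the proxy for hosts under example.com.
no_proxy=.example.com
export http_proxy https_proxy ftp_proxy no_proxy
@end example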

In addition to the environment variables, proxy location and settings
may be specified from within Wget itself.

@table @samp
@item --no-proxy
@itemx use_proxy = on/off
This option and the corresponding command may be used to suppress the
use of proxy, even if the appropriate environment variables are set.

@item http_proxy = @var{URL}
@itemx https_proxy = @var{URL}
@itemx ftp_proxy = @var{URL}
@itemx no_proxy = @var{string}
These startup file variables allow you to override the proxy settings
specified by the environment.
@end table

Some proxy servers require authorization before you can use them. The
authorization consists of @dfn{username} and @dfn{password}, which must
be sent by Wget. As with @sc{http} authorization, several
authentication schemes exist. For proxy authorization only the
@code{Basic} authentication scheme is currently implemented.

You may specify your username and password either through the proxy
@sc{url} or through the command-line options. Assuming that the
company's proxy is located at @samp{proxy.company.com} at port 8001, a
proxy @sc{url} location containing authorization data might look like
this:

@example
http://hniksic:mypassword@@proxy.company.com:8001/
@end example

Alternatively, you may use the @samp{--proxy-user} and
@samp{--proxy-password} options, and the equivalent @file{.wgetrc}
settings @code{proxy_user} and @code{proxy_password} to set the proxy
username and password.
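
For instance, the startup-file form of these settings might look like
the sketch below. The proxy location and the credentials are
hypothetical placeholders.

@example
# Hypothetical proxy location and credentials.
http_proxy = http://proxy.example.com:8001/
proxy_user = myname
proxy_password = mypassword
@end example

Since the password is stored unencrypted, keep such a file readable
only by you.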

@node Distribution, Web Site, Proxies, Various
@section Distribution
@cindex latest version

As with all GNU utilities, the latest version of Wget can be found at
the master GNU archive site ftp.gnu.org, and its mirrors. For example,
Wget @value{VERSION} can be found at
@url{ftp://ftp.gnu.org/pub/gnu/wget/wget-@value{VERSION}.tar.gz}.

@node Web Site, Mailing Lists, Distribution, Various
@section Web Site
@cindex web site

The official web site for GNU Wget is at
@url{http://www.gnu.org/software/wget/}. However, most useful
information resides at ``The Wget Wgiki'',
@url{http://wget.addictivecode.org/}.

@node Mailing Lists, Internet Relay Chat, Web Site, Various
@section Mailing Lists
@cindex mailing list
@cindex list

@unnumberedsubsec Primary List

The primary mailing list for discussion, bug reports, or questions
about GNU Wget is at @email{bug-wget@@gnu.org}. To subscribe, send an
email to @email{bug-wget-join@@gnu.org}, or visit
@url{http://lists.gnu.org/mailman/listinfo/bug-wget}.

You do not need to subscribe to send a message to the list; however,
please note that messages from non-subscribers are moderated, and may
take a while before they hit the list---@strong{usually around a day}.
If you want your message to show up immediately, please subscribe to the
list before posting. Archives for the list may be found at
@url{http://lists.gnu.org/pipermail/bug-wget/}.

An NNTP/Usenettish gateway is also available via
@uref{http://gmane.org/about.php,Gmane}. You can see the Gmane
archives at
@url{http://news.gmane.org/gmane.comp.web.wget.general}. Note that the
Gmane archives conveniently include messages from both the current
list, and the previous one. Messages also show up in the Gmane
archives sooner than they do at @url{lists.gnu.org}.

@unnumberedsubsec Bug Notices List

Additionally, there is the @email{wget-notify@@addictivecode.org} mailing
list. This is a non-discussion list that receives bug report
notifications from the bug-tracker. To subscribe to this list,
send an email to @email{wget-notify-join@@addictivecode.org},
or visit @url{http://addictivecode.org/mailman/listinfo/wget-notify}.

@unnumberedsubsec Obsolete Lists

Previously, the mailing list @email{wget@@sunsite.dk} was used as the
main discussion list, and another list,
@email{wget-patches@@sunsite.dk} was used for submitting and
discussing patches to GNU Wget.

Messages from @email{wget@@sunsite.dk} are archived at
@itemize @tie{}
@item
@url{http://www.mail-archive.com/wget%40sunsite.dk/} and at
@item
@url{http://news.gmane.org/gmane.comp.web.wget.general} (which also
continues to archive the current list, @email{bug-wget@@gnu.org}).
@end itemize

Messages from @email{wget-patches@@sunsite.dk} are archived at
@itemize @tie{}
@item
@url{http://news.gmane.org/gmane.comp.web.wget.patches}.
@end itemize

@node Internet Relay Chat, Reporting Bugs, Mailing Lists, Various
@section Internet Relay Chat
@cindex Internet Relay Chat
@cindex IRC
@cindex #wget

In addition to the mailing lists, we also have a support channel set up
via IRC at @code{irc.freenode.org}, @code{#wget}. Come check it out!

@node Reporting Bugs, Portability, Internet Relay Chat, Various
@section Reporting Bugs
@cindex bugs
@cindex reporting bugs
@cindex bug reports

@c man begin BUGS
You are welcome to submit bug reports via the GNU Wget bug tracker (see
@url{http://wget.addictivecode.org/BugTracker}).

Before actually submitting a bug report, please try to follow a few
simple guidelines.

@enumerate
@item
Please try to ascertain that the behavior you see really is a bug. If
Wget crashes, it's a bug. If Wget does not behave as documented,
it's a bug. If things work strangely, but you are not sure about the way
they are supposed to work, it might well be a bug, but you might want to
double-check the documentation and the mailing lists (@pxref{Mailing
Lists}).

@item
Try to repeat the bug in as simple circumstances as possible. E.g. if
Wget crashes while downloading @samp{wget -rl0 -kKE -t5 --no-proxy
http://yoyodyne.com -o /tmp/log}, you should try to see if the crash is
repeatable, and if it will occur with a simpler set of options. You might
even try to start the download at the page where the crash occurred to
see if that page somehow triggered the crash.

Also, while I will probably be interested to know the contents of your
@file{.wgetrc} file, just dumping it into the debug message is probably
a bad idea. Instead, you should first try to see if the bug repeats
with @file{.wgetrc} moved out of the way. Only if it turns out that
@file{.wgetrc} settings affect the bug, mail me the relevant parts of
the file.

@item
Please start Wget with @samp{-d} option and send us the resulting
output (or relevant parts thereof). If Wget was compiled without
debug support, recompile it---it is @emph{much} easier to trace bugs
with debug support on.

Note: please make sure to remove any potentially sensitive information
from the debug log before sending it to the bug address. The
@code{-d} won't go out of its way to collect sensitive information,
but the log @emph{will} contain a fairly complete transcript of Wget's
communication with the server, which may include passwords and pieces
of downloaded data. Since the bug address is publicly archived, you
may assume that all bug reports are visible to the public.

@item
If Wget has crashed, try to run it in a debugger, e.g. @code{gdb `which
wget` core} and type @code{where} to get the backtrace. This may not
work if the system administrator has disabled core files, but it is
safe to try.
@end enumerate
@c man end

@node Portability, Signals, Reporting Bugs, Various
@section Portability
@cindex portability
@cindex operating systems

Like all GNU software, Wget works on the GNU system. However, since it
uses GNU Autoconf for building and configuring, and mostly avoids using
``special'' features of any particular Unix, it should compile (and
work) on all common Unix flavors.

Various Wget versions have been compiled and tested under many kinds of
Unix systems, including GNU/Linux, Solaris, SunOS 4.x, Mac OS X, OSF
(aka Digital Unix or Tru64), Ultrix, *BSD, IRIX, AIX, and others. Some
of those systems are no longer in widespread use and may not be able to
support recent versions of Wget. If Wget fails to compile on your
system, we would like to know about it.

Thanks to kind contributors, this version of Wget compiles and works
on 32-bit Microsoft Windows platforms. It has been compiled
successfully using MS Visual C++ 6.0, Watcom, Borland C, and GCC
compilers. Naturally, it lacks some features available on
Unix, but it should work as a substitute for people stuck with
Windows. Note that Windows-specific portions of Wget are not
guaranteed to be supported in the future, although this has been the
case in practice for many years now. All questions and problems in
Windows usage should be reported to the Wget mailing list at
@email{bug-wget@@gnu.org} (@pxref{Mailing Lists}), where the volunteers
who maintain the Windows-related features might look at them.

Support for building on MS-DOS via DJGPP has been contributed by Gisle
Vanem; a port to VMS is maintained by Steven Schweda, and is available
at @url{http://antinode.org/}.

@node Signals, , Portability, Various
@section Signals
@cindex signal handling
@cindex hangup

Since Wget is designed to work in the background, it catches the hangup
signal (@code{SIGHUP}) rather than exiting. If the output was going to
standard output, it is redirected to a file named @file{wget-log}.
Otherwise, @code{SIGHUP} is ignored. This is convenient when you wish
to redirect the output of Wget after having started it.

@example
$ wget http://www.gnus.org/dist/gnus.tar.gz &
...
$ kill -HUP %%
SIGHUP received, redirecting output to `wget-log'.
@end example

Other than that, Wget will not try to interfere with signals in any way.
@kbd{C-c}, @code{kill -TERM} and @code{kill -KILL} should kill it alike.

@node Appendices, Copying this manual, Various, Top
@chapter Appendices

This chapter contains some references I consider useful.

@menu
* Robot Exclusion:: Wget's support for RES.
* Security Considerations:: Security with Wget.
* Contributors:: People who helped.
@end menu

@node Robot Exclusion, Security Considerations, Appendices, Appendices
@section Robot Exclusion
@cindex robot exclusion
@cindex robots.txt
@cindex server maintenance

It is extremely easy to make Wget wander aimlessly around a web site,
sucking all the available data in the process. @samp{wget -r @var{site}},
and you're set. Great? Not for the server admin.

As long as Wget is only retrieving static pages, and doing it at a
reasonable rate (see the @samp{--wait} option), there's not much of a
problem. The trouble is that Wget can't tell the difference between the
smallest static page and the most demanding CGI. A site I know has a
section handled by a CGI Perl script that converts Info files to @sc{html} on
the fly. The script is slow, but works well enough for human users
viewing an occasional Info file. However, when someone's recursive Wget
download stumbles upon the index page that links to all the Info files
through the script, the system is brought to its knees without providing
anything useful to the user. (This task of converting Info files could be
done locally, and access to Info documentation for all installed GNU
software on a system is available from the @code{info} command.)

To avoid this kind of accident, as well as to preserve privacy for
documents that need to be protected from well-behaved robots, the
concept of @dfn{robot exclusion} was invented. The idea is that
the server administrators and document authors can specify which
portions of the site they wish to protect from robots and which
portions they will permit robots to access.

The most popular mechanism, and the @i{de facto} standard supported by
all the major robots, is the ``Robots Exclusion Standard'' (RES) written
by Martijn Koster et al. in 1994. It specifies the format of a text
file containing directives that instruct the robots which URL paths to
avoid. To be found by the robots, the specifications must be placed in
@file{/robots.txt} in the server root, which the robots are expected to
download and parse.
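
As a small illustration, a @file{/robots.txt} that asks all robots to
stay out of a CGI-driven section might read as follows. The path is a
hypothetical placeholder, not taken from any particular site.

@example
# Hypothetical rules: keep all robots out of the CGI-driven section.
User-agent: *
Disallow: /cgi-bin/
@end example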

Although Wget is not a web robot in the strictest sense of the word, it
can download large parts of the site without the user's intervention to
download an individual page. Because of that, Wget honors RES when
downloading recursively. For instance, when you issue:

@example
wget -r http://www.server.com/
@end example

First the index of @samp{www.server.com} will be downloaded. If Wget
finds that it wants to download more documents from that server, it will
request @samp{http://www.server.com/robots.txt} and, if found, use it
for further downloads. @file{robots.txt} is loaded only once per
server.

Until version 1.8, Wget supported the first version of the standard,
written by Martijn Koster in 1994 and available at
@url{http://www.robotstxt.org/wc/norobots.html}. As of version 1.8,
Wget has supported the additional directives specified in the internet
draft @samp{<draft-koster-robots-00.txt>} titled ``A Method for Web
Robots Control''. The draft, which has as far as I know never made it to
an @sc{rfc}, is available at
@url{http://www.robotstxt.org/wc/norobots-rfc.txt}.

This manual no longer includes the text of the Robot Exclusion Standard.

The second, less known mechanism enables the author of an individual
document to specify whether they want the links from the file to be
followed by a robot. This is achieved using the @code{META} tag, like
this:

@example
<meta name="robots" content="nofollow">
@end example

This is explained in some detail at
@url{http://www.robotstxt.org/wc/meta-user.html}. Wget supports this
method of robot exclusion in addition to the usual @file{/robots.txt}
exclusion.

If you know what you are doing and really wish to turn off the
robot exclusion, set the @code{robots} variable to @samp{off} in your
@file{.wgetrc}. You can achieve the same effect from the command line
using the @code{-e} switch, e.g. @samp{wget -e robots=off @var{url}...}.

@node Security Considerations, Contributors, Robot Exclusion, Appendices
@section Security Considerations
@cindex security

When using Wget, you must be aware that it sends unencrypted passwords
through the network, which may present a security problem. Here are the
main issues, and some solutions.

@enumerate
@item
The passwords on the command line are visible using @code{ps}. The best
way around it is to use @code{wget -i -} and feed the @sc{url}s to
Wget's standard input, each on a separate line, terminated by @kbd{C-d}.
Another workaround is to use @file{.netrc} to store passwords; however,
storing unencrypted passwords is also considered a security risk.

@item
Using the insecure @dfn{basic} authentication scheme, unencrypted
passwords are transmitted through the network routers and gateways.

@item
The @sc{ftp} passwords are also in no way encrypted. There is no good
solution for this at the moment.

@item
Although the ``normal'' output of Wget tries to hide the passwords,
debugging logs show them, in all forms. This problem is avoided by
being careful when you send debug logs (yes, even when you send them to
me).
@end enumerate
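For the @file{.netrc} workaround mentioned in the first item, an entry
in the usual @file{.netrc} format might look like the sketch below. The
host name and credentials are hypothetical.

@example
machine ftp.example.com login myname password mypassword
@end example

Because the password is stored unencrypted, keep the file readable only
by you.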

3871

3872

@node Contributors, , Security Considerations, Appendices

3873

@section Contributors

3874

@cindex contributors

3875

3876

@iftex

3877

GNU Wget was written by Hrvoje Nik@v{s}i@'{c} @email{hniksic@@xemacs.org},

3878

@end iftex

3879

@ifnottex

3880

GNU Wget was written by Hrvoje Niksic @email{hniksic@@xemacs.org},

3881

@end ifnottex

3882

and it is currently maintained by Micah Cowan @email{micah@@cowan.name}.

3883

3884

However, the development of Wget could never have gone as far as it has, were

3885

it not for the help of many people, either with bug reports, feature proposals,

3886

patches, or letters saying ``Thanks!''.

Special thanks go to the following people (no particular order):

@itemize @bullet
@item Dan Harkless---contributed a lot of code and documentation of
extremely high quality, as well as the @code{--page-requisites} and
related options. He was the principal maintainer for some time and
released Wget 1.6.

@item Ian Abbott---contributed bug fixes, Windows-related fixes, and
provided a prototype implementation of the breadth-first recursive
download. Co-maintained Wget during the 1.8 release cycle.

@item
The dotsrc.org crew, in particular Karsten Thygesen---donated system
resources such as the mailing list, web space, @sc{ftp} space, and
version control repositories, along with a lot of time to make these
actually work. Christian Reiniger was of invaluable help with setting
up Subversion.

@item
Heiko Herold---provided high-quality Windows builds and contributed
bug and build reports for many years.

@item
Shawn McHorse---bug reports and patches.

@item
Kaveh R. Ghazi---on-the-fly @code{ansi2knr}-ization. Lots of
portability fixes.

@item
Gordon Matzigkeit---@file{.netrc} support.

@item
@iftex
Zlatko @v{C}alu@v{s}i@'{c}, Tomislav Vujec and Dra@v{z}en
Ka@v{c}ar---feature suggestions and ``philosophical'' discussions.
@end iftex
@ifnottex
Zlatko Calusic, Tomislav Vujec and Drazen Kacar---feature suggestions
and ``philosophical'' discussions.
@end ifnottex

@item
Darko Budor---initial port to Windows.

@item
Antonio Rosella---help and suggestions, plus the initial Italian
translation.

@item
@iftex
Tomislav Petrovi@'{c}, Mario Miko@v{c}evi@'{c}---many bug reports and
suggestions.
@end iftex
@ifnottex
Tomislav Petrovic, Mario Mikocevic---many bug reports and suggestions.
@end ifnottex

@item
@iftex
Fran@,{c}ois Pinard---many thorough bug reports and discussions.
@end iftex
@ifnottex
Francois Pinard---many thorough bug reports and discussions.
@end ifnottex

@item
Karl Eichwalder---lots of help with internationalization, Makefile
layout and many other things.

@item
Junio Hamano---donated support for Opie and @sc{http} @code{Digest}
authentication.

@item
Mauro Tortonesi---improved IPv6 support, adding support for dual
family systems. Refactored and enhanced FTP IPv6 code. Maintained GNU
Wget from 2004--2007.

@item
Christopher G.@: Lewis---maintenance of the Windows version of GNU Wget.

@item
Gisle Vanem---many helpful patches and improvements, especially for
Windows and MS-DOS support.

@item
Ralf Wildenhues---contributed patches to convert Wget to use Automake as
part of its build process, and various bugfixes.

@item
Steven Schubiger---Many helpful patches, bugfixes and improvements.
Notably, conversion of Wget to use the Gnulib quote and quotearg
modules, and the addition of password prompts at the console, via the
Gnulib getpass-gnu module.

@item
Ted Mielczarek---donated support for CSS.

@item
Saint Xavier---Support for IRIs (RFC 3987).

@item
People who provided donations for development---including Brian Gough.
@end itemize

The following people have provided patches, bug/build reports, useful
suggestions, beta testing services, fan mail and all the other things
that make maintenance so much fun:

Tim Adam,
Adrian Aichner,
Martin Baehr,
Dieter Baron,
Roger Beeman,
Dan Berger,
T.@: Bharath,
Christian Biere,
Paul Bludov,
Daniel Bodea,
Mark Boyns,
John Burden,
Julien Buty,
Wanderlei Cavassin,
Gilles Cedoc,
Tim Charron,
Noel Cragg,
@iftex
Kristijan @v{C}onka@v{s},
@end iftex
@ifnottex
Kristijan Conkas,
@end ifnottex
John Daily,
Andreas Damm,
Ahmon Dancy,
Andrew Davison,
Bertrand Demiddelaer,
Alexander Dergachev,
Andrew Deryabin,
Ulrich Drepper,
Marc Duponcheel,
@iftex
Damir D@v{z}eko,
@end iftex
@ifnottex
Damir Dzeko,
@end ifnottex
Alan Eldridge,
Hans-Andreas Engel,
@iftex
Aleksandar Erkalovi@'{c},
@end iftex
@ifnottex
Aleksandar Erkalovic,
@end ifnottex
Andy Eskilsson,
@iftex
Jo@~{a}o Ferreira,
@end iftex
@ifnottex
Joao Ferreira,
@end ifnottex
Christian Fraenkel,
David Fritz,
Mike Frysinger,
Charles C.@: Fu,
FUJISHIMA Satsuki,
Masashi Fujita,
Howard Gayle,
Marcel Gerrits,
Lemble Gregory,
Hans Grobler,
Alain Guibert,
Mathieu Guillaume,
Aaron Hawley,
Jochen Hein,
Karl Heuer,
Madhusudan Hosaagrahara,
HIROSE Masaaki,
Ulf Harnhammar,
Gregor Hoffleit,
Erik Magnus Hulthen,
Richard Huveneers,
Jonas Jensen,
Larry Jones,
Simon Josefsson,
@iftex
Mario Juri@'{c},
@end iftex
@ifnottex
Mario Juric,
@end ifnottex
@iftex
Hack Kampbj@o rn,
@end iftex
@ifnottex
Hack Kampbjorn,
@end ifnottex
Const Kaplinsky,
@iftex
Goran Kezunovi@'{c},
@end iftex
@ifnottex
Goran Kezunovic,
@end ifnottex
Igor Khristophorov,
Robert Kleine,
KOJIMA Haime,
Fila Kolodny,
Alexander Kourakos,
Martin Kraemer,
Sami Krank,
Jay Krell,
@tex
$\Sigma\acute{\iota}\mu o\varsigma\;
\Xi\varepsilon\nu\iota\tau\acute{\epsilon}\lambda\lambda\eta\varsigma$
(Simos KSenitellis),
@end tex
@ifnottex
Simos KSenitellis,
@end ifnottex
Christian Lackas,
Hrvoje Lacko,
Daniel S.@: Lewart,
@iftex
Nicol@'{a}s Lichtmeier,
@end iftex
@ifnottex
Nicolas Lichtmeier,
@end ifnottex
Dave Love,
Alexander V.@: Lukyanov,
@iftex
Thomas Lu@ss{}nig,
@end iftex
@ifnottex
Thomas Lussnig,
@end ifnottex
Andre Majorel,
Aurelien Marchand,
Matthew J.@: Mellon,
Jordan Mendelson,
Ted Mielczarek,
Robert Millan,
Lin Zhe Min,
Jan Minar,
Tim Mooney,
Keith Moore,
Adam D.@: Moss,
Simon Munton,
Charlie Negyesi,
R.@: K.@: Owen,
Jim Paris,
Kenny Parnell,
Leonid Petrov,
Simone Piunno,
Andrew Pollock,
Steve Pothier,
@iftex
Jan P@v{r}ikryl,
@end iftex
@ifnottex
Jan Prikryl,
@end ifnottex
Marin Purgar,
@iftex
Csaba R@'{a}duly,
@end iftex
@ifnottex
Csaba Raduly,
@end ifnottex
Keith Refson,
Bill Richardson,
Tyler Riddle,
Tobias Ringstrom,
Jochen Roderburg,
@c Texinfo doesn't grok @'{@i}, so we have to use TeX itself.
@tex
Juan Jos\'{e} Rodr\'{\i}guez,
@end tex
@ifnottex
Juan Jose Rodriguez,
@end ifnottex
Maciej W.@: Rozycki,
Edward J.@: Sabol,
Heinz Salzmann,
Robert Schmidt,
Nicolas Schodet,
Benno Schulenberg,
Andreas Schwab,
Steven M.@: Schweda,
Chris Seawood,
Pranab Shenoy,
Dennis Smit,
Toomas Soome,
Tage Stabell-Kulo,
Philip Stadermann,
Daniel Stenberg,
Sven Sternberger,
Markus Strasser,
John Summerfield,
Szakacsits Szabolcs,
Mike Thomas,
Philipp Thomas,
Mauro Tortonesi,
Dave Turner,
Gisle Vanem,
Rabin Vincent,
Russell Vincent,
@iftex
@v{Z}eljko Vrba,
@end iftex
@ifnottex
Zeljko Vrba,
@end ifnottex
Charles G Waldman,
Douglas E.@: Wegscheid,
Ralf Wildenhues,
Joshua David Williams,
Benjamin Wolsey,
Saint Xavier,
YAMAZAKI Makoto,
Jasmin Zainul,
@iftex
Bojan @v{Z}drnja,
@end iftex
@ifnottex
Bojan Zdrnja,
@end ifnottex
Kristijan Zimmer,
Xin Zou.

Apologies to all whom I accidentally left out, and many thanks to all the
subscribers of the Wget mailing list.

@node Copying this manual, Concept Index, Appendices, Top
@appendix Copying this manual

@menu
* GNU Free Documentation License:: License for copying this manual.
@end menu

@node GNU Free Documentation License, , Copying this manual, Copying this manual
@appendixsec GNU Free Documentation License
@cindex FDL, GNU Free Documentation License

@include fdl.texi


@node Concept Index, , Copying this manual, Top
@unnumbered Concept Index
@printindex cp

@contents

@bye
