~ubuntu-branches/ubuntu/precise/wget/precise-proposed

Viewing changes to .pc/disable-SSLv2/doc/wget.texi

Committer: Bazaar Package Importer
Author(s): Steve Langasek
Date: 2011-10-19 00:00:09 UTC
mfrom: (2.1.13 sid)
Revision ID: james.westby@ubuntu.com-20111019000009-8p33w3wz4b1rdri0

Tags: 1.13-1ubuntu1

* Merge from Debian unstable, remaining changes:
  - Add wget-udeb to ship wget.gnu as alternative to busybox wget
    implementation.
  - Depend on libssl-dev 0.9.8k-7ubuntu4 (LP: #503339)
* Dropped changes, superseded in Debian:
  - Keep build dependencies in main:
    + debian/control: remove info2man build-dep
    + debian/patches/series: disable wget-infopod_generated_manpage
  - Mark wget Multi-Arch: foreign, so packages that aren't of the same arch
    can depend on it.
* Pass --with-ssl=openssl; we don't want to use gnutls, there's no udeb for
  it.
* Add a second build pass for the udeb, so we can build without libidn.

files added:
.pc/debian-changes-1.13-1

.pc/debian-changes-1.13-1/po

.pc/debian-changes-1.13-1/po/de.po

.tarball-version

.version

build-aux/snippet

build-aux/snippet/_Noreturn.h

build-aux/snippet/arg-nonnull.h

build-aux/snippet/c++defs.h

build-aux/snippet/warn-on-use.h

debian/patches/debian-changes-1.13-1

lib/accept.c

lib/alignof.h

lib/arpa_inet.in.h

lib/asnprintf.c

lib/asprintf.c

lib/basename-lgpl.c

lib/binary-io.h

lib/bind.c

lib/cloexec.c

lib/cloexec.h

lib/close.c

lib/connect.c

lib/dirname-lgpl.c

lib/dirname.h

lib/dosname.h

lib/dup-safer-flag.c

lib/dup-safer.c

lib/dup2.c

lib/fatal-signal.c

lib/fatal-signal.h

lib/fcntl.c

lib/fcntl.in.h

lib/fd-hook.c

lib/fd-hook.h

lib/fd-safer-flag.c

lib/fd-safer.c

lib/float+.h

lib/float.c

lib/float.in.h

lib/fseek.c

lib/futimens.c

lib/gai_strerror.c

lib/getaddrinfo.c

lib/getdtablesize.c

lib/getpeername.c

lib/getsockname.c

lib/gettime.c

lib/gettimeofday.c

lib/glthread

lib/glthread/lock.c

lib/glthread/lock.h

lib/glthread/threadlib.c

lib/gnulib.mk

lib/iconv.in.h

lib/inet_ntop.c

lib/ioctl.c

lib/listen.c

lib/lstat.c

lib/malloc.c

lib/mbtowc-impl.h

lib/mbtowc.c

lib/md5.c

lib/md5.h

lib/mkdir.c

lib/netdb.in.h

lib/netinet_in.in.h

lib/open.c

lib/pipe-safer.c

lib/pipe.h

lib/pipe2-safer.c

lib/pipe2.c

lib/printf-args.c

lib/printf-args.h

lib/printf-parse.c

lib/printf-parse.h

lib/rawmemchr.c

lib/rawmemchr.valgrind

lib/recv.c

lib/sched.in.h

lib/select.c

lib/send.c

lib/setsockopt.c

lib/sig-handler.h

lib/sigaction.c

lib/signal.in.h

lib/sigprocmask.c

lib/size_max.h

lib/snprintf.c

lib/socket.c

lib/sockets.c

lib/sockets.h

lib/spawn-pipe.c

lib/spawn-pipe.h

lib/spawn.in.h

lib/spawn_faction_addclose.c

lib/spawn_faction_adddup2.c

lib/spawn_faction_addopen.c

lib/spawn_faction_destroy.c

lib/spawn_faction_init.c

lib/spawn_int.h

lib/spawnattr_destroy.c

lib/spawnattr_init.c

lib/spawnattr_setflags.c

lib/spawnattr_setsigmask.c

lib/spawni.c

lib/spawnp.c

lib/stat-time.h

lib/stat.c

lib/strchrnul.c

lib/strchrnul.valgrind

lib/strerror-override.c

lib/strerror-override.h

lib/strerror_r.c

lib/stripslash.c

lib/sys_ioctl.in.h

lib/sys_select.in.h

lib/sys_socket.in.h

lib/sys_stat.in.h

lib/sys_time.in.h

lib/sys_uio.in.h

lib/sys_wait.in.h

lib/time.in.h

lib/timespec.h

lib/unistd--.h

lib/unistd-safer.h

lib/unlocked-io.h

lib/utimens.c

lib/utimens.h

lib/vasnprintf.c

lib/vasnprintf.h

lib/vasprintf.c

lib/w32sock.h

lib/w32spawn.h

lib/wait-process.c

lib/wait-process.h

lib/waitpid.c

lib/write.c

lib/xalloc-oversized.h

lib/xsize.h

m4/arpa_inet_h.m4

m4/asm-underscore.m4

m4/clock_time.m4

m4/close.m4

m4/configmake.m4

m4/dirname.m4

m4/double-slash-root.m4

m4/dup2.m4

m4/environ.m4

m4/fatal-signal.m4

m4/fcntl-o.m4

m4/fcntl.m4

m4/fcntl_h.m4

m4/float_h.m4

m4/fseek.m4

m4/futimens.m4

m4/getaddrinfo.m4

m4/getdtablesize.m4

m4/gettime.m4

m4/gettimeofday.m4

m4/hostent.m4

m4/iconv_h.m4

m4/inet_ntop.m4

m4/intlmacosx.m4

m4/intmax_t.m4

m4/inttypes_h.m4

m4/ioctl.m4

m4/largefile.m4

m4/lock.m4

m4/lstat.m4

m4/mbtowc.m4

m4/md5.m4

m4/mkdir.m4

m4/mode_t.m4

m4/netdb_h.m4

m4/netinet_in_h.m4

m4/nocrash.m4

m4/open.m4

m4/pipe2.m4

m4/posix_spawn.m4

m4/printf.m4

m4/rawmemchr.m4

m4/sched_h.m4

m4/select.m4

m4/servent.m4

m4/sig_atomic_t.m4

m4/sigaction.m4

m4/signal_h.m4

m4/signalblocking.m4

m4/sigpipe.m4

m4/size_max.m4

m4/snprintf.m4

m4/socketlib.m4

m4/sockets.m4

m4/socklen.m4

m4/sockpfaf.m4

m4/spawn-pipe.m4

m4/spawn_h.m4

m4/stat-time.m4

m4/stat.m4

m4/stdint_h.m4

m4/strchrnul.m4

m4/strerror_r.m4

m4/sys_ioctl_h.m4

m4/sys_select_h.m4

m4/sys_socket_h.m4

m4/sys_stat_h.m4

m4/sys_time_h.m4

m4/sys_uio_h.m4

m4/sys_wait_h.m4

m4/threadlib.m4

m4/time_h.m4

m4/timespec.m4

m4/unistd-safer.m4

m4/unlocked-io.m4

m4/utimbuf.m4

m4/utimens.m4

m4/utimes.m4

m4/vasnprintf.m4

m4/vasprintf.m4

m4/wait-process.m4

m4/waitpid.m4

m4/warn-on-use.m4

m4/wchar_h.m4

m4/wctype_h.m4

m4/write.m4

m4/xsize.m4

po/LINGUAS

tests/Test-auth-retcode.px

tests/Test-i-ftp.px

tests/Test-i-http.px

tests/Test-idn-cmd-utf8.px

tests/Test-idn-robots-utf8.px

files removed:
.pc/CVE-2010-2252

.pc/CVE-2010-2252/doc

.pc/CVE-2010-2252/doc/wget.texi

.pc/CVE-2010-2252/src

.pc/CVE-2010-2252/src/http.c

.pc/CVE-2010-2252/src/http.h

.pc/CVE-2010-2252/src/init.c

.pc/CVE-2010-2252/src/main.c

.pc/CVE-2010-2252/src/options.h

.pc/CVE-2010-2252/src/retr.c

.pc/disable-SSLv2

.pc/disable-SSLv2/doc

.pc/disable-SSLv2/doc/wget.texi

.pc/disable-SSLv2/po

.pc/disable-SSLv2/po/ca.po

.pc/disable-SSLv2/po/cs.po

.pc/disable-SSLv2/po/de.po

.pc/disable-SSLv2/po/es.po

.pc/disable-SSLv2/po/et.po

.pc/disable-SSLv2/po/fi.po

.pc/disable-SSLv2/po/fr.po

.pc/disable-SSLv2/po/ga.po

.pc/disable-SSLv2/po/hr.po

.pc/disable-SSLv2/po/hu.po

.pc/disable-SSLv2/po/id.po

.pc/disable-SSLv2/po/it.po

.pc/disable-SSLv2/po/ja.po

.pc/disable-SSLv2/po/lt.po

.pc/disable-SSLv2/po/nl.po

.pc/disable-SSLv2/po/pl.po

.pc/disable-SSLv2/po/pt.po

.pc/disable-SSLv2/po/pt_BR.po

.pc/disable-SSLv2/po/ru.po

.pc/disable-SSLv2/po/sk.po

.pc/disable-SSLv2/po/sl.po

.pc/disable-SSLv2/po/sv.po

.pc/disable-SSLv2/po/tr.po

.pc/disable-SSLv2/po/vi.po

.pc/disable-SSLv2/po/zh_CN.po

.pc/disable-SSLv2/po/zh_TW.po

.pc/disable-SSLv2/src

.pc/disable-SSLv2/src/init.c

.pc/disable-SSLv2/src/main.c

.pc/disable-SSLv2/src/openssl.c

.pc/fix-paramter-spelling-error-in-wget.texi

.pc/fix-paramter-spelling-error-in-wget.texi/doc

.pc/fix-paramter-spelling-error-in-wget.texi/doc/wget.texi

.pc/refresh-pofiles

.pc/refresh-pofiles/po

.pc/refresh-pofiles/po/be.po

.pc/refresh-pofiles/po/bg.po

.pc/refresh-pofiles/po/ca.po

.pc/refresh-pofiles/po/cs.po

.pc/refresh-pofiles/po/da.po

.pc/refresh-pofiles/po/de.po

.pc/refresh-pofiles/po/el.po

.pc/refresh-pofiles/po/en_GB.po

.pc/refresh-pofiles/po/en_US.po

.pc/refresh-pofiles/po/eo.po

.pc/refresh-pofiles/po/es.po

.pc/refresh-pofiles/po/et.po

.pc/refresh-pofiles/po/eu.po

.pc/refresh-pofiles/po/fi.po

.pc/refresh-pofiles/po/fr.po

.pc/refresh-pofiles/po/ga.po

.pc/refresh-pofiles/po/gl.po

.pc/refresh-pofiles/po/he.po

.pc/refresh-pofiles/po/hr.po

.pc/refresh-pofiles/po/hu.po

.pc/refresh-pofiles/po/id.po

.pc/refresh-pofiles/po/it.po

.pc/refresh-pofiles/po/ja.po

.pc/refresh-pofiles/po/lt.po

.pc/refresh-pofiles/po/nb.po

.pc/refresh-pofiles/po/nl.po

.pc/refresh-pofiles/po/pl.po

.pc/refresh-pofiles/po/pt.po

.pc/refresh-pofiles/po/pt_BR.po

.pc/refresh-pofiles/po/ro.po

.pc/refresh-pofiles/po/ru.po

.pc/refresh-pofiles/po/sk.po

.pc/refresh-pofiles/po/sl.po

.pc/refresh-pofiles/po/sr.po

.pc/refresh-pofiles/po/sv.po

.pc/refresh-pofiles/po/tr.po

.pc/refresh-pofiles/po/uk.po

.pc/refresh-pofiles/po/vi.po

.pc/refresh-pofiles/po/zh_CN.po

.pc/refresh-pofiles/po/zh_TW.po

.pc/wget-de.po-remove-double-quote-signs

.pc/wget-de.po-remove-double-quote-signs/po

.pc/wget-de.po-remove-double-quote-signs/po/de.po

.pc/wget-zh_CN.po-translation-correction

.pc/wget-zh_CN.po-translation-correction/po

.pc/wget-zh_CN.po-translation-correction/po/zh_CN.po

autogen.sh

build-aux/link-warning.h

build-aux/mkinstalldirs

configure.bat

debian/patches/CVE-2010-2252

debian/patches/fix-paramter-spelling-error-in-wget.texi

debian/patches/refresh-pofiles

debian/patches/wget-de.po-remove-double-quote-signs

debian/patches/wget-infopod_generated_manpage

debian/patches/wget-zh_CN.po-translation-correction

lib/getpagesize.c

lib/strcasecmp.c

lib/strings.in.h

lib/strncasecmp.c

m4/exitfail.m4

m4/getpagesize.m4

m4/strcase.m4

m4/strings_h.m4

m4/wchar.m4

m4/wctype.m4

md5/Makefile.am

md5/Makefile.in

md5/dummy.c

md5/m4

md5/m4/gnulib-cache.m4

md5/m4/gnulib-comp.m4

md5/m4/md5.m4

md5/md5.c

md5/md5.h

md5/stddef.in.h

md5/stdint.in.h

md5/wchar.in.h

po/en@boldquot.gmo

po/en@boldquot.po

po/en@quot.gmo

po/en@quot.po

po/en_US.gmo

po/en_US.po

src/gen-md5.c

src/gen-md5.h

src/snprintf.c

windows

windows/ChangeLog

windows/Makefile.am

windows/Makefile.doc

windows/Makefile.in

windows/Makefile.src

windows/Makefile.src.bor

windows/Makefile.src.mingw

windows/Makefile.top

windows/Makefile.top.bor

windows/Makefile.top.mingw

windows/README

windows/config-compiler.h

windows/config.h

files modified:
.pc/applied-patches

.pc/wget-doc-remove-usr-local-in-wget.texi/doc/wget.texi

.pc/wget-fr.po-spelling-correction/po/fr.po

AUTHORS

ChangeLog

GNUmakefile

INSTALL

Makefile.am

Makefile.in

NEWS

README

aclocal.m4

build-aux/announce-gen

build-aux/build_info.pl

build-aux/compile

build-aux/config.guess

build-aux/config.rpath

build-aux/config.sub

build-aux/depcomp

build-aux/gnupload

build-aux/install-sh

build-aux/mdate-sh

build-aux/missing

build-aux/texinfo.tex

build-aux/update-copyright

build-aux/useless-if-before-free

build-aux/vc-list-files

build-aux/ylwrap

configure

configure.ac

debian/changelog

debian/control

debian/copyright

debian/patches/series

debian/patches/wget-doc-remove-usr-local-in-wget.texi

debian/patches/wget-fr.po-spelling-correction

debian/rules

doc/ChangeLog

doc/Makefile.am

doc/Makefile.in

doc/fdl.texi

doc/stamp-vti

doc/texi2pod.pl

doc/version.texi

doc/wget.info

doc/wget.texi

lib/Makefile.am

lib/Makefile.in

lib/alloca.c

lib/alloca.in.h

lib/c-ctype.c

lib/c-ctype.h

lib/config.charset *

lib/errno.in.h

lib/error.c

lib/error.h

lib/exitfail.c

lib/exitfail.h

lib/fseeko.c

lib/getdelim.c

lib/getline.c

lib/getopt.c

lib/getopt.in.h

lib/getopt1.c

lib/getopt_int.h

lib/getpass.c

lib/getpass.h

lib/gettext.h

lib/intprops.h

lib/localcharset.c

lib/localcharset.h

lib/lseek.c

lib/mbrtowc.c

lib/mbsinit.c

lib/memchr.c

lib/quote.c

lib/quote.h

lib/quotearg.c

lib/quotearg.h

lib/realloc.c

lib/ref-add.sin

lib/ref-del.sin

lib/stdbool.in.h

lib/stddef.in.h

lib/stdint.in.h

lib/stdio-impl.h

lib/stdio-write.c

lib/stdio.in.h

lib/stdlib.in.h

lib/str-two-way.h

lib/strcasestr.c

lib/streq.h

lib/strerror.c

lib/string.in.h

lib/unistd.in.h

lib/verify.h

lib/wchar.in.h

lib/wctype.in.h

lib/xalloc-die.c

lib/xalloc.h

lib/xmalloc.c

m4/00gnulib.m4

m4/alloca.m4

m4/codeset.m4

m4/errno_h.m4

m4/error.m4

m4/extensions.m4

m4/fseeko.m4

m4/getdelim.m4

m4/getline.m4

m4/getopt.m4

m4/getpass.m4

m4/gettext.m4

m4/glibc21.m4

m4/gnulib-common.m4

m4/gnulib-comp.m4

m4/iconv.m4

m4/include_next.m4

m4/inline.m4

m4/lib-ld.m4

m4/lib-link.m4

m4/lib-prefix.m4

m4/localcharset.m4

m4/locale-fr.m4

m4/locale-ja.m4

m4/locale-zh.m4

m4/longlong.m4

m4/lseek.m4

m4/malloc.m4

m4/mbrtowc.m4

m4/mbsinit.m4

m4/mbstate_t.m4

m4/memchr.m4

m4/mmap-anon.m4

m4/multiarch.m4

m4/nls.m4

m4/po.m4

m4/quote.m4

m4/quotearg.m4

m4/realloc.m4

m4/stdbool.m4

m4/stddef_h.m4

m4/stdint.m4

m4/stdio_h.m4

m4/stdlib_h.m4

m4/strcasestr.m4

m4/strerror.m4

m4/string_h.m4

m4/unistd_h.m4

m4/wchar_t.m4

m4/wget.m4

m4/wint_t.m4

m4/xalloc.m4

maint.mk

msdos/config.h

po/Makefile.in.in

po/Makevars

po/POTFILES.in

po/Rules-quot

po/be.gmo

po/be.po

po/bg.gmo

po/bg.po

po/boldquot.sed

po/ca.gmo

po/ca.po

po/cs.gmo

po/cs.po

po/da.gmo

po/da.po

po/de.gmo

po/de.po

po/el.gmo

po/el.po

po/en_GB.gmo

po/en_GB.po

po/eo.gmo

po/eo.po

po/es.gmo

po/es.po

po/et.gmo

po/et.po

po/eu.gmo

po/eu.po

po/fi.gmo

po/fi.po

po/fr.gmo

po/fr.po

po/ga.gmo

po/ga.po

po/gl.gmo

po/gl.po

po/he.gmo

po/he.po

po/hr.gmo

po/hr.po

po/hu.gmo

po/hu.po

po/id.gmo

po/id.po

po/it.gmo

po/it.po

po/ja.gmo

po/ja.po

po/lt.gmo

po/lt.po

po/nb.gmo

po/nb.po

po/nl.gmo

po/nl.po

po/pl.gmo

po/pl.po

po/pt.gmo

po/pt.po

po/pt_BR.gmo

po/pt_BR.po

po/quot.sed

po/ro.gmo

po/ro.po

po/ru.gmo

po/ru.po

po/sk.gmo

po/sk.po

po/sl.gmo

po/sl.po

po/sr.gmo

po/sr.po

po/sv.gmo

po/sv.po

po/tr.gmo

po/tr.po

po/uk.gmo

po/uk.po

po/vi.gmo

po/vi.po

po/wget.pot

po/zh_CN.gmo

po/zh_CN.po

po/zh_TW.gmo

po/zh_TW.po

src/ChangeLog

src/Makefile.am

src/Makefile.in

src/build_info.c

src/build_info.c.in

src/cmpt.c

src/config.h.in

src/connect.c

src/connect.h

src/convert.c

src/convert.h

src/cookies.c

src/cookies.h

src/css-tokens.h

src/css-url.c

src/css-url.h

src/css.c

src/css.l

src/exits.c

src/exits.h

src/ftp-basic.c

src/ftp-ls.c

src/ftp-opie.c

src/ftp.c

src/ftp.h

src/gettext.h

src/gnutls.c

src/hash.c

src/hash.h

src/host.c

src/host.h

src/html-parse.c

src/html-parse.h

src/html-url.c

src/html-url.h

src/http-ntlm.c

src/http-ntlm.h

src/http.c

src/http.h

src/init.c

src/init.h

src/iri.c

src/iri.h

src/log.c

src/log.h

src/main.c

src/mswindows.c

src/mswindows.h

src/netrc.c

src/netrc.h

src/openssl.c

src/options.h

src/progress.c

src/progress.h

src/ptimer.c

src/ptimer.h

src/recur.c

src/recur.h

src/res.c

src/res.h

src/retr.c

src/retr.h

src/spider.c

src/spider.h

src/ssl.h

src/sysdep.h

src/test.c

src/test.h

src/url.c

src/url.h

src/utils.c

src/utils.h

src/wget.h

tests/ChangeLog

tests/FTPServer.pm

tests/Makefile.am

tests/Makefile.in

tests/Test--no-content-disposition-trivial.px

tests/Test--no-content-disposition.px

tests/Test--spider-fail.px

tests/Test--spider-r--no-content-disposition-trivial.px

tests/Test--spider-r--no-content-disposition.px

tests/Test--spider-r-HTTP-Content-Disposition.px

tests/Test--spider-r.px

tests/Test--spider.px

tests/Test-E-k-K.px

tests/Test-E-k.px

tests/Test-HTTP-Content-Disposition-1.px

tests/Test-HTTP-Content-Disposition-2.px

tests/Test-HTTP-Content-Disposition.px

tests/Test-N--no-content-disposition-trivial.px

tests/Test-N--no-content-disposition.px

tests/Test-N-HTTP-Content-Disposition.px

tests/Test-N-current.px

tests/Test-N-no-info.px

tests/Test-N-old.px

tests/Test-N-smaller.px

tests/Test-N.px

tests/Test-O--no-content-disposition-trivial.px

tests/Test-O--no-content-disposition.px

tests/Test-O-HTTP-Content-Disposition.px

tests/Test-O-nc.px

tests/Test-O-nonexisting.px

tests/Test-O.px

tests/Test-Restrict-Lowercase.px

tests/Test-Restrict-Uppercase.px

tests/Test-auth-basic.px

tests/Test-auth-no-challenge-url.px

tests/Test-auth-no-challenge.px

tests/Test-auth-with-content-disposition.px

tests/Test-c-full.px

tests/Test-c-partial.px

tests/Test-c-shorter.px

tests/Test-c.px

tests/Test-cookies-401.px

tests/Test-cookies.px

tests/Test-ftp-bad-list.px

tests/Test-ftp-iri-disabled.px

tests/Test-ftp-iri-fallback.px

tests/Test-ftp-iri-recursive.px

tests/Test-ftp-iri.px

tests/Test-ftp-pasv-fail.px

tests/Test-ftp-recursive.px

tests/Test-ftp.px

tests/Test-idn-cmd.px

tests/Test-idn-headers.px

tests/Test-idn-meta.px

tests/Test-idn-robots.px

tests/Test-iri-disabled.px

tests/Test-iri-forced-remote.px

tests/Test-iri-list.px

tests/Test-iri-percent.px

tests/Test-iri.px

tests/Test-k.px

tests/Test-meta-robots.px

tests/Test-nonexisting-quiet.px

tests/Test-noop.px

tests/Test-np.px

tests/Test-proxied-https-auth.px

tests/Test-proxy-auth-basic.px

tests/Test-restrict-ascii.px

tests/run-px

util/Makefile.am

util/Makefile.in

util/rmold.pl

.pc/disable-SSLv2/doc/wget.texi

\input texinfo @c -*-texinfo-*-

@c %**start of header

@setfilename wget.info

@include version.texi

@settitle GNU Wget @value{VERSION} Manual

@c Disable the monstrous rectangles beside overfull hbox-es.

@finalout

@c Use `odd' to print double-sided.

@setchapternewpage on

@c %**end of header

@iftex

@c Remove this if you don't use A4 paper.

@afourpaper

@end iftex

@c Title for man page. The weird way texi2pod.pl is written requires

@c the preceding @set.

@set Wget Wget

@c man title Wget The non-interactive network downloader.

@dircategory Network Applications

@direntry

* Wget: (wget). The non-interactive network downloader.

@end direntry

@copying

This file documents the GNU Wget utility for downloading network

data.

@c man begin COPYRIGHT

2004, 2005, 2006, 2007, 2008, 2009 Free Software Foundation, Inc.

@iftex

Permission is granted to make and distribute verbatim copies of

this manual provided the copyright notice and this permission notice

are preserved on all copies.

@end iftex

@ignore

Permission is granted to process this file through TeX and print the

results, provided the printed document carries a copying permission

notice identical to this one except for the removal of this paragraph

(this paragraph not being relevant to the printed manual).

@end ignore

Permission is granted to copy, distribute and/or modify this document

under the terms of the GNU Free Documentation License, Version 1.2 or

any later version published by the Free Software Foundation; with no

Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A

copy of the license is included in the section entitled ``GNU Free

Documentation License''.

@c man end

@end copying

@titlepage

@title GNU Wget @value{VERSION}

@subtitle The non-interactive download utility

@subtitle Updated for Wget @value{VERSION}, @value{UPDATED}

@author by Hrvoje Nik@v{s}i@'{c} and others

@ignore

@c man begin AUTHOR

Originally written by Hrvoje Niksic <hniksic@xemacs.org>.

Currently maintained by Micah Cowan <micah@cowan.name>.

@c man end

@c man begin SEEALSO

This is @strong{not} the complete manual for GNU Wget.

For more complete information, including more detailed explanations of

some of the options, and a number of commands available

for use with @file{.wgetrc} files and the @samp{-e} option, see the GNU

Info entry for @file{wget}.

@c man end

@end ignore

@page

@vskip 0pt plus 1filll

@insertcopying

@end titlepage

@contents

@ifnottex

@node Top, Overview, (dir), (dir)

@top Wget @value{VERSION}

@insertcopying

@end ifnottex

@menu

* Overview:: Features of Wget.

* Invoking:: Wget command-line arguments.

* Recursive Download:: Downloading interlinked pages.

* Following Links:: The available methods of chasing links.

* Time-Stamping:: Mirroring according to time-stamps.

* Startup File:: Wget's initialization file.

* Examples:: Examples of usage.

* Various:: The stuff that doesn't fit anywhere else.

* Appendices:: Some useful references.
* Copying this manual:: You may give out copies of this manual.
* Concept Index:: Topics covered by this manual.
@end menu

@node Overview, Invoking, Top, Top
@chapter Overview
@cindex overview
@cindex features

@c man begin DESCRIPTION
GNU Wget is a free utility for non-interactive download of files from
the Web. It supports @sc{http}, @sc{https}, and @sc{ftp} protocols, as
well as retrieval through @sc{http} proxies.

@c man end
This chapter is a partial overview of Wget's features.

@itemize @bullet
@item
@c man begin DESCRIPTION
Wget is non-interactive, meaning that it can work in the background,
while the user is not logged on. This allows you to start a retrieval
and disconnect from the system, letting Wget finish the work. By
contrast, most Web browsers require the user's constant presence,
which can be a great hindrance when transferring a lot of data.
@c man end

@item
@ignore
@c man begin DESCRIPTION

@c man end
@end ignore
@c man begin DESCRIPTION
Wget can follow links in @sc{html}, @sc{xhtml}, and @sc{css} pages, to
create local versions of remote web sites, fully recreating the
directory structure of the original site. This is sometimes referred to
as ``recursive downloading.'' While doing that, Wget respects the Robot
Exclusion Standard (@file{/robots.txt}). Wget can be instructed to
convert the links in downloaded files to point at the local files, for
offline viewing.
@c man end

@item
File name wildcard matching and recursive mirroring of directories are
available when retrieving via @sc{ftp}. Wget can read the time-stamp
information given by both @sc{http} and @sc{ftp} servers, and store it
locally. Thus Wget can see if the remote file has changed since the last
retrieval, and automatically retrieve the new version if it has. This
makes Wget suitable for mirroring of @sc{ftp} sites, as well as home
pages.

@item
@ignore
@c man begin DESCRIPTION

@c man end
@end ignore
@c man begin DESCRIPTION
Wget has been designed for robustness over slow or unstable network
connections; if a download fails due to a network problem, it will
keep retrying until the whole file has been retrieved. If the server
supports regetting, it will instruct the server to continue the
download from where it left off.
@c man end

@item
Wget supports proxy servers, which can lighten the network load, speed
up retrieval and provide access behind firewalls. Wget uses passive
@sc{ftp} downloading by default; active @sc{ftp} is available as an option.

@item
Wget supports IP version 6, the next generation of IP. IPv6 is
autodetected at compile-time, and can be disabled at either build or
run time. Binaries built with IPv6 support work well in both
IPv4-only and dual family environments.

@item
Built-in features offer mechanisms to tune which links you wish to follow
(@pxref{Following Links}).

@item
The progress of individual downloads is traced using a progress gauge.
Interactive downloads are tracked using a ``thermometer''-style gauge,
whereas non-interactive ones are traced with dots, each dot
representing a fixed amount of data received (1KB by default). Either
gauge can be customized to your preferences.

@item
Most of the features are fully configurable, either through command line
options, or via the initialization file @file{.wgetrc} (@pxref{Startup
File}). Wget allows you to define @dfn{global} startup files
(@file{/etc/wgetrc} by default) for site settings.

@ignore
@c man begin FILES
@table @samp
@item /etc/wgetrc
Default location of the @dfn{global} startup file.

@item .wgetrc
User startup file.
@end table
@c man end
@end ignore

@item
Finally, GNU Wget is free software. This means that everyone may use
it, redistribute it and/or modify it under the terms of the GNU General
Public License, as published by the Free Software Foundation (see the
file @file{COPYING} that came with GNU Wget, for details).
@end itemize

@node Invoking, Recursive Download, Overview, Top
@chapter Invoking
@cindex invoking
@cindex command line
@cindex arguments
@cindex nohup

By default, Wget is very simple to invoke. The basic syntax is:

@example
@c man begin SYNOPSIS
wget [@var{option}]@dots{} [@var{URL}]@dots{}
@c man end
@end example

Wget will simply download all the @sc{url}s specified on the command
line. @var{URL} is a @dfn{Uniform Resource Locator}, as defined below.

However, you may wish to change some of the default parameters of
Wget. You can do this in two ways: permanently, by adding the appropriate
command to @file{.wgetrc} (@pxref{Startup File}), or by specifying it on
the command line.

@menu
* URL Format::
* Option Syntax::
* Basic Startup Options::
* Logging and Input File Options::
* Download Options::
* Directory Options::
* HTTP Options::
* HTTPS (SSL/TLS) Options::
* FTP Options::
* Recursive Retrieval Options::
* Recursive Accept/Reject Options::
* Exit Status::
@end menu

@node URL Format, Option Syntax, Invoking, Invoking
@section URL Format
@cindex URL
@cindex URL syntax

@dfn{URL} is an acronym for Uniform Resource Locator. A uniform
resource locator is a compact string representation for a resource
available via the Internet. Wget recognizes the @sc{url} syntax as per
@sc{rfc1738}. This is the most widely used form (square brackets denote
optional parts):

@example
http://host[:port]/directory/file
ftp://host[:port]/directory/file
@end example

You can also encode your username and password within a @sc{url}:

@example
ftp://user:password@@host/path
http://user:password@@host/path
@end example

Either @var{user} or @var{password}, or both, may be left out. If you
leave out either the @sc{http} username or password, no authentication
will be sent. If you leave out the @sc{ftp} username, @samp{anonymous}
will be used. If you leave out the @sc{ftp} password, your email
address will be supplied as a default password.@footnote{If you have a
@file{.netrc} file in your home directory, the password will also be
searched for there.}
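
As a rough illustration of how the parts of such a @sc{url} break down, the following sketch uses Python's standard @code{urllib.parse} module rather than Wget's own parser, so it only approximates what Wget does. The helper name @code{describe} is hypothetical, and the @samp{anonymous} fallback simply mirrors the @sc{ftp} default described above.

```python
from urllib.parse import urlsplit


def describe(url):
    """Split a URL into the parts discussed above (illustrative helper)."""
    parts = urlsplit(url)
    user = parts.username
    # Mirror the documented FTP default: a missing username means 'anonymous'.
    if user is None and parts.scheme == "ftp":
        user = "anonymous"
    return {
        "scheme": parts.scheme,
        "user": user,
        "password": parts.password,  # None when left out
        "host": parts.hostname,
        "port": parts.port,          # None when left out
        "path": parts.path,
    }


if __name__ == "__main__":
    print(describe("ftp://user:password@host:2121/path"))
    print(describe("ftp://host/path"))
```

The second call shows the case where both the username and the password are omitted.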

@strong{Important Note}: if you specify a password-containing @sc{url}
on the command line, the username and password will be plainly visible
to all users on the system, by way of @code{ps}. On multi-user systems,
this is a big security risk. To work around it, use @code{wget -i -}
and feed the @sc{url}s to Wget's standard input, each on a separate
line, terminated by @kbd{C-d}.
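
The same workaround can be scripted. The sketch below is one possible way to do it from Python, assuming @code{wget} is on the search path; the function name @code{fetch_via_stdin} and the placeholder @sc{url} are inventions of this example. The @sc{url}s are written to the child's standard input, so they never appear in its argument list, and closing the pipe plays the role of @kbd{C-d}.

```python
import subprocess


def fetch_via_stdin(urls, command=("wget", "-i", "-")):
    """Run `command`, passing the URLs on standard input, one per line.

    The URLs are not part of the argument list, so `ps` cannot show them.
    """
    data = "".join(url + "\n" for url in urls)
    # subprocess.run() closes the pipe after writing, which signals end of input.
    return subprocess.run(list(command), input=data, text=True,
                          capture_output=True)


if __name__ == "__main__":
    # Placeholder URL: needs wget installed and a reachable server.
    result = fetch_via_stdin(["ftp://user:password@host/path"])
    print(result.returncode)
```

The @code{command} parameter exists only so that the helper can be tried with a stand-in program when Wget is not available.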

You can encode unsafe characters in a @sc{url} as @samp{%xy}, @code{xy}
being the hexadecimal representation of the character's @sc{ascii}
value. Some common unsafe characters include @samp{%} (quoted as
@samp{%25}), @samp{:} (quoted as @samp{%3A}), and @samp{@@} (quoted as
@samp{%40}). Refer to @sc{rfc1738} for a comprehensive list of unsafe
characters.
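
The quoting rule can be checked with a short sketch. It relies on Python's standard @code{urllib.parse.quote}, not on anything in Wget, and the helper name @code{encode_unsafe} is hypothetical. Passing an empty @code{safe} set makes @code{quote} escape every character outside the unreserved set, which is stricter than a @sc{url} always needs.

```python
from urllib.parse import quote, unquote


def encode_unsafe(text):
    """Percent-encode every character outside the unreserved set."""
    # safe="" stops quote() from leaving '/' unescaped.
    return quote(text, safe="")


if __name__ == "__main__":
    for ch in "%:@":
        print(ch, "->", encode_unsafe(ch))   # %25, %3A and %40 respectively
    # A password containing unsafe characters, and the reverse operation.
    print(encode_unsafe("p@ss:word"))
    print(unquote("p%40ss%3Aword"))
```

A password quoted this way can be placed in the @var{password} part of a @sc{url} without being mistaken for a delimiter.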

Wget also supports the @code{type} feature for @sc{ftp} @sc{url}s. By
default, @sc{ftp} documents are retrieved in the binary mode (type
@samp{i}), which means that they are downloaded unchanged. Another
useful mode is the @samp{a} (@dfn{ASCII}) mode, which converts the line
delimiters between the different operating systems, and is thus useful
for text files. Here is an example:

@example
ftp://host/directory/file;type=a
@end example
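
To see where the @code{type} suffix sits in such a @sc{url}, the following sketch pulls it out with Python's standard @code{urllib.parse.urlparse}. This is not how Wget itself parses the suffix; the helper name @code{ftp_transfer_type} is hypothetical, and the fallback to @samp{i} just restates the binary default described above.

```python
from urllib.parse import urlparse


def ftp_transfer_type(url, default="i"):
    """Return the ;type= value of an FTP URL, or the binary default 'i'."""
    # urlparse() reports the text after ';' in the last path segment as params.
    params = urlparse(url).params
    if params.startswith("type="):
        return params[len("type="):]
    return default


if __name__ == "__main__":
    print(ftp_transfer_type("ftp://host/directory/file;type=a"))
    print(ftp_transfer_type("ftp://host/directory/file"))
```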

Two alternative variants of @sc{url} specification are also supported,
because of historical (hysterical?) reasons and their widespread use.

@sc{ftp}-only syntax (supported by @code{NcFTP}):
@example
host:/dir/file
@end example

@sc{http}-only syntax (introduced by @code{Netscape}):
@example
host[:port]/dir/file
@end example

These two alternative forms are deprecated, and may cease being
supported in the future.

If you do not understand the difference between these notations, or do
not know which one to use, just use the plain ordinary format you use
with your favorite browser, like @code{Lynx} or @code{Netscape}.

@c man begin OPTIONS

@node Option Syntax, Basic Startup Options, URL Format, Invoking
@section Option Syntax
@cindex option syntax
@cindex syntax of options

Since Wget uses GNU getopt to process command-line arguments, every
option has a long form along with the short one. Long options are
more convenient to remember, but take time to type. You may freely
mix different option styles, or specify options after the command-line
arguments. Thus you may write:

@example
wget -r --tries=10 http://fly.srk.fer.hr/ -o log
@end example

The space between the option accepting an argument and the argument may
be omitted. Instead of @samp{-o log} you can write @samp{-olog}.

You may put several options that do not require arguments together,
like:

@example
wget -drc @var{URL}
@end example

This is completely equivalent to:

@example
wget -d -r -c @var{URL}
@end example

Since the options can be specified after the arguments, you may
terminate them with @samp{--}. So the following will try to download
@sc{url} @samp{-x}, reporting failure to @file{log}:

@example
wget -o log -- -x
@end example
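
These conventions can be tried out without Wget. The sketch below uses @code{gnu_getopt} from Python's standard @code{getopt} module, which follows the same GNU conventions; the option table in it is a small hypothetical subset chosen for the examples above, not Wget's real table. Note that setting the @code{POSIXLY_CORRECT} environment variable makes @code{gnu_getopt} stop at the first non-option argument.

```python
import getopt

# Hypothetical option table for illustration only: -d, -r, -c and -x take
# no argument, -o takes one, and the two long options take one each.
SHORT = "drcxo:"
LONG = ["tries=", "output-file="]


def parse(argv):
    """Parse argv GNU-style: options may follow non-option arguments."""
    return getopt.gnu_getopt(argv, SHORT, LONG)


if __name__ == "__main__":
    # Mixed styles, with an option after the URL.
    print(parse(["-r", "--tries=10", "http://fly.srk.fer.hr/", "-o", "log"]))
    # The space before an option's argument may be omitted.
    print(parse(["-olog"]))
    # Clustered short options equal separate ones.
    print(parse(["-drc", "URL"]) == parse(["-d", "-r", "-c", "URL"]))
    # Everything after '--' is an argument, even if it looks like an option.
    print(parse(["-o", "log", "--", "-x"]))
```

Each call returns a pair: the list of recognized options with their arguments, and the list of remaining non-option arguments.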

The options that accept comma-separated lists all respect the convention
that specifying an empty list clears its value. This can be useful to
clear the @file{.wgetrc} settings. For instance, if your @file{.wgetrc}
sets @code{exclude_directories} to @file{/cgi-bin}, the following
example will first reset it, and then set it to exclude @file{/~nobody}
and @file{/~somebody}. You can also clear the lists in @file{.wgetrc}
(@pxref{Wgetrc Syntax}).

@example
wget -X '' -X /~nobody,/~somebody
@end example
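
The convention can be modeled in a few lines. This is a sketch of the documented behavior, not Wget's code: the function names are hypothetical, and treating a non-empty value as an addition to the current list is an assumption of this model.

```python
def apply_list_option(current, value):
    """Apply one comma-separated list option to the current list.

    An empty value clears the list; any other value is split on commas
    and added to it (the adding part is an assumption of this sketch).
    """
    if value == "":
        return []
    return current + value.split(",")


def apply_all(initial, values):
    """Apply several occurrences of the option in command-line order."""
    result = list(initial)
    for value in values:
        result = apply_list_option(result, value)
    return result


if __name__ == "__main__":
    from_wgetrc = ["/cgi-bin"]          # as if set by exclude_directories
    # Equivalent of: wget -X '' -X /~nobody,/~somebody
    print(apply_all(from_wgetrc, ["", "/~nobody,/~somebody"]))
```

Without the leading empty value, @file{/cgi-bin} would stay in the list in this model.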

Most options that do not accept arguments are @dfn{boolean} options,
so named because their state can be captured with a yes-or-no
(``boolean'') variable. For example, @samp{--follow-ftp} tells Wget
to follow FTP links from HTML files and, on the other hand,
@samp{--no-glob} tells it not to perform file globbing on FTP URLs. A
boolean option is either @dfn{affirmative} or @dfn{negative}
(beginning with @samp{--no}). All such options share several
properties.

Unless stated otherwise, it is assumed that the default behavior is
the opposite of what the option accomplishes. For example, the
documented existence of @samp{--follow-ftp} assumes that the default
is to @emph{not} follow FTP links from HTML pages.

Affirmative options can be negated by prepending @samp{--no-} to
the option name; negative options can be negated by omitting the
@samp{--no-} prefix. This might seem superfluous---if the default for
an affirmative option is to not do something, then why provide a way
to explicitly turn it off? But the startup file may in fact change
the default. For instance, using @code{follow_ftp = on} in
@file{.wgetrc} makes Wget @emph{follow} FTP links by default, and
using @samp{--no-follow-ftp} is the only way to restore the factory
default from the command line.
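
The interplay between the factory default, the startup file and the command line can be modeled as follows. This sketch uses Python's @code{argparse} (its @code{BooleanOptionalAction} needs Python 3.9 or later) purely as an analogy; the function name and the @code{wgetrc_value} parameter are hypothetical and stand for whatever @code{follow_ftp} was set to in @file{.wgetrc}.

```python
import argparse


def follow_ftp(argv, wgetrc_value=False):
    """Resolve follow-ftp from a startup-file value and the command line."""
    parser = argparse.ArgumentParser(prog="sketch")
    # BooleanOptionalAction creates both --follow-ftp and --no-follow-ftp;
    # the startup-file value is used when neither is given.
    parser.add_argument("--follow-ftp",
                        action=argparse.BooleanOptionalAction,
                        default=wgetrc_value)
    return parser.parse_args(argv).follow_ftp


if __name__ == "__main__":
    print(follow_ftp([]))                                       # factory default
    print(follow_ftp([], wgetrc_value=True))                    # follow_ftp = on
    print(follow_ftp(["--no-follow-ftp"], wgetrc_value=True))   # restored
```

The last call is the case the text describes: the negated form is what brings back the factory default once the startup file has changed it.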

@node Basic Startup Options, Logging and Input File Options, Option Syntax, Invoking
@section Basic Startup Options

@table @samp
@item -V
@itemx --version
Display the version of Wget.

@item -h
@itemx --help
Print a help message describing all of Wget's command-line options.

@item -b
@itemx --background
Go to background immediately after startup. If no output file is
specified via @samp{-o}, output is redirected to @file{wget-log}.

@cindex execute wgetrc command
@item -e @var{command}
@itemx --execute @var{command}
Execute @var{command} as if it were a part of @file{.wgetrc}
(@pxref{Startup File}). A command thus invoked will be executed
@emph{after} the commands in @file{.wgetrc}, thus taking precedence over
them. If you need to specify more than one wgetrc command, use multiple
instances of @samp{-e}.

@end table

@node Logging and Input File Options, Download Options, Basic Startup Options, Invoking
@section Logging and Input File Options

@table @samp
@cindex output file
@cindex log file
@item -o @var{logfile}
@itemx --output-file=@var{logfile}
Log all messages to @var{logfile}. The messages are normally reported
to standard error.

@cindex append to log
@item -a @var{logfile}
@itemx --append-output=@var{logfile}
Append to @var{logfile}. This is the same as @samp{-o}, only it appends
to @var{logfile} instead of overwriting the old log file. If
@var{logfile} does not exist, a new file is created.

@cindex debug
@item -d
@itemx --debug
Turn on debug output, meaning various information important to the
developers of Wget if it does not work properly. Your system
administrator may have chosen to compile Wget without debug support, in
which case @samp{-d} will not work. Please note that compiling with
debug support is always safe---Wget compiled with the debug support will
@emph{not} print any debug info unless requested with @samp{-d}.
@xref{Reporting Bugs}, for more information on how to use @samp{-d} for
sending bug reports.

@cindex quiet
@item -q
@itemx --quiet
Turn off Wget's output.

@cindex verbose
@item -v
@itemx --verbose
Turn on verbose output, with all the available data. The default output
is verbose.

@item -nv
@itemx --no-verbose
Turn off verbose without being completely quiet (use @samp{-q} for
that), which means that error messages and basic information still get
printed.

@cindex input-file
@item -i @var{file}
@itemx --input-file=@var{file}
Read @sc{url}s from a local or external @var{file}. If @samp{-} is
specified as @var{file}, @sc{url}s are read from the standard input.
(Use @samp{./-} to read from a file literally named @samp{-}.)

If this option is used, no @sc{url}s need be present on the command
line. If there are @sc{url}s both on the command line and in an input
file, those on the command line will be the first ones to be
retrieved. If @samp{--force-html} is not specified, then @var{file}
should consist of a series of URLs, one per line.

However, if you specify @samp{--force-html}, the document will be
regarded as @samp{html}. In that case you may have problems with
relative links, which you can solve either by adding @code{<base
href="@var{url}">} to the documents or by specifying
@samp{--base=@var{url}} on the command line.

If the @var{file} is an external one, the document will be automatically
treated as @samp{html} if the Content-Type matches @samp{text/html}.
Furthermore, the @var{file}'s location will be implicitly used as base
href if none was specified.

@cindex force html
@item -F
@itemx --force-html
When input is read from a file, force it to be treated as an @sc{html}
file. This enables you to retrieve relative links from existing
@sc{html} files on your local disk, by adding @code{<base
href="@var{url}">} to @sc{html}, or using the @samp{--base} command-line
option.

@cindex base for relative links in input file
@item -B @var{URL}
@itemx --base=@var{URL}
Resolves relative links using @var{URL} as the point of reference,
when reading links from an HTML file specified via the
@samp{-i}/@samp{--input-file} option (together with
@samp{--force-html}, or when the input file was fetched remotely from
a server describing it as @sc{html}). This is equivalent to the
presence of a @code{BASE} tag in the @sc{html} input file, with
@var{URL} as the value for the @code{href} attribute.

For instance, if you specify @samp{http://foo/bar/a.html} for
@var{URL}, and Wget reads @samp{../baz/b.html} from the input file, it
would be resolved to @samp{http://foo/baz/b.html}.
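
The rule in this example is ordinary relative-reference resolution for
URLs. As a rough illustration only, the following Python sketch
reproduces it with the standard @code{urllib.parse.urljoin} function.
It is not Wget's own code, and the variable names are made up for the
illustration.

@example
from urllib.parse import urljoin

# Base URL, as it would be given to --base (or found in a BASE tag).
base = "http://foo/bar/a.html"

# A relative link read from the input file.
link = "../baz/b.html"

resolved = urljoin(base, link)
print(resolved)
@end example

The last path component of the base (@file{a.html}) is dropped before
the relative link is applied, which is why the result lands under
@file{/baz/} rather than @file{/bar/baz/}.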

@end table

@node Download Options, Directory Options, Logging and Input File Options, Invoking
@section Download Options

@table @samp
@cindex bind address
@cindex client IP address
@cindex IP address, client
@item --bind-address=@var{ADDRESS}
When making client TCP/IP connections, bind to @var{ADDRESS} on
the local machine. @var{ADDRESS} may be specified as a hostname or IP
address. This option can be useful if your machine is bound to multiple
IPs.

@cindex retries
@cindex tries
@cindex number of retries
@item -t @var{number}
@itemx --tries=@var{number}
Set number of retries to @var{number}. Specify 0 or @samp{inf} for
infinite retrying. The default is to retry 20 times, with the exception
of fatal errors like ``connection refused'' or ``not found'' (404),
which are not retried.

@item -O @var{file}
@itemx --output-document=@var{file}
The documents will not be written to the appropriate files, but all
will be concatenated together and written to @var{file}. If @samp{-}
is used as @var{file}, documents will be printed to standard output,
disabling link conversion. (Use @samp{./-} to print to a file
literally named @samp{-}.)

Use of @samp{-O} is @emph{not} intended to mean simply ``use the name
@var{file} instead of the one in the URL;'' rather, it is
analogous to shell redirection:
@samp{wget -O file http://foo} is intended to work like
@samp{wget -O - http://foo > file}; @file{file} will be truncated
immediately, and @emph{all} downloaded content will be written there.

For this reason, @samp{-N} (for timestamp-checking) is not supported
in combination with @samp{-O}: since @var{file} is always newly
created, it will always have a very new timestamp. A warning will be
issued if this combination is used.

Similarly, using @samp{-r} or @samp{-p} with @samp{-O} may not work as
you expect: Wget won't just download the first file to @var{file} and
then download the rest to their normal names: @emph{all} downloaded
content will be placed in @var{file}. This was disabled in version
1.11, but has been reinstated (with a warning) in 1.11.2, as there are
some cases where this behavior can actually have some use.

Note that a combination with @samp{-k} is only permitted when
downloading a single document, as in that case it will just convert
all relative URIs to external ones; @samp{-k} makes no sense for
multiple URIs when they're all being downloaded to a single file.

@cindex clobbering, file
@cindex downloading multiple times
@cindex no-clobber
@item -nc
@itemx --no-clobber
If a file is downloaded more than once in the same directory, Wget's
behavior depends on a few options, including @samp{-nc}. In certain
cases, the local file will be @dfn{clobbered}, or overwritten, upon
repeated download. In other cases it will be preserved.

When running Wget without @samp{-N}, @samp{-nc}, @samp{-r}, or
@samp{-p}, downloading the same file in the same directory will result
in the original copy of @var{file} being preserved and the second copy
being named @samp{@var{file}.1}. If that file is downloaded yet
again, the third copy will be named @samp{@var{file}.2}, and so on.
(This is also the behavior with @samp{-nd}, even if @samp{-r} or
@samp{-p} is in effect.) When @samp{-nc} is specified, this behavior
is suppressed, and Wget will refuse to download newer copies of
@samp{@var{file}}. Therefore, ``@code{no-clobber}'' is actually a
misnomer in this mode---it's not clobbering that's prevented (as the
numeric suffixes were already preventing clobbering), but rather the
multiple version saving that's prevented.
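
The numeric-suffix naming described above can be sketched in a few
lines of Python. This is an illustration of the naming rule only, not
Wget's implementation; the function @code{next_free_name} and the file
names are hypothetical.

@example
def next_free_name(name, existing):
    """Return name, or the first of name.1, name.2, ... not in use."""
    if name not in existing:
        return name
    n = 1
    while "%s.%d" % (name, n) in existing:
        n += 1
    return "%s.%d" % (name, n)

# Hypothetical directory contents after two earlier downloads.
existing = set(["index.html", "index.html.1"])
third_copy = next_free_name("index.html", existing)
print(third_copy)
@end example

With @samp{-nc}, no such search for a free name takes place: the
existing file is kept and the download is skipped.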

When running Wget with @samp{-r} or @samp{-p}, but without @samp{-N},
@samp{-nd}, or @samp{-nc}, re-downloading a file will result in the
new copy simply overwriting the old. Adding @samp{-nc} will prevent
this behavior, instead causing the original version to be preserved
and any newer copies on the server to be ignored.

When running Wget with @samp{-N}, with or without @samp{-r} or
@samp{-p}, the decision as to whether or not to download a newer copy
of a file depends on the local and remote timestamp and size of the
file (@pxref{Time-Stamping}). @samp{-nc} may not be specified at the
same time as @samp{-N}.

Note that when @samp{-nc} is specified, files with the suffixes
@samp{.html} or @samp{.htm} will be loaded from the local disk and
parsed as if they had been retrieved from the Web.

@cindex continue retrieval
@cindex incomplete downloads
@cindex resume download
@item -c
@itemx --continue
Continue getting a partially-downloaded file. This is useful when you
want to finish up a download started by a previous instance of Wget, or
by another program. For instance:

@example
wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z
@end example

If there is a file named @file{ls-lR.Z} in the current directory, Wget
will assume that it is the first portion of the remote file, and will
ask the server to continue the retrieval from an offset equal to the
length of the local file.

Note that you don't need to specify this option if you just want the
current invocation of Wget to retry downloading a file should the
connection be lost midway through. This is the default behavior.
@samp{-c} only affects resumption of downloads started @emph{prior} to
this invocation of Wget, and whose local files are still sitting around.

Without @samp{-c}, the previous example would just download the remote
file to @file{ls-lR.Z.1}, leaving the truncated @file{ls-lR.Z} file
alone.

Beginning with Wget 1.7, if you use @samp{-c} on a non-empty file, and
it turns out that the server does not support continued downloading,
Wget will refuse to start the download from scratch, which would
effectively ruin existing contents. If you really want the download to
start from scratch, remove the file.

Also beginning with Wget 1.7, if you use @samp{-c} on a file that is the
same size as the one on the server, Wget will refuse to download the
file and print an explanatory message. The same happens when the file
is smaller on the server than locally (presumably because it was changed
on the server since your last download attempt)---because ``continuing''
is not meaningful, no download occurs.

On the other side of the coin, while using @samp{-c}, any file that's
bigger on the server than locally will be considered an incomplete
download and only @code{(length(remote) - length(local))} bytes will be
downloaded and tacked onto the end of the local file. This behavior can
be desirable in certain cases---for instance, you can use @samp{wget -c}
to download just the new portion that's been appended to a data
collection or log file.
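
The size comparison described in the last two paragraphs can be
summarized by the following Python sketch. It is a simplified model
under the assumption that only the two file lengths are known; the
function name @code{continue_action} and its return values are invented
for the illustration and are not part of Wget.

@example
def continue_action(local_size, remote_size):
    """Sketch of what -c does, judged by file sizes alone."""
    if local_size >= remote_size:
        # Same size, or smaller on the server: nothing to continue.
        return ("skip", 0)
    # Bigger on the server: fetch only the missing tail and append it.
    return ("append", remote_size - local_size)

print(continue_action(4096, 4096))
print(continue_action(4096, 10000))
@end example

Note that the sketch, like Wget, compares lengths only; it cannot tell
whether the local bytes really are a prefix of the remote file.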

However, if the file is bigger on the server because it's been
@emph{changed}, as opposed to just @emph{appended} to, you'll end up
with a garbled file. Wget has no way of verifying that the local file
is really a valid prefix of the remote file. You need to be especially
careful of this when using @samp{-c} in conjunction with @samp{-r},
since every file will be considered as an ``incomplete download'' candidate.

Another instance where you'll get a garbled file if you try to use
@samp{-c} is if you have a lame @sc{http} proxy that inserts a
``transfer interrupted'' string into the local file. In the future a
``rollback'' option may be added to deal with this case.

Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
servers that support the @code{Range} header.

@cindex progress indicator
@cindex dot style
@item --progress=@var{type}
Select the type of the progress indicator you wish to use. Legal
indicators are ``dot'' and ``bar''.

The ``bar'' indicator is used by default. It draws an @sc{ascii}
progress bar (also known as a ``thermometer'' display) indicating the
status of retrieval. If the output is not a TTY, the ``dot'' indicator
will be used by default.

Use @samp{--progress=dot} to switch to the ``dot'' display. It traces
the retrieval by printing dots on the screen, each dot representing a
fixed amount of downloaded data.

When using the dotted retrieval, you may also set the @dfn{style} by
specifying the type as @samp{dot:@var{style}}. Different styles assign
different meaning to one dot. With the @code{default} style each dot
represents 1K, there are ten dots in a cluster and 50 dots in a line.
The @code{binary} style has a more ``computer''-like orientation---8K
dots, 16-dot clusters and 48 dots per line (which makes for 384K
lines). The @code{mega} style is suitable for downloading very large
files---each dot represents 64K retrieved, there are eight dots in a
cluster, and 48 dots on each line (so each line contains 3M).
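
The per-line figures quoted above follow directly from the dot size
and the number of dots per line. The short Python sketch below redoes
that arithmetic; it assumes that ``K'' means 1024 bytes, and the table
@code{styles} is simply a transcription of the numbers in the
paragraph above, not a structure taken from Wget.

@example
# (bytes per dot, dots per cluster, dots per line) for each style,
# as given in the text above.
K = 1024
styles = dict(
    default=(1 * K, 10, 50),
    binary=(8 * K, 16, 48),
    mega=(64 * K, 8, 48),
)

def bytes_per_line(style):
    dot_size, _cluster, dots_per_line = styles[style]
    return dot_size * dots_per_line

for name in ("default", "binary", "mega"):
    print(name, bytes_per_line(name) // K, "K per line")
@end example

The @code{mega} style therefore shows 3072K, that is 3M, on each line.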

Note that you can set the default style using the @code{progress}
command in @file{.wgetrc}. That setting may be overridden from the
command line. The exception is that, when the output is not a TTY, the
``dot'' progress will be favored over ``bar''. To force the bar output,
use @samp{--progress=bar:force}.

@item -N
@itemx --timestamping
Turn on time-stamping. @xref{Time-Stamping}, for details.

@cindex server response, print
@item -S
@itemx --server-response
Print the headers sent by @sc{http} servers and responses sent by
@sc{ftp} servers.

@cindex Wget as spider
@cindex spider
@item --spider
When invoked with this option, Wget will behave as a Web @dfn{spider},
which means that it will not download the pages, just check that they
are there. For example, you can use Wget to check your bookmarks:

@example
wget --spider --force-html -i bookmarks.html
@end example

This feature needs much more work for Wget to get close to the
functionality of real web spiders.

@cindex timeout
@item -T @var{seconds}
@itemx --timeout=@var{seconds}
Set the network timeout to @var{seconds} seconds. This is equivalent
to specifying @samp{--dns-timeout}, @samp{--connect-timeout}, and
@samp{--read-timeout}, all at the same time.

When interacting with the network, Wget can check for timeout and
abort the operation if it takes too long. This prevents anomalies
like hanging reads and infinite connects. The only timeout enabled by
default is a 900-second read timeout. Setting a timeout to 0 disables
it altogether. Unless you know what you are doing, it is best not to
change the default timeout settings.

All timeout-related options accept decimal values, as well as
subsecond values. For example, @samp{0.1} seconds is a legal (though
unwise) choice of timeout. Subsecond timeouts are useful for checking
server response times or for testing network latency.

@cindex DNS timeout
@cindex timeout, DNS
@item --dns-timeout=@var{seconds}
Set the DNS lookup timeout to @var{seconds} seconds. DNS lookups that
don't complete within the specified time will fail. By default, there
is no timeout on DNS lookups, other than that implemented by system
libraries.

@cindex connect timeout
@cindex timeout, connect
@item --connect-timeout=@var{seconds}
Set the connect timeout to @var{seconds} seconds. TCP connections that
take longer to establish will be aborted. By default, there is no
connect timeout, other than that implemented by system libraries.

@cindex read timeout
@cindex timeout, read
@item --read-timeout=@var{seconds}
Set the read (and write) timeout to @var{seconds} seconds. The
``time'' of this timeout refers to @dfn{idle time}: if, at any point in
the download, no data is received for more than the specified number
of seconds, reading fails and the download is restarted. This option
does not directly affect the duration of the entire download.

Of course, the remote server may choose to terminate the connection
sooner than this option requires. The default read timeout is 900
seconds.

@cindex bandwidth, limit
@cindex rate, limit
@cindex limit bandwidth
@item --limit-rate=@var{amount}
Limit the download speed to @var{amount} bytes per second. Amount may
be expressed in bytes, kilobytes with the @samp{k} suffix, or megabytes
with the @samp{m} suffix. For example, @samp{--limit-rate=20k} will
limit the retrieval rate to 20KB/s. This is useful when, for whatever
reason, you don't want Wget to consume the entire available bandwidth.

This option allows the use of decimal numbers, usually in conjunction
with power suffixes; for example, @samp{--limit-rate=2.5k} is a legal
value.

Note that Wget implements the limiting by sleeping the appropriate
amount of time after a network read that took less time than specified
by the rate. Eventually this strategy causes the TCP transfer to slow
down to approximately the specified rate. However, it may take some
time for this balance to be achieved, so don't be surprised if limiting
the rate doesn't work well with very small files.
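
The sleeping strategy can be pictured with the following Python
sketch. It is a deliberately simplified model, not Wget's code: the
function @code{sleep_needed} is hypothetical, and the example assumes
that the @samp{k} suffix stands for 1024 bytes.

@example
def sleep_needed(bytes_read, elapsed, limit):
    """Seconds to sleep so that bytes_read bytes take at least
    bytes_read / limit seconds overall (limit is in bytes per second)."""
    expected = bytes_read / limit
    return max(0.0, expected - elapsed)

# 20480 bytes arrived in 0.25 s under a 20 KB/s limit (k taken as 1024,
# an assumption of this sketch).
limit = 20 * 1024
print(sleep_needed(20480, 0.25, limit))
@end example

A read that was already slower than the limit needs no sleep at all,
which is why the function never returns a negative value.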

@cindex pause
@cindex wait
@item -w @var{seconds}
@itemx --wait=@var{seconds}
Wait the specified number of seconds between the retrievals. Use of
this option is recommended, as it lightens the server load by making the
requests less frequent. Instead of in seconds, the time can be
specified in minutes using the @code{m} suffix, in hours using the
@code{h} suffix, or in days using the @code{d} suffix.

Specifying a large value for this option is useful if the network or the
destination host is down, so that Wget can wait long enough to
reasonably expect the network error to be fixed before the retry. The
waiting interval specified by this option is influenced by
@samp{--random-wait}, described below.

@cindex retries, waiting between
@cindex waiting between retries
@item --waitretry=@var{seconds}
If you don't want Wget to wait between @emph{every} retrieval, but only
between retries of failed downloads, you can use this option. Wget will
use @dfn{linear backoff}, waiting 1 second after the first failure on a
given file, then waiting 2 seconds after the second failure on that
file, up to the maximum number of @var{seconds} you specify. Therefore,
a value of 10 will actually make Wget wait up to (1 + 2 + ... + 10) = 55
seconds per file.
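
The 55-second figure can be checked with a small Python sketch of the
linear backoff schedule. The function @code{retry_waits} is a
hypothetical helper written for this illustration; capping later waits
at @var{seconds} is how the sketch reads ``up to the maximum'', which
is an assumption.

@example
def retry_waits(waitretry, failures):
    """Wait (in seconds) before each retry: 1, 2, ... capped at waitretry."""
    return [min(n, waitretry) for n in range(1, failures + 1)]

waits = retry_waits(10, 10)
print(waits)
print(sum(waits))
@end example

Ten consecutive failures with a value of 10 give waits of 1 through 10
seconds, which add up to 55.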

By default, Wget will assume a value of 10 seconds.

@cindex wait, random
@cindex random wait
@item --random-wait
Some web sites may perform log analysis to identify retrieval programs
such as Wget by looking for statistically significant similarities in
the time between requests. This option causes the time between requests
to vary between 0.5 and 1.5 * @var{wait} seconds, where @var{wait} was
specified using the @samp{--wait} option, in order to mask Wget's
presence from such analysis.
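
As a rough picture of the resulting delays, the Python sketch below
draws values from the stated range. The uniform distribution and the
function name @code{random_wait} are assumptions of the sketch; the
text above only fixes the lower and upper bounds.

@example
import random

def random_wait(wait):
    """Pick a delay between 0.5 * wait and 1.5 * wait seconds."""
    return random.uniform(0.5 * wait, 1.5 * wait)

delays = [random_wait(2.0) for _ in range(5)]
print(delays)
@end example

With @samp{--wait=2}, every delay falls between 1 and 3 seconds.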

A 2001 article in a publication devoted to development on a popular
consumer platform provided code to perform this analysis on the fly.
Its author suggested blocking at the class C address level to ensure
automated retrieval programs were blocked despite changing DHCP-supplied
addresses.

The @samp{--random-wait} option was inspired by this ill-advised
recommendation to block many unrelated users from a web site due to the
actions of one.

@cindex proxy
@item --no-proxy
Don't use proxies, even if the appropriate @code{*_proxy} environment
variable is defined.

@c man end
@xref{Proxies}, for more information about the use of proxies with Wget.
@c man begin OPTIONS

@cindex quota
@item -Q @var{quota}
@itemx --quota=@var{quota}
Specify download quota for automatic retrievals. The value can be
specified in bytes (default), kilobytes (with @samp{k} suffix), or
megabytes (with @samp{m} suffix).

Note that quota will never affect downloading a single file. So if you
specify @samp{wget -Q10k ftp://wuarchive.wustl.edu/ls-lR.gz}, all of the
@file{ls-lR.gz} will be downloaded. The same goes even when several
@sc{url}s are specified on the command-line. However, quota is
respected when retrieving either recursively, or from an input file.
Thus you may safely type @samp{wget -Q2m -i sites}---download will be
aborted when the quota is exceeded.

Setting quota to 0 or to @samp{inf} removes the download quota.

@cindex DNS cache
@cindex caching of DNS lookups
@item --no-dns-cache
Turn off caching of DNS lookups. Normally, Wget remembers the IP
addresses it looked up from DNS so it doesn't have to repeatedly
contact the DNS server for the same (typically small) set of hosts it
retrieves from. This cache exists in memory only; a new Wget run will
contact DNS again.

However, it has been reported that in some situations it is not
desirable to cache host names, even for the duration of a
short-running application like Wget. With this option Wget issues a
new DNS lookup (more precisely, a new call to @code{gethostbyname} or
@code{getaddrinfo}) each time it makes a new connection. Please note
that this option will @emph{not} affect caching that might be
performed by the resolving library or by an external caching layer,
such as NSCD.

If you don't understand exactly what this option does, you probably
won't need it.

@cindex file names, restrict
@cindex Windows file names
@item --restrict-file-names=@var{modes}
Change which characters found in remote URLs must be escaped during
generation of local filenames. Characters that are @dfn{restricted}
by this option are escaped, i.e. replaced with @samp{%HH}, where
@samp{HH} is the hexadecimal number that corresponds to the restricted
character. This option may also be used to force all alphabetical
cases to be either lower- or uppercase.

By default, Wget escapes the characters that are not valid or safe as
part of file names on your operating system, as well as control
characters that are typically unprintable. This option is useful for
changing these defaults, perhaps because you are downloading to a
non-native partition, or because you want to disable escaping of the
control characters, or you want to further restrict characters to only
those in the @sc{ascii} range of values.

The @var{modes} are a comma-separated set of text values. The
acceptable values are @samp{unix}, @samp{windows}, @samp{nocontrol},
@samp{ascii}, @samp{lowercase}, and @samp{uppercase}. The values
@samp{unix} and @samp{windows} are mutually exclusive (one will
override the other), as are @samp{lowercase} and
@samp{uppercase}. Those last are special cases, as they do not change
the set of characters that would be escaped, but rather force local
file paths to be converted either to lower- or uppercase.

When ``unix'' is specified, Wget escapes the character @samp{/} and
the control characters in the ranges 0--31 and 128--159. This is the
default on Unix-like operating systems.
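
The ``unix'' rule can be sketched in Python as shown below. This is an
illustration of the escaping rule as stated here, not Wget's
implementation: the function @code{escape_unix} is hypothetical, it
works on a single file-name component given as bytes, and the use of
uppercase hexadecimal digits is an assumption.

@example
def escape_unix(name):
    """Escape '/' and control bytes (0-31, 128-159) as %HH."""
    out = []
    for byte in name:
        if byte == ord("/") or byte <= 31 or 128 <= byte <= 159:
            out.append("%%%02X" % byte)
        else:
            out.append(chr(byte))
    return "".join(out)

# A hypothetical file-name component containing a slash and a tab.
print(escape_unix(b"a/b\tc"))
@end example

The slash (byte 47) becomes @samp{%2F} and the tab (byte 9) becomes
@samp{%09}; all other bytes in the example pass through unchanged.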

When ``windows'' is given, Wget escapes the characters @samp{\},
@samp{|}, @samp{/}, @samp{:}, @samp{?}, @samp{"}, @samp{*}, @samp{<},
@samp{>}, and the control characters in the ranges 0--31 and 128--159.
In addition to this, Wget in Windows mode uses @samp{+} instead of
@samp{:} to separate host and port in local file names, and uses
@samp{@@} instead of @samp{?} to separate the query portion of the file
name from the rest. Therefore, a URL that would be saved as
@samp{www.xemacs.org:4300/search.pl?input=blah} in Unix mode would be
saved as @samp{www.xemacs.org+4300/search.pl@@input=blah} in Windows
mode. This mode is the default on Windows.

If you specify @samp{nocontrol}, then the escaping of the control
characters is also switched off. This option may make sense
when you are downloading URLs whose names contain UTF-8 characters, on
a system which can save and display filenames in UTF-8 (some possible
byte values used in UTF-8 byte sequences fall in the range of values
designated by Wget as ``controls'').

The @samp{ascii} mode is used to specify that any bytes whose values
are outside the range of @sc{ascii} characters (that is, greater than
127) shall be escaped. This can be useful when saving filenames
whose encoding does not match the one used locally.

@cindex IPv6
@item -4
@itemx --inet4-only
@itemx -6
@itemx --inet6-only
Force connecting to IPv4 or IPv6 addresses. With @samp{--inet4-only}
or @samp{-4}, Wget will only connect to IPv4 hosts, ignoring AAAA
records in DNS, and refusing to connect to IPv6 addresses specified in
URLs. Conversely, with @samp{--inet6-only} or @samp{-6}, Wget will
only connect to IPv6 hosts and ignore A records and IPv4 addresses.

Neither option should normally be needed. By default, an IPv6-aware
Wget will use the address family specified by the host's DNS record.
If the DNS responds with both IPv4 and IPv6 addresses, Wget will try
them in sequence until it finds one it can connect to. (Also see the
@samp{--prefer-family} option described below.)

These options can be used to deliberately force the use of IPv4 or
IPv6 address families on dual family systems, usually to aid debugging
or to deal with broken network configuration. Only one of
@samp{--inet6-only} and @samp{--inet4-only} may be specified at the
same time. Neither option is available in Wget compiled without IPv6
support.

@item --prefer-family=none/IPv4/IPv6
When given a choice of several addresses, connect to the addresses
with specified address family first. The address order returned by
DNS is used without change by default.

This avoids spurious errors and connect attempts when accessing hosts
that resolve to both IPv6 and IPv4 addresses from IPv4 networks. For
example, @samp{www.kame.net} resolves to
@samp{2001:200:0:8002:203:47ff:fea5:3085} and to
@samp{203.178.141.194}. When the preferred family is @code{IPv4}, the
IPv4 address is used first; when the preferred family is @code{IPv6},
the IPv6 address is used first; if the specified value is @code{none},
the address order returned by DNS is used without change.

Unlike @samp{-4} and @samp{-6}, this option doesn't inhibit access to
any address family; it only changes the @emph{order} in which the
addresses are accessed. Also note that the reordering performed by
this option is @dfn{stable}---it doesn't affect order of addresses of
the same family. That is, the relative order of all IPv4 addresses
and of all IPv6 addresses remains intact in all cases.
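
What ``stable'' means here can be shown with a short Python sketch.
It relies on the fact that Python's @code{sorted} is itself a stable
sort. The function @code{prefer_family} and the addresses (taken from
the documentation-only ranges) are hypothetical, and the sketch makes
no claim about how Wget performs the reordering internally.

@example
def prefer_family(addresses, family):
    """Stable reordering: addresses of the preferred family come first."""
    if family == "none":
        return list(addresses)
    # sorted() is stable, so the order within each family is kept.
    return sorted(addresses, key=lambda addr: addr[0] != family)

# Hypothetical DNS answer as (family, address) pairs.
answer = [
    ("IPv6", "2001:db8::1"),
    ("IPv4", "192.0.2.1"),
    ("IPv6", "2001:db8::2"),
    ("IPv4", "192.0.2.2"),
]
print(prefer_family(answer, "IPv4"))
@end example

With @code{IPv4} preferred, both IPv4 addresses move to the front, but
@samp{192.0.2.1} still precedes @samp{192.0.2.2} and
@samp{2001:db8::1} still precedes @samp{2001:db8::2}.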

@item --retry-connrefused
Consider ``connection refused'' a transient error and try again.
Normally Wget gives up on a URL when it is unable to connect to the
site because failure to connect is taken as a sign that the server is
not running at all and that retries would not help. This option is
for mirroring unreliable sites whose servers tend to disappear for
short periods of time.

@cindex user
@cindex password
@cindex authentication
@item --user=@var{user}
@itemx --password=@var{password}
Specify the username @var{user} and password @var{password} for both
@sc{ftp} and @sc{http} file retrieval. These parameters can be overridden
using the @samp{--ftp-user} and @samp{--ftp-password} options for
@sc{ftp} connections and the @samp{--http-user} and @samp{--http-password}
options for @sc{http} connections.

@item --ask-password
Prompt for a password for each connection established. Cannot be specified
when @samp{--password} is being used, because they are mutually exclusive.

@cindex iri support
@cindex idn support
@item --no-iri

Turn off internationalized URI (IRI) support. Use @samp{--iri} to
turn it on. IRI support is activated by default.

You can set the default state of IRI support using the @code{iri}
command in @file{.wgetrc}. That setting may be overridden from the
command line.

@cindex local encoding
@item --local-encoding=@var{encoding}

Force Wget to use @var{encoding} as the default system encoding. That affects
how Wget converts URLs specified as arguments from locale to @sc{utf-8} for
IRI support.

Wget uses the function @code{nl_langinfo()} and then the @code{CHARSET}
environment variable to get the locale. If it fails, @sc{ascii} is used.

You can set the default local encoding using the @code{local_encoding}
command in @file{.wgetrc}. That setting may be overridden from the
command line.

@cindex remote encoding
@item --remote-encoding=@var{encoding}

Force Wget to use @var{encoding} as the default remote server encoding.
That affects how Wget converts URIs found in files from remote encoding
to @sc{utf-8} during a recursive fetch. This option is only useful for
IRI support, for the interpretation of non-@sc{ascii} characters.

For HTTP, remote encoding can be found in HTTP @code{Content-Type}
header and in HTML @code{Content-Type http-equiv} meta tag.

You can set the default encoding using the @code{remoteencoding}
command in @file{.wgetrc}. That setting may be overridden from the
command line.
@end table

@node Directory Options, HTTP Options, Download Options, Invoking
@section Directory Options

@table @samp
@item -nd
@itemx --no-directories
Do not create a hierarchy of directories when retrieving recursively.
With this option turned on, all files will get saved to the current
directory, without clobbering (if a name shows up more than once, the
filenames will get extensions @samp{.n}).

@item -x
@itemx --force-directories
The opposite of @samp{-nd}---create a hierarchy of directories, even if
one would not have been created otherwise. E.g. @samp{wget -x
http://fly.srk.fer.hr/robots.txt} will save the downloaded file to
@file{fly.srk.fer.hr/robots.txt}.

@item -nH
@itemx --no-host-directories
Disable generation of host-prefixed directories. By default, invoking
Wget with @samp{-r http://fly.srk.fer.hr/} will create a structure of
directories beginning with @file{fly.srk.fer.hr/}. This option disables
such behavior.

@item --protocol-directories
Use the protocol name as a directory component of local file names. For
example, with this option, @samp{wget -r http://@var{host}} will save to
@samp{http/@var{host}/...} rather than just to @samp{@var{host}/...}.

1098

1099

@cindex cut directories

1100

@item --cut-dirs=@var{number}

1101

Ignore @var{number} directory components. This is useful for getting a

1102

fine-grained control over the directory where recursive retrieval will

1103

be saved.

1104

1105

Take, for example, the directory at

1106

@samp{ftp://ftp.xemacs.org/pub/xemacs/}. If you retrieve it with

1107

@samp{-r}, it will be saved locally under

1108

@file{ftp.xemacs.org/pub/xemacs/}. While the @samp{-nH} option can

1109

remove the @file{ftp.xemacs.org/} part, you are still stuck with

1110

@file{pub/xemacs}. This is where @samp{--cut-dirs} comes in handy; it

1111

makes Wget not ``see'' @var{number} remote directory components. Here

1112

are several examples of how the @samp{--cut-dirs} option works.

1113

1114

@example
@group
No options        -> ftp.xemacs.org/pub/xemacs/
-nH               -> pub/xemacs/
-nH --cut-dirs=1  -> xemacs/
-nH --cut-dirs=2  -> .

--cut-dirs=1      -> ftp.xemacs.org/xemacs/
...
@end group
@end example

1125

1126

If you just want to get rid of the directory structure, this option is
similar to a combination of @samp{-nd} and @samp{-P}. However, unlike
@samp{-nd}, @samp{--cut-dirs} does not discard subdirectories---for
instance, with @samp{-nH --cut-dirs=1}, a @file{beta/} subdirectory will
be placed in @file{xemacs/beta}, as one would expect.

1131

1132

@cindex directory prefix

1133

@item -P @var{prefix}

1134

@itemx --directory-prefix=@var{prefix}

1135

Set directory prefix to @var{prefix}. The @dfn{directory prefix} is the

1136

directory where all other files and subdirectories will be saved to,

1137

i.e. the top of the retrieval tree. The default is @samp{.} (the

1138

current directory).
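
As a sketch of how @samp{-nH}, @samp{--cut-dirs} and @samp{-P} can be
combined, the following command should place the contents of the remote
@file{pub/xemacs/} directory directly under a local directory. The name
@file{xemacs-mirror} is a hypothetical choice made for this example.

@example
wget -r -nH --cut-dirs=2 -P xemacs-mirror ftp://ftp.xemacs.org/pub/xemacs/
@end example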

1139

@end table

1140

1141

@node HTTP Options, HTTPS (SSL/TLS) Options, Directory Options, Invoking

1142

@section HTTP Options

1143

1144

@table @samp

1145

@cindex default page name

1146

@cindex index.html

1147

@item --default-page=@var{name}

1148

Use @var{name} as the default file name when it isn't known (i.e., for

1149

URLs that end in a slash), instead of @file{index.html}.

1150

1151

@cindex .html extension

1152

@cindex .css extension

1153

@item -E

1154

@itemx --adjust-extension

1155

If a file of type @samp{application/xhtml+xml} or @samp{text/html} is

1156

downloaded and the URL does not end with the regexp

1157

@samp{\.[Hh][Tt][Mm][Ll]?}, this option will cause the suffix @samp{.html}

1158

to be appended to the local filename. This is useful, for instance, when

1159

you're mirroring a remote site that uses @samp{.asp} pages, but you want

1160

the mirrored pages to be viewable on your stock Apache server. Another

1161

good use for this is when you're downloading CGI-generated materials. A URL

1162

like @samp{http://site.com/article.cgi?25} will be saved as

1163

@file{article.cgi?25.html}.

1164

1165

Note that filenames changed in this way will be re-downloaded every time

1166

you re-mirror a site, because Wget can't tell that the local

1167

@file{@var{X}.html} file corresponds to remote URL @samp{@var{X}} (since

1168

it doesn't yet know that the URL produces output of type

1169

@samp{text/html} or @samp{application/xhtml+xml}). To prevent this

1170

re-downloading, you must use @samp{-k} and @samp{-K} so that the original

1171

version of the file will be saved as @file{@var{X}.orig} (@pxref{Recursive

1172

Retrieval Options}).
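
A minimal sketch of such an invocation, assuming a recursive retrieval
of the hypothetical site used above:

@example
wget -r -E -k -K http://site.com/
@end example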

1173

1174

As of version 1.12, Wget will also ensure that any downloaded files of

1175

type @samp{text/css} end in the suffix @samp{.css}, and the option was

1176

renamed from @samp{--html-extension}, to better reflect its new

1177

behavior. The old option name is still acceptable, but should now be

1178

considered deprecated.

1179

1180

At some point in the future, this option may well be expanded to

1181

include suffixes for other types of content, including content types

1182

that are not parsed by Wget.

1183

1184

@cindex http user

1185

@cindex http password

1186

@cindex authentication

1187

@item --http-user=@var{user}

1188

@itemx --http-password=@var{password}

1189

Specify the username @var{user} and password @var{password} on an

1190

@sc{http} server. According to the type of the challenge, Wget will

1191

encode them using either the @code{basic} (insecure),

1192

the @code{digest}, or the Windows @code{NTLM} authentication scheme.

1193

1194

Another way to specify username and password is in the @sc{url} itself

1195

(@pxref{URL Format}). Either method reveals your password to anyone who

1196

bothers to run @code{ps}. To prevent the passwords from being seen,

1197

store them in @file{.wgetrc} or @file{.netrc}, and make sure to protect

1198

those files from other users with @code{chmod}. If the passwords are

1199

really important, do not leave them lying in those files either---edit

1200

the files and delete them after Wget has started the download.
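
One possible way to keep credentials out of the command line is
sketched below. The file name @file{wgetrc.private} and the credentials
are hypothetical; the sketch assumes a POSIX shell.

@example
# @r{Hypothetical file name and credentials, for illustration only.}
umask 077
printf '%s\n' 'http_user = alice' 'http_password = secret' > wgetrc.private
chmod 600 wgetrc.private
@end example

Wget can then be pointed at this file through the @code{WGETRC}
environment variable, for instance
@samp{WGETRC=wgetrc.private wget http://server.com/}.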

1201

1202

@iftex

1203

For more information about security issues with Wget, @xref{Security

1204

Considerations}.

1205

@end iftex

1206

1207

@cindex Keep-Alive, turning off

1208

@cindex Persistent Connections, disabling

1209

@item --no-http-keep-alive

1210

Turn off the ``keep-alive'' feature for HTTP downloads. Normally, Wget

1211

asks the server to keep the connection open so that, when you download

1212

more than one document from the same server, they get transferred over

1213

the same TCP connection. This saves time and at the same time reduces

1214

the load on the server.

1215

1216

This option is useful when, for some reason, persistent (keep-alive)

1217

connections don't work for you, for example due to a server bug or due

1218

to the inability of server-side scripts to cope with the connections.

1219

1220

@cindex proxy

1221

@cindex cache

1222

@item --no-cache

1223

Disable server-side cache. In this case, Wget will send the remote

1224

server an appropriate directive (@samp{Pragma: no-cache}) to get the

1225

file from the remote service, rather than returning the cached version.

1226

This is especially useful for retrieving and flushing out-of-date

1227

documents on proxy servers.

1228

1229

Caching is allowed by default.

1230

1231

@cindex cookies

1232

@item --no-cookies

1233

Disable the use of cookies. Cookies are a mechanism for maintaining

1234

server-side state. The server sends the client a cookie using the

1235

@code{Set-Cookie} header, and the client responds with the same cookie

1236

upon further requests. Since cookies allow the server owners to keep

1237

track of visitors and for sites to exchange this information, some

1238

consider them a breach of privacy. The default is to use cookies;

1239

however, @emph{storing} cookies is not on by default.

1240

1241

@cindex loading cookies

1242

@cindex cookies, loading

1243

@item --load-cookies @var{file}

1244

Load cookies from @var{file} before the first HTTP retrieval.

1245

@var{file} is a textual file in the format originally used by Netscape's

1246

@file{cookies.txt} file.

1247

1248

You will typically use this option when mirroring sites that require

1249

that you be logged in to access some or all of their content. The login

1250

process typically works by the web server issuing an @sc{http} cookie

1251

upon receiving and verifying your credentials. The cookie is then

1252

resent by the browser when accessing that part of the site, and so

1253

proves your identity.

1254

1255

Mirroring such a site requires Wget to send the same cookies your

1256

browser sends when communicating with the site. This is achieved by

1257

@samp{--load-cookies}---simply point Wget to the location of the

1258

@file{cookies.txt} file, and it will send the same cookies your browser

1259

would send in the same situation. Different browsers keep textual

1260

cookie files in different locations:

1261

1262

@table @asis

1263

@item Netscape 4.x.

1264

The cookies are in @file{~/.netscape/cookies.txt}.

1265

1266

@item Mozilla and Netscape 6.x.

1267

Mozilla's cookie file is also named @file{cookies.txt}, located

1268

somewhere under @file{~/.mozilla}, in the directory of your profile.

1269

The full path usually ends up looking somewhat like

1270

@file{~/.mozilla/default/@var{some-weird-string}/cookies.txt}.

1271

1272

@item Internet Explorer.

1273

You can produce a cookie file Wget can use by using the File menu,

1274

Import and Export, Export Cookies. This has been tested with Internet

1275

Explorer 5; it is not guaranteed to work with earlier versions.

1276

1277

@item Other browsers.

1278

If you are using a different browser to create your cookies,

1279

@samp{--load-cookies} will only work if you can locate or produce a

1280

cookie file in the Netscape format that Wget expects.

1281

@end table
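
For reference, a data line in a Netscape-format cookie file is generally
understood to hold seven tab-separated fields: domain, subdomain flag,
path, secure flag, expiry time in seconds since the epoch, name, and
value. The following sketch writes such a file by hand; the domain and
the cookie name and value are hypothetical.

@example
# @r{Hypothetical cookie for example.com; the fields are tab-separated.}
printf '# Netscape HTTP Cookie File\n' > cookies.txt
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    example.com FALSE / FALSE 2147483647 session_id abc123 >> cookies.txt
@end example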

1282

1283

If you cannot use @samp{--load-cookies}, there might still be an

1284

alternative. If your browser supports a ``cookie manager'', you can use

1285

it to view the cookies used when accessing the site you're mirroring.

1286

Write down the name and value of the cookie, and manually instruct Wget

1287

to send those cookies, bypassing the ``official'' cookie support:

1288

1289

@example
wget --no-cookies --header "Cookie: @var{name}=@var{value}"
@end example

1292

1293

@cindex saving cookies

1294

@cindex cookies, saving

1295

@item --save-cookies @var{file}

1296

Save cookies to @var{file} before exiting. This will not save cookies

1297

that have expired or that have no expiry time (so-called ``session

1298

cookies''), but also see @samp{--keep-session-cookies}.

1299

1300

@cindex cookies, session

1301

@cindex session cookies

1302

@item --keep-session-cookies

1303

When specified, causes @samp{--save-cookies} to also save session

1304

cookies. Session cookies are normally not saved because they are

1305

meant to be kept in memory and forgotten when you exit the browser.

1306

Saving them is useful on sites that require you to log in or to visit

1307

the home page before you can access some pages. With this option,

1308

multiple Wget runs are considered a single browser session as far as

1309

the site is concerned.

1310

1311

Since the cookie file format does not normally carry session cookies,

1312

Wget marks them with an expiry timestamp of 0. Wget's

1313

@samp{--load-cookies} recognizes those as session cookies, but it might

1314

confuse other browsers. Also note that cookies so loaded will be

1315

treated as other session cookies, which means that if you want

1316

@samp{--save-cookies} to preserve them again, you must use

1317

@samp{--keep-session-cookies} again.

1318

1319

@cindex Content-Length, ignore

1320

@cindex ignore length

1321

@item --ignore-length

1322

Unfortunately, some @sc{http} servers (@sc{cgi} programs, to be more

1323

precise) send out bogus @code{Content-Length} headers, which makes Wget

1324

go wild, as it thinks not all the document was retrieved. You can spot

1325

this syndrome if Wget retries getting the same document again and again,

1326

each time claiming that the (otherwise normal) connection has closed on

1327

the very same byte.

1328

1329

With this option, Wget will ignore the @code{Content-Length} header---as

1330

if it never existed.

1331

1332

@cindex header, add

1333

@item --header=@var{header-line}

1334

Send @var{header-line} along with the rest of the headers in each

1335

@sc{http} request. The supplied header is sent as-is, which means it

1336

must contain name and value separated by colon, and must not contain

1337

newlines.

1338

1339

You may define more than one additional header by specifying

1340

@samp{--header} more than once.

1341

1342

@example
@group
wget --header='Accept-Charset: iso-8859-2' \
     --header='Accept-Language: hr'        \
       http://fly.srk.fer.hr/
@end group
@end example

1349

1350

Specification of an empty string as the header value will clear all

1351

previous user-defined headers.
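
As an illustration, and assuming the options are processed from left to
right, the following command should send only the
@code{Accept-Charset} header, because the empty @samp{--header} discards
the one given before it:

@example
@group
wget --header='Accept-Language: hr' \
     --header='' \
     --header='Accept-Charset: iso-8859-2' \
     http://fly.srk.fer.hr/
@end group
@end example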

1352

1353

As of Wget 1.10, this option can be used to override headers otherwise

1354

generated automatically. This example instructs Wget to connect to

1355

localhost, but to specify @samp{foo.bar} in the @code{Host} header:

1356

1357

@example
wget --header="Host: foo.bar" http://localhost/
@end example

1360

1361

In versions of Wget prior to 1.10 such use of @samp{--header} caused

1362

sending of duplicate headers.

1363

1364

@cindex redirect

1365

@item --max-redirect=@var{number}

1366

Specifies the maximum number of redirections to follow for a resource.

1367

The default is 20, which is usually far more than necessary. However, on

1368

those occasions where you want to allow more (or fewer), this is the

1369

option to use.

1370

1371

@cindex proxy user

1372

@cindex proxy password

1373

@cindex proxy authentication

1374

@item --proxy-user=@var{user}

1375

@itemx --proxy-password=@var{password}

1376

Specify the username @var{user} and password @var{password} for

1377

authentication on a proxy server. Wget will encode them using the

1378

@code{basic} authentication scheme.

1379

1380

Security considerations similar to those with @samp{--http-password}

1381

pertain here as well.

1382

1383

@cindex http referer

1384

@cindex referer, http

1385

@item --referer=@var{url}

1386

Include a @samp{Referer: @var{url}} header in the HTTP request. Useful for

1387

retrieving documents with server-side processing that assume they are

1388

always being retrieved by interactive web browsers and only come out

1389

properly when Referer is set to one of the pages that point to them.

1390

1391

@cindex server response, save

1392

@item --save-headers

1393

Save the headers sent by the @sc{http} server to the file, preceding the

1394

actual contents, with an empty line as the separator.

1395

1396

@cindex user-agent

1397

@item -U @var{agent-string}

1398

@itemx --user-agent=@var{agent-string}

1399

Identify as @var{agent-string} to the @sc{http} server.

1400

1401

The @sc{http} protocol allows the clients to identify themselves using a

1402

@code{User-Agent} header field. This enables distinguishing the

1403

@sc{www} software, usually for statistical purposes or for tracing of

1404

protocol violations. Wget normally identifies as

1405

@samp{Wget/@var{version}}, @var{version} being the current version

1406

number of Wget.

1407

1408

However, some sites have been known to impose the policy of tailoring

1409

the output according to the @code{User-Agent}-supplied information.

1410

While this is not such a bad idea in theory, it has been abused by

1411

servers denying information to clients other than (historically)

1412

Netscape or, more frequently, Microsoft Internet Explorer. This

1413

option allows you to change the @code{User-Agent} line issued by Wget.

1414

Use of this option is discouraged, unless you really know what you are

1415

doing.

1416

1417

Specifying an empty user agent with @samp{--user-agent=""} instructs Wget

1418

not to send the @code{User-Agent} header in @sc{http} requests.

1419

1420

@cindex POST

1421

@item --post-data=@var{string}

1422

@itemx --post-file=@var{file}

1423

Use POST as the method for all HTTP requests and send the specified

1424

data in the request body. @samp{--post-data} sends @var{string} as

1425

data, whereas @samp{--post-file} sends the contents of @var{file}.

1426

Other than that, they work in exactly the same way. In particular,

1427

they @emph{both} expect content of the form @code{key1=value1&key2=value2},

1428

with percent-encoding for special characters; the only difference is

1429

that one expects its content as a command-line parameter and the other

1430

accepts its content from a file. In particular, @samp{--post-file} is

1431

@emph{not} for transmitting files as form attachments: those must

1432

appear as @code{key=value} data (with appropriate percent-encoding) just

1433

like everything else. Wget does not currently support

1434

@code{multipart/form-data} for transmitting POST data; only

1435

@code{application/x-www-form-urlencoded}. Only one of

1436

@samp{--post-data} and @samp{--post-file} should be specified.

1437

1438

Please be aware that Wget needs to know the size of the POST data in

1439

advance. Therefore the argument to @samp{--post-file} must be a regular

1440

file; specifying a FIFO or something like @file{/dev/stdin} won't work.

1441

It's not quite clear how to work around this limitation inherent in

1442

HTTP/1.0. Although HTTP/1.1 introduces @dfn{chunked} transfer that

1443

doesn't require knowing the request length in advance, a client can't

1444

use chunked unless it knows it's talking to an HTTP/1.1 server. And it

1445

can't know that until it receives a response, which in turn requires the

1446

request to have been completed -- a chicken-and-egg problem.
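
Because @samp{--post-file} needs a regular file, data produced by
another program can be written to a temporary file first. The following
is a minimal sketch: it assumes the @code{mktemp} utility is available,
and the form fields and URL are hypothetical.

@example
@group
tmp=$(mktemp)
printf '%s' 'user=foo&comment=a%20b' > "$tmp"
wget --post-file="$tmp" http://server.com/form.php
rm -f "$tmp"
@end group
@end example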

1447

1448

Note: if Wget is redirected after the POST request is completed, it

1449

will not send the POST data to the redirected URL. This is because

1450

URLs that process POST often respond with a redirection to a regular

1451

page, which does not desire or accept POST. It is not completely

1452

clear that this behavior is optimal; if it doesn't work out, it might

1453

be changed in the future.

1454

1455

This example shows how to log in to a server using POST and then proceed to

1456

download the desired pages, presumably only accessible to authorized

1457

users:

1458

1459

@example
@group
# @r{Log in to the server. This needs to be done only once.}
wget --save-cookies cookies.txt \
     --post-data 'user=foo&password=bar' \
     http://server.com/auth.php

# @r{Now grab the page or pages we care about.}
wget --load-cookies cookies.txt \
     -p http://server.com/interesting/article.php
@end group
@end example

1471

1472

If the server is using session cookies to track user authentication,

1473

the above will not work because @samp{--save-cookies} will not save

1474

them (and neither will browsers) and the @file{cookies.txt} file will

1475

be empty. In that case use @samp{--keep-session-cookies} along with

1476

@samp{--save-cookies} to force saving of session cookies.
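
A sketch of the same login sequence with session cookies preserved,
using the same hypothetical server and credentials as above:

@example
@group
# @r{Log in and keep session cookies in the cookie file.}
wget --save-cookies cookies.txt --keep-session-cookies \
     --post-data 'user=foo&password=bar' \
     http://server.com/auth.php

# @r{Reuse the saved cookies, including the session cookies.}
wget --load-cookies cookies.txt \
     -p http://server.com/interesting/article.php
@end group
@end example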

1477

1478

@cindex Content-Disposition

1479

@item --content-disposition

1480

1481

If this is set to on, experimental (not fully-functional) support for

1482

@code{Content-Disposition} headers is enabled. This can currently result in

1483

extra round-trips to the server for a @code{HEAD} request, and is known

1484

to suffer from a few bugs, which is why it is not currently enabled by default.

1485

1486

This option is useful for some file-downloading CGI programs that use

1487

@code{Content-Disposition} headers to describe what the name of a

1488

downloaded file should be.

1489

1490

@cindex Trust server names

1491

@item --trust-server-names

1492

1493

If this is set to on, on a redirect the last component of the
redirection URL will be used as the local file name. By default, the
last component of the original URL is used.

1496

1497

@cindex authentication

1498

@item --auth-no-challenge

1499

1500

If this option is given, Wget will send Basic HTTP authentication

1501

information (plaintext username and password) for all requests, just

1502

like Wget 1.10.2 and prior did by default.

1503

1504

Use of this option is not recommended, and is intended only to support
a few obscure servers that never send HTTP authentication
challenges but accept unsolicited auth info, say, in addition to
form-based authentication.

1508

1509

@end table

1510

1511

@node HTTPS (SSL/TLS) Options, FTP Options, HTTP Options, Invoking

1512

@section HTTPS (SSL/TLS) Options

1513

1514

@cindex SSL

1515

To support encrypted HTTP (HTTPS) downloads, Wget must be compiled

1516

with an external SSL library, currently OpenSSL. If Wget is compiled

1517

without SSL support, none of these options are available.

1518

1519

@table @samp

1520

@cindex SSL protocol, choose

1521

@item --secure-protocol=@var{protocol}

1522

Choose the secure protocol to be used. Legal values are @samp{auto},

1523

@samp{SSLv2}, @samp{SSLv3}, and @samp{TLSv1}. If @samp{auto} is used,

1524

the SSL library is given the liberty of choosing the appropriate

1525

protocol automatically, which is achieved by sending an SSLv2 greeting

1526

and announcing support for SSLv3 and TLSv1. This is the default.

1527

1528

Specifying @samp{SSLv2}, @samp{SSLv3}, or @samp{TLSv1} forces the use

1529

of the corresponding protocol. This is useful when talking to old and

1530

buggy SSL server implementations that make it hard for OpenSSL to

1531

choose the correct protocol version. Fortunately, such servers are

1532

quite rare.

1533

1534

@cindex SSL certificate, check

1535

@item --no-check-certificate

1536

Don't check the server certificate against the available certificate

1537

authorities. Also don't require the URL host name to match the common

1538

name presented by the certificate.

1539

1540

As of Wget 1.10, the default is to verify the server's certificate

1541

against the recognized certificate authorities, breaking the SSL

1542

handshake and aborting the download if the verification fails.

1543

Although this provides more secure downloads, it does break

1544

interoperability with some sites that worked with previous Wget

1545

versions, particularly those using self-signed, expired, or otherwise

1546

invalid certificates. This option forces an ``insecure'' mode of

1547

operation that turns the certificate verification errors into warnings

1548

and allows you to proceed.

1549

1550

If you encounter ``certificate verification'' errors or ones saying

1551

that ``common name doesn't match requested host name'', you can use

1552

this option to bypass the verification and proceed with the download.

1553

@emph{Only use this option if you are otherwise convinced of the

1554

site's authenticity, or if you really don't care about the validity of

1555

its certificate.} It is almost always a bad idea not to check the

1556

certificates when transmitting confidential or important data.

1557

1558

@cindex SSL certificate

1559

@item --certificate=@var{file}

1560

Use the client certificate stored in @var{file}. This is needed for

1561

servers that are configured to require certificates from the clients

1562

that connect to them. Normally a certificate is not required and this

1563

switch is optional.

1564

1565

@cindex SSL certificate type, specify

1566

@item --certificate-type=@var{type}

1567

Specify the type of the client certificate. Legal values are

1568

@samp{PEM} (assumed by default) and @samp{DER}, also known as

1569

@samp{ASN1}.

1570

1571

@item --private-key=@var{file}

1572

Read the private key from @var{file}. This allows you to provide the

1573

private key in a file separate from the certificate.

1574

1575

@item --private-key-type=@var{type}

1576

Specify the type of the private key. Accepted values are @samp{PEM}

1577

(the default) and @samp{DER}.
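
As a sketch, a client certificate and a separately stored private key,
both assumed to be in the default PEM format, might be supplied like
this. The file names and the host are hypothetical.

@example
@group
wget --certificate=client.pem \
     --private-key=client.key \
     https://server.com/
@end group
@end example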

1578

1579

@item --ca-certificate=@var{file}

1580

Use @var{file} as the file with the bundle of certificate authorities

1581

(``CA'') to verify the peers. The certificates must be in PEM format.

1582

1583

Without this option Wget looks for CA certificates at the

1584

system-specified locations, chosen at OpenSSL installation time.

1585

1586

@cindex SSL certificate authority

1587

@item --ca-directory=@var{directory}

1588

Specifies the directory containing CA certificates in PEM format. Each

1589

file contains one CA certificate, and the file name is based on a hash

1590

value derived from the certificate. This is achieved by processing a

1591

certificate directory with the @code{c_rehash} utility supplied with

1592

OpenSSL. Using @samp{--ca-directory} is more efficient than

1593

@samp{--ca-certificate} when many certificates are installed because

1594

it allows Wget to fetch certificates on demand.

1595

1596

Without this option Wget looks for CA certificates at the

1597

system-specified locations, chosen at OpenSSL installation time.
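
The steps above might look roughly as follows. The directory name
@file{certs} and the file @file{my-ca.pem} are hypothetical, and the
sketch assumes that @code{c_rehash} is installed and on the search path.

@example
@group
# @r{Hypothetical paths; my-ca.pem is a PEM-format CA certificate.}
mkdir -p certs
cp my-ca.pem certs/
c_rehash certs
wget --ca-directory=certs https://server.com/
@end group
@end example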

1598

1599

@cindex entropy, specifying source of

1600

@cindex randomness, specifying source of

1601

@item --random-file=@var{file}

1602

Use @var{file} as the source of random data for seeding the

1603

pseudo-random number generator on systems without @file{/dev/random}.

1604

1605

On such systems the SSL library needs an external source of randomness

1606

to initialize. Randomness may be provided by EGD (see

1607

@samp{--egd-file} below) or read from an external source specified by

1608

the user. If this option is not specified, Wget looks for random data

1609

in @code{$RANDFILE} or, if that is unset, in @file{$HOME/.rnd}. If

1610

none of those are available, it is likely that SSL encryption will not

1611

be usable.

1612

1613

If you're getting the ``Could not seed OpenSSL PRNG; disabling SSL.''

1614

error, you should provide random data using some of the methods

1615

described above.

1616

1617

@cindex EGD

1618

@item --egd-file=@var{file}

1619

Use @var{file} as the EGD socket. EGD stands for @dfn{Entropy

1620

Gathering Daemon}, a user-space program that collects data from

1621

various unpredictable system sources and makes it available to other

1622

programs that might need it. Encryption software, such as the SSL

1623

library, needs sources of non-repeating randomness to seed the random

1624

number generator used to produce cryptographically strong keys.

1625

1626

OpenSSL allows the user to specify their own source of entropy using the
@code{RANDFILE} environment variable. If this variable is unset, or
if the specified file does not produce enough randomness, OpenSSL will
read random data from the EGD socket specified using this option.

1630

1631

If this option is not specified (and the equivalent startup command is

1632

not used), EGD is never contacted. EGD is not needed on modern Unix

1633

systems that support @file{/dev/random}.

1634

@end table

1635

1636

@node FTP Options, Recursive Retrieval Options, HTTPS (SSL/TLS) Options, Invoking

1637

@section FTP Options

1638

1639

@table @samp

1640

@cindex ftp user

1641

@cindex ftp password

1642

@cindex ftp authentication

1643

@item --ftp-user=@var{user}

1644

@itemx --ftp-password=@var{password}

1645

Specify the username @var{user} and password @var{password} on an

1646

@sc{ftp} server. Without this, or the corresponding startup option,

1647

the password defaults to @samp{-wget@@}, normally used for anonymous

1648

FTP.

1649

1650

Another way to specify username and password is in the @sc{url} itself

1651

(@pxref{URL Format}). Either method reveals your password to anyone who

1652

bothers to run @code{ps}. To prevent the passwords from being seen,

1653

store them in @file{.wgetrc} or @file{.netrc}, and make sure to protect

1654

those files from other users with @code{chmod}. If the passwords are

1655

really important, do not leave them lying in those files either---edit

1656

the files and delete them after Wget has started the download.

1657

1658

@iftex

1659

For more information about security issues with Wget, @xref{Security

1660

Considerations}.

1661

@end iftex

1662

1663

@cindex .listing files, removing

1664

@item --no-remove-listing

1665

Don't remove the temporary @file{.listing} files generated by @sc{ftp}

1666

retrievals. Normally, these files contain the raw directory listings

1667

received from @sc{ftp} servers. Not removing them can be useful for

1668

debugging purposes, or when you want to be able to easily check on the

1669

contents of remote server directories (e.g. to verify that a mirror

1670

you're running is complete).

1671

1672

Note that even though Wget writes to a known filename for this file,

1673

this is not a security hole in the scenario of a user making

1674

@file{.listing} a symbolic link to @file{/etc/passwd} or something and

1675

asking @code{root} to run Wget in his or her directory. Depending on

1676

the options used, either Wget will refuse to write to @file{.listing},

1677

making the globbing/recursion/time-stamping operation fail, or the

1678

symbolic link will be deleted and replaced with the actual

1679

@file{.listing} file, or the listing will be written to a

1680

@file{.listing.@var{number}} file.

1681

1682

Even though this situation isn't a problem, @code{root} should
never run Wget in a non-trusted user's directory. A user could do
something as simple as linking @file{index.html} to @file{/etc/passwd}
and asking @code{root} to run Wget with @samp{-N} or @samp{-r} so the file
will be overwritten.

1687

1688

@cindex globbing, toggle

1689

@item --no-glob

1690

Turn off @sc{ftp} globbing. Globbing refers to the use of shell-like

1691

special characters (@dfn{wildcards}), like @samp{*}, @samp{?}, @samp{[}

1692

and @samp{]} to retrieve more than one file from the same directory at

1693

once, like:

1694

1695

@example
wget ftp://gnjilux.srk.fer.hr/*.msg
@end example

1698

1699

By default, globbing will be turned on if the @sc{url} contains a

1700

globbing character. This option may be used to turn globbing on or off

1701

permanently.

1702

1703

You may have to quote the @sc{url} to protect it from being expanded by

1704

your shell. Globbing makes Wget look for a directory listing, which is

1705

system-specific. This is why it currently works only with Unix @sc{ftp}

1706

servers (and the ones emulating Unix @code{ls} output).
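
With most shells, single quotes keep the wildcard from being expanded
locally, so that it reaches Wget unchanged. A sketch reusing the URL
from the globbing example above:

@example
wget 'ftp://gnjilux.srk.fer.hr/*.msg'
@end example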

1707

1708

@cindex passive ftp

1709

@item --no-passive-ftp

1710

Disable the use of the @dfn{passive} FTP transfer mode. Passive FTP

1711

mandates that the client connect to the server to establish the data

1712

connection rather than the other way around.

1713

1714

If the machine is connected to the Internet directly, both passive and

1715

active FTP should work equally well. Behind most firewall and NAT

1716

configurations passive FTP has a better chance of working. However,

1717

in some rare firewall configurations, active FTP actually works when

1718

passive FTP doesn't. If you suspect this to be the case, use this

1719

option, or set @code{passive_ftp=off} in your init file.

@cindex symbolic links, retrieving
@item --retr-symlinks
Usually, when retrieving @sc{ftp} directories recursively and a symbolic
link is encountered, the linked-to file is not downloaded. Instead, a
matching symbolic link is created on the local filesystem. The
pointed-to file will not be downloaded unless this recursive retrieval
would have encountered it separately and downloaded it anyway.

When @samp{--retr-symlinks} is specified, however, symbolic links are
traversed and the pointed-to files are retrieved. At this time, this
option does not cause Wget to traverse symlinks to directories and
recurse through them, but in the future it should be enhanced to do
this.

Note that when retrieving a file (not a directory) because it was
specified on the command-line, rather than because it was recursed to,
this option has no effect. Symbolic links are always traversed in this
case.
@end table

@node Recursive Retrieval Options, Recursive Accept/Reject Options, FTP Options, Invoking
@section Recursive Retrieval Options

@table @samp
@item -r
@itemx --recursive
Turn on recursive retrieving. @xref{Recursive Download}, for more
details.

@item -l @var{depth}
@itemx --level=@var{depth}
Specify recursion maximum depth level @var{depth} (@pxref{Recursive
Download}). The default maximum depth is 5.

@cindex proxy filling
@cindex delete after retrieval
@cindex filling proxy cache
@item --delete-after
This option tells Wget to delete every single file it downloads,
@emph{after} having done so. It is useful for pre-fetching popular
pages through a proxy, e.g.:

@example
wget -r -nd --delete-after http://whatever.com/~popular/page/
@end example

The @samp{-r} option is to retrieve recursively, and @samp{-nd} to not
create directories.

Note that @samp{--delete-after} deletes files on the local machine. It
does not issue the @samp{DELE} command to remote FTP sites, for
instance. Also note that when @samp{--delete-after} is specified,
@samp{--convert-links} is ignored, so @samp{.orig} files are simply not
created in the first place.

@cindex conversion of links
@cindex link conversion
@item -k
@itemx --convert-links
After the download is complete, convert the links in the document to
make them suitable for local viewing. This affects not only the visible
hyperlinks, but any part of the document that links to external content,
such as embedded images, links to style sheets, hyperlinks to non-@sc{html}
content, etc.

Each link will be changed in one of two ways:

@itemize @bullet
@item
The links to files that have been downloaded by Wget will be changed to
refer to the file they point to as a relative link.

Example: if the downloaded file @file{/foo/doc.html} links to
@file{/bar/img.gif}, also downloaded, then the link in @file{doc.html}
will be modified to point to @samp{../bar/img.gif}. This kind of
transformation works reliably for arbitrary combinations of directories.

@item
The links to files that have not been downloaded by Wget will be changed
to include host name and absolute path of the location they point to.

Example: if the downloaded file @file{/foo/doc.html} links to
@file{/bar/img.gif} (or to @file{../bar/img.gif}), then the link in
@file{doc.html} will be modified to point to
@file{http://@var{hostname}/bar/img.gif}.
@end itemize

Because of this, local browsing works reliably: if a linked file was
downloaded, the link will refer to its local name; if it was not
downloaded, the link will refer to its full Internet address rather than
presenting a broken link. The fact that the former links are converted
to relative links ensures that you can move the downloaded hierarchy to
another directory.

Note that only at the end of the download can Wget know which links have
been downloaded. Because of that, the work done by @samp{-k} will be
performed at the end of all the downloads.

@cindex backing up converted files
@item -K
@itemx --backup-converted
When converting a file, back up the original version with a @samp{.orig}
suffix. Affects the behavior of @samp{-N} (@pxref{HTTP Time-Stamping
Internals}).

@item -m
@itemx --mirror
Turn on options suitable for mirroring. This option turns on recursion
and time-stamping, sets infinite recursion depth and keeps @sc{ftp}
directory listings. It is currently equivalent to
@samp{-r -N -l inf --no-remove-listing}.

@cindex page requisites
@cindex required images, downloading
@item -p
@itemx --page-requisites
This option causes Wget to download all the files that are necessary to
properly display a given @sc{html} page. This includes such things as
inlined images, sounds, and referenced stylesheets.

Ordinarily, when downloading a single @sc{html} page, any requisite documents
that may be needed to display it properly are not downloaded. Using
@samp{-r} together with @samp{-l} can help, but since Wget does not
ordinarily distinguish between external and inlined documents, one is
generally left with ``leaf documents'' that are missing their
requisites.

For instance, say document @file{1.html} contains an @code{<IMG>} tag
referencing @file{1.gif} and an @code{<A>} tag pointing to external
document @file{2.html}. Say that @file{2.html} is similar but that its
image is @file{2.gif} and it links to @file{3.html}. Say this
continues up to some arbitrarily high number.

If one executes the command:

@example
wget -r -l 2 http://@var{site}/1.html
@end example

then @file{1.html}, @file{1.gif}, @file{2.html}, @file{2.gif}, and
@file{3.html} will be downloaded. As you can see, @file{3.html} is
without its requisite @file{3.gif} because Wget is simply counting the
number of hops (up to 2) away from @file{1.html} in order to determine
where to stop the recursion. However, with this command:

@example
wget -r -l 2 -p http://@var{site}/1.html
@end example

all the above files @emph{and} @file{3.html}'s requisite @file{3.gif}
will be downloaded. Similarly,

@example
wget -r -l 1 -p http://@var{site}/1.html
@end example

will cause @file{1.html}, @file{1.gif}, @file{2.html}, and @file{2.gif}
to be downloaded. One might think that:

@example
wget -r -l 0 -p http://@var{site}/1.html
@end example

would download just @file{1.html} and @file{1.gif}, but unfortunately
this is not the case, because @samp{-l 0} is equivalent to
@samp{-l inf}---that is, infinite recursion. To download a single @sc{html}
page (or a handful of them, all specified on the command-line or in a
@samp{-i} @sc{url} input file) and its (or their) requisites, simply leave off
@samp{-r} and @samp{-l}:

@example
wget -p http://@var{site}/1.html
@end example

Note that Wget will behave as if @samp{-r} had been specified, but only
that single page and its requisites will be downloaded. Links from that
page to external documents will not be followed. Actually, to download
a single page and all its requisites (even if they exist on separate
websites), and make sure the lot displays properly locally, this author
likes to use a few options in addition to @samp{-p}:

@example
wget -E -H -k -K -p http://@var{site}/@var{document}
@end example

To finish off this topic, it's worth knowing that Wget's idea of an
external document link is any URL specified in an @code{<A>} tag, an
@code{<AREA>} tag, or a @code{<LINK>} tag other than @code{<LINK
REL="stylesheet">}.

@cindex @sc{html} comments
@cindex comments, @sc{html}
@item --strict-comments
Turn on strict parsing of @sc{html} comments. The default is to terminate
comments at the first occurrence of @samp{-->}.

According to specifications, @sc{html} comments are expressed as @sc{sgml}
@dfn{declarations}. A declaration is special markup that begins with
@samp{<!} and ends with @samp{>}, such as @samp{<!DOCTYPE ...>}, that
may contain comments between a pair of @samp{--} delimiters. @sc{html}
comments are ``empty declarations'', @sc{sgml} declarations without any
non-comment text. Therefore, @samp{<!--foo-->} is a valid comment, and
so is @samp{<!--one-- --two-->}, but @samp{<!--1--2-->} is not.

On the other hand, most @sc{html} writers don't perceive comments as anything
other than text delimited with @samp{<!--} and @samp{-->}, which is not
quite the same. For example, something like @samp{<!------------>}
works as a valid comment as long as the number of dashes is a multiple
of four (!). If not, the comment technically lasts until the next
@samp{--}, which may be at the other end of the document. Because of
this, many popular browsers completely ignore the specification and
implement what users have come to expect: comments delimited with
@samp{<!--} and @samp{-->}.

Until version 1.9, Wget interpreted comments strictly, which resulted in
missing links in many web pages that displayed fine in browsers, but had
the misfortune of containing non-compliant comments. Beginning with
version 1.9, Wget has joined the ranks of clients that implement
``naive'' comments, terminating each comment at the first occurrence of
@samp{-->}.

If, for whatever reason, you want strict comment parsing, use this
option to turn it on.
@end table

@node Recursive Accept/Reject Options, Exit Status, Recursive Retrieval Options, Invoking
@section Recursive Accept/Reject Options

@table @samp
@item -A @var{acclist} --accept @var{acclist}
@itemx -R @var{rejlist} --reject @var{rejlist}
Specify comma-separated lists of file name suffixes or patterns to
accept or reject (@pxref{Types of Files}). Note that if
any of the wildcard characters, @samp{*}, @samp{?}, @samp{[} or
@samp{]}, appear in an element of @var{acclist} or @var{rejlist},
it will be treated as a pattern, rather than a suffix.

@item -D @var{domain-list}
@itemx --domains=@var{domain-list}
Set domains to be followed. @var{domain-list} is a comma-separated list
of domains. Note that it does @emph{not} turn on @samp{-H}.

@item --exclude-domains @var{domain-list}
Specify the domains that are @emph{not} to be followed
(@pxref{Spanning Hosts}).

@cindex follow FTP links
@item --follow-ftp
Follow @sc{ftp} links from @sc{html} documents. Without this option,
Wget will ignore all the @sc{ftp} links.

@cindex tag-based recursive pruning
@item --follow-tags=@var{list}
Wget has an internal table of @sc{html} tag / attribute pairs that it
considers when looking for linked documents during a recursive
retrieval. If a user wants only a subset of those tags to be
considered, however, he or she should specify such tags in a
comma-separated @var{list} with this option.

@item --ignore-tags=@var{list}
This is the opposite of the @samp{--follow-tags} option. To skip
certain @sc{html} tags when recursively looking for documents to download,
specify them in a comma-separated @var{list}.

In the past, this option was the best bet for downloading a single page
and its requisites, using a command-line like:

@example
wget --ignore-tags=a,area -H -k -K -r http://@var{site}/@var{document}
@end example

However, the author of this option came across a page with tags like
@code{<LINK REL="home" HREF="/">} and came to the realization that
specifying tags to ignore was not enough. One can't just tell Wget to
ignore @code{<LINK>}, because then stylesheets will not be downloaded.
Now the best bet for downloading a single page and its requisites is the
dedicated @samp{--page-requisites} option.

@cindex case fold
@cindex ignore case
@item --ignore-case
Ignore case when matching files and directories. This influences the
behavior of the @samp{-R}, @samp{-A}, @samp{-I}, and @samp{-X} options,
as well as globbing implemented when downloading from FTP sites. For
example, with this option, @samp{-A *.txt} will match @samp{file1.txt},
but also @samp{file2.TXT}, @samp{file3.TxT}, and so on.

@item -H
@itemx --span-hosts
Enable spanning across hosts when doing recursive retrieving
(@pxref{Spanning Hosts}).

@item -L
@itemx --relative
Follow relative links only. Useful for retrieving a specific home page
without any distractions, not even those from the same hosts
(@pxref{Relative Links}).

@item -I @var{list}
@itemx --include-directories=@var{list}
Specify a comma-separated list of directories you wish to follow when
downloading (@pxref{Directory-Based Limits}). Elements
of @var{list} may contain wildcards.

@item -X @var{list}
@itemx --exclude-directories=@var{list}
Specify a comma-separated list of directories you wish to exclude from
download (@pxref{Directory-Based Limits}). Elements of
@var{list} may contain wildcards.

@item -np
@itemx --no-parent
Do not ever ascend to the parent directory when retrieving recursively.
This is a useful option, since it guarantees that only the files
@emph{below} a certain hierarchy will be downloaded.
@xref{Directory-Based Limits}, for more details.
@end table

@c man end

@node Exit Status, , Recursive Accept/Reject Options, Invoking
@section Exit Status

@c man begin EXITSTATUS

Wget may return one of several error codes if it encounters problems.

@table @asis
@item 0
No problems occurred.

@item 1
Generic error code.

@item 2
Parse error---for instance, when parsing command-line options, the
@samp{.wgetrc} or @samp{.netrc}...

@item 3
File I/O error.

@item 4
Network failure.

@item 5
SSL verification failure.

@item 6
Username/password authentication failure.

@item 7
Protocol errors.

@item 8
Server issued an error response.
@end table

With the exceptions of 0 and 1, the lower-numbered exit codes take
precedence over higher-numbered ones, when multiple types of errors
are encountered.
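
A script can branch on these codes once Wget returns. The following is
only a sketch: the messages are paraphrased from the table above, and
@code{status} is set by hand so that the fragment runs without network
access. In a real script it would be assigned from @samp{$?} immediately
after the Wget command.

@example
# Assumption: status stands in for "$?" captured right after wget.
status=4
case "$status" in
  0) msg="no problems occurred" ;;
  1) msg="generic error" ;;
  2) msg="parse error" ;;
  3) msg="file I/O error" ;;
  4) msg="network failure" ;;
  5) msg="SSL verification failure" ;;
  6) msg="authentication failure" ;;
  7) msg="protocol error" ;;
  8) msg="server issued an error response" ;;
  *) msg="unrecognized exit status" ;;
esac
echo "wget: $msg"
@end example

The catch-all branch is there because a script should not assume that
the list of codes is closed.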

In versions of Wget prior to 1.12, Wget's exit status tended to be
unhelpful and inconsistent. Recursive downloads would virtually always
return 0 (success), regardless of any issues encountered, and
non-recursive fetches only returned the status corresponding to the
most recently-attempted download.

@c man end

@node Recursive Download, Following Links, Invoking, Top
@chapter Recursive Download
@cindex recursion
@cindex retrieving
@cindex recursive download

GNU Wget is capable of traversing parts of the Web (or a single
@sc{http} or @sc{ftp} server), following links and directory structure.
We refer to this as @dfn{recursive retrieval}, or @dfn{recursion}.

With @sc{http} @sc{url}s, Wget retrieves and parses the @sc{html} or
@sc{css} from the given @sc{url}, retrieving the files the document
refers to, through markup like @code{href} or @code{src}, or @sc{css}
@sc{uri} values specified using the @samp{url()} functional notation.
If the freshly downloaded file is also of type @code{text/html},
@code{application/xhtml+xml}, or @code{text/css}, it will be parsed
and followed further.

Recursive retrieval of @sc{http} and @sc{html}/@sc{css} content is
@dfn{breadth-first}. This means that Wget first downloads the requested
document, then the documents linked from that document, then the
documents linked by them, and so on. In other words, Wget first
downloads the documents at depth 1, then those at depth 2, and so on
until the specified maximum depth.

The maximum @dfn{depth} to which the retrieval may descend is specified
with the @samp{-l} option. The default maximum depth is five layers.

When retrieving an @sc{ftp} @sc{url} recursively, Wget will retrieve all
the data from the given directory tree (including the subdirectories up
to the specified depth) on the remote server, creating its mirror image
locally. @sc{ftp} retrieval is also limited by the @code{depth}
parameter. Unlike @sc{http} recursion, @sc{ftp} recursion is performed
depth-first.

By default, Wget will create a local directory tree, corresponding to
the one found on the remote server.

Recursive retrieving can find a number of applications, the most
important of which is mirroring. It is also useful for @sc{www}
presentations, and any other opportunities where slow network
connections should be bypassed by storing the files locally.

You should be warned that recursive downloads can overload the remote
servers. Because of that, many administrators frown upon them and may
ban access from your site if they detect very fast downloads of big
amounts of content. When downloading from Internet servers, consider
using the @samp{-w} option to introduce a delay between accesses to the
server. The download will take a while longer, but the server
administrator will not be alarmed by your rudeness.

Of course, recursive download may cause problems on your machine. If
left to run unchecked, it can easily fill up the disk. If downloading
from a local network, it can also take bandwidth on the system, as well as
consume memory and CPU.

Try to specify the criteria that match the kind of download you are
trying to achieve. If you want to download only one page, use
@samp{--page-requisites} without any additional recursion. If you want
to download things under one directory, use @samp{-np} to avoid
downloading things from other directories. If you want to download all
the files from one directory, use @samp{-l 1} to make sure the recursion
depth never exceeds one. @xref{Following Links}, for more information
about this.
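
As an illustration of how these criteria can be combined, the following
invocation fetches the files of one directory while pausing between
requests. The host and directory are placeholders, and the two-second
wait is an arbitrary value chosen for the example:

@example
wget -r -l 1 -np -w 2 http://@var{site}/@var{directory}/
@end example

Here @samp{-l 1} keeps the recursion to a single level, @samp{-np}
prevents ascent to the parent directory, and @samp{-w 2} spaces out the
accesses to the server.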

Recursive retrieval should be used with care. Don't say you were not
warned.

@node Following Links, Time-Stamping, Recursive Download, Top
@chapter Following Links
@cindex links
@cindex following links

When retrieving recursively, one does not wish to retrieve loads of
unnecessary data. Most of the time the users bear in mind exactly what
they want to download, and want Wget to follow only specific links.

For example, if you wish to download the music archive from
@samp{fly.srk.fer.hr}, you will not want to download all the home pages
that happen to be referenced by an obscure part of the archive.

Wget possesses several mechanisms that allow you to fine-tune which
links it will follow.

@menu
* Spanning Hosts::              (Un)limiting retrieval based on host name.
* Types of Files::              Getting only certain files.
* Directory-Based Limits::      Getting only certain directories.
* Relative Links::              Follow relative links only.
* FTP Links::                   Following FTP links.
@end menu

@node Spanning Hosts, Types of Files, Following Links, Following Links
@section Spanning Hosts
@cindex spanning hosts
@cindex hosts, spanning

Wget's recursive retrieval normally refuses to visit hosts different
from the one you specified on the command line. This is a reasonable
default; without it, every retrieval would have the potential to turn
your Wget into a small version of Google.

However, visiting different hosts, or @dfn{host spanning}, is sometimes
a useful option. Maybe the images are served from a different server.
Maybe you're mirroring a site that consists of pages interlinked between
three servers. Maybe the server has two equivalent names, and the @sc{html}
pages refer to both interchangeably.

@table @asis
@item Span to any host---@samp{-H}

The @samp{-H} option turns on host spanning, thus allowing Wget's
recursive run to visit any host referenced by a link. Unless sufficient
recursion-limiting criteria are applied, these foreign hosts will
typically link to yet more hosts, and so on until Wget ends up sucking
up much more data than you have intended.

@item Limit spanning to certain domains---@samp{-D}

The @samp{-D} option allows you to specify the domains that will be
followed, thus limiting the recursion only to the hosts that belong to
these domains. Obviously, this makes sense only in conjunction with
@samp{-H}. A typical example would be downloading the contents of
@samp{www.server.com}, but allowing downloads from
@samp{images.server.com}, etc.:

@example
wget -rH -Dserver.com http://www.server.com/
@end example

You can specify more than one address by separating them with a comma,
e.g. @samp{-Ddomain1.com,domain2.com}.

@item Keep download off certain domains---@samp{--exclude-domains}

If there are domains you want to exclude specifically, you can do it
with @samp{--exclude-domains}, which accepts the same type of arguments
as @samp{-D}, but will @emph{exclude} all the listed domains. For
example, if you want to download all the hosts from the @samp{foo.edu}
domain, with the exception of @samp{sunsite.foo.edu}, you can do it like
this:

@example
wget -rH -Dfoo.edu --exclude-domains sunsite.foo.edu \
    http://www.foo.edu/
@end example

@end table

@node Types of Files, Directory-Based Limits, Spanning Hosts, Following Links
@section Types of Files
@cindex types of files

When downloading material from the web, you will often want to restrict
the retrieval to only certain file types. For example, if you are
interested in downloading @sc{gif}s, you will not be overjoyed to get
loads of PostScript documents, and vice versa.

Wget offers two options to deal with this problem. Each option
description lists a short name, a long name, and the equivalent command
in @file{.wgetrc}.

@cindex accept wildcards
@cindex accept suffixes
@cindex wildcards, accept
@cindex suffixes, accept
@table @samp
@item -A @var{acclist}
@itemx --accept @var{acclist}
@itemx accept = @var{acclist}
The argument to the @samp{--accept} option is a list of file suffixes or
patterns that Wget will download during recursive retrieval. A suffix
is the ending part of a file name, and consists of ``normal'' letters,
e.g. @samp{gif} or @samp{.jpg}. A matching pattern contains shell-like
wildcards, e.g. @samp{books*} or @samp{zelazny*196[0-9]*}.

So, specifying @samp{wget -A gif,jpg} will make Wget download only the
files ending with @samp{gif} or @samp{jpg}, i.e. @sc{gif}s and
@sc{jpeg}s. On the other hand, @samp{wget -A "zelazny*196[0-9]*"} will
download only files beginning with @samp{zelazny} and containing numbers
from 1960 to 1969 anywhere within. Look up the manual of your shell for
a description of how pattern matching works.

Of course, any number of suffixes and patterns can be combined into a
comma-separated list, and given as an argument to @samp{-A}.

@cindex reject wildcards
@cindex reject suffixes
@cindex wildcards, reject
@cindex suffixes, reject
@item -R @var{rejlist}
@itemx --reject @var{rejlist}
@itemx reject = @var{rejlist}
The @samp{--reject} option works the same way as @samp{--accept}, only
its logic is the reverse; Wget will download all files @emph{except} the
ones matching the suffixes (or patterns) in the list.

So, if you want to download a whole page except for the cumbersome
@sc{mpeg}s and @sc{.au} files, you can use @samp{wget -R mpg,mpeg,au}.
Analogously, to download all files except the ones beginning with
@samp{bjork}, use @samp{wget -R "bjork*"}. The quotes are to prevent
expansion by the shell.
@end table

@noindent
The @samp{-A} and @samp{-R} options may be combined to achieve even
better fine-tuning of which files to retrieve. E.g. @samp{wget -A
"*zelazny*" -R .ps} will download all the files having @samp{zelazny} as
a part of their name, but @emph{not} the PostScript files.
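
The difference between a suffix and a pattern can be imitated with the
shell's own @code{case} matching. This sketch only mimics the rule
described above and is not taken from Wget; the file names are made up
for illustration, and the list being imitated is roughly
@samp{-A "gif,jpg,zelazny*196[0-9]*"}.

@example
# Hypothetical file names; accepted collects the ones that the
# accept list gif,jpg,zelazny*196[0-9]* would be expected to keep.
accepted=""
for name in zelazny-1965.txt zelazny-1972.txt notes.gif; do
  case "$name" in
    zelazny*196[0-9]*) accepted="$accepted $name" ;;  # pattern
    *gif|*jpg)         accepted="$accepted $name" ;;  # suffixes
  esac
done
echo "accepted:$accepted"
@end example

A suffix entry behaves like a pattern with an implicit leading
@samp{*}, which is why @samp{gif} is written as @samp{*gif} in the
sketch.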

Note that these two options do not affect the downloading of @sc{html}
files (as determined by a @samp{.htm} or @samp{.html} filename
suffix). This behavior may not be desirable for all users, and may be
changed for future versions of Wget.

Note, too, that query strings (strings at the end of a URL beginning
with a question mark (@samp{?})) are not included as part of the
filename for accept/reject rules, even though these will actually
contribute to the name chosen for the local file. It is expected that
a future version of Wget will provide an option to allow matching
against query strings.

Finally, it's worth noting that the accept/reject lists are matched
@emph{twice} against downloaded files: once against the URL's filename
portion, to determine if the file should be downloaded in the first
place; then, after it has been accepted and successfully downloaded,
the local file's name is also checked against the accept/reject lists
to see if it should be removed. The rationale was that, since
@samp{.htm} and @samp{.html} files are always downloaded regardless of
accept/reject rules, they should be removed @emph{after} being
downloaded and scanned for links, if they did match the accept/reject
lists. However, this can lead to unexpected results, since the local
filenames can differ from the original URL filenames in the following
ways, all of which can change whether an accept/reject rule matches:

@itemize @bullet
@item
If the local file already exists and @samp{--no-directories} was
specified, a numeric suffix will be appended to the original name.
@item
If @samp{--adjust-extension} was specified, the local filename might have
@samp{.html} appended to it. If Wget is invoked with @samp{-E -A.php},
a filename such as @samp{index.php} will be accepted, but upon
download will be named @samp{index.php.html}, which no longer matches,
and so the file will be deleted.
@item
Query strings do not contribute to URL matching, but are included in
local filenames, and so @emph{do} contribute to filename matching.
@end itemize

@noindent
This behavior, too, is considered less-than-desirable, and may change
in a future version of Wget.
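
The @samp{-E -A.php} case from the list above can be traced by hand.
This sketch imitates the two checks with shell @code{case} matching;
the variable names are illustrative and the code is not taken from
Wget.

@example
url_name=index.php
local_name=$url_name.html    # name after -E appends .html
# First check: the file name portion of the URL.
case "$url_name" in
  *.php) first="accepted" ;;
  *)     first="rejected" ;;
esac
# Second check: the name of the file as saved locally.
case "$local_name" in
  *.php) second="kept" ;;
  *)     second="deleted" ;;
esac
echo "URL check: $first; local file check: $second"
@end example

The same suffix gives opposite answers in the two checks, which is the
surprising outcome described above.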

@node Directory-Based Limits, Relative Links, Types of Files, Following Links
@section Directory-Based Limits
@cindex directories
@cindex directory limits

Regardless of other link-following facilities, it is often useful to
restrict which files are retrieved based on the directories
those files are placed in. There can be many reasons for this---the
home pages may be organized in a reasonable directory structure; or some
directories may contain useless information, e.g. @file{/cgi-bin} or
@file{/dev} directories.

Wget offers three different options to deal with this requirement. Each
option description lists a short name, a long name, and the equivalent
command in @file{.wgetrc}.

@cindex directories, include
@cindex include directories
@cindex accept directories
@table @samp
@item -I @var{list}
@itemx --include @var{list}
@itemx include_directories = @var{list}
The @samp{-I} option accepts a comma-separated list of directories included
in the retrieval. Any other directories will simply be ignored. The
directories are absolute paths.

So, if you wish to download from @samp{http://host/people/bozo/}
following only links to bozo's colleagues in the @file{/people}
directory and the bogus scripts in @file{/cgi-bin}, you can specify:

@example
wget -I /people,/cgi-bin http://host/people/bozo/
@end example

@cindex directories, exclude
@cindex exclude directories
@cindex reject directories
@item -X @var{list}
@itemx --exclude @var{list}
@itemx exclude_directories = @var{list}
The @samp{-X} option is exactly the reverse of @samp{-I}---this is a list of
directories @emph{excluded} from the download. E.g. if you do not want
Wget to download things from the @file{/cgi-bin} directory, specify @samp{-X
/cgi-bin} on the command line.

The same as with @samp{-A}/@samp{-R}, these two options can be combined
to get a better fine-tuning of downloading subdirectories. E.g. if you
want to load all the files from the @file{/pub} hierarchy except for
@file{/pub/worthless}, specify @samp{-I/pub -X/pub/worthless}.

2396

2397

@cindex no parent

2398

@item -np

2399

@itemx --no-parent

2400

@itemx no_parent = on

2401

The simplest, and often very useful way of limiting directories is

2402

disallowing retrieval of the links that refer to the hierarchy

2403

@dfn{above} than the beginning directory, i.e. disallowing ascent to the

2404

parent directory/directories.

2405

2406

The @samp{--no-parent} option (short @samp{-np}) is useful in this case.

2407

Using it guarantees that you will never leave the existing hierarchy.

2408

Supposing you issue Wget with:

2409

2410

@example

2411

wget -r --no-parent http://somehost/~luzer/my-archive/

2412

@end example

2413

2414

You may rest assured that none of the references to

2415

@file{/~his-girls-homepage/} or @file{/~luzer/all-my-mpegs/} will be

2416

followed. Only the archive you are interested in will be downloaded.

2417

Essentially, @samp{--no-parent} is similar to

2418

@samp{-I/~luzer/my-archive}, only it handles redirections in a more

2419

intelligent fashion.

2420

2421

@strong{Note} that, for HTTP (and HTTPS), the trailing slash is very

2422

important to @samp{--no-parent}. HTTP has no concept of a ``directory''---Wget

2423

relies on you to indicate what's a directory and what isn't. In

2424

@samp{http://foo/bar/}, Wget will consider @samp{bar} to be a

2425

directory, while in @samp{http://foo/bar} (no trailing slash),

2426

@samp{bar} will be considered a filename (so @samp{--no-parent} would be

2427

meaningless, as its parent is @samp{/}).

2428

@end table
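
The directory rules above can be sketched in a few lines of Python.
This is only an illustration under simplifying assumptions (plain
prefix matching, no wildcards), not Wget's actual implementation; the
function names @code{under}, @code{directory_allowed} and
@code{start_directory} are hypothetical.

```python
from posixpath import normpath


def under(directory, prefix):
    """True if `directory` is `prefix` itself or lies somewhere below it."""
    directory = normpath(directory)
    prefix = normpath(prefix)
    return directory == prefix or directory.startswith(prefix.rstrip("/") + "/")


def directory_allowed(directory, include=(), exclude=()):
    """Apply an -I style list and an -X style list to an absolute directory.

    Assumption of this sketch: a non-empty include list must match,
    and any match in the exclude list rejects the directory.
    """
    if include and not any(under(directory, p) for p in include):
        return False
    if any(under(directory, p) for p in exclude):
        return False
    return True


def start_directory(url_path):
    """Directory that a --no-parent style check would treat as the top.

    Everything up to the last slash counts as the directory, so without
    a trailing slash the final component is taken to be a file name.
    """
    return url_path[: url_path.rfind("/") + 1] or "/"
```

With these definitions, @code{directory_allowed("/pub/worthless/x",
include=["/pub"], exclude=["/pub/worthless"])} is false, and
@code{start_directory("/bar")} yields @samp{/} while
@code{start_directory("/bar/")} yields @samp{/bar/}, which mirrors the
trailing-slash note above.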

2429

2430

@node Relative Links, FTP Links, Directory-Based Limits, Following Links

2431

@section Relative Links

2432

@cindex relative links

2433

2434

When @samp{-L} is turned on, only the relative links are ever followed.
Relative links are here defined as those that do not refer to the web
server root. For example, these links are relative:

2437

2438

@example
<a href="foo.gif">
<a href="foo/bar.gif">
<a href="../foo/bar.gif">
@end example

2443

2444

These links are not relative:

2445

2446

@example
<a href="/foo.gif">
<a href="/foo/bar.gif">
<a href="http://www.server.com/foo/bar.gif">
@end example

2451

2452

Using this option guarantees that recursive retrieval will not span

2453

hosts, even without @samp{-H}. In simple cases it also allows downloads

2454

to ``just work'' without having to convert links.
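
The definition of a relative link given in this section can be
expressed as a small check. The following Python sketch is an
illustration only; the helper name @code{is_relative_link} is
hypothetical, and the test is a plain reading of the definition (no
scheme, no host, and a path that does not start at the server root),
not Wget's own code.

```python
from urllib.parse import urlsplit


def is_relative_link(href):
    """True if `href` does not refer to the web server root.

    A link counts as relative here when it names no scheme and no host
    and its path does not begin with a slash.
    """
    parts = urlsplit(href)
    return (
        not parts.scheme
        and not parts.netloc
        and not parts.path.startswith("/")
    )
```

Under this reading, @samp{../foo/bar.gif} is relative, while
@samp{/foo.gif} and any link carrying a host name are not.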

2455

2456

This option is probably not very useful and might be removed in a future

2457

release.

2458

2459

@node FTP Links, , Relative Links, Following Links

2460

@section Following FTP Links

2461

@cindex following ftp links

2462

2463

The rules for @sc{ftp} are somewhat specific, as it is necessary for

2464

them to be. @sc{ftp} links in @sc{html} documents are often included

2465

for purposes of reference, and it is often inconvenient to download them

2466

by default.

2467

2468

To have @sc{ftp} links followed from @sc{html} documents, you need to
specify the @samp{--follow-ftp} option. Having done that, @sc{ftp}
links will span hosts regardless of the @samp{-H} setting. This is logical,
as @sc{ftp} links rarely point to the same host where the @sc{http}
server resides. For similar reasons, the @samp{-L} option has no
effect on such downloads. On the other hand, domain acceptance
(@samp{-D}) and suffix rules (@samp{-A} and @samp{-R}) apply normally.

2475

2476

Also note that followed links to @sc{ftp} directories will not be

2477

retrieved recursively further.

2478

2479

@node Time-Stamping, Startup File, Following Links, Top

2480

@chapter Time-Stamping

2481

@cindex time-stamping

2482

@cindex timestamping

2483

@cindex updating the archives

2484

@cindex incremental updating

2485

2486

One of the most important aspects of mirroring information from the

2487

Internet is updating your archives.

2488

2489

Downloading the whole archive again and again, just to replace a few

2490

changed files is expensive, both in terms of wasted bandwidth and money,

2491

and the time to do the update. This is why all the mirroring tools

2492

offer the option of incremental updating.

2493

2494

Such an updating mechanism means that the remote server is scanned in

2495

search of @dfn{new} files. Only those new files will be downloaded in

2496

the place of the old ones.

2497

2498

A file is considered new if one of these two conditions is met:

2499

2500

@enumerate

2501

@item

2502

A file of that name does not already exist locally.

2503

2504

@item

2505

A file of that name does exist, but the remote file was modified more

2506

recently than the local file.

2507

@end enumerate

2508

2509

To implement this, the program needs to be aware of the time of last

2510

modification of both local and remote files. We call this information the

2511

@dfn{time-stamp} of a file.

2512

2513

Time-stamping in GNU Wget is turned on using the @samp{--timestamping}
(@samp{-N}) option, or through the @code{timestamping = on} directive in
@file{.wgetrc}. With this option, for each file it intends to download,
Wget will check whether a local file of the same name exists. If it
does, and the remote file is not newer, Wget will not download it.

2518

2519

If the local file does not exist, or the sizes of the files do not

2520

match, Wget will download the remote file no matter what the time-stamps

2521

say.
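
The decision just described can be summed up in a short Python
sketch. It is a restatement of the rules in this chapter under the
assumption that both time-stamps and sizes are already known; the
names @code{FileInfo} and @code{should_download} are hypothetical and
do not come from Wget.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    mtime: float  # time of last modification, seconds since the epoch
    size: int     # length in bytes


def should_download(local: Optional[FileInfo], remote: FileInfo) -> bool:
    """Decide whether the remote file should be fetched."""
    if local is None:
        # No local file of that name exists.
        return True
    if local.size != remote.size:
        # Sizes differ: download no matter what the time-stamps say.
        return True
    # Same size: download only if the remote file is strictly newer.
    return remote.mtime > local.mtime
```

For instance, a local copy with the same size and the same time-stamp
as the remote file is left alone, while a size mismatch always forces
a download.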

2522

2523

@menu

2524

* Time-Stamping Usage::

2525

* HTTP Time-Stamping Internals::

2526

* FTP Time-Stamping Internals::

2527

@end menu

2528

2529

@node Time-Stamping Usage, HTTP Time-Stamping Internals, Time-Stamping, Time-Stamping

2530

@section Time-Stamping Usage

2531

@cindex time-stamping usage

2532

@cindex usage, time-stamping

2533

2534

The usage of time-stamping is simple. Say you would like to download a

2535

file so that it keeps its date of modification.

2536

2537

@example

2538

wget -S http://www.gnu.ai.mit.edu/

2539

@end example

2540

2541

A simple @code{ls -l} shows that the time stamp on the local file equals

2542

the state of the @code{Last-Modified} header, as returned by the server.

2543

As you can see, the time-stamping info is preserved locally, even

2544

without @samp{-N} (at least for @sc{http}).

2545

2546

Several days later, you would like Wget to check if the remote file has

2547

changed, and download it if it has.

2548

2549

@example

2550

wget -N http://www.gnu.ai.mit.edu/

2551

@end example

2552

2553

Wget will ask the server for the last-modified date. If the local file

2554

has the same timestamp as the server, or a newer one, the remote file

2555

will not be re-fetched. However, if the remote file is more recent,

2556

Wget will proceed to fetch it.

2557

2558

The same goes for @sc{ftp}. For example:

2559

2560

@example

2561

wget "ftp://ftp.ifi.uio.no/pub/emacs/gnus/*"

2562

@end example

2563

2564

(The quotes around that URL are to prevent the shell from trying to

2565

interpret the @samp{*}.)

2566

2567

After download, a local directory listing will show that the timestamps

2568

match those on the remote server. Reissuing the command with @samp{-N}

2569

will make Wget re-fetch @emph{only} the files that have been modified

2570

since the last download.

2571

2572

If you wished to mirror the GNU archive every week, you would use a

2573

command like the following, weekly:

2574

2575

@example

2576

wget --timestamping -r ftp://ftp.gnu.org/pub/gnu/

2577

@end example

2578

2579

Note that time-stamping will only work for files for which the server

2580

gives a timestamp. For @sc{http}, this depends on getting a

2581

@code{Last-Modified} header. For @sc{ftp}, this depends on getting a

2582

directory listing with dates in a format that Wget can parse

2583

(@pxref{FTP Time-Stamping Internals}).

2584

2585

@node HTTP Time-Stamping Internals, FTP Time-Stamping Internals, Time-Stamping Usage, Time-Stamping

2586

@section HTTP Time-Stamping Internals

2587

@cindex http time-stamping

2588

2589

Time-stamping in @sc{http} is implemented by checking the
@code{Last-Modified} header. If you wish to retrieve the file
@file{foo.html} through @sc{http}, Wget will check whether
@file{foo.html} exists locally. If it doesn't, @file{foo.html} will be
retrieved unconditionally.

2594

2595

If the file does exist locally, Wget will first check its local

2596

time-stamp (similar to the way @code{ls -l} checks it), and then send a

2597

@code{HEAD} request to the remote server, demanding the information on

2598

the remote file.

2599

2600

The @code{Last-Modified} header is examined to find which file was

2601

modified more recently (which makes it ``newer''). If the remote file

2602

is newer, it will be downloaded; if it is older, Wget will give

2603

up.@footnote{As an additional check, Wget will look at the

2604

@code{Content-Length} header, and compare the sizes; if they are not the

2605

same, the remote file will be downloaded no matter what the time-stamp

2606

says.}
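
As an illustration of the comparison described above, the following
Python sketch turns a @code{Last-Modified} value into a time-stamp and
compares it with a local modification time. It relies on the standard
library's @code{email.utils.parsedate_to_datetime}; the function name
@code{remote_is_newer} is hypothetical, and the sketch deliberately
leaves out the @code{Content-Length} check from the footnote.

```python
from email.utils import parsedate_to_datetime


def remote_is_newer(last_modified, local_mtime):
    """Compare a Last-Modified header value with a local time-stamp.

    `last_modified` is the raw header value, for example
    "Thu, 01 Jan 1970 00:01:00 GMT"; `local_mtime` is in seconds since
    the epoch, as returned by os.stat(...).st_mtime.
    """
    remote_mtime = parsedate_to_datetime(last_modified).timestamp()
    return remote_mtime > local_mtime
```

A caller would typically obtain @code{local_mtime} from
@code{os.stat} on the local file and the header value from the
response to the @code{HEAD} request.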

2607

2608

When @samp{--backup-converted} (@samp{-K}) is specified in conjunction

2609

with @samp{-N}, server file @samp{@var{X}} is compared to local file

2610

@samp{@var{X}.orig}, if extant, rather than being compared to local file

2611

@samp{@var{X}}, which will always differ if it's been converted by

2612

@samp{--convert-links} (@samp{-k}).

2613

2614

Arguably, @sc{http} time-stamping should be implemented using the

2615

@code{If-Modified-Since} request.

2616

2617

@node FTP Time-Stamping Internals, , HTTP Time-Stamping Internals, Time-Stamping

2618

@section FTP Time-Stamping Internals

2619

@cindex ftp time-stamping

2620

2621

In theory, @sc{ftp} time-stamping works much the same as @sc{http}, only

2622

@sc{ftp} has no headers---time-stamps must be ferreted out of directory

2623

listings.

2624

2625

If an @sc{ftp} download is recursive or uses globbing, Wget will use the

2626

@sc{ftp} @code{LIST} command to get a file listing for the directory

2627

containing the desired file(s). It will try to analyze the listing,

2628

treating it like Unix @code{ls -l} output, extracting the time-stamps.

2629

The rest is exactly the same as for @sc{http}. Note that when

2630

retrieving individual files from an @sc{ftp} server without using

2631

globbing or recursion, listing files will not be downloaded (and thus

2632

files will not be time-stamped) unless @samp{-N} is specified.
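
To give an idea of what extracting time-stamps from a listing
involves, here is a minimal Python sketch that parses one line of Unix
@code{ls -l} style output. It is an assumption-laden illustration, not
Wget's parser: the regular expression, the name
@code{parse_listing_line} and the handling of entries that show a time
of day instead of a year (they are simply assigned the year passed in
by the caller) are all choices made for this example.

```python
import re
from datetime import datetime

# permissions, link count, owner, group, size, month, day, year-or-time, name
LISTING_RE = re.compile(
    r"^(?P<perms>[-dlbcps][-rwxsStT]{9})\s+\d+\s+\S+\s+\S+\s+(?P<size>\d+)\s+"
    r"(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<when>\d{4}|\d{1,2}:\d{2})\s+(?P<name>.+)$"
)
MONTHS = {
    name: number
    for number, name in enumerate(
        "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1
    )
}


def parse_listing_line(line, current_year):
    """Return (name, size, time-stamp) for one listing line, or None."""
    m = LISTING_RE.match(line)
    if m is None or m["month"] not in MONTHS:
        return None
    when = m["when"]
    if ":" in when:
        # Recent entries show "HH:MM" and omit the year.
        hour, minute = (int(part) for part in when.split(":"))
        year = current_year
    else:
        # Older entries show the year and omit the time of day.
        hour = minute = 0
        year = int(when)
    stamp = datetime(year, MONTHS[m["month"]], int(m["day"]), hour, minute)
    return m["name"], int(m["size"]), stamp
```

Note how coarse the result is: entries that show a year carry no time
of day at all, so a time-stamp recovered this way is only as precise
as the listing.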

2633

2634

The assumption that every directory listing is a Unix-style listing may
sound extremely constraining, but in practice it is not, as many
non-Unix @sc{ftp} servers use the Unixoid listing format because most
(all?) of the clients understand it. Bear in mind that @sc{rfc959}
defines no standard way to get a file list, let alone the time-stamps.
We can only hope that a future standard will define this.

2640

2641

Another non-standard solution is the @code{MDTM} command,
supported by some @sc{ftp} servers (including the popular
@code{wu-ftpd}), which returns the exact modification time of the
specified file. Wget may support this command in the future.

2645

2646

@node Startup File, Examples, Time-Stamping, Top

2647

@chapter Startup File

2648

@cindex startup file

2649

@cindex wgetrc

2650

@cindex .wgetrc

2651

@cindex startup

2652

@cindex .netrc

2653

2654

Once you know how to change default settings of Wget through command

2655

line arguments, you may wish to make some of those settings permanent.

2656

You can do that in a convenient way by creating the Wget startup

2657

file---@file{.wgetrc}.

2658

2659

Although @file{.wgetrc} is the ``main'' initialization file, it is
convenient to have a special facility for storing passwords. Thus Wget
reads and interprets the contents of @file{$HOME/.netrc}, if it finds
it. You can find the @file{.netrc} format in your system manuals.

2663

2664

Wget reads @file{.wgetrc} upon startup, recognizing a limited set of

2665

commands.

2666

2667

@menu

2668

* Wgetrc Location:: Location of various wgetrc files.

2669

* Wgetrc Syntax:: Syntax of wgetrc.

2670

* Wgetrc Commands:: List of available commands.

2671

* Sample Wgetrc:: A wgetrc example.

2672

@end menu

2673

2674

@node Wgetrc Location, Wgetrc Syntax, Startup File, Startup File

2675

@section Wgetrc Location

2676

@cindex wgetrc location

2677

@cindex location of wgetrc

2678

2679

When initializing, Wget will look for a @dfn{global} startup file,

2680

@file{/etc/wgetrc} by default and read commands from there, if it exists.

2681

2682

Then it will look for the user's file. If the environment variable
@code{WGETRC} is set, Wget will try to load that file. Failing that, no
further attempts will be made.

2685

2686

If @code{WGETRC} is not set, Wget will try to load @file{$HOME/.wgetrc}.

2687

2688

The fact that the user's settings are loaded after the system-wide ones
means that in case of collision the user's wgetrc @emph{overrides} the
system-wide wgetrc (in @file{/etc/wgetrc} by default).

2691

Fascist admins, away!
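
The lookup order described in this section can be sketched as
follows. This Python fragment is only a model of the text above; the
function name @code{startup_files} and its @code{exists} parameter
(which makes the sketch easy to try without touching the file system)
are hypothetical.

```python
import os


def startup_files(environ, exists=os.path.exists, global_rc="/etc/wgetrc"):
    """Return the startup files to read, in the order they are read."""
    files = []
    # The global startup file comes first, if it exists.
    if exists(global_rc):
        files.append(global_rc)
    # If WGETRC is set it is the only candidate for the user's file;
    # there is no fallback to $HOME/.wgetrc when it cannot be loaded.
    user_rc = environ.get("WGETRC")
    if user_rc is None and "HOME" in environ:
        user_rc = environ["HOME"].rstrip("/") + "/.wgetrc"
    if user_rc is not None and exists(user_rc):
        files.append(user_rc)
    return files
```

Because the user's file is read last, a command that appears in both
files ends up with the user's value.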

2692

2693

@node Wgetrc Syntax, Wgetrc Commands, Wgetrc Location, Startup File

2694

@section Wgetrc Syntax

2695

@cindex wgetrc syntax

2696

@cindex syntax of wgetrc

2697

2698

The syntax of a wgetrc command is simple:

2699

2700

@example

2701

variable = value

2702

@end example

2703

2704

The @dfn{variable} will also be called @dfn{command}. Valid

2705

@dfn{values} are different for different commands.

2706

2707

The commands are case-insensitive and underscore-insensitive. Thus

2708

@samp{DIr__PrefiX} is the same as @samp{dirprefix}. Empty lines, lines

2709

beginning with @samp{#} and lines containing white-space only are

2710

discarded.

2711

2712

Commands that expect a comma-separated list will clear the list on an

2713

empty command. So, if you wish to reset the rejection list specified in

2714

global @file{wgetrc}, you can do it with:

2715

2716

@example

2717

reject =

2718

@end example
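
The syntax rules of this section fit in a short parser. The Python
sketch below is an illustration rather than Wget's own code: the
function names, the small @code{LIST_COMMANDS} set and the choice to
append non-empty values to an existing list are assumptions made for
the example.

```python
# Commands assumed, for this sketch, to take a comma-separated list.
LIST_COMMANDS = {"accept", "reject", "domains", "excludedomains",
                 "includedirectories", "excludedirectories"}


def normalize(name):
    """Command names are case-insensitive and underscore-insensitive."""
    return name.replace("_", "").lower()


def parse_wgetrc(text):
    """Parse 'variable = value' lines into a dictionary."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Empty lines, white-space only lines and comments are discarded.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a 'variable = value' line: {raw!r}")
        key, value = normalize(name.strip()), value.strip()
        if key in LIST_COMMANDS:
            items = settings.setdefault(key, [])
            if value:
                items.extend(v.strip() for v in value.split(","))
            else:
                # An empty value clears the list built up so far.
                items.clear()
        else:
            settings[key] = value
    return settings
```

Feeding it a file that sets @samp{reject = gif,jpg} and later
@samp{reject =} leaves the rejection list empty, which is the reset
described above.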

2719

2720

@node Wgetrc Commands, Sample Wgetrc, Wgetrc Syntax, Startup File

2721

@section Wgetrc Commands

2722

@cindex wgetrc commands

2723

2724

The complete set of commands is listed below. Legal values are listed

2725

after the @samp{=}. Simple Boolean values can be set or unset using

2726

@samp{on} and @samp{off} or @samp{1} and @samp{0}.

2727

2728

Some commands take pseudo-arbitrary values. @var{address} values can be

2729

hostnames or dotted-quad IP addresses. @var{n} can be any positive

2730

integer, or @samp{inf} for infinity, where appropriate. @var{string}

2731

values can be any non-empty string.

2732

2733

Most of these commands have direct command-line equivalents. Also, any
wgetrc command can be specified on the command line using the
@samp{--execute} switch (@pxref{Basic Startup Options}).
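
The value types mentioned above can be modelled with two small
helpers. This Python sketch follows the wording of this section only
(@samp{on}/@samp{off} or @samp{1}/@samp{0} for Booleans, a positive
integer or @samp{inf} for @var{n}); the function names are
hypothetical, and Wget itself may be more lenient about what it
accepts.

```python
import math


def parse_boolean(value):
    """Interpret 'on'/'off' or '1'/'0' as a Boolean."""
    v = value.strip().lower()
    if v in ("on", "1"):
        return True
    if v in ("off", "0"):
        return False
    raise ValueError(f"expected on/off or 1/0, got {value!r}")


def parse_number(value):
    """Interpret a positive integer, or 'inf' for infinity."""
    v = value.strip().lower()
    if v == "inf":
        return math.inf
    n = int(v)
    if n <= 0:
        raise ValueError(f"expected a positive integer or inf, got {value!r}")
    return n
```

So @samp{recursive = on} would yield a true value, and
@samp{reclevel = inf} would yield an unbounded depth.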

2736

2737

@table @asis

2738

@item accept/reject = @var{string}

2739

Same as @samp{-A}/@samp{-R} (@pxref{Types of Files}).

2740

2741

@item add_hostdir = on/off

2742

Enable/disable host-prefixed file names. @samp{-nH} disables it.

2743

2744

@item ask_password = on/off

2745

Prompt for a password for each connection established. Cannot be specified

2746

when @samp{--password} is being used, because they are mutually

2747

exclusive. Equivalent to @samp{--ask-password}.

2748

2749

@item auth_no_challenge = on/off

2750

If this option is given, Wget will send Basic HTTP authentication

2751

information (plaintext username and password) for all requests. See

2752

@samp{--auth-no-challenge}.

2753

2754

@item background = on/off

2755

Enable/disable going to background---the same as @samp{-b} (which

2756

enables it).

2757

2758

@item backup_converted = on/off

2759

Enable/disable saving pre-converted files with the suffix

2760

@samp{.orig}---the same as @samp{-K} (which enables it).

2761

2762

@c @item backups = @var{number}

2763

@c #### Document me!

2764

2765

@item base = @var{string}

2766

Consider relative @sc{url}s in input files (specified via the

2767

@samp{input} command or the @samp{--input-file}/@samp{-i} option,

2768

together with @samp{force_html} or @samp{--force-html})

2769

as being relative to @var{string}---the same as @samp{--base=@var{string}}.

2770

2771

@item bind_address = @var{address}

2772

Bind to @var{address}, like the @samp{--bind-address=@var{address}}.

2773

2774

@item ca_certificate = @var{file}

2775

Set the certificate authority bundle file to @var{file}. The same

2776

as @samp{--ca-certificate=@var{file}}.

2777

2778

@item ca_directory = @var{directory}

2779

Set the directory used for certificate authorities. The same as

2780

@samp{--ca-directory=@var{directory}}.

2781

2782

@item cache = on/off

2783

When set to off, disallow server-caching. See the @samp{--no-cache}

2784

option.

2785

2786

@item certificate = @var{file}

2787

Set the client certificate file name to @var{file}. The same as

2788

@samp{--certificate=@var{file}}.

2789

2790

@item certificate_type = @var{string}

2791

Specify the type of the client certificate, legal values being

2792

@samp{PEM} (the default) and @samp{DER} (aka ASN1). The same as

2793

@samp{--certificate-type=@var{string}}.

2794

2795

@item check_certificate = on/off

2796

If this is set to off, the server certificate is not checked against
the specified certificate authorities. The default is ``on''. The same as
@samp{--check-certificate}.

2799

2800

@item connect_timeout = @var{n}

2801

Set the connect timeout---the same as @samp{--connect-timeout}.

2802

2803

@item content_disposition = on/off

2804

Turn on recognition of the (non-standard) @samp{Content-Disposition}

2805

HTTP header---if set to @samp{on}, the same as @samp{--content-disposition}.

2806

2807

@item trust_server_names = on/off

2808

If set to on, use the last component of a redirection URL for the local

2809

file name.

2810

2811

@item continue = on/off

2812

If set to on, force continuation of preexistent partially retrieved

2813

files. See @samp{-c} before setting it.

2814

2815

@item convert_links = on/off

2816

Convert non-relative links locally. The same as @samp{-k}.

2817

2818

@item cookies = on/off

2819

When set to off, disallow cookies. See the @samp{--no-cookies} option.

2820

2821

@item cut_dirs = @var{n}

2822

Ignore @var{n} remote directory components. Equivalent to

2823

@samp{--cut-dirs=@var{n}}.

2824

2825

@item debug = on/off

2826

Debug mode, same as @samp{-d}.

2827

2828

@item default_page = @var{string}

2829

Default page name---the same as @samp{--default-page=@var{string}}.

2830

2831

@item delete_after = on/off

2832

Delete after download---the same as @samp{--delete-after}.

2833

2834

@item dir_prefix = @var{string}

2835

Top of directory tree---the same as @samp{-P @var{string}}.

2836

2837

@item dirstruct = on/off

2838

Turning dirstruct on or off---the same as @samp{-x} or @samp{-nd},

2839

respectively.

2840

2841

@item dns_cache = on/off

2842

Turn DNS caching on/off. Since DNS caching is on by default, this

2843

option is normally used to turn it off and is equivalent to

2844

@samp{--no-dns-cache}.

2845

2846

@item dns_timeout = @var{n}

2847

Set the DNS timeout---the same as @samp{--dns-timeout}.

2848

2849

@item domains = @var{string}

2850

Same as @samp{-D} (@pxref{Spanning Hosts}).

2851

2852

@item dot_bytes = @var{n}

2853

Specify the number of bytes ``contained'' in a dot, as seen throughout

2854

the retrieval (1024 by default). You can postfix the value with

2855

@samp{k} or @samp{m}, representing kilobytes and megabytes,

2856

respectively. With dot settings you can tailor the dot retrieval to

2857

suit your needs, or you can use the predefined @dfn{styles}

2858

(@pxref{Download Options}).

2859

2860

@item dot_spacing = @var{n}

2861

Specify the number of dots in a single cluster (10 by default).

2862

2863

@item dots_in_line = @var{n}

2864

Specify the number of dots that will be printed in each line throughout

2865

the retrieval (50 by default).

2866

2867

@item egd_file = @var{file}

2868

Use @var{file} as the EGD socket file name. The same as
@samp{--egd-file=@var{file}}.

2870

2871

@item exclude_directories = @var{string}

2872

Specify a comma-separated list of directories you wish to exclude from

2873

download---the same as @samp{-X @var{string}} (@pxref{Directory-Based

2874

Limits}).

2875

2876

@item exclude_domains = @var{string}

2877

Same as @samp{--exclude-domains=@var{string}} (@pxref{Spanning

2878

Hosts}).

2879

2880

@item follow_ftp = on/off

2881

Follow @sc{ftp} links from @sc{html} documents---the same as

2882

@samp{--follow-ftp}.

2883

2884

@item follow_tags = @var{string}

2885

Only follow certain @sc{html} tags when doing a recursive retrieval,

2886

just like @samp{--follow-tags=@var{string}}.

2887

2888

@item force_html = on/off

2889

If set to on, force the input filename to be regarded as an @sc{html}

2890

document---the same as @samp{-F}.

2891

2892

@item ftp_password = @var{string}

2893

Set your @sc{ftp} password to @var{string}. Without this setting, the

2894

password defaults to @samp{-wget@@}, which is a useful default for

2895

anonymous @sc{ftp} access.

2896

2897

This command used to be named @code{passwd} prior to Wget 1.10.

2898

2899

@item ftp_proxy = @var{string}

2900

Use @var{string} as @sc{ftp} proxy, instead of the one specified in

2901

environment.

2902

2903

@item ftp_user = @var{string}

2904

Set @sc{ftp} user to @var{string}.

2905

2906

This command used to be named @code{login} prior to Wget 1.10.

2907

2908

@item glob = on/off

2909

Turn globbing on/off---the same as @samp{--glob} and @samp{--no-glob}.

2910

2911

@item header = @var{string}

2912

Define a header for HTTP downloads, like using

2913

@samp{--header=@var{string}}.

2914

2915

@item adjust_extension = on/off

2916

Add a @samp{.html} extension to @samp{text/html} or

2917

@samp{application/xhtml+xml} files that lack one, or a @samp{.css}

2918

extension to @samp{text/css} files that lack one, like

2919

@samp{-E}. Previously named @samp{html_extension} (still acceptable,

2920

but deprecated).

2921

2922

@item http_keep_alive = on/off

2923

Turn the keep-alive feature on or off (defaults to on). Turning it

2924

off is equivalent to @samp{--no-http-keep-alive}.

2925

2926

@item http_password = @var{string}

2927

Set @sc{http} password, equivalent to

2928

@samp{--http-password=@var{string}}.

2929

2930

@item http_proxy = @var{string}

2931

Use @var{string} as @sc{http} proxy, instead of the one specified in

2932

environment.

2933

2934

@item http_user = @var{string}

2935

Set @sc{http} user to @var{string}, equivalent to

2936

@samp{--http-user=@var{string}}.

2937

2938

@item https_proxy = @var{string}

2939

Use @var{string} as @sc{https} proxy, instead of the one specified in

2940

environment.

2941

2942

@item ignore_case = on/off

2943

When set to on, match files and directories case insensitively; the

2944

same as @samp{--ignore-case}.

2945

2946

@item ignore_length = on/off

2947

When set to on, ignore @code{Content-Length} header; the same as

2948

@samp{--ignore-length}.

2949

2950

@item ignore_tags = @var{string}

2951

Ignore certain @sc{html} tags when doing a recursive retrieval, like

2952

@samp{--ignore-tags=@var{string}}.

2953

2954

@item include_directories = @var{string}

2955

Specify a comma-separated list of directories you wish to follow when

2956

downloading---the same as @samp{-I @var{string}}.

2957

2958

@item iri = on/off

2959

When set to on, enable internationalized URI (IRI) support; the same as

2960

@samp{--iri}.

2961

2962

@item inet4_only = on/off

2963

Force connecting to IPv4 addresses, off by default. You can put this

2964

in the global init file to disable Wget's attempts to resolve and

2965

connect to IPv6 hosts. Available only if Wget was compiled with IPv6

2966

support. The same as @samp{--inet4-only} or @samp{-4}.

2967

2968

@item inet6_only = on/off

2969

Force connecting to IPv6 addresses, off by default. Available only if

2970

Wget was compiled with IPv6 support. The same as @samp{--inet6-only}

2971

or @samp{-6}.

2972

2973

@item input = @var{file}

2974

Read the @sc{url}s from @var{file}, like @samp{-i @var{file}}.

2975

2976

@item keep_session_cookies = on/off

2977

When set to on, causes @samp{save_cookies} to also save session
cookies. See @samp{--keep-session-cookies}.

2979

2980

@item limit_rate = @var{rate}

2981

Limit the download speed to no more than @var{rate} bytes per second.

2982

The same as @samp{--limit-rate=@var{rate}}.

2983

2984

@item load_cookies = @var{file}

2985

Load cookies from @var{file}. See @samp{--load-cookies @var{file}}.

2986

2987

@item local_encoding = @var{encoding}

2988

Force Wget to use @var{encoding} as the default system encoding. See

2989

@samp{--local-encoding}.

2990

2991

@item logfile = @var{file}

2992

Set logfile to @var{file}, the same as @samp{-o @var{file}}.

2993

2994

@item max_redirect = @var{number}

2995

Specifies the maximum number of redirections to follow for a resource.

2996

See @samp{--max-redirect=@var{number}}.

2997

2998

@item mirror = on/off

2999

Turn mirroring on/off. The same as @samp{-m}.

3000

3001

@item netrc = on/off

3002

Turn reading netrc on or off.

3003

3004

@item no_clobber = on/off

3005

Same as @samp{-nc}.

3006

3007

@item no_parent = on/off

3008

Disallow retrieving outside the directory hierarchy, like

3009

@samp{--no-parent} (@pxref{Directory-Based Limits}).

3010

3011

@item no_proxy = @var{string}

3012

Use @var{string} as the comma-separated list of domains to avoid in

3013

proxy loading, instead of the one specified in environment.

3014

3015

@item output_document = @var{file}

3016

Set the output filename---the same as @samp{-O @var{file}}.

3017

3018

@item page_requisites = on/off

3019

Download all ancillary documents necessary for a single @sc{html} page to

3020

display properly---the same as @samp{-p}.

3021

3022

@item passive_ftp = on/off

3023

Change setting of passive @sc{ftp}, equivalent to the

3024

@samp{--passive-ftp} option.

3025

3026

@item password = @var{string}
Specify password @var{string} for both @sc{ftp} and @sc{http} file retrieval.
This command can be overridden using the @samp{ftp_password} and
@samp{http_password} commands for @sc{ftp} and @sc{http} respectively.

3030

3031

@item post_data = @var{string}

3032

Use POST as the method for all HTTP requests and send @var{string} in

3033

the request body. The same as @samp{--post-data=@var{string}}.

3034

3035

@item post_file = @var{file}

3036

Use POST as the method for all HTTP requests and send the contents of

3037

@var{file} in the request body. The same as

3038

@samp{--post-file=@var{file}}.

3039

3040

@item prefer_family = none/IPv4/IPv6

3041

When given a choice of several addresses, connect to the addresses

3042

with specified address family first. The address order returned by

3043

DNS is used without change by default. The same as @samp{--prefer-family},

3044

which see for a detailed discussion of why this is useful.

3045

3046

@item private_key = @var{file}

3047

Set the private key file to @var{file}. The same as

3048

@samp{--private-key=@var{file}}.

3049

3050

@item private_key_type = @var{string}

3051

Specify the type of the private key, legal values being @samp{PEM}
(the default) and @samp{DER} (aka ASN1). The same as
@samp{--private-key-type=@var{string}}.

3054

3055

@item progress = @var{string}

3056

Set the type of the progress indicator. Legal types are @samp{dot}

3057

and @samp{bar}. Equivalent to @samp{--progress=@var{string}}.

3058

3059

@item protocol_directories = on/off

3060

When set, use the protocol name as a directory component of local file

3061

names. The same as @samp{--protocol-directories}.

3062

3063

@item proxy_password = @var{string}

3064

Set proxy authentication password to @var{string}, like

3065

@samp{--proxy-password=@var{string}}.

3066

3067

@item proxy_user = @var{string}

3068

Set proxy authentication user name to @var{string}, like

3069

@samp{--proxy-user=@var{string}}.

3070

3071

@item quiet = on/off

3072

Quiet mode---the same as @samp{-q}.

3073

3074

@item quota = @var{quota}

3075

Specify the download quota, which is useful to put in the global
@file{wgetrc}. When download quota is specified, Wget will stop
retrieving after the download sum has become greater than quota. The
quota can be specified in bytes (default), kbytes (@samp{k} appended) or
mbytes (@samp{m} appended). Thus @samp{quota = 5m} will set the quota
to 5 megabytes. Note that the user's startup file overrides system
settings.

3082

3083

@item random_file = @var{file}

3084

Use @var{file} as a source of randomness on systems lacking

3085

@file{/dev/random}.

3086

3087

@item random_wait = on/off

3088

Turn random between-request wait times on or off. The same as

3089

@samp{--random-wait}.

3090

3091

@item read_timeout = @var{n}

3092

Set the read (and write) timeout---the same as

3093

@samp{--read-timeout=@var{n}}.

3094

3095

@item reclevel = @var{n}

3096

Recursion level (depth)---the same as @samp{-l @var{n}}.

3097

3098

@item recursive = on/off

3099

Recursive on/off---the same as @samp{-r}.

3100

3101

@item referer = @var{string}

3102

Set HTTP @samp{Referer:} header just like

3103

@samp{--referer=@var{string}}. (Note that it was the folks who wrote

3104

the @sc{http} spec who got the spelling of ``referrer'' wrong.)

3105

3106

@item relative_only = on/off

3107

Follow only relative links---the same as @samp{-L} (@pxref{Relative

3108

Links}).

3109

3110

@item remote_encoding = @var{encoding}

3111

Force Wget to use @var{encoding} as the default remote server encoding.

3112

See @samp{--remote-encoding}.

3113

3114

@item remove_listing = on/off

3115

If set to on, remove @sc{ftp} listings downloaded by Wget. Setting it

3116

to off is the same as @samp{--no-remove-listing}.

3117

3118

@item restrict_file_names = unix/windows

3119

Restrict the file names generated by Wget from URLs. See

3120

@samp{--restrict-file-names} for a more detailed description.

3121

3122

@item retr_symlinks = on/off

3123

When set to on, retrieve symbolic links as if they were plain files; the

3124

same as @samp{--retr-symlinks}.

3125

3126

@item retry_connrefused = on/off

3127

When set to on, consider ``connection refused'' a transient

3128

error---the same as @samp{--retry-connrefused}.

3129

3130

@item robots = on/off

3131

Specify whether the norobots convention is respected by Wget, ``on'' by

3132

default. This switch controls both the @file{/robots.txt} and the

3133

@samp{nofollow} aspect of the spec. @xref{Robot Exclusion}, for more

3134

details about this. Be sure you know what you are doing before turning

3135

this off.

3136

3137

@item save_cookies = @var{file}

3138

Save cookies to @var{file}. The same as @samp{--save-cookies

3139

@var{file}}.

3140

3141

@item save_headers = on/off

3142

Same as @samp{--save-headers}.

3143

3144

@item secure_protocol = @var{string}

3145

Choose the secure protocol to be used. Legal values are @samp{auto}

3146

(the default), @samp{SSLv2}, @samp{SSLv3}, and @samp{TLSv1}. The same

3147

as @samp{--secure-protocol=@var{string}}.

3148

3149

@item server_response = on/off

3150

Choose whether or not to print the @sc{http} and @sc{ftp} server

3151

responses---the same as @samp{-S}.

3152

3153

@item span_hosts = on/off

3154

Same as @samp{-H}.

3155

3156

@item spider = on/off

3157

Same as @samp{--spider}.

3158

3159

@item strict_comments = on/off

3160

Same as @samp{--strict-comments}.

3161

3162

@item timeout = @var{n}

3163

Set all applicable timeout values to @var{n}, the same as @samp{-T

3164

@var{n}}.

3165

3166

@item timestamping = on/off

3167

Turn timestamping on/off. The same as @samp{-N} (@pxref{Time-Stamping}).

3168

3169

@item tries = @var{n}

3170

Set number of retries per @sc{url}---the same as @samp{-t @var{n}}.

3171

3172

@item use_proxy = on/off

3173

When set to off, don't use proxy even when proxy-related environment

3174

variables are set. In that case it is the same as using

3175

@samp{--no-proxy}.

3176

3177

@item user = @var{string}

3178

Specify username @var{string} for both @sc{ftp} and @sc{http} file retrieval.

3179

This command can be overridden using the @samp{ftp_user} and

3180

@samp{http_user} commands for @sc{ftp} and @sc{http} respectively.

3181

3182

@item user_agent = @var{string}

3183

User agent identification sent to the @sc{http} server---the same as

3184

@samp{--user-agent=@var{string}}.

3185

3186

@item verbose = on/off

3187

Turn verbose on/off---the same as @samp{-v}/@samp{-nv}.

3188

3189

@item wait = @var{n}

3190

Wait @var{n} seconds between retrievals---the same as @samp{-w

3191

@var{n}}.

3192

3193

@item wait_retry = @var{n}

3194

Wait up to @var{n} seconds between retries of failed retrievals

3195

only---the same as @samp{--waitretry=@var{n}}. Note that this is

3196

turned on by default in the global @file{wgetrc}.

3197

@end table
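
As a rough illustration, several of the commands above might be combined
in a @file{.wgetrc} as follows. The values are arbitrary assumptions
chosen for the sketch, not recommended settings:

@example
# Try each URL up to 10 times, waiting up to 15 seconds between retries.
tries = 10
wait_retry = 15
# Pause 2 seconds between retrievals and skip files that are not newer
# than the local copy.
wait = 2
timestamping = on
@end example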

3198

3199

@node Sample Wgetrc, , Wgetrc Commands, Startup File

3200

@section Sample Wgetrc

3201

@cindex sample wgetrc

3202

3203

This is the sample initialization file, as given in the distribution.

3204

It is divided into two sections---one for global usage (suitable for a global

3205

startup file), and one for local usage (suitable for

3206

@file{$HOME/.wgetrc}). Be careful about the things you change.

3207

3208

Note that almost all the lines are commented out. For a command to have

3209

any effect, you must remove the @samp{#} character at the beginning of

3210

its line.

3211

3212

@example

3213

@include sample.wgetrc.munged_for_texi_inclusion

3214

@end example

3215

3216

@node Examples, Various, Startup File, Top

3217

@chapter Examples

3218

@cindex examples

3219

3220

@c man begin EXAMPLES

3221

The examples are divided into three sections loosely based on their

3222

complexity.

3223

3224

@menu

3225

* Simple Usage:: Simple, basic usage of the program.

3226

* Advanced Usage:: Advanced tips.

3227

* Very Advanced Usage:: The hairy stuff.

3228

@end menu

3229

3230

@node Simple Usage, Advanced Usage, Examples, Examples

3231

@section Simple Usage

3232

3233

@itemize @bullet

3234

@item

3235

Say you want to download a @sc{url}. Just type:

3236

3237

@example

3238

wget http://fly.srk.fer.hr/

3239

@end example

3240

3241

@item

3242

But what will happen if the connection is slow, and the file is lengthy?

3243

The connection will probably fail before the whole file is retrieved,

3244

more than once. In this case, Wget will try getting the file until it

3245

either gets the whole of it, or exceeds the default number of retries

3246

(this being 20). It is easy to change the number of tries to 45, to

3247

ensure that the whole file will arrive safely:

3248

3249

@example

3250

wget --tries=45 http://fly.srk.fer.hr/jpg/flyweb.jpg

3251

@end example

3252

3253

@item

3254

Now let's leave Wget to work in the background, and write its progress

3255

to log file @file{log}. It is tiring to type @samp{--tries}, so we

3256

shall use @samp{-t}.

3257

3258

@example

3259

wget -t 45 -o log http://fly.srk.fer.hr/jpg/flyweb.jpg &

3260

@end example

3261

3262

The ampersand at the end of the line makes sure that Wget works in the

3263

background. To unlimit the number of retries, use @samp{-t inf}.

3264

3265

@item

3266

Using @sc{ftp} is just as simple. Wget will take care of login and

3267

password.

3268

3269

@example

3270

wget ftp://gnjilux.srk.fer.hr/welcome.msg

3271

@end example

3272

3273

@item

3274

If you specify a directory, Wget will retrieve the directory listing,

3275

parse it and convert it to @sc{html}. Try:

3276

3277

@example

3278

wget ftp://ftp.gnu.org/pub/gnu/

3279

links index.html

3280

@end example

3281

@end itemize

3282

3283

@node Advanced Usage, Very Advanced Usage, Simple Usage, Examples

3284

@section Advanced Usage

3285

3286

@itemize @bullet

3287

@item

3288

You have a file that contains the URLs you want to download? Use the

3289

@samp{-i} switch:

3290

3291

@example

3292

wget -i @var{file}

3293

@end example

3294

3295

If you specify @samp{-} as file name, the @sc{url}s will be read from

3296

standard input.

3297

3298

@item

3299

Create a five levels deep mirror image of the GNU web site, with the

3300

same directory structure the original has, with only one try per

3301

document, saving the log of the activities to @file{gnulog}:

3302

3303

@example

3304

wget -r -t1 http://www.gnu.org/ -o gnulog

3305

@end example

3306

3307

@item

3308

The same as the above, but convert the links in the downloaded files to

3309

point to local files, so you can view the documents off-line:

3310

3311

@example

3312

wget --convert-links -r -t1 http://www.gnu.org/ -o gnulog

3313

@end example

3314

3315

@item

3316

Retrieve only one @sc{html} page, but make sure that all the elements needed

3317

for the page to be displayed, such as inline images and external style

3318

sheets, are also downloaded. Also make sure the downloaded page

3319

references the downloaded links.

3320

3321

@example

3322

wget -p --convert-links http://www.server.com/dir/page.html

3323

@end example

3324

3325

The @sc{html} page will be saved to @file{www.server.com/dir/page.html}, and

3326

the images, stylesheets, etc., somewhere under @file{www.server.com/},

3327

depending on where they were on the remote server.

3328

3329

@item

3330

The same as the above, but without the @file{www.server.com/} directory.

3331

In fact, I don't want to have all those random server directories

3332

anyway---just save @emph{all} those files under a @file{download/}

3333

subdirectory of the current directory.

3334

3335

@example

3336

wget -p --convert-links -nH -nd -Pdownload \

3337

http://www.server.com/dir/page.html

3338

@end example

3339

3340

@item

3341

Retrieve the @file{index.html} of @samp{www.lycos.com}, showing the original

3342

server headers:

3343

3344

@example

3345

wget -S http://www.lycos.com/

3346

@end example

3347

3348

@item

3349

Save the server headers with the file, perhaps for post-processing.

3350

3351

@example

3352

wget --save-headers http://www.lycos.com/

3353

more index.html

3354

@end example

3355

3356

@item

3357

Retrieve the first two levels of @samp{wuarchive.wustl.edu}, saving them

3358

to @file{/tmp}.

3359

3360

@example

3361

wget -r -l2 -P/tmp ftp://wuarchive.wustl.edu/

3362

@end example

3363

3364

@item

3365

You want to download all the @sc{gif}s from a directory on an @sc{http}

3366

server. You tried @samp{wget http://www.server.com/dir/*.gif}, but that

3367

didn't work because @sc{http} retrieval does not support globbing. In

3368

that case, use:

3369

3370

@example

3371

wget -r -l1 --no-parent -A.gif http://www.server.com/dir/

3372

@end example

3373

3374

More verbose, but the effect is the same. @samp{-r -l1} means to

3375

retrieve recursively (@pxref{Recursive Download}), with maximum depth

3376

of 1. @samp{--no-parent} means that references to the parent directory

3377

are ignored (@pxref{Directory-Based Limits}), and @samp{-A.gif} means to

3378

download only the @sc{gif} files. @samp{-A "*.gif"} would have worked

3379

too.

3380

3381

@item

3382

Suppose you were in the middle of downloading, when Wget was

3383

interrupted. Now you do not want to clobber the files already present.

3384

Use @samp{-nc} for that:

3385

3386

@example

3387

wget -nc -r http://www.gnu.org/

3388

@end example

3389

3390

@item

3391

If you want to encode your own username and password to @sc{http} or

3392

@sc{ftp}, use the appropriate @sc{url} syntax (@pxref{URL Format}).

3393

3394

@example

3395

wget ftp://hniksic:mypassword@@unix.server.com/.emacs

3396

@end example

3397

3398

Note, however, that this usage is not advisable on multi-user systems

3399

because it reveals your password to anyone who looks at the output of

3400

@code{ps}.

3401

3402

@cindex redirecting output

3403

@item

3404

You would like the output documents to go to standard output instead of

3405

to files?

3406

3407

@example

3408

wget -O - http://jagor.srce.hr/ http://www.srce.hr/

3409

@end example

3410

3411

You can also combine the two options and make pipelines to retrieve the

3412

documents from remote hotlists:

3413

3414

@example

3415

wget -O - http://cool.list.com/ | wget --force-html -i -

3416

@end example

3417

@end itemize

3418

3419

@node Very Advanced Usage, , Advanced Usage, Examples

3420

@section Very Advanced Usage

3421

3422

@cindex mirroring

3423

@itemize @bullet

3424

@item

3425

If you wish Wget to keep a mirror of a page (or @sc{ftp}

3426

subdirectories), use @samp{--mirror} (@samp{-m}), which is the shorthand

3427

for @samp{-r -l inf -N}. You can put Wget in the crontab file asking it

3428

to recheck a site each Sunday:

3429

3430

@example

3431

crontab

3432

0 0 * * 0 wget --mirror http://www.gnu.org/ -o /home/me/weeklog

3433

@end example

3434

3435

@item

3436

In addition to the above, you want the links to be converted for local

3437

viewing. But, after having read this manual, you know that link

3438

conversion doesn't play well with timestamping, so you also want Wget to

3439

back up the original @sc{html} files before the conversion. The Wget invocation

3440

would look like this:

3441

3442

@example

3443

wget --mirror --convert-links --backup-converted \

3444

http://www.gnu.org/ -o /home/me/weeklog

3445

@end example

3446

3447

@item

3448

But you've also noticed that local viewing doesn't work all that well

3449

when @sc{html} files are saved under extensions other than @samp{.html},

3450

perhaps because they were served as @file{index.cgi}. So you'd like

3451

Wget to rename all the files served with content-type @samp{text/html}

3452

or @samp{application/xhtml+xml} to @file{@var{name}.html}.

3453

3454

@example

3455

wget --mirror --convert-links --backup-converted \

3456

--html-extension -o /home/me/weeklog \

3457

http://www.gnu.org/

3458

@end example

3459

3460

Or, with less typing:

3461

3462

@example

3463

wget -m -k -K -E http://www.gnu.org/ -o /home/me/weeklog

3464

@end example

3465

@end itemize

3466

@c man end

3467

3468

@node Various, Appendices, Examples, Top

3469

@chapter Various

3470

@cindex various

3471

3472

This chapter contains all the stuff that could not fit anywhere else.

3473

3474

@menu

3475

* Proxies:: Support for proxy servers.

3476

* Distribution:: Getting the latest version.

3477

* Web Site:: GNU Wget's presence on the World Wide Web.

3478

* Mailing Lists:: Wget mailing list for announcements and discussion.

3479

* Internet Relay Chat:: Wget's presence on IRC.

3480

* Reporting Bugs:: How and where to report bugs.

3481

* Portability:: The systems Wget works on.

3482

* Signals:: Signal-handling performed by Wget.

3483

@end menu

3484

3485

@node Proxies, Distribution, Various, Various

3486

@section Proxies

3487

@cindex proxies

3488

3489

@dfn{Proxies} are special-purpose @sc{http} servers designed to transfer

3490

data from remote servers to local clients. One typical use of proxies

3491

is lightening network load for users behind a slow connection. This is

3492

achieved by channeling all @sc{http} and @sc{ftp} requests through the

3493

proxy which caches the transferred data. When a cached resource is

3494

requested again, the proxy will return the data from its cache. Another use for

3495

proxies is for companies that separate (for security reasons) their

3496

internal networks from the rest of the Internet. In order to obtain

3497

information from the Web, their users connect and retrieve remote data

3498

using an authorized proxy.

3499

3500

Wget supports proxies for both @sc{http} and @sc{ftp} retrievals. The

3501

standard way to specify proxy location, which Wget recognizes, is using

3502

the following environment variables:

3503

3504

@table @code

3505

@item http_proxy

3506

@itemx https_proxy

3507

If set, the @code{http_proxy} and @code{https_proxy} variables should

3508

contain the @sc{url}s of the proxies for @sc{http} and @sc{https}

3509

connections respectively.

3510

3511

@item ftp_proxy

3512

This variable should contain the @sc{url} of the proxy for @sc{ftp}

3513

connections. It is quite common that @code{http_proxy} and

3514

@code{ftp_proxy} are set to the same @sc{url}.

3515

3516

@item no_proxy

3517

This variable should contain a comma-separated list of domain extensions

3518

the proxy should @emph{not} be used for. For instance, if the value of

3519

@code{no_proxy} is @samp{.mit.edu}, the proxy will not be used to retrieve

3520

documents from MIT.

3521

@end table
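
For example, in a Bourne-style shell the variables might be set like this
before Wget is started. This is only a sketch: the proxy host and port
are placeholders, not real servers.

@example
export http_proxy=http://proxy.company.com:8001/
export ftp_proxy=$http_proxy
export no_proxy=.mit.edu
@end example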

3522

3523

In addition to the environment variables, proxy location and settings

3524

may be specified from within Wget itself.

3525

3526

@table @samp

3527

@item --no-proxy

3528

@itemx proxy = on/off

3529

This option and the corresponding command may be used to suppress the

3530

use of proxy, even if the appropriate environment variables are set.

3531

3532

@item http_proxy = @var{URL}

3533

@itemx https_proxy = @var{URL}

3534

@itemx ftp_proxy = @var{URL}

3535

@itemx no_proxy = @var{string}

3536

These startup file variables allow you to override the proxy settings

3537

specified by the environment.

3538

@end table
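
A minimal sketch of such settings in a startup file, assuming a
placeholder proxy at @samp{proxy.company.com}, port 8001:

@example
# Route HTTP and FTP retrievals through the (hypothetical) company proxy.
http_proxy = http://proxy.company.com:8001/
ftp_proxy = http://proxy.company.com:8001/
# Contact hosts in this domain directly.
no_proxy = .mit.edu
@end example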

3539

3540

Some proxy servers require authorization to enable you to use them. The

3541

authorization consists of a @dfn{username} and a @dfn{password}, which must

3542

be sent by Wget. As with @sc{http} authorization, several

3543

authentication schemes exist. For proxy authorization only the

3544

@code{Basic} authentication scheme is currently implemented.

3545

3546

You may specify your username and password either through the proxy

3547

@sc{url} or through the command-line options. Assuming that the

3548

company's proxy is located at @samp{proxy.company.com} at port 8001, a

3549

proxy @sc{url} location containing authorization data might look like

3550

this:

3551

3552

@example

3553

http://hniksic:mypassword@@proxy.company.com:8001/

3554

@end example

3555

3556

Alternatively, you may use the @samp{--proxy-user} and
@samp{--proxy-password} options, and the equivalent @file{.wgetrc}

3558

settings @code{proxy_user} and @code{proxy_password} to set the proxy

3559

username and password.
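
For instance, the credentials from the example @sc{url} above could
instead be kept in @file{.wgetrc}. This is only a sketch; the user name
and password are the placeholder values used above:

@example
proxy_user = hniksic
proxy_password = mypassword
@end example

Since the file then contains a password in clear text, it is prudent to
make it readable only by its owner.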

3560

3561

@node Distribution, Web Site, Proxies, Various

3562

@section Distribution

3563

@cindex latest version

3564

3565

Like all GNU utilities, the latest version of Wget can be found at the

3566

master GNU archive site ftp.gnu.org, and its mirrors. For example,

3567

Wget @value{VERSION} can be found at

3568

@url{ftp://ftp.gnu.org/pub/gnu/wget/wget-@value{VERSION}.tar.gz}.

3569

3570

@node Web Site, Mailing Lists, Distribution, Various

3571

@section Web Site

3572

@cindex web site

3573

3574

The official web site for GNU Wget is at

3575

@url{http://www.gnu.org/software/wget/}. However, most useful

3576

information resides at ``The Wget Wgiki'',

3577

@url{http://wget.addictivecode.org/}.

3578

3579

@node Mailing Lists, Internet Relay Chat, Web Site, Various

3580

@section Mailing Lists

3581

@cindex mailing list

3582

@cindex list

3583

3584

@unnumberedsubsec Primary List

3585

3586

The primary mailing list for discussion, bug reports, or questions

3587

about GNU Wget is at @email{bug-wget@@gnu.org}. To subscribe, send an

3588

email to @email{bug-wget-join@@gnu.org}, or visit

3589

@url{http://lists.gnu.org/mailman/listinfo/bug-wget}.

3590

3591

You do not need to subscribe to send a message to the list; however,

3592

please note that messages from non-subscribers are moderated, and may take a

3593

while before they hit the list---@strong{usually around a day}. If

3594

you want your message to show up immediately, please subscribe to the

3595

list before posting. Archives for the list may be found at

3596

@url{http://lists.gnu.org/pipermail/bug-wget/}.

3597

3598

An NNTP/Usenettish gateway is also available via

3599

@uref{http://gmane.org/about.php,Gmane}. You can see the Gmane

3600

archives at

3601

@url{http://news.gmane.org/gmane.comp.web.wget.general}. Note that the

3602

Gmane archives conveniently include messages from both the current

3603

list, and the previous one. Messages also show up in the Gmane

3604

archives sooner than they do at @url{lists.gnu.org}.

3605

3606

@unnumberedsubsec Bug Notices List

3607

3608

Additionally, there is the @email{wget-notify@@addictivecode.org} mailing

3609

list. This is a non-discussion list that receives bug report

3610

notifications from the bug-tracker. To subscribe to this list,

3611

send an email to @email{wget-notify-join@@addictivecode.org},

3612

or visit @url{http://addictivecode.org/mailman/listinfo/wget-notify}.

3613

3614

@unnumberedsubsec Obsolete Lists

3615

3616

Previously, the mailing list @email{wget@@sunsite.dk} was used as the

3617

main discussion list, and another list,

3618

@email{wget-patches@@sunsite.dk} was used for submitting and

3619

discussing patches to GNU Wget.

3620

3621

Messages from @email{wget@@sunsite.dk} are archived at

3622

@itemize @tie{}

3623

@item

3624

@url{http://www.mail-archive.com/wget%40sunsite.dk/} and at

3625

@item

3626

@url{http://news.gmane.org/gmane.comp.web.wget.general} (which also

3627

continues to archive the current list, @email{bug-wget@@gnu.org}).

3628

@end itemize

3629

3630

Messages from @email{wget-patches@@sunsite.dk} are archived at

3631

@itemize @tie{}

3632

@item

3633

@url{http://news.gmane.org/gmane.comp.web.wget.patches}.

3634

@end itemize

3635

3636

@node Internet Relay Chat, Reporting Bugs, Mailing Lists, Various

3637

@section Internet Relay Chat

3638

@cindex Internet Relay Chat

3639

@cindex IRC

3640

@cindex #wget

3641

3642

In addition to the mailing lists, we also have a support channel set up

3643

via IRC at @code{irc.freenode.org}, @code{#wget}. Come check it out!

3644

3645

@node Reporting Bugs, Portability, Internet Relay Chat, Various

3646

@section Reporting Bugs

3647

@cindex bugs

3648

@cindex reporting bugs

3649

@cindex bug reports

3650

3651

@c man begin BUGS

3652

You are welcome to submit bug reports via the GNU Wget bug tracker (see

3653

@url{http://wget.addictivecode.org/BugTracker}).

3654

3655

Before actually submitting a bug report, please try to follow a few

3656

simple guidelines.

3657

3658

@enumerate

3659

@item

3660

Please try to ascertain that the behavior you see really is a bug. If

3661

Wget crashes, it's a bug. If Wget does not behave as documented,

3662

it's a bug. If things work strangely, but you are not sure about the way

3663

they are supposed to work, it might well be a bug, but you might want to

3664

double-check the documentation and the mailing lists (@pxref{Mailing

3665

Lists}).

3666

3667

@item

3668

Try to repeat the bug in as simple circumstances as possible. E.g. if

3669

Wget crashes while downloading @samp{wget -rl0 -kKE -t5 --no-proxy

3670

http://yoyodyne.com -o /tmp/log}, you should try to see if the crash is

3671

repeatable, and if it will occur with a simpler set of options. You might

3672

even try to start the download at the page where the crash occurred to

3673

see if that page somehow triggered the crash.

3674

3675

Also, while I will probably be interested to know the contents of your

3676

@file{.wgetrc} file, just dumping it into the debug message is probably

3677

a bad idea. Instead, you should first try to see if the bug repeats

3678

with @file{.wgetrc} moved out of the way. Only if it turns out that

3679

@file{.wgetrc} settings affect the bug, mail me the relevant parts of

3680

the file.

3681

3682

@item

3683

Please start Wget with the @samp{-d} option and send us the resulting

3684

output (or relevant parts thereof). If Wget was compiled without

3685

debug support, recompile it---it is @emph{much} easier to trace bugs

3686

with debug support on.

3687

3688

Note: please make sure to remove any potentially sensitive information

3689

from the debug log before sending it to the bug address. The

3690

@samp{-d} option won't go out of its way to collect sensitive information,

3691

but the log @emph{will} contain a fairly complete transcript of Wget's

3692

communication with the server, which may include passwords and pieces

3693

of downloaded data. Since the bug address is publicly archived, you

3694

may assume that all bug reports are visible to the public.

3695

3696

@item

3697

If Wget has crashed, try to run it in a debugger, e.g. @code{gdb `which

3698

wget` core} and type @code{where} to get the backtrace. This may not

3699

work if the system administrator has disabled core files, but it is

3700

safe to try.

3701

@end enumerate
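
Put together, a debugging session along the lines of the guidelines above
might look like this. It is only a sketch: the @sc{url} is the
placeholder from the earlier example, and the file names
@file{wget-debug.log} and @file{.wgetrc.aside} are arbitrary.

@example
# Move the startup file out of the way to rule out its settings.
mv ~/.wgetrc ~/.wgetrc.aside
# Repeat the failing download with debug output written to a log file.
wget -d -o wget-debug.log http://yoyodyne.com
# Restore the startup file afterwards.
mv ~/.wgetrc.aside ~/.wgetrc
@end example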

3702

@c man end

3703

3704

@node Portability, Signals, Reporting Bugs, Various

3705

@section Portability

3706

@cindex portability

3707

@cindex operating systems

3708

3709

Like all GNU software, Wget works on the GNU system. However, since it

3710

uses GNU Autoconf for building and configuring, and mostly avoids using

3711

``special'' features of any particular Unix, it should compile (and

3712

work) on all common Unix flavors.

3713

3714

Various Wget versions have been compiled and tested under many kinds of

3715

Unix systems, including GNU/Linux, Solaris, SunOS 4.x, Mac OS X, OSF

3716

(aka Digital Unix or Tru64), Ultrix, *BSD, IRIX, AIX, and others. Some

3717

of those systems are no longer in widespread use and may not be able to

3718

support recent versions of Wget. If Wget fails to compile on your

3719

system, we would like to know about it.

3720

3721

Thanks to kind contributors, this version of Wget compiles and works

3722

on 32-bit Microsoft Windows platforms. It has been compiled

3723

successfully using MS Visual C++ 6.0, Watcom, Borland C, and GCC

3724

compilers. Naturally, it lacks some features available on

3725

Unix, but it should work as a substitute for people stuck with

3726

Windows. Note that Windows-specific portions of Wget are not

3727

guaranteed to be supported in the future, although this has been the

3728

case in practice for many years now. All questions and problems in

3729

Windows usage should be reported to the Wget mailing list at
@email{bug-wget@@gnu.org} where the volunteers who maintain the

3731

Windows-related features might look at them.

3732

3733

Support for building on MS-DOS via DJGPP has been contributed by Gisle

3734

Vanem; a port to VMS is maintained by Steven Schweda, and is available

3735

at @url{http://antinode.org/}.

3736

3737

@node Signals, , Portability, Various

3738

@section Signals

3739

@cindex signal handling

3740

@cindex hangup

3741

3742

Since the purpose of Wget is background work, it catches the hangup

3743

signal (@code{SIGHUP}) and ignores it. If the output was on standard

3744

output, it will be redirected to a file named @file{wget-log}.

3745

Otherwise, @code{SIGHUP} is ignored. This is convenient when you wish

3746

to redirect the output of Wget after having started it.

3747

3748

@example

3749

$ wget http://www.gnus.org/dist/gnus.tar.gz &

3750

...

3751

$ kill -HUP %%

3752

SIGHUP received, redirecting output to `wget-log'.

3753

@end example

3754

3755

Other than that, Wget will not try to interfere with signals in any way.

3756

@kbd{C-c}, @code{kill -TERM} and @code{kill -KILL} should kill it alike.

3757

3758

@node Appendices, Copying this manual, Various, Top

3759

@chapter Appendices

3760

3761

This chapter contains some references I consider useful.

3762

3763

@menu

3764

* Robot Exclusion:: Wget's support for RES.

3765

* Security Considerations:: Security with Wget.

3766

* Contributors:: People who helped.

3767

@end menu

3768

3769

@node Robot Exclusion, Security Considerations, Appendices, Appendices

3770

@section Robot Exclusion

3771

@cindex robot exclusion

3772

@cindex robots.txt

3773

@cindex server maintenance

3774

3775

It is extremely easy to make Wget wander aimlessly around a web site,

3776

sucking up all the available data in the process. @samp{wget -r @var{site}},

3777

and you're set. Great? Not for the server admin.

3778

3779

As long as Wget is only retrieving static pages, and doing it at a

3780

reasonable rate (see the @samp{--wait} option), there's not much of a

3781

problem. The trouble is that Wget can't tell the difference between the

3782

smallest static page and the most demanding CGI. A site I know has a

3783

section handled by a CGI Perl script that converts Info files to @sc{html} on

3784

the fly. The script is slow, but works well enough for human users

3785

viewing an occasional Info file. However, when someone's recursive Wget

3786

download stumbles upon the index page that links to all the Info files

3787

through the script, the system is brought to its knees without providing

3788

anything useful to the user (This task of converting Info files could be

3789

done locally and access to Info documentation for all installed GNU

3790

software on a system is available from the @code{info} command).

3791

3792

To avoid this kind of accident, as well as to preserve privacy for

3793

documents that need to be protected from well-behaved robots, the

3794

concept of @dfn{robot exclusion} was invented. The idea is that

3795

the server administrators and document authors can specify which

3796

portions of the site they wish to protect from robots and those

3797

they will permit access to.

3798

3799

The most popular mechanism, and the @i{de facto} standard supported by

3800

all the major robots, is the ``Robots Exclusion Standard'' (RES) written

3801

by Martijn Koster et al. in 1994. It specifies the format of a text

3802

file containing directives that instruct the robots which URL paths to

3803

avoid. To be found by the robots, the specifications must be placed in

3804

@file{/robots.txt} in the server root, which the robots are expected to

3805

download and parse.
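
A small @file{/robots.txt} might look like the following. The paths are
made up for illustration; @samp{User-agent: *} addresses all robots, and
each @samp{Disallow} line names a URL path prefix they should avoid:

@example
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
@end example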

3806

3807

Although Wget is not a web robot in the strictest sense of the word, it

3808

can download large parts of the site without the user's intervention to

3809

download an individual page. Because of that, Wget honors RES when

3810

downloading recursively. For instance, when you issue:

3811

3812

@example

3813

wget -r http://www.server.com/

3814

@end example

3815

3816

First the index of @samp{www.server.com} will be downloaded. If Wget

3817

finds that it wants to download more documents from that server, it will

3818

request @samp{http://www.server.com/robots.txt} and, if found, use it

3819

for further downloads. @file{robots.txt} is loaded only once per

3820

server.

3821

3822

Until version 1.8, Wget supported the first version of the standard,

3823

written by Martijn Koster in 1994 and available at

3824

@url{http://www.robotstxt.org/wc/norobots.html}. As of version 1.8,

3825

Wget has supported the additional directives specified in the internet

3826

draft @samp{<draft-koster-robots-00.txt>} titled ``A Method for Web

3827

Robots Control''. The draft, which has as far as I know never made it to

3828

an @sc{rfc}, is available at

3829

@url{http://www.robotstxt.org/wc/norobots-rfc.txt}.

3830

3831

This manual no longer includes the text of the Robot Exclusion Standard.

3832

3833

The second, lesser-known mechanism enables the author of an individual

3834

document to specify whether they want the links from the file to be

3835

followed by a robot. This is achieved using the @code{META} tag, like

3836

this:

3837

3838

@example

<meta name="robots" content="nofollow">

@end example

3841

3842

This is explained in some detail at

3843

@url{http://www.robotstxt.org/wc/meta-user.html}. Wget supports this

3844

method of robot exclusion in addition to the usual @file{/robots.txt}

3845

exclusion.

3846

3847

If you know what you are doing and really really wish to turn off the

3848

robot exclusion, set the @code{robots} variable to @samp{off} in your

3849

@file{.wgetrc}. You can achieve the same effect from the command line

3850

using the @code{-e} switch, e.g. @samp{wget -e robots=off @var{url}...}.

3851

3852

@node Security Considerations, Contributors, Robot Exclusion, Appendices

3853

@section Security Considerations

3854

@cindex security

3855

3856

When using Wget, you must be aware that it sends unencrypted passwords

3857

through the network, which may present a security problem. Here are the

3858

main issues, and some solutions.

3859

3860

@enumerate

3861

@item

3862

The passwords on the command line are visible using @code{ps}. The best

3863

way around it is to use @code{wget -i -} and feed the @sc{url}s to

3864

Wget's standard input, each on a separate line, terminated by @kbd{C-d}.

3865

Another workaround is to use @file{.netrc} to store passwords; however,

3866

storing unencrypted passwords is also considered a security risk.

3867

3868

@item

3869

Using the insecure @dfn{basic} authentication scheme, unencrypted

3870

passwords are transmitted through the network routers and gateways.

3871

3872

@item

3873

The @sc{ftp} passwords are also in no way encrypted. There is no good

3874

solution for this at the moment.

3875

3876

@item

3877

Although the ``normal'' output of Wget tries to hide the passwords,

3878

debugging logs show them, in all forms. This problem is avoided by

3879

being careful when you send debug logs (yes, even when you send them to

3880

me).

3881

@end enumerate
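
As a sketch of the first workaround, the @sc{url} containing the
password can be typed on standard input instead of the command line
(host, user name and password are the placeholders used earlier in this
manual):

@example
$ wget -i -
ftp://hniksic:mypassword@@unix.server.com/.emacs
@kbd{C-d}
@end example

Alternatively, a @file{.netrc} entry of the following form keeps the
password out of the process list. The file should be readable only by
its owner:

@example
machine unix.server.com login hniksic password mypassword
@end example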

3882

3883

@node Contributors, , Security Considerations, Appendices

3884

@section Contributors

3885

@cindex contributors

3886

3887

@iftex

3888

GNU Wget was written by Hrvoje Nik@v{s}i@'{c} @email{hniksic@@xemacs.org},

3889

@end iftex

3890

@ifnottex

3891

GNU Wget was written by Hrvoje Niksic @email{hniksic@@xemacs.org},

3892

@end ifnottex

3893

and it is currently maintained by Micah Cowan @email{micah@@cowan.name}.

3894

3895

However, the development of Wget could never have gone as far as it has, were

3896

it not for the help of many people, either with bug reports, feature proposals,

3897

patches, or letters saying ``Thanks!''.

3898

3899

Special thanks go to the following people (in no particular order):

3900

3901

@itemize @bullet

3902

@item Dan Harkless---contributed a lot of code and documentation of

3903

extremely high quality, as well as the @code{--page-requisites} and

3904

related options. He was the principal maintainer for some time and

3905

released Wget 1.6.

3906

3907

@item Ian Abbott---contributed bug fixes, Windows-related fixes, and

3908

provided a prototype implementation of the breadth-first recursive

3909

download. Co-maintained Wget during the 1.8 release cycle.

3910

3911

@item

3912

The dotsrc.org crew, in particular Karsten Thygesen---donated system

3913

resources such as the mailing list, web space, @sc{ftp} space, and

3914

version control repositories, along with a lot of time to make these

3915

actually work. Christian Reiniger was of invaluable help with setting

3916

up Subversion.

3917

3918

@item

3919

Heiko Herold---provided high-quality Windows builds and contributed

3920

bug and build reports for many years.

3921

3922

@item

3923

Shawn McHorse---bug reports and patches.

3924

3925

@item

3926

Kaveh R. Ghazi---on-the-fly @code{ansi2knr}-ization. Lots of

3927

portability fixes.

3928

3929

@item

3930

Gordon Matzigkeit---@file{.netrc} support.

@item
@iftex
Zlatko @v{C}alu@v{s}i@'{c}, Tomislav Vujec and Dra@v{z}en
Ka@v{c}ar---feature suggestions and ``philosophical'' discussions.
@end iftex
@ifnottex
Zlatko Calusic, Tomislav Vujec and Drazen Kacar---feature suggestions
and ``philosophical'' discussions.
@end ifnottex

@item
Darko Budor---initial port to Windows.

@item
Antonio Rosella---help and suggestions, plus the initial Italian
translation.

@item
@iftex
Tomislav Petrovi@'{c}, Mario Miko@v{c}evi@'{c}---many bug reports and
suggestions.
@end iftex
@ifnottex
Tomislav Petrovic, Mario Mikocevic---many bug reports and suggestions.
@end ifnottex

@item
@iftex
Fran@,{c}ois Pinard---many thorough bug reports and discussions.
@end iftex
@ifnottex
Francois Pinard---many thorough bug reports and discussions.
@end ifnottex

@item
Karl Eichwalder---lots of help with internationalization, Makefile
layout and many other things.

@item
Junio Hamano---donated support for Opie and @sc{http} @code{Digest}
authentication.

@item
Mauro Tortonesi---improved IPv6 support, adding support for dual
family systems. Refactored and enhanced FTP IPv6 code. Maintained GNU
Wget from 2004--2007.

@item
Christopher G.@: Lewis---maintenance of the Windows version of GNU Wget.

@item
Gisle Vanem---many helpful patches and improvements, especially for
Windows and MS-DOS support.

@item
Ralf Wildenhues---contributed patches to convert Wget to use Automake as
part of its build process, and various bugfixes.

@item
Steven Schubiger---many helpful patches, bugfixes and improvements.
Notably, conversion of Wget to use the Gnulib @code{quote} and
@code{quotearg} modules, and the addition of password prompts at the
console, via the Gnulib @code{getpass-gnu} module.

@item
Ted Mielczarek---donated support for CSS.

@item
Saint Xavier---Support for IRIs (RFC 3987).

@item
People who provided donations for development---including Brian Gough.
@end itemize

The following people have provided patches, bug/build reports, useful
suggestions, beta testing services, fan mail and all the other things
that make maintenance so much fun:

Tim Adam,
Adrian Aichner,
Martin Baehr,
Dieter Baron,
Roger Beeman,
Dan Berger,
T.@: Bharath,
Christian Biere,
Paul Bludov,
Daniel Bodea,
Mark Boyns,
John Burden,
Julien Buty,
Wanderlei Cavassin,
Gilles Cedoc,
Tim Charron,
Noel Cragg,
@iftex
Kristijan @v{C}onka@v{s},
@end iftex
@ifnottex
Kristijan Conkas,
@end ifnottex
John Daily,
Andreas Damm,
Ahmon Dancy,
Andrew Davison,
Bertrand Demiddelaer,
Alexander Dergachev,
Andrew Deryabin,
Ulrich Drepper,
Marc Duponcheel,
@iftex
Damir D@v{z}eko,
@end iftex
@ifnottex
Damir Dzeko,
@end ifnottex
Alan Eldridge,
Hans-Andreas Engel,
@iftex
Aleksandar Erkalovi@'{c},
@end iftex
@ifnottex
Aleksandar Erkalovic,
@end ifnottex
Andy Eskilsson,
@iftex
Jo@~{a}o Ferreira,
@end iftex
@ifnottex
Joao Ferreira,
@end ifnottex
Christian Fraenkel,
David Fritz,
Mike Frysinger,
Charles C.@: Fu,
FUJISHIMA Satsuki,
Masashi Fujita,
Howard Gayle,
Marcel Gerrits,
Lemble Gregory,
Hans Grobler,
Alain Guibert,
Mathieu Guillaume,
Aaron Hawley,
Jochen Hein,
Karl Heuer,
Madhusudan Hosaagrahara,
HIROSE Masaaki,
Ulf Harnhammar,
Gregor Hoffleit,
Erik Magnus Hulthen,
Richard Huveneers,
Jonas Jensen,
Larry Jones,
Simon Josefsson,
@iftex
Mario Juri@'{c},
@end iftex
@ifnottex
Mario Juric,
@end ifnottex
@iftex
Hack Kampbj@o rn,
@end iftex
@ifnottex
Hack Kampbjorn,
@end ifnottex
Const Kaplinsky,
@iftex
Goran Kezunovi@'{c},
@end iftex
@ifnottex
Goran Kezunovic,
@end ifnottex
Igor Khristophorov,
Robert Kleine,
KOJIMA Haime,
Fila Kolodny,
Alexander Kourakos,
Martin Kraemer,
Sami Krank,
Jay Krell,
@tex
$\Sigma\acute{\iota}\mu o\varsigma\;
\Xi\varepsilon\nu\iota\tau\acute{\epsilon}\lambda\lambda\eta\varsigma$
(Simos KSenitellis),
@end tex
@ifnottex
Simos KSenitellis,
@end ifnottex
Christian Lackas,
Hrvoje Lacko,
Daniel S.@: Lewart,
@iftex
Nicol@'{a}s Lichtmeier,
@end iftex
@ifnottex
Nicolas Lichtmeier,
@end ifnottex
Dave Love,
Alexander V.@: Lukyanov,
@iftex
Thomas Lu@ss{}nig,
@end iftex
@ifnottex
Thomas Lussnig,
@end ifnottex
Andre Majorel,
Aurelien Marchand,
Matthew J.@: Mellon,
Jordan Mendelson,
Ted Mielczarek,
Robert Millan,
Lin Zhe Min,
Jan Minar,
Tim Mooney,
Keith Moore,
Adam D.@: Moss,
Simon Munton,
Charlie Negyesi,
R.@: K.@: Owen,
Jim Paris,
Kenny Parnell,
Leonid Petrov,
Simone Piunno,
Andrew Pollock,
Steve Pothier,
@iftex
Jan P@v{r}ikryl,
@end iftex
@ifnottex
Jan Prikryl,
@end ifnottex
Marin Purgar,
@iftex
Csaba R@'{a}duly,
@end iftex
@ifnottex
Csaba Raduly,
@end ifnottex
Keith Refson,
Bill Richardson,
Tyler Riddle,
Tobias Ringstrom,
Jochen Roderburg,
@c Texinfo doesn't grok @'{@i}, so we have to use TeX itself.
@tex
Juan Jos\'{e} Rodr\'{\i}guez,
@end tex
@ifnottex
Juan Jose Rodriguez,
@end ifnottex
Maciej W.@: Rozycki,
Edward J.@: Sabol,
Heinz Salzmann,
Robert Schmidt,
Nicolas Schodet,
Benno Schulenberg,
Andreas Schwab,
Steven M.@: Schweda,
Chris Seawood,
Pranab Shenoy,
Dennis Smit,
Toomas Soome,
Tage Stabell-Kulo,
Philip Stadermann,
Daniel Stenberg,
Sven Sternberger,
Markus Strasser,
John Summerfield,
Szakacsits Szabolcs,
Mike Thomas,
Philipp Thomas,
Mauro Tortonesi,
Dave Turner,
Gisle Vanem,
Rabin Vincent,
Russell Vincent,
@iftex
@v{Z}eljko Vrba,
@end iftex
@ifnottex
Zeljko Vrba,
@end ifnottex
Charles G Waldman,
Douglas E.@: Wegscheid,
Ralf Wildenhues,
Joshua David Williams,
Benjamin Wolsey,
Saint Xavier,
YAMAZAKI Makoto,
Jasmin Zainul,
@iftex
Bojan @v{Z}drnja,
@end iftex
@ifnottex
Bojan Zdrnja,
@end ifnottex
Kristijan Zimmer,
Xin Zou.

Apologies to all whom I accidentally left out, and many thanks to all the
subscribers of the Wget mailing list.

@node Copying this manual, Concept Index, Appendices, Top
@appendix Copying this manual

@menu
* GNU Free Documentation License:: License for copying this manual.
@end menu

@node GNU Free Documentation License, , Copying this manual, Copying this manual
@appendixsec GNU Free Documentation License
@cindex FDL, GNU Free Documentation License

@include fdl.texi

@node Concept Index, , Copying this manual, Top
@unnumbered Concept Index
@printindex cp

@contents

@bye
