This is wget.info, produced by makeinfo version 4.3 from ./wget.texi.

INFO-DIR-SECTION Network Applications
* Wget: (wget).         The non-interactive network downloader.

This file documents the GNU Wget utility for downloading network
data.

Copyright (C) 1996, 1997, 1998, 2000, 2001, 2002, 2003 Free Software
Foundation, Inc.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "GNU General Public License" and "GNU Free
Documentation License", with no Front-Cover Texts, and with no
Back-Cover Texts.  A copy of the license is included in the section
entitled "GNU Free Documentation License".

File: wget.info,  Node: Top,  Next: Overview,  Prev: (dir),  Up: (dir)

Wget 1.9.1
**********

This manual documents version 1.9.1 of GNU Wget, the freely
available utility for network downloads.

   Copyright (C) 1996, 1997, 1998, 2000, 2001, 2003 Free Software
Foundation, Inc.

* Menu:

* Overview::            Features of Wget.
* Invoking::            Wget command-line arguments.
* Recursive Retrieval:: Description of recursive retrieval.
* Following Links::     The available methods of chasing links.
* Time-Stamping::       Mirroring according to time-stamps.
* Startup File::        Wget's initialization file.
* Examples::            Examples of usage.
* Various::             The stuff that doesn't fit anywhere else.
* Appendices::          Some useful references.
* Copying::             You may give out copies of Wget and of this manual.
* Concept Index::       Topics covered by this manual.

File: wget.info,  Node: Overview,  Next: Invoking,  Prev: Top,  Up: Top

Overview
********

GNU Wget is a free utility for non-interactive download of files from
the Web.  It supports HTTP, HTTPS, and FTP protocols, as well as
retrieval through HTTP proxies.

   This chapter is a partial overview of Wget's features.

   * Wget is non-interactive, meaning that it can work in the
     background while the user is not logged on.  This allows you to
     start a retrieval and disconnect from the system, letting Wget
     finish the work.  By contrast, most Web browsers require the
     user's constant presence, which can be a great hindrance when
     transferring a lot of data.

   * Wget can follow links in HTML and XHTML pages and create local
     versions of remote web sites, fully recreating the directory
     structure of the original site.  This is sometimes referred to as
     "recursive downloading."  While doing that, Wget respects the
     Robot Exclusion Standard (`/robots.txt').  Wget can be instructed
     to convert the links in downloaded HTML files to the local files
     for offline viewing.

   * File name wildcard matching and recursive mirroring of directories
     are available when retrieving via FTP.  Wget can read the
     time-stamp information given by both HTTP and FTP servers, and
     store it locally.  Thus Wget can see if the remote file has
     changed since the last retrieval, and automatically retrieve the
     new version if it has.  This makes Wget suitable for mirroring of
     FTP sites, as well as home pages.
   * Wget has been designed for robustness over slow or unstable network
     connections; if a download fails due to a network problem, it will
     keep retrying until the whole file has been retrieved.  If the
     server supports regetting, it will instruct the server to continue
     the download from where it left off.

   * Wget supports proxy servers, which can lighten the network load,
     speed up retrieval, and provide access behind firewalls.  However,
     if you are behind a firewall that requires a socks-style gateway,
     you can get the socks library and build Wget with support for
     socks.  Wget also supports the passive FTP downloading as an
     option.

   * Built-in features offer mechanisms to tune which links you wish to
     follow (*note Following Links::).

   * The retrieval is conveniently traced with printing dots, each dot
     representing a fixed amount of data received (1KB by default).
     These representations can be customized to your preferences.

   * Most of the features are fully configurable, either through
     command-line options, or via the initialization file `.wgetrc'
     (*note Startup File::).  Wget allows you to define "global"
     startup files (`/usr/local/etc/wgetrc' by default) for site
     settings.

   * Finally, GNU Wget is free software.  This means that everyone may
     use it, redistribute it and/or modify it under the terms of the
     GNU General Public License, as published by the Free Software
     Foundation (*note Copying::).

File: wget.info,  Node: Invoking,  Next: Recursive Retrieval,  Prev: Overview,  Up: Top

Invoking
********

By default, Wget is very simple to invoke.  The basic syntax is:

     wget [OPTION]... [URL]...

   Wget will simply download all the URLs specified on the command
line.  URL is a "Uniform Resource Locator", as defined below.

   However, you may wish to change some of the default parameters of
Wget.  You can do it two ways: permanently, adding the appropriate
command to `.wgetrc' (*note Startup File::), or specifying it on the
command line.

* Menu:

* URL Format::
* Option Syntax::
* Basic Startup Options::
* Logging and Input File Options::
* Download Options::
* Directory Options::
* HTTP Options::
* FTP Options::
* Recursive Retrieval Options::
* Recursive Accept/Reject Options::

File: wget.info,  Node: URL Format,  Next: Option Syntax,  Prev: Invoking,  Up: Invoking

URL Format
==========

   "URL" is an acronym for Uniform Resource Locator.  A uniform
resource locator is a compact string representation for a resource
available via the Internet.  Wget recognizes the URL syntax as per
RFC1738.  This is the most widely used form (square brackets denote
optional parts):

     http://host[:port]/directory/file
     ftp://host[:port]/directory/file

   You can also encode your username and password within a URL:

     ftp://user:password@host/path
     http://user:password@host/path

   Either USER or PASSWORD, or both, may be left out.  If you leave out
either the HTTP username or password, no authentication will be sent.
If you leave out the FTP username, `anonymous' will be used.  If you
leave out the FTP password, your email address will be supplied as a
default password.(1)

   *Important Note*: if you specify a password-containing URL on the
command line, the username and password will be plainly visible to all
users on the system, by way of `ps'.  On multi-user systems, this is a
big security risk.  To work around it, use `wget -i -' and feed the
URLs to Wget's standard input, each on a separate line, terminated by
`C-d'.
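
   For example, a password-containing URL can be supplied on Wget's
standard input instead of as an argument, keeping it out of `ps'
output (the host and credentials here are placeholders):

     echo 'ftp://user:secret@host/path' | wget -i -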
   You can encode unsafe characters in a URL as `%xy', `xy' being the
hexadecimal representation of the character's ASCII value.  Some common
unsafe characters include `%' (quoted as `%25'), `:' (quoted as `%3A'),
and `@' (quoted as `%40').  Refer to RFC1738 for a comprehensive list
of unsafe characters.
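
   For instance, a password containing `@' must be %-quoted so that
Wget does not mistake it for the host separator (the credentials shown
are placeholders):

     http://user:p%40ss@host/path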
   Wget also supports the `type' feature for FTP URLs.  By default, FTP
documents are retrieved in binary mode (type `i'), which means that
they are downloaded unchanged.  Another useful mode is the `a'
("ASCII") mode, which converts the line delimiters between the
different operating systems, and is thus useful for text files.  Here
is an example:

     ftp://host/directory/file;type=a

   Two alternative variants of URL specification are also supported,
because of historical (hysterical?) reasons and their widespread use.

   FTP-only syntax (supported by `NcFTP'):
     host.dom/dir/file

   HTTP-only syntax (introduced by `Netscape'):
     host[:port]/dir/file

   These two alternative forms are deprecated, and may cease being
supported in the future.

   If you do not understand the difference between these notations, or
do not know which one to use, just use the plain ordinary format you use
with your favorite browser, like `Lynx' or `Netscape'.

   ---------- Footnotes ----------

   (1) If you have a `.netrc' file in your home directory, password
will also be searched for there.

File: wget.info,  Node: Option Syntax,  Next: Basic Startup Options,  Prev: URL Format,  Up: Invoking

Option Syntax
=============

   Since Wget uses GNU getopt to process its arguments, every option
has a short form and a long form.  Long options are more convenient to
remember, but take time to type.  You may freely mix different option
styles, or specify options after the command-line arguments.  Thus you
may write:

     wget -r --tries=10 http://fly.srk.fer.hr/ -o log

   The space between the option accepting an argument and the argument
may be omitted.  Instead of `-o log' you can write `-olog'.

   You may put several options that do not require arguments together,
like:

     wget -drc URL

   This is a complete equivalent of:

     wget -d -r -c URL

   Since the options can be specified after the arguments, you may
terminate them with `--'.  So the following will try to download URL
`-x', reporting failure to `log':

     wget -o log -- -x

   The options that accept comma-separated lists all respect the
convention that specifying an empty list clears its value.  This can be
useful to clear the `.wgetrc' settings.  For instance, if your `.wgetrc'
sets `exclude_directories' to `/cgi-bin', the following example will
first reset it, and then set it to exclude `/~nobody' and `/~somebody'.
You can also clear the lists in `.wgetrc' (*note Wgetrc Syntax::).

     wget -X '' -X /~nobody,/~somebody

File: wget.info,  Node: Basic Startup Options,  Next: Logging and Input File Options,  Prev: Option Syntax,  Up: Invoking

Basic Startup Options
=====================

`-V'
`--version'
     Display the version of Wget.

`-h'
`--help'
     Print a help message describing all of Wget's command-line
     options.

`-b'
`--background'
     Go to background immediately after startup.  If no output file is
     specified via the `-o', output is redirected to `wget-log'.

`-e COMMAND'
`--execute COMMAND'
     Execute COMMAND as if it were a part of `.wgetrc' (*note Startup
     File::).  A command thus invoked will be executed _after_ the
     commands in `.wgetrc', thus taking precedence over them.

File: wget.info,  Node: Logging and Input File Options,  Next: Download Options,  Prev: Basic Startup Options,  Up: Invoking

Logging and Input File Options
==============================

`-o LOGFILE'
`--output-file=LOGFILE'
     Log all messages to LOGFILE.  The messages are normally reported
     to standard error.

`-a LOGFILE'
`--append-output=LOGFILE'
     Append to LOGFILE.  This is the same as `-o', only it appends to
     LOGFILE instead of overwriting the old log file.  If LOGFILE does
     not exist, a new file is created.

`-d'
`--debug'
     Turn on debug output, meaning various information important to the
     developers of Wget if it does not work properly.  Your system
     administrator may have chosen to compile Wget without debug
     support, in which case `-d' will not work.  Please note that
     compiling with debug support is always safe--Wget compiled with
     the debug support will _not_ print any debug info unless requested
     with `-d'.  *Note Reporting Bugs::, for more information on how to
     use `-d' for sending bug reports.

`-q'
`--quiet'
     Turn off Wget's output.

`-v'
`--verbose'
     Turn on verbose output, with all the available data.  The default
     output is verbose.

`-nv'
`--non-verbose'
     Non-verbose output--turn off verbose without being completely quiet
     (use `-q' for that), which means that error messages and basic
     information still get printed.

`-i FILE'
`--input-file=FILE'
     Read URLs from FILE, in which case no URLs need to be on the
     command line.  If there are URLs both on the command line and in
     an input file, those on the command lines will be the first ones to
     be retrieved.  The FILE need not be an HTML document (but no harm
     if it is)--it is enough if the URLs are just listed sequentially.

     However, if you specify `--force-html', the document will be
     regarded as `html'.  In that case you may have problems with
     relative links, which you can solve either by adding `<base
     href="URL">' to the documents or by specifying `--base=URL' on the
     command line.

`-F'
`--force-html'
     When input is read from a file, force it to be treated as an HTML
     file.  This enables you to retrieve relative links from existing
     HTML files on your local disk, by adding `<base href="URL">' to
     HTML, or using the `--base' command-line option.

`-B URL'
`--base=URL'
     When used in conjunction with `-F', prepends URL to relative links
     in the file specified by `-i'.

File: wget.info,  Node: Download Options,  Next: Directory Options,  Prev: Logging and Input File Options,  Up: Invoking

Download Options
================

`--bind-address=ADDRESS'
     When making client TCP/IP connections, `bind()' to ADDRESS on the
     local machine.  ADDRESS may be specified as a hostname or IP
     address.  This option can be useful if your machine is bound to
     multiple IPs.

`-t NUMBER'
`--tries=NUMBER'
     Set number of retries to NUMBER.  Specify 0 or `inf' for infinite
     retrying.  The default is to retry 20 times, with the exception of
     fatal errors like "connection refused" or "not found" (404), which
     are not retried.

`-O FILE'
`--output-document=FILE'
     The documents will not be written to the appropriate files, but
     all will be concatenated together and written to FILE.  If FILE
     already exists, it will be overwritten.  If the FILE is `-', the
     documents will be written to standard output.  Including this
     option automatically sets the number of tries to 1.
`-nc'
`--no-clobber'
     If a file is downloaded more than once in the same directory,
     Wget's behavior depends on a few options, including `-nc'.  In
     certain cases, the local file will be "clobbered", or overwritten,
     upon repeated download.  In other cases it will be preserved.

     When running Wget without `-N', `-nc', or `-r', downloading the
     same file in the same directory will result in the original copy
     of FILE being preserved and the second copy being named `FILE.1'.
     If that file is downloaded yet again, the third copy will be named
     `FILE.2', and so on.  When `-nc' is specified, this behavior is
     suppressed, and Wget will refuse to download newer copies of
     `FILE'.  Therefore, "`no-clobber'" is actually a misnomer in this
     mode--it's not clobbering that's prevented (as the numeric
     suffixes were already preventing clobbering), but rather the
     multiple version saving that's prevented.

     When running Wget with `-r', but without `-N' or `-nc',
     re-downloading a file will result in the new copy simply
     overwriting the old.  Adding `-nc' will prevent this behavior,
     instead causing the original version to be preserved and any newer
     copies on the server to be ignored.

     When running Wget with `-N', with or without `-r', the decision as
     to whether or not to download a newer copy of a file depends on
     the local and remote timestamp and size of the file (*note
     Time-Stamping::).  `-nc' may not be specified at the same time as
     `-N'.

     Note that when `-nc' is specified, files with the suffixes `.html'
     or (yuck) `.htm' will be loaded from the local disk and parsed as
     if they had been retrieved from the Web.
`-c'
`--continue'
     Continue getting a partially-downloaded file.  This is useful when
     you want to finish up a download started by a previous instance of
     Wget, or by another program.  For instance:

          wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z

     If there is a file named `ls-lR.Z' in the current directory, Wget
     will assume that it is the first portion of the remote file, and
     will ask the server to continue the retrieval from an offset equal
     to the length of the local file.

     Note that you don't need to specify this option if you just want
     the current invocation of Wget to retry downloading a file should
     the connection be lost midway through.  This is the default
     behavior.  `-c' only affects resumption of downloads started
     _prior_ to this invocation of Wget, and whose local files are
     still sitting around.

     Without `-c', the previous example would just download the remote
     file to `ls-lR.Z.1', leaving the truncated `ls-lR.Z' file alone.

     Beginning with Wget 1.7, if you use `-c' on a non-empty file, and
     it turns out that the server does not support continued
     downloading, Wget will refuse to start the download from scratch,
     which would effectively ruin existing contents.  If you really
     want the download to start from scratch, remove the file.

     Also beginning with Wget 1.7, if you use `-c' on a file which is of
     equal size as the one on the server, Wget will refuse to download
     the file and print an explanatory message.  The same happens when
     the file is smaller on the server than locally (presumably because
     it was changed on the server since your last download
     attempt)--because "continuing" is not meaningful, no download
     occurs.

     On the other side of the coin, while using `-c', any file that's
     bigger on the server than locally will be considered an incomplete
     download and only `(length(remote) - length(local))' bytes will be
     downloaded and tacked onto the end of the local file.  This
     behavior can be desirable in certain cases--for instance, you can
     use `wget -c' to download just the new portion that's been
     appended to a data collection or log file.

     However, if the file is bigger on the server because it's been
     _changed_, as opposed to just _appended_ to, you'll end up with a
     garbled file.  Wget has no way of verifying that the local file is
     really a valid prefix of the remote file.  You need to be
     especially careful of this when using `-c' in conjunction with
     `-r', since every file will be considered as an "incomplete
     download" candidate.

     Another instance where you'll get a garbled file if you try to use
     `-c' is if you have a lame HTTP proxy that inserts a "transfer
     interrupted" string into the local file.  In the future a
     "rollback" option may be added to deal with this case.

     Note that `-c' only works with FTP servers and with HTTP servers
     that support the `Range' header.
`--progress=TYPE'
     Select the type of the progress indicator you wish to use.  Legal
     indicators are "dot" and "bar".

     The "bar" indicator is used by default.  It draws an ASCII progress
     bar graphics (a.k.a "thermometer" display) indicating the status of
     retrieval.  If the output is not a TTY, the "dot" bar will be used
     by default.

     Use `--progress=dot' to switch to the "dot" display.  It traces
     the retrieval by printing dots on the screen, each dot
     representing a fixed amount of downloaded data.

     When using the dotted retrieval, you may also set the "style" by
     specifying the type as `dot:STYLE'.  Different styles assign
     different meaning to one dot.  With the `default' style each dot
     represents 1K, there are ten dots in a cluster and 50 dots in a
     line.  The `binary' style has a more "computer"-like
     orientation--8K dots, 16-dots clusters and 48 dots per line (which
     makes for 384K lines).  The `mega' style is suitable for
     downloading very large files--each dot represents 64K retrieved,
     there are eight dots in a cluster, and 48 dots on each line (so
     each line contains 3M).

     Note that you can set the default style using the `progress'
     command in `.wgetrc'.  That setting may be overridden from the
     command line.  The exception is that, when the output is not a
     TTY, the "dot" progress will be favored over "bar".  To force the
     bar output, use `--progress=bar:force'.
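
     For instance, a very large download could use the `mega' dot
     style described above (the URL is a placeholder):

          wget --progress=dot:mega http://host/big-file.iso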
`-N'
`--timestamping'
     Turn on time-stamping.  *Note Time-Stamping::, for details.

`-S'
`--server-response'
     Print the headers sent by HTTP servers and responses sent by FTP
     servers.

`--spider'
     When invoked with this option, Wget will behave as a Web "spider",
     which means that it will not download the pages, just check that
     they are there.  For example, you can use Wget to check your
     bookmarks:

          wget --spider --force-html -i bookmarks.html

     This feature needs much more work for Wget to get close to the
     functionality of real web spiders.

`-T SECONDS'
`--timeout=SECONDS'
     Set the network timeout to SECONDS seconds.  This is equivalent to
     specifying `--dns-timeout', `--connect-timeout', and
     `--read-timeout', all at the same time.

     Whenever Wget connects to or reads from a remote host, it checks
     for a timeout and aborts the operation if the time expires.  This
     prevents anomalous occurrences such as hanging reads or infinite
     connects.  The only timeout enabled by default is a 900-second
     timeout for reading.  Setting timeout to 0 disables checking for
     timeouts.

     Unless you know what you are doing, it is best not to set any of
     the timeout-related options.
`--dns-timeout=SECONDS'
     Set the DNS lookup timeout to SECONDS seconds.  DNS lookups that
     don't complete within the specified time will fail.  By default,
     there is no timeout on DNS lookups, other than that implemented by
     system libraries.

`--connect-timeout=SECONDS'
     Set the connect timeout to SECONDS seconds.  TCP connections that
     take longer to establish will be aborted.  By default, there is no
     connect timeout, other than that implemented by system libraries.

`--read-timeout=SECONDS'
     Set the read (and write) timeout to SECONDS seconds.  Reads that
     take longer will fail.  The default value for read timeout is 900
     seconds.
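
     As a sketch, the three granular timeouts can be combined instead
     of the blanket `-T' (the URL is a placeholder):

          wget --dns-timeout=10 --connect-timeout=10 \
               --read-timeout=300 http://host/file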
`--limit-rate=AMOUNT'
     Limit the download speed to AMOUNT bytes per second.  Amount may
     be expressed in bytes, kilobytes with the `k' suffix, or megabytes
     with the `m' suffix.  For example, `--limit-rate=20k' will limit
     the retrieval rate to 20KB/s.  This kind of thing is useful when,
     for whatever reason, you don't want Wget to consume the entire
     available bandwidth.

     Note that Wget implements the limiting by sleeping the appropriate
     amount of time after a network read that took less time than
     specified by the rate.  Eventually this strategy causes the TCP
     transfer to slow down to approximately the specified rate.
     However, it may take some time for this balance to be achieved, so
     don't be surprised if limiting the rate doesn't work well with
     very small files.
`-w SECONDS'
`--wait=SECONDS'
     Wait the specified number of seconds between the retrievals.  Use
     of this option is recommended, as it lightens the server load by
     making the requests less frequent.  Instead of in seconds, the
     time can be specified in minutes using the `m' suffix, in hours
     using the `h' suffix, or in days using the `d' suffix.

     Specifying a large value for this option is useful if the network
     or the destination host is down, so that Wget can wait long enough
     to reasonably expect the network error to be fixed before the
     retry.
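
     For instance, the wait between retrievals can be given in seconds
     or, with a suffix, in larger units (the URL is a placeholder):

          wget -w 2 -r http://host/
          wget --wait=1m -r http://host/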
`--waitretry=SECONDS'
     If you don't want Wget to wait between _every_ retrieval, but only
     between retries of failed downloads, you can use this option.
     Wget will use "linear backoff", waiting 1 second after the first
     failure on a given file, then waiting 2 seconds after the second
     failure on that file, up to the maximum number of SECONDS you
     specify.  Therefore, a value of 10 will actually make Wget wait up
     to (1 + 2 + ... + 10) = 55 seconds per file.

     Note that this option is turned on by default in the global
     `wgetrc' file.
`--random-wait'
     Some web sites may perform log analysis to identify retrieval
     programs such as Wget by looking for statistically significant
     similarities in the time between requests.  This option causes the
     time between requests to vary between 0 and 2 * WAIT seconds,
     where WAIT was specified using the `--wait' option, in order to
     mask Wget's presence from such analysis.
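
     For instance, combined with `--wait', a recursive retrieval might
     look like this (the URL is a placeholder); each pause then varies
     between 0 and 4 seconds:

          wget --wait=2 --random-wait -r http://host/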
     A recent article in a publication devoted to development on a
     popular consumer platform provided code to perform this analysis
     on the fly.  Its author suggested blocking at the class C address
     level to ensure automated retrieval programs were blocked despite
     changing DHCP-supplied addresses.

     The `--random-wait' option was inspired by this ill-advised
     recommendation to block many unrelated users from a web site due
     to the actions of one.

`-Y on/off'
`--proxy=on/off'
     Turn proxy support on or off.  The proxy is on by default if the
     appropriate environment variable is defined.

     For more information about the use of proxies with Wget, *Note
     Proxies::.
`-Q QUOTA'
`--quota=QUOTA'
     Specify download quota for automatic retrievals.  The value can be
     specified in bytes (default), kilobytes (with `k' suffix), or
     megabytes (with `m' suffix).

     Note that quota will never affect downloading a single file.  So
     if you specify `wget -Q10k ftp://wuarchive.wustl.edu/ls-lR.gz',
     all of the `ls-lR.gz' will be downloaded.  The same goes even when
     several URLs are specified on the command-line.  However, quota is
     respected when retrieving either recursively, or from an input
     file.  Thus you may safely type `wget -Q2m -i sites'--download
     will be aborted when the quota is exceeded.

     Setting quota to 0 or to `inf' unlimits the download quota.
`--dns-cache=off'
     Turn off caching of DNS lookups.  Normally, Wget remembers the
     addresses it looked up from DNS so it doesn't have to repeatedly
     contact the DNS server for the same (typically small) set of
     addresses it retrieves from.  This cache exists in memory only; a
     new Wget run will contact DNS again.

     However, in some cases it is not desirable to cache host names,
     even for the duration of a short-running application like Wget.
     For example, some HTTP servers are hosted on machines with
     dynamically allocated IP addresses that change from time to time.
     Their DNS entries are updated along with each change.  When Wget's
     download from such a host gets interrupted by IP address change,
     Wget retries the download, but (due to DNS caching) it contacts
     the old address.  With the DNS cache turned off, Wget will repeat
     the DNS lookup for every connect and will thus get the correct
     dynamic address every time--at the cost of additional DNS lookups
     where they're probably not needed.

     If you don't understand the above description, you probably won't
     need this option.
`--restrict-file-names=MODE'
     Change which characters found in remote URLs may show up in local
     file names generated from those URLs.  Characters that are
     "restricted" by this option are escaped, i.e. replaced with `%HH',
     where `HH' is the hexadecimal number that corresponds to the
     restricted character.

     By default, Wget escapes the characters that are not valid as part
     of file names on your operating system, as well as control
     characters that are typically unprintable.  This option is useful
     for changing these defaults, either because you are downloading to
     a non-native partition, or because you want to disable escaping of
     the control characters.

     When mode is set to "unix", Wget escapes the character `/' and the
     control characters in the ranges 0-31 and 128-159.  This is the
     default on Unix-like OS'es.

     When mode is set to "windows", Wget escapes the characters `\',
     `|', `/', `:', `?', `"', `*', `<', `>', and the control characters
     in the ranges 0-31 and 128-159.  In addition to this, Wget in
     Windows mode uses `+' instead of `:' to separate host and port in
     local file names, and uses `@' instead of `?' to separate the
     query portion of the file name from the rest.  Therefore, a URL
     that would be saved as `www.xemacs.org:4300/search.pl?input=blah'
     in Unix mode would be saved as
     `www.xemacs.org+4300/search.pl@input=blah' in Windows mode.  This
     mode is the default on Windows.

     If you append `,nocontrol' to the mode, as in `unix,nocontrol',
     escaping of the control characters is also switched off.  You can
     use `--restrict-file-names=nocontrol' to turn off escaping of
     control characters without affecting the choice of the OS to use
     as file name restriction mode.
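
     For instance, when downloading from a Unix machine onto a
     FAT-mounted partition, one might force the Windows restrictions
     (the URL is a placeholder):

          wget --restrict-file-names=windows http://host/file?a=b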

File: wget.info,  Node: Directory Options,  Next: HTTP Options,  Prev: Download Options,  Up: Invoking

Directory Options
=================

`-nd'
`--no-directories'
     Do not create a hierarchy of directories when retrieving
     recursively.  With this option turned on, all files will get saved
     to the current directory, without clobbering (if a name shows up
     more than once, the filenames will get extensions `.n').

`-x'
`--force-directories'
     The opposite of `-nd'--create a hierarchy of directories, even if
     one would not have been created otherwise.  E.g. `wget -x
     http://fly.srk.fer.hr/robots.txt' will save the downloaded file to
     `fly.srk.fer.hr/robots.txt'.

`-nH'
`--no-host-directories'
     Disable generation of host-prefixed directories.  By default,
     invoking Wget with `-r http://fly.srk.fer.hr/' will create a
     structure of directories beginning with `fly.srk.fer.hr/'.  This
     option disables such behavior.

`--cut-dirs=NUMBER'
     Ignore NUMBER directory components.  This is useful for getting a
     fine-grained control over the directory where recursive retrieval
     will be saved.

     Take, for example, the directory at
     `ftp://ftp.xemacs.org/pub/xemacs/'.  If you retrieve it with `-r',
     it will be saved locally under `ftp.xemacs.org/pub/xemacs/'.
     While the `-nH' option can remove the `ftp.xemacs.org/' part, you
     are still stuck with `pub/xemacs'.  This is where `--cut-dirs'
     comes in handy; it makes Wget not "see" NUMBER remote directory
     components.  Here are several examples of how the `--cut-dirs'
     option works.

          No options        -> ftp.xemacs.org/pub/xemacs/
          -nH               -> pub/xemacs/
          -nH --cut-dirs=1  -> xemacs/
          -nH --cut-dirs=2  -> .

          --cut-dirs=1      -> ftp.xemacs.org/xemacs/
          ...

     If you just want to get rid of the directory structure, this
     option is similar to a combination of `-nd' and `-P'.  However,
     unlike `-nd', `--cut-dirs' does not lose with subdirectories--for
     instance, with `-nH --cut-dirs=1', a `beta/' subdirectory will be
     placed to `xemacs/beta', as one would expect.

`-P PREFIX'
`--directory-prefix=PREFIX'
     Set directory prefix to PREFIX.  The "directory prefix" is the
     directory where all other files and subdirectories will be saved
     to, i.e. the top of the retrieval tree.  The default is `.' (the
     current directory).
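
     For instance, to place a recursive retrieval under `/tmp/mirror'
     rather than the current directory (the URL is a placeholder):

          wget -r -P /tmp/mirror http://host/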

File: wget.info,  Node: HTTP Options,  Next: FTP Options,  Prev: Directory Options,  Up: Invoking

HTTP Options
============

`-E'
`--html-extension'
     If a file of type `application/xhtml+xml' or `text/html' is
     downloaded and the URL does not end with the regexp
     `\.[Hh][Tt][Mm][Ll]?', this option will cause the suffix `.html'
     to be appended to the local filename.  This is useful, for
     instance, when you're mirroring a remote site that uses `.asp'
     pages, but you want the mirrored pages to be viewable on your
     stock Apache server.  Another good use for this is when you're
     downloading CGI-generated materials.  A URL like
     `http://site.com/article.cgi?25' will be saved as
     `article.cgi?25.html'.

     Note that filenames changed in this way will be re-downloaded
     every time you re-mirror a site, because Wget can't tell that the
     local `X.html' file corresponds to remote URL `X' (since it
     doesn't yet know that the URL produces output of type `text/html'
     or `application/xhtml+xml').  To prevent this re-downloading, you
     must use `-k' and `-K' so that the original version of the file
     will be saved as `X.orig' (*note Recursive Retrieval Options::).

`--http-user=USER'
`--http-passwd=PASSWORD'
     Specify the username USER and password PASSWORD on an HTTP server.
     According to the type of the challenge, Wget will encode them
     using either the `basic' (insecure) or the `digest' authentication
     scheme.

     Another way to specify username and password is in the URL itself
     (*note URL Format::).  Either method reveals your password to
     anyone who bothers to run `ps'.  To prevent the passwords from
     being seen, store them in `.wgetrc' or `.netrc', and make sure to
     protect those files from other users with `chmod'.  If the
     passwords are really important, do not leave them lying in those
     files either--edit the files and delete them after Wget has
     started the download.

     For more information about security issues with Wget, *Note
     Security Considerations::.
`-C on/off'
`--cache=on/off'
     When set to off, disable server-side cache.  In this case, Wget
     will send the remote server an appropriate directive (`Pragma:
     no-cache') to get the file from the remote service, rather than
     returning the cached version.  This is especially useful for
     retrieving and flushing out-of-date documents on proxy servers.

     Caching is allowed by default.

`--cookies=on/off'
     When set to off, disable the use of cookies.  Cookies are a
     mechanism for maintaining server-side state.  The server sends the
     client a cookie using the `Set-Cookie' header, and the client
     responds with the same cookie upon further requests.  Since
     cookies allow the server owners to keep track of visitors and for
     sites to exchange this information, some consider them a breach of
     privacy.  The default is to use cookies; however, _storing_
     cookies is not on by default.
`--load-cookies FILE'
     Load cookies from FILE before the first HTTP retrieval.  FILE is a
     textual file in the format originally used by Netscape's
     `cookies.txt' file.

     You will typically use this option when mirroring sites that
     require that you be logged in to access some or all of their
     content.  The login process typically works by the web server
     issuing an HTTP cookie upon receiving and verifying your
     credentials.  The cookie is then resent by the browser when
     accessing that part of the site, and so proves your identity.

     Mirroring such a site requires Wget to send the same cookies your
     browser sends when communicating with the site.  This is achieved
     by `--load-cookies'--simply point Wget to the location of the
     `cookies.txt' file, and it will send the same cookies your browser
     would send in the same situation.  Different browsers keep textual
     cookie files in different locations:
    Netscape 4.x.
         The cookies are in `~/.netscape/cookies.txt'.

    Mozilla and Netscape 6.x.
         Mozilla's cookie file is also named `cookies.txt', located
         somewhere under `~/.mozilla', in the directory of your
         profile.  The full path usually ends up looking somewhat like
         `~/.mozilla/default/SOME-WEIRD-STRING/cookies.txt'.

    Internet Explorer.
         You can produce a cookie file Wget can use by using the File
         menu, Import and Export, Export Cookies.  This has been
         tested with Internet Explorer 5; it is not guaranteed to work
         with earlier versions.

    Other browsers.
         If you are using a different browser to create your cookies,
         `--load-cookies' will only work if you can locate or produce a
         cookie file in the Netscape format that Wget expects.
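     For reference when producing such a file by hand: each entry in
     the Netscape format is one line of seven tab-separated fields --
     domain, a TRUE/FALSE flag for whether subdomains match, path, a
     secure-only flag, expiry as a Unix timestamp, cookie name, and
     cookie value.  The sketch below writes one made-up entry (the
     host and values are hypothetical) and checks the field count:

```shell
# One made-up entry in Netscape cookies.txt format: domain, subdomain
# flag, path, secure flag, expiry (Unix time), name, value -- all
# separated by single tabs.
printf 'server.com\tFALSE\t/\tFALSE\t1999999999\tsession_id\tabc123\n' \
    > cookies.demo.txt

# A well-formed entry has exactly 7 tab-separated fields.
awk -F'\t' '{ print NF }' cookies.demo.txt
```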
     If you cannot use `--load-cookies', there might still be an
     alternative.  If your browser supports a "cookie manager", you can
     use it to view the cookies used when accessing the site you're
     mirroring.  Write down the name and value of the cookie, and
     manually instruct Wget to send those cookies, bypassing the
     "official" cookie support:

          wget --cookies=off --header "Cookie: NAME=VALUE"
`--save-cookies FILE'
     Save cookies to FILE at the end of session.  Cookies whose expiry
     time is not specified, or those that have already expired, are not
     saved.

`--ignore-length'
     Unfortunately, some HTTP servers (CGI programs, to be more
     precise) send out bogus `Content-Length' headers, which makes Wget
     go wild, as it thinks not all the document was retrieved.  You can
     spot this syndrome if Wget retries getting the same document again
     and again, each time claiming that the (otherwise normal)
     connection has closed on the very same byte.

     With this option, Wget will ignore the `Content-Length' header--as
     if it never existed.
`--header=ADDITIONAL-HEADER'
     Define an ADDITIONAL-HEADER to be passed to the HTTP servers.
     Headers must contain a `:' preceded by one or more non-blank
     characters, and must not contain newlines.

     You may define more than one additional header by specifying
     `--header' more than once.

          wget --header='Accept-Charset: iso-8859-2' \
               --header='Accept-Language: hr' \
                 http://fly.srk.fer.hr/

     Specification of an empty string as the header value will clear all
     previous user-defined headers.
`--proxy-user=USER'
`--proxy-passwd=PASSWORD'
     Specify the username USER and password PASSWORD for authentication
     on a proxy server.  Wget will encode them using the `basic'
     authentication scheme.

     Security considerations similar to those with `--http-passwd'
     pertain here as well.
`--referer=URL'
     Include `Referer: URL' header in HTTP request.  Useful for
     retrieving documents with server-side processing that assume they
     are always being retrieved by interactive web browsers and only
     come out properly when Referer is set to one of the pages that
     point to them.

`-s'
`--save-headers'
     Save the headers sent by the HTTP server to the file, preceding the
     actual contents, with an empty line as the separator.
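     One sketch of how such a file can be post-processed later,
     assuming this headers-blank-line-body layout (the file below is a
     simulated download, not real Wget output):

```shell
# Simulate a file saved with --save-headers: HTTP headers, then a
# blank separator line, then the document body.
printf 'HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>body</html>\n' \
    > saved.demo

# Print only the body: delete everything up to and including the first
# blank line (which may still carry a trailing carriage return).
sed '1,/^[[:space:]]*$/d' saved.demo
```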
`-U AGENT-STRING'
`--user-agent=AGENT-STRING'
     Identify as AGENT-STRING to the HTTP server.

     The HTTP protocol allows the clients to identify themselves using a
     `User-Agent' header field.  This enables distinguishing the WWW
     software, usually for statistical purposes or for tracing of
     protocol violations.  Wget normally identifies as `Wget/VERSION',
     VERSION being the current version number of Wget.

     However, some sites have been known to impose the policy of
     tailoring the output according to the `User-Agent'-supplied
     information.  While conceptually this is not such a bad idea, it
     has been abused by servers denying information to clients other
     than `Mozilla' or Microsoft `Internet Explorer'.  This option
     allows you to change the `User-Agent' line issued by Wget.  Use of
     this option is discouraged, unless you really know what you are
     doing.
`--post-data=STRING'
`--post-file=FILE'
     Use POST as the method for all HTTP requests and send the
     specified data in the request body.  `--post-data' sends STRING as
     data, whereas `--post-file' sends the contents of FILE.  Other than
     that, they work in exactly the same way.

     Please be aware that Wget needs to know the size of the POST data
     in advance.  Therefore the argument to `--post-file' must be a
     regular file; specifying a FIFO or something like `/dev/stdin'
     won't work.  It's not quite clear how to work around this
     limitation inherent in HTTP/1.0.  Although HTTP/1.1 introduces
     "chunked" transfer that doesn't require knowing the request length
     in advance, a client can't use chunked unless it knows it's
     talking to an HTTP/1.1 server.  And it can't know that until it
     receives a response, which in turn requires the request to have
     been completed--a chicken-and-egg problem.

     Note: if Wget is redirected after the POST request is completed,
     it will not send the POST data to the redirected URL.  This is
     because URLs that process POST often respond with a redirection to
     a regular page (although that's technically disallowed), which
     does not desire or accept POST.  It is not yet clear that this
     behavior is optimal; if it doesn't work out, it will be changed.

     This example shows how to log in to a server using POST and then
     proceed to download the desired pages, presumably only accessible
     to authorized users:

          # Log in to the server.  This can be done only once.
          wget --save-cookies cookies.txt \
               --post-data 'user=foo&password=bar' \
               http://server.com/auth.php

          # Now grab the page or pages we care about.
          wget --load-cookies cookies.txt \
               -p http://server.com/interesting/article.php
File: wget.info,  Node: FTP Options,  Next: Recursive Retrieval Options,  Prev: HTTP Options,  Up: Invoking

FTP Options
-----------

`-nr'
`--dont-remove-listing'
     Don't remove the temporary `.listing' files generated by FTP
     retrievals.  Normally, these files contain the raw directory
     listings received from FTP servers.  Not removing them can be
     useful for debugging purposes, or when you want to be able to
     easily check on the contents of remote server directories (e.g. to
     verify that a mirror you're running is complete).

     Note that even though Wget writes to a known filename for this
     file, this is not a security hole in the scenario of a user making
     `.listing' a symbolic link to `/etc/passwd' or something and
     asking `root' to run Wget in his or her directory.  Depending on
     the options used, either Wget will refuse to write to `.listing',
     making the globbing/recursion/time-stamping operation fail, or the
     symbolic link will be deleted and replaced with the actual
     `.listing' file, or the listing will be written to a
     `.listing.NUMBER' file.

     Even though this situation isn't a problem, though, `root' should
     never run Wget in a non-trusted user's directory.  A user could do
     something as simple as linking `index.html' to `/etc/passwd' and
     asking `root' to run Wget with `-N' or `-r' so the file will be
     overwritten.
`-g on/off'
`--glob=on/off'
     Turn FTP globbing on or off.  Globbing means you may use the
     shell-like special characters ("wildcards"), like `*', `?', `['
     and `]' to retrieve more than one file from the same directory at
     once, like:

          wget ftp://gnjilux.srk.fer.hr/*.msg

     By default, globbing will be turned on if the URL contains a
     globbing character.  This option may be used to turn globbing on
     or off permanently.

     You may have to quote the URL to protect it from being expanded by
     your shell.  Globbing makes Wget look for a directory listing,
     which is system-specific.  This is why it currently works only
     with Unix FTP servers (and the ones emulating Unix `ls' output).
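     The quoting advice can be seen without contacting any server: an
     unquoted pattern that matches local filenames is expanded by the
     shell before Wget ever runs.  The directory and filenames below
     are made up for illustration:

```shell
# Scratch directory with one file whose name matches the pattern.
mkdir -p glob.demo && cd glob.demo
touch local.msg

# Quoted: the shell passes the wildcard through to the command intact.
echo 'ftp://host.example/*.msg'

# Unquoted: the shell matches the pattern against local filenames
# first, so the command sees `local.msg' instead of the wildcard.
echo *.msg
```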
`--passive-ftp'
     Use the "passive" FTP retrieval scheme, in which the client
     initiates the data connection.  This is sometimes required for FTP
     to work behind firewalls.
`--retr-symlinks'
     Usually, when retrieving FTP directories recursively and a symbolic
     link is encountered, the linked-to file is not downloaded.
     Instead, a matching symbolic link is created on the local
     filesystem.  The pointed-to file will not be downloaded unless
     this recursive retrieval would have encountered it separately and
     downloaded it anyway.

     When `--retr-symlinks' is specified, however, symbolic links are
     traversed and the pointed-to files are retrieved.  At this time,
     this option does not cause Wget to traverse symlinks to
     directories and recurse through them, but in the future it should
     be enhanced to do this.

     Note that when retrieving a file (not a directory) because it was
     specified on the command-line, rather than because it was recursed
     to, this option has no effect.  Symbolic links are always
     traversed in this case.