~ubuntu-branches/ubuntu/trusty/wget/trusty-updates

Viewing changes to doc/wget.pod

Committer: Bazaar Package Importer
Author(s): Marc Deslauriers
Date: 2009-12-12 08:15:59 UTC
mfrom: (2.1.5 squeeze)
Revision ID: james.westby@ubuntu.com-20091212081559-mvccl4kzdqb138y3

Tags: 1.12-1.1ubuntu1

* Merge from debian testing, remaining changes:
  - Add wget-udeb to ship wget.gnu as alternative to busybox wget
    implementation.
* Keep build dependencies in main:
  - debian/control: remove info2man build-dep
  - debian/patches/00list: disable wget-infopod_generated_manpage.dpatch

files added:
ABOUT-NLS

GNUmakefile

Makefile.am

aclocal.m4

build-aux

build-aux/announce-gen

build-aux/build_info.pl

build-aux/compile

build-aux/config.guess

build-aux/config.rpath

build-aux/config.sub

build-aux/depcomp

build-aux/gnupload

build-aux/install-sh

build-aux/link-warning.h

build-aux/mdate-sh

build-aux/missing

build-aux/mkinstalldirs

build-aux/texinfo.tex

build-aux/update-copyright

build-aux/useless-if-before-free

build-aux/vc-list-files

build-aux/ylwrap

configure.ac

debian/patches/wget-infopod_generated_manpage.dpatch

doc/Makefile.am

doc/stamp-vti

lib/Makefile.am

lib/Makefile.in

lib/alloca.c

lib/alloca.in.h

lib/c-ctype.c

lib/c-ctype.h

lib/config.charset

lib/errno.in.h

lib/error.c

lib/error.h

lib/exitfail.c

lib/exitfail.h

lib/fseeko.c

lib/getdelim.c

lib/getline.c

lib/getopt.c

lib/getopt.in.h

lib/getopt1.c

lib/getopt_int.h

lib/getpagesize.c

lib/getpass.c

lib/getpass.h

lib/gettext.h

lib/intprops.h

lib/localcharset.c

lib/localcharset.h

lib/lseek.c

lib/mbrtowc.c

lib/mbsinit.c

lib/memchr.c

lib/memchr.valgrind

lib/quote.c

lib/quote.h

lib/quotearg.c

lib/quotearg.h

lib/realloc.c

lib/ref-add.sin

lib/ref-del.sin

lib/stdbool.in.h

lib/stddef.in.h

lib/stdint.in.h

lib/stdio-impl.h

lib/stdio-write.c

lib/stdio.in.h

lib/stdlib.in.h

lib/str-two-way.h

lib/strcasecmp.c

lib/strcasestr.c

lib/streq.h

lib/strerror.c

lib/string.in.h

lib/strings.in.h

lib/strncasecmp.c

lib/unistd.in.h

lib/verify.h

lib/wchar.in.h

lib/wctype.in.h

lib/xalloc-die.c

lib/xalloc.h

lib/xmalloc.c

m4/00gnulib.m4

m4/alloca.m4

m4/codeset.m4

m4/errno_h.m4

m4/error.m4

m4/exitfail.m4

m4/extensions.m4

m4/fseeko.m4

m4/getdelim.m4

m4/getline.m4

m4/getopt.m4

m4/getpagesize.m4

m4/getpass.m4

m4/gettext.m4

m4/glibc21.m4

m4/gnulib-common.m4

m4/gnulib-comp.m4

m4/iconv.m4

m4/include_next.m4

m4/inline.m4

m4/localcharset.m4

m4/locale-fr.m4

m4/locale-ja.m4

m4/locale-zh.m4

m4/longlong.m4

m4/lseek.m4

m4/malloc.m4

m4/mbrtowc.m4

m4/mbsinit.m4

m4/mbstate_t.m4

m4/memchr.m4

m4/mmap-anon.m4

m4/multiarch.m4

m4/nls.m4

m4/po.m4

m4/quote.m4

m4/quotearg.m4

m4/realloc.m4

m4/stdbool.m4

m4/stddef_h.m4

m4/stdint.m4

m4/stdio_h.m4

m4/stdlib_h.m4

m4/strcase.m4

m4/strcasestr.m4

m4/strerror.m4

m4/string_h.m4

m4/strings_h.m4

m4/unistd_h.m4

m4/wchar.m4

m4/wchar_t.m4

m4/wctype.m4

m4/wint_t.m4

m4/xalloc.m4

maint.mk

md5/Makefile.am

md5/Makefile.in

md5/dummy.c

md5/m4

md5/m4/gnulib-cache.m4

md5/m4/gnulib-comp.m4

md5/m4/md5.m4

md5/md5.c

md5/md5.h

md5/stddef.in.h

md5/stdint.in.h

md5/wchar.in.h

po/Makevars

po/Rules-quot

po/be.gmo

po/bg.gmo

po/boldquot.sed

po/ca.gmo

po/cs.gmo

po/da.gmo

po/de.gmo

po/el.gmo

po/en@boldquot.gmo

po/en@boldquot.header

po/en@boldquot.po

po/en@quot.gmo

po/en@quot.header

po/en@quot.po

po/en_GB.gmo

po/en_US.gmo

po/en_US.po

po/eo.gmo

po/es.gmo

po/et.gmo

po/eu.gmo

po/fi.gmo

po/fr.gmo

po/ga.gmo

po/gl.gmo

po/he.gmo

po/hr.gmo

po/hu.gmo

po/id.gmo

po/insert-header.sin

po/it.gmo

po/ja.gmo

po/lt.gmo

po/lt.po

po/nb.gmo

po/nl.gmo

po/pl.gmo

po/pt.gmo

po/pt_BR.gmo

po/quot.sed

po/remove-potcdate.sin

po/ro.gmo

po/ru.gmo

po/sk.gmo

po/sl.gmo

po/sr.gmo

po/stamp-po

po/sv.gmo

po/tr.gmo

po/uk.gmo

po/vi.gmo

po/zh_CN.gmo

po/zh_TW.gmo

src/Makefile.am

src/build_info.c

src/build_info.c.in

src/css-tokens.h

src/css-url.c

src/css-url.h

src/css.c

src/css.l

src/exits.c

src/exits.h

src/gettext.h

src/html-url.h

src/iri.c

src/iri.h

tests/Makefile.am

tests/Test-N-no-info.px

tests/Test-N-smaller.px

tests/Test-O-nc.px

tests/Test-auth-no-challenge-url.px

tests/Test-auth-no-challenge.px

tests/Test-auth-with-content-disposition.px

tests/Test-c-shorter.px

tests/Test-cookies-401.px

tests/Test-cookies.px

tests/Test-ftp-bad-list.px

tests/Test-ftp-iri-disabled.px

tests/Test-ftp-iri-fallback.px

tests/Test-ftp-iri-recursive.px

tests/Test-ftp-iri.px

tests/Test-ftp-pasv-fail.px

tests/Test-ftp-recursive.px

tests/Test-idn-cmd.px

tests/Test-idn-headers.px

tests/Test-idn-meta.px

tests/Test-idn-robots.px

tests/Test-iri-disabled.px

tests/Test-iri-forced-remote.px

tests/Test-iri-list.px

tests/Test-iri-percent.px

tests/Test-iri.px

tests/Test-k.px

tests/Test-meta-robots.px

tests/Test-proxied-https-auth.px

tests/Test-proxy-auth-basic.px

tests/Test-restrict-ascii.px

tests/WgetFeature.cfg

tests/WgetFeature.pm

tests/certs/server-cert.pem

tests/certs/server-key.pem

tests/run-px

util/Makefile.am

util/trunc.c

windows/Makefile.am

files removed:
DISTFILES

autom4te.cache

config.rpath

configure.in

debian/patches/00template

debian/patches/security-CVE-2009-3490.dpatch

doc/texinfo.tex

doc/wget.1

doc/wget.pod

install-sh

mkinstalldirs

src/alloca.c

src/config-post.h

src/getopt.c

src/getopt.h

src/gnu-md5.c

src/gnu-md5.h

src/safe-ctype.c

src/safe-ctype.h

src/version.c

src/xmalloc.c

src/xmalloc.h

stamp-h.in

tests/README

tests/Test--spider--no-content-disposition-trivial.px

tests/Test--spider--no-content-disposition.px

tests/Test--spider-HTTP-Content-Disposition.px

tests/WgetTest.pm

files modified:
AUTHORS

ChangeLog

ChangeLog.README

INSTALL

MAILING-LIST

Makefile.in

NEWS

README

autogen.sh

config.guess *

config.sub *

configure

configure.bat

debian/changelog

debian/compat

debian/control

debian/copyright

debian/patches/00list

debian/patches/wget-doc-remove-usr-local-in-sample.wgetrc

debian/rules

doc/ChangeLog

doc/Makefile.in

doc/fdl.texi

doc/sample.wgetrc

doc/sample.wgetrc.munged_for_texi_inclusion

doc/texi2pod.pl

doc/version.texi

doc/wget.info

doc/wget.texi

m4/lib-ld.m4

m4/lib-link.m4

m4/lib-prefix.m4

m4/wget.m4

msdos/ChangeLog

msdos/Makefile.DJ

msdos/Makefile.WC

msdos/config.h

po/Makefile.in.in

po/POTFILES.in

po/be.po

po/bg.po

po/ca.po

po/cs.po

po/da.po

po/de.po

po/el.po

po/en_GB.po

po/eo.po

po/es.po

po/et.po

po/eu.po

po/fi.po

po/fr.po

po/ga.po

po/gl.po

po/he.po

po/hr.po

po/hu.po

po/id.po

po/it.po

po/ja.po

po/nb.po

po/nl.po

po/pl.po

po/pt.po

po/pt_BR.po

po/ro.po

po/ru.po

po/sk.po

po/sl.po

po/sr.po

po/sv.po

po/tr.po

po/uk.po

po/vi.po

po/wget.pot

po/zh_CN.po

po/zh_TW.po

src/ChangeLog

src/Makefile.in

src/cmpt.c

src/config.h.in

src/connect.c

src/connect.h

src/convert.c

src/convert.h

src/cookies.c

src/cookies.h

src/ftp-basic.c

src/ftp-ls.c

src/ftp-opie.c

src/ftp.c

src/ftp.h

src/gen-md5.c

src/gen-md5.h

src/gnutls.c

src/hash.c

src/hash.h

src/host.c

src/host.h

src/html-parse.c

src/html-parse.h

src/html-url.c

src/http-ntlm.c

src/http-ntlm.h

src/http.c

src/http.h

src/init.c

src/init.h

src/log.c

src/log.h

src/main.c

src/mswindows.c

src/mswindows.h

src/netrc.c

src/netrc.h

src/openssl.c

src/options.h

src/progress.c

src/progress.h

src/ptimer.c

src/ptimer.h

src/recur.c

src/recur.h

src/res.c

src/res.h

src/retr.c

src/retr.h

src/snprintf.c

src/spider.c

src/spider.h

src/ssl.h

src/sysdep.h

src/test.c

src/test.h

src/url.c

src/url.h

src/utils.c

src/utils.h

src/wget.h

tests/ChangeLog

tests/FTPServer.pm

tests/FTPTest.pm

tests/HTTPServer.pm

tests/HTTPTest.pm

tests/Makefile.in

tests/Test--no-content-disposition-trivial.px

tests/Test--no-content-disposition.px

tests/Test--spider-fail.px

tests/Test--spider-r--no-content-disposition-trivial.px

tests/Test--spider-r--no-content-disposition.px

tests/Test--spider-r-HTTP-Content-Disposition.px

tests/Test--spider-r.px

tests/Test--spider.px

tests/Test-E-k-K.px

tests/Test-E-k.px

tests/Test-HTTP-Content-Disposition-1.px

tests/Test-HTTP-Content-Disposition-2.px

tests/Test-HTTP-Content-Disposition.px

tests/Test-N--no-content-disposition-trivial.px

tests/Test-N--no-content-disposition.px

tests/Test-N-HTTP-Content-Disposition.px

tests/Test-N-current.px

tests/Test-N-old.px

tests/Test-N.px

tests/Test-O--no-content-disposition-trivial.px

tests/Test-O--no-content-disposition.px

tests/Test-O-HTTP-Content-Disposition.px

tests/Test-O-nonexisting.px

tests/Test-O.px

tests/Test-Restrict-Lowercase.px

tests/Test-Restrict-Uppercase.px

tests/Test-auth-basic.px

tests/Test-c-full.px

tests/Test-c-partial.px

tests/Test-c.px

tests/Test-ftp.px

tests/Test-nonexisting-quiet.px

tests/Test-noop.px

tests/Test-np.px

tests/WgetTest.pm.in

util/Makefile.in

util/README

util/rmold.pl

windows/ChangeLog

windows/Makefile.doc

windows/Makefile.in

windows/Makefile.src

windows/Makefile.top

windows/Makefile.top.bor

windows/Makefile.top.mingw

windows/README

windows/config-compiler.h

windows/config.h


doc/wget.pod

=head1 NAME

Wget - The non-interactive network downloader.

=head1 SYNOPSIS

wget [I<option>]... [I<URL>]...

=head1 DESCRIPTION

GNU Wget is a free utility for non-interactive download of files from
the Web. It supports HTTP, HTTPS, and FTP protocols, as
well as retrieval through HTTP proxies.

Wget is non-interactive, meaning that it can work in the background,
while the user is not logged on. This allows you to start a retrieval
and disconnect from the system, letting Wget finish the work. By
contrast, most Web browsers require the user's constant presence,
which can be a great hindrance when transferring a lot of data.

Wget can follow links in HTML and XHTML pages and create local
versions of remote web sites, fully recreating the directory structure of
the original site. This is sometimes referred to as "recursive
downloading." While doing that, Wget respects the Robot Exclusion
Standard (F</robots.txt>). Wget can be instructed to convert the
links in downloaded HTML files to point to the local files for offline
viewing.

Wget has been designed for robustness over slow or unstable network
connections; if a download fails due to a network problem, it will
keep retrying until the whole file has been retrieved. If the server
supports regetting, it will instruct the server to continue the
download from where it left off.

=head1 OPTIONS

=head2 Option Syntax

Since Wget uses GNU getopt to process command-line arguments, every
option has a long form along with the short one. Long options are
easier to remember, but take longer to type. You may freely
mix different option styles, or specify options after the command-line
arguments. Thus you may write:

        wget -r --tries=10 http://fly.srk.fer.hr/ -o log

The space between the option accepting an argument and the argument may
be omitted. Instead of B<-o log> you can write B<-olog>.

You may put several options that do not require arguments together,
like:

        wget -drc <URL>

This is a complete equivalent of:

        wget -d -r -c <URL>

Since the options can be specified after the arguments, you may
terminate them with B<-->. So the following will try to download
URL B<-x>, reporting failure to F<log>:

        wget -o log -- -x

The options that accept comma-separated lists all respect the convention
that specifying an empty list clears its value. This can be useful to
clear the F<.wgetrc> settings. For instance, if your F<.wgetrc>
sets C<exclude_directories> to F</cgi-bin>, the following
example will first reset it, and then set it to exclude F</~nobody>
and F</~somebody>. You can also clear the lists in F<.wgetrc>.

        wget -X '' -X /~nobody,/~somebody

Most options that do not accept arguments are I<boolean> options,
so named because their state can be captured with a yes-or-no
("boolean") variable. For example, B<--follow-ftp> tells Wget
to follow FTP links from HTML files and, on the other hand,
B<--no-glob> tells it not to perform file globbing on FTP URLs. A
boolean option is either I<affirmative> or I<negative>
(beginning with B<--no>). All such options share several
properties.

Unless stated otherwise, it is assumed that the default behavior is
the opposite of what the option accomplishes. For example, the
documented existence of B<--follow-ftp> assumes that the default
is to I<not> follow FTP links from HTML pages.

Affirmative options can be negated by prepending B<--no-> to
the option name; negative options can be negated by omitting the
B<--no-> prefix. This might seem superfluous---if the default for
an affirmative option is to not do something, then why provide a way
to explicitly turn it off? But the startup file may in fact change
the default. For instance, using C<follow_ftp = on> in
F<.wgetrc> makes Wget follow FTP links by default, and
using B<--no-follow-ftp> is the only way to restore the factory
default from the command line.

=head2 Basic Startup Options

=over 4

=item B<-V>

=item B<--version>

Display the version of Wget.

=item B<-h>

=item B<--help>

Print a help message describing all of Wget's command-line options.

=item B<-b>

=item B<--background>

Go to background immediately after startup. If no output file is
specified via B<-o>, output is redirected to F<wget-log>.

=item B<-e> I<command>

=item B<--execute> I<command>

Execute I<command> as if it were a part of F<.wgetrc>. A command thus invoked will be executed
I<after> the commands in F<.wgetrc>, thus taking precedence over
them. If you need to specify more than one wgetrc command, use multiple
instances of B<-e>.

=back

=head2 Logging and Input File Options

=over 4

=item B<-o> I<logfile>

=item B<--output-file=>I<logfile>

Log all messages to I<logfile>. The messages are normally reported
to standard error.

=item B<-a> I<logfile>

=item B<--append-output=>I<logfile>

Append to I<logfile>. This is the same as B<-o>, only it appends
to I<logfile> instead of overwriting the old log file. If
I<logfile> does not exist, a new file is created.

=item B<-d>

=item B<--debug>

Turn on debug output, meaning various information important to the
developers of Wget if it does not work properly. Your system
administrator may have chosen to compile Wget without debug support, in
which case B<-d> will not work. Please note that compiling with
debug support is always safe---Wget compiled with the debug support will
I<not> print any debug info unless requested with B<-d>.

=item B<-q>

=item B<--quiet>

Turn off Wget's output.

=item B<-v>

=item B<--verbose>

Turn on verbose output, with all the available data. The default output
is verbose.

=item B<-nv>

=item B<--no-verbose>

Turn off verbose without being completely quiet (use B<-q> for
that), which means that error messages and basic information still get
printed.

=item B<-i> I<file>

=item B<--input-file=>I<file>

Read URLs from I<file>. If B<-> is specified as
I<file>, URLs are read from the standard input. (Use
B<./-> to read from a file literally named B<->.)

If this option is used, no URLs need be present on the command
line. If there are URLs both on the command line and in an input
file, those on the command line will be the first ones to be
retrieved. The I<file> need not be an HTML document (but no
harm if it is)---it is enough if the URLs are just listed
sequentially.

However, if you specify B<--force-html>, the document will be
regarded as B<html>. In that case you may have problems with
relative links, which you can solve either by adding C<E<lt>base
href="I<url>"E<gt>> to the documents or by specifying
B<--base=>I<url> on the command line.

=item B<-F>

=item B<--force-html>

When input is read from a file, force it to be treated as an HTML
file. This enables you to retrieve relative links from existing
HTML files on your local disk, by adding C<E<lt>base
href="I<url>"E<gt>> to HTML, or using the B<--base> command-line
option.

=item B<-B> I<URL>

=item B<--base=>I<URL>

Prepends I<URL> to relative links read from the file specified with
the B<-i> option.

=back

=head2 Download Options

=over 4

=item B<--bind-address=>I<ADDRESS>

When making client TCP/IP connections, bind to I<ADDRESS> on
the local machine. I<ADDRESS> may be specified as a hostname or IP
address. This option can be useful if your machine is bound to multiple
IPs.

=item B<-t> I<number>

=item B<--tries=>I<number>

Set number of retries to I<number>. Specify 0 or B<inf> for
infinite retrying. The default is to retry 20 times, with the exception
of fatal errors like "connection refused" or "not found" (404),
which are not retried.

=item B<-O> I<file>

=item B<--output-document=>I<file>

The documents will not be written to the appropriate files, but all
will be concatenated together and written to I<file>. If B<->
is used as I<file>, documents will be printed to standard output,
disabling link conversion. (Use B<./-> to print to a file
literally named B<->.)

Use of B<-O> is I<not> intended to mean simply "use the name
I<file> instead of the one in the URL;" rather, it is
analogous to shell redirection:
B<wget -O file http://foo> is intended to work like
B<wget -O - http://foo E<gt> file>; F<file> will be truncated
immediately, and I<all> downloaded content will be written there.

For this reason, B<-N> (for timestamp-checking) is not supported
in combination with B<-O>: since I<file> is always newly
created, it will always have a very new timestamp. A warning will be
issued if this combination is used.

Similarly, using B<-r> or B<-p> with B<-O> may not work as
you expect: Wget won't just download the first file to I<file> and
then download the rest to their normal names: I<all> downloaded
content will be placed in I<file>. This was disabled in version
1.11, but has been reinstated (with a warning) in 1.11.2, as there are
some cases where this behavior can actually have some use.

Note that a combination with B<-k> is only permitted when
downloading a single document, as in that case it will just convert
all relative URIs to external ones; B<-k> makes no sense for
multiple URIs when they're all being downloaded to a single file.

=item B<-nc>

=item B<--no-clobber>

If a file is downloaded more than once in the same directory, Wget's
behavior depends on a few options, including B<-nc>. In certain
cases, the local file will be I<clobbered>, or overwritten, upon
repeated download. In other cases it will be preserved.

When running Wget without B<-N>, B<-nc>, B<-r>, or B<-p>,
downloading the same file in the same directory will result in the
original copy of I<file> being preserved and the second copy being
named I<file>B<.1>. If that file is downloaded yet again, the
third copy will be named I<file>B<.2>, and so on. When
B<-nc> is specified, this behavior is suppressed, and Wget will
refuse to download newer copies of I<file>. Therefore,
"C<no-clobber>" is actually a misnomer in this mode---it's not
clobbering that's prevented (as the numeric suffixes were already
preventing clobbering), but rather the multiple version saving that's
prevented.
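
The numeric-suffix rule described above can be sketched in shell. This
is only an illustration of the naming scheme, not Wget's own code; the
helper name C<next_free_name> is hypothetical.

```shell
#!/bin/sh
# Illustrative only: mimics the numeric-suffix rule described above.
# next_free_name is a hypothetical helper, not part of Wget.
next_free_name() {
    name=$1
    # No existing file: the plain name is used.
    if [ ! -e "$name" ]; then
        printf '%s\n' "$name"
        return
    fi
    # Otherwise try name.1, name.2, ... until one is unused.
    n=1
    while [ -e "$name.$n" ]; do
        n=$((n + 1))
    done
    printf '%s\n' "$name.$n"
}

# Usage: with index.html and index.html.1 already present, the next
# copy would be saved under the name index.html.2.
dir=$(mktemp -d)
touch "$dir/index.html" "$dir/index.html.1"
next_free_name "$dir/index.html"
rm -rf "$dir"
```

With B<-nc>, by contrast, no new name would be chosen at all: the
existing file is simply kept.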

When running Wget with B<-r> or B<-p>, but without B<-N>
or B<-nc>, re-downloading a file will result in the new copy
simply overwriting the old. Adding B<-nc> will prevent this
behavior, instead causing the original version to be preserved and any
newer copies on the server to be ignored.

When running Wget with B<-N>, with or without B<-r> or
B<-p>, the decision as to whether or not to download a newer copy
of a file depends on the local and remote timestamp and size of the
file. B<-nc> may not be specified at the
same time as B<-N>.

Note that when B<-nc> is specified, files with the suffixes
B<.html> or B<.htm> will be loaded from the local disk and
parsed as if they had been retrieved from the Web.

=item B<-c>

=item B<--continue>

Continue getting a partially-downloaded file. This is useful when you
want to finish up a download started by a previous instance of Wget, or
by another program. For instance:

        wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z

If there is a file named F<ls-lR.Z> in the current directory, Wget
will assume that it is the first portion of the remote file, and will
ask the server to continue the retrieval from an offset equal to the
length of the local file.

Note that you don't need to specify this option if you just want the
current invocation of Wget to retry downloading a file should the
connection be lost midway through. This is the default behavior.
B<-c> only affects resumption of downloads started I<prior> to
this invocation of Wget, and whose local files are still sitting around.

Without B<-c>, the previous example would just download the remote
file to F<ls-lR.Z.1>, leaving the truncated F<ls-lR.Z> file
alone.

Beginning with Wget 1.7, if you use B<-c> on a non-empty file, and
it turns out that the server does not support continued downloading,
Wget will refuse to start the download from scratch, which would
effectively ruin existing contents. If you really want the download to
start from scratch, remove the file.

Also beginning with Wget 1.7, if you use B<-c> on a file which is of
equal size as the one on the server, Wget will refuse to download the
file and print an explanatory message. The same happens when the file
is smaller on the server than locally (presumably because it was changed
on the server since your last download attempt)---because "continuing"
is not meaningful, no download occurs.

On the other side of the coin, while using B<-c>, any file that's
bigger on the server than locally will be considered an incomplete
download and only C<(length(remote) - length(local))> bytes will be
downloaded and tacked onto the end of the local file. This behavior can
be desirable in certain cases---for instance, you can use B<wget -c>
to download just the new portion that's been appended to a data
collection or log file.

However, if the file is bigger on the server because it's been
I<changed>, as opposed to just I<appended> to, you'll end up
with a garbled file. Wget has no way of verifying that the local file
is really a valid prefix of the remote file. You need to be especially
careful of this when using B<-c> in conjunction with B<-r>,
since every file will be considered as an "incomplete download" candidate.

Another instance where you'll get a garbled file if you try to use
B<-c> is if you have a lame HTTP proxy that inserts a
"transfer interrupted" string into the local file. In the future a
"rollback" option may be added to deal with this case.

Note that B<-c> only works with FTP servers and with HTTP
servers that support the C<Range> header.
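
The size comparison that B<-c> relies on, as described in the
paragraphs above, can be sketched as follows. This is a rough
illustration under the stated rules, not Wget's implementation; the
helper name C<bytes_to_fetch> is hypothetical and both sizes are
assumed to be known in bytes.

```shell
#!/bin/sh
# Illustrative sketch of the size comparison described for -c.
# bytes_to_fetch is a hypothetical helper; sizes are in bytes.
bytes_to_fetch() {
    local_size=$1
    remote_size=$2
    if [ "$remote_size" -le "$local_size" ]; then
        # Equal or smaller on the server: "continuing" is not
        # meaningful, so nothing is downloaded.
        echo 0
    else
        # Larger on the server: only the missing tail is requested.
        echo $((remote_size - local_size))
    fi
}

bytes_to_fetch 4096 10240    # prints 6144
bytes_to_fetch 10240 10240   # prints 0
```

Note that the sketch, like Wget itself, compares lengths only; it
cannot tell whether the local file is really a prefix of the remote one.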

=item B<--progress=>I<type>

Select the type of the progress indicator you wish to use. Legal
indicators are "dot" and "bar".

The "bar" indicator is used by default. It draws an ASCII progress
bar graphic (a.k.a. "thermometer" display) indicating the status of
retrieval. If the output is not a TTY, the "dot" indicator will be used by
default.

Use B<--progress=dot> to switch to the "dot" display. It traces
the retrieval by printing dots on the screen, each dot representing a
fixed amount of downloaded data.

When using the dotted retrieval, you may also set the I<style> by
specifying the type as B<dot:>I<style>. Different styles assign
different meaning to one dot. With the C<default> style each dot
represents 1K, there are ten dots in a cluster and 50 dots in a line.
The C<binary> style has a more "computer"-like orientation---8K
dots, 16-dot clusters and 48 dots per line (which makes for 384K
lines). The C<mega> style is suitable for downloading very large
files---each dot represents 64K retrieved, there are eight dots in a
cluster, and 48 dots on each line (so each line contains 3M).
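
The per-line totals quoted for the dot styles follow directly from the
dot size and the number of dots per line. A small shell sketch of that
arithmetic (the helper name C<line_kib> is hypothetical, and sizes are
taken in units of 1K as in the text):

```shell
#!/bin/sh
# line_kib DOT_K DOTS_PER_LINE
# Amount of data (in K) represented by one full line of dots.
# Hypothetical helper for illustration; not part of Wget.
line_kib() {
    echo $(( $1 * $2 ))
}

line_kib 1 50     # default style: 50K per line
line_kib 8 48     # binary style: 384K per line
line_kib 64 48    # mega style: 3072K, i.e. 3M per line
```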

Note that you can set the default style using the C<progress>
command in F<.wgetrc>. That setting may be overridden from the
command line. The exception is that, when the output is not a TTY, the
"dot" progress will be favored over "bar". To force the bar output,
use B<--progress=bar:force>.

=item B<-N>

=item B<--timestamping>

Turn on time-stamping.

=item B<-S>

=item B<--server-response>

Print the headers sent by HTTP servers and responses sent by
FTP servers.

=item B<--spider>

When invoked with this option, Wget will behave as a Web I<spider>,
which means that it will not download the pages, just check that they
are there. For example, you can use Wget to check your bookmarks:

        wget --spider --force-html -i bookmarks.html

This feature needs much more work for Wget to get close to the
functionality of real web spiders.

=item B<-T> I<seconds>

=item B<--timeout=>I<seconds>

Set the network timeout to I<seconds> seconds. This is equivalent
to specifying B<--dns-timeout>, B<--connect-timeout>, and
B<--read-timeout>, all at the same time.

When interacting with the network, Wget can check for timeout and
abort the operation if it takes too long. This prevents anomalies
like hanging reads and infinite connects. The only timeout enabled by
default is a 900-second read timeout. Setting a timeout to 0 disables
it altogether. Unless you know what you are doing, it is best not to
change the default timeout settings.

All timeout-related options accept decimal values, as well as
subsecond values. For example, B<0.1> seconds is a legal (though
unwise) choice of timeout. Subsecond timeouts are useful for checking
server response times or for testing network latency.

=item B<--dns-timeout=>I<seconds>

Set the DNS lookup timeout to I<seconds> seconds. DNS lookups that
don't complete within the specified time will fail. By default, there
is no timeout on DNS lookups, other than that implemented by system
libraries.

=item B<--connect-timeout=>I<seconds>

Set the connect timeout to I<seconds> seconds. TCP connections that
take longer to establish will be aborted. By default, there is no
connect timeout, other than that implemented by system libraries.

=item B<--read-timeout=>I<seconds>

Set the read (and write) timeout to I<seconds> seconds. The
"time" of this timeout refers to I<idle time>: if, at any point in
the download, no data is received for more than the specified number
of seconds, reading fails and the download is restarted. This option
does not directly affect the duration of the entire download.

Of course, the remote server may choose to terminate the connection
sooner than this option requires. The default read timeout is 900
seconds.

=item B<--limit-rate=>I<amount>

Limit the download speed to I<amount> bytes per second. Amount may
be expressed in bytes, kilobytes with the B<k> suffix, or megabytes
with the B<m> suffix. For example, B<--limit-rate=20k> will
limit the retrieval rate to 20KB/s. This is useful when, for whatever
reason, you don't want Wget to consume the entire available bandwidth.

This option allows the use of decimal numbers, usually in conjunction
with power suffixes; for example, B<--limit-rate=2.5k> is a legal
value.

Note that Wget implements the limiting by sleeping the appropriate
amount of time after a network read that took less time than specified
by the rate. Eventually this strategy causes the TCP transfer to slow
down to approximately the specified rate. However, it may take some
time for this balance to be achieved, so don't be surprised if limiting
the rate doesn't work well with very small files.
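
To make the suffix notation for B<--limit-rate> concrete, the following
shell sketch converts an I<amount> such as C<20k> or C<2.5k> into bytes
per second. It is an illustration only: the helper name
C<rate_to_bytes> is hypothetical, and the multipliers (1024 for B<k>,
1024 * 1024 for B<m>) are an assumption that the text above does not
state explicitly.

```shell
#!/bin/sh
# rate_to_bytes AMOUNT
# Hypothetical helper, not a Wget interface.
# Assumption: k = 1024 bytes and m = 1024 * 1024 bytes.
rate_to_bytes() {
    LC_ALL=C awk -v v="$1" 'BEGIN {
        mult = 1
        suffix = tolower(substr(v, length(v), 1))
        if (suffix == "k") {
            mult = 1024
            v = substr(v, 1, length(v) - 1)
        } else if (suffix == "m") {
            mult = 1024 * 1024
            v = substr(v, 1, length(v) - 1)
        }
        # Decimal amounts such as 2.5k are allowed.
        printf "%d\n", v * mult
    }'
}

rate_to_bytes 20k     # prints 20480
rate_to_bytes 2.5k    # prints 2560
```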

=item B<-w> I<seconds>

=item B<--wait=>I<seconds>

Wait the specified number of seconds between the retrievals. Use of
this option is recommended, as it lightens the server load by making the
requests less frequent. Instead of in seconds, the time can be
specified in minutes using the C<m> suffix, in hours using the C<h>
suffix, or in days using the C<d> suffix.

Specifying a large value for this option is useful if the network or the
destination host is down, so that Wget can wait long enough to
reasonably expect the network error to be fixed before the retry. The
waiting interval specified by this option is influenced by
B<--random-wait>, which see.

=item B<--waitretry=>I<seconds>

If you don't want Wget to wait between I<every> retrieval, but only
between retries of failed downloads, you can use this option. Wget will
use I<linear backoff>, waiting 1 second after the first failure on a
given file, then waiting 2 seconds after the second failure on that
file, up to the maximum number of I<seconds> you specify. Therefore,
a value of 10 will actually make Wget wait up to (1 + 2 + ... + 10) = 55
seconds per file.

Note that this option is turned on by default in the global
F<wgetrc> file.
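
The worst-case total for linear backoff mentioned above is simply the
sum 1 + 2 + ... + I<seconds>. A minimal shell sketch of that sum (the
helper name C<total_backoff> is hypothetical):

```shell
#!/bin/sh
# total_backoff MAX
# Sum of the waits 1 + 2 + ... + MAX seconds, i.e. the longest total
# time spent waiting between retries of one file under linear backoff.
# Hypothetical helper for illustration; not part of Wget.
total_backoff() {
    max=$1
    total=0
    i=1
    while [ "$i" -le "$max" ]; do
        total=$((total + i))
        i=$((i + 1))
    done
    echo "$total"
}

total_backoff 10    # prints 55
```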

=item B<--random-wait>

Some web sites may perform log analysis to identify retrieval programs
such as Wget by looking for statistically significant similarities in
the time between requests. This option causes the time between requests
to vary between 0.5 * I<wait> and 1.5 * I<wait> seconds, where I<wait> was
specified using the B<--wait> option, in order to mask Wget's
presence from such analysis.

A 2001 article in a publication devoted to development on a popular
consumer platform provided code to perform this analysis on the fly.
Its author suggested blocking at the class C address level to ensure
automated retrieval programs were blocked despite changing DHCP-supplied
addresses.

The B<--random-wait> option was inspired by this ill-advised
recommendation to block many unrelated users from a web site due to the
actions of one.

=item B<--no-proxy>

Don't use proxies, even if the appropriate C<*_proxy> environment
variable is defined.

=item B<-Q> I<quota>

=item B<--quota=>I<quota>

Specify download quota for automatic retrievals. The value can be
specified in bytes (default), kilobytes (with B<k> suffix), or
megabytes (with B<m> suffix).

Note that quota will never affect downloading a single file. So if you
specify B<wget -Q10k ftp://wuarchive.wustl.edu/ls-lR.gz>, all of the
F<ls-lR.gz> will be downloaded. The same goes even when several
URLs are specified on the command-line. However, quota is
respected when retrieving either recursively, or from an input file.
Thus you may safely type B<wget -Q2m -i sites>---download will be
aborted when the quota is exceeded.

Setting quota to 0 or to B<inf> removes the download quota.
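The following Python sketch models the behaviour described above; it is
not Wget's code. The names C<parse_quota> and C<files_fetched> are
hypothetical, and treating B<k> and B<m> as multiples of 1024 is an
assumption. The point it shows is that the quota is only consulted
between files, so a file that has been started is always completed.

```python
# Hypothetical model of the quota rules; not Wget's implementation.
def parse_quota(text):
    """Convert '500', '10k', '2m', '0' or 'inf' to bytes; None means unlimited."""
    text = text.strip().lower()
    if text == "inf":
        return None
    multipliers = {"k": 1024, "m": 1024 * 1024}  # assumption: binary multiples
    if text and text[-1] in multipliers:
        value = int(text[:-1]) * multipliers[text[-1]]
    else:
        value = int(text)
    return value or None  # a quota of 0 also means unlimited

def files_fetched(sizes, quota):
    """Return the sizes of the files fetched before the quota stops the run."""
    fetched, total = [], 0
    for size in sizes:
        if quota is not None and total > quota:
            break  # the check happens between files, never in the middle of one
        fetched.append(size)
        total += size
    return fetched

if __name__ == "__main__":
    quota = parse_quota("10k")
    print(files_fetched([50000], quota))             # a single file is never cut short
    print(files_fetched([6000, 6000, 6000], quota))  # stops once the total exceeds the quota
```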

=item B<--no-dns-cache>

Turn off caching of DNS lookups. Normally, Wget remembers the IP
addresses it looked up from DNS so it doesn't have to repeatedly
contact the DNS server for the same (typically small) set of hosts it
retrieves from. This cache exists in memory only; a new Wget run will
contact DNS again.

However, it has been reported that in some situations it is not
desirable to cache host names, even for the duration of a
short-running application like Wget. With this option Wget issues a
new DNS lookup (more precisely, a new call to C<gethostbyname> or
C<getaddrinfo>) each time it makes a new connection. Please note
that this option will I<not> affect caching that might be
performed by the resolving library or by an external caching layer,
such as NSCD.

If you don't understand exactly what this option does, you probably
won't need it.

=item B<--restrict-file-names=>I<mode>

Change which characters found in remote URLs may show up in local file
names generated from those URLs. Characters that are I<restricted>
by this option are escaped, i.e. replaced with B<%HH>, where
B<HH> is the hexadecimal number that corresponds to the restricted
character.

By default, Wget escapes the characters that are not valid as part of
file names on your operating system, as well as control characters that
are typically unprintable. This option is useful for changing these
defaults, either because you are downloading to a non-native partition,
or because you want to disable escaping of the control characters.

When mode is set to "unix", Wget escapes the character B</> and
the control characters in the ranges 0--31 and 128--159. This is the
default on Unix-like operating systems.

When mode is set to "windows", Wget escapes the characters B<\>,
B<|>, B</>, B<:>, B<?>, B<">, B<*>, B<E<lt>>,
B<E<gt>>, and the control characters in the ranges 0--31 and 128--159.
In addition to this, Wget in Windows mode uses B<+> instead of
B<:> to separate host and port in local file names, and uses
B<@> instead of B<?> to separate the query portion of the file
name from the rest. Therefore, a URL that would be saved as
B<www.xemacs.org:4300/search.pl?input=blah> in Unix mode would be
saved as B<www.xemacs.org+4300/search.pl@input=blah> in Windows
mode. This mode is the default on Windows.

If you append B<,nocontrol> to the mode, as in
B<unix,nocontrol>, escaping of the control characters is also
switched off. You can use B<--restrict-file-names=nocontrol> to
turn off escaping of control characters without affecting the choice of
the OS to use as file name restriction mode.
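The escaping rule can be sketched in Python as follows. This is an
illustration, not Wget's implementation: the name C<escape_component>
is hypothetical, it handles one file name component at a time, it
treats the name as Latin-1 bytes, and it uses upper-case hexadecimal
digits, all of which are assumptions. The B<+> and B<@> substitutions
of Windows mode are deliberately left out.

```python
# Hypothetical sketch of %HH escaping for a single file name component.
UNIX_RESTRICTED = set("/")
WINDOWS_RESTRICTED = set('\\|/:?"*<>')

def is_control(byte):
    """Control characters are those in the ranges 0-31 and 128-159."""
    return byte <= 31 or 128 <= byte <= 159

def escape_component(name, mode="unix", nocontrol=False):
    """Replace restricted characters in `name` with %HH."""
    restricted = WINDOWS_RESTRICTED if mode == "windows" else UNIX_RESTRICTED
    out = []
    for byte in name.encode("latin-1"):  # assumption: name fits in Latin-1
        ch = chr(byte)
        if ch in restricted or (not nocontrol and is_control(byte)):
            out.append("%%%02X" % byte)
        else:
            out.append(ch)
    return "".join(out)

if __name__ == "__main__":
    print(escape_component("a:b?c", mode="unix"))
    print(escape_component("a:b?c", mode="windows"))
    print(escape_component("x\x01y", mode="unix", nocontrol=True) == "x\x01y")
```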

=item B<-4>

=item B<--inet4-only>

=item B<-6>

=item B<--inet6-only>

Force connecting to IPv4 or IPv6 addresses. With B<--inet4-only>
or B<-4>, Wget will only connect to IPv4 hosts, ignoring AAAA
records in DNS, and refusing to connect to IPv6 addresses specified in
URLs. Conversely, with B<--inet6-only> or B<-6>, Wget will
only connect to IPv6 hosts and ignore A records and IPv4 addresses.

Neither option should be needed normally. By default, an IPv6-aware
Wget will use the address family specified by the host's DNS record.
If the DNS responds with both IPv4 and IPv6 addresses, Wget will try
them in sequence until it finds one it can connect to. (Also see
the C<--prefer-family> option described below.)

These options can be used to deliberately force the use of IPv4 or
IPv6 address families on dual family systems, usually to aid debugging
or to deal with broken network configuration. Only one of
B<--inet6-only> and B<--inet4-only> may be specified at the
same time. Neither option is available in Wget compiled without IPv6
support.

=item B<--prefer-family=IPv4/IPv6/none>

When given a choice of several addresses, connect to the addresses
with the specified address family first. IPv4 addresses are preferred by
default.

This avoids spurious errors and connect attempts when accessing hosts
that resolve to both IPv6 and IPv4 addresses from IPv4 networks. For
example, B<www.kame.net> resolves to
B<2001:200:0:8002:203:47ff:fea5:3085> and to
B<203.178.141.194>. When the preferred family is C<IPv4>, the
IPv4 address is used first; when the preferred family is C<IPv6>,
the IPv6 address is used first; if the specified value is C<none>,
the address order returned by DNS is used without change.

Unlike B<-4> and B<-6>, this option doesn't inhibit access to
any address family; it only changes the I<order> in which the
addresses are accessed. Also note that the reordering performed by
this option is I<stable>---it doesn't affect the order of addresses of
the same family. That is, the relative order of all IPv4 addresses
and of all IPv6 addresses remains intact in all cases.
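A stable reordering of this kind can be sketched in Python, whose
C<sorted> function is itself stable. The function name
C<order_addresses> is hypothetical and the sketch only models the
ordering, not how Wget resolves or connects.

```python
import ipaddress

# Hypothetical illustration of a stable, family-based reordering.
def order_addresses(addresses, prefer="IPv4"):
    """Move addresses of the preferred family to the front, keeping DNS order."""
    if prefer == "none":
        return list(addresses)  # DNS order is used without change
    preferred_version = 4 if prefer == "IPv4" else 6
    # False sorts before True, and sorted() is stable, so addresses of the
    # same family keep their original relative order.
    return sorted(
        addresses,
        key=lambda a: ipaddress.ip_address(a).version != preferred_version,
    )

if __name__ == "__main__":
    dns_answer = ["2001:200:0:8002:203:47ff:fea5:3085", "203.178.141.194"]
    print(order_addresses(dns_answer, prefer="IPv4"))
    print(order_addresses(dns_answer, prefer="IPv6"))
```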

=item B<--retry-connrefused>

Consider "connection refused" a transient error and try again.
Normally Wget gives up on a URL when it is unable to connect to the
site because failure to connect is taken as a sign that the server is
not running at all and that retries would not help. This option is
for mirroring unreliable sites whose servers tend to disappear for
short periods of time.

=item B<--user=>I<user>

=item B<--password=>I<password>

Specify the username I<user> and password I<password> for both
FTP and HTTP file retrieval. These parameters can be overridden
using the B<--ftp-user> and B<--ftp-password> options for
FTP connections and the B<--http-user> and B<--http-password>
options for HTTP connections.

=back

=head2 Directory Options

=over 4

=item B<-nd>

=item B<--no-directories>

Do not create a hierarchy of directories when retrieving recursively.
With this option turned on, all files will get saved to the current
directory, without clobbering (if a name shows up more than once, the
filenames will get extensions B<.n>).

=item B<-x>

=item B<--force-directories>

The opposite of B<-nd>---create a hierarchy of directories, even if
one would not have been created otherwise. E.g. B<wget -x
http://fly.srk.fer.hr/robots.txt> will save the downloaded file to
F<fly.srk.fer.hr/robots.txt>.

=item B<-nH>

=item B<--no-host-directories>

Disable generation of host-prefixed directories. By default, invoking
Wget with B<-r http://fly.srk.fer.hr/> will create a structure of
directories beginning with F<fly.srk.fer.hr/>. This option disables
such behavior.

=item B<--protocol-directories>

Use the protocol name as a directory component of local file names. For
example, with this option, B<wget -r http://>I<host> will save to
B<http/>I<host>B</...> rather than just to I<host>B</...>.

=item B<--cut-dirs=>I<number>

Ignore I<number> directory components. This is useful for getting a
fine-grained control over the directory where recursive retrieval will
be saved.

Take, for example, the directory at
B<ftp://ftp.xemacs.org/pub/xemacs/>. If you retrieve it with
B<-r>, it will be saved locally under
F<ftp.xemacs.org/pub/xemacs/>. While the B<-nH> option can
remove the F<ftp.xemacs.org/> part, you are still stuck with
F<pub/xemacs>. This is where B<--cut-dirs> comes in handy; it
makes Wget not "see" I<number> remote directory components. Here
are several examples of how the B<--cut-dirs> option works.

        No options        -> ftp.xemacs.org/pub/xemacs/
        -nH               -> pub/xemacs/
        -nH --cut-dirs=1  -> xemacs/
        -nH --cut-dirs=2  -> .

        --cut-dirs=1      -> ftp.xemacs.org/xemacs/
        ...

If you just want to get rid of the directory structure, this option is
similar to a combination of B<-nd> and B<-P>. However, unlike
B<-nd>, B<--cut-dirs> does not discard subdirectories---for
instance, with B<-nH --cut-dirs=1>, a F<beta/> subdirectory will
be placed in F<xemacs/beta>, as one would expect.
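The mapping in the table above can be reproduced with a short Python
sketch. The function name C<local_directory> and its parameters are
hypothetical; the sketch only models how B<-nH> and B<--cut-dirs>
combine and is not taken from Wget's source.

```python
# Hypothetical model of how -nH and --cut-dirs shape the local directory.
def local_directory(host, remote_dir, no_host=False, cut_dirs=0):
    """Return the local directory used for files found in `remote_dir`."""
    parts = [p for p in remote_dir.split("/") if p]
    parts = parts[cut_dirs:]      # --cut-dirs drops leading remote components
    if not no_host:
        parts.insert(0, host)     # without -nH the host name comes first
    return "/".join(parts) + "/" if parts else "."

if __name__ == "__main__":
    host, path = "ftp.xemacs.org", "pub/xemacs/"
    print(local_directory(host, path))
    print(local_directory(host, path, no_host=True))
    print(local_directory(host, path, no_host=True, cut_dirs=1))
    print(local_directory(host, path, no_host=True, cut_dirs=2))
    print(local_directory(host, path, cut_dirs=1))
```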

=item B<-P> I<prefix>

=item B<--directory-prefix=>I<prefix>

Set directory prefix to I<prefix>. The I<directory prefix> is the
directory where all other files and subdirectories will be saved to,
i.e. the top of the retrieval tree. The default is B<.> (the
current directory).

=back

=head2 HTTP Options

=over 4

=item B<-E>

=item B<--html-extension>

If a file of type B<application/xhtml+xml> or B<text/html> is
downloaded and the URL does not end with the regexp
B<\.[Hh][Tt][Mm][Ll]?>, this option will cause the suffix B<.html>
to be appended to the local filename. This is useful, for instance, when
you're mirroring a remote site that uses B<.asp> pages, but you want
the mirrored pages to be viewable on your stock Apache server. Another
good use for this is when you're downloading CGI-generated materials. A URL
like B<http://site.com/article.cgi?25> will be saved as
F<article.cgi?25.html>.

Note that filenames changed in this way will be re-downloaded every time
you re-mirror a site, because Wget can't tell that the local
F<I<X>.html> file corresponds to remote URL I<X> (since
it doesn't yet know that the URL produces output of type
B<text/html> or B<application/xhtml+xml>). To prevent this
re-downloading, you must use B<-k> and B<-K> so that the original
version of the file will be saved as F<I<X>.orig>.
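The naming rule can be sketched in Python. The function name
C<local_name> is hypothetical, and anchoring the regexp with C<$> is
this sketch's reading of "does not end with"; Wget's own matching code
may differ in detail.

```python
import re

# Hypothetical illustration of the -E naming rule.
HTML_SUFFIX = re.compile(r"\.[Hh][Tt][Mm][Ll]?$")

def local_name(url_name, content_type):
    """Append .html to HTML documents whose name lacks an .htm/.html suffix."""
    is_html = content_type in ("text/html", "application/xhtml+xml")
    if is_html and not HTML_SUFFIX.search(url_name):
        return url_name + ".html"
    return url_name

if __name__ == "__main__":
    print(local_name("article.cgi?25", "text/html"))
    print(local_name("index.html", "text/html"))
    print(local_name("logo.png", "image/png"))
```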

=item B<--http-user=>I<user>

=item B<--http-password=>I<password>

Specify the username I<user> and password I<password> on an
HTTP server. According to the type of the challenge, Wget will
encode them using either the C<basic> (insecure),
the C<digest>, or the Windows C<NTLM> authentication scheme.

Another way to specify username and password is in the URL itself.
Either method reveals your password to anyone who
bothers to run C<ps>. To prevent the passwords from being seen,
store them in F<.wgetrc> or F<.netrc>, and make sure to protect
those files from other users with C<chmod>. If the passwords are
really important, do not leave them lying in those files either---edit
the files and delete them after Wget has started the download.

=item B<--no-cache>

Disable server-side cache. In this case, Wget will send the remote
server an appropriate directive (B<Pragma: no-cache>) to get the
file from the remote service, rather than returning the cached version.
This is especially useful for retrieving and flushing out-of-date
documents on proxy servers.

Caching is allowed by default.

=item B<--no-cookies>

Disable the use of cookies. Cookies are a mechanism for maintaining
server-side state. The server sends the client a cookie using the
C<Set-Cookie> header, and the client responds with the same cookie
upon further requests. Since cookies allow the server owners to keep
track of visitors and for sites to exchange this information, some
consider them a breach of privacy. The default is to use cookies;
however, I<storing> cookies is not on by default.

=item B<--load-cookies> I<file>

Load cookies from I<file> before the first HTTP retrieval.
I<file> is a textual file in the format originally used by Netscape's
F<cookies.txt> file.

You will typically use this option when mirroring sites that require
that you be logged in to access some or all of their content. The login
process typically works by the web server issuing an HTTP cookie
upon receiving and verifying your credentials. The cookie is then
resent by the browser when accessing that part of the site, and so
proves your identity.

Mirroring such a site requires Wget to send the same cookies your
browser sends when communicating with the site. This is achieved by
B<--load-cookies>---simply point Wget to the location of the
F<cookies.txt> file, and it will send the same cookies your browser
would send in the same situation. Different browsers keep textual
cookie files in different locations:

=over 4

=item Netscape 4.x.

The cookies are in F<~/.netscape/cookies.txt>.

=item Mozilla and Netscape 6.x.

Mozilla's cookie file is also named F<cookies.txt>, located
somewhere under F<~/.mozilla>, in the directory of your profile.
The full path usually ends up looking somewhat like
F<~/.mozilla/default/I<some-weird-string>/cookies.txt>.

=item Internet Explorer.

You can produce a cookie file Wget can use by using the File menu,
Import and Export, Export Cookies. This has been tested with Internet
Explorer 5; it is not guaranteed to work with earlier versions.

=item Other browsers.

If you are using a different browser to create your cookies,
B<--load-cookies> will only work if you can locate or produce a
cookie file in the Netscape format that Wget expects.

=back

If you cannot use B<--load-cookies>, there might still be an
alternative. If your browser supports a "cookie manager", you can use
it to view the cookies used when accessing the site you're mirroring.
Write down the name and value of the cookie, and manually instruct Wget
to send those cookies, bypassing the "official" cookie support:

        wget --no-cookies --header "Cookie: <name>=<value>"

=item B<--save-cookies> I<file>

Save cookies to I<file> before exiting. This will not save cookies
that have expired or that have no expiry time (so-called "session
cookies"), but also see B<--keep-session-cookies>.

=item B<--keep-session-cookies>

When specified, causes B<--save-cookies> to also save session
cookies. Session cookies are normally not saved because they are
meant to be kept in memory and forgotten when you exit the browser.
Saving them is useful on sites that require you to log in or to visit
the home page before you can access some pages. With this option,
multiple Wget runs are considered a single browser session as far as
the site is concerned.

Since the cookie file format does not normally carry session cookies,
Wget marks them with an expiry timestamp of 0. Wget's
B<--load-cookies> recognizes those as session cookies, but it might
confuse other browsers. Also note that cookies so loaded will be
treated as other session cookies, which means that if you want
B<--save-cookies> to preserve them again, you must use
B<--keep-session-cookies> again.

=item B<--ignore-length>

Unfortunately, some HTTP servers (CGI programs, to be more
precise) send out bogus C<Content-Length> headers, which makes Wget
go wild, as it thinks not all the document was retrieved. You can spot
this syndrome if Wget retries getting the same document again and again,
each time claiming that the (otherwise normal) connection has closed on
the very same byte.

With this option, Wget will ignore the C<Content-Length> header---as
if it never existed.

=item B<--header=>I<header-line>

Send I<header-line> along with the rest of the headers in each
HTTP request. The supplied header is sent as-is, which means it
must contain name and value separated by a colon, and must not contain
newlines.

You may define more than one additional header by specifying
B<--header> more than once.

        wget --header='Accept-Charset: iso-8859-2' \
             --header='Accept-Language: hr'        \
               http://fly.srk.fer.hr/

Specification of an empty string as the header value will clear all
previous user-defined headers.

As of Wget 1.10, this option can be used to override headers otherwise
generated automatically. This example instructs Wget to connect to
localhost, but to specify B<foo.bar> in the C<Host> header:

        wget --header="Host: foo.bar" http://localhost/

In versions of Wget prior to 1.10 such use of B<--header> caused
sending of duplicate headers.

=item B<--max-redirect=>I<number>

Specifies the maximum number of redirections to follow for a resource.
The default is 20, which is usually far more than necessary. However, on
those occasions where you want to allow more (or fewer), this is the
option to use.

=item B<--proxy-user=>I<user>

=item B<--proxy-password=>I<password>

Specify the username I<user> and password I<password> for
authentication on a proxy server. Wget will encode them using the
C<basic> authentication scheme.

Security considerations similar to those with B<--http-password>
pertain here as well.

=item B<--referer=>I<url>

Include the `Referer: I<url>' header in the HTTP request. Useful for
retrieving documents with server-side processing that assume they are
always being retrieved by interactive web browsers and only come out
properly when Referer is set to one of the pages that point to them.

=item B<--save-headers>

Save the headers sent by the HTTP server to the file, preceding the
actual contents, with an empty line as the separator.

=item B<-U> I<agent-string>

=item B<--user-agent=>I<agent-string>

Identify as I<agent-string> to the HTTP server.

The HTTP protocol allows the clients to identify themselves using a
C<User-Agent> header field. This enables distinguishing the
WWW software, usually for statistical purposes or for tracing of
protocol violations. Wget normally identifies as
B<Wget/>I<version>, I<version> being the current version
number of Wget.

However, some sites have been known to impose the policy of tailoring
the output according to the C<User-Agent>-supplied information.
While this is not such a bad idea in theory, it has been abused by
servers denying information to clients other than (historically)
Netscape or, more frequently, Microsoft Internet Explorer. This
option allows you to change the C<User-Agent> line issued by Wget.
Use of this option is discouraged, unless you really know what you are
doing.

Specifying an empty user agent with B<--user-agent=""> instructs Wget
not to send the C<User-Agent> header in HTTP requests.

=item B<--post-data=>I<string>

=item B<--post-file=>I<file>

Use POST as the method for all HTTP requests and send the specified data
in the request body. C<--post-data> sends I<string> as data,
whereas C<--post-file> sends the contents of I<file>. Other than
that, they work in exactly the same way.

Please be aware that Wget needs to know the size of the POST data in
advance. Therefore the argument to C<--post-file> must be a regular
file; specifying a FIFO or something like F</dev/stdin> won't work.
It's not quite clear how to work around this limitation inherent in
HTTP/1.0. Although HTTP/1.1 introduces I<chunked> transfer that
doesn't require knowing the request length in advance, a client can't
use chunked unless it knows it's talking to an HTTP/1.1 server. And it
can't know that until it receives a response, which in turn requires the
request to have been completed -- a chicken-and-egg problem.

Note: if Wget is redirected after the POST request is completed, it
will not send the POST data to the redirected URL. This is because
URLs that process POST often respond with a redirection to a regular
page, which does not desire or accept POST. It is not completely
clear that this behavior is optimal; if it doesn't work out, it might
be changed in the future.

This example shows how to log in to a server using POST and then proceed to
download the desired pages, presumably only accessible to authorized
users:

        # Log in to the server.  This only needs to be done once.
        wget --save-cookies cookies.txt \
             --post-data 'user=foo&password=bar' \
             http://server.com/auth.php

        # Now grab the page or pages we care about.
        wget --load-cookies cookies.txt \
             -p http://server.com/interesting/article.php

If the server is using session cookies to track user authentication,
the above will not work because B<--save-cookies> will not save
them (and neither will browsers) and the F<cookies.txt> file will
be empty. In that case use B<--keep-session-cookies> along with
B<--save-cookies> to force saving of session cookies.

=item B<--content-disposition>

If this is set to on, experimental (not fully-functional) support for
C<Content-Disposition> headers is enabled. This can currently result in
extra round-trips to the server for a C<HEAD> request, and is known
to suffer from a few bugs, which is why it is not currently enabled by default.

This option is useful for some file-downloading CGI programs that use
C<Content-Disposition> headers to describe what the name of a
downloaded file should be.

=item B<--auth-no-challenge>

If this option is given, Wget will send Basic HTTP authentication
information (plaintext username and password) for all requests, just
like Wget 1.10.2 and prior did by default.

Use of this option is not recommended, and is intended only to support
a few obscure servers, which never send HTTP authentication
challenges, but accept unsolicited auth info, say, in addition to
form-based authentication.

=back

=head2 HTTPS (SSL/TLS) Options

To support encrypted HTTP (HTTPS) downloads, Wget must be compiled
with an external SSL library, currently OpenSSL. If Wget is compiled
without SSL support, none of these options are available.

=over 4

=item B<--secure-protocol=>I<protocol>

Choose the secure protocol to be used. Legal values are B<auto>,
B<SSLv2>, B<SSLv3>, and B<TLSv1>. If B<auto> is used,
the SSL library is given the liberty of choosing the appropriate
protocol automatically, which is achieved by sending an SSLv2 greeting
and announcing support for SSLv3 and TLSv1. This is the default.

Specifying B<SSLv2>, B<SSLv3>, or B<TLSv1> forces the use
of the corresponding protocol. This is useful when talking to old and
buggy SSL server implementations that make it hard for OpenSSL to
choose the correct protocol version. Fortunately, such servers are
quite rare.

=item B<--no-check-certificate>

Don't check the server certificate against the available certificate
authorities. Also don't require the URL host name to match the common
name presented by the certificate.

As of Wget 1.10, the default is to verify the server's certificate
against the recognized certificate authorities, breaking the SSL
handshake and aborting the download if the verification fails.
Although this provides more secure downloads, it does break
interoperability with some sites that worked with previous Wget
versions, particularly those using self-signed, expired, or otherwise
invalid certificates. This option forces an "insecure" mode of
operation that turns the certificate verification errors into warnings
and allows you to proceed.

If you encounter "certificate verification" errors or ones saying
that "common name doesn't match requested host name", you can use
this option to bypass the verification and proceed with the download.
I<Only use this option if you are otherwise convinced of the
site's authenticity, or if you really don't care about the validity of
its certificate.> It is almost always a bad idea not to check the
certificates when transmitting confidential or important data.

=item B<--certificate=>I<file>

Use the client certificate stored in I<file>. This is needed for
servers that are configured to require certificates from the clients
that connect to them. Normally a certificate is not required and this
switch is optional.

=item B<--certificate-type=>I<type>

Specify the type of the client certificate. Legal values are
B<PEM> (assumed by default) and B<DER>, also known as
B<ASN1>.

=item B<--private-key=>I<file>

Read the private key from I<file>. This allows you to provide the
private key in a file separate from the certificate.

=item B<--private-key-type=>I<type>

Specify the type of the private key. Accepted values are B<PEM>
(the default) and B<DER>.

=item B<--ca-certificate=>I<file>

Use I<file> as the file with the bundle of certificate authorities
("CA") to verify the peers. The certificates must be in PEM format.

Without this option Wget looks for CA certificates at the
system-specified locations, chosen at OpenSSL installation time.

=item B<--ca-directory=>I<directory>

Specifies directory containing CA certificates in PEM format. Each
file contains one CA certificate, and the file name is based on a hash
value derived from the certificate. This is achieved by processing a
certificate directory with the C<c_rehash> utility supplied with
OpenSSL. Using B<--ca-directory> is more efficient than
B<--ca-certificate> when many certificates are installed because
it allows Wget to fetch certificates on demand.

Without this option Wget looks for CA certificates at the
system-specified locations, chosen at OpenSSL installation time.

=item B<--random-file=>I<file>

Use I<file> as the source of random data for seeding the
pseudo-random number generator on systems without F</dev/random>.

On such systems the SSL library needs an external source of randomness
to initialize. Randomness may be provided by EGD (see
B<--egd-file> below) or read from an external source specified by
the user. If this option is not specified, Wget looks for random data
in C<$RANDFILE> or, if that is unset, in F<$HOME/.rnd>. If
none of those are available, it is likely that SSL encryption will not
be usable.

If you're getting the "Could not seed OpenSSL PRNG; disabling SSL."
error, you should provide random data using some of the methods
described above.

=item B<--egd-file=>I<file>

Use I<file> as the EGD socket. EGD stands for I<Entropy
Gathering Daemon>, a user-space program that collects data from
various unpredictable system sources and makes it available to other
programs that might need it. Encryption software, such as the SSL
library, needs sources of non-repeating randomness to seed the random
number generator used to produce cryptographically strong keys.

OpenSSL allows the user to specify their own source of entropy using the
C<RANDFILE> environment variable. If this variable is unset, or
if the specified file does not produce enough randomness, OpenSSL will
read random data from the EGD socket specified using this option.

If this option is not specified (and the equivalent startup command is
not used), EGD is never contacted. EGD is not needed on modern Unix
systems that support F</dev/random>.

=back

=head2 FTP Options

=over 4

=item B<--ftp-user=>I<user>

=item B<--ftp-password=>I<password>

Specify the username I<user> and password I<password> on an
FTP server. Without this, or the corresponding startup option,
the password defaults to B<-wget@>, normally used for anonymous
FTP.

Another way to specify username and password is in the URL itself.
Either method reveals your password to anyone who
bothers to run C<ps>. To prevent the passwords from being seen,
store them in F<.wgetrc> or F<.netrc>, and make sure to protect
those files from other users with C<chmod>. If the passwords are
really important, do not leave them lying in those files either---edit
the files and delete them after Wget has started the download.

1390

1391

1392

1393

=item B<--no-remove-listing>

1394

1395

Don't remove the temporary F<.listing> files generated by FTP

1396

retrievals. Normally, these files contain the raw directory listings

1397

received from FTP servers. Not removing them can be useful for

1398

debugging purposes, or when you want to be able to easily check on the

1399

contents of remote server directories (e.g. to verify that a mirror

1400

you're running is complete).

1401

1402

Note that even though Wget writes to a known filename for this file,

1403

this is not a security hole in the scenario of a user making

1404

F<.listing> a symbolic link to F</etc/passwd> or something and

1405

asking C<root> to run Wget in his or her directory. Depending on

1406

the options used, either Wget will refuse to write to F<.listing>,

1407

making the globbing/recursion/time-stamping operation fail, or the

1408

symbolic link will be deleted and replaced with the actual

1409

F<.listing> file, or the listing will be written to a

1410

F<.listing.I<number>> file.

1411

1412

Even though this situation isn't a problem, C<root> should
never run Wget in a non-trusted user's directory. A user could do
something as simple as linking F<index.html> to F</etc/passwd>
and asking C<root> to run Wget with B<-N> or B<-r> so the file
will be overwritten.

1417

1418

1419

=item B<--no-glob>

1420

1421

Turn off FTP globbing. Globbing refers to the use of shell-like

1422

special characters (I<wildcards>), like B<*>, B<?>, B<[>

1423

and B<]> to retrieve more than one file from the same directory at

1424

once, like:

1425

1426

1427

        wget ftp://gnjilux.srk.fer.hr/*.msg

1428

1429

1430

By default, globbing will be turned on if the URL contains a

1431

globbing character. This option may be used to turn globbing on or off

1432

permanently.

1433

1434

You may have to quote the URL to protect it from being expanded by

1435

your shell. Globbing makes Wget look for a directory listing, which is

1436

system-specific. This is why it currently works only with Unix FTP

1437

servers (and the ones emulating Unix C<ls> output).
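
When the command line is assembled by a script, the quoting mentioned above can be delegated to a library. The following Python sketch uses the standard C<shlex.quote> function on the example URL; building the command as a string is an assumption made purely for illustration.

```python
import shlex

url = "ftp://gnjilux.srk.fer.hr/*.msg"

# shlex.quote wraps the URL in single quotes because "*" is special
# to the shell; the wildcard then reaches Wget unexpanded.
command = "wget " + shlex.quote(url)
print(command)
```

Typing the URL by hand inside single quotes has the same effect.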

1438

1439

1440

=item B<--no-passive-ftp>

1441

1442

Disable the use of the I<passive> FTP transfer mode. Passive FTP

1443

mandates that the client connect to the server to establish the data

1444

connection rather than the other way around.

1445

1446

If the machine is connected to the Internet directly, both passive and

1447

active FTP should work equally well. Behind most firewall and NAT

1448

configurations passive FTP has a better chance of working. However,

1449

in some rare firewall configurations, active FTP actually works when

1450

passive FTP doesn't. If you suspect this to be the case, use this

1451

option, or set C<passive_ftp=off> in your init file.

1452

1453

1454

=item B<--retr-symlinks>

1455

1456

Usually, when retrieving FTP directories recursively and a symbolic

1457

link is encountered, the linked-to file is not downloaded. Instead, a

1458

matching symbolic link is created on the local filesystem. The

1459

pointed-to file will not be downloaded unless this recursive retrieval

1460

would have encountered it separately and downloaded it anyway.

1461

1462

When B<--retr-symlinks> is specified, however, symbolic links are

1463

traversed and the pointed-to files are retrieved. At this time, this

1464

option does not cause Wget to traverse symlinks to directories and

1465

recurse through them, but in the future it should be enhanced to do

1466

this.

1467

1468

Note that when retrieving a file (not a directory) because it was

1469

specified on the command-line, rather than because it was recursed to,

1470

this option has no effect. Symbolic links are always traversed in this

1471

case.

1472

1473

1474

=item B<--no-http-keep-alive>

1475

1476

Turn off the "keep-alive" feature for HTTP downloads. Normally, Wget

1477

asks the server to keep the connection open so that, when you download

1478

more than one document from the same server, they get transferred over

1479

the same TCP connection. This saves time and at the same time reduces

1480

the load on the server.

1481

1482

This option is useful when, for some reason, persistent (keep-alive)

1483

connections don't work for you, for example due to a server bug or due

1484

to the inability of server-side scripts to cope with the connections.

1485

1486

=back

=head2 Recursive Retrieval Options

=over 4

=item B<-r>

1498

1499

1500

=item B<--recursive>

1501

1502

Turn on recursive retrieving.

1503

1504

1505

=item B<-l> I<depth>

1506

1507

1508

=item B<--level=>I<depth>

1509

1510

Specify recursion maximum depth level I<depth>. The default maximum depth is 5.

1511

1512

1513

=item B<--delete-after>

1514

1515

This option tells Wget to delete every single file it downloads,

1516

I<after> having done so. It is useful for pre-fetching popular

1517

pages through a proxy, e.g.:

1518

1519

1520

        wget -r -nd --delete-after http://whatever.com/~popular/page/

1521

1522

1523

The B<-r> option is to retrieve recursively, and B<-nd> to not

1524

create directories.

1525

1526

Note that B<--delete-after> deletes files on the local machine. It

1527

does not issue the B<DELE> command to remote FTP sites, for

1528

instance. Also note that when B<--delete-after> is specified,

1529

B<--convert-links> is ignored, so B<.orig> files are simply not

1530

created in the first place.

1531

1532

1533

=item B<-k>

1534

1535

1536

=item B<--convert-links>

1537

1538

After the download is complete, convert the links in the document to

1539

make them suitable for local viewing. This affects not only the visible

1540

hyperlinks, but any part of the document that links to external content,

1541

such as embedded images, links to style sheets, hyperlinks to non-HTML

1542

content, etc.

1543

1544

Each link will be changed in one of two ways:

1545

1546

1547

=over 4

1548

1549

1550

=item *

1551

1552

The links to files that have been downloaded by Wget will be changed to

1553

refer to the file they point to as a relative link.

1554

1555

Example: if the downloaded file F</foo/doc.html> links to

1556

F</bar/img.gif>, also downloaded, then the link in F<doc.html>

1557

will be modified to point to B<../bar/img.gif>. This kind of

1558

transformation works reliably for arbitrary combinations of directories.

1559

1560

1561

=item *

1562

1563

The links to files that have not been downloaded by Wget will be changed

1564

to include host name and absolute path of the location they point to.

1565

1566

Example: if the downloaded file F</foo/doc.html> links to

1567

F</bar/img.gif> (or to F<../bar/img.gif>), then the link in

1568

F<doc.html> will be modified to point to

1569

F<http://I<hostname>/bar/img.gif>.

1570

1571

=back

1572

1573

1574

Because of this, local browsing works reliably: if a linked file was

1575

downloaded, the link will refer to its local name; if it was not

1576

downloaded, the link will refer to its full Internet address rather than

1577

presenting a broken link. The fact that the former links are converted

1578

to relative links ensures that you can move the downloaded hierarchy to

1579

another directory.
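
The two conversion rules can be modelled in a few lines of Python. This is a sketch of the behavior described above, not Wget's actual code; the function name C<convert_link>, the set of downloaded paths, and the host C<hostname.example> are assumptions made for the example.

```python
import posixpath
from urllib.parse import urljoin

def convert_link(doc_path, link, downloaded, base_url):
    """Return the rewritten link for a document saved at doc_path."""
    doc_dir = posixpath.dirname(doc_path)
    # Resolve the link against the document's own location first.
    target = posixpath.normpath(posixpath.join(doc_dir, link))
    if target in downloaded:
        # Rule 1: downloaded files are referenced by a relative link.
        return posixpath.relpath(target, start=doc_dir)
    # Rule 2: everything else gets host name and absolute path.
    return urljoin(base_url, target)

base = "http://hostname.example/foo/doc.html"
local = convert_link("/foo/doc.html", "/bar/img.gif", {"/bar/img.gif"}, base)
remote = convert_link("/foo/doc.html", "../bar/img.gif", set(), base)
print(local)
print(remote)
```

The first call corresponds to the case where F</bar/img.gif> was downloaded, the second to the case where it was not.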

1580

1581

Note that only at the end of the download can Wget know which links have

1582

been downloaded. Because of that, the work done by B<-k> will be

1583

performed at the end of all the downloads.

1584

1585

1586

=item B<-K>

1587

1588

1589

=item B<--backup-converted>

1590

1591

When converting a file, back up the original version with a B<.orig>

1592

suffix. Affects the behavior of B<-N>.

1593

1594

1595

=item B<-m>

1596

1597

1598

=item B<--mirror>

1599

1600

Turn on options suitable for mirroring. This option turns on recursion

1601

and time-stamping, sets infinite recursion depth and keeps FTP

1602

directory listings. It is currently equivalent to

1603

B<-r -N -l inf --no-remove-listing>.

1604

1605

1606

=item B<-p>

1607

1608

1609

=item B<--page-requisites>

1610

1611

This option causes Wget to download all the files that are necessary to

1612

properly display a given HTML page. This includes such things as

1613

inlined images, sounds, and referenced stylesheets.

1614

1615

Ordinarily, when downloading a single HTML page, any requisite documents

1616

that may be needed to display it properly are not downloaded. Using

1617

B<-r> together with B<-l> can help, but since Wget does not

1618

ordinarily distinguish between external and inlined documents, one is

1619

generally left with "leaf documents" that are missing their

1620

requisites.

1621

1622

For instance, say document F<1.html> contains an C<E<lt>IMGE<gt>> tag

1623

referencing F<1.gif> and an C<E<lt>AE<gt>> tag pointing to external

1624

document F<2.html>. Say that F<2.html> is similar but that its

1625

image is F<2.gif> and it links to F<3.html>. Say this

1626

continues up to some arbitrarily high number.

1627

1628

If one executes the command:

1629

1630

1631

        wget -r -l 2 http://<site>/1.html

1632

1633

1634

then F<1.html>, F<1.gif>, F<2.html>, F<2.gif>, and

1635

F<3.html> will be downloaded. As you can see, F<3.html> is

1636

without its requisite F<3.gif> because Wget is simply counting the

1637

number of hops (up to 2) away from F<1.html> in order to determine

1638

where to stop the recursion. However, with this command:

1639

1640

1641

        wget -r -l 2 -p http://<site>/1.html

1642

1643

1644

all the above files I<and> F<3.html>'s requisite F<3.gif>

1645

will be downloaded. Similarly,

1646

1647

1648

        wget -r -l 1 -p http://<site>/1.html

1649

1650

1651

will cause F<1.html>, F<1.gif>, F<2.html>, and F<2.gif>

1652

to be downloaded. One might think that:

1653

1654

1655

        wget -r -l 0 -p http://<site>/1.html

1656

1657

1658

would download just F<1.html> and F<1.gif>, but unfortunately

1659

this is not the case, because B<-l 0> is equivalent to

1660

B<-l inf>---that is, infinite recursion. To download a single HTML

1661

page (or a handful of them, all specified on the command-line or in a

1662

B<-i> URL input file) and its (or their) requisites, simply leave off

1663

B<-r> and B<-l>:

1664

1665

1666

        wget -p http://<site>/1.html

1667

1668

1669

Note that Wget will behave as if B<-r> had been specified, but only

1670

that single page and its requisites will be downloaded. Links from that

1671

page to external documents will not be followed. Actually, to download

1672

a single page and all its requisites (even if they exist on separate

1673

websites), and make sure the lot displays properly locally, this author

1674

likes to use a few options in addition to B<-p>:

1675

1676

1677

        wget -E -H -k -K -p http://<site>/<document>

1678

1679

1680

To finish off this topic, it's worth knowing that Wget's idea of an

1681

external document link is any URL specified in an C<E<lt>AE<gt>> tag, an

1682

C<E<lt>AREAE<gt>> tag, or a C<E<lt>LINKE<gt>> tag other than C<E<lt>LINK

1683

REL="stylesheet"E<gt>>.
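
The hop counting in the F<1.html>, F<2.html>, ... example above can be reproduced with a small Python model. It is a simplification written for this illustration and is not how Wget is implemented; the function name and the C<last_page> cap, which keeps unlimited recursion finite, are assumptions.

```python
def fetched_files(max_depth, page_requisites, last_page=9):
    """Model the 1.html -> 2.html -> ... chain from the text.

    Page N embeds N.gif and links to (N+1).html. last_page is a
    hypothetical cap so that unlimited recursion terminates here.
    """
    if max_depth == 0:          # -l 0 means the same as -l inf
        max_depth = float("inf")
    files = []
    page, depth = 1, 0
    while True:
        files.append(f"{page}.html")
        # The image is one hop further away than the page embedding it,
        # so without -p it is skipped once the depth limit is reached.
        if depth + 1 <= max_depth or page_requisites:
            files.append(f"{page}.gif")
        if depth + 1 > max_depth or page == last_page:
            break
        page, depth = page + 1, depth + 1
    return files

print(fetched_files(2, page_requisites=False))
print(fetched_files(2, page_requisites=True))
print(fetched_files(1, page_requisites=True))
```

The three calls correspond to B<-r -l 2>, B<-r -l 2 -p>, and B<-r -l 1 -p> respectively.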

1684

1685

1686

=item B<--strict-comments>

1687

1688

Turn on strict parsing of HTML comments. The default is to terminate

1689

comments at the first occurrence of B<--E<gt>>.

1690

1691

According to specifications, HTML comments are expressed as SGML

1692

I<declarations>. Declaration is special markup that begins with

1693

B<E<lt>!> and ends with B<E<gt>>, such as B<E<lt>!DOCTYPE ...E<gt>>, that

1694

may contain comments between a pair of B<--> delimiters. HTML

1695

comments are "empty declarations", SGML declarations without any

1696

non-comment text. Therefore, B<E<lt>!--foo--E<gt>> is a valid comment, and

1697

so is B<E<lt>!--one-- --two--E<gt>>, but B<E<lt>!--1--2--E<gt>> is not.

1698

1699

On the other hand, most HTML writers don't perceive comments as anything

1700

other than text delimited with B<E<lt>!--> and B<--E<gt>>, which is not

1701

quite the same. For example, something like B<E<lt>!------------E<gt>>

1702

works as a valid comment as long as the number of dashes is a multiple

1703

of four (!). If not, the comment technically lasts until the next

1704

B<-->, which may be at the other end of the document. Because of

1705

this, many popular browsers completely ignore the specification and

1706

implement what users have come to expect: comments delimited with

1707

B<E<lt>!--> and B<--E<gt>>.

1708

1709

Until version 1.9, Wget interpreted comments strictly, which resulted in
missing links in many web pages that displayed fine in browsers, but had
the misfortune of containing non-compliant comments. Beginning with
version 1.9, Wget has joined the ranks of clients that implement
"naive" comments, terminating each comment at the first occurrence of
B<--E<gt>>.

1715

1716

If, for whatever reason, you want strict comment parsing, use this

1717

option to turn it on.
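
The difference between the two readings can be made concrete with two regular expressions. The Python sketch below encodes the rules as they are described above; the patterns are written for this illustration and are not taken from Wget's parser.

```python
import re

# Naive rule: a comment ends at the first "-->".
NAIVE = re.compile(r"<!--.*?-->", re.DOTALL)

# Strict rule as described above: "<!", then zero or more "--...--"
# comments separated only by whitespace, then ">".
STRICT = re.compile(r"<!(?:--(?:[^-]|-[^-])*--\s*)*>")

samples = [
    "<!--foo-->",
    "<!--one-- --two-->",
    "<!--1--2-->",
    "<!" + "-" * 12 + ">",   # dash count is a multiple of four
    "<!" + "-" * 10 + ">",   # dash count is not
]
results = {}
for text in samples:
    strict_ok = STRICT.fullmatch(text) is not None
    naive_ok = NAIVE.fullmatch(text) is not None
    results[text] = (strict_ok, naive_ok)
    print(f"{text!r}: strict={strict_ok} naive={naive_ok}")
```

Every sample is a complete comment under the naive rule, while the strict rule rejects the third and fifth samples.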

1718

1719

=back

=head2 Recursive Accept/Reject Options

=over 4

=item B<-A> I<acclist> B<--accept> I<acclist>

1731

1732

1733

=item B<-R> I<rejlist> B<--reject> I<rejlist>

1734

1735

Specify comma-separated lists of file name suffixes or patterns to

1736

accept or reject. Note that if

1737

any of the wildcard characters, B<*>, B<?>, B<[> or

1738

B<]>, appear in an element of I<acclist> or I<rejlist>,

1739

it will be treated as a pattern, rather than a suffix.
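
The distinction between suffixes and patterns can be sketched in Python as follows. The helper names are hypothetical and the code only models the rule stated above: an element containing a wildcard character is matched as a pattern against the whole file name, and any other element is compared with the end of the name.

```python
from fnmatch import fnmatchcase

WILDCARDS = set("*?[]")

def matches(name, element):
    """Apply one acclist/rejlist element to a file name."""
    if WILDCARDS & set(element):
        # Pattern: must match the whole file name.
        return fnmatchcase(name, element)
    # Suffix: only the end of the file name is compared.
    return name.endswith(element)

def accepted(name, acclist):
    """True if any element of the comma-separated list matches."""
    return any(matches(name, element) for element in acclist.split(","))

print(accepted("photo.jpg", "gif,jpg"))
print(accepted("index.html", "gif,jpg"))
print(accepted("report-2009.pdf", "report-*.pdf"))
print(accepted("old-report-2009.pdf", "report-*.pdf"))
```

The last call shows the practical difference: as a suffix, C<2009.pdf> would accept F<old-report-2009.pdf>, but the pattern C<report-*.pdf> does not, because it has to match from the start of the name.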

1740

1741

1742

=item B<-D> I<domain-list>

1743

1744

1745

=item B<--domains=>I<domain-list>

1746

1747

Set domains to be followed. I<domain-list> is a comma-separated list

1748

of domains. Note that it does I<not> turn on B<-H>.

1749

1750

1751

=item B<--exclude-domains> I<domain-list>

1752

1753

Specify the domains that are I<not> to be followed.

1754

1755

1756

=item B<--follow-ftp>

1757

1758

Follow FTP links from HTML documents. Without this option,

1759

Wget will ignore all the FTP links.

1760

1761

1762

=item B<--follow-tags=>I<list>

1763

1764

Wget has an internal table of HTML tag / attribute pairs that it
considers when looking for linked documents during a recursive
retrieval. If a user wants only a subset of those tags to be
considered, however, he or she should specify such tags in a
comma-separated I<list> with this option.

1769

1770

1771

=item B<--ignore-tags=>I<list>

1772

1773

This is the opposite of the B<--follow-tags> option. To skip

1774

certain HTML tags when recursively looking for documents to download,

1775

specify them in a comma-separated I<list>.

1776

1777

In the past, this option was the best bet for downloading a single page

1778

and its requisites, using a command-line like:

1779

1780

1781

        wget --ignore-tags=a,area -H -k -K -r http://<site>/<document>

1782

1783

1784

However, the author of this option came across a page with tags like

1785

C<E<lt>LINK REL="home" HREF="/"E<gt>> and came to the realization that

1786

specifying tags to ignore was not enough. One can't just tell Wget to

1787

ignore C<E<lt>LINKE<gt>>, because then stylesheets will not be downloaded.

1788

Now the best bet for downloading a single page and its requisites is the

1789

dedicated B<--page-requisites> option.
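
The limitation described above can be reproduced with a toy link extractor in Python. The tag/attribute table here is a heavily reduced, hypothetical stand-in for Wget's internal table, and the class name is invented for the example.

```python
from html.parser import HTMLParser

# Hypothetical, heavily reduced tag/attribute table for illustration only.
TAG_ATTRS = {"a": "href", "area": "href", "img": "src", "link": "href"}

class LinkCollector(HTMLParser):
    """Collect link targets, skipping the tags listed in ignore_tags."""

    def __init__(self, ignore_tags=()):
        super().__init__()
        self.ignore = set(ignore_tags)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        wanted = TAG_ATTRS.get(tag)
        if wanted is None or tag in self.ignore:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)

page = '<A HREF="2.html">next</A><IMG SRC="1.gif"><LINK REL="home" HREF="/">'
collector = LinkCollector(ignore_tags=["a", "area"])
collector.feed(page)
print(collector.links)
```

Ignoring C<a> and C<area> drops the hyperlink to F<2.html>, but the C<LINK REL="home"> target is still collected, which is the problem that motivated B<--page-requisites>.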

1790

1791

1792

=item B<--ignore-case>

1793

1794

Ignore case when matching files and directories. This influences the
behavior of the B<-R>, B<-A>, B<-I>, and B<-X> options, as well as globbing
implemented when downloading from FTP sites. For example, with this
option, B<-A *.txt> will match B<file1.txt>, but also
B<file2.TXT>, B<file3.TxT>, and so on.

1799

1800

1801

=item B<-H>

1802

1803

1804

=item B<--span-hosts>

1805

1806

Enable spanning across hosts when doing recursive retrieving.

1807

1808

1809

=item B<-L>

1810

1811

1812

=item B<--relative>

1813

1814

Follow relative links only. Useful for retrieving a specific home page

1815

without any distractions, not even those from the same hosts.

1816

1817

1818

=item B<-I> I<list>

1819

1820

1821

=item B<--include-directories=>I<list>

1822

1823

Specify a comma-separated list of directories you wish to follow when

1824

downloading. Elements

1825

of I<list> may contain wildcards.

1826

1827

1828

=item B<-X> I<list>

1829

1830

1831

=item B<--exclude-directories=>I<list>

1832

1833

Specify a comma-separated list of directories you wish to exclude from

1834

download. Elements of

1835

I<list> may contain wildcards.

1836

1837

1838

=item B<-np>

1839

1840

1841

=item B<--no-parent>

1842

1843

Do not ever ascend to the parent directory when retrieving recursively.

1844

This is a useful option, since it guarantees that only the files

1845

I<below> a certain hierarchy will be downloaded.
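
What "below a certain hierarchy" means for a URL path can be illustrated with a short Python check. This is a rough model under the assumption that only the path component matters; host handling is left out, and the function name and example URLs are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit

def below_start(start_url, candidate_url):
    """True if candidate_url lies in or below the directory of start_url."""
    start_path = urlsplit(start_url).path
    # Directory part of the start URL, always ending in "/".
    start_dir = start_path[: start_path.rfind("/") + 1] or "/"
    # Collapse "." and ".." so that "dir/../x" cannot slip through.
    cand_path = posixpath.normpath(urlsplit(candidate_url).path or "/")
    return (cand_path + "/").startswith(start_dir)

start = "http://example.org/docs/manual/index.html"
print(below_start(start, "http://example.org/docs/manual/ch1/intro.html"))
print(below_start(start, "http://example.org/docs/index.html"))
print(below_start(start, "http://example.org/docs/manual/../index.html"))
```

Only the first candidate lies below F</docs/manual/>; the other two resolve to the parent directory.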

1846

1847

1848

=back

=head1 FILES

=over 4

=item B</usr/local/etc/wgetrc>

1859

1860

Default location of the I<global> startup file.

1861

1862

1863

=item B<.wgetrc>

1864

1865

User startup file.

1866

1867

=back

=head1 BUGS

You are welcome to submit bug reports via the GNU Wget bug tracker (see
E<lt>B<http://wget.addictivecode.org/BugTracker>E<gt>).

Before actually submitting a bug report, please try to follow a few
simple guidelines.

=over 4

=item 1.

1883

1884

Please try to ascertain that the behavior you see really is a bug. If
Wget crashes, it's a bug. If Wget does not behave as documented,
it's a bug. If things work strangely, but you are not sure about the way
they are supposed to work, it might well be a bug, but you might want to
double-check the documentation and the mailing lists.

1889

1890

1891

=item 2.

1892

1893

Try to repeat the bug in as simple circumstances as possible. E.g. if
Wget crashes while running B<wget -rl0 -kKE -t5 --no-proxy
http://yoyodyne.com -o /tmp/log>, you should try to see if the crash is
repeatable, and if it will occur with a simpler set of options. You might
even try to start the download at the page where the crash occurred to
see if that page somehow triggered the crash.

1899

1900

Also, while I will probably be interested to know the contents of your

1901

F<.wgetrc> file, just dumping it into the debug message is probably

1902

a bad idea. Instead, you should first try to see if the bug repeats

1903

with F<.wgetrc> moved out of the way. Only if it turns out that

1904

F<.wgetrc> settings affect the bug, mail me the relevant parts of

1905

the file.

1906

1907

1908

=item 3.

1909

1910

Please start Wget with the B<-d> option and send us the resulting
output (or relevant parts thereof). If Wget was compiled without
debug support, recompile it---it is I<much> easier to trace bugs
with debug support on.

Note: please make sure to remove any potentially sensitive information
from the debug log before sending it to the bug address. The
C<-d> option won't go out of its way to collect sensitive information,
but the log I<will> contain a fairly complete transcript of Wget's
communication with the server, which may include passwords and pieces
of downloaded data. Since the bug address is publicly archived, you
may assume that all bug reports are visible to the public.

1922

1923

1924

=item 4.

1925

1926

If Wget has crashed, try to run it in a debugger, e.g. C<gdb `which

1927

wget` core> and type C<where> to get the backtrace. This may not

1928

work if the system administrator has disabled core files, but it is

1929

safe to try.

1930

1931

=back

=head1 SEE ALSO

This is B<not> the complete manual for GNU Wget.
For more complete information, including more detailed explanations of
some of the options, and a number of commands available
for use with F<.wgetrc> files and the B<-e> option, see the GNU
Info entry for F<wget>.

=head1 AUTHOR

Originally written by Hrvoje Niksic E<lt>hniksic@xemacs.orgE<gt>.
Currently maintained by Micah Cowan E<lt>micah@cowan.nameE<gt>.

=head1 COPYRIGHT

2004, 2005, 2006, 2007, 2008 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A
copy of the license is included in the section entitled "GNU Free
Documentation License".
