~ubuntu-branches/ubuntu/hardy/wget/hardy-security

Viewing changes to doc/wget.info-1

Committer: Bazaar Package Importer
Author(s): Noèl Köthe
Date: 2004-02-13 20:26:44 UTC
Revision ID: james.westby@ubuntu.com-20040213202644-skxj93qs15sskqfy

Tags: upstream-1.9.1

Import upstream version 1.9.1

files added:

AUTHORS

COPYING

ChangeLog

ChangeLog-branches

ChangeLog-branches/1.6_branch.ChangeLog

ChangeLog.README

INSTALL

MACHINES

MAILING-LIST

Makefile.cvs

Makefile.in

NEWS

PATCHES

README

README.cvs

TODO

aclocal.m4

config.guess

config.sub

configure

configure.bat

configure.bat.in

configure.in

doc/ChangeLog

doc/ChangeLog-branches

doc/ChangeLog-branches/1.6_branch.ChangeLog

doc/Makefile.in

doc/ansi2knr.1

doc/sample.wgetrc

doc/sample.wgetrc.munged_for_texi_inclusion

doc/texi2pod.pl.in

doc/texinfo.tex

doc/version.texi

doc/wget.info

doc/wget.info-1

doc/wget.info-2

doc/wget.info-3

doc/wget.info-4

doc/wget.texi

install-sh

libtool.m4

ltmain.sh

mkinstalldirs

po/Makefile.in.in

po/POTFILES.in

po/bg.gmo

po/bg.po

po/ca.gmo

po/ca.po

po/cs.gmo

po/cs.po

po/da.gmo

po/da.po

po/de.gmo

po/de.po

po/el.gmo

po/el.po

po/es.gmo

po/es.po

po/et.gmo

po/et.po

po/fr.gmo

po/fr.po

po/gl.gmo

po/gl.po

po/he.gmo

po/he.po

po/hr.gmo

po/hr.po

po/hu.gmo

po/hu.po

po/it.gmo

po/it.po

po/ja.gmo

po/ja.po

po/nl.gmo

po/nl.po

po/no.gmo

po/no.po

po/pl.gmo

po/pl.po

po/pt_BR.gmo

po/pt_BR.po

po/ro.gmo

po/ro.po

po/ru.gmo

po/ru.po

po/sk.gmo

po/sk.po

po/sl.gmo

po/sl.po

po/sv.gmo

po/sv.po

po/tr.gmo

po/tr.po

po/uk.gmo

po/uk.po

po/wget.pot

po/zh_CN.gmo

po/zh_CN.po

po/zh_TW.gmo

po/zh_TW.po

src/ChangeLog

src/ChangeLog-branches

src/ChangeLog-branches/1.6_branch.ChangeLog

src/ChangeLog-branches/1.8_branch.ChangeLog

src/Makefile.in

src/alloca.c

src/ansi2knr.c

src/cmpt.c

src/config.h.in

src/connect.c

src/connect.h

src/convert.c

src/convert.h

src/cookies.c

src/cookies.h

src/ftp-basic.c

src/ftp-ls.c

src/ftp-opie.c

src/ftp.c

src/ftp.h

src/gen-md5.c

src/gen-md5.h

src/gen_sslfunc.c

src/gen_sslfunc.h

src/getopt.c

src/getopt.h

src/gnu-md5.c

src/gnu-md5.h

src/hash.c

src/hash.h

src/headers.c

src/headers.h

src/host.c

src/host.h

src/html-parse.c

src/html-parse.h

src/html-url.c

src/http.c

src/init.c

src/init.h

src/log.c

src/main.c

src/mswindows.c

src/mswindows.h

src/netrc.c

src/netrc.h

src/options.h

src/progress.c

src/progress.h

src/rbuf.c

src/rbuf.h

src/recur.c

src/recur.h

src/res.c

src/res.h

src/retr.c

src/retr.h

src/safe-ctype.c

src/safe-ctype.h

src/snprintf.c

src/sysdep.h

src/url.c

src/url.h

src/utils.c

src/utils.h

src/version.c

src/wget.h

stamp-h.in

util

util/Makefile.in

util/README

util/dist-wget

util/download-netscape.html

util/download.html

util/rmold.pl

util/wget.spec

windows

windows/Makefile.doc

windows/Makefile.in

windows/Makefile.src

windows/Makefile.src.bor

windows/Makefile.top

windows/Makefile.top.bor

windows/Makefile.watcom

windows/README

windows/config.h.bor

windows/config.h.ms

windows/wget.dep

doc/wget.info-1

This is wget.info, produced by makeinfo version 4.3 from ./wget.texi.

INFO-DIR-SECTION Network Applications
START-INFO-DIR-ENTRY
* Wget: (wget). The non-interactive network downloader.
END-INFO-DIR-ENTRY

This file documents the GNU Wget utility for downloading network
data.

Foundation, Inc.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "GNU General Public License" and "GNU Free
Documentation License", with no Front-Cover Texts, and with no
Back-Cover Texts. A copy of the license is included in the section
entitled "GNU Free Documentation License".

File: wget.info, Node: Top, Next: Overview, Prev: (dir), Up: (dir)

Wget 1.9.1
**********

This manual documents version 1.9.1 of GNU Wget, the freely
available utility for network downloads.

Foundation, Inc.

* Menu:

* Overview:: Features of Wget.
* Invoking:: Wget command-line arguments.
* Recursive Retrieval:: Description of recursive retrieval.
* Following Links:: The available methods of chasing links.
* Time-Stamping:: Mirroring according to time-stamps.
* Startup File:: Wget's initialization file.
* Examples:: Examples of usage.
* Various:: The stuff that doesn't fit anywhere else.
* Appendices:: Some useful references.
* Copying:: You may give out copies of Wget and of this manual.
* Concept Index:: Topics covered by this manual.

File: wget.info, Node: Overview, Next: Invoking, Prev: Top, Up: Top

Overview
********

GNU Wget is a free utility for non-interactive download of files from
the Web. It supports HTTP, HTTPS, and FTP protocols, as well as
retrieval through HTTP proxies.

This chapter is a partial overview of Wget's features.

   * Wget is non-interactive, meaning that it can work in the
     background, while the user is not logged on. This allows you to
     start a retrieval and disconnect from the system, letting Wget
     finish the work. By contrast, most Web browsers require the
     user's constant presence, which can be a great hindrance when
     transferring a lot of data.

   * Wget can follow links in HTML and XHTML pages and create local
     versions of remote web sites, fully recreating the directory
     structure of the original site. This is sometimes referred to as
     "recursive downloading." While doing that, Wget respects the
     Robot Exclusion Standard (`/robots.txt'). Wget can be instructed
     to convert the links in downloaded HTML files to the local files
     for offline viewing.

   * File name wildcard matching and recursive mirroring of directories
     are available when retrieving via FTP. Wget can read the
     time-stamp information given by both HTTP and FTP servers, and
     store it locally. Thus Wget can see if the remote file has
     changed since last retrieval, and automatically retrieve the new
     version if it has. This makes Wget suitable for mirroring of FTP
     sites, as well as home pages.

   * Wget has been designed for robustness over slow or unstable network
     connections; if a download fails due to a network problem, it will
     keep retrying until the whole file has been retrieved. If the
     server supports regetting, it will instruct the server to continue
     the download from where it left off.

   * Wget supports proxy servers, which can lighten the network load,
     speed up retrieval and provide access behind firewalls. However,
     if you are behind a firewall that requires that you use a socks
     style gateway, you can get the socks library and build Wget with
     support for socks. Wget also supports passive FTP downloading
     as an option.

   * Built-in features offer mechanisms to tune which links you wish to
     follow (*note Following Links::).

   * The retrieval is conveniently traced by printing dots, each dot
     representing a fixed amount of data received (1KB by default).
     These representations can be customized to your preferences.

   * Most of the features are fully configurable, either through
     command line options, or via the initialization file `.wgetrc'
     (*note Startup File::). Wget allows you to define "global"
     startup files (`/usr/local/etc/wgetrc' by default) for site
     settings.

   * Finally, GNU Wget is free software. This means that everyone may
     use it, redistribute it and/or modify it under the terms of the
     GNU General Public License, as published by the Free Software
     Foundation (*note Copying::).

File: wget.info, Node: Invoking, Next: Recursive Retrieval, Prev: Overview, Up: Top

Invoking
********

By default, Wget is very simple to invoke. The basic syntax is:

     wget [OPTION]... [URL]...

Wget will simply download all the URLs specified on the command
line. URL is a "Uniform Resource Locator", as defined below.

However, you may wish to change some of the default parameters of
Wget. You can do it two ways: permanently, adding the appropriate
command to `.wgetrc' (*note Startup File::), or specifying it on the
command line.

* Menu:

* URL Format::
* Option Syntax::
* Basic Startup Options::
* Logging and Input File Options::
* Download Options::
* Directory Options::
* HTTP Options::
* FTP Options::
* Recursive Retrieval Options::
* Recursive Accept/Reject Options::

File: wget.info, Node: URL Format, Next: Option Syntax, Prev: Invoking, Up: Invoking

URL Format
==========

"URL" is an acronym for Uniform Resource Locator. A uniform
resource locator is a compact string representation for a resource
available via the Internet. Wget recognizes the URL syntax as per
RFC1738. This is the most widely used form (square brackets denote
optional parts):

     http://host[:port]/directory/file
     ftp://host[:port]/directory/file

You can also encode your username and password within a URL:

     ftp://user:password@host/path
     http://user:password@host/path

Either USER or PASSWORD, or both, may be left out. If you leave out
either the HTTP username or password, no authentication will be sent.
If you leave out the FTP username, `anonymous' will be used. If you
leave out the FTP password, your email address will be supplied as a
default password.(1)

*Important Note*: if you specify a password-containing URL on the
command line, the username and password will be plainly visible to all
users on the system, by way of `ps'. On multi-user systems, this is a
big security risk. To work around it, use `wget -i -' and feed the
URLs to Wget's standard input, each on a separate line, terminated by
`C-d'.

You can encode unsafe characters in a URL as `%xy', `xy' being the
hexadecimal representation of the character's ASCII value. Some common
unsafe characters include `%' (quoted as `%25'), `:' (quoted as `%3A'),
and `@' (quoted as `%40'). Refer to RFC1738 for a comprehensive list
of unsafe characters.
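
As an illustration of the `%xy' quoting described above, the following
Python sketch (not part of Wget) uses the standard library function
`urllib.parse.quote' to encode a username and password before they are
placed in a URL. The helper name `url_with_credentials' and the sample
credentials are made up for this example.

```python
from urllib.parse import quote


def url_with_credentials(scheme, user, password, host, path):
    """Build scheme://user:password@host/path with unsafe characters quoted."""
    # safe="" tells quote() to encode every reserved character,
    # including '/', so ':' and '@' inside the credentials cannot be
    # mistaken for URL delimiters.
    user_q = quote(user, safe="")
    password_q = quote(password, safe="")
    return f"{scheme}://{user_q}:{password_q}@{host}/{path}"


# The three characters named in the text.
print(quote("%", safe=""), quote(":", safe=""), quote("@", safe=""))

# A hypothetical password containing all three of them.
print(url_with_credentials("ftp", "joe", "p@ss:w%rd", "host", "path"))
```

A URL built this way still contains the password, so the earlier
advice about feeding it through `wget -i -' continues to apply.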

195

196

Wget also supports the `type' feature for FTP URLs. By default, FTP

197

documents are retrieved in the binary mode (type `i'), which means that

198

they are downloaded unchanged. Another useful mode is the `a'

199

("ASCII") mode, which converts the line delimiters between the

200

different operating systems, and is thus useful for text files. Here

201

is an example:

202

203

ftp://host/directory/file;type=a

204

205

Two alternative variants of URL specification are also supported,

206

because of historical (hysterical?) reasons and their widespreaded use.

207

208

FTP-only syntax (supported by `NcFTP'):

209

host:/dir/file

210

211

HTTP-only syntax (introduced by `Netscape'):

212

host[:port]/dir/file

213

214

These two alternative forms are deprecated, and may cease being

215

supported in the future.

216

217

If you do not understand the difference between these notations, or

218

do not know which one to use, just use the plain ordinary format you use

219

with your favorite browser, like `Lynx' or `Netscape'.

220

221

---------- Footnotes ----------

222

223

(1) If you have a `.netrc' file in your home directory, password

224

will also be searched for there.

File: wget.info, Node: Option Syntax, Next: Basic Startup Options, Prev: URL Format, Up: Invoking

Option Syntax
=============

Since Wget uses GNU getopt to process its arguments, every short
option also has a long form. Long options are more convenient to
remember, but take time to type. You may freely mix different option
styles, or specify options after the command-line arguments. Thus you
may write:

     wget -r --tries=10 http://fly.srk.fer.hr/ -o log

The space between the option accepting an argument and the argument
may be omitted. Instead of `-o log' you can write `-olog'.

You may put several options that do not require arguments together,
like:

     wget -drc URL

This is a complete equivalent of:

     wget -d -r -c URL

Since the options can be specified after the arguments, you may
terminate them with `--'. So the following will try to download URL
`-x', reporting failure to `log':

     wget -o log -- -x

The options that accept comma-separated lists all respect the
convention that specifying an empty list clears its value. This can be
useful to clear the `.wgetrc' settings. For instance, if your `.wgetrc'
sets `exclude_directories' to `/cgi-bin', the following example will
first reset it, and then set it to exclude `/~nobody' and `/~somebody'.
You can also clear the lists in `.wgetrc' (*note Wgetrc Syntax::).

     wget -X '' -X /~nobody,/~somebody
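
The effect of the example above can be modelled in a few lines of
Python. This is only a sketch of the convention, not Wget's code: the
function name `apply_list_option' is invented here, and it assumes that
a non-empty value is added to whatever the list already holds, while an
empty value clears it.

```python
def apply_list_option(current, value):
    """Return the list after one occurrence of a list-valued option."""
    if value == "":
        return []                        # an empty list clears the setting
    return current + value.split(",")    # assumed: new entries are appended


excluded = ["/cgi-bin"]                  # pretend this came from .wgetrc
# Equivalent of: wget -X '' -X /~nobody,/~somebody
for value in ["", "/~nobody,/~somebody"]:
    excluded = apply_list_option(excluded, value)

print(excluded)
```

Without the leading empty value, `/cgi-bin' would remain in the list
under the same assumption.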

File: wget.info, Node: Basic Startup Options, Next: Logging and Input File Options, Prev: Option Syntax, Up: Invoking

Basic Startup Options
=====================

`-V'
`--version'
     Display the version of Wget.

`-h'
`--help'
     Print a help message describing all of Wget's command-line options.

`-b'
`--background'
     Go to background immediately after startup. If no output file is
     specified via `-o', output is redirected to `wget-log'.

`-e COMMAND'
`--execute COMMAND'
     Execute COMMAND as if it were a part of `.wgetrc' (*note Startup
     File::). A command thus invoked will be executed _after_ the
     commands in `.wgetrc', thus taking precedence over them.

File: wget.info, Node: Logging and Input File Options, Next: Download Options, Prev: Basic Startup Options, Up: Invoking

Logging and Input File Options
==============================

`-o LOGFILE'
`--output-file=LOGFILE'
     Log all messages to LOGFILE. The messages are normally reported
     to standard error.

`-a LOGFILE'
`--append-output=LOGFILE'
     Append to LOGFILE. This is the same as `-o', only it appends to
     LOGFILE instead of overwriting the old log file. If LOGFILE does
     not exist, a new file is created.

`-d'
`--debug'
     Turn on debug output, meaning various information important to the
     developers of Wget if it does not work properly. Your system
     administrator may have chosen to compile Wget without debug
     support, in which case `-d' will not work. Please note that
     compiling with debug support is always safe--Wget compiled with
     the debug support will _not_ print any debug info unless requested
     with `-d'. *Note Reporting Bugs::, for more information on how to
     use `-d' for sending bug reports.

`-q'
`--quiet'
     Turn off Wget's output.

`-v'
`--verbose'
     Turn on verbose output, with all the available data. The default
     output is verbose.

`-nv'
`--non-verbose'
     Non-verbose output--turn off verbose without being completely quiet
     (use `-q' for that), which means that error messages and basic
     information still get printed.

`-i FILE'
`--input-file=FILE'
     Read URLs from FILE, in which case no URLs need to be on the
     command line. If there are URLs both on the command line and in
     an input file, those on the command line will be the first ones to
     be retrieved. The FILE need not be an HTML document (but no harm
     if it is)--it is enough if the URLs are just listed sequentially.

     However, if you specify `--force-html', the document will be
     regarded as `html'. In that case you may have problems with
     relative links, which you can solve either by adding `<base
     href="URL">' to the documents or by specifying `--base=URL' on the
     command line.

`-F'
`--force-html'
     When input is read from a file, force it to be treated as an HTML
     file. This enables you to retrieve relative links from existing
     HTML files on your local disk, by adding `<base href="URL">' to
     HTML, or using the `--base' command-line option.

`-B URL'
`--base=URL'
     When used in conjunction with `-F', prepends URL to relative links
     in the file specified by `-i'.

File: wget.info, Node: Download Options, Next: Directory Options, Prev: Logging and Input File Options, Up: Invoking

Download Options
================

`--bind-address=ADDRESS'
     When making client TCP/IP connections, `bind()' to ADDRESS on the
     local machine. ADDRESS may be specified as a hostname or IP
     address. This option can be useful if your machine is bound to
     multiple IPs.

`-t NUMBER'
`--tries=NUMBER'
     Set number of retries to NUMBER. Specify 0 or `inf' for infinite
     retrying. The default is to retry 20 times, with the exception of
     fatal errors like "connection refused" or "not found" (404), which
     are not retried.

`-O FILE'
`--output-document=FILE'
     The documents will not be written to the appropriate files, but
     all will be concatenated together and written to FILE. If FILE
     already exists, it will be overwritten. If the FILE is `-', the
     documents will be written to standard output. Including this
     option automatically sets the number of tries to 1.

`-nc'
`--no-clobber'
     If a file is downloaded more than once in the same directory,
     Wget's behavior depends on a few options, including `-nc'. In
     certain cases, the local file will be "clobbered", or overwritten,
     upon repeated download. In other cases it will be preserved.

     When running Wget without `-N', `-nc', or `-r', downloading the
     same file in the same directory will result in the original copy
     of FILE being preserved and the second copy being named `FILE.1'.
     If that file is downloaded yet again, the third copy will be named
     `FILE.2', and so on. When `-nc' is specified, this behavior is
     suppressed, and Wget will refuse to download newer copies of
     `FILE'. Therefore, "`no-clobber'" is actually a misnomer in this
     mode--it's not clobbering that's prevented (as the numeric
     suffixes were already preventing clobbering), but rather the
     multiple version saving that's prevented.

     When running Wget with `-r', but without `-N' or `-nc',
     re-downloading a file will result in the new copy simply
     overwriting the old. Adding `-nc' will prevent this behavior,
     instead causing the original version to be preserved and any newer
     copies on the server to be ignored.

     When running Wget with `-N', with or without `-r', the decision as
     to whether or not to download a newer copy of a file depends on
     the local and remote timestamp and size of the file (*note
     Time-Stamping::). `-nc' may not be specified at the same time as
     `-N'.

     Note that when `-nc' is specified, files with the suffixes `.html'
     or (yuck) `.htm' will be loaded from the local disk and parsed as
     if they had been retrieved from the Web.
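
The naming rule for the plain case (no `-N' and no `-r') can be
sketched in Python as follows. The function `local_name' and the use of
a set of existing names are illustrative assumptions, not Wget's
implementation; the sketch only mirrors the `FILE.1', `FILE.2'
behavior and the effect of `-nc' described above.

```python
def local_name(name, existing, no_clobber=False):
    """Pick the name a repeated plain download would be saved under.

    Returns None when -nc applies and the download is skipped.
    """
    if name not in existing:
        return name
    if no_clobber:
        return None                  # -nc: keep the original, skip the download
    n = 1
    while f"{name}.{n}" in existing:
        n += 1                       # find the first unused numeric suffix
    return f"{name}.{n}"


files = set()
for _ in range(3):
    saved = local_name("index.html", files)
    files.add(saved)
print(sorted(files))
print(local_name("index.html", files, no_clobber=True))
```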

`-c'
`--continue'
     Continue getting a partially-downloaded file. This is useful when
     you want to finish up a download started by a previous instance of
     Wget, or by another program. For instance:

          wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z

     If there is a file named `ls-lR.Z' in the current directory, Wget
     will assume that it is the first portion of the remote file, and
     will ask the server to continue the retrieval from an offset equal
     to the length of the local file.

     Note that you don't need to specify this option if you just want
     the current invocation of Wget to retry downloading a file should
     the connection be lost midway through. This is the default
     behavior. `-c' only affects resumption of downloads started
     _prior_ to this invocation of Wget, and whose local files are
     still sitting around.

     Without `-c', the previous example would just download the remote
     file to `ls-lR.Z.1', leaving the truncated `ls-lR.Z' file alone.

     Beginning with Wget 1.7, if you use `-c' on a non-empty file, and
     it turns out that the server does not support continued
     downloading, Wget will refuse to start the download from scratch,
     which would effectively ruin existing contents. If you really
     want the download to start from scratch, remove the file.

     Also beginning with Wget 1.7, if you use `-c' on a file which is
     the same size as the one on the server, Wget will refuse to download
     the file and print an explanatory message. The same happens when
     the file is smaller on the server than locally (presumably because
     it was changed on the server since your last download
     attempt)--because "continuing" is not meaningful, no download
     occurs.

     On the other side of the coin, while using `-c', any file that's
     bigger on the server than locally will be considered an incomplete
     download and only `(length(remote) - length(local))' bytes will be
     downloaded and tacked onto the end of the local file. This
     behavior can be desirable in certain cases--for instance, you can
     use `wget -c' to download just the new portion that's been
     appended to a data collection or log file.

     However, if the file is bigger on the server because it's been
     _changed_, as opposed to just _appended_ to, you'll end up with a
     garbled file. Wget has no way of verifying that the local file is
     really a valid prefix of the remote file. You need to be
     especially careful of this when using `-c' in conjunction with
     `-r', since every file will be considered as an "incomplete
     download" candidate.

     Another instance where you'll get a garbled file if you try to use
     `-c' is if you have a lame HTTP proxy that inserts a "transfer
     interrupted" string into the local file. In the future a
     "rollback" option may be added to deal with this case.

     Note that `-c' only works with FTP servers and with HTTP servers
     that support the `Range' header.
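
The size comparisons described for `-c' can be summarized in a short
Python sketch. It is a simplified model rather than Wget's code: the
function name `plan_continue' and the returned labels are invented for
the example, and only the HTTP case is shown, using the standard
`Range: bytes=N-' request header form to ask for the missing tail.

```python
def plan_continue(local_size, remote_size):
    """Decide what `-c' would do for one file.

    Returns (action, extra_request_header, bytes_to_fetch).
    """
    if local_size == 0:
        # Nothing to continue from: an ordinary full download.
        return ("download", None, remote_size)
    if local_size >= remote_size:
        # Same size, or smaller on the server: continuing is meaningless.
        return ("refuse", None, 0)
    # Bigger on the server: request only the part that is missing.
    header = f"Range: bytes={local_size}-"
    return ("append", header, remote_size - local_size)


for sizes in [(0, 250), (100, 250), (250, 250), (300, 250)]:
    print(sizes, plan_continue(*sizes))
```

Note that the sketch, like Wget itself, has no way to tell whether the
local bytes really are a prefix of the remote file.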

`--progress=TYPE'
     Select the type of the progress indicator you wish to use. Legal
     indicators are "dot" and "bar".

     The "bar" indicator is used by default. It draws ASCII progress
     bar graphics (a.k.a. "thermometer" display) indicating the status of
     retrieval. If the output is not a TTY, the "dot" indicator will be
     used by default.

     Use `--progress=dot' to switch to the "dot" display. It traces
     the retrieval by printing dots on the screen, each dot
     representing a fixed amount of downloaded data.

     When using the dotted retrieval, you may also set the "style" by
     specifying the type as `dot:STYLE'. Different styles assign
     different meaning to one dot. With the `default' style each dot
     represents 1K, there are ten dots in a cluster and 50 dots in a
     line. The `binary' style has a more "computer"-like
     orientation--8K dots, 16-dot clusters and 48 dots per line (which
     makes for 384K lines). The `mega' style is suitable for
     downloading very large files--each dot represents 64K retrieved,
     there are eight dots in a cluster, and 48 dots on each line (so
     each line contains 3M).

     Note that you can set the default style using the `progress'
     command in `.wgetrc'. That setting may be overridden from the
     command line. The exception is that, when the output is not a
     TTY, the "dot" progress will be favored over "bar". To force the
     bar output, use `--progress=bar:force'.
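
The per-line figures quoted for the dot styles follow directly from the
dot size and the number of dots per line. The small Python table below
is only a worked check of that arithmetic; the dictionary `DOT_STYLES'
and the function `bytes_per_line' are names made up for the example.

```python
# (bytes per dot, dots per cluster, dots per line), as given in the text
DOT_STYLES = {
    "default": (1 * 1024, 10, 50),
    "binary": (8 * 1024, 16, 48),
    "mega": (64 * 1024, 8, 48),
}


def bytes_per_line(style):
    """Amount of data represented by one full line of dots."""
    dot_bytes, _dots_per_cluster, dots_per_line = DOT_STYLES[style]
    return dot_bytes * dots_per_line


for style in DOT_STYLES:
    print(style, bytes_per_line(style) // 1024, "K per line")
```

The `binary' row gives 8K x 48 = 384K and the `mega' row gives
64K x 48 = 3072K, that is, 3M.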

`-N'
`--timestamping'
     Turn on time-stamping. *Note Time-Stamping::, for details.

`-S'
`--server-response'
     Print the headers sent by HTTP servers and responses sent by FTP
     servers.

`--spider'
     When invoked with this option, Wget will behave as a Web "spider",
     which means that it will not download the pages, just check that
     they are there. For example, you can use Wget to check your
     bookmarks:

          wget --spider --force-html -i bookmarks.html

     This feature needs much more work for Wget to get close to the
     functionality of real web spiders.

`-T SECONDS'
`--timeout=SECONDS'
     Set the network timeout to SECONDS seconds. This is equivalent to
     specifying `--dns-timeout', `--connect-timeout', and
     `--read-timeout', all at the same time.

     Whenever Wget connects to or reads from a remote host, it checks
     for a timeout and aborts the operation if the time expires. This
     prevents anomalous occurrences such as hanging reads or infinite
     connects. The only timeout enabled by default is a 900-second
     timeout for reading. Setting timeout to 0 disables checking for
     timeouts.

     Unless you know what you are doing, it is best not to set any of
     the timeout-related options.

`--dns-timeout=SECONDS'
     Set the DNS lookup timeout to SECONDS seconds. DNS lookups that
     don't complete within the specified time will fail. By default,
     there is no timeout on DNS lookups, other than that implemented by
     system libraries.

`--connect-timeout=SECONDS'
     Set the connect timeout to SECONDS seconds. TCP connections that
     take longer to establish will be aborted. By default, there is no
     connect timeout, other than that implemented by system libraries.

`--read-timeout=SECONDS'
     Set the read (and write) timeout to SECONDS seconds. Reads that
     take longer will fail. The default value for read timeout is 900
     seconds.

`--limit-rate=AMOUNT'
     Limit the download speed to AMOUNT bytes per second. Amount may
     be expressed in bytes, kilobytes with the `k' suffix, or megabytes
     with the `m' suffix. For example, `--limit-rate=20k' will limit
     the retrieval rate to 20KB/s. This kind of thing is useful when,
     for whatever reason, you don't want Wget to consume the entire
     available bandwidth.

     Note that Wget implements the limiting by sleeping the appropriate
     amount of time after a network read that took less time than
     specified by the rate. Eventually this strategy causes the TCP
     transfer to slow down to approximately the specified rate.
     However, it may take some time for this balance to be achieved, so
     don't be surprised if limiting the rate doesn't work well with
     very small files.
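
The sleeping strategy described for `--limit-rate' can be sketched in
Python as below. This is a simplified model, not Wget's code: the
function names are invented, and the sketch assumes that `k' and `m'
stand for multiples of 1024 bytes.

```python
def parse_rate(amount):
    """Convert an AMOUNT such as '20k' or '1m' to bytes per second."""
    suffixes = {"k": 1024, "m": 1024 * 1024}   # assumed binary multiples
    unit = amount[-1].lower()
    if unit in suffixes:
        return float(amount[:-1]) * suffixes[unit]
    return float(amount)


def sleep_after_read(nbytes, elapsed, rate):
    """Seconds to sleep so that reading nbytes took at least nbytes/rate."""
    expected = nbytes / rate          # time the read "should" have taken
    return max(0.0, expected - elapsed)


rate = parse_rate("20k")
# A 4096-byte read that finished in 0.05 s was too fast for 20KB/s.
print(sleep_after_read(4096, 0.05, rate))
# A read that already took a full second needs no extra delay.
print(sleep_after_read(4096, 1.0, rate))
```

Because each sleep is computed only after a read has completed, a very
short transfer may finish before the delays have any visible effect,
which matches the caveat about small files.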

`-w SECONDS'
`--wait=SECONDS'
     Wait the specified number of seconds between the retrievals. Use
     of this option is recommended, as it lightens the server load by
     making the requests less frequent. Instead of in seconds, the
     time can be specified in minutes using the `m' suffix, in hours
     using the `h' suffix, or in days using the `d' suffix.

     Specifying a large value for this option is useful if the network
     or the destination host is down, so that Wget can wait long enough
     to reasonably expect the network error to be fixed before the
     retry.

`--waitretry=SECONDS'
     If you don't want Wget to wait between _every_ retrieval, but only
     between retries of failed downloads, you can use this option.
     Wget will use "linear backoff", waiting 1 second after the first
     failure on a given file, then waiting 2 seconds after the second
     failure on that file, up to the maximum number of SECONDS you
     specify. Therefore, a value of 10 will actually make Wget wait up
     to (1 + 2 + ... + 10) = 55 seconds per file.

     Note that this option is turned on by default in the global
     `wgetrc' file.
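
The "linear backoff" arithmetic for `--waitretry' can be checked with a
few lines of Python. The function name `retry_waits' is made up for the
example, and capping each wait at SECONDS is this sketch's reading of
"up to the maximum number of SECONDS you specify".

```python
def retry_waits(waitretry, failures):
    """Seconds waited after the 1st, 2nd, ... failure on one file."""
    return [min(n, waitretry) for n in range(1, failures + 1)]


waits = retry_waits(10, 10)
print(waits)         # 1, 2, ..., 10
print(sum(waits))    # total time spent waiting on this file
```

With a value of 10 and ten failures the waits add up to 55 seconds, as
stated in the description.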

`--random-wait'
     Some web sites may perform log analysis to identify retrieval
     programs such as Wget by looking for statistically significant
     similarities in the time between requests. This option causes the
     time between requests to vary between 0 and 2 * WAIT seconds,
     where WAIT was specified using the `--wait' option, in order to
     mask Wget's presence from such analysis.

     A recent article in a publication devoted to development on a
     popular consumer platform provided code to perform this analysis
     on the fly. Its author suggested blocking at the class C address
     level to ensure automated retrieval programs were blocked despite
     changing DHCP-supplied addresses.

     The `--random-wait' option was inspired by this ill-advised
     recommendation to block many unrelated users from a web site due
     to the actions of one.
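
A delay "between 0 and 2 * WAIT seconds" can be drawn as in the Python
sketch below. The text does not say how the value is distributed; the
uniform distribution used here, like the function name `random_wait',
is an assumption made for illustration.

```python
import random


def random_wait(wait, rng=random):
    """Pick a delay between 0 and 2 * WAIT seconds (assumed uniform)."""
    return rng.uniform(0, 2 * wait)


rng = random.Random(0)                 # seeded only to make runs repeatable
delays = [random_wait(5, rng) for _ in range(5)]
print(delays)
```

Under that assumption the delays average out to WAIT seconds, so the
overall load on the server is about the same as with a fixed `--wait'.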

`-Y on/off'
`--proxy=on/off'
     Turn proxy support on or off. The proxy is on by default if the
     appropriate environment variable is defined.

     For more information about the use of proxies with Wget, *Note
     Proxies::.

`-Q QUOTA'
`--quota=QUOTA'
     Specify download quota for automatic retrievals. The value can be
     specified in bytes (default), kilobytes (with `k' suffix), or
     megabytes (with `m' suffix).

     Note that quota will never affect downloading a single file. So
     if you specify `wget -Q10k ftp://wuarchive.wustl.edu/ls-lR.gz',
     all of the `ls-lR.gz' will be downloaded. The same goes even when
     several URLs are specified on the command-line. However, quota is
     respected when retrieving either recursively, or from an input
     file. Thus you may safely type `wget -Q2m -i sites'--download
     will be aborted when the quota is exceeded.

     Setting quota to 0 or to `inf' makes the download quota unlimited.
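
The way a quota interacts with a list of files can be modelled in
Python as follows. This is a sketch of the behavior described above for
an input file, not Wget's code: the function names are invented, `k'
and `m' are assumed to be multiples of 1024 bytes, and the quota is
assumed to be checked only between files.

```python
def parse_quota(text):
    """Return the quota in bytes, or None for unlimited (0 or 'inf')."""
    if text == "inf":
        return None
    suffixes = {"k": 1024, "m": 1024 * 1024}   # assumed binary multiples
    unit = text[-1].lower()
    value = int(text[:-1]) * suffixes[unit] if unit in suffixes else int(text)
    return value or None                        # 0 also means unlimited


def files_fetched(sizes, quota):
    """Count the files retrieved before the quota stops the run.

    A file that has been started is always finished; the total is
    compared with the quota only before starting the next file.
    """
    total = 0
    count = 0
    for size in sizes:
        if quota is not None and total > quota:
            break
        total += size
        count += 1
    return count


# One large file is fetched whole even with a 10k quota.
print(files_fetched([50_000], parse_quota("10k")))
# With -Q2m, the third file is never started.
print(files_fetched([1_500_000, 1_000_000, 10], parse_quota("2m")))
```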

`--dns-cache=off'
     Turn off caching of DNS lookups. Normally, Wget remembers the
     addresses it looked up from DNS so it doesn't have to repeatedly
     contact the DNS server for the same (typically small) set of
     addresses it retrieves from. This cache exists in memory only; a
     new Wget run will contact DNS again.

     However, in some cases it is not desirable to cache host names,
     even for the duration of a short-running application like Wget.
     For example, some HTTP servers are hosted on machines with
     dynamically allocated IP addresses that change from time to time.
     Their DNS entries are updated along with each change. When Wget's
     download from such a host gets interrupted by IP address change,
     Wget retries the download, but (due to DNS caching) it contacts
     the old address. With the DNS cache turned off, Wget will repeat
     the DNS lookup for every connect and will thus get the correct
     dynamic address every time--at the cost of additional DNS lookups
     where they're probably not needed.

     If you don't understand the above description, you probably won't
     need this option.

`--restrict-file-names=MODE'
Change which characters found in remote URLs may show up in local
file names generated from those URLs. Characters that are
"restricted" by this option are escaped, i.e. replaced with `%HH',
where `HH' is the hexadecimal number that corresponds to the
restricted character.

By default, Wget escapes the characters that are not valid as part
of file names on your operating system, as well as control
characters that are typically unprintable. This option is useful
for changing these defaults, either because you are downloading to
a non-native partition, or because you want to disable escaping of
the control characters.

When mode is set to "unix", Wget escapes the character `/' and the
control characters in the ranges 0-31 and 128-159. This is the
default on Unix-like operating systems.

When mode is set to "windows", Wget escapes the characters `\',
`|', `/', `:', `?', `"', `*', `<', `>', and the control characters
in the ranges 0-31 and 128-159. In addition to this, Wget in
Windows mode uses `+' instead of `:' to separate host and port in
local file names, and uses `@' instead of `?' to separate the
query portion of the file name from the rest. Therefore, a URL
that would be saved as `www.xemacs.org:4300/search.pl?input=blah'
in Unix mode would be saved as
`www.xemacs.org+4300/search.pl@input=blah' in Windows mode. This
mode is the default on Windows.

If you append `,nocontrol' to the mode, as in `unix,nocontrol',
escaping of the control characters is also switched off. You can
use `--restrict-file-names=nocontrol' to turn off escaping of
control characters without affecting the choice of the OS to use
as file name restriction mode.
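
The two mechanisms described above, `%HH' escaping and the Windows-mode separator substitution, can be imitated in a few lines of shell. This is a rough sketch, not Wget's own code: `escape_char' and `windows_separators' are hypothetical helper names, and the second one blindly replaces every `:' and `?', which is only adequate for simple names like the one in the example.

```shell
# escape_char C: print the %HH form of a single character.
escape_char() {
    # A leading quote makes printf use the character's numeric code.
    printf '%%%02X\n' "'$1"
}

# windows_separators NAME: apply only the `:' -> `+' and `?' -> `@'
# substitutions that the manual describes for Windows mode.
windows_separators() {
    printf '%s\n' "$1" | tr ':?' '+@'
}

escape_char '|'
windows_separators 'www.xemacs.org:4300/search.pl?input=blah'
```

The second call reproduces the `www.xemacs.org+4300/search.pl@input=blah' name quoted above.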

File: wget.info, Node: Directory Options, Next: HTTP Options, Prev: Download Options, Up: Invoking

Directory Options
=================

`-nd'
`--no-directories'
Do not create a hierarchy of directories when retrieving
recursively. With this option turned on, all files will get saved
to the current directory, without clobbering (if a name shows up
more than once, the filenames will get extensions `.n').

`-x'
`--force-directories'
The opposite of `-nd'--create a hierarchy of directories, even if
one would not have been created otherwise. E.g. `wget -x
http://fly.srk.fer.hr/robots.txt' will save the downloaded file to
`fly.srk.fer.hr/robots.txt'.

`-nH'
`--no-host-directories'
Disable generation of host-prefixed directories. By default,
invoking Wget with `-r http://fly.srk.fer.hr/' will create a
structure of directories beginning with `fly.srk.fer.hr/'. This
option disables such behavior.

`--cut-dirs=NUMBER'
Ignore NUMBER directory components. This is useful for getting
fine-grained control over the directory where recursive retrieval
will be saved.

Take, for example, the directory at
`ftp://ftp.xemacs.org/pub/xemacs/'. If you retrieve it with `-r',
it will be saved locally under `ftp.xemacs.org/pub/xemacs/'.
While the `-nH' option can remove the `ftp.xemacs.org/' part, you
are still stuck with `pub/xemacs'. This is where `--cut-dirs'
comes in handy; it makes Wget not "see" NUMBER remote directory
components. Here are several examples of how the `--cut-dirs'
option works.

     No options        -> ftp.xemacs.org/pub/xemacs/
     -nH               -> pub/xemacs/
     -nH --cut-dirs=1  -> xemacs/
     -nH --cut-dirs=2  -> .

     --cut-dirs=1      -> ftp.xemacs.org/xemacs/
     ...

If you just want to get rid of the directory structure, this
option is similar to a combination of `-nd' and `-P'. However,
unlike `-nd', `--cut-dirs' does not flatten subdirectories--for
instance, with `-nH --cut-dirs=1', a `beta/' subdirectory will be
placed in `xemacs/beta', as one would expect.
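
The mapping in the table above can be reproduced with a short shell function. This is a sketch of the documented behavior only, not Wget's implementation; `local_dir' and its argument order are made up for the illustration.

```shell
# local_dir HOST REMOTE_DIR NO_HOST CUT
#   HOST        remote host name, e.g. ftp.xemacs.org
#   REMOTE_DIR  remote directory without a leading slash, e.g. pub/xemacs/
#   NO_HOST     1 to mimic -nH, 0 otherwise
#   CUT         number of leading components to drop (--cut-dirs)
local_dir() {
    host=$1 dir=$2 no_host=$3 cut=$4
    while [ "$cut" -gt 0 ] && [ -n "$dir" ]; do
        case "$dir" in
            */*) dir=${dir#*/} ;;   # drop the first component
            *)   dir= ;;
        esac
        cut=$((cut - 1))
    done
    [ "$no_host" -eq 1 ] || dir="$host/$dir"
    echo "${dir:-.}"                # an empty result means the current directory
}

local_dir ftp.xemacs.org pub/xemacs/ 0 0
local_dir ftp.xemacs.org pub/xemacs/ 1 1
local_dir ftp.xemacs.org pub/xemacs/ 1 2
local_dir ftp.xemacs.org pub/xemacs/beta/ 1 1
```

The last call corresponds to the `beta/' case discussed above: the subdirectory survives below `xemacs/' rather than being flattened.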

`-P PREFIX'
`--directory-prefix=PREFIX'
Set directory prefix to PREFIX. The "directory prefix" is the
directory where all other files and subdirectories will be saved
to, i.e. the top of the retrieval tree. The default is `.' (the
current directory).

File: wget.info, Node: HTTP Options, Next: FTP Options, Prev: Directory Options, Up: Invoking

HTTP Options
============

`-E'
`--html-extension'
If a file of type `application/xhtml+xml' or `text/html' is
downloaded and the URL does not end with the regexp
`\.[Hh][Tt][Mm][Ll]?', this option will cause the suffix `.html'
to be appended to the local filename. This is useful, for
instance, when you're mirroring a remote site that uses `.asp'
pages, but you want the mirrored pages to be viewable on your
stock Apache server. Another good use for this is when you're
downloading CGI-generated materials. A URL like
`http://site.com/article.cgi?25' will be saved as
`article.cgi?25.html'.

Note that filenames changed in this way will be re-downloaded
every time you re-mirror a site, because Wget can't tell that the
local `X.html' file corresponds to remote URL `X' (since it
doesn't yet know that the URL produces output of type `text/html'
or `application/xhtml+xml'). To prevent this re-downloading, you
must use `-k' and `-K' so that the original version of the file
will be saved as `X.orig' (*note Recursive Retrieval Options::).
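
The naming rule can be tried out in isolation. The following is a minimal sketch that assumes the content type has already been determined to be HTML; `html_name' is a hypothetical helper, and anchoring the regexp with `$' is this sketch's reading of "does not end with".

```shell
# html_name NAME: print the local name that the -E rule would produce
# for a document already known to be text/html or application/xhtml+xml.
html_name() {
    if printf '%s\n' "$1" | grep -Eq '\.[Hh][Tt][Mm][Ll]?$'; then
        printf '%s\n' "$1"          # already ends in .htm or .html
    else
        printf '%s.html\n' "$1"     # append the suffix
    fi
}

html_name 'article.cgi?25'
html_name 'index.html'
html_name 'default.asp'
```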

`--http-user=USER'
`--http-passwd=PASSWORD'
Specify the username USER and password PASSWORD on an HTTP server.
According to the type of the challenge, Wget will encode them
using either the `basic' (insecure) or the `digest' authentication
scheme.

Another way to specify username and password is in the URL itself
(*note URL Format::). Either method reveals your password to
anyone who bothers to run `ps'. To prevent the passwords from
being seen, store them in `.wgetrc' or `.netrc', and make sure to
protect those files from other users with `chmod'. If the
passwords are really important, do not leave them lying in those
files either--edit the files and delete them after Wget has
started the download.

For more information about security issues with Wget, *Note
Security Considerations::.
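
A possible way to follow the advice above is to write the credentials into a startup file that only its owner can read. In this sketch `write_rc' is a hypothetical helper, a scratch file stands in for `~/.wgetrc', and USER and PASSWORD are placeholders; it assumes the `http_user' and `http_passwd' startup-file commands.

```shell
# write_rc FILE USER PASSWORD: store HTTP credentials in a wgetrc-style
# file that is readable and writable by its owner only.
write_rc() {
    ( umask 077
      printf 'http_user = %s\nhttp_passwd = %s\n' "$2" "$3" > "$1" )
    chmod 600 "$1"
}

rc=$(mktemp)                  # scratch file standing in for ~/.wgetrc
write_rc "$rc" USER PASSWORD
ls -l "$rc"                   # the mode column should read -rw-------
# One way to use the file without touching ~/.wgetrc (not run here):
#   WGETRC="$rc" wget http://server.com/private/
rm -f "$rc"
```

Keeping the password out of the command line this way means it does not show up in `ps' output.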

`-C on/off'
`--cache=on/off'
When set to off, disable server-side cache. In this case, Wget
will send the remote server an appropriate directive (`Pragma:
no-cache') to get the file from the remote service, rather than
returning the cached version. This is especially useful for
retrieving and flushing out-of-date documents on proxy servers.

Caching is allowed by default.

`--cookies=on/off'
When set to off, disable the use of cookies. Cookies are a
mechanism for maintaining server-side state. The server sends the
client a cookie using the `Set-Cookie' header, and the client
responds with the same cookie upon further requests. Since
cookies allow the server owners to keep track of visitors and for
sites to exchange this information, some consider them a breach of
privacy. The default is to use cookies; however, _storing_
cookies is not on by default.

`--load-cookies FILE'
Load cookies from FILE before the first HTTP retrieval. FILE is a
textual file in the format originally used by Netscape's
`cookies.txt' file.

You will typically use this option when mirroring sites that
require that you be logged in to access some or all of their
content. The login process typically works by the web server
issuing an HTTP cookie upon receiving and verifying your
credentials. The cookie is then resent by the browser when
accessing that part of the site, and so proves your identity.

Mirroring such a site requires Wget to send the same cookies your
browser sends when communicating with the site. This is achieved
by `--load-cookies'--simply point Wget to the location of the
`cookies.txt' file, and it will send the same cookies your browser
would send in the same situation. Different browsers keep textual
cookie files in different locations:

Netscape 4.x.
The cookies are in `~/.netscape/cookies.txt'.

Mozilla and Netscape 6.x.
Mozilla's cookie file is also named `cookies.txt', located
somewhere under `~/.mozilla', in the directory of your
profile. The full path usually ends up looking somewhat like
`~/.mozilla/default/SOME-WEIRD-STRING/cookies.txt'.

Internet Explorer.
You can produce a cookie file Wget can use by using the File
menu, Import and Export, Export Cookies. This has been
tested with Internet Explorer 5; it is not guaranteed to work
with earlier versions.

Other browsers.
If you are using a different browser to create your cookies,
`--load-cookies' will only work if you can locate or produce a
cookie file in the Netscape format that Wget expects.

If you cannot use `--load-cookies', there might still be an
alternative. If your browser supports a "cookie manager", you can
use it to view the cookies used when accessing the site you're
mirroring. Write down the name and value of the cookie, and
manually instruct Wget to send those cookies, bypassing the
"official" cookie support:

     wget --cookies=off --header "Cookie: NAME=VALUE"
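
For readers who need to locate or produce such a file by hand, the sketch below writes a one-line file in the commonly described Netscape layout: seven tab-separated fields (domain, subdomain flag, path, secure flag, expiry as Unix time, name, value). The field layout is stated here as an assumption rather than taken from the text above, the cookie values are invented, and `cookie_pairs' is a hypothetical helper.

```shell
# cookie_pairs FILE: print NAME=VALUE for each cookie entry,
# skipping comment lines and anything without seven fields.
cookie_pairs() {
    awk -F '\t' '!/^#/ && NF == 7 { print $6 "=" $7 }' "$1"
}

cookies=$(mktemp)             # scratch file standing in for cookies.txt
# domain, subdomain flag, path, secure flag, expiry, name, value
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    server.com FALSE / FALSE 2000000000 session abc123 > "$cookies"
cookie_pairs "$cookies"
rm -f "$cookies"
```

The printed pair has the same NAME=VALUE shape that the `--header "Cookie: NAME=VALUE"' workaround expects.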

`--save-cookies FILE'
Save cookies to FILE at the end of session. Cookies whose expiry
time is not specified, or those that have already expired, are not
saved.

`--ignore-length'
Unfortunately, some HTTP servers (CGI programs, to be more
precise) send out bogus `Content-Length' headers, which makes Wget
go wild, as it thinks not all the document was retrieved. You can
spot this syndrome if Wget retries getting the same document again
and again, each time claiming that the (otherwise normal)
connection has closed on the very same byte.

With this option, Wget will ignore the `Content-Length' header--as
if it never existed.

`--header=ADDITIONAL-HEADER'
Define an ADDITIONAL-HEADER to be passed to the HTTP servers.
Headers must contain a `:' preceded by one or more non-blank
characters, and must not contain newlines.

You may define more than one additional header by specifying
`--header' more than once.

     wget --header='Accept-Charset: iso-8859-2' \
          --header='Accept-Language: hr' \
            http://fly.srk.fer.hr/

Specification of an empty string as the header value will clear all
previous user-defined headers.
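
The well-formedness rule for headers can be checked before invoking Wget. The function below is one reading of that rule (a non-empty run of non-blank characters immediately followed by `:', and no newline anywhere); `valid_header' is a hypothetical name, and the check is not claimed to match Wget's own test exactly.

```shell
# valid_header STRING: succeed if STRING looks like an acceptable
# argument for --header under the rule quoted above.
valid_header() {
    nl='
'
    case "$1" in
        *"$nl"*) return 1 ;;        # newlines are not allowed
    esac
    printf '%s\n' "$1" | grep -Eq '^[^[:space:]:]+:'
}

valid_header 'Accept-Language: hr' && echo "accepted"
valid_header ': hr'                || echo "rejected: empty header name"
```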

`--proxy-user=USER'
`--proxy-passwd=PASSWORD'
Specify the username USER and password PASSWORD for authentication
on a proxy server. Wget will encode them using the `basic'
authentication scheme.

Security considerations similar to those with `--http-passwd'
pertain here as well.

`--referer=URL'
Include `Referer: URL' header in HTTP request. Useful for
retrieving documents with server-side processing that assume they
are always being retrieved by interactive web browsers and only
come out properly when Referer is set to one of the pages that
point to them.

`-s'
`--save-headers'
Save the headers sent by the HTTP server to the file, preceding the
actual contents, with an empty line as the separator.

`-U AGENT-STRING'
`--user-agent=AGENT-STRING'
Identify as AGENT-STRING to the HTTP server.

The HTTP protocol allows the clients to identify themselves using a
`User-Agent' header field. This enables distinguishing the WWW
software, usually for statistical purposes or for tracing of
protocol violations. Wget normally identifies as `Wget/VERSION',
VERSION being the current version number of Wget.

However, some sites have been known to impose the policy of
tailoring the output according to the `User-Agent'-supplied
information. While conceptually this is not such a bad idea, it
has been abused by servers denying information to clients other
than `Mozilla' or Microsoft `Internet Explorer'. This option
allows you to change the `User-Agent' line issued by Wget. Use of
this option is discouraged, unless you really know what you are
doing.

`--post-data=STRING'
`--post-file=FILE'
Use POST as the method for all HTTP requests and send the
specified data in the request body. `--post-data' sends STRING as
data, whereas `--post-file' sends the contents of FILE. Other than
that, they work in exactly the same way.

Please be aware that Wget needs to know the size of the POST data
in advance. Therefore the argument to `--post-file' must be a
regular file; specifying a FIFO or something like `/dev/stdin'
won't work. It's not quite clear how to work around this
limitation inherent in HTTP/1.0. Although HTTP/1.1 introduces
"chunked" transfer that doesn't require knowing the request length
in advance, a client can't use chunked unless it knows it's
talking to an HTTP/1.1 server. And it can't know that until it
receives a response, which in turn requires the request to have
been completed--a chicken-and-egg problem.

Note: if Wget is redirected after the POST request is completed,
it will not send the POST data to the redirected URL. This is
because URLs that process POST often respond with a redirection to
a regular page (although that's technically disallowed), which
does not desire or accept POST. It is not yet clear that this
behavior is optimal; if it doesn't work out, it will be changed.

This example shows how to log in to a server using POST and then
proceed to download the desired pages, presumably only accessible
to authorized users:

     # Log in to the server. This only needs to be done once.
     wget --save-cookies cookies.txt \
          --post-data 'user=foo&password=bar' \
          http://server.com/auth.php

     # Now grab the page or pages we care about.
     wget --load-cookies cookies.txt \
          -p http://server.com/interesting/article.php
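
Because `--post-file' needs a regular file, a wrapper script may want to check its argument first. The sketch below uses the standard `test -f' check; `post_file_ok' and the scratch file names are hypothetical.

```shell
# post_file_ok FILE: succeed only if FILE is a readable regular file,
# which is what --post-file needs in order to know the data size.
post_file_ok() {
    [ -f "$1" ] && [ -r "$1" ]
}

tmp=$(mktemp -d)
printf 'user=foo&password=bar' > "$tmp/form.txt"
mkfifo "$tmp/pipe"

post_file_ok "$tmp/form.txt" && echo "form.txt: usable with --post-file"
post_file_ok "$tmp/pipe"     || echo "pipe: not a regular file"
# With a usable file, the login step could then read (not run here):
#   wget --save-cookies cookies.txt --post-file "$tmp/form.txt" \
#        http://server.com/auth.php
rm -rf "$tmp"
```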

File: wget.info, Node: FTP Options, Next: Recursive Retrieval Options, Prev: HTTP Options, Up: Invoking

FTP Options
===========

`-nr'
`--dont-remove-listing'
Don't remove the temporary `.listing' files generated by FTP
retrievals. Normally, these files contain the raw directory
listings received from FTP servers. Not removing them can be
useful for debugging purposes, or when you want to be able to
easily check on the contents of remote server directories (e.g. to
verify that a mirror you're running is complete).

Note that even though Wget writes to a known filename for this
file, this is not a security hole in the scenario of a user making
`.listing' a symbolic link to `/etc/passwd' or something and
asking `root' to run Wget in his or her directory. Depending on
the options used, either Wget will refuse to write to `.listing',
making the globbing/recursion/time-stamping operation fail, or the
symbolic link will be deleted and replaced with the actual
`.listing' file, or the listing will be written to a
`.listing.NUMBER' file.

Even though this situation isn't a problem, `root' should
never run Wget in a non-trusted user's directory. A user could do
something as simple as linking `index.html' to `/etc/passwd' and
asking `root' to run Wget with `-N' or `-r' so the file will be
overwritten.

`-g on/off'
`--glob=on/off'
Turn FTP globbing on or off. Globbing means you may use the
shell-like special characters ("wildcards"), like `*', `?', `['
and `]' to retrieve more than one file from the same directory at
once, like:

     wget ftp://gnjilux.srk.fer.hr/*.msg

By default, globbing will be turned on if the URL contains a
globbing character. This option may be used to turn globbing on
or off permanently.

You may have to quote the URL to protect it from being expanded by
your shell. Globbing makes Wget look for a directory listing,
which is system-specific. This is why it currently works only
with Unix FTP servers (and the ones emulating Unix `ls' output).
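
The quoting advice and the "contains a globbing character" rule can both be shown without contacting a server. In this sketch `has_glob' is a hypothetical helper that tests only for the four characters named above, and `welcome.msg' is an invented file name.

```shell
# has_glob URL: succeed if URL contains one of `*', `?', `[' or `]'.
has_glob() {
    case "$1" in
        *\**|*\?*|*\[*|*\]*) return 0 ;;
        *)                   return 1 ;;
    esac
}

# Single quotes keep the shell from expanding the pattern itself.
url='ftp://gnjilux.srk.fer.hr/*.msg'
printf '%s\n' "$url"
has_glob "$url" && echo "globbing would be turned on"
has_glob 'ftp://gnjilux.srk.fer.hr/welcome.msg' || echo "no wildcard"
# The quoted form can then be handed to Wget unchanged (not run here):
#   wget "$url"
```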

`--passive-ftp'
Use the "passive" FTP retrieval scheme, in which the client
initiates the data connection. This is sometimes required for FTP
to work behind firewalls.

`--retr-symlinks'
Usually, when retrieving FTP directories recursively and a symbolic
link is encountered, the linked-to file is not downloaded.
Instead, a matching symbolic link is created on the local
filesystem. The pointed-to file will not be downloaded unless
this recursive retrieval would have encountered it separately and
downloaded it anyway.

When `--retr-symlinks' is specified, however, symbolic links are
traversed and the pointed-to files are retrieved. At this time,
this option does not cause Wget to traverse symlinks to
directories and recurse through them, but in the future it should
be enhanced to do this.

Note that when retrieving a file (not a directory) because it was
specified on the command-line, rather than because it was recursed
to, this option has no effect. Symbolic links are always
traversed in this case.
