:mod:`urllib2` --- extensible library for opening URLs
======================================================

.. module:: urllib2
   :synopsis: Next generation URL opening library.
.. moduleauthor:: Jeremy Hylton <jhylton@users.sourceforge.net>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>

The :mod:`urllib2` module has been split across several modules in
Python 3 named :mod:`urllib.request` and :mod:`urllib.error`.
The :term:`2to3` tool will automatically adapt imports when converting
your sources to Python 3.

The :mod:`urllib2` module defines functions and classes which help in opening
URLs (mostly HTTP) in a complex world --- basic and digest authentication,
redirections, cookies and more.

The `Requests package <http://requests.readthedocs.org/>`_
is recommended for a higher-level HTTP client interface.

The :mod:`urllib2` module defines the following functions:

.. function:: urlopen(url[, data[, timeout[, cafile[, capath[, cadefault[, context]]]]]])

   Open the URL *url*, which can be either a string or a :class:`Request` object.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed. Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided. *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format. The :mod:`urllib2` module sends HTTP/1.1
   requests with a ``Connection: close`` header included.

   The optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). This actually only works for HTTP, HTTPS and
   FTP connections.

   If *context* is specified, it must be a :class:`ssl.SSLContext` instance
   describing the various SSL options. See :class:`~httplib.HTTPSConnection` for
   more details.

   The optional *cafile* and *capath* parameters specify a set of trusted CA
   certificates for HTTPS requests. *cafile* should point to a single file
   containing a bundle of CA certificates, whereas *capath* should point to a
   directory of hashed certificate files. More information can be found in
   :meth:`ssl.SSLContext.load_verify_locations`.

   The *cadefault* parameter is ignored.

   This function returns a file-like object with three additional methods:

   * :meth:`geturl` --- return the URL of the resource retrieved, commonly used to
     determine if a redirect was followed

   * :meth:`info` --- return the meta-information of the page, such as headers,
     in the form of an :class:`mimetools.Message` instance
     (see `Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_)

   * :meth:`getcode` --- return the HTTP status code of the response.

   Raises :exc:`URLError` on errors.

   Note that ``None`` may be returned if no handler handles the request (though the
   default installed global :class:`OpenerDirector` uses :class:`UnknownHandler` to
   ensure this never happens).

   In addition, if proxy settings are detected (for example, when a ``*_proxy``
   environment variable like :envvar:`http_proxy` is set),
   :class:`ProxyHandler` is installed by default and makes sure the requests are
   handled through the proxy.

   .. versionchanged:: 2.6
      *timeout* was added.

   .. versionchanged:: 2.7.9
      *cafile*, *capath*, *cadefault*, and *context* were added.
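
The file-like return value of :func:`urlopen` can be explored without network
access by opening a ``file:`` URL. The following is only a sketch: the temporary
file, its contents and the ``try``/``except`` import (added so that the snippet
also runs on Python 3, where the module lives on as :mod:`urllib.request`) are
assumptions made for illustration.

```python
import os
import tempfile

try:
    import urllib2                            # Python 2
    from urllib import pathname2url
except ImportError:                           # Python 3: the module was split up
    import urllib.request as urllib2
    from urllib.request import pathname2url

# Write a small file so that the example needs no network access.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as handle:
    handle.write(b'hello, urlopen')

response = urllib2.urlopen('file:' + pathname2url(path))
try:
    body = response.read()           # ordinary file-like interface
    final_url = response.geturl()    # URL of the resource actually retrieved
    headers = response.info()        # header container for the response
finally:
    response.close()
    os.remove(path)

print(body)
print(headers.get('Content-Length'))
```

For HTTP URLs the same three calls apply, and :meth:`getcode` additionally
reports the status code of the response.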

.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use that
   opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`urlopen`. The code does not check for a real :class:`OpenerDirector`,
   and any class with the appropriate interface will work.

.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters). Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler` (if proxy
   settings are detected),
   :class:`UnknownHandler`, :class:`HTTPHandler`, :class:`HTTPDefaultErrorHandler`,
   :class:`HTTPRedirectHandler`, :class:`FTPHandler`, :class:`FileHandler`,
   :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module can
   be imported), :class:`HTTPSHandler` will also be added.

   Beginning in Python 2.3, a :class:`BaseHandler` subclass may also change its
   :attr:`handler_order` attribute to modify its position in the handlers
   list.
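
The replacement rule described for :func:`build_opener` can be checked with a
short sketch. ``QuietRedirectHandler`` is a hypothetical subclass, the
``handlers`` list is an implementation detail of CPython's
:class:`OpenerDirector` that is used here only for inspection, and the
``try``/``except`` import is an assumption added so that the snippet also runs
on Python 3.

```python
try:
    import urllib2                          # Python 2
except ImportError:
    import urllib.request as urllib2        # Python 3 equivalent


class QuietRedirectHandler(urllib2.HTTPRedirectHandler):
    """Hypothetical subclass; it only exists to show the replacement rule."""


# A handler may be passed as a class (it is then instantiated without
# arguments) or as a ready-made instance.
opener = urllib2.build_opener(QuietRedirectHandler, urllib2.ProxyHandler({}))

# Collect the classes of the handlers that ended up in the chain.
handler_classes = [h.__class__ for h in opener.handlers]
print(sorted(cls.__name__ for cls in handler_classes))
```

Because a subclass of :class:`HTTPRedirectHandler` was supplied, the stock
redirect handler should be absent from the chain, while defaults such as
:class:`HTTPHandler` and :class:`UnknownHandler` are still present.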

The following exceptions are raised as appropriate:


.. exception:: URLError

   The handlers raise this exception (or derived exceptions) when they run into a
   problem. It is a subclass of :exc:`IOError`.

   .. attribute:: reason

      The reason for this error. It can be a message string or another exception
      instance (:exc:`socket.error` for remote URLs, :exc:`OSError` for local
      URLs).


.. exception:: HTTPError

   Though being an exception (a subclass of :exc:`URLError`), an :exc:`HTTPError`
   can also function as a non-exceptional file-like return value (the same thing
   that :func:`urlopen` returns). This is useful when handling exotic HTTP
   errors, such as requests for authentication.

   .. attribute:: code

      An HTTP status code as defined in `RFC 2616 <http://www.faqs.org/rfcs/rfc2616.html>`_.
      This numeric value corresponds to a value found in the dictionary of
      codes as found in :attr:`BaseHTTPServer.BaseHTTPRequestHandler.responses`.

   .. attribute:: reason

      The reason for this error. It can be a message string or another exception
      instance.
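
Since :exc:`HTTPError` is a subclass of :exc:`URLError`, its ``except`` clause
has to come first. The sketch below shows one possible arrangement;
``fetch_status`` is a hypothetical helper, the made-up URL scheme is used only
to provoke a :exc:`URLError` without touching the network, and the
``try``/``except`` import is an assumption added for Python 3 compatibility.

```python
try:
    import urllib2                          # Python 2
except ImportError:
    import urllib.request as urllib2        # Python 3 equivalent


def fetch_status(url):
    """Hypothetical helper: describe the outcome of opening *url*."""
    try:
        response = urllib2.urlopen(url)
    except urllib2.HTTPError as exc:        # must come first: subclass of URLError
        return 'HTTP error %d' % exc.code
    except urllib2.URLError as exc:
        return 'failed: %s' % (exc.reason,)
    else:
        response.close()
        return 'ok'


# No handler knows this scheme, so UnknownHandler raises URLError.
status = fetch_status('nosuchscheme://example.invalid/')
print(status)
```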

The following classes are provided:


.. class:: Request(url[, data][, headers][, origin_req_host][, unverifiable])

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed. Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided. *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format.

   *headers* should be a dictionary, and will be treated as if :meth:`add_header`
   was called with each key and value as arguments. This is often used to "spoof"
   the ``User-Agent`` header, which is used by a browser to identify itself --
   some HTTP servers only allow requests coming from common browsers as opposed
   to scripts. For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while :mod:`urllib2`'s
   default user agent string is ``"Python-urllib/2.6"`` (on Python 2.6).

   The final two arguments are only of interest for correct handling of third-party
   HTTP cookies:

   *origin_req_host* should be the request-host of the origin transaction, as
   defined by :rfc:`2965`. It defaults to ``cookielib.request_host(self)``. This
   is the host name or IP address of the original request that was initiated by the
   user. For example, if the request is for an image in an HTML document, this
   should be the request-host of the request for the page containing the image.

   *unverifiable* should indicate whether the request is unverifiable, as defined
   by :rfc:`2965`. It defaults to ``False``. An unverifiable request is one whose
   URL the user did not have the option to approve. For example, if the request is
   for an image in an HTML document, and the user had no option to approve the
   automatic fetching of the image, this should be true.
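
The effect of *data* on the request method can be seen without sending
anything. This is a minimal sketch: the URL and form fields are made up, and
the ``try``/``except`` import is an assumption added so that the snippet also
runs on Python 3, where ``urlencode`` lives in :mod:`urllib.parse`.

```python
try:
    import urllib2                          # Python 2
    from urllib import urlencode
except ImportError:                         # Python 3 equivalents
    import urllib.request as urllib2
    from urllib.parse import urlencode

# Encode form fields in application/x-www-form-urlencoded format.
payload = urlencode([('name', 'Ada Lovelace'), ('lang', 'en')]).encode('ascii')

get_req = urllib2.Request('http://www.example.com/form')
post_req = urllib2.Request('http://www.example.com/form', data=payload)

print(get_req.get_method())     # no data given
print(post_req.get_method())    # data given
```

Passing ``post_req`` to :func:`urlopen` would then submit the encoded fields as
the body of the request.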

.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers --- and handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor([cookiejar])

   A class to handle HTTP Cookies.


.. class:: ProxyHandler([proxies])

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read
   the list of proxies from the environment variables
   :envvar:`<protocol>_proxy`. If no proxy environment variables are set, then
   in a Windows environment proxy settings are obtained from the registry's
   Internet Settings section, and in a Mac OS X environment proxy information
   is retrieved from the OS X System Configuration Framework.

   To disable autodetected proxies, pass an empty dictionary.

.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.


.. class:: AbstractBasicAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPBasicAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyBasicAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyDigestAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.

.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler([debuglevel[, context]])

   A class to handle opening of HTTPS URLs. *context* has the same meaning as
   for :class:`httplib.HTTPSConnection`.

   .. versionchanged:: 2.7.9
      *context* added.


.. class:: FileHandler()

   Open local files.


.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. class:: HTTPErrorProcessor()

   Process HTTP error responses.

.. _request-objects:

Request Objects
---------------

The following methods describe all of :class:`Request`'s public interface, and
so all must be overridden in subclasses.


.. method:: Request.add_data(data)

   Set the :class:`Request` data to *data*. This is ignored by all handlers except
   HTTP handlers --- and there it should be a byte string, and will change the
   request to be ``POST`` rather than ``GET``.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method. This is only meaningful for
   HTTP requests, and currently always returns ``'GET'`` or ``'POST'``.


.. method:: Request.has_data()

   Return whether the instance has a non-\ ``None`` data.


.. method:: Request.get_data()

   Return the instance's data.


.. method:: Request.add_header(key, val)

   Add another header to the request. Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server. Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.

   .. versionadded:: 2.4


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).

   .. versionadded:: 2.4


.. method:: Request.get_full_url()

   Return the URL given in the constructor.


.. method:: Request.get_type()

   Return the type of the URL --- also known as the scheme.


.. method:: Request.get_host()

   Return the host to which a connection will be made.


.. method:: Request.get_selector()

   Return the selector --- the part of the URL that is sent to the server.


.. method:: Request.get_header(header_name, default=None)

   Return the value of the given header. If the header is not present, return
   the default value.


.. method:: Request.header_items()

   Return a list of tuples (header_name, header_value) of the Request headers.


.. method:: Request.set_proxy(host, type)

   Prepare the request by connecting to a proxy server. The *host* and *type* will
   replace those of the instance, and the instance's selector will be the original
   URL given in the constructor.


.. method:: Request.get_origin_req_host()

   Return the request-host of the origin transaction, as defined by :rfc:`2965`.
   See the documentation for the :class:`Request` constructor.


.. method:: Request.is_unverifiable()

   Return whether the request is unverifiable, as defined by :rfc:`2965`. See the
   documentation for the :class:`Request` constructor.
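
The header methods can be tried on a request that is never sent. The sketch
below uses made-up header values. It also relies on a detail of the CPython
implementation that the text above does not state: header names are stored in
capitalized form (``'User-Agent'`` becomes ``'User-agent'``), so lookups need
that spelling. The ``try``/``except`` import is an assumption added for
Python 3 compatibility.

```python
try:
    import urllib2                          # Python 2
except ImportError:
    import urllib.request as urllib2        # Python 3 equivalent

req = urllib2.Request('http://www.example.com/',
                      headers={'User-Agent': 'demo/1.0'})

# A later call with the same key overwrites the earlier value.
req.add_header('Referer', 'http://www.python.org/')
req.add_header('Referer', 'http://www.example.org/')

# This header would be dropped if the request were redirected.
req.add_unredirected_header('X-Token', 'abc')

# CPython stores names capitalized, so look them up in that form.
print(req.has_header('User-agent'))
print(req.get_header('Referer'))
print(req.get_header('Missing', 'fallback'))

items = sorted(req.header_items())
print(items)
```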

.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`. The following
   methods are searched, and added to the possible chains (note that HTTP errors
   are a special case).

   * :samp:`{protocol}_open` --- signal that the handler knows how to open
     *protocol* URLs.

   * :samp:`http_error_{type}` --- signal that the handler knows how to handle
     HTTP errors with HTTP error code *type*.

   * :samp:`{protocol}_error` --- signal that the handler knows how to handle
     errors from (non-\ ``http``) *protocol*.

   * :samp:`{protocol}_request` --- signal that the handler knows how to
     pre-process *protocol* requests.

   * :samp:`{protocol}_response` --- signal that the handler knows how to
     post-process *protocol* responses.


.. method:: OpenerDirector.open(url[, data][, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*. Arguments, return values and exceptions raised are
   the same as those of :func:`urlopen` (which simply calls the :meth:`open`
   method on the currently installed global :class:`OpenerDirector`). The
   optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). The timeout feature actually works only for
   HTTP, HTTPS and FTP connections.

   .. versionchanged:: 2.6
      *timeout* was added.


.. method:: OpenerDirector.error(proto[, arg[, ...]])

   Handle an error of the given protocol. This will call the registered error
   handlers for the given protocol with the given arguments (which are protocol
   specific). The HTTP protocol is a special case which uses the HTTP response
   code to determine the specific error handler; refer to the :meth:`http_error_\*`
   methods of the handler classes.

   Return values and exceptions raised are the same as those of :func:`urlopen`.

OpenerDirector objects open URLs in three stages:

The order in which these methods are called within each stage is determined by
sorting the handler instances.

#. Every handler with a method named like :samp:`{protocol}_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :samp:`{protocol}_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually :exc:`URLError`).
   Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`. If all such methods return :const:`None`, the
   algorithm is repeated for methods named like :samp:`{protocol}_open`. If all
   such methods return :const:`None`, the algorithm is repeated for methods
   named :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`~OpenerDirector.open` and
   :meth:`~OpenerDirector.error` methods.

#. Every handler with a method named like :samp:`{protocol}_response` has that
   method called to post-process the response.
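
The three stages can be observed with a handler for a made-up scheme. This is
only a sketch: ``DemoHandler`` and the ``demo`` scheme are hypothetical,
``addinfourl`` is a helper class that the module happens to import rather than
a documented part of its interface, and the ``try``/``except`` import is an
assumption added for Python 3 compatibility.

```python
import io

try:
    import urllib2                          # Python 2
except ImportError:
    import urllib.request as urllib2        # Python 3 equivalent

calls = []  # records the order in which the stages run


class DemoHandler(urllib2.BaseHandler):
    """Hypothetical handler for a made-up ``demo`` scheme."""

    def demo_request(self, req):
        # Stage 1: pre-process the request and hand it back.
        calls.append('request')
        return req

    def demo_open(self, req):
        # Stage 2: produce a file-like response for the request.
        calls.append('open')
        data = b'hello from ' + req.get_full_url().encode('ascii')
        return urllib2.addinfourl(io.BytesIO(data), {}, req.get_full_url())

    def demo_response(self, req, response):
        # Stage 3: post-process the response and hand it back.
        calls.append('response')
        return response


opener = urllib2.build_opener(DemoHandler)
response = opener.open('demo://example/')
body = response.read()
response.close()

print(calls)
print(body)
```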

.. _base-handler-objects:

BaseHandler Objects
-------------------

:class:`BaseHandler` objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes. These are
intended for direct use:


.. method:: BaseHandler.add_parent(director)

   Add a director as parent.


.. method:: BaseHandler.close()

   Remove any parents.

The following attributes and methods should only be used by classes derived from
:class:`BaseHandler`.

The convention has been adopted that subclasses defining
:meth:`protocol_request` or :meth:`protocol_response` methods are named
:class:`\*Processor`; all others are named :class:`\*Handler`.


.. attribute:: BaseHandler.parent

   A valid :class:`OpenerDirector`, which can be used to open using a different
   protocol, or handle errors.


.. method:: BaseHandler.default_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs.

   This method, if implemented, will be called by the parent
   :class:`OpenerDirector`. It should return a file-like object as described in
   the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
   It should raise :exc:`URLError`, unless a truly exceptional thing happens (for
   example, :exc:`MemoryError` should not be mapped to :exc:`URLError`).

   This method will be called before any protocol-specific open method.


.. method:: BaseHandler.protocol_open(req)

   ("protocol" is to be replaced by the protocol name.)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to handle URLs with the given *protocol*.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   Return values should be the same as for :meth:`default_open`.


.. method:: BaseHandler.unknown_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs with no specific registered handler to
   open it.

   This method, if implemented, will be called by the :attr:`parent`
   :class:`OpenerDirector`. Return values should be the same as for
   :meth:`default_open`.


.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   override it if they intend to provide a catch-all for otherwise unhandled HTTP
   errors. It will be called automatically by the :class:`OpenerDirector` getting
   the error, and should not normally be called in other circumstances.

   *req* will be a :class:`Request` object, *fp* will be a file-like object with
   the HTTP error body, *code* will be the three-digit code of the error, *msg*
   will be the user-visible explanation of the code and *hdrs* will be a mapping
   object with the headers of the error.

   Return values and exceptions raised should be the same as those of
   :func:`urlopen`.


.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)

   *nnn* should be a three-digit HTTP error code. This method is also not defined
   in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
   subclass, when an HTTP error with code *nnn* occurs.

   Subclasses should override this method to handle specific HTTP errors.

   Arguments, return values and exceptions raised should be the same as for
   :meth:`http_error_default`.
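
A handler for one specific status code might look like the sketch below.
``NotFoundFallback`` and its replacement body are hypothetical. To stay
offline, the example calls :meth:`OpenerDirector.error` directly, with
``None`` for *fp* and an empty mapping for *hdrs*, which is normally done by
the opener itself after a response arrives. ``addinfourl`` is a helper class
the module imports rather than a documented part of its interface, and the
``try``/``except`` import is an assumption added for Python 3 compatibility.

```python
import io

try:
    import urllib2                          # Python 2
except ImportError:
    import urllib.request as urllib2        # Python 3 equivalent


class NotFoundFallback(urllib2.BaseHandler):
    """Hypothetical handler that substitutes a fixed body for 404 errors."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        body = io.BytesIO(b'fallback page')
        return urllib2.addinfourl(body, hdrs, req.get_full_url(), code)


opener = urllib2.build_opener(NotFoundFallback)
req = urllib2.Request('http://www.example.com/missing')

# Code 404 is routed to http_error_404 of the handler above.
handled = opener.error('http', req, None, 404, 'Not Found', {})
fallback_body = handled.read()

# No handler claims code 500, so the default error handler raises HTTPError.
try:
    opener.error('http', req, None, 500, 'Server Error', {})
except urllib2.HTTPError as exc:
    unhandled_code = exc.code

print(fallback_body)
print(unhandled_code)
```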

.. method:: BaseHandler.protocol_request(req)

   ("protocol" is to be replaced by the protocol name.)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to pre-process requests of the given *protocol*.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. The return value should be a
   :class:`Request` object.


.. method:: BaseHandler.protocol_response(req, response)

   ("protocol" is to be replaced by the protocol name.)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to post-process responses of the given *protocol*.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. *response* will be an object
   implementing the same interface as the return value of :func:`urlopen`. The
   return value should implement the same interface as the return value of
   :func:`urlopen`.

.. _http-redirect-handler:

HTTPRedirectHandler Objects
---------------------------

Some HTTP redirections require action from this module's client code. If this
is the case, :exc:`HTTPError` is raised. See :rfc:`2616` for details of the
precise meanings of the various redirection codes.


.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect. This is called
   by the default implementations of the :meth:`http_error_30\*` methods when a
   redirection is received from the server. If a redirection should take place,
   return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
   redirect to *newurl*. Otherwise, raise :exc:`HTTPError` if no other handler
   should try to handle this URL, or return ``None`` if you can't but another
   handler might.

   The default implementation of this method does not strictly follow :rfc:`2616`,
   which says that 301 and 302 responses to ``POST`` requests must not be
   automatically redirected without confirmation by the user. In reality, browsers
   do allow automatic redirection of these responses, changing the POST to a
   ``GET``, and the default implementation reproduces this behavior.
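
A stricter policy can be obtained by overriding :meth:`redirect_request`. The
following sketch refuses to follow redirects for anything but ``GET``
requests. ``GetOnlyRedirectHandler`` and the URLs are hypothetical, the method
is called directly with ``None`` for *fp* so that no network access is needed,
and the ``try``/``except`` import is an assumption added for Python 3
compatibility.

```python
try:
    import urllib2                          # Python 2
except ImportError:
    import urllib.request as urllib2        # Python 3 equivalent


class GetOnlyRedirectHandler(urllib2.HTTPRedirectHandler):
    """Hypothetical handler that follows redirects for GET requests only."""

    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        if req.get_method() != 'GET':
            # Refuse: no other handler should try this redirect either.
            raise urllib2.HTTPError(req.get_full_url(), code, msg, hdrs, fp)
        # Otherwise defer to the default implementation.
        return urllib2.HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, hdrs, newurl)


handler = GetOnlyRedirectHandler()
get_req = urllib2.Request('http://www.example.com/old')
post_req = urllib2.Request('http://www.example.com/old', data=b'x=1')
new_url = 'http://www.example.com/new'

followed = handler.redirect_request(get_req, None, 302, 'Found', {}, new_url)
print(followed.get_full_url())

try:
    handler.redirect_request(post_req, None, 302, 'Found', {}, new_url)
except urllib2.HTTPError as exc:
    refused_code = exc.code
    print('refused with %d' % refused_code)
```

Passing the class to :func:`build_opener` would put it in place of the default
:class:`HTTPRedirectHandler`.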

.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` or ``URI:`` URL. This method is called by the
   parent :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.


.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'found' response.


.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'see other' response.


.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'temporary redirect'
   response.


.. _http-cookie-processor:

HTTPCookieProcessor Objects
---------------------------

.. versionadded:: 2.4

:class:`HTTPCookieProcessor` instances have one attribute:


.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`cookielib.CookieJar` in which cookies are stored.

.. _proxy-handler:

ProxyHandler Objects
--------------------


.. method:: ProxyHandler.protocol_open(request)

   ("protocol" is to be replaced by the protocol name.)

   The :class:`ProxyHandler` will have a method :samp:`{protocol}_open` for every
   *protocol* which has a proxy in the *proxies* dictionary given in the
   constructor. The method will modify requests to go through the proxy, by
   calling ``request.set_proxy()``, and call the next handler in the chain to
   actually execute the protocol.
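
The per-protocol methods can be inspected without contacting any proxy. In
this sketch the proxy URL is made up, and the ``try``/``except`` import is an
assumption added for Python 3 compatibility.

```python
try:
    import urllib2                          # Python 2
except ImportError:
    import urllib.request as urllib2        # Python 3 equivalent

# One proxy configured for the ``http`` protocol only.
proxied = urllib2.ProxyHandler({'http': 'http://proxy.example.com:3128/'})

# An empty dictionary disables autodetected proxy settings.
direct = urllib2.ProxyHandler({})

has_http = hasattr(proxied, 'http_open')
has_ftp = hasattr(proxied, 'ftp_open')
direct_has_http = hasattr(direct, 'http_open')

print(has_http)
print(has_ftp)
print(direct_has_http)

# Either handler can be handed to build_opener in place of the default one.
opener = urllib2.build_opener(direct)
```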

.. _http-password-mgr:

HTTPPasswordMgr Objects
-----------------------

These methods are available on :class:`HTTPPasswordMgr` and
:class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
   *passwd* must be strings. This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication for *realm* and a super-URI of any of
   the given URIs is given.


.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)

   Get user/password for given realm and URI, if any. This method will return
   ``(None, None)`` if there is no matching user/password.

   For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
   searched if the given *realm* has no matching user/password.
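
The lookup rules can be tried directly on the password managers. The realms,
URIs and credentials below are made up, and the ``try``/``except`` import is
an assumption added for Python 3 compatibility.

```python
try:
    import urllib2                          # Python 2
except ImportError:
    import urllib.request as urllib2        # Python 3 equivalent

# Plain manager: the realm has to match exactly.
strict = urllib2.HTTPPasswordMgr()
strict.add_password('Staff Area', 'http://www.example.com/private/',
                    'alice', 'secret')

same_realm = strict.find_user_password(
    'Staff Area', 'http://www.example.com/private/report.html')
other_realm = strict.find_user_password(
    'Other Realm', 'http://www.example.com/private/report.html')

# Default-realm manager: entries stored under None act as a catch-all.
lenient = urllib2.HTTPPasswordMgrWithDefaultRealm()
lenient.add_password(None, 'http://www.example.com/private/',
                     'alice', 'secret')

any_realm = lenient.find_user_password(
    'Other Realm', 'http://www.example.com/private/report.html')
other_host = lenient.find_user_password(
    'Other Realm', 'http://www.example.org/')

print(same_realm)
print(other_realm)
print(any_realm)
print(other_host)
```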

.. _abstract-basic-auth-handler:

AbstractBasicAuthHandler Objects
--------------------------------


.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   Handle an authentication request by getting a user/password pair, and re-trying
   the request. *authreq* should be the name of the header where the information
   about the realm is included in the request, *host* specifies the URL and path to
   authenticate for, *req* should be the (failed) :class:`Request` object, and
   *headers* should be the error headers.

   *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
   authority component (e.g. ``"http://python.org/"``). In either case, the
   authority must not contain a userinfo component (so, ``"python.org"`` and
   ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).


.. _http-basic-auth-handler:

HTTPBasicAuthHandler Objects
----------------------------


.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-basic-auth-handler:

ProxyBasicAuthHandler Objects
-----------------------------


.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _abstract-digest-auth-handler:

AbstractDigestAuthHandler Objects
---------------------------------


.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   *authreq* should be the name of the header where the information about the realm
   is included in the request, *host* should be the host to authenticate to, *req*
   should be the (failed) :class:`Request` object, and *headers* should be the
   error headers.


.. _http-digest-auth-handler:

HTTPDigestAuthHandler Objects
-----------------------------


.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-digest-auth-handler:

ProxyDigestAuthHandler Objects
------------------------------


.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.

.. _http-handler-objects:

HTTPHandler Objects
-------------------


.. method:: HTTPHandler.http_open(req)

   Send an HTTP request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _https-handler-objects:

HTTPSHandler Objects
--------------------


.. method:: HTTPSHandler.https_open(req)

   Send an HTTPS request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _file-handler-objects:

FileHandler Objects
-------------------


.. method:: FileHandler.file_open(req)

   Open the file locally, if there is no host name, or the host name is
   ``'localhost'``. Change the protocol to ``ftp`` otherwise, and retry opening it
   using :attr:`parent`.


.. _ftp-handler-objects:

FTPHandler Objects
------------------


.. method:: FTPHandler.ftp_open(req)

   Open the FTP file indicated by *req*. The login is always done with empty
   username and password.


.. _cacheftp-handler-objects:

CacheFTPHandler Objects
-----------------------

:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
following additional methods:


.. method:: CacheFTPHandler.setTimeout(t)

   Set timeout of connections to *t* seconds.


.. method:: CacheFTPHandler.setMaxConns(m)

   Set maximum number of cached connections to *m*.


.. _unknown-handler-objects:

UnknownHandler Objects
----------------------


.. method:: UnknownHandler.unknown_open()

   Raise a :exc:`URLError` exception.

.. _http-error-processor-objects:

HTTPErrorProcessor Objects
--------------------------

.. versionadded:: 2.4


.. method:: HTTPErrorProcessor.http_response()

   Process HTTP error responses.

   For 200 status codes, the response object is returned immediately.

   For non-200 status codes, this simply passes the job on to the
   :samp:`{protocol}_error_code` handler methods, via
   :meth:`OpenerDirector.error`. Eventually,
   :class:`urllib2.HTTPDefaultErrorHandler` will raise an :exc:`HTTPError` if no
   other handler handles the error.


.. method:: HTTPErrorProcessor.https_response()

   Process HTTPS error responses.

   The behavior is the same as :meth:`http_response`.

.. _urllib2-examples:

Examples
--------

This example gets the python.org main page and displays the first 100 bytes of
it::

   >>> import urllib2
   >>> f = urllib2.urlopen('http://www.python.org/')
   >>> print f.read(100)
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
   <?xml-stylesheet href="./css/ht2html

Here we are sending a data-stream to the stdin of a CGI and reading the data it
returns to us. Note that this example will only work when the Python
installation supports SSL. ::

   >>> import urllib2
   >>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data='This data is passed to stdin of the CGI')
   >>> f = urllib2.urlopen(req)
   >>> print f.read()
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print 'Content-type: text/plain\n\nGot Data: "%s"' % data

Use of Basic HTTP Authentication::

   import urllib2
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib2.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib2.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib2.install_opener(opener)
   urllib2.urlopen('http://www.example.com/login.html')

:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved. For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically-supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib2
   req = urllib2.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   r = urllib2.urlopen(req)

:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`. To change this::

   import urllib2
   opener = urllib2.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).