  commonly used to determine if a redirect was followed

* :meth:`info` --- return the meta-information of the page, such as headers,
  in the form of an :class:`http.client.HTTPMessage` instance (see `Quick
  Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_)

Raises :exc:`URLError` on errors.
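As a sketch (not part of the original text), both methods can be tried without network access by opening a ``file:`` URL; the temporary file below is only a stand-in for a real resource:

```python
import os
import tempfile
import urllib.request
from urllib.request import pathname2url

# Create a small local file so the example needs no network access.
with tempfile.NamedTemporaryFile('w', suffix='.html', delete=False) as fp:
    fp.write('<html><body>hello</body></html>')
    path = fp.name

url = 'file:' + pathname2url(path)
f = urllib.request.urlopen(url)
opened_url = f.geturl()                  # the URL that was actually opened
content_type = f.info()['Content-type']  # a header from the meta-information
data = f.read()
f.close()
os.remove(path)
```

For an HTTP resource, :meth:`geturl` would reflect any redirect that was followed.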
of the data it has downloaded, and just returns it. In this case you just have
to assume that the download was successful.
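Progress of a retrieval can be observed through :func:`urlretrieve`'s optional *reporthook* argument; when the server reports no size, the hook's third argument is ``-1``. A sketch using a local ``file:`` URL in place of a real download (the file names here are illustrative):

```python
import os
import tempfile
import urllib.request
from urllib.request import pathname2url

progress = []

def reporthook(block_count, block_size, total_size):
    # Called once before the first block and once per block read;
    # total_size is -1 when the server reports no Content-Length.
    progress.append((block_count, block_size, total_size))

# A local source file stands in for a remote resource.
with tempfile.NamedTemporaryFile('wb', delete=False) as src:
    src.write(b'x' * 1000)

url = 'file:' + pathname2url(src.name)
dest = src.name + '.copy'
filename, headers = urllib.request.urlretrieve(url, dest, reporthook)
size = os.path.getsize(filename)
os.remove(src.name)
os.remove(dest)
```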
The public functions :func:`urlopen` and :func:`urlretrieve` create an instance
of the :class:`FancyURLopener` class and use it to perform their requested
actions. To override this functionality, programmers can create a subclass of
:class:`URLopener` or :class:`FancyURLopener`, then assign an instance of that
class to the ``urllib._urlopener`` variable before calling the desired function.
For example, applications may want to specify a different
:mailheader:`User-Agent` header than :class:`URLopener` defines. This can be
accomplished with the following code::
   import urllib.request

   class AppURLopener(urllib.request.FancyURLopener):
       version = "App/1.7"

   urllib._urlopener = AppURLopener()
.. function:: urlcleanup()
   Clear the cache that may have been built up by previous calls to
   :func:`urlretrieve`.
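A minimal illustration (safe to run even when nothing has been retrieved):

```python
import urllib.request

# Discard any temporary files left behind by earlier urlretrieve()
# calls; this is a no-op when there is nothing to clean up.
urllib.request.urlcleanup()
```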
   method on the currently installed global :class:`OpenerDirector`). The
   optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). The timeout feature actually works only for
   HTTP, HTTPS, FTP and FTPS connections.
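As a sketch (the URL and the timeout values are placeholders), a per-call timeout and the global default look like this:

```python
import socket
import urllib.request

# Per-call timeout in seconds, covering blocking steps such as the
# connection attempt.
try:
    f = urllib.request.urlopen('http://www.example.com/', timeout=5.0)
    f.close()
except OSError:
    pass  # e.g. no network, a DNS failure, or the timeout expired

# The global default is consulted when no timeout argument is passed.
socket.setdefaulttimeout(10.0)
```

Note that :exc:`URLError` is a subclass of :exc:`OSError`, so the handler above covers both connection errors and an expired timeout.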
This example gets the python.org main page and displays the first 300 bytes of
it. ::
   >>> import urllib.request
   >>> f = urllib.request.urlopen('http://www.python.org/')
   >>> print(f.read(300))
   b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
   xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
   <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
   <title>Python Programming '
Note that urlopen returns a bytes object. This is because there is no way
for urlopen to automatically determine the encoding of the byte stream
it receives from the HTTP server. In general, a program will decode the
returned bytes object to a string once it determines or guesses the
appropriate encoding.
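One common approach (a sketch; the helper name is ours, not part of the module) is to take the charset from the Content-Type header when the server supplies one, and fall back to a guess otherwise:

```python
import urllib.request

def read_text(url, default_charset='utf-8'):
    # Decode the response using the charset declared in the
    # Content-Type header, falling back when none is declared.
    f = urllib.request.urlopen(url)
    try:
        charset = f.info().get_content_charset() or default_charset
        return f.read().decode(charset)
    finally:
        f.close()
```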
The following W3C document, http://www.w3.org/International/O-charset , lists
the various ways in which an (X)HTML or XML document could have specified its
encoding information.
As the python.org website uses *utf-8* encoding as specified in its meta tag,
we will use the same for decoding the bytes object. ::
   >>> import urllib.request
   >>> f = urllib.request.urlopen('http://www.python.org/')
   >>> print(f.read(100).decode('utf-8'))
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtm
In the following example, we are sending a data-stream to the stdin of a CGI
and reading the data it returns to us. Note that this example will only work
when the Python installation supports SSL. ::
   >>> import urllib.request
   >>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                              data=b'This data is passed to stdin of the CGI')
   >>> f = urllib.request.urlopen(req)
   >>> print(f.read().decode('utf-8'))
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::
   >>> import urllib.request
   >>> import urllib.parse
   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query",
   ...                            params.encode('ascii'))
   >>> print(f.read().decode('utf-8'))
The following example uses an explicitly specified HTTP proxy, overriding
environment settings::
   >>> import urllib.request
   >>> proxies = {'http': 'http://proxy.example.com:8080/'}
   >>> opener = urllib.request.FancyURLopener(proxies)
   >>> f = opener.open("http://www.python.org")
   >>> f.read().decode('utf-8')
The following example uses no proxies at all, overriding environment settings::

   >>> import urllib.request
   >>> opener = urllib.request.FancyURLopener({})
   >>> f = opener.open("http://www.python.org/")
   >>> f.read().decode('utf-8')
:mod:`urllib.request` Restrictions