<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta name="author" content="John J. Lee &lt;jjl@@pobox.com&gt;">
<meta name="date" content="2004-01">
<meta name="keywords" content="cookie,HTTP,Python,web,client,client-side,HTML,META,HTTP-EQUIV,Refresh">
<title>ClientCookie documentation</title>
<style type="text/css" media="screen">@@import "../styles/style.css";</style>
<style type="text/css" media="screen">@@import "../styles/cookie_style.css";</style>
<base href="http://wwwsearch.sourceforge.net/ClientCookie/">
@# This file is processed by EmPy to colorize Python source code
@# http://wwwsearch.sf.net/bits/colorize.py
@{from colorize import colorize}
<div id="sf"><a href="http://sourceforge.net">
<img src="http://sourceforge.net/sflogo.php?group_id=48205&amp;type=2"
width="125" height="37" alt="SourceForge.net Logo"></a></div>
<p><em><strong>Note: this page describes the stable 0.4.x version. See <a
href="./src/doc-0_9.html">here</a> for the 0.9.x development version.
<a name="examples"></a>
import ClientCookie
response = ClientCookie.urlopen("http://foo.bar.com/")
<p>This function behaves identically to <code>urllib2.urlopen()</code>, except
that it deals with cookies automatically. That's probably all you need to

<p>Here is a more complicated example, involving <code>Request</code> objects
(useful if you want to pass <code>Request</code>s around, add headers to them,
import ClientCookie, urllib2
request = urllib2.Request("http://www.acme.com/")
# note we're using the urlopen from ClientCookie, not urllib2
response = ClientCookie.urlopen(request)
# let's say this next request requires a cookie that was set in response
request2 = urllib2.Request("http://www.acme.com/flying_machines.html")
response2 = ClientCookie.urlopen(request2)

print response2.geturl()
print response2.info()  # headers
print response2.read()  # body (readline and readlines work too)
<p>In these examples, the workings are hidden inside the
<code>ClientCookie.urlopen()</code> function, which is an extension of
<code>urllib2.urlopen()</code>. Redirects, proxies and cookies are handled
automatically by this function. Cookie processing (etc.) is handled by
processor objects, which are an extension of <code>urllib2</code>'s handlers:
<code>HTTPCookieProcessor</code>, <code>HTTPRefererProcessor</code>,
<code>SeekableProcessor</code> etc. They are used like any other handler.
Processor-aware versions of <code>HTTPHandler</code> and
<code>HTTPSHandler</code> (if your Python installation has HTTPS support) are
also included, along with a bugfixed <code>HTTPRedirectHandler</code> (the bug,
related to redirection, is fixed in Python 2.3).
<p>An example at a slightly lower level shows how the module processes
# Don't copy this blindly!  You probably want to follow the examples
# above, not this one.
import ClientCookie

# Build an opener that *doesn't* automatically call .add_cookie_header()
# and .extract_cookies(), so we can do it manually without interference.
class NullCookieProcessor(ClientCookie.HTTPCookieProcessor):
    def http_request(self, request): return request
    def http_response(self, request, response): return response
opener = ClientCookie.build_opener(NullCookieProcessor)

request = ClientCookie.Request("http://www.acme.com/")
response = opener.open(request)
cj = ClientCookie.CookieJar()
cj.extract_cookies(response, request)
# let's say this next request requires a cookie that was set in response
request2 = ClientCookie.Request("http://www.acme.com/flying_machines.html")
cj.add_cookie_header(request2)
response2 = opener.open(request2)
<p>The <code>CookieJar</code> class does all the work. There are essentially
two operations: <code>.extract_cookies()</code> extracts HTTP cookies from the
<code>Set-Cookie</code> (the original <a
href="http://www.netscape.com/newsref/std/cookie_spec.html">Netscape cookie
standard</a>) and <code>Set-Cookie2</code> (<a
href="http://www.ietf.org/rfc/rfc2965.txt">RFC 2965</a>) headers of a
response if and only if they should be set given the request, and
<code>.add_cookie_header()</code> adds <code>Cookie</code> headers if and only
if they are appropriate for a particular HTTP request. Incoming cookies are
checked for acceptability based on the host name, etc. Cookies are only set on
outgoing requests if they match the request's host name, path, etc.
<p><strong>Note that if you're using <code>ClientCookie.urlopen()</code> (or if
you're using <code>ClientCookie.HTTPCookieProcessor</code> by some other
means), you don't need to call <code>.extract_cookies()</code> or
<code>.add_cookie_header()</code> yourself</strong>. If, on the other hand,
you don't want to use <code>urllib2</code>, you will need to use this pair of
methods. You can make your own <code>request</code> and <code>response</code>
objects, which must support the interfaces described in the docstrings of
<code>.extract_cookies()</code> and <code>.add_cookie_header()</code>.
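<p>As a rough sketch of that manual interface, the following uses Python 3's
standard <code>http.cookiejar</code> module (the standard-library descendant of
this package) rather than ClientCookie itself, on the assumption that the two
<code>CookieJar</code> methods behave alike. <code>FakeResponse</code> is a
hypothetical stand-in that supplies only the <code>.info()</code> method the
jar reads headers from, so nothing touches the network.

```python
import email
import urllib.request
from http.cookiejar import CookieJar


class FakeResponse:
    """Hypothetical response stand-in: the jar only calls .info()."""

    def __init__(self, header_text):
        self._headers = email.message_from_string(header_text)

    def info(self):
        return self._headers


jar = CookieJar()

# Pretend www.acme.com answered the first request with a Set-Cookie header.
request = urllib.request.Request("http://www.acme.com/")
response = FakeResponse("Set-Cookie: session=abc123; Path=/\n\n")
jar.extract_cookies(response, request)

# A later request to the same host gets the cookie back...
request2 = urllib.request.Request("http://www.acme.com/flying_machines.html")
jar.add_cookie_header(request2)
cookie_header = request2.get_header("Cookie")

# ...but a request to an unrelated host does not.
other = urllib.request.Request("http://www.example.org/")
jar.add_cookie_header(other)
```

<p>Any object offering the same few methods as the standard request and
response classes can be substituted in the same way.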
<p>Cookies may be saved to and loaded from a file. The subclass
<code>MozillaCookieJar</code> differs from <code>CookieJar</code> only in
storing cookies using a different, Mozilla/Netscape/lynx-compatible, file
format. This Mozilla-compatible (<code>'cookies.txt'</code>) format loses some
information when you save cookies to a file (for example, RFC 2965 cookies are
downgraded to Netscape cookies). The subclass <code>MSIECookieJar</code> can
load (but not save, yet) from Microsoft Internet Explorer's cookie files (on
Windows).
<h2>Important note</h2>

<p>Only use names you can import directly from the <code>ClientCookie</code>
package, and that don't start with a single underscore. Everything else is
subject to change or disappearance without notice.
<a name="browsers"></a>
<h2>Cooperating with Mozilla/Netscape, lynx and Internet Explorer</h2>

<p>The subclass <code>MozillaCookieJar</code> differs from
<code>CookieJar</code> only in storing cookies using a different,
Mozilla/Netscape-compatible, file format. The lynx browser also uses this
format. This file format can't store RFC 2965 cookies, so they are downgraded
to Netscape cookies on saving. <code>CookieJar</code> itself uses a
libwww-perl specific format (`Set-Cookie3'). Python and your browser should be
able to share a cookies file (note that the file location here will differ on
<p><strong>WARNING:</strong> you may want to back up your browser's cookies
file if you use <code>MozillaCookieJar</code> to save cookies. I <em>think</em>
it works, but there have been bugs in the past!
import os, ClientCookie
cookies = ClientCookie.MozillaCookieJar()
# no leading slash on the second argument, or os.path.join() discards $HOME
cookies.load(os.path.join(os.environ["HOME"], ".netscape/cookies.txt"))
# see also the save and revert methods
<p>Note that cookies saved while Mozilla is running will get clobbered by
Mozilla - see <code>MozillaCookieJar.__doc__</code>.

<p><code>MSIECookieJar</code> does the same for Microsoft Internet Explorer
(MSIE) 5.x and 6.x on Windows, but does not allow saving cookies in this
format. In future, the Windows API calls might be used to load and save
(though the index has to be read directly, since there is no API for that,
import ClientCookie
c = ClientCookie.MSIECookieJar(delayload=True)
c.load_from_registry()  # finds cookie index file from registry
<p>A true <code>delayload</code> argument speeds things up.

<p>On Windows 9x (win 95, win 98, win ME), you need to supply a username to the
<code>.load_from_registry()</code> method:
c.load_from_registry(username="jbloggs")
<p>Konqueror/Safari and Opera use different file formats, which aren't yet
<a name="cookiejar"></a>
<h2>Using your own CookieJar instance</h2>

<p>You might want to do this to <a href="./doc.html#browsers">use your
browser's cookies</a>, to customize <code>CookieJar</code>'s behaviour by
passing constructor arguments, or to be able to get at the cookies it will hold
(for example, for saving cookies between sessions and for debugging).

<p>If you're using the higher-level <code>urllib2</code>-like interface
(<code>urlopen()</code>, etc), you'll have to let it know what
<code>CookieJar</code> it should use:
import ClientCookie
cookies = ClientCookie.CookieJar()
# build_opener() adds standard handlers and processors (such as HTTPHandler
# and HTTPCookieProcessor) by default.  The cookie processor we supply
# will replace the default one.
opener = ClientCookie.build_opener(ClientCookie.HTTPCookieProcessor(cookies))

r = opener.open("http://acme.com/")  # GET
r = opener.open("http://acme.com/", data)  # POST (data: URL-encoded string)
<p>The <code>urlopen()</code> function uses a global <code>OpenerDirector</code>
instance to do its work, so if you want to use <code>urlopen()</code> with your
own <code>CookieJar</code>, install the <code>OpenerDirector</code> you built
with <code>build_opener()</code> using the
<code>ClientCookie.install_opener()</code> function, then proceed as usual:
ClientCookie.install_opener(opener)
r = ClientCookie.urlopen("http://www.acme.com/")
<p>Of course, everyone using <code>urlopen()</code> is using the same global
<code>CookieJar</code> instance!
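<p>The same build-then-install pattern survives in Python 3's standard
library, so it can be sketched without ClientCookie installed. This is an
illustration against <code>urllib.request</code> and
<code>http.cookiejar</code>, not ClientCookie's own code; one difference worth
labelling is that the standard <code>build_opener()</code> adds no cookie
processor by default, so the one supplied here is an addition rather than a
replacement.

```python
import urllib.request
from http.cookiejar import CookieJar

# Our own jar, which we keep a reference to for later inspection or saving.
cookies = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookies))

# Make the module-level urlopen() use this opener (and therefore this jar).
urllib.request.install_opener(opener)

# Confirm which jar the installed opener will fill; no request is made here.
processors = [handler for handler in opener.handlers
              if isinstance(handler, urllib.request.HTTPCookieProcessor)]
```

<p>Holding on to <code>cookies</code> is the point of the exercise: after any
later <code>urlopen()</code> call the jar can be iterated over to see what was
stored.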
<a name="policy"></a>

<p>You can set a policy object (must satisfy the interface defined by
<code>ClientCookie.CookiePolicy</code>), which determines which cookies
are allowed to be set and returned. Use the policy argument to the
<code>CookieJar</code> constructor, or just set the policy attribute
directly. The default implementation has some useful switches:
from ClientCookie import CookieJar, DefaultCookiePolicy as Policy
cookies = CookieJar()
# turn off RFC 2965 cookies, be more strict about domains when setting and
# returning Netscape cookies, and block some domains from setting cookies
# or having them returned (read the DefaultCookiePolicy docstring for the
# domain matching rules here)
policy = Policy(rfc2965=False, strict_ns_domain=Policy.DomainStrict,
                blocked_domains=["ads.net", ".ads.net"])
cookies.policy = policy
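<p>To see the effect of such a policy without a network connection, here is a
small sketch using Python 3's <code>http.cookiejar</code>, whose
<code>DefaultCookiePolicy</code> accepts the same three arguments; whether
ClientCookie 0.4.x matches it in every detail is an assumption. The
<code>FakeResponse</code> class and the <code>offer_cookie()</code> helper are
hypothetical names introduced only for this illustration.

```python
import email
import urllib.request
from http.cookiejar import CookieJar, DefaultCookiePolicy as Policy


class FakeResponse:
    """Hypothetical response stand-in: the jar only calls .info()."""

    def __init__(self, header_text):
        self._headers = email.message_from_string(header_text)

    def info(self):
        return self._headers


policy = Policy(rfc2965=False, strict_ns_domain=Policy.DomainStrict,
                blocked_domains=["ads.net", ".ads.net"])
jar = CookieJar(policy=policy)


def offer_cookie(url):
    """Offer the jar one Netscape cookie as if it had been sent by url."""
    request = urllib.request.Request(url)
    jar.extract_cookies(FakeResponse("Set-Cookie: id=42; Path=/\n\n"), request)


offer_cookie("http://www.ads.net/banner.gif")  # host is in a blocked domain
count_after_blocked = len(jar)

offer_cookie("http://www.acme.com/")           # host is not blocked
domains = [cookie.domain for cookie in jar]
```

<p>Listing both <code>"ads.net"</code> and <code>".ads.net"</code> covers the
bare domain as well as any host underneath it.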
<a name="goodies"></a>
<h2>Optional goodies: HTTP-EQUIV, Refresh, Referer and seekable responses</h2>

<p>These are implemented as processor classes. Processors are an extension of
<code>urllib2</code>'s handlers: you just pass them to
<code>build_opener()</code> (example code below).
<dt><code>HTTPRobotRulesProcessor</code>

<dd><p>WWW Robots (also called wanderers or spiders) are programs that traverse
many pages in the World Wide Web by recursively retrieving linked pages. This
kind of program can place significant loads on web servers, so there is a <a
href="http://www.robotstxt.org/wc/norobots.html">standard</a> for a
<code>robots.txt</code> file by which web site operators can request robots to
keep out of their site, or out of particular areas of it. This processor uses
the standard Python library's <code>robotparser</code> module. It raises
<code>ClientCookie.RobotExclusionError</code> (subclass of
<code>urllib2.HTTPError</code>) if an attempt is made to open a URL prohibited
by <code>robots.txt</code>. XXX ATM, this makes use of code in the
<code>robotparser</code> module that uses <code>urllib</code> - this will
likely change in future to use <code>urllib2</code>.
<dt><code>HTTPEquivProcessor</code>

<dd><p>The <code>&lt;META HTTP-EQUIV&gt;</code> tag is a way of including data
in HTML to be treated as if it were part of the HTTP headers. ClientCookie can
automatically read these tags and add the <code>HTTP-EQUIV</code> headers to
the response object's real HTTP headers. The HTML is left unchanged.
<dt><code>HTTPRefreshProcessor</code>

<dd><p>The <code>Refresh</code> HTTP header is a non-standard header which is
widely used. It requests that the user-agent follow a URL after a specified
time delay. ClientCookie can treat these headers (which may have been set in
<code>&lt;META HTTP-EQUIV&gt;</code> tags) as if they were 302 redirections.
Exactly when and how <code>Refresh</code> headers are handled is configurable
using the constructor arguments.
<dt><code>SeekableProcessor</code>

<dd><p>This makes ClientCookie's response objects <code>seek()</code>able.
Seeking is done lazily (i.e. the response object only reads from the socket as
necessary, rather than slurping in all the data before the response is returned
to you). XXX only works for HTTP ATM, I think
<dt><code>HTTPRefererProcessor</code>

<dd><p>The <code>Referer</code> HTTP header lets the server know which URL
you've just visited. Some servers use this header as state information, and
don't like it if this is not present. It's a chore to add this header by hand
every time you make a request. This processor adds it automatically.
<strong>NOTE</strong>: this only makes sense if you use each processor for a
single chain of HTTP requests (so, for example, if you use a single
<code>HTTPRefererProcessor</code> to fetch a series of URLs extracted from a
single page, <strong>this will break</strong>). The <a
href="../mechanize/">mechanize</a> package does this properly.
import ClientCookie
cookies = ClientCookie.CookieJar()

opener = ClientCookie.build_opener(ClientCookie.HTTPRefererProcessor,
                                   ClientCookie.HTTPEquivProcessor,
                                   ClientCookie.HTTPRefreshProcessor,
                                   ClientCookie.SeekableProcessor)
opener.open("http://www.rhubarb.com/")
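<p>To make the processor idea concrete, here is a minimal sketch of a
Referer-adding processor written against Python 3's standard
<code>urllib.request</code>, whose handlers support the same
<code>http_request</code> / <code>http_response</code> hooks. It is not
ClientCookie's <code>HTTPRefererProcessor</code>: the names
<code>RefererProcessor</code> and <code>FakeResponse</code> are made up for
illustration, and the hooks are called by hand so that no request is sent.

```python
import urllib.request


class RefererProcessor(urllib.request.BaseHandler):
    """Hypothetical processor: sends the previously fetched URL as Referer."""

    def __init__(self):
        self.referer = None

    def http_request(self, request):
        # Pre-process: add the header unless the caller already set one.
        if self.referer is not None and not request.has_header("Referer"):
            request.add_unredirected_header("Referer", self.referer)
        return request

    def http_response(self, request, response):
        # Post-process: remember where we have just been.
        self.referer = response.geturl()
        return response

    https_request = http_request
    https_response = http_response


class FakeResponse:
    """Hypothetical response stand-in offering only .geturl()."""

    def __init__(self, url):
        self._url = url

    def geturl(self):
        return self._url


processor = RefererProcessor()
opener = urllib.request.build_opener(processor)  # used like any other handler

# Drive the two hooks by hand instead of opening real URLs.
first = urllib.request.Request("http://www.acme.com/")
processor.http_response(first, FakeResponse("http://www.acme.com/"))
second = processor.http_request(
    urllib.request.Request("http://www.acme.com/flying_machines.html"))
referer = second.get_header("Referer")
```

<p>Because the processor keeps a single "last URL", this sketch shares the
limitation noted above: one instance suits one chain of requests only.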
<a name="requests"></a>
<h2>Confusing fact about headers and Requests</h2>

ClientCookie automatically upgrades <code>urllib2.Request</code> objects to
<code>ClientCookie.Request</code>. This means that you won't see any headers
that are added to Request objects by handlers unless you use
<code>ClientCookie.Request</code> in the first place. Sorry about that.
<a name="headers"></a>
<h2>Adding headers</h2>

<p>Adding headers is done like so:
import ClientCookie, urllib2
req = urllib2.Request("http://foobar.com/")
req.add_header("Referer", "http://wwwsearch.sourceforge.net/ClientCookie/")
r = ClientCookie.urlopen(req)
<p>You can also use the headers argument to the <code>urllib2.Request</code>

<p><code>urllib2</code> (in fact, ClientCookie takes over this task from
<code>urllib2</code>) adds some headers to <code>Request</code> objects
automatically - see the next section for details.
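<p>The two ways of attaching headers to a request can be compared offline with
the Python 3 counterpart of <code>urllib2.Request</code>. This is only a
sketch: it assumes <code>urllib.request.Request</code> treats headers as its
Python 2 ancestor did, and the <code>User-agent</code> value is an invented
example.

```python
import urllib.request

referer = "http://wwwsearch.sourceforge.net/ClientCookie/"

# Headers passed to the constructor...
req = urllib.request.Request("http://foobar.com/", headers={"Referer": referer})
# ...and headers added afterwards end up in the same place.
req.add_header("User-agent", "MyProgram/0.1")  # hypothetical agent string

# Header names are stored capitalized ("User-agent", not "User-Agent").
sent = dict(req.header_items())
```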
<h2>Changing the automatically-added headers (User-Agent)</h2>

<p><code>OpenerDirector</code> automatically adds a <code>User-Agent</code>
header to every <code>Request</code>.

<p>To change this and/or add similar headers, use your own
<code>OpenerDirector</code>:
import ClientCookie
cookies = ClientCookie.CookieJar()
opener = ClientCookie.build_opener(ClientCookie.HTTPCookieProcessor(cookies))
opener.addheaders = [("User-agent", "Mozilla/5.0 (compatible; MyProgram/0.1)"),
                     ("From", "responsible.person@example.com")]
<p>Again, to use <code>urlopen()</code>, install your
<code>OpenerDirector</code> globally:
ClientCookie.install_opener(opener)
r = ClientCookie.urlopen("http://acme.com/")
<p>Also, a few standard headers (<code>Content-Length</code>,
<code>Content-Type</code> and <code>Host</code>) are added when the
<code>Request</code> is passed to <code>urlopen()</code> (or
<code>OpenerDirector.open()</code>). ClientCookie explicitly adds these (and
<code>User-Agent</code>) to the <code>Request</code> object, unlike
<code>urllib2</code>. You shouldn't need to change these headers, but since
this is done by <code>AbstractHTTPHandler</code>, you can change the way it
works by passing a subclass of that handler to <code>build_opener()</code>.
<a name="unverifiable"></a>
<h2>Initiating unverifiable transactions</h2>

<p>ClientCookie knows that redirected transactions are unverifiable, so it'll
handle that on its own.

<p>If you want to initiate an unverifiable transaction yourself (which you
should if, for example, you're downloading the images from a page, and 'the
user' hasn't explicitly OKed those URLs), you need to set a true
<code>request.unverifiable</code> on your <code>Request</code> instance, and
also set <code>request.origin_req_host</code> to the request-host of the origin
transaction (e.g. the host name in the URL of the page containing the images).
If <code>unverifiable</code> is present and true, but
<code>origin_req_host</code> is not present, you'll get an
<code>AttributeError</code>. XXX None of this is
<a name="debugging"></a>
<!--XXX move as much as poss. to General page-->
<p>First, a few common problems. The most frequent mistake people seem to make
is to use <code>ClientCookie.urlopen()</code>, <em>and</em> the
<code>.extract_cookies()</code> and <code>.add_cookie_header()</code> methods
on a cookie object themselves. If you use <code>ClientCookie.urlopen()</code>
(or <code>OpenerDirector.open()</code>), the module handles extraction and
adding of cookies by itself, so you should not call
<code>.extract_cookies()</code> or <code>.add_cookie_header()</code>.
<p>If things don't seem to be working as expected, the first thing to try is to
<a href="./doc.html#policy">switch off</a> RFC 2965 handling. This is because
few browsers implement it, so it is likely that some servers incorrectly
<p>Are you sure the server is sending you any cookies in the first place?
Maybe the server is keeping track of state in some other way
(<code>HIDDEN</code> HTML form entries (possibly in a separate page referenced
by a frame), URL-encoded session keys, IP address, HTTP <code>Referer</code>
headers)? Perhaps some embedded script in the HTML is setting cookies (see
below)? Maybe you messed up your request, and the server is sending you some
standard failure page (even if the page doesn't appear to indicate any
failure). Sometimes, a server wants particular headers set to the values it
expects, or it won't play nicely. The most frequent offenders here are the
<code>Referer</code> [<em>sic</em>] and / or <code>User-Agent</code> HTTP
headers (<a href="./doc.html#headers">see above</a> for how to set these). The
<code>User-Agent</code> header may need to be set to a value like that of a
popular browser. The <code>Referer</code> header may need to be set to the URL
that the server expects you to have followed a link from. Occasionally, it may
even be that operators deliberately configure a server to insist on precisely
the headers that the popular browsers (MS Internet Explorer, Mozilla/Netscape,
Opera, Konqueror/Safari) generate, but remember that incompetence (possibly on
your part) is more probable than deliberate sabotage (and if a site owner is
that keen to stop robots, you probably shouldn't be scraping it anyway).
<p>When you <code>.save()</code> to or
<code>.load()</code>/<code>.revert()</code> from a file, single-session cookies
will expire unless you explicitly request otherwise with the
<code>ignore_discard</code> argument. This may be your problem if you find
cookies are going away after saving and loading.
import ClientCookie
cookies = ClientCookie.CookieJar()
opener = ClientCookie.build_opener(ClientCookie.HTTPCookieProcessor(cookies))
ClientCookie.install_opener(opener)
r = ClientCookie.urlopen("http://foobar.com/")
cookies.save("/some/file", ignore_discard=True, ignore_expires=True)
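<p>The disappearing-cookie effect is easy to reproduce offline. The sketch
below uses <code>MozillaCookieJar</code> from Python 3's
<code>http.cookiejar</code> (assumed here to treat <code>ignore_discard</code>
as ClientCookie does), a temporary file, and a hypothetical
<code>FakeResponse</code> helper in place of a real server.

```python
import email
import os
import tempfile
import urllib.request
from http.cookiejar import MozillaCookieJar


class FakeResponse:
    """Hypothetical response stand-in: the jar only calls .info()."""

    def __init__(self, header_text):
        self._headers = email.message_from_string(header_text)

    def info(self):
        return self._headers


jar = MozillaCookieJar()
request = urllib.request.Request("http://www.acme.com/")
# No Expires or Max-Age attribute, so this is a single-session cookie.
jar.extract_cookies(FakeResponse("Set-Cookie: session=abc123; Path=/\n\n"),
                    request)

path = os.path.join(tempfile.mkdtemp(), "cookies.txt")

# Default behaviour: the session cookie is silently left out of the file.
jar.save(path)
default_jar = MozillaCookieJar()
default_jar.load(path)

# Asking for discardable cookies on both save and load keeps it.
jar.save(path, ignore_discard=True)
kept_jar = MozillaCookieJar()
kept_jar.load(path, ignore_discard=True)

names_default = [cookie.name for cookie in default_jar]
names_kept = [cookie.name for cookie in kept_jar]
```

<p>Note that the argument is needed on the load side too: a file that contains
session cookies still loads without them by default.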
<p>If none of the advice above solves your problem quickly, try comparing the
headers and data that you are sending out with those that a browser emits.
Often this will give you the clue you need. Of course, you'll want to check
that the browser is able to do manually what you're trying to achieve
programmatically before minutely examining the headers. Make sure that what you
do manually is <em>exactly</em> the same as what you're trying to do from
Python - you may simply be hitting a server bug that only gets revealed if you
view pages in a particular order, for example. In order to see what your
browser is sending to the server (even if HTTPS is in use), see <a
href="../clientx.html">the General FAQ page</a>. If nothing is obviously wrong
with the requests your program is sending and you're out of ideas, you can try
the last resort of good old brute force binary-search debugging. Temporarily
switch to sending HTTP headers yourself (with <code>httplib</code>). Start by
copying Netscape/Mozilla or IE slavishly (apart from session IDs, etc., of
course), then begin the tedious process of mutating your headers and data until
they match what your higher-level code was sending. This will at least reliably
<p>You can turn on display of HTTP headers:
import ClientCookie
hh = ClientCookie.HTTPHandler()  # you might want HTTPSHandler, too
hh.set_http_debuglevel(1)
opener = ClientCookie.build_opener(hh)
response = opener.open(url)  # url: whichever URL you are debugging
<p>Alternatively, you can examine your individual request and response objects
to see what's going on. ClientCookie's responses can be made
<code>.seek()</code>able using <code>SeekableProcessor</code>. It's often
useful to use the <code>.seek()</code> method like this during debugging:
import ClientCookie
response = ClientCookie.urlopen("http://spam.eggs.org/")
print response.read()
response.seek(0)  # rewind (needs a seekable response, see SeekableProcessor)
# rest of code continues as if you'd never .read() the response
<p>Also, note <code>HTTPRedirectDebugProcessor</code> (which prints information
about redirections) and <code>HTTPResponseDebugProcessor</code> (which prints
out all response bodies, including those that are read during redirections).
<strong>NOTE</strong>: as well as having these processors in your
<code>OpenerDirector</code> (for example, by passing them to
<code>build_opener()</code>) you have to turn on logging at the
<code>INFO</code> level or lower in order to see any output.
<p>If you would like to see what is going on in ClientCookie's tiny mind, do
import ClientCookie
# ClientCookie.DEBUG covers masses of debugging information,
# ClientCookie.INFO just shows the output from HTTPRedirectDebugProcessor.
ClientCookie.getLogger("ClientCookie").setLevel(ClientCookie.DEBUG)
<p>(In Python 2.3, <code>logging.getLogger</code>, <code>logging.DEBUG</code>,
<code>logging.INFO</code> etc. work just as well.)

<p>The <code>DEBUG</code> level (as opposed to the <code>INFO</code> level) can
actually be quite useful, as it explains why particular cookies are accepted or
rejected and why they are or are not returned.
<p>One final thing to note is that there are some catch-all bare
<code>except:</code> statements in the module, which are there to handle
unexpected bad input without crashing your program. If one of them is
triggered, it's a bug in ClientCookie, so please mail me the warning text.
<a name="script"></a>
<h2>Embedded script that sets cookies</h2>

<p>It is possible to embed script in HTML pages (sandwiched between
<code>&lt;SCRIPT&gt;here&lt;/SCRIPT&gt;</code> tags, and in
<code>javascript:</code> URLs) - JavaScript / ECMAScript, VBScript, or even
Python - that causes cookies to be set in a browser. See the <a
href="../bits/clientx.html">General FAQs</a> page for what to do about this.
<h2>Parsing HTTP date strings</h2>

<p>A function named <code>str2time</code> is provided by the package,
which may be useful for parsing dates in HTTP headers.
<code>str2time</code> is intended to be liberal, since HTTP date/time
formats are poorly standardised in practice. There is no need to use this
function in normal operations: <code>CookieJar</code> instances keep track
of cookie lifetimes automatically. This function will stay around in some
form, though the supported date/time formats may change.
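<p>For a feel of what "liberal" means here, the sketch below calls
<code>http2time</code>, an undocumented helper in Python 3's
<code>http.cookiejar</code> that plays a similar role. Treating it as
equivalent to <code>str2time</code> is an assumption, and the set of accepted
formats may differ between the two.

```python
import time
from http.cookiejar import http2time

# The same instant written in three of the styles seen in HTTP headers.
samples = [
    "Thu, 03 Feb 1994 00:00:00 GMT",       # RFC 1123 style
    "Thursday, 03-Feb-1994 00:00:00 GMT",  # RFC 850 style with 4-digit year
    "03 Feb 1994",                         # no weekday, no time
]

# Each call returns seconds since the epoch (UTC), or None on failure.
parsed = [http2time(text) for text in samples]
as_tuples = [time.gmtime(seconds)[:6] for seconds in parsed]

unparsable = http2time("not a date")
```

<p>Strings the parser cannot make sense of yield <code>None</code> rather than
raising an exception, so callers need to check the result.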
<a name="standards"></a>
<h2>Note about cookie standards</h2>

<p>The various cookie standards and their history form a case study of the
terrible things that can happen to a protocol. The long-suffering David
Kristol has written a <a
href="http://doi.acm.org/10.1145/502152.502153">paper</a> about it, if you
want to know the gory details.

<p>Here is a summary.
<p>The <a href="http://www.netscape.com/newsref/std/cookie_spec.html">Netscape
protocol</a> (cookie_spec.html) is still the only standard supported by most
browsers (including Internet Explorer and Netscape). Be aware that
cookie_spec.html is not, and never was, actually followed to the letter (or
anything close) by anyone (including Netscape, IE and ClientCookie): the
Netscape protocol standard is really defined by the behaviour of Netscape (and
now IE). Netscape cookies are also known as V0 cookies, to distinguish them
from RFC 2109 or RFC 2965 cookies, which have a version cookie-attribute with a
<p><a href="http://www.ietf.org/rfcs/rfc2109.txt">RFC 2109</a> was introduced
to fix some problems identified with the Netscape protocol, while still keeping
the same HTTP headers (<code>Cookie</code> and <code>Set-Cookie</code>). The
most prominent of these problems is the 'third-party' cookie issue, which was
an accidental feature of the Netscape protocol. When one visits www.bland.org,
one doesn't expect to get a cookie from www.lurid.com, a site one has never
visited. Depending on browser configuration, this can still happen, because
the unreconstructed Netscape protocol is happy to accept cookies from, say, an
image in a webpage (www.bland.org) that's included by linking to an
advertiser's server (www.lurid.com). This kind of event, where your browser
talks to a server that you haven't explicitly okayed by some means, is what the
RFCs call an 'unverifiable transaction'. In addition to the potential for
embarrassment caused by the presence of lurid.com's cookies on one's machine,
this may also be used to track your movements on the web, because advertising
agencies like doubleclick.net place ads on many sites. RFC 2109 tried to
change this by requiring cookies to be turned off during unverifiable
transactions with third-party servers - unless the user explicitly asks them to
be turned on. This clashed with the business model of advertisers like
doubleclick.net, who had started to take advantage of the third-party cookies
'bug'. Since the browser vendors were more interested in the advertisers'
concerns than those of the browser users, this arguably doomed both RFC 2109
and its successor, RFC 2965, from the start. Other problems than the
third-party cookie issue were also fixed by 2109. However, even ignoring the
advertising issue, 2109 was stillborn, because Internet Explorer and Netscape
behaved differently in response to its extended <code>Set-Cookie</code>
headers. This was not really RFC 2109's fault: it worked the way it did to
keep compatibility with the Netscape protocol as implemented by Netscape.
Microsoft Internet Explorer (MSIE) was very new when the standard was designed,
but was starting to be very popular when the standard was finalised. XXX P3P,
and MSIE &amp; Mozilla options
<p>XXX Apparently MSIE implements bits of RFC 2109 - but not very compliantly
(surprise). Presumably other browsers do too, as a result. ClientCookie
already does allow Netscape cookies to have <code>max-age</code> and
<code>port</code> cookie-attributes, and as far as I know that's the extent of
the support present in MSIE. I haven't tested, though!
<p><a href="http://www.ietf.org/rfcs/rfc2965.txt">RFC 2965</a> attempted to fix
the compatibility problem by introducing two new headers,
<code>Set-Cookie2</code> and <code>Cookie2</code>. Unlike the
<code>Cookie</code> header, <code>Cookie2</code> does <em>not</em> carry
cookies to the server - rather, it simply advertises to the server that RFC
2965 is understood. <code>Set-Cookie2</code> <em>does</em> carry cookies, from
server to client: the new header means that both IE and Netscape completely
ignore these cookies. This prevents breakage, but introduces a chicken-egg
problem that means 2965 may never be widely adopted, especially since Microsoft
shows no interest in it. XXX Rumour has it that the European Union is unhappy
with P3P, and might introduce legislation that requires something better,
forming a gap that RFC 2965 might fill - any truth in this? Opera is the only
browser I know of that supports the standard. On the server side, Apache's
<code>mod_usertrack</code> supports it. One confusing point to note about RFC
2965 is that it uses the same value (1) of the Version attribute in HTTP
headers as does RFC 2109.
<p>Recently, it was discovered that RFC 2965 does not fully take account of
issues arising when 2965 and Netscape cookies coexist, and errata were
discussed on the W3C http-state mailing list, but the list traffic has died and
it seems RFC 2965 is dead as an internet protocol (but still a useful basis for
implementing the de-facto standards, and perhaps as an intranet protocol).
<p>Because Netscape cookies are so poorly specified, the general philosophy
of the module's Netscape cookie implementation is to start with RFC 2965
and open holes where required for Netscape protocol-compatibility. RFC
2965 cookies are <em>always</em> treated as RFC 2965 requires, of course!
<a name="faq_use"></a>
<h2>FAQs - usage</h2>
<li>Why don't I have any cookies?
<p>Read the <a href="./doc.html#debugging">debugging section</a> of this page.
<li>My response claims to be empty, but I know it's not!
<p>Did you call <code>response.read()</code> (e.g., in a debug statement),
then forget that all the data has already been read? In that case, you
may want to use <code>SeekableProcessor</code>.
<li>How do I download only part of a response body?
<p>Just call the <code>.read()</code> or <code>.readline()</code> methods on
your response object as many times as you need. The <code>seek</code> method
(which will only be there if you're using <code>SeekableProcessor</code>)
still works, because <code>SeekableProcessor</code>'s response objects
<li>What's the difference between the <code>.load()</code> and
<code>.revert()</code> methods of <code>CookieJar</code>?
<p><code>.load()</code> <em>appends</em> cookies from a file.
<code>.revert()</code> discards all existing cookies held by the
<code>CookieJar</code> first (but it won't lose any existing cookies if
<li>Is it threadsafe?
<p>I believe so, but it's not been tested yet.
<li>How do I do &lt;X&gt;?
<p>The module docstrings are worth reading if you want to do something
<li>What's this "processor" business about? I knew
<code>urllib2</code> used "handlers", but not these
"processors".
<p>See this Python library <a href="http://www.python.org/sf/852995">patch</a>.
<li>How do I use it without urllib2.py?
from ClientCookie import CookieJar
print CookieJar.extract_cookies.__doc__
print CookieJar.add_cookie_header.__doc__
<p><a href="mailto:jjl@@pobox.com">John J. Lee</a>, January 2004.
<a href="..">Home</a><br>
<!--<a href=""></a><br>-->
<a href="../ClientCookie/">ClientCookie</a><br>
<span class="thispage"><span class="subpage">ClientCookie docs</span></span><br>
<a href="../ClientForm/">ClientForm</a><br>
<a href="../DOMForm/">DOMForm</a><br>
<a href="../spidermonkey/">spidermonkey</a><br>
<a href="../ClientTable/">ClientTable</a><br>
<a href="../mechanize/">mechanize</a><br>
<a href="../pullparser/">pullparser</a><br>
<a href="../bits/clientx.html">General FAQs</a><br>
<a href="../bits/urllib2_152.py">urllib2.py</a><br>
<a href="../bits/urllib_152.py">urllib.py</a><br>
<a href="./doc.html#examples">Examples</a><br>
<a href="./doc.html#browsers">Mozilla &amp; MSIE</a><br>
<a href="./doc.html#cookiejar">Using a <code>CookieJar</code></a><br>
<a href="./doc.html#goodies">Processors</a><br>
<a href="./doc.html#requests">Request confusion</a><br>
<a href="./doc.html#headers">Adding headers</a><br>
<a href="./doc.html#unverifiable">Verifiability</a><br>
<a href="./doc.html#debugging">Debugging</a><br>
<a href="./doc.html#script">Embedded scripts</a><br>
<a href="./doc.html#dates">HTTP date parsing</a><br>
<a href="./doc.html#standards">Standards</a><br>
<a href="./doc.html#faq_use">FAQs - usage</a><br>