<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head><!--
        XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
              This file is generated from xml source: DO NOT EDIT
        XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
      -->
<title>mod_unique_id - Apache HTTP Server</title>
<link href="../style/css/manual.css" rel="stylesheet" media="all" type="text/css" title="Main stylesheet" />
<link href="../style/css/manual-loose-100pc.css" rel="alternate stylesheet" media="all" type="text/css" title="No Sidebar - Default font size" />
<link href="../style/css/manual-print.css" rel="stylesheet" media="print" type="text/css" />
<link href="../images/favicon.ico" rel="shortcut icon" /></head>
<div id="page-header">
<p class="menu"><a href="../mod/">Modules</a> | <a href="../mod/directives.html">Directives</a> | <a href="../faq/">FAQ</a> | <a href="../glossary.html">Glossary</a> | <a href="../sitemap.html">Sitemap</a></p>
<p class="apache">Apache HTTP Server Version 2.2</p>
<img alt="" src="../images/feather.gif" /></div>
<div class="up"><a href="./"><img title="<-" alt="<-" src="../images/left.gif" /></a></div>
<div id="page-content">
<div id="preamble"><h1>Apache Module mod_unique_id</h1>
<p><span>Available Languages: </span><a href="../en/mod/mod_unique_id.html" title="English"> en </a> |
<a href="../ja/mod/mod_unique_id.html" hreflang="ja" rel="alternate" title="Japanese"> ja </a> |
<a href="../ko/mod/mod_unique_id.html" hreflang="ko" rel="alternate" title="Korean"> ko </a></p>
<table class="module"><tr><th><a href="module-dict.html#Description">Description:</a></th><td>Provides an environment variable with a unique
identifier for each request</td></tr>
<tr><th><a href="module-dict.html#Status">Status:</a></th><td>Extension</td></tr>
<tr><th><a href="module-dict.html#ModuleIdentifier">Module&#160;Identifier:</a></th><td>unique_id_module</td></tr>
<tr><th><a href="module-dict.html#SourceFile">Source&#160;File:</a></th><td>mod_unique_id.c</td></tr></table>
<p>This module provides a magic token for each request which is
guaranteed to be unique across "all" requests under very
specific conditions. The identifier is even unique across
multiple machines in a properly configured cluster. The
environment variable <code>UNIQUE_ID</code> is set to the
identifier for each request. Unique identifiers are useful for
various reasons which are beyond the scope of this
document.</p>
<div id="quickview"><h3 class="directives">Directives</h3>
<p>This module provides no directives.</p>
<ul>
<li><img alt="" src="../images/down.gif" /> <a href="#theory">Theory</a></li>
</ul></div>
<div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
<h2><a name="theory" id="theory">Theory</a></h2>
<p>First a brief recap of how the Apache server works on Unix
machines. This feature currently isn't supported on Windows NT.
On Unix machines, Apache creates several children, and the
children process requests one at a time. Each child can serve
multiple requests in its lifetime. For the purpose of this
discussion, the children don't share any data with each other.
We'll refer to the children as <dfn>httpd processes</dfn>.</p>
<p>Your website has one or more machines under your
administrative control; together we'll call them a cluster of
machines. Each machine can possibly run multiple instances of
Apache. All of these collectively are considered "the
universe", and with certain assumptions we'll show that in this
universe we can generate unique identifiers for each request,
without extensive communication between machines in the
cluster.</p>
<p>The machines in your cluster should satisfy these
requirements. (Even if you have only one machine you should
synchronize its clock with NTP.)</p>
<ul>
<li>The machines' times are synchronized via NTP or other
network time protocol.</li>

<li>The machines' hostnames all differ, such that the module
can do a hostname lookup on each hostname and receive a
different IP address for each machine in the cluster.</li>
</ul>
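<p>As a rough illustration of the second requirement, the sketch
below checks that every hostname in a cluster maps to a different
IP address. The hostnames and addresses are made up, and the
<code>resolved</code> mapping is an assumption: in practice it
would be filled from a real lookup (for example Python's
<code>socket.gethostbyname</code>).</p>

```python
from collections import defaultdict


def shared_addresses(resolved):
    """Return {ip: [hostnames]} for every IP claimed by more than one host.

    `resolved` maps hostname -> IP address string. An empty result means
    each machine would contribute a distinct IP address to its identifiers.
    """
    by_ip = defaultdict(list)
    for hostname, ip in resolved.items():
        by_ip[ip].append(hostname)
    return {ip: sorted(hosts) for ip, hosts in by_ip.items() if len(hosts) > 1}


# Hypothetical cluster: web2 and web3 wrongly resolve to the same address.
cluster = {
    "web1.example.com": "192.0.2.10",
    "web2.example.com": "192.0.2.11",
    "web3.example.com": "192.0.2.11",
}
conflicts = shared_addresses(cluster)
```

<p>Any entry in <code>conflicts</code> points at machines that
would break the uniqueness assumption.</p>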
<p>As far as operating system assumptions go, we assume that
pids (process ids) fit in 32 bits. If the operating system uses
more than 32 bits for a pid, the fix is trivial but must be
performed in the code.</p>
<p>Given those assumptions, at a single point in time we can
distinguish any httpd process on any machine in the cluster
from all other httpd processes. The machine's IP address and
the pid of the httpd process are sufficient to do this. So in
order to generate unique identifiers for requests we need only
distinguish between different points in time.</p>
<p>To distinguish time we will use a Unix timestamp (seconds
since January 1, 1970 UTC), and a 16-bit counter. The timestamp
has only one second granularity, so the counter is used to
represent up to 65536 values during a single second. The
quadruple <em>( ip_addr, pid, time_stamp, counter )</em> is
sufficient to enumerate 65536 requests per second per httpd
process. There are issues, however, with pid reuse over time,
and the counter is used to alleviate them.</p>
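<p>The quadruple can be sketched in a few lines of Python. This is
only an illustration, not the module's code: the
<code>HttpdProcess</code> class, its field names, and the sample
address, pid and timestamp are all assumptions made for the
example.</p>

```python
from dataclasses import dataclass

COUNTER_SPACE = 1 << 16  # a 16-bit counter has 65536 values


@dataclass
class HttpdProcess:
    """Hypothetical stand-in for one httpd child process."""
    ip_addr: str
    pid: int
    counter: int = 0

    def next_id(self, time_stamp):
        """Return the quadruple for a request that arrived at time_stamp."""
        quadruple = (self.ip_addr, self.pid, time_stamp, self.counter)
        # Advance the counter, wrapping around after 65535.
        self.counter = (self.counter + 1) % COUNTER_SPACE
        return quadruple


child = HttpdProcess(ip_addr="192.0.2.10", pid=4321)
same_second = 1_000_000_000  # every request arrives within one second
ids = {child.next_id(same_second) for _ in range(COUNTER_SPACE)}
```

<p>Within one second the child yields 65536 distinct quadruples;
one more request in that same second would repeat the first
one.</p>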
<p>When an httpd child is created, the counter is initialized
with ( current microseconds divided by 10 ) modulo 65536 (this
formula was chosen to eliminate some variance problems with the
low order bits of the microsecond timers on some systems). When
a unique identifier is generated, the time stamp used is the
time the request arrived at the web server. The counter is
incremented every time an identifier is generated (and allowed
to roll over).</p>
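<p>The counter handling described above might look like the
following. The function names are invented for this sketch, and
reading "divided by 10" as integer division is an assumption.</p>

```python
from datetime import datetime

COUNTER_SPACE = 1 << 16  # 16-bit counter


def initial_counter(microseconds):
    """(microseconds divided by 10) modulo 65536; integer division assumed."""
    return (microseconds // 10) % COUNTER_SPACE


def next_counter(counter):
    """Increment the counter, rolling over to 0 after 65535."""
    return (counter + 1) % COUNTER_SPACE


# Seed from the microsecond part of the current time (0..999999).
counter = initial_counter(datetime.now().microsecond)
```

<p>Dividing by 10 discards the lowest decimal digit of the
microsecond value before the result is reduced to 16 bits.</p>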
<p>The kernel generates a pid for each process as it forks the
process, and pids are allowed to roll over (they're 16 bits on
many Unixes, but newer systems have expanded to 32 bits). So
over time the same pid will be reused. However, unless it is
reused within the same second, it does not destroy the
uniqueness of our quadruple. That is, we assume the system does
not spawn 65536 processes in a one second interval (it may even
be 32768 processes on some Unixes, but even this isn't likely
to happen).</p>
<p>Suppose that time repeats itself for some reason. That is,
suppose that the system's clock is wrong and it revisits a
past time (or it is too far forward, is reset correctly, and
then revisits the future time). In this case we can easily show
that we can get pid and time stamp reuse. The choice of
initializer for the counter is intended to help defeat this.
Note that we really want a random number to initialize the
counter, but there aren't any readily available numbers on most
systems (<em>i.e.</em>, you can't use rand() because you need
to seed the generator, and can't seed it with the time because
time, at least at one second resolution, has repeated itself).
This is not a perfect defense.</p>
<p>How good a defense is it? Suppose that one of your machines
serves at most 500 requests per second (which is a very
reasonable upper bound at this writing, because systems
generally do more than just shovel out static files). To do
that it will require a number of children which depends on how
many concurrent clients you have. But we'll be pessimistic and
suppose that a single child is able to serve 500 requests per
second. There are about 1000 possible starting counter values
(out of 65536) such that two sequences of 500 requests overlap.
So there is roughly a 1.5% chance that if time (at one second
resolution) repeats itself this child will repeat a counter
value, and uniqueness will be broken. This was a very
pessimistic example, and with real world values it's even less
likely to occur. If your system is such that it's still likely
to occur, then perhaps you should make the counter 32 bits (by
editing the code).</p>
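<p>The estimate can be checked with a short calculation. The sketch
below fixes the first run of 500 counter values at 0 to 499 and
counts how many starting offsets for the second run share at least
one value with it. The function name is an assumption for this
example. An exact count gives 999 offsets, in line with the round
figure of 1000 used in the text.</p>

```python
COUNTER_SPACE = 1 << 16  # 16-bit counter


def overlapping_starts(requests, space=COUNTER_SPACE):
    """Count start offsets d for which two runs of `requests` counters overlap.

    Run A uses counters 0 .. requests-1. Run B starts d steps later (mod
    space). They share a value when B starts inside A (d < requests) or
    A starts inside B (space - d < requests).
    """
    return sum(1 for d in range(space) if d < requests or space - d < requests)


count = overlapping_starts(500)
chance = count / COUNTER_SPACE  # fraction of starting values that collide
```

<p>Here <code>chance</code> comes out at roughly 0.015, that is,
about 1.5%.</p>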
<p>You may be concerned about the clock being "set back" when
daylight saving time ends. However, this isn't an issue because
the times used here are UTC, which "always" goes forward. Note
that x86 based Unixes may need proper configuration for this to
be true -- they should be configured to assume that the
motherboard clock is on UTC and compensate appropriately. But
even so, if you're running NTP then your UTC time will be
correct very shortly after reboot.</p>
<p>The <code>UNIQUE_ID</code> environment variable is
constructed by encoding the 112-bit (32-bit IP address, 32-bit
pid, 32-bit time stamp, 16-bit counter) quadruple using the
alphabet <code>[A-Za-z0-9@-]</code> in a manner similar to MIME
base64 encoding, producing 19 characters. The MIME base64
alphabet is actually <code>[A-Za-z0-9+/]</code>; however,
<code>+</code> and <code>/</code> need to be specially encoded
in URLs, which makes them less desirable. All values are
encoded in network byte ordering so that the encoding is
comparable across architectures of different byte ordering. The
actual ordering of the encoding is: time stamp, IP address,
pid, counter. This ordering has a purpose, but it should be
emphasized that applications should not dissect the encoding.
Applications should treat the entire encoded
<code>UNIQUE_ID</code> as an opaque token, which can be
compared against other <code>UNIQUE_ID</code>s for equality
only.</p>
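<p>To make the size arithmetic concrete, here is a minimal sketch of
such an encoding. It assumes that standard base64 with
<code>+</code> and <code>/</code> replaced by <code>@</code> and
<code>-</code>, and with the padding removed, is close enough to
"a manner similar to MIME base64"; the output is not necessarily
byte-for-byte what the module emits. The function name and sample
values are made up.</p>

```python
import base64
import ipaddress
import struct


def encode_unique_id(time_stamp, ip_addr, pid, counter):
    """Pack the four fields big-endian and encode them with [A-Za-z0-9@-]."""
    ip_as_int = int(ipaddress.IPv4Address(ip_addr))
    # "!" selects network byte order: 4 + 4 + 4 + 2 bytes = 112 bits.
    raw = struct.pack("!IIIH", time_stamp, ip_as_int, pid, counter)
    # Standard base64 with "+" and "/" swapped for "@" and "-".
    encoded = base64.b64encode(raw, altchars=b"@-").decode("ascii")
    return encoded.rstrip("=")  # drop the padding character


token = encode_unique_id(1_000_000_000, "192.0.2.10", 4321, 7)
```

<p>The 14 packed bytes need 19 six-bit characters, which matches
the length given in the text. A token built this way is still
meant to be compared for equality only.</p>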
<p>The ordering was chosen such that it's possible to change
the encoding in the future without worrying about collision
with an existing database of <code>UNIQUE_ID</code>s. The new
encodings should also keep the time stamp as the first element,
and can otherwise use the same alphabet and bit length. Since
the time stamps are essentially an increasing sequence, it's
sufficient to have a <em>flag second</em> in which all machines
in the cluster stop serving any request, and stop using the old
encoding format. Afterwards they can resume requests and begin
issuing the new encodings.</p>
<p>This, we believe, is a relatively portable solution to this
problem. It can be extended to multithreaded systems like
Windows NT, and can grow with future needs. The identifiers
generated have essentially an infinite life-time because future
identifiers can be made longer as required. Essentially no
communication is required between machines in the cluster (only
NTP synchronization is required, which is low overhead), and no
communication between httpd processes is required (the
communication is implicit in the pid value assigned by the
kernel). In very specific situations the identifier can be
shortened, but more information needs to be assumed (for
example the 32-bit IP address is overkill for any site, but
there is no portable shorter replacement for it).</p>
<div id="footer">
<p class="apache">Copyright 2006 The Apache Software Foundation.<br />Licensed under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.</p>
</div>