<TITLE>REC-PICS-labels-961031</TITLE>
<BODY BACKGROUND="recbg.jpg">
<A HREF="http://www.w3.org/"><IMG BORDER="0" align=left ALT="W3C" SRC="w3c_home.gif"></A>
<A HREF="http://www.w3.org/pub/WWW/PICS/"><IMG BORDER="0" SRC="pics_48x48.gif"
ALT="PICS" WIDTH="48" HEIGHT="48"></A>
REC-PICS-labels-961031
PICS Label Distribution Label Syntax and Communication Protocols
W3C Recommendation 31-October-96
Jim Miller <A HREF="mailto:jmiller@w3.org"><jmiller@w3.org></A>
<A HREF="mailto:timk@spyglass.com"><timk@spyglass.com></A>
<A HREF="mailto:jmiller@w3.org"><jmiller@w3.org></A><BR>
<A HREF="mailto:presnick@research.att.com"><presnick@research.att.com></A><BR>
<A HREF="mailto:treese@OpenMarket.com"><treese@OpenMarket.com></A>
Status of this document
This document has been reviewed by W3C members and other interested parties
and has been endorsed by the Director as a W3C Recommendation. It is a stable
document and may be used as reference material or cited as a normative reference
from another document. W3C's role in making the Recommendation is to draw
attention to the specification and to promote its widespread deployment.
This enhances the functionality and interoperability of the Web.
A list of current W3C Recommendations and other technical documents can be
found at <A href="http://www.w3.org/pub/WWW/TR/">http://www.w3.org/pub/WWW/TR/</A>.
This document has been prepared for the technical subcommittee of PICS (Platform
for Internet Content Selection). It defines a general format for labels and
three methods by which these labels may be transmitted:
In an HTML document.
With a document transported via a protocol that uses RFC-822 headers.
Separately from the document.
<A HREF="#Overview">Overview</A>
<A HREF="#General">General Format</A>
<A HREF="#Example">Example</A>
<A HREF="#Detailed">Detailed Syntax</A>
<A HREF="#Semantics">Semantics of PICS Labels and Label Lists</A>
<A HREF="#Embedding">Embedding Labels in HyperText Markup Language (HTML)</A>
<A HREF="#Using">Using HTTP to Request Labels With A Document</A>
<A HREF="#Requesting">Requesting Labels Separately</A>
<A HREF="#MICs">MICs and Digital Signatures</A>
<A HREF="#Glossary">Glossary</A>
<A HREF="#Acknowledgements">Acknowledgments</A>
<A HREF="#Appendix A">Appendix A: An Algorithm for Locating a Label Bureau</A>
<A HREF="#Appendix B">Appendix B: Sample Label Bureau Queries and
Responses</A><A NAME="queries"> </A>
<A NAME="Overview">Overview</A>
This document has been prepared for the technical subcommittee of PICS (Platform
for Internet Content Selection). It defines a general format for labels and
three methods by which these labels may be transmitted:
In an HTML document.
We specify a mechanism, using the existing META tag, for embedding one or
more labels in (the header of) an HTML document.
With a document transported via a protocol that uses RFC-822 headers.
Labels can be transmitted using <EM>any</EM> protocol that uses RFC-822-style
headers. In addition, we define an extension specific to the HTTP protocol
that allows an HTTP client (Web browser) to specify which labels (if any)
it would like to have sent along with a document. The PICS committee hopes
that other network protocols will be extended in a similar way.
Separately from the document.
A client can request labels from a "label bureau," a server that speaks HTTP.
The labels may refer to any document that has a URL (see
<A HREF="ftp://ds.internic.net/rfc/rfc1738.txt">RFC-1738</A>), including
those available through protocols other than HTTP, such as ftp, gopher, or
netnews. Note that PICS defines a new URL scheme for referencing IRC chat
rooms (see <A href="http://w3.org/PICS/services.html">Rating Services and
Rating Systems</A>). The simplest implementation of a label bureau is an
off-the-shelf HTTP server running a special CGI script.
<A NAME="General">General Format</A>
A label consists of a <I>service identifier</I>, <I>label options</I>, and
a <I>rating</I>. The service identifier is the URL chosen by the rating service
(see <A href="http://w3.org/PICS/services.html">Rating Services and Rating
Systems</A>) as its unique identifier. Label options give additional properties
of the document being rated as well as properties of the rating itself, such
as the time the document was rated. The rating itself is a set of attribute-value
pairs that describe a document along one or more dimensions. One or more
labels may be distributed together as a list. The general form for a label
list (formatted for presentation, and not showing error status codes) is:
(PICS-1.1
  <<I>service url</I>> [<I>option...</I>]
    labels [<I>option...</I>] ratings (<category> <value> ...)
           [<I>option...</I>] ratings (<category> <value> ...)
           ...
  <<I>service url</I>> [<I>option...</I>]
    labels [<I>option...</I>] ratings (<category> <value> ...)
           [<I>option...</I>] ratings (<category> <value> ...)
           ...
  ...)
A <EM>specific</EM> label applies to a single document. If the document is
in HTML format, it may refer to other documents, either by external reference
(for example, using the <A href=...> tag) or by requesting that they
be displayed in-line (for example, using the <img ...> or <object
...> tag). A label applies to the given document only, <EM>not</EM> to
the referenced documents.
A <EM>generic</EM> label (identified by the use of the <B>generic</B> option)
applies to any document whose URL begins with a specific string of characters
(specified using the <B>for</B> option). A generic label does <EM>not</EM>
have the expected semantics of a "default" label that can be overridden by
more specific labels. While a specific label does override a generic label
when a client has access to both, the two labels may be distributed separately,
and thus a client may have access to only the generic label. A server can
keep track of defaults and overrides and generate a specific label based
on a default that is not overridden in its local database. However, a generic
label for a site or directory should only be distributed if it applies to
all the documents in that site or directory.
A rating service may provide a generic label for any or all prefixes of a
given URL, but should provide only one specific label for that URL. When
the specific label for a document can be found, it should be used in preference
to any generic label. Lacking a specific label, any applicable generic label may be
substituted, but preference should be given to the generic label whose
<B>for</B> prefix is longest. Some PICS client software may impose restrictions on
the use of generic labels. For example, a client may choose to ignore a generic
label that applies to a node in the URL tree more than two levels above the
node where the document is located.
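The preference rule above can be illustrated with a minimal, non-normative Python sketch. It assumes labels have already been parsed into a hypothetical <TT>Label</TT> record (the names <TT>Label</TT> and <TT>choose_label</TT> are illustrative only), and it compares URLs as plain strings; a real client must also cope with documents that have several URLs and may add restrictions such as the two-level limit mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Label:
    """Hypothetical parsed label holding only the fields this sketch needs."""
    for_url: str                      # value of the "for" option (URL or URL prefix)
    generic: bool = False             # value of the "generic" option
    ratings: Dict[str, str] = field(default_factory=dict)


def choose_label(url: str, labels: List[Label]) -> Optional[Label]:
    """Prefer a specific label for exactly this URL; otherwise use the
    applicable generic label with the longest "for" prefix; else None."""
    best_generic: Optional[Label] = None
    for label in labels:
        if not label.generic:
            if label.for_url == url:
                return label          # a specific label always wins
        elif url.startswith(label.for_url):
            if best_generic is None or len(label.for_url) > len(best_generic.for_url):
                best_generic = label  # remember the longest matching prefix
    return best_generic


labels = [
    Label("http://w3.org/", generic=True, ratings={"density": "0"}),
    Label("http://w3.org/PICS/", generic=True, ratings={"density": "1"}),
    Label("http://w3.org/PICS/Overview.html", ratings={"density": "2"}),
]
print(choose_label("http://w3.org/PICS/Underview.html", labels).ratings)  # {'density': '1'}
```

Here no specific label exists for Underview.html, so the generic label for "http://w3.org/PICS/" is chosen over the shorter "http://w3.org/" prefix.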
Label options can be divided into three groups. Options from the first group
supply information about the document to which the label applies. Options
from the second group supply information about the label itself. The last
group provides miscellaneous information.
<B>Information about the document that is labeled.</B>
at <I>quoted-ISO-date</I>
The last modification date of the item to which this rating applies, at the
time the rating was assigned. This can serve as a less expensive, but less
reliable, alternative to the message integrity check (MIC) options.
MIC-md5 "<I>Base64-string</I>"
-or- md5 "<I>Base64-string</I>"
A message integrity check (MIC) of the item being rated. The MD5 Message
Digest Algorithm (see
<A HREF="ftp://ds.internic.net/rfc/rfc1321.txt">RFC1321</A>) is used to compute
the MIC. One way to create this message digest is to use the RSAREF (version
2.0) software available for this purpose at no charge from RSA Laboratories.
See <A href="#MICs">MICs and Digital Signatures</A> below.
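As a non-normative illustration, the value of an <B>md5</B> option could also be computed with Python's standard <TT>hashlib</TT> and <TT>base64</TT> modules instead of RSAREF. This sketch assumes the digest is taken over the exact bytes of the document as retrieved and that the Base64 string is the standard RFC-1521 encoding of the 16-byte digest; any canonicalization the document actually requires is governed by the MICs and Digital Signatures section, not by this sketch. The function names are hypothetical.

```python
import base64
import hashlib


def mic_md5(document: bytes) -> str:
    """Base64 encoding of the MD5 digest of the document bytes (assumed canonical form)."""
    digest = hashlib.md5(document).digest()           # 16 raw bytes
    return base64.b64encode(digest).decode("ascii")   # 24 characters, "="-padded


def md5_option(document: bytes) -> str:
    """Render the abbreviated label option: md5 "<Base64-string>"."""
    return 'md5 "%s"' % mic_md5(document)


print(md5_option(b""))  # md5 "1B2M2Y8AsgTpgAmY7PhCfg=="
```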
<B>Information about the label itself.</B>
by <I>quotedname</I>
An identifier for the person or entity within the rating service who was
responsible for creating this particular label. This may be human readable,
or it may be used to contain a (base-64 encoded) set of certificates and
other information used to verify the signature on the label.
for <I>quotedURL</I>
The URL (or prefix string of a URL) of the item to which this rating applies.
This option is required for generic labels and in certain other cases (see
"Requesting Labels Separately," below); it is optional in other cases. Since
a single document can have many URLs, the URL used to retrieve a document
may differ from the URL in the <B>for</B> option of a label that accompanies
it.
generic <I>boolean</I>
-or- gen <I>boolean</I>
If this option is set to true, the label can be applied to any URL starting
with the prefix given in the <B>for</B> option. This is used to supply ratings
for entire sites or any subparts of sites. All generic labels must also include
the <B>for</B> option. As mentioned earlier, a generic label should not be
created unless it can be legitimately applied to <EM>all</EM> documents whose
URL begins with the prefix specified in the <B>for</B> option (even if a
more specific label exists).
on <I>quoted-ISO-date</I>
The date on which this rating was issued.
signature-RSA-MD5 "<I>Base64-string</I>"
An RSA digital signature encompassing the label. The signature is computed
using the MD5 algorithm by the rating service that issued the label. One
way to create this signature is to use the RSAREF (version 2.0) software
available for this purpose at no charge from RSA Laboratories. See
<A href="#MICs">MICs and Digital Signatures</A> below.
until <I>quoted-ISO-date</I>
-or- exp <I>quoted-ISO-date</I>
The date on which this rating expires.
<B>Other information.</B>
comment <I>quotedname</I>
Information for humans who may see the label; no associated semantics.
complete-label <I>quotedURL</I>
-or- full <I>quotedURL</I>
Dereferencing this URL returns a complete label that can be used in place
of the current one. The complete label has values for as many attributes
as possible. This is used when a short label is transmitted for performance
purposes but additional information is also available. When the URL is
dereferenced it returns an item of type application/pics-labels that contains
a labellist with exactly one label.
extension (optional <I>quotedURL data</I>*)
-or- extension (mandatory <I>quotedURL data</I>*)
Future extension mechanism. To avoid duplication of extension names, each
extension is identified by a <I>quotedURL</I>. The URL can be dereferenced
to get a human-readable description of the extension. If the extension is
<B>optional</B> then software which does not understand the extension can
simply ignore it; if the extension is <B>mandatory</B> then software which
does not understand the extension should act as though no label had been
supplied. Each item of <I>data</I> must be one of a fixed set of simple-to-parse
data types as specified in the detailed syntax below. See
<A href="http://w3.org/PICS/extensions/"> http://w3.org/PICS/extensions/</A>
to find out what extensions are currently in use.
<A NAME="Example">Example</A>
For example, a label list for two documents, using the example rating system
from <A HREF="REC-PICS-services-961031.html">PICS Rating Services and Rating
Systems</A>, might be as follows (in all examples, the spacing and indentation
are provided for readability; the specification treats multiple white space
characters as if they were compressed into a single space):
(PICS-1.1 "http://www.gcf.org/v2.5"
  by "John Doe"
  labels on "1994.11.05T08:15-0500"
    until "1995.12.31T23:59-0000"
    for "http://w3.org/PICS/Overview.html"
    ratings (suds 0.5 density 0 color/hue 1)
    for "http://w3.org/PICS/Underview.html"
    by "Jane Doe"
    ratings (subject 2 density 1 color/hue 1))
The same label list may be transmitted more compactly by converting all of
the line breaks and subsequent indentation characters into a single space,
and by replacing the word "labels" with "l", "ratings" with "r" and long
option names with their abbreviations. It may be compressed for transmission
purposes even further by moving all of the optional information to a separate
document and referencing that document by a URL:
(PICS-1.1 "http://www.gcf.org/v2.5" l
  full "http://www.gcf.org/labels/13242123"
  r (suds 0.5 density 0 color/hue 1)
  full "http://www.gcf.org/labels/123412278"
  r (subject 2 density 1 color/hue 1))
Finally, the optional information may be omitted entirely, reducing the
information content of the labels but making the transmission even smaller.
The resulting label list would then be:
(PICS-1.1 "http://www.gcf.org/v2.5"
  l r (suds 0.5 density 0 color/hue 1)
    r (subject 2 density 1 color/hue 1))
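The first compaction step described above (collapsing white space and substituting short names) is mechanical, so a rough, non-normative Python sketch may help. The function name <TT>compact</TT> is hypothetical. The sketch never alters quoted strings, but it assumes that no category transmit-name coincides with a keyword or option name such as "until", because it cannot tell the two apart; moving or omitting optional information is left to the label author.

```python
import re

# Long keywords and option names with the short forms given in the grammar.
# Keys are lower case because option names are case insensitive.
ABBREVIATIONS = {
    "labels": "l",
    "ratings": "r",
    "generic": "gen",
    "until": "exp",
    "mic-md5": "md5",
    "complete-label": "full",
}

# A quoted string, a single parenthesis, or any other run of non-space characters.
TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()"]+')


def compact(labellist: str) -> str:
    out = []
    prev = None
    for tok in TOKEN.findall(labellist):
        if not tok.startswith('"'):              # never rewrite quoted strings
            tok = ABBREVIATIONS.get(tok.lower(), tok)
        # One space between tokens, but none just inside parentheses.
        if prev is not None and prev != "(" and tok != ")":
            out.append(" ")
        out.append(tok)
        prev = tok
    return "".join(out)


example = '''(PICS-1.1 "http://www.gcf.org/v2.5"
  labels until "1995.12.31T23:59-0000"
    ratings (suds 0.5 density 0 color/hue 1))'''
print(compact(example))
```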
<A NAME="Detailed">Detailed Syntax</A>
The following grammar, in modified BNF, describes the syntax of labels. The
methods by which labels are embedded in specific protocols are detailed below.
The string "PICS-1.1" in <B>version</B> corresponds to the version number
1.1 of the PICS specification in <A HREF="REC-PICS-services-961031.html">PICS
Rating Services and Rating Systems</A>. While it is inelegant that the service
description uses the notation "(PICS-version 1.1)" while the label itself
uses "PICS-1.1", it is intentional.
Whitespace is ignored except in quoted strings. Multiple contiguous whitespace
characters can be treated as though they were a single space character.
Transmit-names and quoted strings are case sensitive. Option names and other
tokens in the BNF grammar are case insensitive.
This specification is strictly about information carried over the wire
between the client and the server, and it requires the use of US-ASCII. The companion
document <A HREF="REC-PICS-services-961031.html">PICS Rating Services and
Rating Systems</A> describes how a client can map these transmit-names to
descriptive strings using other character sets. Clients are advised to cache
the descriptions of rating services they use so that the information in labels
can be conveniently presented to the user.
An option that appears in the <I>service-info</I> applies to all labels in
that <I>service-info</I> unless overridden by an option in a specific
<I>label</I>. That is, a <I>label</I> is effectively lexically nested within
the enclosing <I>service-info</I> for the purpose of understanding the applicable
options. This is most likely to be useful in the case of the <B>by</B>,
<B>generic</B>, <B>on</B>, and <B>until</B> options and of experimental or future options. In the
first example above, the <B>by</B> option (with the value "John Doe") supplied
with the <I>service-info</I> applies to the first label, but is overridden
in the second (by the value "Jane Doe").
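The nesting rule can be sketched in Python as a dictionary merge. This is non-normative: the helper names are hypothetical, and the sketch treats every option as single-valued, whereas <B>comment</B> and <B>extension</B> may legitimately repeat. Because option names are case insensitive and have abbreviations, the sketch first maps each name to a lower-case long form, so that a label's <B>exp</B> overrides a service-level <B>until</B>.

```python
from typing import Dict

# Abbreviated option names mapped to their long forms.
CANONICAL = {"gen": "generic", "exp": "until", "md5": "mic-md5", "full": "complete-label"}


def _canonical(options: Dict[str, str]) -> Dict[str, str]:
    # Option names are case insensitive; normalize them to lower-case long forms.
    result = {}
    for name, value in options.items():
        key = name.lower()
        result[CANONICAL.get(key, key)] = value
    return result


def effective_options(service_options: Dict[str, str],
                      label_options: Dict[str, str]) -> Dict[str, str]:
    """Options in force for one label: service-info options, overridden by the label's own."""
    merged = _canonical(service_options)
    merged.update(_canonical(label_options))
    return merged


service = {"by": "John Doe"}
first = effective_options(service, {"for": "http://w3.org/PICS/Overview.html"})
second = effective_options(service, {"for": "http://w3.org/PICS/Underview.html",
                                     "by": "Jane Doe"})
print(first["by"], "/", second["by"])   # John Doe / Jane Doe
```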
Numbers in PICS labels may be integers or fractions with no greater range
or precision than that provided by IEEE single-precision floating point numbers.
Implementors concerned about the vagaries of floating point comparisons may
choose to represent numbers internally as ASCII strings.
The <I>multi-value</I> syntax <I>must</I> be used when there is more than
one value for a particular category. This syntax <I>may</I> be used when
there is exactly one value, but the more compact version may also be used
in that case. When there is no value, the category may be omitted entirely
or transmitted using the multi-value syntax.
The only options that may occur more than once in a particular
<I>single-label</I> or <I>service-info</I> are <B>comment</B> and
<B>extension</B>; if the <B>extension</B> option is supplied more than once,
the <I>quotedURL</I>s defining the extensions must be distinct.
Categories may appear in any order in a <I>rating</I>; they need not match
the order in which they appear in the <TT>application/pics-service</TT> description.
For parsing purposes, notice that a label ends with either "ratings" or "r"
followed by a parenthesized list of categories and values. If this does not
end the label list, it is followed by either another label (possibly starting
with options), a new service URL (recognizable because it must be surrounded
by quotation marks), or an error (starting with the word "error").
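The parsing hint above is enough to drive a small recursive-descent parser. The following Python sketch is non-normative and deliberately partial: it handles the error-free subset of the grammar below (no <I>service-error</I>, <I>label-error</I>, or "no-ratings" forms), keeps only the last value of a repeated option such as <B>comment</B>, and leaves numbers as strings. The names <TT>parse_labellist</TT> and <TT>ParseError</TT> are hypothetical.

```python
import re
from typing import Optional

TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()"]+')


class ParseError(ValueError):
    pass


def parse_labellist(text: str) -> dict:
    """Parse the error-free subset of the labellist grammar into nested dicts."""
    tokens = TOKEN.findall(text)
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def take(expected: Optional[str] = None) -> str:
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok.lower() != expected):
            raise ParseError("expected %r, found %r" % (expected, tok))
        pos += 1
        return tok

    def unquote(tok: str) -> str:
        return tok[1:-1] if tok.startswith('"') else tok

    def read_group() -> list:
        # A parenthesized group, e.g. extension data or a multi-value.
        take("(")
        items = []
        while peek() != ")":
            items.append(read_group() if peek() == "(" else take())
        take(")")
        return items

    def read_options(stop_words) -> dict:
        options = {}
        while peek() is not None and peek().lower() not in stop_words:
            name = take().lower()
            options[name] = read_group() if peek() == "(" else unquote(take())
        return options

    def read_single_label() -> dict:
        options = read_options(("ratings", "r"))
        take()                                    # the word "ratings" or "r"
        take("(")
        ratings = {}
        while peek() != ")":
            name = take()
            ratings[name] = read_group() if peek() == "(" else take()
        take(")")
        return {"options": options, "ratings": ratings}

    take("(")
    version = take()
    services = []
    while peek() != ")":
        service_id = take()
        if not service_id.startswith('"'):
            raise ParseError("expected a quoted service URL, found %r" % service_id)
        options = read_options(("labels", "l"))
        take()                                    # the word "labels" or "l"
        labels = []
        # Labels continue until ")" (end of list) or a quoted URL (next service).
        while peek() is not None and peek() != ")" and not peek().startswith('"'):
            if peek() == "(":                      # parenthesized group of labels
                take("(")
                while peek() != ")":
                    labels.append(read_single_label())
                take(")")
            else:
                labels.append(read_single_label())
        services.append({"service": unquote(service_id),
                         "options": options, "labels": labels})
    take(")")
    return {"version": version, "services": services}


result = parse_labellist('(PICS-1.1 "http://www.gcf.org/v2.5" l r (suds 0.5 density 0))')
print(result["services"][0]["labels"][0]["ratings"])   # {'suds': '0.5', 'density': '0'}
```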
<B>labellist ::</B> '(' <I>version</I> <I>service-info</I>+<I> </I>')'
<B>version ::</B> 'PICS-1.1'
<B>service-info :: </B>'error' '(' 'no-ratings' <I>explanation</I>* ')'
| <I>serviceID service-error </I>| <I>serviceID option</I>*<I> labelword label</I>*
<B>serviceID ::</B> <I>quotedURL</I>
<B>labelword :: </B>'labels' | 'l'
<B>label ::</B> <I>label-error </I>| <I>single-label </I>| '(' <I>single-label</I>* ')'
<B>single-label ::</B> <I>option</I>* <I>ratingword</I> '(' <I>rating</I>+ ')'
<B>ratingword :: </B>'ratings' | 'r'
<B>quotedURL ::</B> '"' <I>URL</I> '"' as described and extended in
<A HREF="REC-PICS-services-961031.html">Rating Services and Rating Systems</A>.
<B>option ::</B> <I>labeloption</I> | <I>documentoption</I> | <I>otheroption</I>
<B>labeloption ::</B>
'by' <I>quotedname</I>
| 'generic' <I>boolean</I> | 'gen' <I>boolean</I>
| 'for' <I>quotedURL</I>
| 'on' <I>quoted-ISO-date</I>
| 'signature-RSA-MD5' "<I>base64-string</I>"
| 'until' <I>quoted-ISO-date</I> | 'exp' <I>quoted-ISO-date</I>
<B>documentoption ::</B>
'at' <I>quoted-ISO-date</I>
| 'MIC-md5' "<I>base64-string</I>" | 'md5' "<I>base64-string</I>"
<B>otheroption ::</B>
'comment' <I>quotedname</I>
| 'complete-label' <I>quotedURL</I> | 'full' <I>quotedURL</I>
| 'extension' '(' <I>mand/opt quotedURL data</I>* ')'
<B>mand/opt :: </B>'optional' | 'mandatory'
<B>data :: </B><I>quoted-ISO-date </I>| <I>quotedURL</I>
| <I>number</I> | <I>quotedname</I> | '(' <I>data</I>* ')'
<B>quoted-ISO-date ::</B> '"'YYYY'.'MM'.'DD'T'hh':'mmStz'"'
based on the ISO 8601:1988 date and time standard, restricted
to the specific form described here:
<B>YYYY ::</B> four-digit year
<B>MM ::</B> two-digit month (01=January, etc.)
<B>DD ::</B> two-digit day of month (01 through 31)
<B>hh ::</B> two digits of hour (00 through 23) (am/pm NOT allowed)
<B>mm ::</B> two digits of minute (00 through 59)
<B>S ::</B> sign of time zone offset from UTC ('+' or '-')
<B>tz ::</B> four-digit amount of offset from UTC
(e.g., 1512 means 15 hours and 12 minutes)
For example, "1994.11.05T08:15-0500" is a valid <I>quoted-ISO-date</I>
denoting November 5, 1994, 8:15 am, US Eastern Standard Time.
<B>Note:</B> The ISO standard allows considerably greater
flexibility than that described here. PICS requires <I>precisely</I>
the syntax described here -- neither the time nor the time zone may
be omitted, none of the alternate formats are permitted, and
the punctuation must be as specified here.
<B>rating ::</B> <I>transmit-name</I> <I>number</I> | <I>transmit-name </I>'(' <I>multi-value</I>*<I> </I>')'
<B>multi-value :: </B><I>number </I>| <I>number </I>':' <I>number</I>
<B>transmit-name ::</B> <I>transmit-name-char</I>+ ['/' <I>transmit-name</I>]
<B>number ::</B> [<I>sign</I>]<I>unsignedint</I>['.' [<I>unsignedint</I>]]
<B>sign ::</B> '+' | '-'
<B>unsignedint :: </B>[0-9]+
<B>quotedname ::</B> '"' <I>urlchar-or-space</I>+ '"'
<B>alphanumpm ::</B> 'A' | ... | 'Z' | 'a' | ... | 'z' | '0' | ... | '9' | <I>sign</I>
<B>transmit-name-char ::</B> <I>alphanumpm</I> | '.' | '$' | ',' | ';' | ':'
| '&' | '=' | '?' | '!' | '*' | '~' | '@'
| '#' | '_' | '%' <I>hex hex</I>
<I>Note</I>: Use the "%" escape technique (% followed by the two
hex digits that represent the character in the ASCII character
set) to insert single or double quotation marks or parentheses.
<B>urlchar ::</B> <I>transmit-name-char</I> | '(' | ')'
<B>hex ::</B> '0' | ... | '9' | 'A' | ... | 'F' | 'a' | ... | 'f'
<B>urlchar-or-space ::</B> <I>urlchar</I> | ' '
<B>base64-string</B> <B>:: </B>as defined in <A HREF="ftp://ds.internic.net/rfc/rfc1521.txt">RFC-1521</A>.
<B>service-error :: </B>'error' '(' 'request-denied' <I>explanation</I>* ')'
<I> </I>| 'error' 'service-unavailable'
<B>label-error</B> :: 'error' '(' 'request-denied' [<I>quotedURL</I> <I>explanation</I>*] ')'
<I> </I>| 'error' '(' 'not-labeled' <I>quotedURL</I>* ')'
<B>explanation :: </B><I>quotedname</I>
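Because the <I>quoted-ISO-date</I> form is so restricted, a strict validator is short. The Python sketch below is non-normative; the names <TT>parse_quoted_iso_date</TT> and <TT>is_expired</TT> are hypothetical, and treating a label as expired once the current time is strictly later than its <B>until</B> (or <B>exp</B>) date is an assumed reading of "the date on which this rating expires." Range checks on month, day, hour, and minute are delegated to Python's <TT>datetime</TT> constructor.

```python
import re
from datetime import datetime, timedelta, timezone

# '"' YYYY '.' MM '.' DD 'T' hh ':' mm S tz '"' with ASCII digits only.
QUOTED_ISO_DATE = re.compile(
    r'"([0-9]{4})\.([0-9]{2})\.([0-9]{2})T([0-9]{2}):([0-9]{2})([+-])([0-9]{2})([0-9]{2})"')


def parse_quoted_iso_date(text: str) -> datetime:
    """Convert a PICS quoted-ISO-date into a timezone-aware datetime."""
    m = QUOTED_ISO_DATE.fullmatch(text)
    if m is None:
        raise ValueError("not a PICS quoted-ISO-date: %r" % text)
    year, month, day, hour, minute, sign, tz_hours, tz_minutes = m.groups()
    offset = timedelta(hours=int(tz_hours), minutes=int(tz_minutes))
    if sign == "-":
        offset = -offset
    # datetime() itself rejects out-of-range fields such as hour 24 or month 13.
    return datetime(int(year), int(month), int(day), int(hour), int(minute),
                    tzinfo=timezone(offset))


def is_expired(until: str, now: datetime) -> bool:
    """True once the (timezone-aware) time 'now' is later than the until/exp date."""
    return now > parse_quoted_iso_date(until)


rated = parse_quoted_iso_date('"1994.11.05T08:15-0500"')
print(rated.astimezone(timezone.utc).isoformat())   # 1994-11-05T13:15:00+00:00
```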
<A NAME="Semantics">Semantics of PICS Labels and Label Lists</A>
A <I>labellist</I> is used to transmit a set of PICS labels. The format specified
here is intended to be registered with IANA as the MIME type
"application/pics-labels." It allows for transmission of both labels and
reasons why labels are not available, and is the format used when labels
must be conveyed in a document, along with a document, or from a PICS label
bureau. The <I>labellist</I> will always be surrounded by parentheses and
begin with the PICS version number (1.1 in this specification).
A label list either specifies that there are no labels available at all (e.g.,
"error (no-ratings ...)") or is separated into sections of labels, one section
for each rating service. The URL of each service must be specified (the
<I>serviceID</I>). This is either followed by an error message indicating
why no labels are available from that service (<I>service-error</I>) or an
overall set of optional information (<I>option</I>*) followed by the keyword
"labels" (or "l") and the <I>label</I>s from the service. The optional
information provided here applies to every label from the service, unless
overridden in the specific label itself.
A <I>label</I> encompasses three separate cases. The first is an error that
applies to retrieving the label for a particular URL (<I>label-error</I>).
The second, and most common, is a <I>single-label</I> consisting of options
(which override those specified with the service), the marker word "ratings"
(or "r") and the ratings themselves (a list of category names and values).
Finally, in the special case where the ratings for an entire tree of documents
have been requested, any number of <I>single-label</I>s can be transmitted,
enclosed in parentheses. This case is described in more detail in the section
on "Requesting Labels Separately."
A label may apply to a specific URL, or it may be generic. A generic label
implicitly rates every URL for which the specified one is a prefix. For example,
a generic label for the URL "http://w3.org" implicitly rates every document
available at that site. A specific (non-generic) label for the same URL,
"http://w3.org", does not give any implicit ratings: it merely rates the
organization's home page that is fetched by the command "<CODE>GET /</CODE>"
sent by HTTP to the host <CODE>w3.org</CODE>. A generic label <I>must</I>
include the "<B>for</B>" option specifying the URL to which it applies.
As mentioned above, a generic label should be supplied only if it can be
legitimately applied to <EM>all</EM> documents with URLs that begin with
the string specified in the label's <B>for</B> option.
When a <I>multi-value</I> is provided, any combination of numbers and ranges
of numbers may be specified, with the endpoints of a range separated by a
":". Thus, in the labellist
(PICS-1.1 "http://www.gcf.org/v2.5" l
  r (suds 0.5 density 0 color/hue 1 subject (0.5:1.5 2)))
all subject values between 0.5 and 1.5 (including both endpoints) apply to
the item, as does the subject value 2. Given the example service description
in <A href="http://w3.org/PICS/services.html">Rating Services and Rating
Systems</A>, two document subjects apply, "water" (subject value 1) and
"soapdish" (subject value 2). The third, "soap," has subject value 0, so
it does not apply.
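The range test in this example can be expressed as a short, non-normative Python sketch; the function names are hypothetical. It uses Python floats for convenience, although implementors may prefer the string representation mentioned under Detailed Syntax, and it normalizes a range whose endpoints are written high-to-low, which is an assumption the text does not address.

```python
from typing import List, Tuple


def parse_multi_value(items: List[str]) -> List[Tuple[float, float]]:
    """Turn multi-value items such as ["0.5:1.5", "2"] into closed intervals."""
    intervals = []
    for item in items:
        first, sep, second = item.partition(":")
        a = float(first)
        b = float(second) if sep else a      # a single number is a one-point interval
        intervals.append((min(a, b), max(a, b)))
    return intervals


def value_applies(value: float, intervals: List[Tuple[float, float]]) -> bool:
    """True if value equals a listed number or lies in a range, endpoints included."""
    return any(low <= value <= high for low, high in intervals)


subject = parse_multi_value(["0.5:1.5", "2"])
print([value_applies(v, subject) for v in (0, 1, 2)])   # [False, True, True]
```

The three results correspond to "soap" (0), "water" (1), and "soapdish" (2) in the example above.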
Many protocols, such as Internet electronic mail, the HyperText Transfer
Protocol, and USENET News, use US-ASCII headers as described in RFC-822.
For use in such protocols, we define a new header, PICS-Label, used to contain
the labels described in this document. The syntax is:
PICS-Label: <labellist>
where <I>labellist</I> is described according to the syntax above. Continuation
lines beginning with whitespace may be used following the specification given
in RFC-822.
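A receiver therefore has to unfold continuation lines before handing the value to a label parser. The following non-normative Python sketch (the function name is hypothetical) does this for a block of header lines, assuming the input holds only the header section, without the blank line and body that follow it. RFC-822 unfolding removes the line break and keeps the leading white space, which is harmless here because PICS treats runs of white space as a single space.

```python
import re
from typing import List


def pics_label_values(raw_headers: str) -> List[str]:
    """Return every PICS-Label header value, with continuation lines unfolded."""
    # A line break followed by a space or tab continues the previous line:
    # delete the line break and keep the white space.
    unfolded = re.sub(r"\r?\n(?=[ \t])", "", raw_headers)
    values = []
    for line in unfolded.splitlines():
        name, colon, value = line.partition(":")
        if colon and name.strip().lower() == "pics-label":   # field names ignore case
            values.append(value.strip())
    return values


raw = ('Content-type: text/html\r\n'
       'PICS-Label:\r\n'
       ' (PICS-1.1 "http://www.gcf.org/v2.5" labels\r\n'
       '  ratings (suds 0.5 density 0 color/hue 1))\r\n')
print(pics_label_values(raw))
```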
<A NAME="Embedding">Embedding Labels in HyperText Markup Language (HTML)</A>
Labels may be embedded in HTML files as meta-information, using the META
element defined in the HTML specification. This embedding uses the HTTP header
equivalence mechanism:
<META http-equiv="PICS-Label" content='<I>labellist</I>'>
Note that the content attribute uses single quotes, because the PICS label
syntax uses double quotes. Any of the following characters appearing within
the content must be escaped using SGML entities:
' &#39; /* single quote */
& &amp; /* ampersand */
> &gt; /* greater than */
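The escaping rule can be sketched in Python as follows; this is non-normative and the function names are hypothetical. The order of replacement matters: the ampersand is escaped first, otherwise the ampersands introduced by the other two entities would themselves be escaped a second time. The sketch escapes only the three characters listed above and assumes the label list is already syntactically valid.

```python
def escape_meta_content(labellist: str) -> str:
    """Escape ', & and > for use inside a single-quoted content attribute."""
    # Replace "&" first; otherwise the "&" of the entities added below would be
    # escaped again.
    return (labellist.replace("&", "&amp;")
                     .replace("'", "&#39;")
                     .replace(">", "&gt;"))


def pics_meta_element(labellist: str) -> str:
    """Build the META element, quoting the content attribute with single quotes."""
    return "<META http-equiv=\"PICS-Label\" content='%s'>" % escape_meta_content(labellist)


print(pics_meta_element('(PICS-1.1 "http://www.gcf.org/v2.5" l r (suds 0.5))'))
```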
See the <A HREF="http://ds.internic.net/rfc/rfc1866.txt">HTML 2.0 Proposed
Standard</A> for details.
A label that is embedded in a document may omit the "for" option, which would
normally specify a URL to which the label applies. A specific (non-generic)
label embedded in a document applies to that document, regardless of what
URL is used to locate the document. A generic label, when embedded in a document
that can be retrieved via a "home" URL (i.e., a URL path ending in /), applies
to all URLs that include the home URL as a prefix.
For example, if a client is interested in a label for the document
"http://www.greatdocs.com/foo/bar/bat.htm", it can first check whether the
document has a specific label embedded in it. If not, the client can ask
for the document "http://www.greatdocs.com/foo/bar/". The server sends back
the home document for foo/bar, which may be foo/bar/index.html,
foo/bar/home.html, or something else, depending on the server. If that document
contains an embedded generic label, then the client may interpret it as applying
to the document bat.htm. If the client does not find a generic label there,
it may check further up the hierarchy, in "http://www.greatdocs.com/foo/"
or even at "http://www.greatdocs.com/".
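The walk up the hierarchy in this example can be sketched in Python with the standard <TT>urllib.parse</TT> module. The sketch is non-normative and the function name is hypothetical; it only generates the candidate home URLs, nearest first, dropping any query or fragment. Fetching each candidate, checking it for an embedded generic label, and deciding how far up to go (compare the two-level restriction some clients impose) are left to the client.

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit


def home_url_candidates(url: str) -> List[str]:
    """Home ("/"-terminated) URLs to try for an embedded generic label, nearest first.

    If url itself ends in "/", it is its own first candidate."""
    parts = urlsplit(url)
    segments = (parts.path or "/").split("/")
    candidates = []
    # Drop the last path segment, then one more segment on each step up.
    for i in range(len(segments) - 1, 0, -1):
        path = "/".join(segments[:i]) + "/"
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates


for candidate in home_url_candidates("http://www.greatdocs.com/foo/bar/bat.htm"):
    print(candidate)
```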
Web site operators who wish to provide specific labels for their HTML documents
are encouraged to embed them in the documents. Those who wish to provide
generic labels for their sites or subparts of their sites are encouraged
to include them in the home documents at as many levels of the document naming
hierarchy as they think are appropriate. They are also encouraged to use
the more elegant and functional method, described in the next section, of
sending labels in the HTTP header stream, whenever tools are available for
doing so.
<A NAME="Using">Using HTTP to Request Labels With A Document</A>
We specify a simple extension to HTTP that allows a client to request that
one or more labels be included in a header along with the document. We deal
here only with the HTTP protocol; we hope that other protocols will be similarly
extended. HTTP servers should include PICS label headers only if requested
to do so by the client, and should only include the labels from services
requested by the client. As with labels embedded in documents, the client
may assume that a label returned in the HTTP header stream applies to the
document requested, regardless of the URL specified in the "for" option of
the label.
<B>Client sends to HTTP server www.greatdocs.com, a PICS-enabled server:</B>
GET /foo.html HTTP/1.0
Protocol-Request: {PICS-1.1 {params full
    {services "http://www.gcf.org/v2.5"}}}
<B>Server responds to client:</B>
HTTP/1.0 200 OK
Date: Fri, 30 Jun 1995 17:51:47 GMT
Last-modified: Thursday, 29-Jun-95 17:51:47 GMT
Protocol: {PICS-1.1 {headers PICS-Label}}
PICS-Label:
 (PICS-1.1 "http://www.gcf.org/v2.5" labels
   on "1994.11.05T08:15-0500"
   exp "1995.12.31T23:59-0000"
   for "http://www.greatdocs.com/foo.html"
   by "George Sanderson, Jr."
   ratings (suds 0.5 density 0 color/hue 1))
Content-type: text/html

...contents of foo.html...
Explanation of example
The client requests the document foo.html. In addition, the client requests
the full label of the document from the rating service "http://www.gcf.org/v2.5".
The server responds by sending back the label, in the PICS-Label header,
as well as the document. The format of the PICS-Label header field (a
<I>labellist</I>) allows the server to respond either with a label or an
explanation of why the label is not available, since it would be inappropriate
for the server to generate an HTTP error status if the document is available
but (some of) the labels are not.
Following the usual HTTP distinction between HEAD and GET, a client that
wishes to examine a rating before retrieving the full document can substitute
the word HEAD for GET in the request. The server responds with exactly the
headers shown above, but does not send back the document foo.html.
Detailed Syntax of HTTP Requests for Labels With Document
The following grammar, in modified BNF, describes the syntax of the additional
header line to be included in an HTTP request for a document and associated
labels:
<B>request-header</B> ::
'Protocol-Request: {PICS-1.1 {params ' [<I>completeness</I>]
<B>completeness ::</B> 'minimal' | 'short' | 'full' | 'signed'
<B>extension ::</B> '{' <I>token-or-quoted-string</I>+ '}'
where the first <I>token-or-quoted-string</I> is not '<B>services</B>'.
<B>token-or-quoted-string ::</B> <I>token</I> | <I>quotedname</I>
<B>token ::</B> <I>alphanumpm</I>+
<B>services</B> :: '{' 'services' <I>quotedURL</I>+ '}'
A request for a <B>minimal</B> label asks that all options be omitted, unless
a generic label is returned, in which case the <B>generic</B> and <B>for</B>
options must also be included in the label. A <B>short</B> label includes
everything that is included in a <B>minimal</B> label, plus additional options
that the server deems appropriate. A request for a <B>full</B> label asks
that as much information as possible should be sent back in the label, either
directly or through the use of a <B>complete-label</B> (or <B>full</B>) option,
but no <B>signature-RSA-MD5</B> option is needed.
A request for <B>signed</B> labels asks that all the information in a
<B>full</B> label should be sent, along with a digital signature on the label
itself. In a signed label the information must be transmitted directly as
part of the label (and included in the computation of the signature); the
<B>complete-label</B> (or <B>full</B>) option may be sent, but it would be
redundant. Details of signing labels are included in the section
<A href="#MICs">MICs and Digital Signatures</A>.
It is acceptable for a server to ignore the <I>completeness</I> value, delivering
either more or fewer options than requested. If the <I>completeness</I>
is omitted, it should be treated as though <B>minimal</B> had been supplied.
For future extensibility, any alphanumeric string may be used as a
<I>completeness</I> value. Servers which receive a <I>completeness</I>
value that they do not recognize must treat it as though
<B>minimal</B> had been specified.
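On the server side, these defaulting rules reduce to a small lookup. The Python sketch below is non-normative and its names are hypothetical; lower-casing the requested value assumes that the case-insensitivity stated for label-syntax tokens also applies to this request header.

```python
from typing import Optional

KNOWN_COMPLETENESS = ("minimal", "short", "full", "signed")


def effective_completeness(requested: Optional[str]) -> str:
    """Completeness level a server acts on: omitted or unrecognized values mean minimal."""
    if requested is None:
        return "minimal"
    value = requested.lower()        # assumption: tokens are case insensitive here too
    return value if value in KNOWN_COMPLETENESS else "minimal"
```

A server remains free to deliver more or fewer options than the level returned here.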
717
The <I>extension</I>s are for future extensions to the protocol; any extensions
718
which are not understood by the server must be ignored by it. It is recommended
719
that experimental extensions use a URL, which dereferences to a description
720
of the extension, as the initial <I>token-or-quoted-string</I>.
722
Each <I>quotedURL </I>in a <I>service</I> specifies a rating service from
723
which the client is requesting a label for the document. There may be as
724
many repetitions of the <I>quotedURL </I>part of the <I>service</I> as desired,
725
so it is possible to request labels from any number of rating services in
726
a single HTTP request.
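The completeness rules above reduce to a small normalization step on the server, and the services grammar to a simple string construction on the client. The following Python sketch is illustrative only: the function names effective_completeness and services_clause are hypothetical, the exact (case-sensitive) comparison of completeness values is an assumption, and each quotedURL is assumed to be the URL wrapped in double quotes.

```python
KNOWN_COMPLETENESS = ("minimal", "short", "full", "signed")


def effective_completeness(requested=None):
    """Return the completeness a document server should act on.

    An omitted value, or any value the server does not recognize, is
    treated as 'minimal'. Exact string matching is an assumption.
    """
    if requested in KNOWN_COMPLETENESS:
        return requested
    return "minimal"


def services_clause(service_urls):
    """Build a '{services "url" ...}' clause following the grammar above.

    Assumes a quotedURL is simply the URL wrapped in double quotes.
    """
    if not service_urls:
        raise ValueError("at least one rating service URL is required")
    quoted = " ".join('"%s"' % url for url in service_urls)
    return "{services %s}" % quoted
```

For example, services_clause(["http://www.gcf.org/v2.5"]) yields {services "http://www.gcf.org/v2.5"}, and an unrecognized value such as "everything" falls back to minimal.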
Detailed Syntax For HTTP Response Headers For Labels With Document

Two additional headers are specified:
<B>protocol-header ::</B> 'Protocol: {PICS-1.1 {headers PICS-Label}}'
<B>label-header ::</B> 'PICS-Label: ' <I>labellist</I>
<A NAME="Requesting">Requesting Labels Separately</A>

PICS labels can also be retrieved separately from the documents to which
they refer. To request labels in this way, a client contacts a <B>label
bureau</B>. A label bureau is an HTTP server that understands a particular
query syntax, defined below. It can provide labels for documents that reside
on other servers and, indeed, for documents available through protocols
other than HTTP. It is anticipated that there will be "well-known" label
bureaus that dispense (possibly for a fee) labels created by many rating
services.
Rating services are also encouraged to act as label bureaus, providing on-line
access to their own labels. By default, the URL that identifies a rating
service also identifies its label bureau. If a client requests the URL that
identifies a rating service, a human-readable description of the service
is returned, as specified in <A href="http://w3.org/PICS/services.html">Rating
Services and Rating Systems</A>. If, on the other hand, a client requests
the same URL and includes query parameters as defined below, the request should
be interpreted as a request for labels. A rating service, however, is not required
to act as a label bureau, and it may choose a different URL (perhaps even
on a different HTTP server) to act as its label bureau.
(For more complex queries and responses, see <A href="#queries">Appendix
B</A>.)

Imagine a rating service, identified by the URL http://www.labels.org/Ratings,
that decides to run a label bureau to dispense (at least) its own labels
for documents. The following sample request, made to the HTTP server
www.labels.org, is illustrative (line breaks are inserted for presentation
only):
GET /Ratings?opt=generic&
u="http%3A%2F%2Fwww.questionable.org%2Fimages"&
s="http%3A%2F%2Fwww.gcf.org%2Fv2.5"
The query asks the label bureau http://www.labels.org/Ratings to send a single
label that applies to everything in the images hierarchy at site
www.questionable.org. The desired label should have been created by the service
http://www.gcf.org/v2.5. Notice the use of %3A to represent ":" and %2F to
represent "/". This encoding is required for such characters within a URL; see
<A HREF="ftp://ds.internic.net/rfc/rfc1738.txt">RFC-1738</A>.

The label bureau responds by sending back a document of type
"application/pics-labels". The labels should be as complete as possible,
either by including as many options as possible or by supplying the
<B>complete-label</B> (or <B>full</B>) option.
Detailed Syntax of HTTP Query for Labels Separate From Documents

The following grammar, in modified BNF, describes the syntax of GET and POST
requests to a label bureau. The POST request is specified only
for backward compatibility with HTTP servers that cannot handle a long GET
query. Its use, while described in the
<A href="ftp://ds.internic.net/rfc/rfc1866.txt">HTML 2.0</A> specification
(for submitting forms; see sections 8.2.1 and 8.2.3), is deprecated.

<B>request ::</B> <I>get</I> | <I>post</I>
<B>get ::</B> 'get' <I>url-fragment</I> '?' [<I>opt</I>] [<I>format</I>]
<I>extension</I>* <I>url</I>+ <I>service</I>+
<B>post ::</B> 'post' <I>url-fragment crlf crlf formencodeddata</I>
<B>url-fragment ::</B> the part of the original URL after the host
name, as specified in HTTP 1.0.
<B>crlf ::</B> carriage return (hex D) followed by line feed (hex A)
<B>opt ::</B> 'opt=' <I>option</I>
<B>option ::</B> 'generic' | 'normal' | 'tree' | 'generic+tree'
<B>format ::</B> [and] 'format=' <I>form</I>
<B>form ::</B> 'minimal' | 'short' | 'full' | 'signed'
<B>extension ::</B> <I>token</I> '=' <I>token-or-quoted-string</I>
where the <I>token</I> is not one of <B>opt</B>, <B>format</B>,
<B>u</B>, or <B>s</B>; and <I>token-or-quoted-string</I> follows
the quoting conventions specified in <A HREF="ftp://ds.internic.net/rfc/rfc1738.txt">RFC-1738</A>.
<B>token-or-quoted-string ::</B> <I>token</I> | <I>quotedname</I>
<B>token ::</B> <I>alphanumpm</I>+
<B>url ::</B> [and] 'u=' encodedURL
<B>service ::</B> [and] 's=' encodedURL
<B>boolean ::</B> 't' | 'f' | 'true' | 'false'
<B>and ::</B> '&'; this must be included unless it immediately
follows the '?' in the query.
<B>encodedURL ::</B> a quoted URL. Following <A HREF="ftp://ds.internic.net/rfc/rfc1738.txt">RFC-1738</A>, quotation marks and some
special characters inside the URL are encoded using "%xx" notation.
Alphabetic characters, digits, and the special characters
$_-.+!*'(), need not be encoded, but other characters must be.
This <I>does</I> imply that the colon (:) must be encoded as %3A
and the slash (/) as %2F.
<B>formencodeddata ::</B> the query as specified for <I>get</I>, but encoded into
MIME type application/x-www-form-urlencoded as described in
sections 8.2.1 and 8.2.3 of <A href="ftp://ds.internic.net/rfc/rfc1866.txt">HTML 2.0</A>.
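As an illustration of the <I>get</I> and <I>encodedURL</I> rules, the following Python sketch builds the request target of a query like the sample request shown earlier. It is a sketch under stated assumptions: the helper names encode_url and build_bureau_query are hypothetical, each encoded URL is wrapped in double quotes only because the sample request does so, and the standard urllib.parse.quote function is used with the RFC-1738 unencoded specials marked safe.

```python
from urllib.parse import quote

# Special characters RFC-1738 allows unencoded, per the grammar above.
# quote() additionally always leaves letters, digits and "_.-~" alone.
_UNENCODED_SPECIALS = "$-_.+!*'(),"


def encode_url(url):
    """Percent-encode a URL for use as a u= or s= value."""
    encoded = quote(url, safe=_UNENCODED_SPECIALS)
    # RFC-1738 lists "~" as unsafe, but quote() never encodes it.
    return encoded.replace("~", "%7E")


def build_bureau_query(path, urls, services, opt=None, form=None):
    """Return '<path>?...' for a label-bureau GET request (hypothetical helper)."""
    if not urls or not services:
        raise ValueError("at least one u= and one s= value is required")
    params = []
    if opt is not None:
        params.append("opt=" + opt)
    if form is not None:
        params.append("format=" + form)
    # Following the sample request, each encoded URL is wrapped in quotes.
    params += ['u="%s"' % encode_url(u) for u in urls]
    params += ['s="%s"' % encode_url(s) for s in services]
    return path + "?" + "&".join(params)
```

Calling build_bureau_query("/Ratings", ["http://www.questionable.org/images"], ["http://www.gcf.org/v2.5"], opt="generic") reproduces the sample request without its presentation line breaks.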
Response to Query for Labels Separate From Documents

The label bureau responds by sending back a document of type
"application/pics-labels".

Unless the document indicates an overall error, there should be one
<I>service-info</I> for each rating service requested in the query. Each
<I>service-info</I> should have an error message or a label (or list of labels,
in the case of a "tree" query) for each requested URL.

The query's ordering must be preserved in the response. That is, the information
from the rating services must be presented in the same order in which the rating
services appear in the query, and the labels from each service must be presented in
the same order in which the URLs appear in the query. If a rating service or label
is not provided, the error message should appear in the same position that
the <I>service-info</I> or label would have occupied. Because order is preserved,
it is acceptable (except where indicated below) to omit from the labels the
"<B>for</B>" option, which indicates the URL being rated. The client should
match each label positionally with the URL for which it requested a rating.
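The ordering rule can be sketched as a nested loop on the server (services outer, URLs inner) and a positional zip on the client. This Python sketch is illustrative: assemble_response, match_labels, and the lookup callback are hypothetical names, and labels and error messages are modeled as plain values rather than the real application/pics-labels syntax.

```python
def assemble_response(services, urls, lookup):
    """Server side: one entry per (service, URL) pair, in query order.

    lookup(service, url) is a hypothetical callback returning a label or an
    error marker (for example "no-label") for that position.
    """
    return [(service, [lookup(service, url) for url in urls])
            for service in services]


def match_labels(urls, service_results):
    """Client side: pair each returned entry with the URL in the same position."""
    matched = {}
    for service, entries in service_results:
        if len(entries) != len(urls):
            raise ValueError("response for %s lacks one entry per URL" % service)
        matched[service] = dict(zip(urls, entries))
    return matched
```

Because the client pairs entries by position, the <B>for</B> option is not needed to tell which label belongs to which URL; with u URLs and s services the response holds u x s entries.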
<B>Definitions.</B> Given a URL (e.g., "http://www.greatdocs.com/foo/"),
a <B>descendant</B> URL is any URL that contains the original as a prefix
(e.g., "http://www.greatdocs.com/foo/bar/bat.htm"). A <B>child</B> URL is
any descendant URL that does not contain any additional '/' characters (e.g.,
"http://www.greatdocs.com/foo/ba"). An <B>ancestor</B> URL is any URL that
is a prefix of the original (e.g., "http://www.greatdocs.com/f"). Note that
ancestry and descent are determined strictly by case-sensitive string
matching on URLs, not by any links that may appear in HTML documents retrieved
using those URLs. Note also that any encoding such as %3A for colon (:) or %2F
for slash (/) is decoded before URL strings are compared.
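These definitions are pure string operations, so they translate directly into predicates. The Python sketch below is a minimal illustration, not normative: the function names are hypothetical, and treating a URL as neither its own descendant nor its own ancestor (a proper prefix) is an assumption the definitions above do not settle.

```python
from urllib.parse import unquote


def _decoded(url):
    # %3A, %2F, etc. are decoded first; matching itself stays case-sensitive.
    return unquote(url)


def is_descendant(url, original):
    """url contains original as a (proper) string prefix."""
    u, o = _decoded(url), _decoded(original)
    return u != o and u.startswith(o)


def is_child(url, original):
    """A descendant whose remainder after the prefix contains no '/'."""
    if not is_descendant(url, original):
        return False
    remainder = _decoded(url)[len(_decoded(original)):]
    return "/" not in remainder


def is_ancestor(url, original):
    """url is a (proper) string prefix of original."""
    return is_descendant(original, url)
```

Using the examples above with original "http://www.greatdocs.com/foo/": ".../foo/bar/bat.htm" is a descendant but not a child, ".../foo/ba" is a child, and "http://www.greatdocs.com/f" is an ancestor.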
<B>opt=normal</B>, or omitting the <I>opt</I> parameter completely, requests specific
labels for the URLs specified. If no specific label is available for a requested
URL, the server may choose to send a generic label for the requested URL
or for an ancestor URL. For example, in response to a label request for URL
"http://w3.org/PICS/Overview.html", a generic label for the URL
"http://w3.org/PICS" (or even "http://w3.org") may be returned. In this case,
the "<B>for</B>" and "<B>generic</B>" options must be included
in the label, to specify exactly what rating is being returned. Note that
the "<B>for</B>" option may specify a URL string that does not appear to match
the request URL, perhaps because the server knows of
an alternative URL for the same document. In that case, the server is suggesting
that the label applies to the request URL, though a suspicious client may
choose not to believe the suggestion.

<B>opt=generic</B> requests generic labels. It is useful for requesting a
rating of a site or subpart of a site. For each requested URL, the desired
response is a generic label that applies to the requested URL and all descendant
URLs. A generic label for the requested URL, or a generic label for any ancestor
URL, would satisfy this request, since such a generic label applies to all
URLs containing the requested URL as a prefix. If no such generic label is
available, the server should include the "no-label" message rather than sending
back a specific label.
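One way a label bureau might choose a label for a single URL under <B>opt=normal</B> or <B>opt=generic</B> is sketched below in Python. The in-memory store, the lookup name, and the policy of preferring the longest (most specific) matching generic label are all assumptions for illustration; the text above allows any covering generic label, and under <B>opt=normal</B> the fallback is optional.

```python
from urllib.parse import unquote


def _covers(label_url, requested):
    """True if label_url is the requested URL or an ancestor (string prefix) of it."""
    return unquote(requested).startswith(unquote(label_url))


def lookup(store, requested, opt="normal"):
    """Pick a label for one requested URL under opt=normal or opt=generic.

    store is a hypothetical dict: URL -> {"generic": bool, ...}.
    Returns (for_url, label), or None in place of the "no-label" message.
    """
    if opt == "normal":
        specific = store.get(requested)
        if specific is not None and not specific.get("generic"):
            return requested, specific
    elif opt != "generic":
        raise ValueError("unsupported opt: %s" % opt)
    # Generic labels for the URL itself or any ancestor qualify. This sketch
    # prefers the longest prefix; a returned for_url that differs from the
    # request means the label must carry the 'for' and 'generic' options.
    candidates = [u for u, label in store.items()
                  if label.get("generic") and _covers(u, requested)]
    if not candidates:
        return None
    best = max(candidates, key=lambda u: len(unquote(u)))
    return best, store[best]
```

With generic labels for "http://w3.org" and "http://w3.org/PICS" and a specific label for "http://w3.org/PICS/Overview.html", a normal query for that page returns its specific label, while a generic query returns the label for "http://w3.org/PICS".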
<B>opt=tree</B> requests a tree of labels. This is a way to request all the
labels that apply to items in a site or subpart of a site. For each requested
URL, the desired response is a set of labels (both specific and generic)
that apply to descendants of the requested URL. Everywhere a <I>label</I>
would normally be expected in the response, a set of
<I>simple-label</I>s will be returned, surrounded by parentheses. This enables
the client to match the entire set positionally with the single request URL.
All labels produced in response to this query must include a <B>for</B> option.
The minimum response expected is the set of labels that would have been generated
if a query had been issued, with <B>opt=normal</B> specified, for each known
child of the requested URL. Additional labels may also be returned, typically
either generic labels for ancestor URLs or labels for descendant URLs farther
down the hierarchy than children.

<B>opt=generic+tree</B> is similar to the <B>opt=tree</B> request, but returns
only generic labels. As with <B>opt=tree</B>, the server can choose the amount
of detail. The minimum response expected is the set of labels that would
have been generated if a query had been issued, with <B>opt=generic</B>
specified, for each known child of the requested URL. Additional labels may
also be returned, typically generic labels for ancestor URLs. All labels
produced in response to this query must include a <B>for</B> option.
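The minimum response for a tree query can be expressed in terms of the single-URL query, applied to each known child. The Python sketch below assumes a hypothetical lookup_one(url, opt) callback (for example, a function like a normal/generic lookup) that returns (for_url, label) or None; removing duplicate labels, which arise when several children resolve to the same ancestor's generic label, is a design choice of this sketch.

```python
from urllib.parse import unquote


def known_children(known_urls, requested):
    """Known URLs that are children of requested: proper string-prefix
    descendants whose remainder after the prefix contains no '/'."""
    base = unquote(requested)
    children = []
    for url in known_urls:
        u = unquote(url)
        if u != base and u.startswith(base) and "/" not in u[len(base):]:
            children.append(url)
    return sorted(children)


def tree_response(known_urls, requested, lookup_one, generic_only=False):
    """Minimum label set for opt=tree (or opt=generic+tree when generic_only).

    Each entry keeps its 'for' URL, since tree responses must include it.
    The caller would send the whole list as one parenthesized set.
    """
    opt = "generic" if generic_only else "normal"
    labels = []
    for child in known_children(known_urls, requested):
        found = lookup_one(child, opt)
        if found is not None and found not in labels:
            labels.append(found)
    return labels
```

A server is free to add more labels (ancestor generics, deeper descendants); this sketch returns only the required minimum.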
It is permitted to include more than one URL and/or service in the request.
Requesting <B>u</B> URLs and <B>s</B> services results in a total of <B>u</B>
x <B>s</B> labels being generated (or label sets in the case of tree and
generic+tree queries).

The <B>format=</B> parameter specifies the optional information that should be
transmitted with the labels. It is treated precisely as the corresponding keywords
would be when sent to a document server as the <I>completeness</I> (see
<A href="#with">Detailed Syntax of HTTP Requests for Labels With Document</A>),
except that the default is <B>full</B> (rather than <B>minimal</B>). Servers
that receive a <B>format</B> value that they do not recognize must
treat it as though the default, <B>full</B>, had been specified. All labels
produced in response to this query must include a <B>for</B> option.
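On the label bureau side, the defaults described above (no <B>opt</B> means normal; a missing or unrecognized <B>format</B> means full) can be applied while splitting the query string. This Python sketch is an assumption-laden illustration: parse_bureau_query is a hypothetical name, rejecting an unknown <B>opt</B> value is this sketch's choice rather than something stated here, and it splits on "&" by hand because urllib.parse.parse_qsl would turn the '+' in "generic+tree" into a space.

```python
from urllib.parse import unquote

KNOWN_OPTS = ("normal", "generic", "tree", "generic+tree")
KNOWN_FORMS = ("minimal", "short", "full", "signed")


def parse_bureau_query(query):
    """Split a label-bureau query string into its parts, applying defaults."""
    result = {"opt": "normal", "format": "full",
              "urls": [], "services": [], "extensions": {}}
    for pair in query.split("&"):
        if not pair:
            continue
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError("malformed parameter: %r" % pair)
        if key == "opt":
            result["opt"] = value
        elif key == "format":
            # Unrecognized formats are treated as the default, 'full'.
            result["format"] = value if value in KNOWN_FORMS else "full"
        elif key in ("u", "s"):
            # Decode %xx and drop the surrounding double quotes.
            decoded = unquote(value).strip('"')
            result["urls" if key == "u" else "services"].append(decoded)
        else:
            # Extensions the server does not understand may simply be ignored.
            result["extensions"][key] = unquote(value)
    if result["opt"] not in KNOWN_OPTS:
        raise ValueError("unknown opt: %r" % result["opt"])
    if not result["urls"] or not result["services"]:
        raise ValueError("query needs at least one u= and one s=")
    return result
```

Applied to the query part of the sample request, this yields opt generic, format full, one URL, and one service.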
<A NAME="MICs">MICs and Digital Signatures</A>

This specification includes two security features, each intended
to prevent a different problem that can arise in a PICS system. They may
be used independently or together. Both features rely on patented cryptographic
technology whose use is subject to a variety of legal restrictions (including
possible U.S. export controls). The PICS technical committee cannot provide
any information about the exact legal status of the code or algorithms.

Within the United States, RSA Laboratories (100 Marine Parkway, Redwood City,
CA, 94065-1031) distributes a source code kit called
<A href="http://www.rsa.com/rsalabs/faq/faq_misc.html">RSAREF</A> that provides
all of the code required to implement the cryptographic components of the
PICS specification. The president of RSA Data Security, Inc., Mr. Jim Bidzos, has
advised us that RSAREF will be made available at no cost for use in implementing
the PICS specifications. Questions about the legal status, etc., should be
directed to Mr. Bidzos.
The first problem arises when a document has been examined and a label generated,
and the document is then modified without updating the label. While this
can happen legitimately (as when Time-Warner updates the page containing
the current issue of Time Magazine and believes that the label is still valid),
it can also happen as a result of tampering with the document by an unauthorized
party. PICS labels contain three option fields intended to help deter this
problem:

<B>At</B>

If the objective is simply to detect accidental changes, then the date of
last modification of the document can be determined when the label is created
and stored in the <B>at</B> field. Assuming that the last modification time
is accurately maintained, this will detect updates to the document made after
the label was created.

<B>Until</B> or <B>exp</B>

If the document is expected to be updated infrequently or periodically, the
label can contain an expiration date chosen so that the label becomes invalid
before the document is next updated. This, too, does not guard against a
concerted malicious attack.
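A client applying the <B>at</B> and <B>until</B>/<B>exp</B> checks might look like the Python sketch below. The date layout it parses (for example "1996.10.31T08:15-0500") is an assumption here, since the date grammar is defined elsewhere in the specification; label_is_current and the dict of option values are hypothetical, and all datetimes are assumed to be timezone-aware.

```python
from datetime import datetime, timezone

# Assumed quoted-ISO-date layout, e.g. "1996.10.31T08:15-0500".
PICS_DATE_FORMAT = "%Y.%m.%dT%H:%M%z"


def parse_pics_date(text):
    """Parse a (possibly double-quoted) date into an aware datetime."""
    return datetime.strptime(text.strip('"'), PICS_DATE_FORMAT)


def label_is_current(label_options, document_last_modified, now=None):
    """Apply the 'at' and 'until'/'exp' checks described above.

    label_options: hypothetical dict of option name -> quoted date string.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    until = label_options.get("until") or label_options.get("exp")
    if until is not None and now > parse_pics_date(until):
        return False  # the label has expired
    at = label_options.get("at")
    if at is not None and document_last_modified > parse_pics_date(at):
        return False  # the document changed after it was rated
    return True
```

As the text notes, neither check resists deliberate tampering; they only catch stale labels.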
<B>MIC-md5</B> or <B>md5</B>

If the label is intended to apply only to the data that was actually rated,
then a form of checksum (called a "message digest") can be applied to the
data when the label is created. The message digest is converted into US-ASCII
characters using <A href="ftp://ds.internic.net/rfc/rfc1521.txt">MIME</A>
base64 encoding and stored in the <B>MIC-md5</B> (also called <B>md5</B>)
field. When the document is later retrieved, the same algorithm can be used
to recompute the message digest, and the two digests can be compared. The
MD5 algorithm is designed so that it is extremely unlikely that the two digests
will be the same if the document has been tampered with in any way. <BR>
This technique is well known in the cryptographic community and has been
adopted by the electronic mail community, where it is part of the
<A href="ftp://ds.internic.net/rfc/rfc1848.txt">MOSS</A> specification. For
use with electronic mail, an elaborate technique is required to ensure that
the two message digests will match, since electronic mail gateways can modify
the data before it is delivered (by wrapping lines, for example). We have
chosen <EM>not</EM> to adopt MOSS directly for PICS, largely because of this
complexity.
Instead, we recommend the direct use of the MD5 algorithm on the source document
and conversion of the result to base64 encoding. The resulting string is
included directly in the <B>mic-md5</B> (<B>md5</B>) label option. The MD5
algorithm and the conversion of the result into US-ASCII characters are provided
by the RSAREF (version 2.0) software. <BR>
Because PICS labels can be embedded inside the documents they label, care
must be taken to ensure that the message digest is computed excluding
<EM>all</EM> PICS labels in the document. For HTML documents, this means
that the digest must be computed after removing all META elements that include
PICS labels (and any whitespace immediately following the end of each of
these META elements).
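The MIC computation described above (remove embedded PICS labels, digest with MD5, encode in base64) can be sketched with Python's standard hashlib and base64 modules. The regular expression that recognizes a PICS-label META element is a rough, hypothetical stand-in for a real HTML parser: it assumes the element contains the text "PICS-Label" and no ">" character inside its attributes.

```python
import base64
import hashlib
import re

# Hypothetical matcher for a META element carrying a PICS label, plus any
# whitespace right after it. It fails if the label text contains ">",
# so a real implementation should use an HTML parser instead.
_PICS_META = re.compile(rb"<meta\b[^>]*pics-label[^>]*>\s*", re.IGNORECASE)


def strip_pics_labels(html_bytes):
    """Remove PICS-label META elements and the whitespace that follows them."""
    return _PICS_META.sub(b"", html_bytes)


def mic_md5(document_bytes, is_html=False):
    """Base64-encoded MD5 digest, as stored in the MIC-md5 (md5) option."""
    if is_html:
        document_bytes = strip_pics_labels(document_bytes)
    digest = hashlib.md5(document_bytes).digest()
    return base64.b64encode(digest).decode("ascii")
```

With this approach, a page's MIC is the same whether or not its embedded labels are present, which is the property the paragraph above requires.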
The second problem is that of tampering with or forging labels. Here the
problem is that the end user needs some way of being reassured that the label
they receive was created by the rating service they expected and that it
has not been altered since it was created. PICS addresses this problem by
allowing labels to be "digitally signed". A digital signature, while not
currently legally recognized, is a cryptographic technique that provides exactly
this assurance. The RSA signature technique works as follows:

In order to sign a label, the rating service (or people authorized to generate
labels on behalf of the service) needs a "public key pair." (The RSAREF software
includes routines to create these pairs.) One of these (the private key)
must be kept secret by the service; the other (the public key) must be
distributed to anyone who is interested in verifying the signatures on the
labels.

After creating a label, the service converts it to a special form specified
below and computes the MD5 message digest of the label. It then uses the
service's private key to encrypt the digest. This encrypted digest is the
digital signature, and it is converted to US-ASCII using the same base64
encoding technique mentioned above. The US-ASCII version (split into 60-character
lines) is stored in the <B>signature-rsa-md5</B> option of the label when
it is transmitted to the client. (The RSAREF software includes routines to
generate the signature and convert it to US-ASCII.)

When the client receives a label and wants to verify the signature, it takes
the label it received and converts it back into the same special form in
which it was originally signed. The client recomputes the message digest
on this special form. It also takes the contents of the
<B>signature-rsa-md5</B> option, combines all of the lines back into a single
string of US-ASCII characters, converts these from base64 into their original
(binary) form, and decrypts them using the service's public key. If the result
does not match the message digest it computed, the signature is invalid.
(RSAREF contains routines to do all of this work except for combining
the lines into a single string.)
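The one step RSAREF leaves to the implementer, splitting the base64 signature into 60-character lines and joining them back, is easy to sketch with Python's base64 module. The function names are hypothetical, the RSA encryption and decryption steps are deliberately not shown, and how the lines are delimited inside the option value is not modeled here.

```python
import base64

LINE_LENGTH = 60  # the US-ASCII signature is split into 60-character lines


def signature_to_lines(signature_bytes):
    """Base64-encode a binary signature and split it into 60-character lines."""
    text = base64.b64encode(signature_bytes).decode("ascii")
    return [text[i:i + LINE_LENGTH] for i in range(0, len(text), LINE_LENGTH)]


def lines_to_signature(lines):
    """Rejoin the lines (dropping any whitespace) and decode to binary."""
    joined = "".join("".join(line.split()) for line in lines)
    return base64.b64decode(joined, validate=True)
```

For a 100-byte signature the base64 text is 136 characters, giving lines of 60, 60, and 16 characters.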
The problem of distributing these keys (and invalidating them in case the
service's key is compromised) is an active area of commercial competition.
Since there is no clearly established solution available today, PICS assumes
that each service will distribute its public keys in some way it chooses.
It also assumes that no keys will ever have to be invalidated. While this
is clearly not a perfect solution, it seems to be the limit of what can be
done today without committing to specific proprietary technology.

There is one additional problem with the digital signature solution outlined
above. If a rating service allows other people to generate labels under its
name (for example, a service that supports self-ratings by content producers),
then the labels may need to be signed by <EM>both</EM> the service and the
content producer. This can be done (each signs the label without the other's
signature), but it becomes quite difficult to distribute the public keys
needed to verify the signatures. The PICS specification does not propose a
solution to this problem (it, too, is part of active commercial competition).

PICS specifically requires the use of the RSA signature algorithm with the
MD5 message digest. Should this system become outdated, the PICS specification
can easily be updated to add a new label option that supports a different
algorithm.

PICS does not specify the key length to be used for the digital signatures.
Individual services will need to investigate the legal and technical
ramifications involved and make a choice. Should a single answer become common,
this specification may be re-issued with this detail filled in.
The special form of the label that is used for signatures is computed as
follows:

The service must decide which options it will include in the signed label
when it is transmitted. Any options not transmitted with the signature cannot
be used in the computation of the signature. We recommend that <EM>all</EM>
options with known values be included, with the exception of
<B>signature-rsa-md5</B>. Any option may be omitted, but it will be common
for the options <B>mic-md5</B> (or <B>md5</B>) and <B>full</B> (or
<B>complete-label</B>) to be omitted. The <B>signature-rsa-md5</B> option
is <EM>never</EM> included in the list of options.

The selected options are sorted alphabetically by their shortest names (e.g.,
use <B>full</B> instead of <B>complete-label</B>). If a selected option has
a default value and it is the same as the value to be used in the label,
the option is omitted from this list.

For each option in the list (in order), the short name is put into the label,
followed by a single space, followed by the value of the option, followed
by a space. The shortest form of a value is used, and strings are output
in lower case if they are case insensitive.

After all of the options have been output, output the characters "r (".

Output the transmission names and their values, in alphabetical order by
transmission name (using the US-ASCII character collating sequence for
"alphabetical order"), separating the transmission name from the value by
a single space. In outputting the value, no whitespace is permitted except
for a single space used to separate items in a <I>multi-value</I>.
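The steps above can be sketched as a Python function that produces the string to be digested. This is not the normative algorithm: signing_form and its arguments are hypothetical; option names are assumed to be passed in their shortest form with values already in shortest canonical text; the defaults table is supplied by the caller; and the single space between rating pairs, the parenthesized multi-value, and the closing ")" are assumptions not spelled out in the steps listed here.

```python
def signing_form(options, ratings, defaults=None):
    """Build the special form of a label that is digested for signing.

    options:  dict of option short name -> canonical value text.
    ratings:  dict of transmission name -> value text, or a list of value
              texts for a multi-value.
    defaults: hypothetical dict of option short name -> default value text.
    """
    defaults = defaults or {}
    parts = []
    for name in sorted(options):  # str ordering is US-ASCII ordering
        if name == "signature-rsa-md5":
            continue  # never part of the signed form
        value = options[name]
        if defaults.get(name) == value:
            continue  # same as the default: omitted
        parts.append("%s %s " % (name, value))
    parts.append("r (")
    pairs = []
    for name in sorted(ratings):
        value = ratings[name]
        if isinstance(value, (list, tuple)):
            # Assumption: a multi-value is written as "(v1 v2 ...)".
            value = "(%s)" % " ".join(value)
        pairs.append("%s %s" % (name, value))
    # Assumption: pairs are separated by one space and the form ends with ")".
    return "".join(parts) + " ".join(pairs) + ")"
```

The service would then compute hashlib.md5(form.encode("ascii")).digest() on the result and sign that digest with its private key; the RSA step itself is not shown.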
When the client computes the special label form described above, it will
use all options available to it: both those in the <EM>single-label</EM>
and those in the <EM>service-info</EM>. This implies a constraint on the server
when it decides which options to include in the transmitted set. The transmitted
set must include any options that the server ships as part of the
<EM>service-info</EM>, unless either the value specified in the
<EM>service-info</EM> or the value of the option for this label is the default
value of the option.
<A NAME="Glossary">Glossary</A>

application/pics-labels

A new MIME data type used to transmit one or more <I>labels</I>, defined
in this document.

application/pics-service

A new MIME data type used to describe a <I>rating service</I>, defined in
<A href="http://w3.org/PICS/services.html">Rating Services and Rating
Systems</A>.

Backus-Naur Form (or Backus Normal Form). A notation for describing a formal
syntax, used extensively in describing programming languages and
computer-readable data formats.

The part of a rating system that describes a particular criterion used for
rating. For example, a rating system might have three categories named "sexual
material," "violence," and "vocabulary." Also called a <I>dimension</I>.

A data structure containing information about a given document's contents.
Also called a <I>rating</I> or <I>content rating</I>. The content label may
accompany the document it is about or be available separately.

See <I>content label</I>.

See <I>category</I>.

Any item that can be referred to by a URL. Also known, in other contexts,
as a "hypertext page" or a "resource."

HyperText Markup Language. A means of representing <I>hypertext</I> documents.
Based on <I>SGML</I>. See the
<A HREF="http://ds.internic.net/rfc/rfc1866.txt">HTML 2.0 Proposed
Standard</A>.

HyperText Transfer Protocol. Used for retrieving document contents and/or
descriptive header information. See the
<A href="http://w3.org/pub/WWW/Protocols/HTTP1.0/draft-ietf-http-spec.html">draft
HTTP specification</A>.

Text, graphics, and other media connected through links.

See <I>content label</I>.

A computer system that supplies, via a computer network, ratings of documents.
It may or may not provide the documents themselves.

<A HREF="ftp://ds.internic.net/rfc/rfc1321.txt">RFC1321</A>, that can be
used to compute a <I>MIC.</I> PICS specifies this particular algorithm for

Message Integrity Check. Also known as a "cryptographic checksum." For PICS,
the importance of a MIC is that a rating service can compute the MIC of a
piece of information when the label is created, and that MIC can be put into
the label itself. A client can retrieve the label and the information to
which it is supposed to be attached, recompute the MIC, and compare it to
the one in the label. If they match, this is, for all practical purposes,
proof that the label really belongs to the information that has been retrieved.
The particular algorithm specified by PICS to compute the MIC is <I>MD5.</I>
Multipurpose Internet Mail Extensions. A technique for sending arbitrary
data through electronic mail on the Internet. See
<A href="ftp://ds.internic.net/rfc/rfc1521.txt">RFC-1521</A>.

Platform for Internet Content Selection, the name for both the suite of
specification documents of which this is a part and the organization
writing the documents. For more information, see http://w3.org/PICS.

See <I>content label</I>.

See <I>label bureau</I>.

An individual or organization that assigns labels according to some rating
system, and then distributes them, perhaps via a label bureau or via CD-ROM.

A method for rating information. A rating system consists of one or more
categories.

The range of permissible values for a category.

Standard Generalized Markup Language. See
<A href="http://www.iso.ch/cate/d16387.html">ISO 8879</A>.

(of a <I>category</I>) The short name intended for use over a network to
refer to the category. This is distinct from the category name inasmuch
as the transmission name must be language-independent, encoded in US-ASCII,
and as short as reasonably possible. Within a single <I>rating system</I>
the transmission names of all categories must be distinct. URLs, while generally
longer than desired, can be used as transmission names. Hence transmission
names are case sensitive.

Uniform Resource Locator. Described in
<A HREF="ftp://ds.internic.net/rfc/rfc1738.txt">RFC-1738</A>. A URL describes
the location and means of retrieval for a single document. It consists of
three components: the "scheme" (the protocol used to retrieve a document, such as
"http" or "ftp"), a host name, and a hierarchical document name within that
host. For example, "http://w3.org/PICS" is the URL of the PICS home page.
The scheme for retrieving it is "http," the host is "w3.org," and the name
within that host is "PICS". Note that PICS defines an additional scheme
beyond those listed in RFC-1738, described in
<A href="http://w3.org/PICS/services.html">Rating Services and Rating
Systems</A>, which allows chat (IRC) rooms to be named.
PICS, <A href="http://w3.org/PICS/services.html">Rating Services and Rating
Systems</A>, Internet Draft, "draft-pics-services-00.txt", 11/21/1995.

R. Rivest, "The MD5 Message-Digest Algorithm",
<A href="ftp://ds.internic.net/rfc/rfc1321.txt">RFC 1321</A>, 04/16/1992.

N. Borenstein, N. Freed, "MIME (Multipurpose Internet Mail Extensions) Part
One: Mechanisms for Specifying and Describing the Format of Internet Message
Bodies", <A href="ftp://ds.internic.net/rfc/rfc1521.txt">RFC 1521</A>.

T. Berners-Lee, D. Connolly, "Hypertext Markup Language - 2.0",
<A href="ftp://ds.internic.net/rfc/rfc1866.txt">RFC 1866</A>, 11/03/1995.

T. Berners-Lee, L. Masinter, M. McCahill, "Uniform Resource Locators (URLs)",
<A href="ftp://ds.internic.net/rfc/rfc1738.txt">RFC 1738</A>, 12/20/1994.
<A NAME="Acknowledgments">Acknowledgments</A>

Comments and suggestions from the following people are gratefully acknowledged:

Bob Atkinson, Microsoft
Anselm Baird-Smith, W3C
Brenda Baker, Lucent
Scott Berkun, Microsoft
Tim Berners-Lee, W3C
Roxana Bradescu, AT&T
Daniel W. Connolly, W3C
Jay Friedland, SurfWatch
Henrik Frystyk Nielsen, W3C
Philip Gladstone, Raptor Systems
Michael Gordon, Prodigy
Woodson Hobbs, NewView
John C. Klensin, MCI
Ann McCurdy, Microsoft
Rich Petke, CompuServe
Eric Prud'hommeaux, W3C
Gordon Ross, NetNanny
Ray Soular, SafeSurf
Michael Smith, Prodigy
Marcy Swenson, Providence Systems
1327
<A NAME="Appendix A">Appendix A: An Algorithm for Locating a Label Bureau</A>
1331
As the use of PICS grows, we must consider its impact on overall network
1332
performance. In general, the PICS techniques for transmitting labels in or
1333
with documents add only a very small amount of traffic to the net, since
1334
the additional PICS headers will ordinarily contain only a few hundred bytes
1335
of data and the documents themselves are more likely to be several thousand
1336
bytes of data. Furthermore, since the labels come from the same source as
1337
the document itself there is no network hot spot created by PICS (although
1338
popular servers may themselves already be such hot spots).
1340
Label bureaus, however, are a new component proposed by PICS. And if a single
1341
label bureau becomes popular then there is a significant risk of it becoming
1342
a hot spot and hence a performance bottleneck for the PICS system. The Internet
1343
is in need of a good solution to this problem, and there is work (both underway
1344
and proposed) that may solve the problem in the long term.
1346
In the short term, however, there is no truly good solution. The following
1347
suggestion comes from Prof. David Karger at MIT. It is a variant on several
1348
well-known algorithms for distributing load in a system.
1350
First, we assume that popular label bureaus will be able to establish a number
1351
of mirror sites around the network. This is already common practice, and
1352
we have no suggestions for the details of determining the sites or keeping
1353
them updated as new labels are generated. Our algorithm simply assumes that
1354
they exist and are equivalent, and that the network's Domain Name System
1355
(DNS) has records which map the single well-defined name for the label bureau
1356
to multiple Internet addresses, in the usual manner.
1358
When client software starts, it should attempt to resolve the name of the
1359
label bureau it wishes to use (we assume one label bureau, but the algorithm
1360
extends in an obvious manner to multiple bureaus) through DNS. If it receives
1361
more than one host address, it saves the entire list and chooses two at random,
1362
labeling one the "primary" and the other the "secondary" bureau. Alternatively,
1363
these may be configuration parameters of the client software that are then
1364
validated when the software starts. It also divides 60 minutes by the total
1365
number of address it can find for the label bureau, sets a timer to this
1366
value, and remembers this as the "threshold" value.

Every time the client wishes to contact the label bureau, it does the following.
If the timer has not yet expired, only the primary bureau address is used.
Otherwise, the query is sent to both the primary and the secondary label
bureau addresses. When the first answer arrives, the connections to both label
bureaus are closed. The bureau that answered first becomes the primary
bureau. Whichever bureau wins, a new secondary bureau address is then chosen at
random and the timer is reset to the threshold value.
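The per-query rule can be sketched as a small state machine. In this hypothetical `BureauSelector` class the clock is passed in as `now` (in seconds) so the timer is easy to test, and `race` stands in for whatever code sends the query to several addresses and reports the first to answer. Drawing the new secondary only from mirrors other than the current primary is an assumption the text does not spell out.

```python
import random
from typing import Callable, List, Optional, Sequence


class BureauSelector:
    """Hypothetical sketch of the primary/secondary rotation described above."""

    def __init__(self, addresses: Sequence[str], now: Callable[[], float],
                 rng: Optional[random.Random] = None) -> None:
        if len(addresses) < 2:
            raise ValueError("need at least two mirror addresses")
        self.addresses = list(addresses)
        self.now = now                      # injected clock, in seconds
        self.rng = rng or random.Random()
        self.primary, self.secondary = self.rng.sample(self.addresses, 2)
        self.threshold = 3600 / len(self.addresses)  # 60 minutes / number of mirrors
        self.expires = self.now() + self.threshold

    def route(self, race: Callable[[List[str]], str]) -> List[str]:
        """Return the addresses the next query goes to, updating the state."""
        if self.now() >= self.expires:
            contacted = [self.primary, self.secondary]
            # race() sends the query to every address given and returns the
            # first one to answer; that bureau becomes the primary.
            self.primary = race(contacted)
            # Assumption: the new secondary is any mirror other than the primary.
            self.secondary = self.rng.choice(
                [a for a in self.addresses if a != self.primary])
            self.expires = self.now() + self.threshold
            return contacted
        return [self.primary]               # timer still running: primary only
```

Because the timer is only reset after a race, even a busy client compares its primary against a fresh random mirror about once per threshold period; with four mirrors that is every 15 minutes.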

A simple variant on this algorithm will probably become feasible in the near
future. When the HTTP protocol is updated to allow "keep alive" connections
to a server, the PICS client should keep its connection to the primary label
bureau alive as long as possible. Then, instead of simply treating the first
responder as the new primary, the client must make a more careful measurement.
It must measure the time required to send the query and receive the response,
rather than the total transaction time: connection setup
costs can be quite high, and would distort the measurement if one compared
the round trip time to the primary bureau through an existing connection
to the time to establish the connection to the secondary bureau plus the
round trip time over that new connection.
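The fairer measurement can be sketched with Python's standard `http.client` module. The `Timing` record and the `timed_query` and `faster` helpers are hypothetical; the point is only that connection setup and the query/response exchange are timed separately, and that the choice of primary uses the exchange time alone.

```python
import http.client
import time
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Timing:
    setup: float     # seconds spent opening the connection (0.0 if kept alive)
    exchange: float  # seconds from sending the query to reading the whole response


def timed_query(conn: http.client.HTTPConnection, path: str,
                reused: bool) -> Tuple[bytes, Timing]:
    """Send one query and time connection setup and the exchange separately."""
    setup = 0.0
    if not reused:
        start = time.perf_counter()
        conn.connect()
        setup = time.perf_counter() - start
    start = time.perf_counter()
    # If http.client has already closed the connection (for example after a
    # "Connection: close" response), request() reopens it, and that setup
    # time is then counted in the exchange.
    conn.request("GET", path)
    body = conn.getresponse().read()
    return body, Timing(setup, time.perf_counter() - start)


def faster(primary: Timing, secondary: Timing) -> str:
    """Choose the new primary by exchange time alone, ignoring setup cost."""
    return "secondary" if secondary.exchange < primary.exchange else "primary"
```

For example, a primary that answers in 0.20 s over a kept-alive connection loses to a secondary that needs 0.50 s to connect but then answers in 0.10 s, even though the secondary's total transaction time is longer.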

<A NAME="Appendix B">Appendix B: Sample Label Bureau Queries and
Responses</A><A NAME="queries"> </A>

The following queries and responses illustrate many of the features of client
interactions with label bureaus that dispense labels separately from documents.
All four queries request labels for the same three documents from the same
three services. They differ only in the query mode (Generic, Normal,
Tree, Generic+Tree).

Labels are requested for the following URLs:
http://www.w3.org/pub/WWW/
http://www.w3.org/pub/WWW/TheProject.html
http://www.w3.org/unknown

Labels are requested from the following services:
http://www.ages.org/our-service/v1.0/
http://www.rsac.org/v1.0
http://unknown.com

The server has the following relevant labels:
ages.org rating service<BR>
"http://www.w3.org/pub" (generic)
"http://www.w3.org/pub/WWW/" (generic)
"http://www.w3.org/pub/WWW/Daemon" (generic)
"http://www.w3.org/pub/WWW/PICS" (generic)
"http://www.w3.org/pub/WWW/Overview.html"
rsac.org rating service<BR>
"http://www.w3.org/pub/WWW" (generic)
"http://www.w3.org/pub/WWW/Daemon" (generic)
"http://www.w3.org/pub/WWW/PICS" (generic)
"http://www.w3.org/pub/WWW/Daemon/Overview.html"
"http://www.w3.org/pub/WWW/TheProject.html"
unknown.com rating service<BR>

The query responses have been pretty-printed for readability. Comments,
beginning with ';', have been added to explain the responses. Query requests
have been split onto multiple lines for display purposes; they are actually
sent as single (very long) lines.
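The single line that is actually sent can be assembled mechanically. The Python sketch below (the `bureau_query` helper is hypothetical) percent-encodes each document and service URL with `urllib.parse.quote(..., safe="")`, so that `:` and `/` become `%3A` and `%2F`, wraps each one in double quotes as the examples do, and joins the parameters with `&`.

```python
from typing import Iterable, Optional
from urllib.parse import quote


def bureau_query(opt: str, urls: Iterable[str], services: Iterable[str],
                 fmt: Optional[str] = None, path: str = "/ratings") -> str:
    """Build the single-line request target for a label bureau query."""
    params = ["opt=" + quote(opt, safe="")]   # "generic+tree" -> "generic%2Btree"
    if fmt is not None:
        params.append("format=" + quote(fmt, safe=""))
    # Each URL is percent-encoded (including ':' and '/') and then quoted.
    params += ['u="%s"' % quote(u, safe="") for u in urls]
    params += ['s="%s"' % quote(s, safe="") for s in services]
    return path + "?" + "&".join(params)
```

Prefixing the result with "GET " and appending " HTTP/1.0" yields a request line of the form shown in each example.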

This request is for full generic labels that apply to the three documents.
<B>Client sends request to server:</B>
GET /ratings?opt=generic&format=full&
u="http%3A%2F%2Fwww.w3.org%2Fpub%2FWWW%2F"&
u="http%3A%2F%2Fwww.w3.org%2Fpub%2FWWW%2FTheProject.html"&
u="http%3A%2F%2Fwww.w3.org%2Funknown"&
s="http%3A%2F%2Fwww.ages.org%2Four-service%2Fv1.0%2F"&
s="http%3A%2F%2Fwww.rsac.org%2Fv1.0"&
s="http%3A%2F%2Funknown.com" HTTP/1.0

<B>Server responds to client:</B>
Content-Type: application/pics-labels
Date: 15 Apr 1996 18:20:47 GMT
"http://www.ages.org/our-service/v1.0/" <I>;first service</I>
for "http://www.w3.org/pub/WWW/"
ratings (age 11) <I>;end of first label, since 'ratings' is always </I>
<I> ;the last part of a label. The same generic label</I>
<I> ;also covers the second document,</I>
<I> ;http://www.w3.org/pub/WWW/TheProject.html </I>
for "http://www.w3.org/pub/WWW/"
ratings (age 11) <I>;end of second label</I>
error (not-labeled "http://www.w3.org/unknown")
<I>;no label available for third document</I>
<I> ;three URLs requested, so end of first service</I>
"http://www.rsac.org/v1.0"
for "http://www.w3.org/pub/WWW"
ratings (v 0 s 0 n 0 l 0)
for "http://www.w3.org/pub/WWW"
ratings (v 0 s 0 n 0 l 0)
error (not-labeled "http://www.w3.org/unknown")
<I>;;no labels for third service</I>
error (no-ratings "unknown service"))
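The generic-mode answers above are consistent with a simple selection rule: a generic label for URL X applies to every document whose URL begins with X, and when several generic labels match, the one with the longest URL is returned (here http://www.w3.org/pub/WWW/ rather than http://www.w3.org/pub). The Python sketch below assumes that longest-prefix rule and uses the AGES label URLs listed earlier; the helper names are hypothetical.

```python
from typing import Iterable, Optional


def best_generic(generic_labels: Iterable[str], url: str) -> Optional[str]:
    """Longest generic-label URL that is a prefix of url, or None if none is."""
    matches = [g for g in generic_labels if url.startswith(g)]
    return max(matches, key=len) if matches else None


def generic_entry(generic_labels: Iterable[str], url: str) -> str:
    """Format the response entry for one requested URL in generic mode."""
    best = best_generic(generic_labels, url)
    if best is None:
        return 'error (not-labeled "%s")' % url
    return 'for "%s"' % best   # the rest of the label (ratings etc.) is omitted
```

Plain string-prefix matching is the simplest reading of "beginning with"; note that it also lets a label for http://www.w3.org/pub match http://www.w3.org/public, which a bureau may or may not intend.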

This query requests full specific labels for each of the documents.
<B>Client sends request to server:</B>
GET /ratings?opt=normal&format=full&
u="http%3A%2F%2Fwww.w3.org%2Fpub%2FWWW%2F"&
u="http%3A%2F%2Fwww.w3.org%2Fpub%2FWWW%2FTheProject.html"&
u="http%3A%2F%2Fwww.w3.org%2Funknown"&
s="http%3A%2F%2Fwww.ages.org%2Four-service%2Fv1.0%2F"&
s="http%3A%2F%2Fwww.rsac.org%2Fv1.0"&
s="http%3A%2F%2Funknown.com" HTTP/1.0

<B>Server responds to client:</B>
Content-Type: application/pics-labels
Date: 15 Apr 1996 18:20:54 GMT
"http://www.ages.org/our-service/v1.0/"
<I>;;no specific label available, so generic label returned</I>
for "http://www.w3.org/pub/WWW/"
<I>;;no specific label available, so generic label returned</I>
for "http://www.w3.org/pub/WWW/"
error (not-labeled "http://www.w3.org/unknown")
"http://www.rsac.org/v1.0"
<I>;;no specific label available, so generic label returned</I>
for "http://www.w3.org/pub/WWW"
ratings (v 0 s 0 n 0 l 0)
<I>;;here a specific label is returned.</I>
for "http://www.w3.org/pub/WWW/TheProject.html"
ratings (v 0 s 0 n 0 l 0)
error (not-labeled "http://www.w3.org/unknown")
error (no-ratings "unknown service"))
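In normal mode the bureau prefers a label written for exactly the requested URL and falls back to a generic label otherwise, as the comments in the response show. The hypothetical Python sketch below uses the RSAC label URLs listed earlier and assumes the same longest-prefix rule for the generic fallback.

```python
from typing import Iterable, Optional, Set


def normal_label(specific: Set[str], generic: Iterable[str], url: str) -> Optional[str]:
    """Return the URL of the label a normal-mode query would get, or None."""
    if url in specific:
        return url                      # a label written for exactly this URL
    # Otherwise fall back to the longest generic label whose URL is a prefix.
    matches = [g for g in generic if url.startswith(g)]
    return max(matches, key=len) if matches else None
```

A `None` result corresponds to the `error (not-labeled ...)` entries in the response.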

This request is for full specific labels for all URLs that have the requested
URLs as a prefix. This label bureau responds to tree queries by sending only
labels for documents directly within the requested directory, not for documents
in deeper subdirectories (so the RSAC label for
http://www.w3.org/pub/WWW/Daemon/Overview.html is not returned).
<B>Client sends request to server:</B>
GET /ratings?opt=tree&format=full&
u="http%3A%2F%2Fwww.w3.org%2Fpub%2FWWW%2F"&
u="http%3A%2F%2Fwww.w3.org%2Fpub%2FWWW%2FTheProject.html"&
u="http%3A%2F%2Fwww.w3.org%2Funknown"&
s="http%3A%2F%2Fwww.ages.org%2Four-service%2Fv1.0%2F"&
s="http%3A%2F%2Fwww.rsac.org%2Fv1.0"&
s="http%3A%2F%2Funknown.com" HTTP/1.0

<B>Server responds to client:</B>
Content-Length: 1075
Content-Type: application/pics-labels
Date: 15 Apr 1996 18:21:00 GMT
"http://www.ages.org/our-service/v1.0/"
<I>;;several labels delimited by ()</I>
(for "http://www.w3.org/pub/WWW/"
for "http://www.w3.org/pub/WWW/Overview.html"
for "http://www.w3.org/pub/WWW/PICS"
for "http://www.w3.org/pub/WWW/Daemon"
<I>;;end of labels for directory http://www.w3.org/pub/WWW/</I>
<I>;;no labels available for URLs containing</I>
<I> ;;http://www.w3.org/pub/WWW/TheProject.html as a prefix</I>
error (not-labeled "http://www.w3.org/pub/WWW/TheProject.html")
error (not-labeled "http://www.w3.org/unknown")
"http://www.rsac.org/v1.0"
(for "http://www.w3.org/pub/WWW"
ratings (v 0 s 0 n 0 l 0)
for "http://www.w3.org/pub/WWW/TheProject.html"
ratings (v 0 s 0 n 0 l 0)
for "http://www.w3.org/pub/WWW/Daemon"
ratings (v 0 s 0 n 0 l 0)
for "http://www.w3.org/pub/WWW/PICS"
ratings (v 0 s 0 n 0 l 0))
error (not-labeled "http://www.w3.org/pub/WWW/TheProject.html")
error (not-labeled "http://www.w3.org/unknown")
error (no-ratings "unknown service"))
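The directory-limited tree behaviour of this particular bureau can be sketched as follows. The hypothetical `tree_labels` helper keeps a label if it is for the requested URL itself or for something exactly one path segment below it. Treating a trailing "/" as insignificant is an assumption, suggested by the RSAC label for http://www.w3.org/pub/WWW being returned for the request http://www.w3.org/pub/WWW/.

```python
from typing import Iterable, List


def tree_labels(label_urls: Iterable[str], url: str) -> List[str]:
    """Labels at url itself or directly inside it (one path segment deeper)."""
    base = url.rstrip("/")              # assumption: a trailing '/' is not significant
    selected = []
    for label in label_urls:
        if label.rstrip("/") == base:
            selected.append(label)      # the requested directory itself
        elif label.startswith(base + "/"):
            rest = label[len(base) + 1:].rstrip("/")
            if "/" not in rest:         # skip documents in deeper subdirectories
                selected.append(label)
    return selected
```

Applied to the RSAC labels with the request http://www.w3.org/pub/WWW/, it selects the same four labels as the response above and drops http://www.w3.org/pub/WWW/Daemon/Overview.html.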

This query requests all generic labels for URLs that have the requested
URLs as prefixes. It returns a subset of the labels returned for the previous
query: only those that are generic.
<B>Client sends request to server:</B>
GET /ratings?opt=generic%2Btree&
u="http%3A%2F%2Fwww.w3.org%2Fpub%2FWWW%2F"&
u="http%3A%2F%2Fwww.w3.org%2Fpub%2FWWW%2FTheProject.html"&
u="http%3A%2F%2Fwww.w3.org%2Funknown"&
s="http%3A%2F%2Fwww.ages.org%2Four-service%2Fv1.0%2F"&
s="http%3A%2F%2Fwww.rsac.org%2Fv1.0"&
s="http%3A%2F%2Funknown.com" HTTP/1.0

<B>Server responds to client:</B>
Content-Type: application/pics-labels
Date: 15 Apr 1996 18:38:28 GMT
"http://www.ages.org/our-service/v1.0/"
(for "http://www.w3.org/pub/WWW/"
for "http://www.w3.org/pub/WWW/PICS"
for "http://www.w3.org/pub/WWW/Daemon"
error (not-labeled "http://www.w3.org/pub/WWW/TheProject.html")
error (not-labeled "http://www.w3.org/unknown")
"http://www.rsac.org/v1.0"
(for "http://www.w3.org/pub/WWW"
ratings (v 0 s 0 n 0 l 0)
for "http://www.w3.org/pub/WWW/Daemon"
ratings (v 0 s 0 n 0 l 0)
for "http://www.w3.org/pub/WWW/PICS"
ratings (v 0 s 0 n 0 l 0))
error (not-labeled "http://www.w3.org/pub/WWW/TheProject.html")
error (not-labeled "http://www.w3.org/unknown")
error (no-ratings "unknown service"))
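The generic+tree request writes the mode as `generic%2Btree` rather than `generic+tree`. That encoding matters: in URL query strings a literal `+` is conventionally decoded as a space, so an unencoded `+` would arrive as the single value "generic tree". The Python sketch below (the `query_modes` helper is hypothetical) decodes the `opt` parameter with the standard `urllib.parse.parse_qs` and splits it into mode flags.

```python
from typing import Set
from urllib.parse import parse_qs


def query_modes(query: str) -> Set[str]:
    """Split the opt parameter of a bureau query string into mode flags."""
    # parse_qs percent-decodes values and, like form decoding, turns '+' into ' '.
    # A KeyError here means the query carried no opt parameter.
    opt = parse_qs(query)["opt"][0]
    return set(opt.split("+"))
```

With the encoded form the bureau sees both flags; with a bare `+` it would see one unrecognized mode.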
<ADDRESS><A HREF="mailto:web-human@w3.org">Webmaster</A><BR>$Date: 1996/12/09 03:45:13 $