<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>XML Information Set</title>
<style type="text/css">
.xml-def {padding-left: 24pt}
.xml-syntax {padding-left: 24pt}
</style>
<link href="http://www.w3.org/StyleSheets/TR/W3C-REC" type="text/css" rel="stylesheet"/>
<meta name="RCSId" content="$Id: Overview.html,v 1.2 2001/10/24 20:38:27 dom Exp $"/>
<body> <div class="head">
<a href="http://www.w3.org/">
<img height="48" width="72" alt="W3C" src="http://www.w3.org/Icons/w3c_home" />
</a>
<div align="center"><h1>XML Information Set</h1> <h2
class="nonum">W3C Recommendation 24 October 2001</h2></div> <dl>
<dt>This version:</dt>
<dd><a href="http://www.w3.org/TR/2001/REC-xml-infoset-20011024">http://www.w3.org/TR/2001/REC-xml-infoset-20011024</a></dd>
<dt>Latest version:</dt>
<dd><a href="http://www.w3.org/TR/xml-infoset">
http://www.w3.org/TR/xml-infoset
</a></dd>
<dt>Previous version:</dt>
<dd><a href="http://www.w3.org/TR/2001/PR-xml-infoset-20010810">
http://www.w3.org/TR/2001/PR-xml-infoset-20010810
</a></dd>
<dt>Editors:</dt>
<dd>John Cowan, <a href="mailto:jcowan@reutershealth.com">jcowan@reutershealth.com</a></dd>
<dd>Richard Tobin, <a href="mailto:richard@cogsci.ed.ac.uk">richard@cogsci.ed.ac.uk</a></dd>
</dl> <p class="copyright"><a href="http://www.w3.org/Consortium/Legal/ipr-notice-20000612#Copyright">
Copyright</a> ©1999, 2000, 2001 <a href="http://www.w3.org/"><abbr title="World Wide Web
Consortium">W3C</abbr></a><sup>®</sup> (<a href="http://www.lcs.mit.edu/"><abbr
title="Massachusetts Institute of Technology">MIT</abbr></a>, <a href="http://www.inria.fr/"><abbr
title="Institut National de Recherche en
Informatique et Automatique" lang="fr">INRIA</abbr></a>, <a href="http://www.keio.ac.jp/">
Keio</a>), All Rights Reserved. W3C <a href="http://www.w3.org/Consortium/Legal/ipr-notice-20000612#Legal_Disclaimer">
liability</a>, <a href="http://www.w3.org/Consortium/Legal/ipr-notice-20000612#W3C_Trademarks">
trademark</a>, <a href="http://www.w3.org/Consortium/Legal/copyright-documents-19990405">
document use</a> and <a href="http://www.w3.org/Consortium/Legal/copyright-software-19980720">
software licensing</a> rules apply.</p> </div> <hr />
<h2 class="nonum"><a name="abstract">Abstract</a></h2>
<p>This specification provides a set of definitions for use in other
specifications that need to refer to the information in an XML document.</p>
<h2 class="nonum"><a name="status">Status of this Document</a></h2>
<p><em>This section describes the status of this document at the time of its
publication. Other documents may supersede this document. The latest status
of this document series is maintained at the W3C.</em></p>
<p><a href="/Consortium/Process-20010719/process.html#RecsW3C">Recommendation</a>
of the XML Information Set.</p>
<p>This document has been reviewed by W3C Members and other interested parties
and has been endorsed by the Director as a W3C Recommendation.
It is a stable document and may be used as reference material or
cited as a normative reference from another document.
W3C's role in making the Recommendation is to draw
attention to the specification and to promote its widespread deployment.
This enhances the functionality and interoperability of the Web.</p>
<p>This document has been produced by the
W3C XML Core Working Group as part of the XML Activity
in the W3C Architecture Domain.
For background on this work, please see the
<a href="http://www.w3.org/XML/Activity">XML Activity Statement</a>.</p>
<p>There are patent disclosures associated with the XML Information Set;
these may be found on the
<a href="http://www.w3.org/2001/10/02/xml-infoset-IPR-statements.html">XML Infoset Patent Statement page</a>
in conformance with
<a href="http://www.w3.org/Consortium/Process-20010719/#ipr">W3C policy</a>.</p>
<p>Please report errors in this document to
<a href="mailto:www-xml-infoset-comments@w3.org">www-xml-infoset-comments@w3.org</a>
(<a href="http://lists.w3.org/Archives/Public/www-xml-infoset-comments/">public
archive</a>). A list of
<a href="http://www.w3.org/2001/10/02/xml-infoset-errata.html">known errors</a>
in this specification is available at
<a href="http://www.w3.org/2001/10/02/xml-infoset-errata.html">http://www.w3.org/2001/10/02/xml-infoset-errata.html</a>.</p>
<p>The English version of this specification is the only normative version.
Information about translations of this document is available at
<a href="http://www.w3.org/XML/#trans">http://www.w3.org/XML/#trans</a>.</p>
<p>A list of <a href="http://www.w3.org/TR">current W3C Recommendations and
other technical documents</a> can be found at
<a href="http://www.w3.org/TR">http://www.w3.org/TR</a>.</p>
<h2 class="nonum">Contents</h2>
<ul style="list-style-type: none;">
<li><a href="#intro">1. Introduction</a></li>
<li>
<a href="#infoitem">2. Information Items</a>
<ul style="list-style-type: none;">
<li><a href="#infoitem.document">2.1 The Document Information Item</a></li>
<li><a href="#infoitem.element">2.2 Element Information Items</a></li>
<li><a href="#infoitem.attribute">2.3 Attribute Information Items</a></li>
<li><a href="#infoitem.pi">2.4 Processing Instruction Information Items</a></li>
<li><a href="#infoitem.rse">2.5 Unexpanded Entity Reference Information Items</a></li>
<li><a href="#infoitem.character">2.6 Character Information Items</a></li>
<li><a href="#infoitem.comment">2.7 Comment Information Items</a></li>
<li><a href="#infoitem.doctype">2.8 The Document Type Declaration Information Item</a></li>
<li><a href="#infoitem.entity.unparsed">2.9 Unparsed Entity Information Items</a></li>
<li><a href="#infoitem.notation">2.10 Notation Information Items</a></li>
<li><a href="#infoitem.namespace">2.11 Namespace Information Items</a></li>
</ul>
</li>
<li><a href="#conformance">3. Conformance</a></li>
<li><a href="#references">Appendix A: References</a></li>
<li><a href="#reporting">Appendix B: XML 1.0 Reporting Requirements (informative)</a></li>
<li><a href="#example">Appendix C: Example (informative)</a></li>
<li><a href="#omitted">Appendix D: What is not in the Information Set</a></li>
<li><a href="#rdfschema">Appendix E: RDF Schema (informative)</a></li>
</ul>
<h2><a name="intro">1. Introduction</a></h2>
<p>This specification defines an abstract data set called
the <dfn><strong>XML Information Set</strong></dfn>
(<dfn><strong>Infoset</strong></dfn>).
Its purpose is to provide a consistent set of definitions for use
in other specifications that need to refer to the information in a well-formed
XML document <a href="#XML">[XML]</a>.</p>
<p>It does not attempt to be exhaustive; the primary criterion for inclusion
of an information item or property has been that of expected usefulness
in future specifications. Nor does it constitute a minimum set of
information that must be returned by an XML processor.</p>
<p>An XML document has an information set if it is well-formed and
satisfies the namespace constraints described
<a href="#intro.namespaces">below</a>.
There is no requirement
for an XML document to be valid in order to have an information set.</p>
<p>Information sets may be created by methods (not described in this
specification) other than parsing an XML document.
See <a href="#intro.synthetic">Synthetic Infosets</a> below.</p>
<p>An XML document's information set consists of a number of
<dfn><strong>information items</strong></dfn>;
the information set for any well-formed XML document
will contain at least a
<a href="#infoitem.document">document</a> information item.
An information item is an abstract description of some part of an XML
document: each information item has a set of associated named
<dfn><strong>properties</strong></dfn>. In this specification, the
property names are shown in square brackets, <strong>[thus]</strong>.
The types of information item are listed in
<a href="#infoitem">section 2</a>.</p>
<p>The XML
Information Set does not require or favor a specific interface or class of
interfaces. This specification presents the information set as a modified
tree for the sake of clarity and simplicity, but there is no requirement that
the XML Information Set be made available through a tree structure; other
types of interfaces, including (but not limited to) event-based and query-based
interfaces, are also capable of providing information conforming to the XML
Information Set.</p>
<p>The terms "information set" and "information
item" are similar in meaning to the generic terms "tree" and "node", as they
are used in computing. However, the former terms are used in this specification
to reduce possible confusion with other specific data models. Information
items do <em>not</em> map one-to-one with the nodes of the DOM or the "tree"
and "nodes" of the XPath data model.</p>
<p>In this specification, the words "must",
"should", and "may" assume the meanings specified in
<a href="#RFC2119">[RFC2119]</a>, except that the words do not appear in
uppercase.</p>
<h3><a name="intro.namespaces">Namespaces</a></h3>
<p>XML 1.0 documents that do not conform to
<a href="#Namespaces">[Namespaces]</a>,
though technically well-formed,
are not considered to have meaningful information sets.
That is, this specification does not define an information
set for documents that have element or attribute names containing colons that
are used in other ways than as prescribed by
<a href="#Namespaces">[Namespaces]</a>.</p>
<p>Furthermore, this specification does not define an information set for
documents which use relative URI references in namespace declarations.
This is in accordance with the decision of the W3C XML Plenary Interest
Group described in <a href="#RelNS">[Relative Namespace URI References]</a>.</p>
<p>The value of a [namespace name] property is the normalized value of
the corresponding namespace attribute; no additional URI escaping is
applied to it by the processor.</p>
<h3><a name="intro.entities">Entities</a></h3>
<p>An information set describes its XML document with entity
references already expanded, that is, represented by the information
items corresponding to their replacement text. However, there are
various circumstances in which a processor may not perform this
expansion. An entity may not be declared, or may not be retrievable.
A non-validating processor may choose not to read all declarations,
and even if it does, may not expand all external entities. In these
cases an
<a href="#infoitem.rse">unexpanded entity reference</a>
information item is used to represent the entity reference.</p>
<h3><a name="intro.eol">End-of-Line Handling</a></h3>
<p>The values of all properties in the Infoset
take account of the end-of-line normalization described in
<a href="#XML">[XML]</a>, 2.11 "End-of-Line Handling".</p>
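<p>As an informal illustration (not part of this specification), the effect of
end-of-line normalization can be observed with any conforming parser. The sketch
below assumes Python's standard <code>xml.etree.ElementTree</code> module; the
sample document and variable names are invented for the example.</p>

```python
import xml.etree.ElementTree as ET

# Sample input mixing a CRLF pair, a lone CR, and a plain LF.
source = "<a>one\r\ntwo\rthree\nfour</a>"

# The parser normalizes each line ending to a single LF (#xA)
# before character data reaches the application.
text = ET.fromstring(source).text
print(repr(text))
```

<p>Any character information items derived from this text would therefore carry
the code for #xA at every line break, whichever convention the original file used.</p>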
<h3><a name="intro.baseURIs">Base URIs</a></h3>
<p>Several information items have a [base URI] or [declaration base URI] property.
These are computed according to
<a href="#XMLBase">[XML Base]</a>.
Note that retrieval of a resource may involve redirection
at the parser level (for example, in an entity resolver) or below;
in this case the base URI is the final URI used to retrieve the resource
after all redirection.</p>
<p>The value of these properties does not reflect any URI escaping that
may be required for retrieval of the resource, but it may include
escaped characters if these were specified in the document, or returned
by a server in the case of redirection.</p>
<p>In some cases (such as a document read from a string or a pipe) the
rules in
<a href="#XMLBase">[XML Base]</a>
may result in a base URI being application
dependent. In these cases this specification does not define
the value of the [base URI] or [declaration base URI] property.</p>
<p>When resolving relative URIs the [base URI] property should be used in
preference to the values of xml:base attributes; they may be inconsistent
in the case of <a href="#intro.synthetic">Synthetic Infosets</a>.</p>
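<p>The following sketch suggests how a [base URI] might be derived for each element
by resolving <code>xml:base</code> attributes against the inherited base. It is a
simplification that assumes a single entity and Python's standard library; the
function name <code>base_uris</code>, the sample document, and the
<code>example.org</code> URI are hypothetical.</p>

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# ElementTree exposes xml:base under the reserved XML namespace name.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def base_uris(element, inherited):
    """Yield (tag, base URI) pairs for an element and its descendants."""
    # An absent xml:base resolves to the inherited base unchanged.
    base = urljoin(inherited, element.get(XML_BASE, ""))
    yield element.tag, base
    for child in element:
        yield from base_uris(child, base)


doc = ET.fromstring('<r xml:base="sub/"><c xml:base="x.xml"/><d/></r>')
# Hypothetical URI from which the document entity was retrieved.
result = dict(base_uris(doc, "http://example.org/a/doc.xml"))
for tag, base in result.items():
    print(tag, base)
```

<p>A consumer of a real infoset would read the [base URI] property directly rather
than recompute it, since the attributes may be inconsistent in a synthetic infoset.</p>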
<h3><a name="intro.null">"Unknown" and "No Value"</a></h3>
<p>Some properties may sometimes have the value
<dfn><strong>unknown</strong></dfn> or
<dfn><strong>no value</strong></dfn>,
and it is said that a property value is unknown or that a property
has no value respectively.
These values are distinct from each other and from all other values.
In particular they are distinct from the empty string, the empty set,
and the empty list, each of which simply has no members.
This specification does not use the term <strong>null</strong> since in some
communities it has particular connotations which may not match those
intended here.</p>
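<p>One possible way for an implementation to keep these two values distinct from
each other and from empty values is a pair of dedicated sentinels. The sketch below
is only an illustration in Python; the names <code>Special</code> and
<code>describe</code> are hypothetical and are not defined by this specification.</p>

```python
from enum import Enum


class Special(Enum):
    """Hypothetical sentinels for the two special property values."""
    UNKNOWN = "unknown"
    NO_VALUE = "no value"


def describe(value):
    """Report how a property value should be interpreted."""
    if value is Special.UNKNOWN:
        return "unknown"
    if value is Special.NO_VALUE:
        return "no value"
    # Empty strings, sets, and lists are ordinary values with no members.
    return f"value {value!r}"


print(describe(Special.NO_VALUE))
print(describe(""))
print(describe([]))
```

<p>Using identity tests against the sentinels avoids conflating either special value
with an empty string or an empty collection.</p>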
<h3><a name="intro.synthetic">Synthetic Infosets</a></h3>
<p>This specification describes the information set resulting from parsing
an XML document. Information sets may be constructed by other means,
for example by use of an API such as the DOM or by transforming an
existing information set.</p>
<p>An information set corresponding to a real document will necessarily
be consistent in various ways; for example the [in-scope namespaces]
property of an element will be consistent with the [namespace
attributes] properties of the element and its ancestors. This may not
be true of an information set constructed by other means; in such a case
there will be no XML document corresponding to the information set,
and to serialize it will require resolution of the inconsistencies
(for example, by outputting namespace declarations that correspond to
the namespaces in scope).</p>
<h2><a name="infoitem">2. Information Items</a></h2>
<p>An
information set can contain up to eleven different types of information item,
as explained in the following sections. Every information item has properties.
For ease of reference, each property is given a name, indicated
<strong>[thus]</strong>.
Links to a definition and/or syntax in the XML 1.0
Recommendation <a href="#XML">[XML]</a> are given for each information item.</p>
<h3><a name="infoitem.document">2.1. The Document Information Item</a></h3>
<p class="xml-def"><em><strong>XML Definition:
</strong> <a href="http://www.w3.org/TR/REC-xml#dt-xml-doc">document</a> (Section
2, <cite>Documents</cite>)</em></p> <p class="xml-syntax"><em><strong>
XML Syntax:</strong> [1] <a href="http://www.w3.org/TR/REC-xml#NT-document">
Document</a> (Section 2.1, <cite>Well-Formed XML Documents</cite>)</em></p>
<p>There is exactly one <dfn><strong>document information item</strong></dfn>
in the information set, and all other information items are accessible from
the properties of the document information item, either directly or indirectly
through the properties of other information items.</p> <p>The document information
item has the following properties:</p> <ol>
<li><strong>[children]</strong> An ordered list of child information items,
in document order. The list contains exactly one <a href="#infoitem.element">
element</a> information item. The list also contains one <a href="#infoitem.pi">
processing instruction</a> information item for each processing instruction
outside the document element, and one <a href="#infoitem.comment">comment</a> information item for each comment outside
the document element. Processing instructions and comments within the DTD
are excluded. If there is a document type declaration, the list also
contains a <a href="#infoitem.doctype">document type declaration</a>
information item.</li>
<li><strong>[document element]</strong>
The <a href="#infoitem.element">element</a> information item corresponding to the document element.</li>
<li><strong>[notations]</strong> An unordered set of <a href="#infoitem.notation">
notation</a> information items, one for each notation declared in the DTD.</li>
<li><strong>[unparsed entities]</strong> An unordered set of
<a href="#infoitem.entity.unparsed">unparsed entity</a>
information items, one for each unparsed entity declared
in the DTD.</li>
<li><strong>[base URI]</strong> The base URI of the document entity.</li>
<li><strong>[character encoding scheme]</strong>
The name of the character encoding scheme in which the document entity
is expressed.</li>
<li><strong>[standalone]</strong> An indication of the standalone status of
the document, either yes or no. This property is derived
from the optional standalone document declaration in
the XML declaration at the beginning of the document
entity, and has no value if there is no standalone document declaration.</li>
<li><strong>[version]</strong> A string representing the XML version of the
document. This property is derived from the XML declaration optionally present
at the beginning of the document entity, and has no value if there is no
XML declaration.</li>
<li>
<strong>[all declarations processed]</strong> This property is not
strictly speaking part of the infoset of the document. Rather it is
an indication of whether the processor has read the complete DTD.
Its value is a boolean. If it is false, then certain
properties (indicated in their descriptions below) may be unknown.
If it is true, those properties are never unknown.</li>
</ol>
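<p>The [children] and [document element] properties have rough analogues in common
tree APIs. The following sketch, assuming Python's <code>xml.dom.minidom</code>
(whose nodes do not map one-to-one to information items), lists the top-level
children of a small invented document; the <code>NAMES</code> table is a helper
introduced only for this example.</p>

```python
from xml.dom import Node, minidom

source = (
    '<?xml version="1.0"?>'
    "<!--before--><?style x?>"
    "<root><!--inside--></root>"
    "<!--after-->"
)
doc = minidom.parseString(source)

# Map DOM node types to the corresponding kind of information item.
NAMES = {
    Node.ELEMENT_NODE: "element",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}

# Children of the document, in document order. The XML declaration is not
# a processing instruction, and the comment inside <root> is a child of
# the element rather than of the document.
children = [NAMES[node.nodeType] for node in doc.childNodes]
print(children)
print(doc.documentElement.tagName)
```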
<h3><a name="infoitem.element">2.2. Element Information Items</a></h3>
<p class="xml-def"><em><strong>XML Definition:</strong> <a href="http://www.w3.org/TR/REC-xml#dt-element">element</a> (Section 3, <cite>
Logical Structures</cite>)</em></p> <p class="xml-syntax"><em><strong>
XML Syntax:</strong> [39] <a href="http://www.w3.org/TR/REC-xml#NT-element">
Element</a> (Section 3, <cite>Logical Structures</cite>)</em></p>
<p>There is an <dfn><strong>element information item</strong></dfn> for each
element appearing in the XML document. One of the element information items
is the value of the [document element] property of the document information
item, corresponding to the root of the element tree, and all
other element information items are accessible by recursively following
its [children] property.</p>
<p>An element information item has the following
properties:</p> <ol>
<li><strong>[namespace name]</strong> The namespace name, if any, of the element
type. If the element does not belong to a namespace, this property
has no value.</li>
<li><strong>[local name]</strong> The local part of the element-type name.
This does not include any namespace prefix or following colon.</li>
<li><strong>[prefix]</strong> The namespace prefix part of the element-type
name. If the name is unprefixed, this property
has no value. Note that namespace-aware applications should use
the namespace name rather than the prefix to identify elements.</li>
<li><strong>[children]</strong> An ordered list of child information items,
in document order. This list contains <a href="#infoitem.element">element</a>,
<a href="#infoitem.pi">processing instruction</a>, <a href="#infoitem.rse">
unexpanded entity reference</a>, <a href="#infoitem.character">character</a>,
and <a href="#infoitem.comment">comment</a> information items, one for each
element, processing instruction, reference to an unprocessed external entity,
data character, and comment appearing immediately within the current element.
If the element is empty, this list has no members.</li>
<li><strong>[attributes]</strong> An unordered set of <a href="#infoitem.attribute">
attribute</a> information items, one for each of the attributes (specified
or defaulted from the DTD) of this element. Namespace declarations
do not appear in this set.
If the element has no attributes, this
set has no members.</li>
<li><strong>[namespace attributes]</strong> An unordered set of <a href="#infoitem.attribute">
attribute</a> information items, one for each of the namespace
declarations (specified or defaulted from the DTD) of this element.
A declaration of the form <code>xmlns=""</code>, which undeclares the
default namespace, counts as a namespace declaration.
By definition, all namespace attributes (including
those named <code>xmlns</code>, whose [prefix] property
has no value) have a namespace
URI of <code>http://www.w3.org/2000/xmlns/</code>.
If the element has no namespace declarations, this set
has no members.</li>
<li><strong>[in-scope namespaces]</strong> An unordered set
of <a href="#infoitem.namespace">
namespace</a> information items, one for each of the namespaces
in effect for this element. This set always contains an item with
the prefix <code>xml</code> which is implicitly bound to the
namespace name <code>http://www.w3.org/XML/1998/namespace</code>.
It does not contain an item with the prefix <code>xmlns</code> (used
for declaring namespaces), since
an application can never encounter an element or attribute with that
prefix.
The set will include namespace items corresponding to all of the
members of [namespace attributes], except for any representing
a declaration of the form <code>xmlns=""</code>, which does not declare a
namespace but rather undeclares the default namespace.
When resolving the prefixes of qualified names this property should be
used in preference to the [namespace attributes] property; they may be
inconsistent in the case of <a href="#intro.synthetic">Synthetic
Infosets</a>.</li>
<li><strong>[base URI]</strong> The base URI of the element.</li>
<li><strong>[parent]</strong> The document or element information item which
contains this information item in its [children] property.</li>
</ol>
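<p>To make the naming properties concrete, the sketch below reads the rough DOM
counterparts of [namespace name], [local name], and [prefix], and then separates
namespace declarations from ordinary attributes in the way [namespace attributes]
and [attributes] do. It assumes Python's <code>xml.dom.minidom</code>, which keeps
both kinds of attribute in one map; the sample document and the
<code>urn:example</code> namespace name are invented.</p>

```python
from xml.dom import minidom

# Namespace name that, by definition, all namespace attributes carry.
XMLNS = "http://www.w3.org/2000/xmlns/"

doc = minidom.parseString('<p:a xmlns:p="urn:example" p:k="v" id="1"/>')
element = doc.documentElement

# DOM counterparts of [namespace name], [local name], and [prefix].
names = (element.namespaceURI, element.localName, element.prefix)
print(names)

# The DOM stores declarations and ordinary attributes together, so split
# them by namespace name to mirror the two infoset properties.
attrs = [element.attributes.item(i) for i in range(element.attributes.length)]
namespace_attributes = sorted(a.name for a in attrs if a.namespaceURI == XMLNS)
attributes = sorted(a.name for a in attrs if a.namespaceURI != XMLNS)
print(namespace_attributes, attributes)
```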
<h3><a name="infoitem.attribute">2.3. Attribute Information Items</a></h3>
<p class="xml-def"><em><strong>XML Definition:</strong> <a href="http://www.w3.org/TR/REC-xml#dt-attr">attribute</a> (Section 3.1, <cite>
Start-Tags, End-Tags, and Empty-Element Tags</cite>)</em></p>
<p class="xml-syntax"><em><strong>XML Syntax:</strong> [41] <a href="http://www.w3.org/TR/REC-xml#NT-Attribute">Attribute</a> (Section 3.1, <cite>
Start-Tags, End-Tags, and Empty-Element Tags</cite>)</em></p>
<p>There is an <dfn><strong>attribute information item</strong></dfn> for
each attribute (specified or defaulted) of each element in the document,
including those which are namespace declarations. The latter however
appear as members of an element's [namespace attributes] property rather
than its [attributes] property.
</p> <p>Attributes declared in the DTD with no default value
and not specified in the element's start tag are not represented by
attribute information items.</p>
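<p>The distinction between defaulted and undeclared-default attributes can be seen
with a small internal DTD subset. The sketch below assumes Python's standard
<code>xml.etree.ElementTree</code>, which reports defaulted attributes but does not
expose a counterpart of the [specified] flag; the attribute names are invented.</p>

```python
import xml.etree.ElementTree as ET

source = """<!DOCTYPE r [
  <!ATTLIST r kind CDATA "plain"
              note CDATA #IMPLIED>
]>
<r/>"""

# "kind" is defaulted from the DTD, so it is reported although the start
# tag does not mention it. "note" has no default and is not specified,
# so nothing represents it.
attrib = ET.fromstring(source).attrib
print(attrib)
```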
<p>An attribute information item has the
following properties:</p> <ol>
<li><strong>[namespace name]</strong> The namespace name, if any, of the attribute.
Otherwise, this property has no value.</li>
<li><strong>[local name]</strong> The local part of the attribute name.
This does not include any namespace prefix or following colon.</li>
<li><strong>[prefix]</strong> The namespace prefix part of the attribute
name. If the name is unprefixed, this property
has no value.
Note that namespace-aware applications should use
the namespace name rather than the prefix to identify attributes.</li>
<li><strong>[normalized value]</strong> The normalized attribute value (see <a href="http://www.w3.org/TR/REC-xml#AVNormalize">3.3.3 Attribute-Value Normalization
</a> <a href="#XML">[XML]</a>).</li>
<li><strong>[specified]</strong> A flag indicating whether this attribute
was actually specified in the start-tag of its element, or was defaulted from
the DTD.</li>
<li><strong>[attribute type]</strong> An indication of the type declared for
this attribute in the DTD. Legitimate values are ID, IDREF, IDREFS, ENTITY,
ENTITIES, NMTOKEN, NMTOKENS, NOTATION, CDATA, and ENUMERATION.
If there is no declaration for the attribute, this property has no value.
If no declaration has been read, but the [all declarations processed]
property of the document information item is false (so there may be an
unread declaration), then the value of this property is unknown.
Applications should treat no value and unknown as equivalent to
a value of CDATA.</li>
<li><strong>[references]</strong>
If the attribute type is ID, NMTOKEN, NMTOKENS, CDATA, or ENUMERATION,
this property has no value. If the attribute type is unknown,
the value of this property is unknown. Otherwise (that is,
if the attribute type is IDREF, IDREFS, ENTITY, ENTITIES, or NOTATION),
the value of this property is an ordered list of the
<a href="#infoitem.element">element</a>,
<a href="#infoitem.entity.unparsed">unparsed entity</a>, or
<a href="#infoitem.notation">notation</a>
information items
referred to in the attribute value, in the order that they appear there.
In this case, if the attribute value is syntactically
invalid, this property has no value.
If the type is IDREF or IDREFS and any of the IDs does not appear as
the value of an ID attribute in the document, or if the type is
ENTITY, ENTITIES or NOTATION and no declaration has been read for any
of the entities or the notation, then this property has no value
or is unknown, depending on whether the [all declarations processed]
property of the document information item is true or false.
If the type is IDREF or IDREFS and any of the IDs appears as the
value of more than one ID attribute in the document, then this property
has no value.</li>
<li><strong>[owner element]</strong> The element information item which contains
this information item in its [attributes] property.</li>
</ol>
<h3><a name="infoitem.pi">2.4. Processing Instruction Information Items</a></h3>
<p class="xml-def"><em><strong>XML Definition:
</strong> <a href="http://www.w3.org/TR/REC-xml#dt-pi">processing instruction
</a> (Section 2.6, <cite>Processing Instructions</cite>)</em></p>
<p class="xml-syntax"><em><strong>XML Syntax:</strong> [16] <a href="http://www.w3.org/TR/REC-xml#NT-PI">PI</a> (Section 2.6, <cite>Processing
Instructions</cite>)</em></p> <p>There is a <dfn><strong>
processing instruction information item</strong></dfn> for each processing
instruction in the document. The XML declaration and text declarations for
external parsed entities are not considered processing instructions. </p>
<p>A processing instruction information item has the following properties:
</p> <ol>
<li><strong>[target]</strong> A string representing the target part of the
processing instruction (an XML name).</li>
<li><strong>[content]</strong> A string representing the content of the processing
instruction, excluding the target and any white space immediately following
it. If there is no such content, the value of this property will be an empty
string.</li>
<li><strong>[base URI]</strong> The base URI of the PI.
Note that if an infoset is serialized as an XML document, it will not be
possible to preserve the base URI of any PI that originally appeared at
the top level of an external entity, since there is no syntax for PIs
corresponding to the <code>xml:base</code> attribute on elements.</li>
<li><strong>[notation]</strong>
The <a href="#infoitem.notation">notation</a>
information item named by the target.
If there is no declaration for a notation with that name, this
property has no value. If no declaration has been read, but the [all
declarations processed] property of the document information item is
false (so there may be an unread declaration), then the value of this
property is unknown.</li>
<li><strong>[parent]</strong> The document, element, or document type declaration
information item which contains this information item in its [children] property.</li>
</ol>
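<p>The split between [target] and [content] can be illustrated as follows. This is
a sketch assuming Python's <code>xml.dom.minidom</code>, where the
<code>target</code> and <code>data</code> attributes of a processing instruction
node play the corresponding roles; the sample processing instructions are invented.</p>

```python
from xml.dom import minidom

source = '<?xml version="1.0"?><r><?target   some content?><?empty?></r>'
doc = minidom.parseString(source)

# White space immediately after the target is not part of the content,
# and a processing instruction without content yields an empty string.
# The XML declaration does not appear as a processing instruction.
pis = [(node.target, node.data) for node in doc.documentElement.childNodes]
print(pis)
```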
<h3><a name="infoitem.rse">2.5. Unexpanded Entity Reference Information Items</a></h3>
<p class="xml-def"><em><strong>
XML Definition:</strong> Section 4.4.3, <cite><a href="http://www.w3.org/TR/REC-xml#include-if-valid">
Included If Validating</a></cite></em></p>
<p>An <dfn><strong>unexpanded entity reference information item</strong></dfn>
serves as a placeholder by which an XML processor
can indicate that it has not expanded an external parsed entity.
There is such an information item for each unexpanded
reference to an external general entity within the content of an
element. A validating XML processor, or a non-validating processor that reads
all external general entities, will never generate unexpanded entity reference
information items for a valid document.</p>
<p>An unexpanded entity reference
information item has the following properties:</p> <ol>
<li><strong>[name]</strong> The name of the entity referenced.</li>
<li><strong>[system identifier]</strong>
The system identifier of the entity, as it appears in the declaration
of the entity, without any additional URI escaping applied by the processor.
If there is no declaration for the entity, this property has no
value. If no declaration has been read, but the [all declarations
processed] property of the document information item is false (so
there may be an unread declaration), then the value of this property
is unknown.</li>
<li>
<strong>[public identifier]</strong>
The public identifier of the entity, normalized as described in
<a href="http://www.w3.org/TR/REC-xml#dt-pubid">4.2.2 External Entities</a>
<a href="#XML">[XML]</a>.
If there is no declaration for the entity, or the declaration does not
include a public identifier, this property has no value. If no
declaration has been read, but the [all declarations processed]
property of the document information item is false (so there may be an
unread declaration), then the value of this property is unknown.</li>
<li>
<strong>[declaration base URI]</strong>
The base URI relative to which the system identifier should be resolved
(i.e. the base URI of the resource within which the entity declaration occurs).
This is unknown or has no value in the same circumstances as the
[system identifier] property.</li>
<li><strong>[parent]</strong> The element information item which contains
this information item in its [children] property.</li>
</ol>
<h3><a name="infoitem.character">2.6. Character Information Items</a></h3>
<p class="xml-syntax"><em><strong>XML Syntax:</strong>
[2] <a href="http://www.w3.org/TR/REC-xml#NT-Char">Char</a> (Section 2.2, <cite>
Characters</cite>)</em></p> <p>There is a <dfn><strong>character
information item</strong></dfn> for each data character that appears in the
document, whether literally, as a character reference, or within a
CDATA section. Each character
is a logically separate information item, but XML applications are free to
chunk characters into larger groups as necessary or desirable.</p> <p>A character
information item has the following properties:</p> <ol>
633
<li><strong>[character code]</strong> The ISO 10646 character code (in the
634
range 0 to #x10FFFF, though not every value in this range is a legal XML character
635
code) of the character.</li>
636
<li><strong>[element content whitespace]</strong> A boolean indicating whether
637
the character is white space appearing within element content (see <a href="#XML">
638
[XML]</a>, 2.10 "White Space Handling"). Note that validating XML processors
639
are <em>required</em> by XML 1.0 to provide this information.
640
If there is no declaration for the containing element, this property has
641
no value for white space characters.
642
If no declaration has been read, but the [all declarations processed]
643
property of the document information item is false (so there may be an
644
unread declaration), then the value of this property is unknown for
645
white space characters.
646
It is always false for characters that are not white space.
648
<li><strong>[parent]</strong> The element information
649
item which contains this information item in its [children] property.</li>
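<p>As an informal illustration (not part of the definitions above), the following Python sketch lists one (character, character code) pair per character information item of a small document. It assumes the standard library module <code>xml.dom.minidom</code> as the processor; the function name <code>character_items</code> and the sample document are hypothetical, and only the direct children of the document element are examined.</p>

```python
from xml.dom import minidom


def character_items(xml_text):
    """Return (character, code point) pairs for the character data that is
    a direct child of the document element, one pair per character."""
    root = minidom.parseString(xml_text).documentElement
    pairs = []
    for node in root.childNodes:
        # The processor is free to chunk characters into larger groups
        # (text nodes, CDATA section nodes); flatten the chunks again.
        if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
            for ch in node.data:
                pairs.append((ch, ord(ch)))
    return pairs


# "a" appears literally, "b" as a character reference, "c" in a CDATA section.
items = character_items("<t>a&#x62;<![CDATA[c]]></t>")
for ch, code in items:
    print(ch, hex(code))
```

<p>All three characters yield the same kind of item, regardless of how they were written in the document.</p>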
<h3><a name="infoitem.comment">2.7. Comment Information Items</a></h3>
<p class="xml-def"><em><strong>XML Definition:</strong> <a href="http://www.w3.org/TR/REC-xml#dt-comment">comment</a> (Section 2.5, <cite>
Comments</cite>)</em></p> <p class="xml-syntax"><em><strong>
XML Syntax:</strong> [15] <a href="http://www.w3.org/TR/REC-xml#NT-Comment">
Comment</a> (Section 2.5, <cite>Comments</cite>)</em></p> <p>
There is a <dfn><strong>comment information item</strong></dfn>
for each XML comment in the original document, except for those appearing
in the DTD (which are not represented).</p>
<p>A comment information item has
the following properties:</p> <ol>
<li><strong>[content]</strong> A string representing the content of the comment.
</li>
<li><strong>[parent]</strong> The document or element
information item which contains this information item in its [children] property.
</li>
</ol>
<h3><a name="infoitem.doctype">2.8. The Document Type Declaration Information Item</a></h3>
<p class="xml-def"><em><strong>
XML Definition:</strong> <a href="http://www.w3.org/TR/REC-xml#dt-doctype">
document type declaration</a> (section 2.8, <cite>Prolog and Document Type
Declaration</cite>)</em></p> <p class="xml-syntax"><em><strong>
XML Syntax:</strong> [28] <a href="http://www.w3.org/TR/REC-xml#NT-doctypedecl">
doctypedecl</a> (section 2.8, <cite>Prolog and Document Type Declaration</cite>)
</em></p> <p>If the XML document has a document type declaration,
then the information set contains a single <dfn><strong>document type declaration
information item</strong></dfn>. Note that entities and notations
are provided as
properties of the document information item, not the document type declaration
information item.</p> <p>A document type declaration information item has
the following properties:</p> <ol>
<li>
<strong>[system identifier]</strong>
The system identifier of the external subset, as it appears in the DOCTYPE
declaration, without any additional URI escaping applied by the processor.
If there is no external subset this property has no value.
</li>
<li>
<strong>[public identifier]</strong>
The public identifier of the external subset, normalized as described in
<a href="http://www.w3.org/TR/REC-xml#dt-pubid">4.2.2 External Entities</a>
<a href="#XML">[XML]</a>.
If there is no external subset or if it has no public identifier,
this property has no value.
</li>
<li><strong>[children]</strong> An ordered list of
<a href="#infoitem.pi">processing instruction</a> information items
representing processing instructions appearing
in the DTD, in the original document order. Items from the internal DTD subset
appear before those in the external subset.</li>
<li><strong>[parent]</strong> The document information item.</li>
</ol>
<h3><a name="infoitem.entity.unparsed">2.9. Unparsed Entity Information Items</a></h3>
<p class="xml-def"><em><strong>XML Definition:
</strong> <a href="http://www.w3.org/TR/REC-xml#dt-entity">entity</a> (section
4, <cite>Physical Structures</cite>)</em></p> <p
class="xml-syntax"><em><strong>XML Syntax:</strong> [71] <a href="http://www.w3.org/TR/REC-xml#NT-GEDecl">
GEDecl</a> (section 4.2, <cite>Entities</cite>)</em></p>
<p>
There is an <dfn><strong>unparsed entity information item</strong></dfn>
for each unparsed general entity declared in the DTD.
</p>
<p>
An unparsed entity information item has the following properties:
</p>
<ol>
<li>
<strong>[name]</strong>
The name of the entity.
</li>
<li>
<strong>[system identifier]</strong>
The system identifier of the entity, as it appears in the declaration
of the entity, without any additional URI escaping applied by the processor.
</li>
<li>
<strong>[public identifier]</strong>
The public identifier of the entity, normalized as described in
<a href="http://www.w3.org/TR/REC-xml#dt-pubid">4.2.2 External Entities</a>
<a href="#XML">[XML]</a>.
If the entity has no public identifier, this property has no value.
</li>
<li>
<strong>[declaration base URI]</strong>
The base URI relative to which the system identifier should be resolved
(i.e. the base URI of the resource within which the entity declaration occurs).
</li>
<li>
<strong>[notation name]</strong>
The notation name associated with the entity.
</li>
<li>
<strong>[notation]</strong>
The <a href="#infoitem.notation">notation</a>
information item named by the notation name.
If there is no declaration for a notation with that name, this
property has no value. If no declaration has been read, but the [all
declarations processed] property of the document information item is
false (so there may be an unread declaration), then the value of this
property is unknown.
</li>
</ol>
<h3><a name="infoitem.notation">2.10. Notation Information Items</a></h3>
<p class="xml-def"><em><strong>XML Definition:</strong> <a href="http://www.w3.org/TR/REC-xml#dt-notation">notation</a> (section 4.7, <cite>
Notations</cite>)</em></p> <p class="xml-syntax"><em><strong>
XML Syntax:</strong> [82] <a href="http://www.w3.org/TR/REC-xml#NT-NotationDecl">
NotationDecl</a> (section 4.7, <cite>Notations</cite>)</em></p>
<p>There is a <dfn><strong>notation information item</strong></dfn> for
each notation declared in the DTD.</p> <p>A notation information item has
the following properties:</p> <ol>
<li><strong>[name]</strong> The name of the notation.</li>
<li><strong>[system identifier]</strong> The system identifier of the notation,
as it appears in the declaration of the notation,
without any additional URI escaping applied by the processor.
If no system identifier was specified, this property has no value.</li>
<li><strong>[public identifier]</strong>
The public identifier of the notation, normalized as described in
<a href="http://www.w3.org/TR/REC-xml#dt-pubid">4.2.2 External Entities</a>
<a href="#XML">[XML]</a>.
If the notation has no public identifier,
this property has no value.</li>
<li>
<strong>[declaration base URI]</strong>
The base URI relative to which the system identifier should be resolved
(i.e. the base URI of the resource within which the notation declaration
occurs).
</li>
</ol>
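<p>The unparsed entity and notation properties can be sketched informally with Python's <code>xml.parsers.expat</code> module, which reports both kinds of declaration through its <code>EntityDeclHandler</code> and <code>NotationDeclHandler</code> callbacks. The function name <code>collect_declarations</code>, the dictionary layout and the sample DTD are assumptions made for this illustration; "no value" is modelled as <code>None</code>.</p>

```python
import xml.parsers.expat

# Hypothetical sample document with one notation and one unparsed entity.
DOC = """<!DOCTYPE doc [
<!NOTATION gif SYSTEM "viewer.exe">
<!ENTITY logo SYSTEM "logo.gif" NDATA gif>
]>
<doc/>"""


def collect_declarations(text):
    """Return (entities, notations) dictionaries keyed by [name]."""
    entities, notations = {}, {}

    def on_notation(name, base, system_id, public_id):
        notations[name] = {
            "system identifier": system_id,
            "public identifier": public_id,
        }

    def on_entity(name, is_parameter, value, base,
                  system_id, public_id, notation_name):
        # Only general entities declared with NDATA are unparsed entities.
        if notation_name is not None:
            entities[name] = {
                "system identifier": system_id,
                "public identifier": public_id,
                "notation name": notation_name,
            }

    parser = xml.parsers.expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.EntityDeclHandler = on_entity
    parser.Parse(text, True)

    # Resolve [notation] after parsing, so that a notation declared after
    # the entity is still found; None stands for "no value".
    for entity in entities.values():
        entity["notation"] = notations.get(entity["notation name"])
    return entities, notations


entities, notations = collect_declarations(DOC)
print(entities["logo"])
```

<p>Resolving [notation] only after the whole DTD has been read mirrors the distinction drawn above between a property that has no value and one whose value is unknown.</p>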
<h3><a name="infoitem.namespace">2.11. Namespace Information Items</a></h3>
<p>
Each element in the document has a <dfn><strong>namespace
information item</strong></dfn> for each namespace that is in scope
for that element.
</p> <p>A namespace information item has the following properties:
</p> <ol>
<li><strong>[prefix]</strong> The prefix whose binding this item describes.
Syntactically, this
is the part of the attribute name following the <code>xmlns:</code> prefix.
If the attribute name is simply <code>xmlns</code>, so that the
declaration is of the default namespace, this property
has no value.</li>
<li><strong>[namespace name]</strong> The namespace name to which the
prefix is bound.</li>
</ol>
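<p>A minimal sketch of how the namespaces in scope for an element might be computed, assuming a DOM tree built by Python's <code>xml.dom.minidom</code>. The helper <code>in_scope_namespaces</code> is hypothetical; it models a [prefix] with no value (the default namespace) as the key <code>None</code> and always includes the <code>xml</code> prefix.</p>

```python
from xml.dom import minidom

XML_NS = "http://www.w3.org/XML/1998/namespace"


def in_scope_namespaces(element):
    """Return a {prefix: namespace name} mapping for a DOM element."""
    # Collect the element and its ancestor elements, innermost first.
    chain = []
    node = element
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        chain.append(node)
        node = node.parentNode

    scope = {"xml": XML_NS}  # the xml prefix is bound in every scope
    for elem in reversed(chain):  # outermost first, so inner bindings win
        for name, value in elem.attributes.items():
            if name == "xmlns":
                if value:
                    scope[None] = value
                else:
                    scope.pop(None, None)  # xmlns="" undeclares the default
            elif name.startswith("xmlns:"):
                scope[name[len("xmlns:"):]] = value
    return scope


doc = minidom.parseString(
    '<a xmlns="http://example.org/default" xmlns:p="http://example.org/p">'
    '<b xmlns=""/></a>'
)
outer = in_scope_namespaces(doc.documentElement)
inner = in_scope_namespaces(doc.documentElement.firstChild)
print(outer)
print(inner)
```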
<h2><a name="conformance">3. Conformance</a></h2>
<p>
Since the purpose of the Information Set is to provide a set of definitions,
conformance is a property of specifications that use those
definitions, rather than of implementations.
</p>
<p>
Specifications referring to the Infoset must:
</p>
<ul>
<li>
Indicate the information items and properties that are needed to implement
the specification. (This indirectly imposes conformance requirements
on processors used to implement the specification.)
</li>
<li>
Specify how other information items and properties are treated (for
example, they might be passed through unchanged).
</li>
<li>
Note any information required from an XML document that is not defined
by the Infoset.
</li>
<li>
Note any difference in the use of terms defined by the Infoset (this
should be avoided).
</li>
</ul>
<p>
If a specification allows the construction of an infoset that has
inconsistencies as described above under
<a href="#intro.synthetic">Synthetic Infosets</a>,
it may describe how
those inconsistencies are to be resolved, and should do so if it
provides for serialization of the infoset.
</p>
<h2><a name="references">Appendix A. References</a></h2>
<h3><a name="references.normative">Normative References</a></h3>
<dl>
<dt><strong><a name="ISO10646">ISO/IEC 10646</a></strong></dt>
<dd>ISO (International Organization for Standardization). <cite>ISO/IEC 10646-1993
(E). Information technology -- Universal Multiple-Octet Coded Character Set
(UCS) -- Part 1: Architecture and Basic Multilingual Plane.</cite> [Geneva]:
International Organization for Standardization, 1993 (plus amendments AM
1 through AM 7). </dd>
<dt><strong><a name="Namespaces">Namespaces</a></strong></dt>
<dd><cite>Namespaces in XML,</cite> W3C, eds. Tim Bray, Dave Hollander, Andrew
Layman. 14 January 1999. Available at <code><a href="http://www.w3.org/TR/REC-xml-names/">
http://www.w3.org/TR/REC-xml-names/</a></code>.</dd>
<dt><strong><a name="RFC2119">RFC2119</a></strong></dt>
<dd><cite>Key words for use in RFCs to Indicate Requirement Levels,</cite>
ed. S. Bradner. March 1997. Available at <code><a href="http://www.ietf.org/rfc/rfc2119.txt">
http://www.ietf.org/rfc/rfc2119.txt</a></code>.</dd>
<dt><strong><a name="XML">XML</a></strong></dt>
<dd><cite>Extensible Markup Language (XML) 1.0 (Second Edition),</cite>
W3C, eds. Tim Bray, Jean Paoli, C.M. Sperberg-McQueen, Eve Maler. 6 October 2000.
Available at <code><a href="http://www.w3.org/TR/REC-xml">http://www.w3.org/TR/REC-xml</a></code>.
</dd>
<dt><strong><a name="XMLBase">XML Base</a></strong></dt>
<dd><cite>XML Base,</cite> W3C, ed. Jonathan Marsh. February 2000. Available at <code><a href="http://www.w3.org/TR/xmlbase">http://www.w3.org/TR/xmlbase</a></code>.
</dd>
</dl>
<h3><a name="references.informative">Informative References</a></h3>
<dl>
<dt><strong><a name="DOM">DOM</a></strong></dt>
<dd><cite>Document Object Model (DOM) Level 1 Specification,</cite> W3C, eds. Vidur
Apparao, Steve Byrne, Mike Champion, et al. 1 October 1998. Available
at <code><a href="http://www.w3.org/TR/REC-DOM-Level-1/">http://www.w3.org/TR/REC-DOM-Level-1/</a></code>.</dd>
<dt><strong><a name="XPointer-Liaison">XPointer-Liaison</a></strong></dt>
<dd><cite>XPointer-Information Set Liaison Statement,</cite> W3C, ed. Steven J.
DeRose. 24 February 1999. Available at <code><a href="http://www.w3.org/TR/NOTE-xptr-infoset-liaison">
http://www.w3.org/TR/NOTE-xptr-infoset-liaison</a></code>.</dd>
<dt><strong><a name="RelNS">Relative Namespace URI References</a></strong></dt>
<dd>
<cite>Results of W3C XML Plenary Ballot on relative URI References
in namespace declarations, 3-17 July 2000,</cite> W3C, eds. Dave Hollander,
C. M. Sperberg-McQueen. 6 September 2000. Available at
<code><a href="http://www.w3.org/2000/09/xppa">http://www.w3.org/2000/09/xppa</a></code>.
</dd>
<dt><strong><a name="RDFNote">RDF Schema for the XML Information Set</a></strong></dt>
<dd>
<cite>RDF Schema for the XML Information Set,</cite> W3C, ed. Richard Tobin. 6 April 2001. Available at
<code><a href="http://www.w3.org/TR/xml-infoset-rdfs">http://www.w3.org/TR/xml-infoset-rdfs</a></code>.
</dd>
</dl>
<h2><a name="reporting">Appendix B: XML 1.0 Reporting Requirements (informative)</a></h2>
<p>Although the XML 1.0 Recommendation <a href="#XML">[XML]</a> is primarily concerned with XML syntax, it also includes
some specific reporting requirements for XML processors.</p> <p>The reporting
requirements include errors, which are outside the scope of this specification,
and document information. All of the XML 1.0 requirements for document information
reporting have been integrated into the XML Information Set; numbers in parentheses
refer to sections of the XML Recommendation:</p> <ol>
<li>An XML processor must always provide all characters in a document that
are not part of markup to the application (2.10).</li>
<li>A validating XML processor must inform the application which of the character
data in a document is white space appearing within element content (2.10).
</li>
<li>An XML processor must normalize line-ends to LF before passing
them to the application (2.11).</li>
<li>An XML processor must normalize the value of attributes according to the
rules in clause 3.3.3 before passing them to the application.
</li>
<li>An XML processor must pass the names and external identifiers (system
identifiers, public identifiers or both) of declared notations to the application.
</li>
<li>When the name of an unparsed entity appears as the explicit or default
value of an ENTITY or ENTITIES attribute, an XML processor must provide the
names, system identifiers, and (if present) public identifiers of both the
entity and its notation to the application (4.6, 4.7).</li>
<li>An XML processor must pass processing instructions to the application.
</li>
<li>An XML processor (necessarily a non-validating one) that does not include
the replacement text of an external parsed entity in place of an entity reference
must notify the application that it recognized but did not read the entity.
</li>
<li>A validating XML processor must include the replacement text of an entity
in place of an entity reference (5.2).</li>
<li>An XML processor must supply the default value of attributes
declared in the DTD for a given element type but not appearing in the element's
start tag (3.3.2).</li>
</ol>
<h2><a name="example">Appendix C: Example (informative)</a></h2>
<p>
Consider the following example XML document:
</p>
<pre>&lt;?xml version="1.0"?&gt;
&lt;msg:message doc:date="19990421"
xmlns:doc="http://doc.example.org/namespaces/doc"
xmlns:msg="http://message.example.org/"
&gt;Phone home!&lt;/msg:message&gt;</pre>
<p>
The information set for this XML document
contains the following information items:
</p>
<ul>
<li>A <a href="#infoitem.document">document</a> information item.</li>
<li>
An <a href="#infoitem.element">element</a> information item
with namespace name "<code>http://message.example.org/</code>",
local part "<code>message</code>",
and prefix "<code>msg</code>".
</li>
<li>
An <a href="#infoitem.attribute">attribute</a> information item with the
namespace name "<code>http://doc.example.org/namespaces/doc</code>",
local part "<code>date</code>",
prefix "<code>doc</code>",
and normalized value "<code>19990421</code>".
</li>
<li>
Three <a href="#infoitem.namespace">namespace</a> information items
for the
<code>http://www.w3.org/XML/1998/namespace</code>,
<code>http://doc.example.org/namespaces/doc</code>, and
<code>http://message.example.org/</code> namespaces.
</li>
<li>
Two <a href="#infoitem.attribute">attribute</a> information items
for the namespace attributes.
</li>
<li>
Eleven <a href="#infoitem.character">character</a> information items
for the character data.
</li>
</ul>
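<p>The items listed above can be checked informally by parsing the example with a namespace-aware processor. The sketch below assumes Python's <code>xml.dom.minidom</code>; the variable names are illustrative only, and the DOM is used merely as one convenient view of the information set.</p>

```python
from xml.dom import minidom

MESSAGE = """<?xml version="1.0"?>
<msg:message doc:date="19990421"
   xmlns:doc="http://doc.example.org/namespaces/doc"
   xmlns:msg="http://message.example.org/"
>Phone home!</msg:message>"""

DOC_NS = "http://doc.example.org/namespaces/doc"

element = minidom.parseString(MESSAGE).documentElement

# The element item: namespace name, local part and prefix.
print(element.namespaceURI, element.localName, element.prefix)

# The doc:date attribute item.
date = element.getAttributeNodeNS(DOC_NS, "date")
print(date.namespaceURI, date.localName, date.prefix, date.value)

# The two namespace attributes (xmlns:doc and xmlns:msg).
namespace_attrs = sorted(
    name for name in element.attributes.keys()
    if name == "xmlns" or name.startswith("xmlns:")
)
print(namespace_attrs)

# One character item per character of the element's character data.
characters = "".join(child.data for child in element.childNodes)
print(len(characters))
```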
<h2><a name="omitted">Appendix D: What is not in the Information Set</a></h2>
<p>The following information is not represented in the
current version of the XML Information Set (this list is not intended to
be exhaustive):</p> <ol>
<li>The content models of elements, from ELEMENT declarations in the DTD.
</li>
<li>The grouping and ordering of attribute declarations in ATTLIST declarations.
</li>
<li>The document type name.</li>
<li>White space outside the document element.</li>
<li>White space immediately following the target name of a PI.</li>
<li>Whether characters are represented by character references.</li>
<li>The difference between the two forms of an empty element: <code>&lt;foo/&gt;
</code> and <code>&lt;foo&gt;&lt;/foo&gt;</code>.</li>
<li>White space within start-tags (other than significant white space in attribute
values) and end-tags.</li>
<li>The difference between CR, CR-LF, and LF line termination.</li>
<li>The order of attributes within a start-tag.</li>
<li>The order of declarations within the DTD.</li>
<li>The boundaries of conditional sections in the DTD.</li>
<li>The boundaries of parameter entities in the DTD.</li>
<li>Comments in the DTD.</li>
<li>The location of declarations (whether in internal or external subset or
parameter entities).</li>
<li>Any ignored declarations, including those within an IGNORE conditional
section, as well as entity and attribute declarations ignored because previous
declarations override them. </li>
<li>The kind of quotation marks (single or double) used to quote attribute
values.
</li>
<li>The boundaries of general parsed entities.</li>
<li>The boundaries of CDATA marked sections.</li>
<li>The default value of attributes declared in the DTD.</li>
</ol>
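<p>Several of these omissions can be illustrated by reducing pairs of differently written documents to the same summary. The sketch below assumes Python's <code>xml.dom.minidom</code>; the <code>summary</code> helper is hypothetical and deliberately ignores distinctions (such as CDATA section boundaries) that this particular DOM implementation happens to expose but the Information Set does not represent.</p>

```python
from xml.dom import minidom


def summary(xml_text):
    """Reduce a one-element document to (name, sorted attributes, text)."""
    root = minidom.parseString(xml_text).documentElement
    # Attribute order is not represented, so compare attributes sorted.
    attributes = sorted(root.attributes.items())
    # CDATA boundaries are not represented, so merge all character data.
    text = "".join(
        node.data
        for node in root.childNodes
        if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE)
    )
    return (root.tagName, attributes, text)


# Each pair differs only in ways the list above says are not represented.
PAIRS = [
    ("<foo/>", "<foo></foo>"),                        # empty-element forms
    ('<foo a="1" b="2"/>', "<foo  b='2'  a='1' />"),  # order, quotes, spacing
    ("<foo>&#65;</foo>", "<foo>A</foo>"),             # character reference
    ("<foo><![CDATA[A]]></foo>", "<foo>A</foo>"),     # CDATA boundaries
]

results = [summary(left) == summary(right) for left, right in PAIRS]
print(results)
```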
<h2><a name="rdfschema">Appendix E: RDF Schema (informative)</a></h2>
<p>
See <a href="#RDFNote">RDF Schema for the XML Information Set</a> for a formal
characterization of the Infoset.
</p>
</div> </div> </div></body>