<title>Namespace Routing Language (NRL)</title>
<h1>Namespace Routing Language (NRL)</h1>
<div class="titlepage">
<p><b>Author:</b> James Clark <jjc@thaiopensource.com><br />
<b>Date:</b> 2003-06-13</p>
<p>Copyright © Thai Open Source Software Center Ltd</p>
</div>
<p>The XML Namespaces Recommendation allows an XML document to be
composed of elements and attributes from multiple independent
namespaces. Each of these namespaces may have its own schema; the
schemas for different namespaces may be in different schema languages.
The problem then arises of how the schemas can be composed in order to
allow validation of the complete document. This document proposes the
Namespace Routing Language (NRL) as a solution to this problem. NRL
is an evolution of the author's earlier <bib ref="mns">Modular
Namespaces (MNS)</bib> language.</p>
<p>A sample implementation of NRL is included in <bib ref="jing">Jing</bib>.</p>
<h2>Getting started</h2>
<p>In its simplest form, an NRL schema consists of a mapping from
namespace URIs to schema URIs. An NRL schema is written in XML. Here
is an example:</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
  <namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
    <validate schema="soap-envelope.xsd"/>
  </namespace>
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.rng"/>
  </namespace>
</rules>
<p>We will call a schema referenced by an NRL schema a
<i>subschema</i>. In the above example,
<code>soap-envelope.xsd</code> is the subschema for the namespace URI
<code>http://schemas.xmlsoap.org/soap/envelope/</code> and
<code>xhtml.rng</code> is the subschema for the namespace URI
<code>http://www.w3.org/1999/xhtml</code>.</p>
<p>The absent namespace can be mapped to a schema by using
<code>ns=""</code>.</p>
<h2>Processing model</h2>
<p>NRL validation has two inputs: a document to be validated and an
NRL schema. We will call the document to be validated the
<i>instance</i>. NRL validation divides the instance into sections,
each of which contains elements from a single namespace, and validates
each section separately against the subschema for its namespace.</p>
<p>Thus, the following instance:</p>
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"
xmlns="http://www.w3.org/1999/xhtml">
<title>Document 1</title>
<title>Document 2</title>
<p>would be divided into three sections, one with the envelope
namespace:</p>
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
<p>and two with the XHTML namespace:</p>
<html xmlns="http://www.w3.org/1999/xhtml">
<title>Document 1</title>
<html xmlns="http://www.w3.org/1999/xhtml">
<title>Document 2</title>
<p>Note that two elements only belong to the same section if they have
a common ancestor and if all elements on the path to that common
ancestor have the same namespace. Thus, if one of the XHTML documents
happened to contain an element from the envelope, it would not be
part of the same section as the root element.</p>
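<p>The sectioning rule can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not the algorithm used by
Jing: the function names <code>namespace_of</code> and
<code>split_sections</code> are hypothetical, attributes are ignored, and
the body paragraphs in the sample instance are made up.</p>

```python
import xml.etree.ElementTree as ET

SOAP = "http://schemas.xmlsoap.org/soap/envelope/"
XHTML = "http://www.w3.org/1999/xhtml"


def namespace_of(elem):
    """Return the namespace URI of an element, or "" for the absent namespace."""
    tag = elem.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def split_sections(root):
    """Return one (namespace, [tags]) pair per section, in document order."""
    sections = []

    def visit(elem, current):
        ns = namespace_of(elem)
        # A change of namespace between parent and child starts a new section.
        if current is None or current[0] != ns:
            current = (ns, [])
            sections.append(current)
        current[1].append(elem.tag)
        for child in elem:
            visit(child, current)

    visit(root, None)
    return sections


# Hypothetical instance; the <p> content is invented for illustration.
INSTANCE = (
    '<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"'
    ' xmlns="http://www.w3.org/1999/xhtml">'
    "<env:Body>"
    "<html><head><title>Document 1</title></head><body><p>One</p></body></html>"
    "<html><head><title>Document 2</title></head><body><p>Two</p></body></html>"
    "</env:Body>"
    "</env:Envelope>"
)

sections = split_sections(ET.fromstring(INSTANCE))
for ns, tags in sections:
    print(ns, len(tags))
```

<p>Because a new section starts whenever a child's namespace differs from
its parent's, the two sibling <code>html</code> elements end up in separate
sections even though they share a namespace.</p>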
<p>This validation process can be refined in several ways, which
are described in the following sections.</p>
<h2>Specifying the schema</h2>
<p>In most cases the schema will be in some namespaced XML vocabulary,
and the type of schema can be automatically detected from the
namespace URI of the root element. In cases where the schema is not
in XML and there is no MIME type information available to determine
the type, a <code>schemaType</code> attribute can be used to specify the
type. The value of this attribute should be a MIME media type. For <bib ref="compact">RELAX NG
Compact Syntax</bib>, a value of <code>application/x-rnc</code> should be
used:</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
  <namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
    <validate schema="soap-envelope.xsd"/>
  </namespace>
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.rnc" schemaType="application/x-rnc"/>
  </namespace>
</rules>
<p>With many schema languages, there can be different ways to use a
particular schema to validate an instance. For example, <bib
ref="schematron">Schematron</bib> has the notion of a phase; an
instance that is valid with respect to a Schematron schema using one
phase may not be valid with respect to the same schema in another
phase. NRL allows validation to be controlled by specifying a number
of options. For example, to specify that validation with respect to
<code>xhtml.sch</code> should use the phase named <code>Full</code>, an option
could be specified as follows:</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
  <namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
    <validate schema="soap-envelope.xsd"/>
  </namespace>
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.sch">
      <option name="http://www.thaiopensource.com/validate/phase" arg="Full"/>
    </validate>
  </namespace>
</rules>
<p>Options may take arguments, but some options do not need one. For
example, for Schematron there is a
<code>http://www.thaiopensource.com/validate/diagnose</code> option.
If this option is present, then errors will include Schematron
diagnostics; if it is not, then errors will not include diagnostics.
With this option, no <code>arg</code> attribute is necessary:</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
  <namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
    <validate schema="soap-envelope.xsd"/>
  </namespace>
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.sch">
      <option name="http://www.thaiopensource.com/validate/diagnose"/>
    </validate>
  </namespace>
</rules>
<p>Options are named by URIs. A number of standard options are defined,
all of which start with the URI
<code>http://www.thaiopensource.com/validate/</code>:</p>
<dl>
<dt><code>http://www.thaiopensource.com/validate/phase</code></dt>
<dd>Argument is a string specifying the Schematron phase.</dd>
<dt><code>http://www.thaiopensource.com/validate/diagnose</code></dt>
<dd>No argument. If present, include Schematron diagnostics in error messages.</dd>
<dt><code>http://www.thaiopensource.com/validate/check-id-idref</code></dt>
<dd>No argument. If present, check ID/IDREF in accordance with the
<bib ref="dtdcompat">RELAX NG DTD Compatibility</bib> specification.</dd>
<dt><code>http://www.thaiopensource.com/validate/feasible</code></dt>
<dd>No argument. If present, check that the document is
<em>feasibly valid</em>. This applies to <bib ref="relaxng">RELAX NG</bib>. A document is
<em>feasibly valid</em> if it could be transformed into a valid
document by inserting any number of attributes and child elements
anywhere in the tree. This is equivalent to transforming the schema
by wrapping every <code>data</code>, <code>list</code>,
<code>element</code> and <code>attribute</code> element in an
<code>optional</code> element and then validating against the
transformed schema. This option is useful while a document is still
under construction.</dd>
<dt><code>http://www.thaiopensource.com/validate/schema</code></dt>
<dd>Argument is a URI specifying an additional schema to be used for
validation. This applies to <bib ref="wxs">W3C XML Schema</bib>. This
option may be specified multiple times, once for each additional
schema.</dd>
</dl>
<p>For convenience, the URI specified by the <code>name</code>
attribute may be relative; if it is, it will be resolved relative to
the NRL namespace URI. The result is that the standard options above
can be specified without the
<code>http://www.thaiopensource.com/validate/</code> prefix. For
example:</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
  <namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
    <validate schema="soap-envelope.xsd"/>
  </namespace>
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.sch">
      <option name="phase" arg="Full"/>
    </validate>
  </namespace>
</rules>
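<p>The resolution of a relative option name can be checked with ordinary
URI reference resolution. The sketch below uses Python's standard
<code>urllib.parse.urljoin</code>; the helper name
<code>resolve_option_name</code> is hypothetical and is not part of any NRL
implementation.</p>

```python
from urllib.parse import urljoin

# The NRL namespace URI, as used in the xmlns attribute of the rules element.
NRL_NS = "http://www.thaiopensource.com/validate/nrl"


def resolve_option_name(name):
    """Resolve an option name against the NRL namespace URI."""
    return urljoin(NRL_NS, name)


print(resolve_option_name("phase"))
print(resolve_option_name("http://www.example.org/my-option"))
```

<p>The last path segment of the base URI (<code>nrl</code>) is replaced by
the relative name, which is why <code>phase</code> ends up under the
<code>http://www.thaiopensource.com/validate/</code> prefix. An absolute
name is returned unchanged.</p>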
<p>Normally, an NRL implementation will make a best-effort attempt to
support the specified option and will simply ignore options that it
does not understand or cannot support. If it is essential that a
particular option is supported, then a <code>mustSupport</code>
attribute may be added to the <code>option</code> element:</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
  <namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
    <validate schema="soap-envelope.xsd"/>
  </namespace>
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.sch">
      <option name="phase" arg="Full" mustSupport="true"/>
    </validate>
  </namespace>
</rules>
<p>If there is a <code>mustSupport</code> attribute and the NRL
implementation cannot support the option, it must report an error.</p>
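<p>The best-effort rule and the <code>mustSupport</code> exception can be
summarised as a small decision procedure. The following Python sketch is an
assumption about how an implementation might organise this; the names
<code>select_options</code> and <code>UnsupportedOptionError</code> are
hypothetical, and options are modelled as plain dictionaries.</p>

```python
PHASE = "http://www.thaiopensource.com/validate/phase"
DIAGNOSE = "http://www.thaiopensource.com/validate/diagnose"


class UnsupportedOptionError(Exception):
    """Raised when an option marked mustSupport cannot be honoured."""


def select_options(options, supported):
    """Return {name: arg} for the options the implementation will apply.

    options is a list of dicts with a "name" key and optional "arg" and
    "mustSupport" keys; supported is the set of option URIs understood.
    """
    active = {}
    for opt in options:
        name = opt["name"]
        if name in supported:
            active[name] = opt.get("arg")
        elif opt.get("mustSupport") == "true":
            raise UnsupportedOptionError(name)
        # Otherwise the option is silently ignored (best effort).
    return active


requested = [{"name": PHASE, "arg": "Full"}, {"name": DIAGNOSE}]
print(select_options(requested, supported={PHASE}))
```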
<h2>Concurrent validation</h2>
<p>Multiple <code>validate</code> elements can be specified for a
single namespace. The effect is to validate against all of the
specified schemas.</p>
<p>For example, we might have a Schematron schema for XHTML, which
makes various checks that cannot be expressed in a grammar. We want
to validate against both the Schematron schema and the RELAX NG
schema. The NRL schema would look like this:</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
  <namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
    <validate schema="soap-envelope.xsd"/>
  </namespace>
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.rng"/>
    <validate schema="xhtml.sch"/>
  </namespace>
</rules>
<h2>Built-in schemas</h2>
<p>Instead of a <code>validate</code> element, you can use an
<code>allow</code> element or a <code>reject</code> element. These
are equivalent respectively to validating with a schema that allows
anything or with a schema that allows nothing.</p>
<p>For example, the following would allow SVG without attempting to
validate it:</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
  <namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
    <validate schema="soap-envelope.xsd"/>
  </namespace>
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.rng"/>
  </namespace>
  <namespace ns="http://www.w3.org/2000/svg">
    <allow/>
  </namespace>
</rules>
<p>Note that, just as with <code>validate</code>, <code>allow</code>
and <code>reject</code> apply to a section, not to a whole subtree.
Thus, in the above example, if the SVG contained an embedded XHTML
section, then that XHTML section would be validated against
<code>xhtml.rng</code>.</p>
<h2>Namespace wildcards</h2>
<p>You can use an <code>anyNamespace</code> element instead of a
<code>namespace</code> element. This specifies a rule to be used for
an element for which there is no applicable <code>namespace</code>
rule.</p>
<p>Namespace wildcards are particularly useful in conjunction
with <code>allow</code> and <code>reject</code>. The following
will validate <i>strictly</i>, rejecting any namespace for
which no subschema is specified:</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
  <namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
    <validate schema="soap-envelope.xsd"/>
  </namespace>
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.rng"/>
  </namespace>
  <anyNamespace>
    <reject/>
  </anyNamespace>
</rules>
<p>In contrast, the following will validate <i>laxly</i>, allowing any
namespace for which no subschema is specified:</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
  <namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
    <validate schema="soap-envelope.xsd"/>
  </namespace>
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.rng"/>
  </namespace>
  <anyNamespace>
    <allow/>
  </anyNamespace>
</rules>
<p>The default is to validate strictly. Thus, if there is no
<code>anyNamespace</code> rule, then the following rule will be
implied:</p>
<anyNamespace>
  <reject/>
</anyNamespace>
<p>You can apply different rules in different contexts by using
<i>modes</i>. For example, you might want to restrict the
namespaces allowed for the root element.</p>
<p>The <code>rules</code> element for an NRL schema that uses multiple
modes does not contain <code>namespace</code> and
<code>anyNamespace</code> elements directly. Rather, it contains
<code>mode</code> elements that in turn contain <code>namespace</code>
and <code>anyNamespace</code> elements. The <code>validate</code>
elements can specify a <code>useMode</code> attribute to change the
mode in which their child sections are processed. The
<code>rules</code> element must have a <code>startMode</code>
attribute specifying which mode to use for the root element.</p>
<p>For example, suppose we want to require that the root element come from
the <code>http://schemas.xmlsoap.org/soap/envelope/</code> namespace.</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl" startMode="soap">
  <mode name="soap">
    <namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
      <validate schema="soap-envelope.xsd" useMode="body"/>
    </namespace>
  </mode>
  <mode name="body">
    <namespace ns="http://www.w3.org/1999/xhtml">
      <validate schema="xhtml.rng"/>
    </namespace>
  </mode>
</rules>
<p>If a <code>validate</code> element does not specify a
<code>useMode</code> attribute, then the mode remains unchanged. Thus,
in the above example, child sections inside an XHTML section will be
processed in mode <code>body</code>, which does not allow the SOAP
namespace; so if the XHTML were to contain a SOAP
<code>env:Envelope</code> element, it would be rejected.</p>
<p>The <code>reject</code> and <code>allow</code> elements can have a
<code>useMode</code> attribute as well.</p>
<h2>Related namespaces</h2>
<p>A single subschema may not handle just a single namespace; it may
handle two or more related namespaces. To deal with this
possibility, NRL allows the rule for a namespace to specify that
elements from that namespace are to be attached to a parent section
and be validated together with that parent section.</p>
<p>Suppose we have RELAX NG schemas for XHTML and for SVG. We could
use these directly as subschemas in NRL. But we might prefer instead
to use RELAX NG mechanisms to combine these into a single RELAX NG
schema. This would conveniently let us allow SVG elements to
occur only in places where XHTML block and inline elements are allowed and
disallow them in places that make no sense (for example, as
children of a <code>ul</code> element). If we have such a combined
schema, we could use it as follows:</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl" startMode="soap">
  <mode name="soap">
    <namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
      <validate schema="soap-envelope.xsd" useMode="xhtml"/>
    </namespace>
  </mode>
  <mode name="xhtml">
    <namespace ns="http://www.w3.org/1999/xhtml">
      <validate schema="xhtml+svg.rng" useMode="svg"/>
    </namespace>
  </mode>
  <mode name="svg">
    <namespace ns="http://www.w3.org/2000/svg">
      <attach/>
    </namespace>
  </mode>
</rules>
<p>This will cause SVG sections occurring within XHTML to be attached
to the parent XHTML section and be validated as part of it.</p>
<p>RDF is another example where <code>attach</code> is necessary.
RDF can contain elements from arbitrary namespaces.</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl" startMode="root">
  <mode name="root">
    <namespace ns="http://www.w3.org/1999/xhtml">
      <validate schema="xhtml.rng" useMode="body"/>
    </namespace>
  </mode>
  <mode name="body">
    <namespace ns="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <validate schema="rdfxml.rng" useMode="rdf"/>
    </namespace>
  </mode>
  <mode name="rdf">
    <anyNamespace>
      <attach/>
    </anyNamespace>
  </mode>
</rules>
<p>We could use the approach of attaching all namespaces as an
alternative solution to the XHTML+SVG example. Instead of relying on NRL
to reject namespaces other than XHTML and SVG, we can attach
sections from all namespaces to the XHTML section, and allow the
<code>xhtml+svg.rng</code> schema to reject namespaces other than
XHTML and SVG:</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl" startMode="soap">
  <mode name="soap">
    <namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
      <validate schema="soap-envelope.xsd" useMode="xhtml"/>
    </namespace>
  </mode>
  <mode name="xhtml">
    <namespace ns="http://www.w3.org/1999/xhtml">
      <validate schema="xhtml+svg.rng" useMode="attach"/>
    </namespace>
  </mode>
  <mode name="attach">
    <anyNamespace>
      <attach/>
    </anyNamespace>
  </mode>
</rules>
<h2>Built-in modes</h2>
<p>There is a built-in mode named <code>#attach</code>, which contains
just the rule:</p>
<anyNamespace>
  <attach/>
</anyNamespace>
<p>Thus, the last example in the previous section can be simplified to:</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl" startMode="soap">
  <mode name="soap">
    <namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
      <validate schema="soap-envelope.xsd" useMode="xhtml"/>
    </namespace>
  </mode>
  <mode name="xhtml">
    <namespace ns="http://www.w3.org/1999/xhtml">
      <validate schema="xhtml+svg.rng" useMode="#attach"/>
    </namespace>
  </mode>
</rules>
<p>Suppose you are not interested in the namespace-sectioning
capabilities of NRL, but you just want to validate a document
concurrently against two schemas. The simplest way is like this:</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
  <anyNamespace>
    <validate schema="xhtml.rng" useMode="#attach"/>
    <validate schema="xhtml.sch" useMode="#attach"/>
  </anyNamespace>
</rules>
<p>The <code>useMode="#attach"</code> ensures that the document will
be validated as is, rather than divided into sections.</p>
<p>Similarly, there is a built-in mode named <code>#reject</code>,
which contains just the rule:</p>
<anyNamespace>
  <reject/>
</anyNamespace>
<p>and a built-in mode named <code>#allow</code>, which contains just
the rule:</p>
<anyNamespace>
  <allow/>
</anyNamespace>
<h2>Open schemas</h2>
<p>Up to now, sections validated by one subschema have not
participated in the validation of parent sections. Modern schema
languages, such as W3C XML Schema and RELAX NG, can use wildcards to
allow elements and attributes from any namespace in particular
contexts. It is useful to take advantage of this in order to allow
one subschema to constrain the contexts in which sections validated by
other subschemas can occur. For example, the official schema for
<code>http://schemas.xmlsoap.org/soap/envelope/</code> uses wildcards
to specify precisely where elements from other namespaces are allowed:
they are allowed as children of the <code>env:Body</code> and
<code>env:Header</code> elements but not as children of the
<code>env:Envelope</code> element. Our NRL schema bypasses these
constraints because the XHTML sections are not seen by the SOAP
validation. We can use <code>attach</code> to solve this problem:</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
  <namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
    <validate schema="soap-envelope.xsd"/>
  </namespace>
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.rng"/>
    <attach/>
  </namespace>
</rules>
<p>When an XHTML section occurs inside a SOAP section, the XHTML
section will participate in two validations:</p>
<ul>
<li>it will be validated independently against the XHTML schema, and</li>
<li>it will be attached to the SOAP section and validated together
with the SOAP section against the SOAP schema.</li>
</ul>
<h2>Element-name context</h2>
<p>So far we have seen how to make the processing of an element depend
on the namespace URIs of its ancestors. NRL also allows the
processing to depend on the element names of its ancestors. For
example, suppose we wish to allow RDF to occur only as a child of the
<code>head</code> element of XHTML. We can do this as follows:</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl" startMode="root">
  <mode name="root">
    <namespace ns="http://www.w3.org/1999/xhtml">
      <validate schema="xhtml.rng">
        <context path="head" useMode="rdf"/>
      </validate>
    </namespace>
  </mode>
  <mode name="rdf">
    <namespace ns="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <validate schema="rdfxml.rng" useMode="#attach"/>
    </namespace>
  </mode>
</rules>
<p>Any element that takes a <code>useMode</code> attribute can also
have one or more <code>context</code> children that override the
<code>useMode</code> attribute in specific contexts. The
<code>path</code> attribute specifies a test to be applied to the
parent element of the section to be processed.
The <code>path</code> attribute allows a restricted form of XPath: a
list of one or more choices separated by <code>|</code>, where each
choice is a list of one or more unqualified names separated by
<code>/</code>, optionally preceded by <code>/</code>. It is
interpreted like a pattern in XSLT, except that the names are
implicitly qualified with the namespace URI of the containing
<code>namespace</code> element. When more than one path matches, the
most specific is chosen. It is an error to have two or more equally
specific paths. The path is tested against a single section, not the
entire document: a path of <code>/foo</code> means a <code>foo</code>
element that is the root of a section; it does not mean a
<code>foo</code> element that is the root of the document.</p>
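<p>The matching of a restricted path against one section can be sketched
as follows. This is a simplified illustration, not the matcher of any NRL
implementation: the function name <code>path_matches</code> is
hypothetical, local names stand in for qualified names (all elements of a
section share one namespace), and the choice of the most specific path
among several matches is not modelled.</p>

```python
def path_matches(path, ancestors):
    """Test a restricted path against the parent of a child section.

    ancestors lists local element names from the root of the containing
    section down to the element that is the parent of the child section.
    """
    for choice in path.split("|"):
        choice = choice.strip()
        anchored = choice.startswith("/")
        names = choice.lstrip("/").split("/")
        if anchored:
            # A leading "/" ties the first name to the root of the section.
            if ancestors == names:
                return True
        elif ancestors[-len(names):] == names:
            # Otherwise the names only have to match the innermost ancestors.
            return True
    return False


# The parent of the child section is <head>, whose parent <html> is the
# root of the containing XHTML section.
print(path_matches("head", ["html", "head"]))
print(path_matches("/head", ["html", "head"]))
```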
<p>Up to now, we have considered attributes to be inseparably attached
to their parent elements. Although the default behaviour is
to attach attributes to their parent elements, attributes are in fact
considered to be separate sections and can be processed
separately. Attributes with the same namespace URI and same parent
element are grouped in a single section. Such sections are called
attribute sections; sections that contain elements are called element
sections.</p>
<p>A <code>namespace</code> or <code>anyNamespace</code> element can
have a <code>match</code> attribute, whose value must be a list of one
or two of the tokens <code>attributes</code> and
<code>elements</code>. If the value includes the token
<code>attributes</code>, the rule matches attribute sections.</p>
<p>The default behaviour of attaching attributes to their parent
elements occurs because the default value of the <code>match</code>
attribute is <code>elements</code> and because all of the built-in
modes include a rule:</p>
<anyNamespace match="attributes">
  <attach/>
</anyNamespace>
<p>Most, if not all, XML schema languages do not have any notion of
validating a set of attributes; they know only how to validate an XML
element. Therefore, before validating an attribute section, NRL
transforms it into an XML element by creating a dummy element to hold
the attributes. NRL also performs a corresponding transformation on
the schema. This is schema-language dependent. For example, in the
case of RELAX NG, a schema <var>s</var> is transformed to
<code><element><anyName/> <var>s</var> </element></code>.</p>
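<p>As a rough illustration of the instance side of this transformation,
the Python sketch below groups the namespace-qualified attributes of one
element by namespace URI and wraps each group in a dummy element. The
function name <code>attribute_sections</code> and the dummy element name
are hypothetical, and leaving unqualified attributes with their element is
an assumption made here for simplicity.</p>

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML_NS = "http://www.w3.org/XML/1998/namespace"


def attribute_sections(elem, dummy_tag="dummy"):
    """Return {namespace URI: dummy element holding that namespace's attributes}."""
    groups = defaultdict(dict)
    for qname, value in elem.attrib.items():
        # ElementTree spells a qualified attribute name as "{uri}local".
        if qname.startswith("{"):
            uri = qname[1:].split("}", 1)[0]
            groups[uri][qname] = value
    return {uri: ET.Element(dummy_tag, attrs) for uri, attrs in groups.items()}


para = ET.fromstring(
    '<p xmlns="http://www.w3.org/1999/xhtml"'
    ' xml:lang="en" xml:space="preserve" class="note"/>'
)
for uri, dummy in attribute_sections(para).items():
    print(uri, sorted(dummy.attrib))
```

<p>Each dummy element could then be handed to the subschema selected for
its namespace, in the same way as an element section.</p>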
<p>For example, suppose <code>xmlatts.rng</code> contains a schema for
the attributes in the <code>xml:</code> namespace written in RELAX
NG:</p>
<group xmlns="http://relaxng.org/ns/structure/1.0"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<attribute name="xml:lang">
<data type="language"/>
<attribute name="xml:base">
<data type="anyURI"/>
<attribute name="xml:space">
<value>preserve</value>
<value>default</value>
<p>An NRL schema could use this as follows:</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.rng"/>
  </namespace>
  <namespace ns="http://www.w3.org/XML/1998/namespace" match="attributes">
    <validate schema="xmlatts.rng"/>
  </namespace>
</rules>
<h2>Mode inheritance</h2>
<p>One mode can <i>extend</i> another mode. Suppose in our SOAP+XHTML
example, we want to allow both SOAP elements and XHTML elements to
contain RDF. By putting the rule for RDF in its own mode and
extending that mode, we can avoid having to specify the rule for RDF
twice:</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl" startMode="soap">
  <mode name="common">
    <namespace ns="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <validate schema="rdfxml.rng" useMode="#attach"/>
    </namespace>
  </mode>
  <mode name="soap" extends="common">
    <namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
      <validate schema="soap-envelope.xsd" useMode="body"/>
    </namespace>
  </mode>
  <mode name="body" extends="common">
    <namespace ns="http://www.w3.org/1999/xhtml">
      <validate schema="xhtml.rng"/>
    </namespace>
  </mode>
</rules>
<p>It is possible to extend a built-in mode. Thus, a mode that
validates laxly can be specified simply by extending
<code>#allow</code>. This works because of how wildcards and
inheritance interact. Suppose mode <var>x</var> extends mode
<var>y</var>; then when using mode <var>x</var>, the following order
will be used to search for a matching rule:</p>
<ol>
<li>a non-wildcard rule in <var>x</var></li>
<li>a non-wildcard rule in <var>y</var></li>
<li>a wildcard rule in <var>x</var></li>
<li>a wildcard rule in <var>y</var></li>
</ol>
<p>The requirement that there is an implicit rule of</p>
<anyNamespace>
  <reject/>
</anyNamespace>
<p>can be restated as a requirement that the default value of the
<code>extends</code> attribute is <code>#reject</code>.</p>
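<p>The search order and the default of extending <code>#reject</code> can
be sketched in Python. The data model (a dictionary of modes, each with a
<code>rules</code> table), the names <code>mode_chain</code> and
<code>find_rule</code>, and the generalisation from two modes to a longer
inheritance chain are assumptions made for this illustration; rules for
attribute sections are left out.</p>

```python
ANY = object()  # stands for an anyNamespace (wildcard) rule

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

# Built-in modes contain a single wildcard rule and extend nothing.
BUILTIN = {
    "#reject": {"rules": {ANY: "reject"}, "extends": None},
    "#allow": {"rules": {ANY: "allow"}, "extends": None},
    "#attach": {"rules": {ANY: "attach"}, "extends": None},
}


def mode_chain(modes, name):
    """Return the mode and the modes it extends, nearest first."""
    chain = []
    while name is not None:
        mode = modes[name]
        chain.append(mode)
        # A mode without an explicit extends attribute extends #reject.
        name = mode.get("extends", "#reject")
    return chain


def find_rule(modes, mode_name, ns):
    """Look up the action for namespace ns when processing in mode_name."""
    chain = mode_chain(modes, mode_name)
    for mode in chain:  # non-wildcard rules first, nearest mode first
        if ns in mode["rules"]:
            return mode["rules"][ns]
    for mode in chain:  # then wildcard rules, in the same order
        if ANY in mode["rules"]:
            return mode["rules"][ANY]
    raise LookupError(ns)


MODES = dict(BUILTIN)
MODES["common"] = {"rules": {RDF: "validate rdfxml.rng"}}
MODES["body"] = {"rules": {XHTML: "validate xhtml.rng"}, "extends": "common"}
MODES["lax"] = {"rules": {XHTML: "validate xhtml.rng"}, "extends": "#allow"}

print(find_rule(MODES, "body", RDF))
print(find_rule(MODES, "body", SVG))
print(find_rule(MODES, "lax", SVG))
```

<p>In mode <code>body</code> an unknown namespace falls through to the
wildcard rule of <code>#reject</code>, while in mode <code>lax</code> it
reaches the wildcard rule of <code>#allow</code>.</p>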
<h2>Transparent namespaces</h2>
<p>Many schema languages can deal with the kind of extensibility that
involves adding child elements or attributes from different
namespaces. A more difficult kind of extensibility is where we need
to be able to wrap an extension element around an existing
non-extension element. This can arise with namespaces describing
templating and versioning. Imagine XHTML inside an XSLT stylesheet:
in such a document we might have a <code>ul</code> element containing
an <code>xsl:for-each</code> element containing an <code>li</code>
element, although the schema for XHTML requires <code>li</code>
elements to occur as direct children of <code>ul</code> elements. In
such a situation, we need to make the XHTML schema
<i>unwrap</i> the <code>xsl:for-each</code> element, ignoring its
start-tag and end-tag, but not ignoring its content.</p>
<p>Suppose we have a namespace
<code>http://www.example.org/edit</code> containing elements
<code>inserted</code> and <code>deleted</code>, which describe edits
that have been made to a document, and suppose we want to use these
elements inside an XHTML document. The following NRL schema would
still allow us to validate the XHTML document.</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl" startMode="root">
  <mode name="root">
    <namespace ns="http://www.w3.org/1999/xhtml">
      <validate schema="xhtml.rng" useMode="xhtml"/>
    </namespace>
  </mode>
  <mode name="xhtml">
    <namespace ns="http://www.example.org/edit">
      <unwrap/>
    </namespace>
    <namespace ns="http://www.w3.org/1999/xhtml">
      <attach/>
    </namespace>
  </mode>
</rules>
<p>When <code>unwrap</code> is applied to an element section
<var>e</var>, it ignores the elements in <var>e</var> and their
attributes and just processes the child element sections of
<var>e</var>; if processing the child element sections causes a
section to try to attach to <var>e</var>, it will instead attach to
the parent of <var>e</var>. Thus, in the above schema the section
from the edit namespace will be ignored, but child sections will be
processed according to rules applicable in the <code>xhtml</code>
mode. When an edit section has an XHTML child section, then that XHTML
child section will be attached to the parent of the edit section
(which can only be another XHTML section).</p>
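<p>The effect of <code>unwrap</code> on what the XHTML subschema sees can
be imitated on an element tree. The sketch below is a simplification under
stated assumptions: the function name <code>unwrap</code> and the sample
document are hypothetical, the root is assumed not to be in the unwrapped
namespace, and character data directly inside unwrapped elements is
dropped.</p>

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
EDIT = "http://www.example.org/edit"


def unwrap(root, ns):
    """Return a copy of root in which elements from ns are replaced by their children."""

    def expand(node):
        children = []
        for child in node:
            children.extend(expand(child))
        if node.tag.startswith("{" + ns + "}"):
            # Ignore the element itself; its children attach to its parent.
            return children
        copy = ET.Element(node.tag, node.attrib)
        copy.text = node.text
        copy.extend(children)
        return [copy]

    return expand(root)[0]


doc = ET.fromstring(
    '<ul xmlns="http://www.w3.org/1999/xhtml"'
    ' xmlns:e="http://www.example.org/edit">'
    "<li>kept</li><e:inserted><li>new</li></e:inserted>"
    "</ul>"
)
flat = unwrap(doc, EDIT)
print([child.tag for child in flat])
```

<p>After unwrapping, both <code>li</code> elements are direct children of
<code>ul</code>, which is the shape the XHTML schema expects.</p>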
<p>The above schema does not deal with validating the edit
namespace. Let us suppose that <code>inserted</code> and
<code>deleted</code> elements cannot nest. Our schema
<code>edit.rnc</code> for the edit namespace is just two lines:</p>
default namespace = "http://www.example.org/edit"
element inserted|deleted { empty }
<p>The following NRL schema would allow validation of the edit
namespace:</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl" startMode="root">
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml.rng" useMode="xhtml"/>
<mode name="xhtml" extends="noEdit">
<namespace ns="http://www.example.org/edit">
<validate schema="edit.rnc"
schemaType="application/x-rnc"
<unwrap useMode="noEdit"/>
<namespace ns="http://www.w3.org/1999/xhtml">
<p>The above schema is still not quite right. Suppose a
<code>title</code> element was both inserted and deleted. With the
above NRL schema, XHTML validation would see two <code>title</code>
elements, which would cause an error. We should instead do XHTML
validation twice, once including the content of the
<code>inserted</code> elements and ignoring the content of the
<code>deleted</code> elements and once doing the opposite. We only
need to validate the edit elements once. The following NRL schema
accomplishes this:</p>
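<p>The two validation passes correspond to two views of the same document.
The Python sketch below builds both views from an element tree; it only
illustrates the idea and is not how an NRL implementation works internally.
The function name <code>view</code> and the sample document are
hypothetical, and character data directly inside edit elements is
dropped.</p>

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
EDIT = "http://www.example.org/edit"


def view(root, keep, drop):
    """Copy root, unwrapping edit elements named keep and omitting those named drop."""
    keep_tag = "{%s}%s" % (EDIT, keep)
    drop_tag = "{%s}%s" % (EDIT, drop)

    def expand(node):
        if node.tag == drop_tag:
            return []  # this content is not seen by the XHTML validation
        children = [c for child in node for c in expand(child)]
        if node.tag == keep_tag:
            return children  # unwrap: children attach to the parent
        copy = ET.Element(node.tag, node.attrib)
        copy.text = node.text
        copy.extend(children)
        return [copy]

    return expand(root)[0]


head = ET.fromstring(
    '<head xmlns="http://www.w3.org/1999/xhtml"'
    ' xmlns:e="http://www.example.org/edit">'
    "<e:deleted><title>Old</title></e:deleted>"
    "<e:inserted><title>New</title></e:inserted>"
    "</head>"
)
new = view(head, keep="inserted", drop="deleted")
old = view(head, keep="deleted", drop="inserted")
print([t.text for t in new], [t.text for t in old])
```

<p>Each view contains exactly one <code>title</code> element, so each can
be validated against the XHTML schema on its own.</p>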
<rules xmlns="http://www.thaiopensource.com/validate/nrl" startMode="root">
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml.rng" useMode="new"/>
<validate schema="xhtml.rng" useMode="old"/>
<mode name="new" extends="noEdit">
<namespace ns="http://www.example.org/edit">
<validate schema="edit.rnc"
schemaType="application/x-rnc"
<unwrap useMode="noEdit">
<context path="deleted" useMode="#allow"/>
<mode name="old" extends="noEdit">
<namespace ns="http://www.example.org/edit">
<unwrap useMode="noEdit">
<context path="inserted" useMode="#allow"/>
<namespace ns="http://www.w3.org/1999/xhtml">
<h2>Related work</h2>
<p>The fundamental idea of dividing the instance into sections, each
of which contains elements from a single namespace, and then
validating each section separately against the schema for its
namespace originated in Murata Makoto's <bib ref="relaxns">RELAX
Namespace</bib>. ISO/IEC JTC1/SC34 (the ISO subcommittee responsible
for Document Description and Processing Languages) is developing
ISO/IEC 19757 Document Schema Definition Languages (DSDL) as a
multi-part standard. A <bib ref="N363">Committee Draft (CD) of Part
4: Selection of Validation Candidates</bib>, which was based on RELAX
Namespace, has been approved. Comments on the CD have been <bib
ref="N415">resolved</bib>. <bib ref="mns">MNS</bib>, the predecessor
to NRL, was input to the CD comment resolution process. In response
to MNS, Rick Jelliffe produced the <bib ref="nsswitch">Namespace
Switchboard</bib>, which was also input to the CD comment resolution
process. Some of the evolution of NRL from MNS was inspired by the
Namespace Switchboard. A Final Committee Draft (FCD) of Part 4 is
currently in preparation; NRL will be submitted as input.</p>
<p>At this stage, no guarantees can be made about how NRL will relate
to the FCD. In the opinion of this document's author and of the DSDL
Part 4 project editor (Murata Makoto), the functionality is likely to
be similar, with the following possible exceptions:</p>
<ul>
<li>There are concerns about <href>Element-name context</href>: some
feel it is too complicated; some feel it is too simple.</li>
<li>The functionality corresponding to <href>Transparent
namespaces</href> was rejected on the last occasion it was discussed;
one reason was the lack of implementation experience. It is hoped that
this can be reconsidered in the light of NRL.</li>
<li>The functionality provided by the <code>option</code> element in
<href>Specifying the schema</href> has not yet been considered for the
FCD.</li>
</ul>
<p>However, the syntax may well be different. In particular:</p>
<ul>
<li>Names of elements and attributes may be different.</li>
<li>Syntactic sugar for modes may be different. The FCD may not
provide <href>Mode inheritance</href>. The FCD may use nesting to
avoid the need to name modes in some cases.</li>
<li>The FCD is expected to provide syntactic sugar for an action
equivalent to <code><attach useMode="<var>x</var>"/></code>,
where <var>x</var> is a built-in mode like <code>#allow</code> except
that it allows attributes as well as elements. The idea is to allow
subschemas to use empty elements as placeholders.</li>
<li>The FCD is expected to provide a schema inclusion mechanism (not just
using NRL as a subschema).</li>
<li>The FCD is expected to allow inline schemas, for example, by
allowing <code>validate</code> to have a <code>schema</code> element
containing the schema as an alternative to the <code>schema</code>
attribute containing the schema's URL.</li>
</ul>
<p>The group working on DSDL (SC34/WG1) welcomes public discussion of
DSDL. Comments on NRL would be useful input to the Part 4 FCD
preparation process. See the <bib ref="dsdl.org">DSDL web site</bib>
for information on how to make comments.</p>
<h2>Acknowledgements</h2>
<p>Thanks to Murata Makoto and Rick Jelliffe for helpful comments.</p>
<bibentry name="dsdl.org">DSDL Web Site,
<url>http://www.dsdl.org</url></bibentry>
<bibentry name="N363">Committee Draft of Document Schema Definition Languages
(DSDL) -- Part 4: Selection of Validation Candidates,
<url>http://www.y12.doe.gov/sgml/sc34/document/0363.htm</url></bibentry>
<bibentry name="N415">Comment Disposition of Committee Draft
Ballot of Document Schema Definition Languages (DSDL) -- Part 4:
Selection of Validation Candidates,
<url>http://www.y12.doe.gov/sgml/sc34/document/0415.htm</url></bibentry>
<bibentry name="jing">Jing,
<url>http://www.thaiopensource.com/relaxng/jing.html</url></bibentry>
<bibentry name="nsswitch">Namespace Switchboard,
<url>http://www.topologi.com/resources/NamespaceSwitchboard.html</url></bibentry>
<bibentry name="relaxns">RELAX Namespace,
<url>http://www.y-adagio.com/public/standards/tr_relax_ns/toc.htm</url></bibentry>
<bibentry name="relaxcore">RELAX Core,
<url>http://www.xml.gr.jp/relax/</url></bibentry>
<bibentry name="relaxng">RELAX NG, <url>http://relaxng.org</url></bibentry>
<bibentry name="compact">RELAX NG Compact Syntax,
<url>http://www.oasis-open.org/committees/relax-ng/compact-20021121.html</url></bibentry>
<bibentry name="dtdcompat">RELAX NG DTD Compatibility,
<url>http://www.oasis-open.org/committees/relax-ng/compatibility-20011203.html</url></bibentry>
<bibentry name="schematron">Schematron,
<url>http://www.ascc.net/xml/resource/schematron/schematron.html</url></bibentry>
<bibentry name="wxs">W3C XML Schema,
<url>http://www.w3.org/TR/xmlschema-1/</url></bibentry>
<bibentry name="mns">Modular Namespaces (MNS),
<url>http://www.thaiopensource.com/relaxng/mns.html</url></bibentry>
<p>NRL elements can be extended with arbitrary attributes provided the
attributes are namespace qualified and their namespace is not the NRL
namespace; they can also be extended with arbitrary child elements
with any namespace (including the absent namespace) other than the NRL
namespace. We could provide a RELAX NG schema that fully described
NRL, but the extensibility would make the schema harder to understand.
So instead we provide a RELAX NG schema (in compact syntax) that does
not allow extensibility, and provide an NRL schema to make it
extensible.</p>
<p>Thus, NRL is described by the following NRL schema:</p>
<rules xmlns="http://www.thaiopensource.com/validate/nrl" startMode="root">
  <mode name="root">
    <namespace ns="http://www.thaiopensource.com/validate/nrl">
      <validate schema="nrl.rnc" schemaType="application/x-rnc" useMode="extend"/>
    </namespace>
  </mode>
  <mode name="extend">
    <namespace ns="http://www.thaiopensource.com/validate/nrl" match="attributes">
      <reject/>
    </namespace>
    <namespace ns="" match="attributes">
      <attach/>
    </namespace>
    <anyNamespace match="elements attributes">
      <allow useMode="#attach"/>
    </anyNamespace>
  </mode>
</rules>
<p>where <code>nrl.rnc</code> is as follows:</p>
default namespace = "http://www.thaiopensource.com/validate/nrl"

start =
  element rules {
    schemaType?,
    (rule* | (attribute startMode { modeName }, mode+))
  }

mode =
  element mode {
    attribute name { userModeName },
    attribute extends { modeName }?,
    rule*
  }

rule =
  element namespace {
    attribute ns { xsd:anyURI },
    ruleModel
  }
  | element anyNamespace { ruleModel }

ruleModel = attribute match { elementsOrAttributes }?, actions

elementsOrAttributes =
  list {
    ("elements", "attributes")
    | ("attributes", "elements")
    | "elements"
    | "attributes"
  }

actions =
  noResultAction*, (noResultAction|resultAction), noResultAction*

noResultAction =
  element validate {
    attribute schema { xsd:anyURI },
    schemaType?,
    option*,
    modeUsage
  }
  | element allow|reject { modeUsage }

resultAction =
  element attach|unwrap { modeUsage }

option =
  element option {
    attribute name { xsd:anyURI },
    attribute arg { text }?,
    attribute mustSupport { xsd:boolean }?
  }

modeUsage =
  attribute useMode { modeName }?,
  element context {
    attribute path { path },
    attribute useMode { modeName }?
  }*

modeName = userModeName | builtinModeName

userModeName = xsd:NCName
builtinModeName = "#attach" | "#allow" | "#reject" | "#unwrap"

schemaType = attribute schemaType { mediaType }
mediaType = xsd:string # should do better than this

path =
  xsd:string {
    pattern = "\s*(/\s*)?\i\c*(\s*/\s*\i\c*)*\s*"
              ~ "(|\s*(/\s*)?\i\c*(\s*/\s*\i\c*)*\s*)*"
  }
<h2>Formal semantics</h2>
<p>In order to describe the semantics of NRL, it is convenient to
construct a new section-based data model. This data model is
constructed from the RELAX NG data model. An implementation wouldn't
actually have to construct this, but the semantics are simpler to
describe in terms of this data model than in terms of the RELAX
NG data model. Note that its information content is exactly
equivalent to that of the RELAX NG data model.</p>
<p>There are two kinds of section: attribute sections and element
sections. Two attributes belong to the same section iff they have the
same parent and the same namespace URI. An element belongs to the
same section as its parent iff it has the same namespace URI as its
parent. An attribute section is simply a non-empty unordered set of
attributes (as in RELAX NG), where each member of the set has the same
namespace URI. An element section is a little more complicated. First
we need the concept of a node. There are three kinds of node: an
element node, a text node and a slot node. An element node has a
name, a context (as in RELAX NG), and a list of child nodes. A text
node has a string value. A slot node has no additional information; it
is merely a placeholder for an element section. A list of child nodes
never has two adjacent text nodes and never has two adjacent slot
nodes. An element section is a triple <<i>nd</i>, <i>lsa</i>,
<i>lle</i>>, where <i>nd</i> is an element node, <i>lsa</i> is a list
of unordered sets of attribute sections, and <i>lle</i> is a list of
lists of element sections. <i>lsa</i> has one member for each element
node in <i>nd</i>. The unordered set of attribute sections that is the
<i>n</i>-th member of <i>lsa</i> gives the attributes for the
<i>n</i>-th element node in <i>nd</i> (iterating in document order).
<i>lle</i> has one member for each slot node in <i>nd</i>. The list
of element sections that is the <i>n</i>-th member of <i>lle</i>
corresponds to the <i>n</i>-th slot node in <i>nd</i> (iterating in
document order).</p>
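<p>As an illustration of the two membership rules above, the following
Haskell sketch partitions the attributes of a single element into
attribute sections and tests whether a child element stays in its
parent's section. The names <code>attributeSections</code>,
<code>attributeNs</code> and <code>inParentSection</code>, and the
representation of an attribute as a name/value pair, are assumptions
made for this example only.</p>

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

type Uri = String
type LocalName = String
type QName = (Uri, LocalName)
type Attribute = (QName, String)
type AttributeSection = [Attribute]

-- Namespace URI of a single attribute ("" is the absent namespace).
attributeNs :: Attribute -> Uri
attributeNs ((ns, _), _) = ns

-- Partition the attributes of ONE element into attribute sections.
-- Two attributes land in the same section iff they have the same
-- namespace URI; every section returned is non-empty.
-- (Hypothetical helper: sections are returned sorted by namespace URI,
-- although the data model treats them as an unordered set.)
attributeSections :: [Attribute] -> [AttributeSection]
attributeSections = groupBy ((==) `on` attributeNs) . sortOn attributeNs

-- An element belongs to the same section as its parent iff both
-- element names have the same namespace URI.
inParentSection :: QName -> QName -> Bool
inParentSection (parentNs, _) (childNs, _) = parentNs == childNs
```

<p>For instance, an element with two attributes in the absent namespace
and one attribute in some other namespace yields two attribute
sections, one per namespace URI.</p>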
<p>An NRL schema consists of a set of modes. A mode consists of a set
of rules. A mode maps a section to an action based on the section's
namespace URI and on whether the section is an attribute section or an
element section. An action can be applied to element sections and
attribute sections. An action returns two values, one of which is
always error information. When an action is applied to an element
section, it returns error information and a (possibly empty) list of
element sections. When an action is applied to an attribute section,
it returns error information and, optionally, an attribute section.</p>
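<p>The view of a mode as a mapping from section kind and namespace URI
to an action can be sketched in Haskell as follows. This is only an
illustration: the <code>Rule</code> and <code>NsTest</code> types, the
<code>mkMode</code> function, the explicit fallback action, and the
choice to let a rule for a specific namespace take priority over an
any-namespace rule are all assumptions of this sketch, and the
<code>Action</code> type is reduced to bare names.</p>

```haskell
type Uri = String

data ElementsOrAttributes = Elements | Attributes
  deriving (Eq, Show)

-- Stand-in for the real action type; only the action name is kept.
data Action = Attach | Unwrap | Allow | Reject
  deriving (Eq, Show)

-- A mode maps the kind of section and its namespace URI to an action.
type Mode = ElementsOrAttributes -> Uri -> Action

-- What a rule matches: one specific namespace URI, or any namespace.
data NsTest = Namespace Uri | AnyNamespace

-- A rule: a namespace test, the kinds of section it applies to,
-- and the action to use for matching sections.
data Rule = Rule NsTest [ElementsOrAttributes] Action

-- Build a mode from a list of rules (hypothetical helper).
-- Assumption: a rule naming the namespace wins over an AnyNamespace
-- rule, and 'fallback' is used when no rule matches at all.
mkMode :: Action -> [Rule] -> Mode
mkMode fallback rules kind ns =
  case specific ++ general of
    (action : _) -> action
    []           -> fallback
  where
    applies (Rule _ kinds _) = kind `elem` kinds
    specific = [a | r@(Rule (Namespace u) _ a) <- rules, u == ns, applies r]
    general  = [a | r@(Rule AnyNamespace _ a) <- rules, applies r]
```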
<p>In the NRL syntax, a rule can contain multiple actions. This is
represented in the formalization using a Sequence action. A
Sequence action discards the results (other than error information) of
its first action. Only two actions can produce results other than
error information: attach and unwrap. The NRL syntax allows at most
one such action in a rule. When constructing a sequence representing
the set of actions in a rule, this action, if any, must be the last
action in the sequence.</p>
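<p>The construction of a sequence from the actions of a rule might look
like the following Haskell sketch. It uses a simplified
<code>Action</code> type in which modes and schema URIs are plain
strings; the helper names <code>isResultAction</code> and
<code>sequenceActions</code> are assumptions of this example. The
result action, if present, is moved to the end so that its results are
the ones that survive.</p>

```haskell
import Data.List (partition)

-- Simplified action type: modes and schema URIs are plain strings
-- here so that the example is self-contained.
data Action = Attach String
            | Unwrap String
            | Allow String
            | Reject String
            | Validate String String
            | Sequence Action Action
            deriving (Eq, Show)

-- Only attach and unwrap produce results other than error information.
isResultAction :: Action -> Bool
isResultAction (Attach _) = True
isResultAction (Unwrap _) = True
isResultAction _          = False

-- Combine the actions of one rule into a single action.
-- The result action (if any) is placed last, because a Sequence keeps
-- only the results of its second action.
-- Returns Nothing for a rule with no actions or with more than one
-- result action, neither of which the NRL syntax allows.
sequenceActions :: [Action] -> Maybe Action
sequenceActions actions
  | length results > 1 = Nothing
  | null ordered       = Nothing
  | otherwise          = Just (foldr1 Sequence ordered)
  where
    (results, others) = partition isResultAction actions
    ordered = others ++ results
```

<p>With this sketch, a rule that lists an attach action before a
validate action still yields a sequence whose final action is the
attach.</p>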
<p>Here is a formalization in Haskell:</p>
type Uri = String
type LocalName = String
type QName = (Uri, LocalName)
type Prefix = String
type Context = (Uri, [(Prefix, Uri)])

data Node = ElementNode QName Context [Node]
          | TextNode String
          | SlotNode

type AttributeSection = [(QName, String)]

data ElementSection = ElementSection Node [[AttributeSection]] [[ElementSection]]
data ElementsOrAttributes = Elements | Attributes

type Mode = ElementsOrAttributes -> Uri -> Action

data Action = Attach Mode
            | Unwrap Mode
            | Allow Mode
            | Reject Mode
            | Validate Uri Mode
            | Sequence Action Action

data ErrorReport = AttributeError AttributeSection String
                 | ElementError ElementSection String

type ErrorInfo = [ErrorReport]

data Validated a = Validated ErrorInfo a

applyElementAction :: Action -> ElementSection -> Validated [ElementSection]

applyElementAction (Reject m) e@(ElementSection nd lsa lle) =
  Validated ([ElementError e "namespace rejected"]
              ++ (errors (plsa m lsa))
              ++ (errors (plle m lle)))
            []
applyElementAction (Attach m) (ElementSection nd lsa lle)
  = listV (elementSectionV nd (plsa m lsa) (plle m lle))
applyElementAction (Unwrap m) (ElementSection _ _ lle) = ple m (concat lle)
applyElementAction (Allow m) (ElementSection nd lsa lle)
  = valid2 (\x y -> []) (plsa m lsa) (plle m lle)
applyElementAction (Validate s m) (ElementSection nd lsa lle)
  = Validated (validate s (elementSectionV nd (plsa m lsa) (plle m lle)))
              []
applyElementAction (Sequence a1 a2) e
  = actionSequence (applyElementAction a1 e) (applyElementAction a2 e)

validate :: Uri -> Validated ElementSection -> ErrorInfo
validate uri (Validated errs e) = errs ++ (validateElement uri e)

elementSectionV :: Node -> Validated [[AttributeSection]] -> Validated [[ElementSection]] -> Validated ElementSection
elementSectionV nd lsa lle = valid2 (ElementSection nd) lsa lle

applyAttributeAction :: Action -> AttributeSection -> Validated (Maybe AttributeSection)
applyAttributeAction (Allow m) a = Validated [] Nothing
applyAttributeAction (Reject m) a = Validated [AttributeError a "namespace rejected"] Nothing
applyAttributeAction (Attach m) a = Validated [] (Just a)
applyAttributeAction (Unwrap _) _ = Validated [] Nothing
applyAttributeAction (Validate s m) a
  = Validated (validateAttribute s a) Nothing
applyAttributeAction (Sequence a1 a2) a
  = actionSequence (applyAttributeAction a1 a) (applyAttributeAction a2 a)

-- keep the errors of both actions but only the result of the second
actionSequence :: Validated a -> Validated a -> Validated a
actionSequence (Validated errs1 _) (Validated errs2 x) = Validated (errs1 ++ errs2) x

-- these are provided by an external validation library

validateElement :: Uri -> ElementSection -> ErrorInfo
validateElement _ _ = []

validateAttribute :: Uri -> AttributeSection -> ErrorInfo
validateAttribute _ _ = []

-- processing functions

pe :: Mode -> ElementSection -> Validated [ElementSection]
pe m e = applyElementAction (m Elements (elementSectionNs e)) e

ple :: Mode -> [ElementSection] -> Validated [ElementSection]
ple m le = concatMapV (pe m) le

plle :: Mode -> [[ElementSection]] -> Validated [[ElementSection]]
plle m lle = mapV (ple m) lle

pa :: Mode -> AttributeSection -> Validated (Maybe AttributeSection)
pa m a = applyAttributeAction (m Attributes (attributeSectionNs a)) a

psa :: Mode -> [AttributeSection] -> Validated [AttributeSection]
psa m sa = dropMapV (pa m) sa

plsa :: Mode -> [[AttributeSection]] -> Validated [[AttributeSection]]
plsa m lsa = mapV (psa m) lsa

elementSectionNs :: ElementSection -> Uri
elementSectionNs (ElementSection (ElementNode (ns, _) _ _) _ _) = ns

attributeSectionNs :: AttributeSection -> Uri
attributeSectionNs (((ns, _),_):_) = ns

-- functions for the Validated type

errors :: Validated a -> ErrorInfo
errors (Validated e _) = e

valid1 :: (a -> b) -> Validated a -> Validated b
valid1 f (Validated e x) = Validated e (f x)

valid2 :: (a -> b -> c) -> Validated a -> Validated b -> Validated c
valid2 f (Validated e x) (Validated e' y) = Validated (e ++ e') (f x y)

listV :: Validated a -> Validated [a]
listV x = valid1 (\y -> [y]) x

mapV :: (a -> Validated b) -> [a] -> Validated [b]
mapV f [] = Validated [] []
mapV f (x:xs) = valid2 (\ x xs -> (x:xs)) (f x) (mapV f xs)

concatMapV :: (a -> Validated [b]) -> [a] -> Validated [b]
concatMapV f xs = valid1 concat (mapV f xs)

dropMapV :: (a -> Validated (Maybe b)) -> [a] -> Validated [b]
dropMapV f [] = Validated [] []
dropMapV f (x:xs) = valid2 maybeCons (f x) (dropMapV f xs)

maybeCons :: (Maybe a) -> [a] -> [a]
maybeCons Nothing x = x
maybeCons (Just x) xs = (x:xs)
<p>This does not yet deal with element-name context. To deal with this,
we would need to change each of the Actions that has a Mode parameter
to take a more complex structure.</p>
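<p>One possible shape for such a structure is sketched below in
Haskell. It is not taken from NRL: the names <code>ModeUsage</code>,
<code>Path</code>, <code>matches</code> and <code>resolveMode</code>,
the use of local names only, and the rule that the first matching
context wins are all assumptions made for illustration. The only part
grounded in the schema above is that a path may start with
<code>/</code> and consists of names separated by <code>/</code>, with
alternatives separated by <code>|</code>.</p>

```haskell
import Data.List (isSuffixOf)

type LocalName = String

-- One alternative of a context path: whether it is anchored at the
-- root (leading "/"), and the element names it lists, outermost first.
data Path = Path Bool [LocalName]
  deriving (Eq, Show)

-- A default mode plus context-specific overrides. The type parameter
-- m stands for the Mode type of the formalization; each override
-- carries the alternatives of one path attribute.
data ModeUsage m = ModeUsage m [([Path], m)]

-- Does a path match the element names leading to the current element
-- (outermost first)? An anchored path must match the whole list;
-- an unanchored path only needs to match its end.
matches :: [LocalName] -> Path -> Bool
matches ancestors (Path True names)  = names == ancestors
matches ancestors (Path False names) = names `isSuffixOf` ancestors

-- Pick the mode to use in a given element-name context.
-- Assumption: the first override with a matching path wins;
-- otherwise the default mode is used.
resolveMode :: ModeUsage m -> [LocalName] -> m
resolveMode (ModeUsage def contexts) ancestors =
  case [m | (paths, m) <- contexts, any (matches ancestors) paths] of
    (m : _) -> m
    []      -> def
```

<p>Under these assumptions, each <code>Mode</code> parameter of an
action would be replaced by a <code>ModeUsage Mode</code>, and the
processing functions would call <code>resolveMode</code> with the
names of the enclosing elements before looking up the action for a
child section.</p>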