4
>Class-based processing of the node tree</TITLE
7
CONTENT="Modular DocBook HTML Stylesheet Version 1.46"><LINK
9
TITLE="The PXP user's guide"
10
HREF="index.html"><LINK
13
HREF="c533.html"><LINK
15
TITLE="How to parse a document from an application"
16
HREF="x550.html"><LINK
18
TITLE="Example: An HTML backend for the readme
20
HREF="x740.html"><LINK
23
HREF="markup.css"></HEAD
42
>The PXP user's guide</TH
57
>Chapter 2. Using <SPAN
80
>2.3. Class-based processing of the node tree</A
83
>By default, the parsed node tree consists of objects of the same class; this is
84
a good design as long as you only want to access selected parts of the
85
document. For complex transformations, it may be better to use different
86
classes for objects describing different element types.</P
88
>For example, if the DTD declares the element types <TT
98
>, and if the task is to convert
99
an arbitrary document into a printable format, the idea is to define for every
100
element type a separate class that has a method <TT
114
>, and every class implements
118
> such that elements of the type corresponding to the
119
class are converted to the output format.</P
121
>The parser supports such a design directly. As it is impossible to derive
122
recursive classes in O'Caml<A
126
>, the specialized element classes cannot be formed by
127
simply inheriting from the built-in classes of the parser and adding methods
128
for customized functionality. To get around this limitation, every node of the
129
document tree is represented by <I
132
> objects, one called
133
"the node" and containing the recursive definition of the tree, one called "the
134
extension". Every node object has a reference to the extension, and the
135
extension has a reference to the node. The advantage of this model is that it
136
is now possible to customize the extension without affecting the typing
137
constraints of the recursive node definition.</P
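
The two-object arrangement can be illustrated without the parser. The following is a minimal sketch, assuming a simplified record type "node" and a hypothetical extension class "printer"; neither definition comes from PXP, whose real node type is a class with many more methods. It only shows that the tree type stays generic in its extension while every extension keeps a back reference to its node.

```ocaml
(* Hypothetical miniature of the node/extension pairing; none of these
   definitions are taken from PXP. The tree type is generic in 'ext. *)
type 'ext node = {
  name : string;
  extension : 'ext;
  mutable children : 'ext node list;
}

(* Base class of all extensions: it stores the back reference to the node. *)
class virtual printer =
  object
    val mutable node : printer node option = None
    method set_node n = node <- Some n
    method node =
      match node with
      | Some n -> n
      | None -> failwith "extension is not attached to a node"
    method virtual print : Buffer.t -> unit
  end

(* A container: prints its children between parentheses. *)
class group =
  object (self)
    inherit printer
    method print buf =
      Buffer.add_char buf '(';
      List.iter
        (fun (n : printer node) -> n.extension # print buf)
        (self # node).children;
      Buffer.add_char buf ')'
  end

(* A leaf: prints the name of its own node. *)
class leaf =
  object (self)
    inherit printer
    method print buf = Buffer.add_string buf (self # node).name
  end

(* Create a node and tie node and extension together in both directions. *)
let make name (ext : printer) children =
  let n = { name; extension = ext; children } in
  ext # set_node n;
  n

let tree =
  make "a" (new group) [ make "x" (new leaf) []; make "y" (new leaf) [] ]

let render () =
  let buf = Buffer.create 16 in
  tree.extension # print buf;
  Buffer.contents buf
```

New behaviour is added by writing further subclasses of the extension class; the recursive tree type itself never has to be subclassed.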
139
>Every extension must have the three methods <TT
153
> creates a deep copy of the extension object and
157
> returns the node object for this extension
161
> is used to tell the extension object
162
which node is associated with it; this method is called automatically when the
163
node tree is initialized. The following definition is a good starting point
164
for these methods; usually <TT
167
> must be further refined
168
when instance variables are added to the class:
171
CLASS="PROGRAMLISTING"
172
>class custom_extension =
175
val mutable node = (None : custom_extension node option)
177
method clone = {< >}
189
This part of the extension is usually the same for all classes, so it is a good
192
>custom_extension</TT
193
> as the super-class of the
194
further class definitions. Continuing the example above, we can define the
195
element type classes as follows:
198
CLASS="PROGRAMLISTING"
199
>class virtual custom_extension =
201
... clone, node, set_node defined as above ...
203
method virtual print : out_channel -> unit
208
inherit custom_extension
209
method print ch = ...
214
inherit custom_extension
215
method print ch = ...
220
inherit custom_extension
221
method print ch = ...
228
> can now be implemented for every element
229
type separately. Note that you get the associated node by invoking
232
CLASS="PROGRAMLISTING"
236
and you get the extension object of a node <TT
242
CLASS="PROGRAMLISTING"
246
It is guaranteed that
249
CLASS="PROGRAMLISTING"
250
>self # node # extension == self</PRE
255
>Here are sample definitions of the <TT
262
CLASS="PROGRAMLISTING"
265
inherit custom_extension
267
(* Nodes <a>...</a> are only containers: *)
268
output_string ch "(";
270
(fun n -> n # extension # print ch)
271
(self # node # sub_nodes);
272
output_string ch ")";
277
inherit custom_extension
279
(* Print the value of the CDATA attribute "print": *)
280
match self # node # attribute "print" with
281
Value s -> output_string ch s
282
| Implied_value -> output_string ch "<missing>"
283
| Valuelist _ -> assert false
284
(* not possible because the att is CDATA *)
289
inherit custom_extension
291
(* Print the contents of this element: *)
292
output_string ch (self # node # data)
295
class null_extension =
297
inherit custom_extension
298
method print ch = assert false
302
>The remaining task is to configure the parser such that these extension classes
303
are actually used. Here another problem arises: it is not possible to
304
dynamically select the class of an object to be created. As a workaround,
308
> allows the user to specify <I
312
the various element types; instead of creating the nodes of the tree by
316
> operator, the nodes are produced by
317
duplicating the exemplars. As object duplication preserves the class of the
318
object, one can create fresh objects of every class for which an
319
exemplar has been registered.</P
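
The exemplar idea does not depend on the parser either. The following is a minimal sketch, assuming hypothetical classes "handler", "title_handler", "para_handler" and "unknown_handler" and a hypothetical function "create"; none of them are part of PXP. It registers one empty object per element type and produces new objects by cloning, so the class of every new object is chosen at run time by a lookup.

```ocaml
(* Hypothetical classes for illustration only; not part of PXP. *)
class virtual handler =
  object
    val mutable text = ""
    method set_text s = text <- s
    method text = text
    method virtual kind : string
    (* Duplication preserves the class of the object being copied. *)
    method clone = {< >}
  end

class title_handler = object inherit handler method kind = "title" end
class para_handler = object inherit handler method kind = "para" end
class unknown_handler = object inherit handler method kind = "unknown" end

(* One empty exemplar per element type, kept in an associative list. *)
let exemplars : (string * handler) list =
  [ "title", new title_handler;
    "para", new para_handler ]

(* Fallback for element types without a registered exemplar. *)
let default_exemplar : handler = new unknown_handler

(* Pick the exemplar by name, clone it, and fill only the clone. *)
let create element_type text =
  let exemplar =
    try List.assoc element_type exemplars
    with Not_found -> default_exemplar in
  let fresh = exemplar # clone in
  fresh # set_text text;
  fresh
```

Because only the clone is modified, the registered exemplars stay empty and can be reused for any number of new objects.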
321
>Exemplars are meant to be objects without contents; the only interesting thing is
322
that exemplars are instances of a certain class. The creation of an exemplar
323
for an element node can be done by:
326
CLASS="PROGRAMLISTING"
327
>let element_exemplar = new element_impl extension_exemplar</PRE
330
And a data node exemplar is created by:
333
CLASS="PROGRAMLISTING"
334
>let data_exemplar = new data_impl extension_exemplar</PRE
344
are defined in the module <TT
348
initialize the fresh objects as empty objects, i.e. without children, without
349
data contents, and so on. The <TT
351
>extension_exemplar</TT
353
initial extension object the exemplars are associated with. </P
355
>Once the exemplars are created and stored somewhere (e.g. in a hash table), you
356
can take an exemplar and create a concrete instance (with contents) by
357
duplicating it. As a user of the parser, you are normally not concerned with this
358
as this is part of the internal logic of the parser, but as background knowledge
359
it is worthwhile to mention that the two methods
367
perform the duplication of the exemplar for which they are invoked,
368
additionally apply modifications to the clone, and finally return the new
369
object. Moreover, the extension object is copied, too, and the new node object
370
is associated with the fresh extension object. Note that this is the reason why
371
every extension object must have a <TT
376
>The configuration of the set of exemplars is passed to the
379
>parse_document_entity</TT
380
> function as the third argument. In our
381
example, this argument can be set up as follows:
384
CLASS="PROGRAMLISTING"
387
~data_exemplar: (new data_impl (new null_extension))
388
~default_element_exemplar: (new element_impl (new null_extension))
390
[ "a", new element_impl (new eltype_a);
391
"b", new element_impl (new eltype_b);
392
"c", new element_impl (new eltype_c);
400
> function argument defines the mapping
401
from element types to exemplars as an associative list. The argument
405
> specifies the exemplar for data nodes, and
408
>~default_element_exemplar</TT
409
> is used whenever the parser
410
finds an element type for which the associative list does not define an
413
>The configuration is now complete. You can still use the same parsing
414
functions; only the initialization is a bit different. For example, call the
418
CLASS="PROGRAMLISTING"
419
>let d = parse_document_entity default_config (from_file "doc.xml") spec</PRE
422
Note that the resulting document <TT
429
> method we added is visible. So you can
430
print your document by
433
CLASS="PROGRAMLISTING"
434
>d # root # extension # print stdout</PRE
437
>This object-oriented approach looks rather complicated; this is mostly caused
438
by working around some problems of the strict typing system of O'Caml. Some
439
auxiliary concepts such as extensions were needed, but the practical
440
consequences are minor. In the next section, one of the examples of the
441
distribution is explained, a converter from <I
445
documents to HTML.</P
461
HREF="x677.html#AEN690"
469
>The problem is that the subclass is
470
usually not a subtype in this case because O'Caml has a contravariant subtyping
515
>How to parse a document from an application</TD
528
>Example: An HTML backend for the <I