
<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Parsers</title><meta name="generator" content="DocBook XSL Stylesheets V1.73.2"><link rel="start" href="index.html" title="libxml++ - An XML Parser for C++"><link rel="up" href="index.html" title="libxml++ - An XML Parser for C++"><link rel="prev" href="index.html" title="libxml++ - An XML Parser for C++"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Parsers</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="index.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;</td></tr></table><hr></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="parsers"></a>Parsers</h2></div></div></div><p>Like the underlying libxml library, libxml++ offers three parsers to suit different needs: the DOM, SAX, and TextReader parsers. This section explains their relative advantages and behaviour.</p><p>All of the parsers can parse XML documents directly from a file on disk, a string, or a C++ std::istream. Although the libxml++ API uses only Glib::ustring, and therefore the UTF-8 encoding, libxml++ can parse documents in any encoding, converting to UTF-8 automatically. This conversion will not lose any information because UTF-8 can represent the characters of any locale.</p><p>Remember that white space is usually significant in XML documents, so the parsers might provide unexpected text nodes that contain only spaces and newlines. 
The parser does not know whether you care about these text nodes, but your application may choose to ignore them.</p><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2849563"></a>DOM Parser</h3></div></div></div><p>The DOM parser parses the whole document at once and stores the structure in memory, available via <code class="literal">Parser::get_document()</code>. With methods such as <code class="literal">Document::get_root_node()</code> and <code class="literal">Node::get_children()</code>, you may then navigate the hierarchy of XML nodes without restriction, jumping forwards or backwards in the document based on the information that you encounter. Because the whole document is held in memory, the DOM parser uses a relatively large amount of memory.</p><p>You should use C++ RTTI (via <code class="literal">dynamic_cast&lt;&gt;</code>) to identify the specific node type and to perform actions which are not possible with all node types. For instance, only <code class="literal">Element</code>s have attributes. Here is the inheritance hierarchy of node types:</p><p>


</p><div class="itemizedlist"><ul type="disc"><li>xmlpp::Node:

<div class="itemizedlist"><ul type="circle"><li>xmlpp::Attribute</li><li>xmlpp::ContentNode

<div class="itemizedlist"><ul type="square"><li>xmlpp::CdataNode</li><li>xmlpp::CommentNode</li><li>xmlpp::ProcessingInstructionNode</li><li>xmlpp::TextNode</li></ul></div></li><li>xmlpp::Element</li><li>xmlpp::EntityReference</li></ul></div></li></ul></div><p>
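The RTTI approach described above can be sketched without the library itself. The following is a minimal illustration that assumes simplified stand-in types in a hypothetical <code class="literal">sketch</code> namespace; they mirror only part of the hierarchy listed above and are not the real xmlpp classes. The function name <code class="literal">describe()</code> is also an assumption made for this example.

```cpp
#include <string>
#include <vector>

// Stand-in types for illustration only; these are NOT the xmlpp classes.
namespace sketch {

struct Node {
  virtual ~Node() = default;  // polymorphic base, so dynamic_cast works
};

struct ContentNode : Node {
  std::string content;
};

struct TextNode : ContentNode {};
struct CommentNode : ContentNode {};

struct Element : Node {
  std::string name;
  std::vector<std::string> attribute_names;  // only elements carry attributes
};

// Describe a node by probing for the most specific type first.
inline std::string describe(const Node* node) {
  if (const auto* element = dynamic_cast<const Element*>(node))
    return "element " + element->name + " with " +
           std::to_string(element->attribute_names.size()) + " attribute(s)";
  if (const auto* text = dynamic_cast<const TextNode*>(node))
    return "text: " + text->content;
  if (const auto* content = dynamic_cast<const ContentNode*>(node))
    return "other content: " + content->content;
  return "unhandled node type";
}

}  // namespace sketch
```

Derived types are tested before their bases, so a text node is reported as text rather than as generic content. With the real library the same pattern would presumably be applied to each pointer returned by <code class="literal">Node::get_children()</code>.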

</p><p>Although you may obtain pointers to the <code class="literal">Node</code>s, these <code class="literal">Node</code>s are always owned by their parent Nodes. In most cases that means that the Node will exist, and your pointer will be valid, as long as the <code class="literal">Document</code> instance exists.</p><p>There are also several methods which can create new child <code class="literal">Node</code>s. By using these, and one of the <code class="literal">Document::write_*()</code> methods, you can use libxml++ to build a new XML document.</p><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="id2849880"></a>Example</h4></div></div></div><p>This example looks in the document for expected elements and then examines them. All these examples are included in the libxml++ source distribution.</p><p><a class="ulink" href="../../../examples/dom_parser" target="_top">Source Code</a></p><p>File: main.cc


</p><pre class="programlisting">

#ifdef HAVE_CONFIG_H
#include &lt;config.h&gt;
}

</pre><p>
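As noted at the start of this chapter, the parsers may deliver text nodes that contain only white space. One way an application might ignore them is to test the text content before acting on it. This is a minimal sketch, assuming the content is already available as a <code class="literal">std::string</code>; the helper name <code class="literal">is_whitespace_only()</code> is hypothetical and not part of libxml++.

```cpp
#include <algorithm>
#include <string>

// Returns true when the text holds nothing but XML white space
// (space, tab, carriage return, line feed). An empty string also counts.
inline bool is_whitespace_only(const std::string& text) {
  return std::all_of(text.begin(), text.end(), [](char c) {
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
  });
}
```

An application walking the DOM could skip any text node whose content passes this test and process only the remaining nodes.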


</p></div></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2849991"></a>SAX Parser</h3></div></div></div><p>The SAX parser presents each node of the XML document in sequence. So when you process one node, you must have already stored information about any relevant previous nodes, and you have no information at that time about subsequent nodes. The SAX parser uses less memory than the DOM parser and it is a suitable abstraction for documents that can be processed sequentially rather than as a whole.</p><p>By using the <code class="literal">parse_chunk()</code> method instead of <code class="literal">parse()</code>, you can even parse parts of the XML document before you have received the whole document.</p><p>As shown in the example, you should derive your own class from SaxParser and override some of the virtual methods. These "handler" methods will be called while the document is parsed.</p><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="id2850032"></a>Example</h4></div></div></div><p>This example shows how the handler methods are called during parsing.</p><p><a class="ulink" href="../../../examples/sax_parser" target="_top">Source Code</a></p><p>File: myparser.h



</p><pre class="programlisting">

#ifndef __LIBXMLPP_EXAMPLES_MYPARSER_H
#define __LIBXMLPP_EXAMPLES_MYPARSER_H
}

</pre><p>
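Because a SAX handler sees nodes one at a time, it has to remember whatever context it will need later. The following sketch illustrates that idea with a hypothetical <code class="literal">TitleCollector</code> class. It is a stand-in that uses <code class="literal">std::string</code> and simplified signatures; in libxml++ you would derive from <code class="literal">SaxParser</code> and override its virtual handler methods instead.

```cpp
#include <string>
#include <vector>

// Stand-in for a SAX handler class, for illustration only. The method
// names echo SAX-style handlers, but the signatures are simplified.
class TitleCollector {
public:
  void on_start_element(const std::string& name) {
    open_elements_.push_back(name);  // remember where we are
  }

  void on_end_element(const std::string& /*name*/) {
    if (!open_elements_.empty()) open_elements_.pop_back();
  }

  void on_characters(const std::string& text) {
    // Only the stored context tells us which element this text belongs to.
    if (!open_elements_.empty() && open_elements_.back() == "title")
      titles_.push_back(text);
  }

  const std::vector<std::string>& titles() const { return titles_; }

private:
  std::vector<std::string> open_elements_;  // path from the root to the current element
  std::vector<std::string> titles_;         // text seen directly inside "title" elements
};
```

The stack of open element names is the only state kept, which is why memory use stays small compared with holding the whole document. The element name "title" is just an example value.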


</p></div></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2850502"></a>TextReader Parser</h3></div></div></div><p>Like the SAX parser, the TextReader parser is suitable for sequential parsing, but instead of implementing handlers for specific parts of the document, it allows you to detect the current node type, process the node accordingly, and skip forward in the document as much as necessary. Unlike the DOM parser, you may not move backwards in the XML document. And unlike the SAX parser, you need not waste time processing nodes that do not interest you.</p><p>All methods are on the single parser instance, but their result depends on the current context. For instance, use <code class="literal">read()</code> to move to the next node, and <code class="literal">move_to_element()</code> to navigate to child nodes. These methods will return false when no more nodes are available. Then use methods such as <code class="literal">get_name()</code> and <code class="literal">get_value()</code> to examine the elements and their attributes.</p><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="id2850551"></a>Example</h4></div></div></div><p>This example examines each node in turn, then moves to the next node.</p><p><a class="ulink" href="../../../examples/textreader" target="_top">Source Code</a></p><p>File: main.cc



</p><pre class="programlisting">

#ifdef HAVE_CONFIG_H
#include &lt;config.h&gt;
