<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Parsers</title><meta name="generator" content="DocBook XSL Stylesheets V1.49"><link rel="home" href="index.html" title="libxml++ - An XML Parser for C++"><link rel="up" href="index.html" title="libxml++ - An XML Parser for C++"><link rel="previous" href="index.html" title="libxml++ - An XML Parser for C++"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Parsers</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="index.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;</td></tr></table><hr></div><div class="sect1"><div class="titlepage"><div><h2 class="title" style="clear: both"><a name="parsers"></a>Parsers</h2></div></div><p>Like the underlying libxml library, libxml++ provides three parsers to suit different needs: the DOM, SAX, and TextReader parsers. This section explains the relative advantages and behaviour of each.</p><p>All of the parsers can parse XML documents directly from a file on disk, a string, or a C++ std::istream. Although the libxml++ API uses only Glib::ustring, and therefore the UTF-8 encoding, libxml++ can parse documents in any encoding, converting to UTF-8 automatically. This conversion does not lose any information, because UTF-8 can represent every Unicode character.</p><p>Remember that white space is usually significant in XML documents, so the parsers might provide unexpected text nodes that contain only spaces and new lines. The parser does not know whether you care about these text nodes, but your application may choose to ignore them.</p>
<div class="sect2"><div class="titlepage"><div><h3 class="title"><a name="id2402161"></a>DOM Parser</h3></div></div><p>The DOM parser parses the whole document at once and stores the structure in memory, available via <tt>DomParser::get_document()</tt>. With methods such as <tt>Document::get_root_node()</tt> and <tt>Node::get_children()</tt>, you may then navigate the hierarchy of XML nodes without restriction, jumping forwards or backwards in the document based on the information that you encounter. Because the whole document is held in memory, the DOM parser uses a relatively large amount of memory.</p><p>You should use C++ RTTI (via <tt>dynamic_cast<></tt>) to identify the specific node type and to perform actions which are not possible with all node types. For instance, only <tt>Element</tt>s have attributes. Here is the inheritance hierarchy of node types:</p><p>
</p><div class="itemizedlist"><ul type="disc"><li>xmlpp::Node:
<div class="itemizedlist"><ul type="round"><li>xmlpp::Attribute</li><li>xmlpp::ContentNode
<div class="itemizedlist"><ul type="square"><li>xmlpp::CdataNode</li><li>xmlpp::CommentNode</li><li>xmlpp::ProcessingInstructionNode</li><li>xmlpp::TextNode</li></ul></div></li><li>xmlpp::Element</li><li>xmlpp::EntityReference</li></ul></div></li></ul></div><p>
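</p><p>The type test described above can be sketched without the library. This is only an illustration under stated assumptions: the classes in the hypothetical <tt>demo</tt> namespace are empty stand-ins that mirror the names in the hierarchy above, not the real libxml++ types, and <tt>describe()</tt> is an illustrative helper that does not exist in libxml++. The point it shows is the order of the tests: the derived content types are checked before <tt>ContentNode</tt>, because a text or comment node would also pass a <tt>ContentNode</tt> test.</p>

```cpp
#include <string>

// Stand-in classes for illustration only. They mirror the names in the
// hierarchy above but are NOT the real libxml++ types.
namespace demo
{
struct Node
{
  virtual ~Node() {} // A virtual member makes the type polymorphic, which dynamic_cast needs.
};
struct Attribute : Node {};
struct ContentNode : Node {};
struct CdataNode : ContentNode {};
struct CommentNode : ContentNode {};
struct ProcessingInstructionNode : ContentNode {};
struct TextNode : ContentNode {};
struct Element : Node {};
struct EntityReference : Node {};

// Hypothetical helper: returns a label for the most specific type matched.
// Derived content types are tested before their ContentNode base class.
std::string describe(const Node* node)
{
  if(!node)
    return "null";
  if(dynamic_cast<const TextNode*>(node))
    return "text";
  if(dynamic_cast<const CommentNode*>(node))
    return "comment";
  if(dynamic_cast<const ContentNode*>(node))
    return "other content";
  if(dynamic_cast<const Element*>(node))
    return "element";
  return "other node";
}
} // namespace demo
```

<p>With the real library the same pattern would presumably be applied to the <tt>xmlpp::Node</tt> pointers obtained from <tt>Node::get_children()</tt>, casting to <tt>xmlpp::Element</tt> before asking for attributes.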
</p><p>Although you may obtain pointers to the <tt>Node</tt>s, these <tt>Node</tt>s are always owned by their parent Nodes. In most cases that means that the Node will exist, and your pointer will be valid, as long as the <tt>Document</tt> instance exists.</p><p>There are also several methods which can create new child <tt>Node</tt>s. By using these, and one of the <tt>Document::write_*()</tt> methods, you can use libxml++ to build a new XML document.</p><div class="sect3"><div class="titlepage"><div><h4 class="title"><a name="id2402298"></a>Example</h4></div></div><p>This example looks in the document for expected elements and then examines them.</p><p><a href="../../../examples/dom_parser" target="_top">Source Code</a></p><p>File: main.cc
<pre class="programlisting">
#ifdef HAVE_CONFIG_H
#include <config.h>
#endif

#include <libxml++/libxml++.h>

#include <iostream>

void print_indentation(unsigned int indentation)
{
  for(unsigned int i = 0; i < indentation; ++i)
    std::cout << " ";
}

void print_node(const xmlpp::Node* node, unsigned int indentation = 0)
{
  std::cout << std::endl; //Separate nodes by an empty line.

  const xmlpp::ContentNode* nodeContent = dynamic_cast<const xmlpp::ContentNode*>(node);
  const xmlpp::TextNode* nodeText = dynamic_cast<const xmlpp::TextNode*>(node);
  const xmlpp::CommentNode* nodeComment = dynamic_cast<const xmlpp::CommentNode*>(node);

  if(nodeText && nodeText->is_white_space()) //Ignore the indentation. You don't always want to do this.
    return;

  const Glib::ustring nodename = node->get_name();

  if(!nodeText && !nodeComment && !nodename.empty()) //Don't print a name for text and comment nodes.
  {
    print_indentation(indentation);
    std::cout << "Node name = " << nodename << std::endl;
  }
  else if(nodeText) //Say when it is a text node.
  {
    print_indentation(indentation);
    std::cout << "Text Node" << std::endl;
  }

  //Treat the various node types differently:
  if(nodeText)
  {
    print_indentation(indentation);
    std::cout << "text = \"" << nodeText->get_content() << "\"" << std::endl;
  }
  else if(nodeComment)
  {
    print_indentation(indentation);
    std::cout << "comment = " << nodeComment->get_content() << std::endl;
  }
  else if(nodeContent)
  {
    print_indentation(indentation);
    std::cout << "content = " << nodeContent->get_content() << std::endl;
  }
  else if(const xmlpp::Element* nodeElement = dynamic_cast<const xmlpp::Element*>(node))
  {
    //A normal Element node:

    //get_line() works only for Element nodes.
    print_indentation(indentation);
    std::cout << "     line = " << node->get_line() << std::endl;

    //Print attributes:
    const xmlpp::Element::AttributeList& attributes = nodeElement->get_attributes();
    for(xmlpp::Element::AttributeList::const_iterator iter = attributes.begin(); iter != attributes.end(); ++iter)
    {
      const xmlpp::Attribute* attribute = *iter;
      print_indentation(indentation);
      std::cout << "  Attribute " << attribute->get_name() << " = " << attribute->get_value() << std::endl;
    }

    //Look up one attribute by name. The result is null if it is absent.
    const xmlpp::Attribute* attribute = nodeElement->get_attribute("title");
    if(attribute)
    {
      print_indentation(indentation);
      std::cout << "title found: " << attribute->get_value() << std::endl;
    }
  }

  if(!nodeContent)
  {
    //Recurse through child nodes:
    xmlpp::Node::NodeList list = node->get_children();
    for(xmlpp::Node::NodeList::iterator iter = list.begin(); iter != list.end(); ++iter)
    {
      print_node(*iter, indentation + 2); //recursive
    }
  }
}

int main(int argc, char* argv[])
{
  Glib::ustring filepath;
  if(argc > 1)
    filepath = argv[1]; //Allow the user to specify a different XML file to parse.
  else
    filepath = "example.xml";

  try
  {
    xmlpp::DomParser parser;
    parser.set_validate();
    parser.set_substitute_entities(); //We just want the text to be resolved/unescaped automatically.
    parser.parse_file(filepath);
    if(parser)
    {
      //Walk the tree:
      const xmlpp::Node* pNode = parser.get_document()->get_root_node(); //deleted by DomParser.
      print_node(pNode);
    }
  }
  catch(const std::exception& ex)
  {
    std::cout << "Exception caught: " << ex.what() << std::endl;
  }

  return 0;
}
</pre>
</p></div></div><div class="sect2"><div class="titlepage"><div><h3 class="title"><a name="id2402334"></a>SAX Parser</h3></div></div><p>The SAX parser presents each node of the XML document in sequence. So when you process one node, you must have already stored information about any relevant previous nodes, and you have no information at that time about subsequent nodes. The SAX parser uses less memory than the DOM parser and it is a suitable abstraction for documents that can be processed sequentially rather than as a whole.</p><p>By using the <tt>parse_chunk()</tt> method instead of <tt>parse()</tt>, you can even parse parts of the XML document before you have received the whole document.</p><p>As shown in the example, you should derive your own class from SaxParser and override some of the virtual methods. These "handler" methods will be called while the document is parsed.</p><div class="sect3"><div class="titlepage"><div><h4 class="title"><a name="id2402374"></a>Example</h4></div></div><p>This example shows how the handler methods are called during parsing.</p><p><a href="../../../examples/sax_parser" target="_top">Source Code</a></p><p>File: myparser.h
<pre class="programlisting">
#ifndef LIBXMLPP_EXAMPLES_MYPARSER_H
#define LIBXMLPP_EXAMPLES_MYPARSER_H

#include <libxml++/libxml++.h>

class MySaxParser : public xmlpp::SaxParser
{
public:
  MySaxParser();
  virtual ~MySaxParser();

protected:
  //overrides:
  virtual void on_start_document();
  virtual void on_end_document();
  virtual void on_start_element(const Glib::ustring& name,
                                const AttributeList& attributes);
  virtual void on_end_element(const Glib::ustring& name);
  virtual void on_characters(const Glib::ustring& characters);
  virtual void on_comment(const Glib::ustring& text);
  virtual void on_warning(const Glib::ustring& text);
  virtual void on_error(const Glib::ustring& text);
  virtual void on_fatal_error(const Glib::ustring& text);
};

#endif //LIBXMLPP_EXAMPLES_MYPARSER_H
</pre></p><p>File: main.cc
<pre class="programlisting">
#ifdef HAVE_CONFIG_H
#include <config.h>
#endif

#include <fstream>
#include <iostream>

#include "myparser.h"

int main(int argc, char* argv[])
{
  Glib::ustring filepath;
  if(argc > 1)
    filepath = argv[1]; //Allow the user to specify a different XML file to parse.
  else
    filepath = "example.xml";

  // Parse the entire document in one go:
  try
  {
    MySaxParser parser;
    parser.set_substitute_entities(true); //Resolve entities in the text automatically.
    parser.parse_file(filepath);
  }
  catch(const xmlpp::exception& ex)
  {
    std::cout << "libxml++ exception: " << ex.what() << std::endl;
  }

  // Demonstrate incremental parsing, sometimes useful for network connections:
  try
  {
    //std::cout << "Incremental SAX Parser:" << std::endl;

    std::ifstream is(filepath.c_str());
    char buffer[64];

    MySaxParser parser;
    do
    {
      is.read(buffer, 63);

      //Use the iterator-range constructor. The (pointer, count) constructor
      //of Glib::ustring expects a length in characters, not in bytes.
      Glib::ustring input(buffer, buffer + is.gcount());

      parser.parse_chunk(input);
    }
    while(is);

    parser.finish_chunk_parsing();
  }
  catch(const xmlpp::exception& ex)
  {
    std::cout << "libxml++ exception: " << ex.what() << std::endl;
  }

  return 0;
}
</pre>
</p><p>File: myparser.cc
<pre class="programlisting">
#include "myparser.h"

#include <iostream>

MySaxParser::MySaxParser()
  : xmlpp::SaxParser()
{
}

MySaxParser::~MySaxParser()
{
}

void MySaxParser::on_start_document()
{
  std::cout << "on_start_document()" << std::endl;
}

void MySaxParser::on_end_document()
{
  std::cout << "on_end_document()" << std::endl;
}

void MySaxParser::on_start_element(const Glib::ustring& name,
                                   const AttributeList& attributes)
{
  std::cout << "node name=" << name << std::endl;

  // Print attributes:
  for(xmlpp::SaxParser::AttributeList::const_iterator iter = attributes.begin(); iter != attributes.end(); ++iter)
  {
    std::cout << "  Attribute " << iter->name << " = " << iter->value << std::endl;
  }
}

void MySaxParser::on_end_element(const Glib::ustring& /* name */)
{
  std::cout << "on_end_element()" << std::endl;
}

void MySaxParser::on_characters(const Glib::ustring& text)
{
  std::cout << "on_characters(): " << text << std::endl;
}

void MySaxParser::on_comment(const Glib::ustring& text)
{
  std::cout << "on_comment(): " << text << std::endl;
}

void MySaxParser::on_warning(const Glib::ustring& text)
{
  std::cout << "on_warning(): " << text << std::endl;
}

void MySaxParser::on_error(const Glib::ustring& text)
{
  std::cout << "on_error(): " << text << std::endl;
}

void MySaxParser::on_fatal_error(const Glib::ustring& text)
{
  std::cout << "on_fatal_error(): " << text << std::endl;
}
</pre>
</p></div></div><div class="sect2"><div class="titlepage"><div><h3 class="title"><a name="id2395103"></a>TextReader Parser</h3></div></div><p>Like the SAX parser, the TextReader parser is suitable for sequential parsing, but instead of implementing handlers for specific parts of the document, it allows you to detect the current node type, process the node accordingly, and skip forward in the document as much as necessary. Unlike the DOM parser, you may not move backwards in the XML document. And unlike the SAX parser, you need not waste time processing nodes that do not interest you.</p><p>All methods are called on the single parser instance, but their results depend on the current position in the document. For instance, use <tt>read()</tt> to move to the next node, use <tt>move_to_first_attribute()</tt> and <tt>move_to_next_attribute()</tt> to step through the attributes of the current element, and use <tt>move_to_element()</tt> to return from an attribute to the element that contains it. These methods return false when there is nothing further to move to. Use methods such as <tt>get_name()</tt> and <tt>get_value()</tt> to examine the current node or attribute.</p><div class="sect3"><div class="titlepage"><div><h4 class="title"><a name="id2395151"></a>Example</h4></div></div><p>This example examines each node in turn, then moves to the next node.</p><p><a href="../../../examples/textreader" target="_top">Source Code</a></p><p>File: main.cc
<pre class="programlisting">
#ifdef HAVE_CONFIG_H
#include <config.h>
#endif

#include <libxml++/libxml++.h>
#include <libxml++/parsers/textreader.h>

#include <iostream>

//Stream manipulator that writes two spaces per level of depth.
struct indent
{
  int depth_;
  indent(int depth): depth_(depth) {}
};

std::ostream & operator<<(std::ostream & o, indent const & in)
{
  for(int i = 0; i != in.depth_; ++i)
  {
    o << "  ";
  }
  return o;
}

int main()
{
  try
  {
    xmlpp::TextReader reader("example.xml");

    //read() moves to the next node and returns false at the end of the document.
    while(reader.read())
    {
      int depth = reader.get_depth();
      std::cout << indent(depth) << "--- node ---" << std::endl;
      std::cout << indent(depth) << "name: " << reader.get_name() << std::endl;
      std::cout << indent(depth) << "depth: " << reader.get_depth() << std::endl;

      if(reader.has_attributes())
      {
        std::cout << indent(depth) << "attributes: " << std::endl;
        reader.move_to_first_attribute();
        do
        {
          std::cout << indent(depth) << "  " << reader.get_name() << ": " << reader.get_value() << std::endl;
        } while(reader.move_to_next_attribute());
        reader.move_to_element(); //Return to the element that owns the attributes.
      }
      else
      {
        std::cout << indent(depth) << "no attributes" << std::endl;
      }

      if(reader.has_value())
        std::cout << indent(depth) << "value: '" << reader.get_value() << "'" << std::endl;
      else
        std::cout << indent(depth) << "novalue" << std::endl;
    }
  }
  catch(const std::exception& e)
  {
    std::cout << "Exception caught: " << e.what() << std::endl;
  }

  return 0;
}
</pre>
</p></div></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="index.html">Prev</a>&nbsp;</td><td width="20%" align="center"><a accesskey="u" href="index.html">Up</a></td><td width="40%" align="right">&nbsp;</td></tr><tr><td width="40%" align="left" valign="top">libxml++ - An XML Parser for C++&nbsp;</td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top">&nbsp;</td></tr></table></div></body></html>