
<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Parsers</title><meta name="generator" content="DocBook XSL Stylesheets V1.49"><link rel="home" href="index.html" title="libxml++ - An XML Parser for C++"><link rel="up" href="index.html" title="libxml++ - An XML Parser for C++"><link rel="previous" href="index.html" title="libxml++ - An XML Parser for C++"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Parsers</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="index.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;</td></tr></table><hr></div><div class="sect1"><div class="titlepage"><div><h2 class="title" style="clear: both"><a name="parsers"></a>Parsers</h2></div></div><p>Like the underlying libxml library, libxml++ offers three parsers: DOM, SAX, and TextReader. This section explains the relative advantages and behaviour of each.</p><p>All of the parsers can parse XML documents directly from a file on disk, from a string, or from a C++ std::istream. Although the libxml++ API uses only Glib::ustring, and therefore the UTF-8 encoding, libxml++ can parse documents in any encoding, converting to UTF-8 automatically. This conversion does not lose information, because UTF-8 can represent every Unicode character.</p><p>Remember that white space is usually significant in XML documents, so the parsers might provide unexpected text nodes that contain only spaces and new lines. 
The parser does not know whether you care about these text nodes, but your application may choose to ignore them.</p><div class="sect2"><div class="titlepage"><div><h3 class="title"><a name="id2402161"></a>DOM Parser</h3></div></div><p>The DOM parser parses the whole document at once and stores the structure in memory, available via <tt>DomParser::get_document()</tt>. With methods such as <tt>Document::get_root_node()</tt> and <tt>Node::get_children()</tt>, you may then navigate the hierarchy of XML nodes without restriction, jumping forwards or backwards in the document based on the information that you encounter. Because the whole structure is held in memory, the DOM parser uses a relatively large amount of memory.</p><p>You should use C++ RTTI (via <tt>dynamic_cast<></tt>) to identify the specific node type and to perform actions which are not possible with all node types. For instance, only <tt>Element</tt>s have attributes. Here is the inheritance hierarchy of node types:</p><p>

</p><div class="itemizedlist"><ul type="disc"><li>xmlpp::Node:

<div class="itemizedlist"><ul type="round"><li>xmlpp::Attribute</li><li>xmlpp::ContentNode

<div class="itemizedlist"><ul type="square"><li>xmlpp::CdataNode</li><li>xmlpp::CommentNode</li><li>xmlpp::ProcessingInstructionNode</li><li>xmlpp::TextNode</li></ul></div></li><li>xmlpp::Element</li><li>xmlpp::EntityReference</li></ul></div></li></ul></div><p>
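As a minimal sketch of the <tt>dynamic_cast<></tt> pattern described above, the following listing uses a small stand-in hierarchy in a hypothetical <tt>sketch</tt> namespace. These are not the libxml++ classes; they only mirror the shape of the list above, and <tt>describe()</tt> and the member variables are assumptions made for illustration.
<pre class="programlisting">
#include <string>

namespace sketch {

// Stand-in types that imitate the shape of the hierarchy; not libxml++ itself.
struct Node
{
  virtual ~Node() {} // A virtual member makes the type polymorphic, which dynamic_cast requires.
};

struct ContentNode : Node
{
  std::string content;
};

struct TextNode : ContentNode {};
struct CommentNode : ContentNode {};

struct Element : Node
{
  std::string title; // Stands in for an attribute; only elements carry attributes.
};

// Report which kind of node we were handed, testing the most derived types first.
inline std::string describe(const Node* node)
{
  if(const TextNode* text = dynamic_cast<const TextNode*>(node))
    return "text: " + text->content;
  if(const CommentNode* comment = dynamic_cast<const CommentNode*>(node))
    return "comment: " + comment->content;
  if(const ContentNode* other = dynamic_cast<const ContentNode*>(node))
    return "content: " + other->content;
  if(const Element* element = dynamic_cast<const Element*>(node))
    return "element with title: " + element->title;
  return "unknown node";
}

} // namespace sketch
</pre>
Testing the most derived types first matters: a <tt>TextNode</tt> is also a <tt>ContentNode</tt>, so the more general cast would succeed for it too.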

</p><p>Although you may obtain pointers to the <tt>Node</tt>s, these <tt>Node</tt>s are always owned by their parent Nodes. In most cases that means that the Node will exist, and your pointer will be valid, as long as the <tt>Document</tt> instance exists.</p><p>There are also several methods which can create new child <tt>Node</tt>s. By using these, and one of the <tt>Document::write_*()</tt> methods, you can use libxml++ to build a new XML document.</p><div class="sect3"><div class="titlepage"><div><h4 class="title"><a name="id2402298"></a>Example</h4></div></div><p>This example looks in the document for expected elements and then examines them.</p><p><a href="../../../examples/dom_parser" target="_top">Source Code</a></p><p>File: main.cc

<pre class="programlisting">
#ifdef HAVE_CONFIG_H
#include <config.h>
#endif

#include <libxml++/libxml++.h>

#include <iostream>

void print_indentation(unsigned int indentation)
{
  for(unsigned int i = 0; i < indentation; ++i)
    std::cout << " ";
}

void print_node(const xmlpp::Node* node, unsigned int indentation = 0)
{
  std::cout << std::endl; //Separate nodes by an empty line.

  const xmlpp::ContentNode* nodeContent = dynamic_cast<const xmlpp::ContentNode*>(node);
  const xmlpp::TextNode* nodeText = dynamic_cast<const xmlpp::TextNode*>(node);
  const xmlpp::CommentNode* nodeComment = dynamic_cast<const xmlpp::CommentNode*>(node);

  if(nodeText && nodeText->is_white_space()) //Let's ignore the indenting - you don't always want to do this.
    return;

  Glib::ustring nodename = node->get_name();

  if(!nodeText && !nodeComment && !nodename.empty()) //Let's not say "name: text".
  {
    print_indentation(indentation);
    std::cout << "Node name = " << nodename << std::endl;
  }
  else if(nodeText) //Let's say when it's text. - e.g. let's say what that white space is.
  {
    print_indentation(indentation);
    std::cout << "Text Node" << std::endl;
  }

  //Treat the various node types differently:
  if(nodeText)
  {
    print_indentation(indentation);
    std::cout << "text = \"" << nodeText->get_content() << "\"" << std::endl;
  }
  else if(nodeComment)
  {
    print_indentation(indentation);
    std::cout << "comment = " << nodeComment->get_content() << std::endl;
  }
  else if(nodeContent)
  {
    print_indentation(indentation);
    std::cout << "content = " << nodeContent->get_content() << std::endl;
  }
  else if(const xmlpp::Element* nodeElement = dynamic_cast<const xmlpp::Element*>(node))
  {
    //A normal Element node:

    //get_line() works only for Element nodes.
    print_indentation(indentation);
    std::cout << " line = " << node->get_line() << std::endl;

    //Print attributes:
    const xmlpp::Element::AttributeList& attributes = nodeElement->get_attributes();
    for(xmlpp::Element::AttributeList::const_iterator iter = attributes.begin(); iter != attributes.end(); ++iter)
    {
      const xmlpp::Attribute* attribute = *iter;
      print_indentation(indentation);
      std::cout << " Attribute " << attribute->get_name() << " = " << attribute->get_value() << std::endl;
    }

    const xmlpp::Attribute* attribute = nodeElement->get_attribute("title");
    if(attribute)
    {
      std::cout << "title found: =" << attribute->get_value() << std::endl;
    }
  }

  if(!nodeContent)
  {
    //Recurse through child nodes:
    xmlpp::Node::NodeList list = node->get_children();
    for(xmlpp::Node::NodeList::iterator iter = list.begin(); iter != list.end(); ++iter)
    {
      print_node(*iter, indentation + 2); //recursive
    }
  }
}

int main(int argc, char* argv[])
{
  Glib::ustring filepath;
  if(argc > 1 )
    filepath = argv[1]; //Allow the user to specify a different XML file to parse.
  else
    filepath = "example.xml";

  try
  {
    xmlpp::DomParser parser;
    parser.set_validate();
    parser.set_substitute_entities(); //We just want the text to be resolved/unescaped automatically.
    parser.parse_file(filepath);
    if(parser)
    {
      //Walk the tree:
      const xmlpp::Node* pNode = parser.get_document()->get_root_node(); //deleted by DomParser.
      print_node(pNode);
    }
  }
  catch(const std::exception& ex)
  {
    std::cout << "Exception caught: " << ex.what() << std::endl;
  }

  return 0;
}

</pre>
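</p><p>The example skips text nodes for which <tt>is_white_space()</tt> returns true. As a rough, library-independent sketch of what such a test amounts to, the hypothetical helper <tt>is_blank()</tt> below checks a plain <tt>std::string</tt> for the four white space characters that XML defines (space, tab, carriage return and line feed). It is an illustration only, not the libxml++ implementation, and it treats an empty string as blank by assumption.
<pre class="programlisting">
#include <string>

// Hypothetical helper: true when the text is empty or holds only XML white space.
bool is_blank(const std::string& text)
{
  // find_first_not_of() returns npos when every character is in the given set.
  return text.find_first_not_of(" \t\r\n") == std::string::npos;
}
</pre>
An application could apply the same idea to character data delivered by the other parsers if it wants to ignore indentation.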


</p></div></div><div class="sect2"><div class="titlepage"><div><h3 class="title"><a name="id2402334"></a>SAX Parser</h3></div></div><p>The SAX parser presents each node of the XML document in sequence. So when you process one node, you must already have stored any information you need about previous nodes, and you have no information yet about subsequent nodes. The SAX parser uses less memory than the DOM parser, and it is a suitable abstraction for documents that can be processed sequentially rather than as a whole.</p><p>By feeding data to the <tt>parse_chunk()</tt> method instead of calling a whole-document method such as <tt>parse_file()</tt>, you can even parse parts of the XML document before you have received the whole document. Call <tt>finish_chunk_parsing()</tt> once the last chunk has been passed in.</p><p>As shown in the example, you should derive your own class from <tt>SaxParser</tt> and override some of the virtual methods. These "handler" methods will be called while the document is parsed.</p><div class="sect3"><div class="titlepage"><div><h4 class="title"><a name="id2402374"></a>Example</h4></div></div><p>This example shows how the handler methods are called during parsing.</p><p><a href="../../../examples/sax_parser" target="_top">Source Code</a></p><p>File: myparser.h

<pre class="programlisting">
#ifndef __LIBXMLPP_EXAMPLES_MYPARSER_H
#define __LIBXMLPP_EXAMPLES_MYPARSER_H

#include <libxml++/libxml++.h>

class MySaxParser : public xmlpp::SaxParser
{
public:
  MySaxParser();
  virtual ~MySaxParser();

protected:
  //overrides:
  virtual void on_start_document();
  virtual void on_end_document();
  virtual void on_start_element(const Glib::ustring& name,
                                const AttributeList& properties);
  virtual void on_end_element(const Glib::ustring& name);
  virtual void on_characters(const Glib::ustring& characters);
  virtual void on_comment(const Glib::ustring& text);
  virtual void on_warning(const Glib::ustring& text);
  virtual void on_error(const Glib::ustring& text);
  virtual void on_fatal_error(const Glib::ustring& text);
};

#endif //__LIBXMLPP_EXAMPLES_MYPARSER_H

</pre>


</p><p>File: main.cc

<pre class="programlisting">
#ifdef HAVE_CONFIG_H
#include <config.h>
#endif

#include <fstream>
#include <iostream>

#include "myparser.h"

int
main(int argc, char* argv[])
{
  Glib::ustring filepath;
  if(argc > 1 )
    filepath = argv[1]; //Allow the user to specify a different XML file to parse.
  else
    filepath = "example.xml";

  // Parse the entire document in one go:
  try
  {
    MySaxParser parser;
    parser.set_substitute_entities(true); //Resolve entity references to their text.
    parser.parse_file(filepath);
  }
  catch(const xmlpp::exception& ex)
  {
    std::cout << "libxml++ exception: " << ex.what() << std::endl;
  }

  // Demonstrate incremental parsing, sometimes useful for network connections:
  {
    //std::cout << "Incremental SAX Parser:" << std::endl;

    std::ifstream is(filepath.c_str());
    char buffer[64];

    MySaxParser parser;
    do {
      is.read(buffer, 63);

      //Build the string from an iterator range. The
      //Glib::ustring(const char*, size_type) constructor expects a length
      //in characters, not bytes, which is wrong for multi-byte UTF-8 input.
      Glib::ustring input(buffer, buffer + is.gcount());

      parser.parse_chunk(input);
    }
    while(is);

    parser.finish_chunk_parsing();
  }

  return 0;
}

</pre>


</p><p>File: myparser.cc

<pre class="programlisting">
#include "myparser.h"

#include <iostream>

MySaxParser::MySaxParser()
: xmlpp::SaxParser()
{
}

MySaxParser::~MySaxParser()
{
}

void MySaxParser::on_start_document()
{
  std::cout << "on_start_document()" << std::endl;
}

void MySaxParser::on_end_document()
{
  std::cout << "on_end_document()" << std::endl;
}

void MySaxParser::on_start_element(const Glib::ustring& name,
                                   const AttributeList& attributes)
{
  std::cout << "node name=" << name << std::endl;

  // Print attributes:
  for(xmlpp::SaxParser::AttributeList::const_iterator iter = attributes.begin(); iter != attributes.end(); ++iter)
  {
    std::cout << " Attribute " << iter->name << " = " << iter->value << std::endl;
  }
}

void MySaxParser::on_end_element(const Glib::ustring& name)
{
  std::cout << "on_end_element()" << std::endl;
}

void MySaxParser::on_characters(const Glib::ustring& text)
{
  std::cout << "on_characters(): " << text << std::endl;
}

void MySaxParser::on_comment(const Glib::ustring& text)
{
  std::cout << "on_comment(): " << text << std::endl;
}

void MySaxParser::on_warning(const Glib::ustring& text)
{
  std::cout << "on_warning(): " << text << std::endl;
}

void MySaxParser::on_error(const Glib::ustring& text)
{
  std::cout << "on_error(): " << text << std::endl;
}

void MySaxParser::on_fatal_error(const Glib::ustring& text)
{
  std::cout << "on_fatal_error(): " << text << std::endl;
}

</pre>
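</p><p>Because a SAX handler sees one event at a time, it has to remember any context that it will need later. The following sketch shows one way to do that, using a stack of open element names. The <tt>PathTracker</tt> class is hypothetical and deliberately does not derive from <tt>xmlpp::SaxParser</tt>, so that it stands alone; its method names merely mirror the handlers in the example, and it takes <tt>std::string</tt> rather than <tt>Glib::ustring</tt>.
<pre class="programlisting">
#include <string>
#include <vector>

// Hypothetical stand-in for a SaxParser subclass: it records, for each run of
// character data, the path of elements that were open when the data arrived.
class PathTracker
{
public:
  void on_start_element(const std::string& name)
  {
    open_elements_.push_back(name);
  }

  void on_end_element(const std::string& /* name */)
  {
    if(!open_elements_.empty())
      open_elements_.pop_back();
  }

  void on_characters(const std::string& text)
  {
    seen_.push_back(current_path() + " = " + text);
  }

  // Build a path such as "/library/book" from the stack of open elements.
  std::string current_path() const
  {
    std::string path;
    for(std::vector<std::string>::const_iterator iter = open_elements_.begin(); iter != open_elements_.end(); ++iter)
      path += "/" + *iter;
    return path;
  }

  const std::vector<std::string>& seen() const { return seen_; }

private:
  std::vector<std::string> open_elements_; // Elements started but not yet ended.
  std::vector<std::string> seen_;          // One entry per on_characters() call.
};
</pre>
In a real handler the same bookkeeping would live in the overridden <tt>on_start_element()</tt>, <tt>on_end_element()</tt> and <tt>on_characters()</tt> methods of the class derived from <tt>SaxParser</tt>.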


</p></div></div><div class="sect2"><div class="titlepage"><div><h3 class="title"><a name="id2395103"></a>TextReader Parser</h3></div></div><p>Like the SAX parser, the TextReader parser is suitable for sequential parsing, but instead of implementing handlers for specific parts of the document, you detect the current node type, process the node accordingly, and skip forward in the document as far as necessary. Unlike the DOM parser, you may not move backwards in the XML document. And unlike the SAX parser, you need not waste time processing nodes that do not interest you.</p><p>All methods are on the single parser instance, but their result depends on the current context. For instance, use <tt>read()</tt> to move to the next node, <tt>move_to_first_attribute()</tt> and <tt>move_to_next_attribute()</tt> to step through the attributes of the current element, and <tt>move_to_element()</tt> to return from an attribute to the element that contains it. These methods return false when they cannot move any further. Then use methods such as <tt>get_name()</tt> and <tt>get_value()</tt> to examine the elements and their attributes.</p><div class="sect3"><div class="titlepage"><div><h4 class="title"><a name="id2395151"></a>Example</h4></div></div><p>This example examines each node in turn, then moves to the next node.</p><p><a href="../../../examples/textreader" target="_top">Source Code</a></p><p>File: main.cc

<pre class="programlisting">
#ifdef HAVE_CONFIG_H
#include <config.h>
#endif

#include <libxml++/libxml++.h>
#include <libxml++/parsers/textreader.h>

#include <iostream>

struct indent {
  int depth_;
  indent(int depth): depth_(depth) {}
};

std::ostream & operator<<(std::ostream & o, indent const & in)
{
  for(int i = 0; i != in.depth_; ++i)
  {
    o << " ";
  }
  return o;
}

int
main(int argc, char* argv[])
{
  try
  {
    xmlpp::TextReader reader("example.xml");

    while(reader.read())
    {
      int depth = reader.get_depth();
      std::cout << indent(depth) << "--- node ---" << std::endl;
      std::cout << indent(depth) << "name: " << reader.get_name() << std::endl;
      std::cout << indent(depth) << "depth: " << reader.get_depth() << std::endl;

      if(reader.has_attributes())
      {
        std::cout << indent(depth) << "attributes: " << std::endl;
        reader.move_to_first_attribute();
        do
        {
          std::cout << indent(depth) << " " << reader.get_name() << ": " << reader.get_value() << std::endl;
        } while(reader.move_to_next_attribute());
        reader.move_to_element(); //Return from the last attribute to its element.
      }
      else
      {
        std::cout << indent(depth) << "no attributes" << std::endl;
      }

      if(reader.has_value())
        std::cout << indent(depth) << "value: '" << reader.get_value() << "'" << std::endl;
      else
        std::cout << indent(depth) << "novalue" << std::endl;

    }
  }
  catch(const std::exception& e)
  {
    std::cout << "Exception caught: " << e.what() << std::endl;
  }
}

</pre>
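</p><p>The difference from SAX is that the caller decides how far to advance. The following sketch illustrates that pull style with a hypothetical forward-only <tt>PullReader</tt> over a flat list of items. It is a stand-in written for this illustration and not <tt>xmlpp::TextReader</tt>; the <tt>Item</tt> structure, <tt>skip_children()</tt> and <tt>visit_except()</tt> are all assumed names.
<pre class="programlisting">
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical flattened document: each item has a nesting depth and a name.
struct Item
{
  int depth;
  std::string name;
};

// Hypothetical forward-only reader; it can never move backwards.
class PullReader
{
public:
  explicit PullReader(const std::vector<Item>& items)
  : items_(items), next_(0)
  {}

  // Move to the next item in document order; false when none is left.
  bool read()
  {
    if(next_ >= items_.size())
      return false;
    ++next_;
    return true;
  }

  // The item most recently reached. Only valid after read() has returned true.
  const Item& current() const { return items_[next_ - 1]; }

  // Move past everything nested inside the current item, landing on the next
  // item at the same depth or shallower; false when none is left.
  bool skip_children()
  {
    const int depth = current().depth;
    while(read())
    {
      if(current().depth <= depth)
        return true;
    }
    return false;
  }

private:
  std::vector<Item> items_;
  std::size_t next_; // Index of the item that the next read() will reach.
};

// Collect the names of all items, jumping over any subtree rooted at 'unwanted'.
std::vector<std::string> visit_except(PullReader& reader, const std::string& unwanted)
{
  std::vector<std::string> visited;
  bool more = reader.read();
  while(more)
  {
    if(reader.current().name == unwanted)
    {
      more = reader.skip_children(); // Nothing inside the subtree is examined.
    }
    else
    {
      visited.push_back(reader.current().name);
      more = reader.read();
    }
  }
  return visited;
}
</pre>
With a SAX parser the handlers would still be called for every node inside the unwanted subtree; here the loop simply never looks at them.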


</p></div></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="index.html">Prev</a>&nbsp;</td><td width="20%" align="center"><a accesskey="u" href="index.html">Up</a></td><td width="40%" align="right">&nbsp;</td></tr><tr><td width="40%" align="left" valign="top">libxml++ - An XML Parser for C++&nbsp;</td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top">&nbsp;</td></tr></table></div></body></html>
