~ubuntu-branches/ubuntu/lucid/libxml-libxml-perl/lucid

Viewing changes to lib/XML/LibXML/Parser.pod

Committer: Bazaar Package Importer
Author(s): gregor herrmann
Date: 2009-05-31 14:36:13 UTC
mfrom: (4.1.7 jaunty)
Revision ID: james.westby@ubuntu.com-20090531143613-xxpnwmrz62kwtejq

Tags: 1.69.ds-2

http://bugs.debian.org/523275

* Remove Florian Ragwitz from Uploaders (closes: #523275).
* Set Standards-Version to 3.8.1 (no changes).
* Remove duplicate fields from debian/control.
* Minimize debian/rules, bump quilt and debhelper build dependencies.

files added:
debian/README.source

debian/libxml-libxml-perl.docs

debian/libxml-libxml-perl.examples

debian/patches/no_linking_with_libm.patch

debian/repack.sh

example/utf-16-1.html

example/utf-16-2.html

example/utf-16-2.xml

lib/XML/LibXML/ErrNo.pm

lib/XML/LibXML/ErrNo.pod

lib/XML/LibXML/Error.pm

lib/XML/LibXML/Error.pod

lib/XML/LibXML/Pattern.pod

lib/XML/LibXML/XPathExpression.pod

t/21catalog.t

t/60struct_error.t

t/61error.t

t/80registryleak.t

files modified:
Changes

LibXML.pm

LibXML.pod

LibXML.xs

MANIFEST

META.yml

Makefile.PL

README

debian/changelog

debian/compat

debian/control

debian/copyright

debian/patches/fix_manpage_typos

debian/patches/series

debian/rules

debian/watch

docs/libxml.dbk

dom.c

dom.h

example/xmllibxmldocs.pl

lib/XML/LibXML/Attr.pod

lib/XML/LibXML/Boolean.pm

lib/XML/LibXML/CDATASection.pod

lib/XML/LibXML/Comment.pod

lib/XML/LibXML/DOM.pod

lib/XML/LibXML/Document.pod

lib/XML/LibXML/Dtd.pod

lib/XML/LibXML/Element.pod

lib/XML/LibXML/InputCallback.pod

lib/XML/LibXML/Literal.pm

lib/XML/LibXML/Namespace.pod

lib/XML/LibXML/Node.pod

lib/XML/LibXML/NodeList.pm

lib/XML/LibXML/Number.pm

lib/XML/LibXML/PI.pod

lib/XML/LibXML/Parser.pod

lib/XML/LibXML/Reader.pm

lib/XML/LibXML/Reader.pod

lib/XML/LibXML/RelaxNG.pod

lib/XML/LibXML/SAX.pm

lib/XML/LibXML/SAX.pod

lib/XML/LibXML/SAX/Builder.pm

lib/XML/LibXML/SAX/Builder.pod

lib/XML/LibXML/SAX/Generator.pm

lib/XML/LibXML/SAX/Parser.pm

lib/XML/LibXML/Schema.pod

lib/XML/LibXML/Text.pod

lib/XML/LibXML/XPathContext.pm

lib/XML/LibXML/XPathContext.pod

perl-libxml-mm.c

perl-libxml-mm.h

perl-libxml-sax.c

t/02parse.t

t/03doc.t

t/06elements.t

t/09xpath.t

t/10ns.t

t/14sax.t

t/30xpathcontext.t

t/40reader.t

t/90threads.t

typemap

xpath.c

xpath.h


lib/XML/LibXML/Parser.pod

use XML::LibXML;

my $parser = XML::LibXML->new();

my $doc = $parser->parse_string(<<'EOT');

<some-xml/>

EOT

my $fdoc = $parser->parse_file( $xmlfile );

my $fhdoc = $parser->parse_fh( $xmlstream );

my $fragment = $parser->parse_xml_chunk( $xml_wb_chunk );

$parser = XML::LibXML->new();

=over 4

=item B<new>

=item new

$parser = XML::LibXML->new();

116

117

XML::LibXML's DOM parser can parse not only XML data but also

118

(strict) HTML files. There are three ways to parse documents - as a string, as

119

a Perl filehandle, or as a filename/URL. The return value from each is a L<<<<<< XML::LibXML DOM Document Class|XML::LibXML DOM Document Class >>>>>> object, which is a DOM object.

119

a Perl filehandle, or as a filename/URL. The return value from each is a L<<<<<< XML::LibXML::Document >>>>>> object, which is a DOM object.
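
As a rough illustration of the three entry points named above, the following sketch parses the same small document from a string, from a file, and from a filehandle. The sample XML and the temporary file are assumptions made for the example; they do not come from the module's documentation.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::LibXML;

# Sample input (hypothetical) written to a temporary file so that
# parse_file() and parse_fh() have something to read.
my $xml = '<greeting>hello</greeting>';
my ( $out, $filename ) = tempfile( UNLINK => 1 );
print {$out} $xml;
close $out or die "cannot close $filename: $!";

my $parser = XML::LibXML->new();

# 1. Parse from a string.
my $from_string = $parser->parse_string($xml);

# 2. Parse from a filename.
my $from_file = $parser->parse_file($filename);

# 3. Parse from an open Perl filehandle.
open my $in, '<', $filename or die "cannot open $filename: $!";
binmode $in;
my $from_fh = $parser->parse_fh($in);
close $in;

# Each call returns a document object with the same root element.
print $_->documentElement->nodeName, "\n"
    for $from_string, $from_file, $from_fh;
```

All three calls hand back a document object, so the rest of a program does not need to care which input path was used.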

120

121

All of the functions listed below will throw an exception if the document is

122

invalid. To prevent this from causing your program to exit, wrap the call in an

124

125

=over 4

126

127

=item B<parse_file>

127

=item parse_file

128

129

$doc = $parser->parse_file( $xmlfilename );

130

133

the fastest choice, about 6-8 times faster than parse_fh().

134

135

136

=item B<parse_fh>

136

=item parse_fh

137

138

$doc = $parser->parse_fh( $io_fh );

139

148

my $doc = $parser->parse_fh( $io_fh, $baseuri );

149

150

151

=item B<parse_string>

151

=item parse_string

152

153

$doc = $parser->parse_string( $xmlstring);

154

161

my $doc = $parser->parse_string( $xmlstring, $baseuri );

162

163

164

=item B<parse_html_file>

164

=item parse_html_file

165

166

$doc = $parser->parse_html_file( $htmlfile, \%opts );

167

170

171

An optional second argument can be used to pass some options to the HTML parser

172

as a HASH reference. Possible options are: Possible options are: encoding and

173

URI for libxml2 < 2.6.27, and for later versions of libxml2 additionally:

174

recover, suppress_errors, suppress_warnings, pedantic_parser, no_blanks, and

175

no_network.

176

177

178

=item B<parse_html_fh>

173

URI, and with libxml2 > 2.6.27 additionally: recover, suppress_errors,

174

suppress_warnings, pedantic_parser, no_blanks, and no_network.

175

176

177

=item parse_html_fh

179

178

180

179

$doc = $parser->parse_html_fh( $io_fh, \%opts );

181

180

182

181

Similar to parse_fh() but parses HTML (strict) streams.

183

182

184

183

An optional second argument can be used to pass some options to the HTML parser

185

as a HASH reference. Possible options are: encoding and URI for libxml2 <

186

2.6.27, and for later versions of libxml2 additionally: recover,

187

suppress_errors, suppress_warnings, pedantic_parser, no_blanks, and no_network.

188

Note: encoding option may not work correctly with this function in libxml2 <

189

2.6.27 if the HTML file declares charset using a META tag.

190

191

192

=item B<parse_html_string>

184

as a HASH reference. Possible options are: encoding and URI, and with libxml2 >

185

2.6.27 additionally: recover, suppress_errors, suppress_warnings,

186

pedantic_parser, no_blanks, and no_network. Note: encoding option may not work

187

correctly with this function in libxml2 < 2.6.27 if the HTML file declares

188

charset using a META tag.

189

190

191

=item parse_html_string

193

192

194

193

$doc = $parser->parse_html_string( $htmlstring, \%opts );

195

194

196

195

Similar to parse_string() but parses HTML (strict) strings.

197

196

198

197

An optional second argument can be used to pass some options to the HTML parser

199

as a HASH reference. Possible options are: encoding and URI for libxml2 <

200

2.6.27, and for later versions of libxml2 additionally: recover,

201

suppress_errors, suppress_warnings, pedantic_parser, no_blanks, and no_network.

198

as a HASH reference. Possible options are: encoding and URI, and with libxml2 >

199

2.6.27 additionally: recover, suppress_errors, suppress_warnings,

200

pedantic_parser, no_blanks, and no_network.
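
A minimal sketch of passing the option hash described above to parse_html_string(). The HTML snippet is invented for the example, and the recover and suppress_* options are assumed to be available, which according to the text requires a sufficiently recent libxml2.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical, slightly sloppy HTML input.
my $html = '<html><head><title>Demo</title></head>'
         . '<body><p>Unclosed paragraph<br>text</body></html>';

my $parser = XML::LibXML->new();

# The second argument is a HASH reference of HTML parser options.
my $doc = $parser->parse_html_string(
    $html,
    { recover => 1, suppress_errors => 1, suppress_warnings => 1 },
);

my $title = $doc->findvalue('//title');
print "title: $title\n";
```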

202

201

203

202

204

203

219

218

220

219

=over 4

221

220

222

=item B<parse_balanced_chunk>

221

=item parse_balanced_chunk

223

222

224

223

$fragment = $parser->parse_balanced_chunk( $wbxmlstring );

225

224

226

This function parses a well balanced XML string into a L<<<<<< XML::LibXML's DOM L2 Document Fragment Implementation|XML::LibXML's DOM L2 Document Fragment Implementation >>>>>>.

227

228

229

=item B<parse_xml_chunk>

225

This function parses a well balanced XML string into a L<<<<<< XML::LibXML::DocumentFragment >>>>>>.
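
To make the return value more concrete, here is a small sketch that parses a chunk with more than one top-level node and inspects the resulting fragment. The chunk content is an assumption chosen for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new();

# A well balanced chunk may contain several top-level nodes,
# which a complete XML document may not.
my $fragment = $parser->parse_balanced_chunk('<a/>text<b>more</b>');

# The fragment holds an element, a text node and another element.
my @children = $fragment->childNodes;

print ref($fragment), " with ", scalar(@children), " child nodes\n";
print $fragment->toString(), "\n";
```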

226

227

228

=item parse_xml_chunk

230

229

231

230

$fragment = $parser->parse_xml_chunk( $wbxmlstring );

232

231

243

242

244

243

=over 4

245

244

246

=item B<process_xincludes>

245

=item process_xincludes

247

246

248

247

$parser->process_xincludes( $doc );

249

248

265

264

the parsed document.

266

265

267

266

268

=item B<processXIncludes>

267

=item processXIncludes

269

268

270

269

$parser->processXIncludes( $doc );

271

270

299

298

300

299

=over 4

301

300

302

=item B<parse_chunk>

301

=item parse_chunk

303

302

304

303

$parser->parse_chunk($string, $terminate);

305

304

326

325

327

326

=over 4

328

327

329

=item B<start_push>

328

=item start_push

330

329

331

330

$parser->start_push();

332

331

333

332

Initializes the push parser.

334

333

335

334

336

=item B<push>

335

=item push

337

336

338

337

$parser->push(@data);

339

338

341

340

entry in @data must be a normal scalar!
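
The push interface can be sketched roughly as follows, assuming the data arrives in several pieces (the chunk contents here are hypothetical). The sketch relies only on push() and finish_push(), feeding each piece as a plain scalar.

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new();

# Hypothetical pieces of one document, e.g. as read from a socket.
my @chunks = ( '<root>', '<item id="1"/>', '<item id="2"/>', '</root>' );

# Feed the parser piece by piece; each entry is a normal scalar.
$parser->push($_) for @chunks;

# Ask for the finished document once all data has been pushed.
my $doc = $parser->finish_push();

my @items = $doc->findnodes('/root/item');
print "parsed ", scalar(@items), " items\n";
```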

342

341

343

342

344

=item B<finish_push>

343

=item finish_push

345

344

346

345

$doc = $parser->finish_push( $recover );

347

346

370

369

$doc = $parser->finish_push(1); # will return the data parsed

371

370

# unless an error happened

372

371

};

373

372

374

373

print $doc->toString(); # returns "<foo>bar</foo>"

375

374

376

375

Of course finish_push() will return nothing if there was no data pushed to the

408

407

entire document into a DOM and serialises it. Some people couldn't read that in

409

408

the paragraph above so I've added this warning.

410

409

411

If you want a streaming SAX parser look at the L<<<<<< XML::LibXML direct SAX parser|XML::LibXML direct SAX parser >>>>>> man page

410

If you want a streaming SAX parser look at the L<<<<<< XML::LibXML::SAX >>>>>> man page

412

411

413

412

414

413

=head1 SERIALIZATION

415

414

416

415

XML::LibXML provides some functions to serialize nodes and documents. The

417

serialization functions are described on the L<<<<<< Abstract Base Class of XML::LibXML Nodes|Abstract Base Class of XML::LibXML Nodes >>>>>> manpage or the L<<<<<< XML::LibXML DOM Document Class|XML::LibXML DOM Document Class >>>>>> manpage. XML::LibXML checks three global flags that alter the serialization

416

serialization functions are described on the L<<<<<< XML::LibXML::Node >>>>>> manpage or the L<<<<<< XML::LibXML::Document >>>>>> manpage. XML::LibXML checks three global flags that alter the serialization

418

417

process:

419

418

420

419

475

474

476

475

=over 4

477

476

478

=item B<validation>

477

=item validation

479

478

480

479

$parser->validation(1);

481

480

482

481

Turn validation on (or off). Defaults to off.

483

482

484

483

485

=item B<recover>

484

=item recover

486

485

487

486

$parser->recover(1);

488

487

502

501

$parser->recover_silently(1); or, equivalently, $parser->recover(2).

503

502

504

503

505

=item B<recover_silently>

504

=item recover_silently

506

505

507

506

$parser->recover_silently(1);

508

507

516

515

mode.

517

516

518

517

519

=item B<expand_entities>

518

=item expand_entities

520

519

521

520

$parser->expand_entities(0);

522

521

525

524

Probably not very useful for most purposes.

526

525

527

526

528

=item B<keep_blanks>

527

=item keep_blanks

529

528

530

529

$parser->keep_blanks(0);

531

530

533

532

white-space in the document.

534

533

535

534

536

=item B<pedantic_parser>

535

=item pedantic_parser

537

536

538

537

$parser->pedantic_parser(1);

539

538

540

539

You can make XML::LibXML more pedantic if you want to.

541

540

542

541

543

=item B<line_numbers>

542

=item line_numbers

544

543

545

544

$parser->line_numbers(1);

546

545

550

549

XML::LibXML::Node::line_number()).

551

550

552

551

IMPORTANT: Due to limitations in the libxml2 library line numbers greater than

553

65535 will be returned as 65535. Please see L<<<<<< http://bugzilla.gnome.org/show_bug.cgi?id=325533|http://bugzilla.gnome.org/show_bug.cgi?id=325533 >>>>>> for more details.

552

65535 will be returned as 65535. Please see L<<<<<< http://bugzilla.gnome.org/show_bug.cgi?id=325533 >>>>>> for more details.

554

553

555

554

By default line numbering is switched off (0).
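
As a short sketch of how this option is used, the following enables line numbering before parsing and then asks a node for its line. The input document is an assumption made for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new();

# Line numbers are only recorded if the option is set before parsing.
$parser->line_numbers(1);

# Hypothetical input with one element per line.
my $doc = $parser->parse_string("<root>\n  <a/>\n  <b/>\n</root>\n");

# Look up the <b/> element and ask where it was found.
my ($node) = $doc->findnodes('/root/b');
my $line = $node->line_number();

print "<b/> was found on line $line\n";
```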

556

555

557

556

558

=item B<load_ext_dtd>

557

=item load_ext_dtd

559

558

560

559

$parser->load_ext_dtd(1);

561

560

570

569

default is 1 (activated)

571

570

572

571

573

=item B<complete_attributes>

572

=item complete_attributes

574

573

575

574

$parser->complete_attributes(1);

576

575

578

577

By default, this option is enabled.

579

578

580

579

581

=item B<expand_xinclude>

580

=item expand_xinclude

582

581

583

582

$parser->expand_xinclude(1);

584

583

586

585

assures that the parser callbacks are used while parsing the included document.

587

586

588

587

589

=item B<load_catalog>

588

=item load_catalog

590

589

591

590

$parser->load_catalog( $catalog_file );

592

591

599

598

resolving systems at the same time.

600

599

601

600

602

=item B<base_uri>

601

=item base_uri

603

602

604

603

$parser->base_uri( $your_base_uri );

605

604

608

607

one has to set a separate base URI, that is then used for the parsed documents.

609

608

610

609

611

=item B<gdome_dom>

610

=item gdome_dom

612

611

613

612

$parser->gdome_dom(1);

614

613

624

623

XML::LibXML to use this library. For this you need to rebuild XML::LibXML!

625

624

626

625

627

=item B<clean_namespaces>

626

=item clean_namespaces

628

627

629

628

$parser->clean_namespaces( 1 );

630

629

633

632

default no namespace cleanup is done.

634

633

635

634

636

=item B<no_network>

635

=item no_network

637

636

638

637

$parser->no_network(1);

639

638

651

650

=head1 ERROR REPORTING

652

651

653

652

XML::LibXML throws exceptions during parsing, validation or XPath processing

654

(and some other occasions). These errors can be caught by using I<<<<<< eval >>>>>> blocks. The error then will be stored in I<<<<<< $@ >>>>>>.

653

(and some other occasions). These errors can be caught by using I<<<<<< eval >>>>>> blocks. The error then will be stored in I<<<<<< $@ >>>>>>. There are two implementations: the old one throws $@ which is a flat string,

654

in the new one $@ is an object from the class XML::LibXML::Error; this class

655

overrides the operator "" so that when printed, the object flattens to the

656

usual error message.
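
A minimal sketch of the eval pattern described above, using a deliberately malformed input string chosen for the example. Because the error stringifies in both implementations, interpolating it into a string works whether $@ is a flat string or an XML::LibXML::Error object.

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new();

# Malformed on purpose: <bar> is never closed.
my $doc = eval { $parser->parse_string('<foo><bar></foo>') };

# Copy $@ right away so that later evals cannot clobber it.
my $error   = $@;
my $message = '';

if ($error) {
    # Stringification yields the usual error message in either case.
    $message = "$error";
    my $kind = ref($error) ? ref($error) : 'plain string';
    print "parse failed ($kind): $message\n";
}
```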

655

657

656

658

XML::LibXML throws errors as they occur and does not wait for a user to test for

657

659

them. This is a very common misunderstanding in the use of XML::LibXML. If the
