117
117
XML::LibXML's DOM parser is capable of parsing not only XML data but also
118
118
(strict) HTML files. There are three ways to parse documents - as a string, as
119
a Perl filehandle, or as a filename/URL. The return value from each is a L<<<<<< XML::LibXML DOM Document Class|XML::LibXML DOM Document Class >>>>>> object, which is a DOM object.
119
a Perl filehandle, or as a filename/URL. The return value from each is a L<<<<<< XML::LibXML::Document >>>>>> object, which is a DOM object.
121
121
All of the functions listed below will throw an exception if the document is
122
122
invalid. To prevent this from causing your program to exit, wrap the call in an
171
171
An optional second argument can be used to pass some options to the HTML parser
172
172
as a HASH reference. Possible options are: encoding and
173
URI for libxml2 < 2.6.27, and for later versions of libxml2 additionally:
174
recover, suppress_errors, suppress_warnings, pedantic_parser, no_blanks, and
178
=item B<parse_html_fh>
173
URI, and with libxml2 > 2.6.27 additionally: recover, suppress_errors,
174
suppress_warnings, pedantic_parser, no_blanks, and no_network.
180
179
$doc = $parser->parse_html_fh( $io_fh, \%opts );
182
181
Similar to parse_fh() but parses HTML (strict) streams.
184
183
An optional second argument can be used to pass some options to the HTML parser
185
as a HASH reference. Possible options are: encoding and URI for libxml2 <
186
2.6.27, and for later versions of libxml2 additionally: recover,
187
suppress_errors, suppress_warnings, pedantic_parser, no_blanks, and no_network.
188
Note: the encoding option may not work correctly with this function in libxml2 <
189
2.6.27 if the HTML file declares charset using a META tag.
192
=item B<parse_html_string>
184
as a HASH reference. Possible options are: encoding and URI, and with libxml2 >
185
2.6.27 additionally: recover, suppress_errors, suppress_warnings,
186
pedantic_parser, no_blanks, and no_network. Note: the encoding option may not work
187
correctly with this function in libxml2 < 2.6.27 if the HTML file declares
188
charset using a META tag.
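
As a rough illustration of parse_html_fh() with an options hash, the following sketch reads HTML from an in-memory filehandle. The markup and the URI value are made up for the example, and only the encoding and URI options are used, since those are the ones described above as available regardless of the libxml2 version.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample markup, used only for this sketch.
my $html = '<html><head><title>Demo</title></head>'
         . '<body><p>Hello</p></body></html>';

# An in-memory filehandle keeps the sketch independent of files on disk.
open my $io_fh, '<', \$html or die "cannot open in-memory handle: $!";

my $parser = XML::LibXML->new;

# Options go in as a HASH reference; the URI here is an assumed placeholder.
my $fh_doc = $parser->parse_html_fh(
    $io_fh,
    { encoding => 'UTF-8', URI => 'http://example.org/demo.html' },
);
close $io_fh;

# The result is a DOM document, so the usual XPath helpers apply.
my $fh_title = $fh_doc->findvalue('/html/head/title');
print "title: $fh_title\n";
```

A filehandle opened on a real file or socket is passed in the same way.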
191
=item parse_html_string
194
193
$doc = $parser->parse_html_string( $htmlstring, \%opts );
196
195
Similar to parse_string() but parses HTML (strict) strings.
198
197
An optional second argument can be used to pass some options to the HTML parser
199
as a HASH reference. Possible options are: encoding and URI for libxml2 <
200
2.6.27, and for later versions of libxml2 additionally: recover,
201
suppress_errors, suppress_warnings, pedantic_parser, no_blanks, and no_network.
198
as a HASH reference. Possible options are: encoding and URI, and with libxml2 >
199
2.6.27 additionally: recover, suppress_errors, suppress_warnings,
200
pedantic_parser, no_blanks, and no_network.
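
A minimal sketch of parse_html_string() with some of the additional options might look like this. It assumes a libxml2 recent enough to honour recover, suppress_errors and suppress_warnings, and the deliberately sloppy markup is invented for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical tag soup: the paragraphs are never closed.
my $htmlstring = '<html><body><p>first<p>second</body></html>';

my $parser = XML::LibXML->new;

# recover asks the HTML parser to carry on past errors; the two
# suppress_* options keep it from reporting them.
my $str_doc = $parser->parse_html_string(
    $htmlstring,
    { recover => 1, suppress_errors => 1, suppress_warnings => 1 },
);

# Walk whatever paragraph elements the parser managed to build.
for my $p ( $str_doc->findnodes('//p') ) {
    print 'paragraph: ', $p->textContent, "\n";
}
```

Without recover, the same input may raise an exception instead, so such a call is best wrapped in an eval block.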
222
=item B<parse_balanced_chunk>
221
=item parse_balanced_chunk
224
223
$fragment = $parser->parse_balanced_chunk( $wbxmlstring );
226
This function parses a well balanced XML string into a L<<<<<< XML::LibXML's DOM L2 Document Fragment Implementation|XML::LibXML's DOM L2 Document Fragment Implementation >>>>>>.
229
=item B<parse_xml_chunk>
225
This function parses a well balanced XML string into a L<<<<<< XML::LibXML::DocumentFragment >>>>>>.
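
To illustrate what "well balanced" means here, the sketch below parses a string that has several top-level nodes and no single root element, then lists the children of the resulting fragment. The sample string is made up for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

# Two sibling elements and no enclosing root: balanced, but not a document.
my $wbxmlstring = '<a>one</a><b>two</b>';

my $parser   = XML::LibXML->new;
my $fragment = $parser->parse_balanced_chunk($wbxmlstring);

# The fragment holds the parsed nodes as its children.
my @names = map { $_->nodeName } $fragment->childNodes;
print "children: @names\n";
```

The fragment can then be added to an existing tree with the usual node methods such as appendChild().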
228
=item parse_xml_chunk
231
230
$fragment = $parser->parse_xml_chunk( $wbxmlstring );
408
407
entire document into a DOM and serialises it. Some people couldn't read that in
409
408
the paragraph above so I've added this warning.
411
If you want a streaming SAX parser, look at the L<<<<<< XML::LibXML direct SAX parser|XML::LibXML direct SAX parser >>>>>> man page.
410
If you want a streaming SAX parser, look at the L<<<<<< XML::LibXML::SAX >>>>>> man page.
414
413
=head1 SERIALIZATION
416
415
XML::LibXML provides some functions to serialize nodes and documents. The
417
serialization functions are described on the L<<<<<< Abstract Base Class of XML::LibXML Nodes|Abstract Base Class of XML::LibXML Nodes >>>>>> manpage or the L<<<<<< XML::LibXML DOM Document Class|XML::LibXML DOM Document Class >>>>>> manpage. XML::LibXML checks three global flags that alter the serialization
416
serialization functions are described on the L<<<<<< XML::LibXML::Node >>>>>> manpage or the L<<<<<< XML::LibXML::Document >>>>>> manpage. XML::LibXML checks three global flags that alter the serialization
550
549
XML::LibXML::Node::line_number()).
552
551
IMPORTANT: Due to limitations in the libxml2 library, line numbers greater than
553
65535 will be returned as 65535. Please see L<<<<<< http://bugzilla.gnome.org/show_bug.cgi?id=325533|http://bugzilla.gnome.org/show_bug.cgi?id=325533 >>>>>> for more details.
552
65535 will be returned as 65535. Please see L<<<<<< http://bugzilla.gnome.org/show_bug.cgi?id=325533 >>>>>> for more details.
555
554
By default line numbering is switched off (0).
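
Since line numbering is off by default, it has to be enabled before parsing. The following sketch assumes the parser's line_numbers() option method together with the line_number() node method mentioned above. The sample document is invented for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new;

# Switch line numbering on before the document is parsed.
$parser->line_numbers(1);

my $ln_doc = $parser->parse_string("<root>\n  <child/>\n</root>\n");

# Report the source line recorded for each element.
my @lines;
for my $node ( $ln_doc->findnodes('//*') ) {
    push @lines, $node->line_number;
    printf "%s starts on line %d\n", $node->nodeName, $node->line_number;
}
```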
558
=item B<load_ext_dtd>
560
559
$parser->load_ext_dtd(1);
651
650
=head1 ERROR REPORTING
653
652
XML::LibXML throws exceptions during parsing, validation or XPath processing
654
(and some other occasions). These errors can be caught by using I<<<<<< eval >>>>>> blocks. The error will then be stored in I<<<<<< $@ >>>>>>.
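
As a minimal sketch of catching such an exception, the code below wraps a parse call in an eval block and inspects $@ afterwards. The malformed input is made up for the example, and the error is only stringified, so the sketch does not depend on whether $@ holds a plain string or an error object.

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new;

# Hypothetical malformed input: <unclosed> is never closed.
my $bad_xml = '<root><unclosed></root>';

# eval returns undef if parse_string() throws; the error lands in $@.
my $err_doc = eval { $parser->parse_string($bad_xml) };
my $error   = $@;

if ( !defined $err_doc ) {
    # Interpolation stringifies $error whatever its type.
    print "parse failed: $error\n";
}
```

Copying $@ into a lexical straight after the eval avoids losing it to a later eval or die.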
653
(and some other occasions). These errors can be caught by using I<<<<<< eval >>>>>> blocks. The error will then be stored in I<<<<<< $@ >>>>>>. There are two implementations: in the old one $@ is a flat string,
654
in the new one $@ is an object from the class XML::LibXML::Error; this class
655
overrides the operator "" so that when printed, the object flattens to the
656
658
XML::LibXML throws errors as they occur and does not wait for the user to test for
657
659
them. This is a very common misunderstanding in the use of XML::LibXML. If the