5
Pod::Simple::XHTML -- format Pod as validating XHTML
9
use Pod::Simple::XHTML;
11
my $parser = Pod::Simple::XHTML->new();
15
$parser->parse_file('path/to/file.pod');
19
This class is a formatter that takes Pod and renders it as XHTML
22
This is a subclass of L<Pod::Simple::Methody> and inherits all its
23
methods. The implementation is entirely different than
24
L<Pod::Simple::HTML>, but it largely preserves the same interface.
28
package Pod::Simple::XHTML;
30
use vars qw( $VERSION @ISA $HAS_HTML_ENTITIES );
33
use Pod::Simple::Methody ();
34
@ISA = ('Pod::Simple::Methody');
37
$HAS_HTML_ENTITIES = eval "require HTML::Entities; 1";
49
return HTML::Entities::encode_entities( $_[0] ) if $HAS_HTML_ENTITIES;
51
my $ents = join '', keys %entities;
52
$str =~ s/([$ents])/'&' . $entities{$1} . ';'/ge;
56
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
60
Pod::Simple::XHTML offers a number of methods that modify the format of
61
the HTML output. Call these after creating the parser object, but before
62
the call to C<parse_file>:
64
my $parser = Pod::PseudoPod::HTML->new();
65
$parser->set_optional_param("value");
66
$parser->parse_file($file);
68
=head2 perldoc_url_prefix
70
In turning L<Foo::Bar> into http://whatever/Foo%3a%3aBar, what
71
to put before the "Foo%3a%3aBar". The default value is
72
"http://search.cpan.org/perldoc?".
74
=head2 perldoc_url_postfix
76
What to put after "Foo%3a%3aBar" in the URL. This option is not set by
81
In turning C<< L<crontab(5)> >> into http://whatever/man/1/crontab, what
82
to put before the "1/crontab". The default value is
83
"http://man.he.net/man".
85
=head2 man_url_postfix
87
What to put after "1/crontab" in the URL. This option is not set by default.
89
=head2 title_prefix, title_postfix
91
What to put before and after the title in the head. The values should
96
$parser->html_css('path/to/style.css');
98
The URL or relative path of a CSS file to include. This option is not
101
=head2 html_javascript
103
The URL or relative path of a JavaScript file to pull in. This option is
108
A document type tag for the file. This option is not set by default.
110
=head2 html_header_tags
112
Additional arbitrary HTML tags for the header of the document. The
113
default value is just a content type header tag:
115
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
117
Add additional meta tags here, or blocks of inline CSS or JavaScript
118
(wrapped in the appropriate tags).
122
This is the level of HTML "Hn" element to which a Pod "head1" corresponds. For
123
example, if C<html_h_level> is set to 2, a head1 will produce an H2, a head2
124
will produce an H3, and so on.
128
Set a default title for the page if no title can be determined from the
129
content. The value of this string should already be &-escaped.
133
Force a title for the page (don't try to determine it from the content).
134
The value of this string should already be &-escaped.
136
=head2 html_header, html_footer
138
Set the HTML output at the beginning and end of each file. The default
139
header includes a title, a doctype tag (if C<html_doctype> is set), a
140
content tag (customized by C<html_header_tags>), a tag for a CSS file
141
(if C<html_css> is set), and a tag for a Javascript file (if
142
C<html_javascript> is set). The default footer simply closes the C<html>
145
The options listed above customize parts of the default header, but
146
setting C<html_header> or C<html_footer> completely overrides the
147
built-in header or footer. These may be useful if you want to use
148
template tags instead of literal HTML headers and footers or are
149
integrating converted POD pages in a larger website.
151
If you want no headers or footers output in the HTML, set these options
156
Whether to add a table-of-contents at the top of each page (called an
157
index for the sake of tradition).
162
__PACKAGE__->_accessorize(
163
'perldoc_url_prefix',
164
'perldoc_url_postfix',
167
'title_prefix', 'title_postfix',
173
'title', # Used internally for the title extracted from the content
179
'batch_mode', # whether we're in batch mode
180
'batch_mode_current_level',
181
# When in batch mode, how deep the current module is: 1 for "LWP",
182
# 2 for "LWP::Procotol", 3 for "LWP::Protocol::GHTTP", etc
185
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
189
If the standard options aren't enough, you may want to subclass
190
Pod::Simple::XHMTL. These are the most likely candidates for methods
191
you'll want to override when subclassing.
197
my $new = $self->SUPER::new(@_);
198
$new->{'output_fh'} ||= *STDOUT{IO};
199
$new->perldoc_url_prefix('http://search.cpan.org/perldoc?');
200
$new->man_url_prefix('http://man.he.net/man');
201
$new->html_header_tags('<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />');
202
$new->nix_X_codes(1);
203
$new->codes_in_verbatim(1);
204
$new->{'scratch'} = '';
205
$new->{'to_index'} = [];
206
$new->{'output'} = [];
207
$new->{'saved'} = [];
210
$new->{'__region_targets'} = [];
211
$new->{'__literal_targets'} = {};
212
$new->accept_targets_as_html( 'html', 'HTML' );
217
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
221
This method handles the body of text within any element: it's the body
222
of a paragraph, or everything between a "=begin" tag and the
223
corresponding "=end" tag, or the text within an L entity, etc. You would
224
want to override this if you are adding a custom element type that does
225
more than just display formatted text. Perhaps adding a way to generate
226
HTML tables from an extended version of POD.
228
So, let's say you want add a custom element called 'foo'. In your
229
subclass's C<new> method, after calling C<SUPER::new> you'd call:
231
$new->accept_targets_as_text( 'foo' );
233
Then override the C<start_for> method in the subclass to check for when
234
"$flags->{'target'}" is equal to 'foo' and set a flag that marks that
235
you're in a foo block (maybe "$self->{'in_foo'} = 1"). Then override the
236
C<handle_text> method to check for the flag, and pass $text to your
237
custom subroutine to construct the HTML output for 'foo' elements,
241
my ($self, $text) = @_;
242
if ($self->{'in_foo'}) {
243
$self->{'scratch'} .= build_foo_html($text);
245
$self->{'scratch'} .= $text;
249
=head2 accept_targets_as_html
251
This method behaves like C<accept_targets_as_text>, but also marks the region
252
as one whose content should be emitted literally, without HTML entity escaping
253
or wrapping in a C<div> element.
257
sub __in_literal_xhtml_region {
258
return unless @{ $_[0]{__region_targets} };
259
my $target = $_[0]{__region_targets}[-1];
260
return $_[0]{__literal_targets}{ $target };
263
sub accept_targets_as_html {
264
my ($self, @targets) = @_;
265
$self->accept_targets(@targets);
266
$self->{__literal_targets}{$_} = 1 for @targets;
270
# escape special characters in HTML (<, >, &, etc)
271
$_[0]{'scratch'} .= $_[0]->__in_literal_xhtml_region
273
: encode_entities( $_[1] );
276
sub start_Para { $_[0]{'scratch'} = '<p>' }
277
sub start_Verbatim { $_[0]{'scratch'} = '<pre><code>' }
279
sub start_head1 { $_[0]{'in_head'} = 1 }
280
sub start_head2 { $_[0]{'in_head'} = 2 }
281
sub start_head3 { $_[0]{'in_head'} = 3 }
282
sub start_head4 { $_[0]{'in_head'} = 4 }
284
sub start_item_number {
285
$_[0]{'scratch'} = "</li>\n" if $_[0]{'in_li'};
286
$_[0]{'scratch'} .= '<li><p>';
290
sub start_item_bullet {
291
$_[0]{'scratch'} = "</li>\n" if $_[0]{'in_li'};
292
$_[0]{'scratch'} .= '<li><p>';
296
sub start_item_text {
297
if ($_[0]{'in_dd'}[ $_[0]{'dl_level'} ]) {
298
$_[0]{'scratch'} = "</dd>\n";
299
$_[0]{'in_dd'}[ $_[0]{'dl_level'} ] = 0;
301
$_[0]{'scratch'} .= '<dt>';
304
sub start_over_bullet { $_[0]{'scratch'} = '<ul>'; $_[0]->emit }
305
sub start_over_block { $_[0]{'scratch'} = '<ul>'; $_[0]->emit }
306
sub start_over_number { $_[0]{'scratch'} = '<ol>'; $_[0]->emit }
307
sub start_over_text {
308
$_[0]{'scratch'} = '<dl>';
310
$_[0]{'in_dd'} ||= [];
314
sub end_over_block { $_[0]{'scratch'} .= '</ul>'; $_[0]->emit }
316
sub end_over_number {
317
$_[0]{'scratch'} = "</li>\n" if delete $_[0]{'in_li'};
318
$_[0]{'scratch'} .= '</ol>';
322
sub end_over_bullet {
323
$_[0]{'scratch'} = "</li>\n" if delete $_[0]{'in_li'};
324
$_[0]{'scratch'} .= '</ul>';
329
if ($_[0]{'in_dd'}[ $_[0]{'dl_level'} ]) {
330
$_[0]{'scratch'} = "</dd>\n";
331
$_[0]{'in_dd'}[ $_[0]{'dl_level'} ] = 0;
333
$_[0]{'scratch'} .= '</dl>';
338
# . . . . . Now the actual formatters:
340
sub end_Para { $_[0]{'scratch'} .= '</p>'; $_[0]->emit }
342
$_[0]{'scratch'} .= '</code></pre>';
347
my $h = delete $_[0]{in_head};
349
my $add = $_[0]->html_h_level;
350
$add = 1 unless defined $add;
353
my $id = $_[0]->idify($_[0]{scratch});
354
my $text = $_[0]{scratch};
355
$_[0]{'scratch'} = qq{<h$h id="$id">$text</h$h>};
357
push @{ $_[0]{'to_index'} }, [$h, $id, $text];
360
sub end_head1 { shift->_end_head(@_); }
361
sub end_head2 { shift->_end_head(@_); }
362
sub end_head3 { shift->_end_head(@_); }
363
sub end_head4 { shift->_end_head(@_); }
365
sub end_item_bullet { $_[0]{'scratch'} .= '</p>'; $_[0]->emit }
366
sub end_item_number { $_[0]{'scratch'} .= '</p>'; $_[0]->emit }
369
$_[0]{'scratch'} .= "</dt>\n<dd>";
370
$_[0]{'in_dd'}[ $_[0]{'dl_level'} ] = 1;
374
# This handles =begin and =for blocks of all kinds.
376
my ($self, $flags) = @_;
378
push @{ $self->{__region_targets} }, $flags->{target_matching};
380
unless ($self->__in_literal_xhtml_region) {
381
$self->{scratch} .= '<div';
382
$self->{scratch} .= qq( class="$flags->{target}") if $flags->{target};
383
$self->{scratch} .= '>';
392
$self->{'scratch'} .= '</div>' unless $self->__in_literal_xhtml_region;
394
pop @{ $self->{__region_targets} };
400
if (defined $self->html_header) {
401
$self->{'scratch'} .= $self->html_header;
402
$self->emit unless $self->html_header eq "";
404
my ($doctype, $title, $metatags);
405
$doctype = $self->html_doctype || '';
406
$title = $self->force_title || $self->title || $self->default_title || '';
407
$metatags = $self->html_header_tags || '';
408
if ($self->html_css) {
409
$metatags .= "\n<link rel='stylesheet' href='" .
410
$self->html_css . "' type='text/css'>";
412
if ($self->html_javascript) {
413
$metatags .= "\n<script type='text/javascript' src='" .
414
$self->html_javascript . "'></script>";
416
$self->{'scratch'} .= <<"HTML";
420
<title>$title</title>
431
my $to_index = $self->{'to_index'};
432
if ($self->index && @{ $to_index } ) {
437
my $id = ' id="index"';
439
for my $h (@{ $to_index }, [0]) {
440
my $target_level = $h->[0];
441
# Get to target_level by opening or closing ULs
442
if ($level == $target_level) {
444
} elsif ($level > $target_level) {
445
$out[-1] .= '</li>' if $out[-1] =~ /^\s+<li>/;
446
while ($level > $target_level) {
448
push @out, (' ' x --$indent) . '</li>' if @out && $out[-1] =~ m{^\s+<\/ul};
449
push @out, (' ' x --$indent) . '</ul>';
451
push @out, (' ' x --$indent) . '</li>' if $level;
453
while ($level < $target_level) {
455
push @out, (' ' x ++$indent) . '<li>' if @out && $out[-1]=~ /^\s*<ul/;
456
push @out, (' ' x ++$indent) . "<ul$id>";
463
$space = ' ' x $indent;
464
push @out, sprintf '%s<li><a href="#%s">%s</a>',
465
$space, $h->[1], $h->[2];
467
# Splice the index in between the HTML headers and the first element.
468
my $offset = defined $self->html_header ? $self->html_header eq '' ? 0 : 1 : 1;
469
splice @{ $self->{'output'} }, $offset, 0, join "\n", @out;
472
if (defined $self->html_footer) {
473
$self->{'scratch'} .= $self->html_footer;
474
$self->emit unless $self->html_footer eq "";
476
$self->{'scratch'} .= "</body>\n</html>";
481
print {$self->{'output_fh'}} join ("\n\n", @{ $self->{'output'} }), "\n\n";
482
@{$self->{'output'}} = ();
488
sub start_B { $_[0]{'scratch'} .= '<b>' }
489
sub end_B { $_[0]{'scratch'} .= '</b>' }
491
sub start_C { $_[0]{'scratch'} .= '<code>' }
492
sub end_C { $_[0]{'scratch'} .= '</code>' }
494
sub start_F { $_[0]{'scratch'} .= '<i>' }
495
sub end_F { $_[0]{'scratch'} .= '</i>' }
497
sub start_I { $_[0]{'scratch'} .= '<i>' }
498
sub end_I { $_[0]{'scratch'} .= '</i>' }
501
my ($self, $flags) = @_;
502
my ($type, $to, $section) = @{$flags}{'type', 'to', 'section'};
503
my $url = $type eq 'url' ? $to
504
: $type eq 'pod' ? $self->resolve_pod_page_link($to, $section)
505
: $type eq 'man' ? $self->resolve_man_page_link($to, $section)
508
# If it's an unknown type, use an attribute-less <a> like HTML.pm.
509
$self->{'scratch'} .= '<a' . ($url ? ' href="'. $url . '">' : '>');
512
sub end_L { $_[0]{'scratch'} .= '</a>' }
514
sub start_S { $_[0]{'scratch'} .= '<nobr>' }
515
sub end_S { $_[0]{'scratch'} .= '</nobr>' }
520
push @{ $self->{'output'} }, $self->{'scratch'};
522
print {$self->{'output_fh'}} $self->{'scratch'}, "\n\n";
524
$self->{'scratch'} = '';
528
=head2 resolve_pod_page_link
530
my $url = $pod->resolve_pod_page_link('Net::Ping', 'INSTALL');
531
my $url = $pod->resolve_pod_page_link('perlpodspec');
532
my $url = $pod->resolve_pod_page_link(undef, 'SYNOPSIS');
534
Resolves a POD link target (typically a module or POD file name) and section
535
name to a URL. The resulting link will be returned for the above examples as:
537
http://search.cpan.org/perldoc?Net::Ping#INSTALL
538
http://search.cpan.org/perldoc?perlpodspec
541
Note that when there is only a section argument the URL will simply be a link
542
to a section in the current document.
546
sub resolve_pod_page_link {
547
my ($self, $to, $section) = @_;
548
return undef unless defined $to || defined $section;
549
if (defined $section) {
550
$section = '#' . $self->idify($section, 1);
551
return $section unless defined $to;
556
return ($self->perldoc_url_prefix || '')
557
. encode_entities($to) . $section
558
. ($self->perldoc_url_postfix || '');
561
=head2 resolve_man_page_link
563
my $url = $pod->resolve_man_page_link('crontab(5)', 'EXAMPLE CRON FILE');
564
my $url = $pod->resolve_man_page_link('crontab');
566
Resolves a man page link target and numeric section to a URL. The resulting
567
link will be returned for the above examples as:
569
http://man.he.net/man5/crontab
570
http://man.he.net/man1/crontab
572
Note that the first argument is required. The section number will be parsed
573
from it, and if it's missing will default to 1. The second argument is
574
currently ignored, as L<man.he.net|http://man.he.net> does not currently
575
include linkable IDs or anchor names in its pages. Subclass to link to a
576
different man page HTTP server.
580
sub resolve_man_page_link {
581
my ($self, $to, $section) = @_;
582
return undef unless defined $to;
583
my ($page, $part) = $to =~ /^([^(]+)(?:[(](\d+)[)])?$/;
584
return undef unless $page;
585
return ($self->man_url_prefix || '')
586
. ($part || 1) . "/" . encode_entities($page)
587
. ($self->man_url_postfix || '');
593
my $id = $pod->idify($text);
594
my $hash = $pod->idify($text, 1);
596
This method turns an arbitrary string into a valid XHTML ID attribute value.
597
The rules enforced, following
598
L<http://webdesign.about.com/od/htmltags/a/aa031707.htm>, are:
604
The id must start with a letter (a-z or A-Z)
608
All subsequent characters can be letters, numbers (0-9), hyphens (-),
609
underscores (_), colons (:), and periods (.).
613
Each id must be unique within the document.
617
In addition, the returned value will be unique within the context of the
618
Pod::Simple::XHTML object unless a second argument is passed a true value. ID
619
attributes should always be unique within a single XHTML document, but pass
620
the true value if you are creating not an ID but a URL hash to point to
621
an ID (i.e., if you need to put the "#foo" in C<< <a href="#foo">foo</a> >>.
626
my ($self, $t, $not_unique) = @_;
628
s/<[^>]+>//g; # Strip HTML.
629
s/&[^;]+;//g; # Strip entities.
630
s/^([^a-zA-Z]+)$/pod$1/; # Prepend "pod" if no valid chars.
631
s/^[^a-zA-Z]+//; # First char must be a letter.
632
s/[^-a-zA-Z0-9_:.]+/-/g; # All other chars must be valid.
634
return $t if $not_unique;
636
$i++ while $self->{ids}{"$t$i"}++;
640
=head2 batch_mode_page_object_init
642
$pod->batch_mode_page_object_init($batchconvobj, $module, $infile, $outfile, $depth);
644
Called by L<Pod::Simple::HTMLBatch> so that the class has a chance to
645
initialize the converter. Internally it sets the C<batch_mode> property to
646
true and sets C<batch_mode_current_level()>, but Pod::Simple::XHTML does not
647
currently use those features. Subclasses might, though.
651
sub batch_mode_page_object_init {
652
my ($self, $batchconvobj, $module, $infile, $outfile, $depth) = @_;
653
$self->batch_mode(1);
654
$self->batch_mode_current_level($depth);
664
L<Pod::Simple>, L<Pod::Simple::Text>, L<Pod::Spell>
668
Questions or discussion about POD and Pod::Simple should be sent to the
669
pod-people@perl.org mail list. Send an empty email to
670
pod-people-subscribe@perl.org to subscribe.
672
This module is managed in an open GitHub repository,
673
L<http://github.com/theory/pod-simple/>. Feel free to fork and contribute, or
674
to clone L<git://github.com/theory/pod-simple.git> and send patches!
676
Patches against Pod::Simple are welcome. Please send bug reports to
677
<bug-pod-simple@rt.cpan.org>.
679
=head1 COPYRIGHT AND DISCLAIMERS
681
Copyright (c) 2003-2005 Allison Randal.
683
This library is free software; you can redistribute it and/or modify it
684
under the same terms as Perl itself.
686
This program is distributed in the hope that it will be useful, but
687
without any warranty; without even the implied warranty of
688
merchantability or fitness for a particular purpose.
690
=head1 ACKNOWLEDGEMENTS
692
Thanks to L<Hurricane Electrict|http://he.net/> for permission to use its
693
L<Linux man pages online|http://man.he.net/> site for man page links.
695
Thanks to L<search.cpan.org|http://search.cpan.org/> for permission to use the
696
site for Perl module links.
700
Pod::Simpele::XHTML was created by Allison Randal <allison@perl.org>.
702
Pod::Simple was created by Sean M. Burke <sburke@cpan.org>.
703
But don't bother him, he's retired.
705
Pod::Simple is maintained by:
709
=item * Allison Randal C<allison@perl.org>
711
=item * Hans Dieter Pearcey C<hdp@cpan.org>
713
=item * David E. Wheeler C<dwheeler@cpan.org>