~ubuntu-branches/ubuntu/trusty/libtext-markdown-perl/trusty

Committer: Bazaar Package Importer
Author(s): Jonathan Yu, Jonathan Yu, gregor herrmann, Nathan Handler
Date: 2009-08-26 11:14:08 UTC
mfrom: (1.1.5 upstream)
Revision ID: james.westby@ubuntu.com-20090826111408-p5lw469iekiybf4t

Tags: 1.0.25-1

[ Jonathan Yu ]
* New upstream release
+ MultiMarkdown is now removed from upstream package; it is now
available as a separate standalone package
* Updated copyright information
* Added myself to Uploaders and Copyright
* Updated rules to new short format
* Updated control file description
* Standards-Version 3.8.3 (remove version dependency on perl)
* Install Markdown.pl as markdown, as before, but use overrides
* Add header for pod_name patch
* Remove .manpages file, they are installed automatically
* Suggest libtext-multimarkdown-perl (since it's a separate package now)
* Add libhtml-tidy-perl to B-D-I for testing
* Add a NEWS item for the libtext-multimarkdown-perl split

[ gregor herrmann ]
* debian/control: Added: ${misc:Depends} to Depends: field.

[ Nathan Handler ]
* debian/watch: Update to ignore development releases.

files added:
debian/NEWS

files removed:
debian/libtext-markdown-perl.manpages

inc/Module/Install/Scripts.pm

lib/Text/MultiMarkdown.pm

script/MultiMarkdown.pl

t/01use.t

t/03podspelling.t

t/04markdown.t

t/05options.t

t/06wiki.t

t/07fulldoc.t

t/08exporter.t

t/09base_url.t

t/10use_metadata.t

t/11docment_format.t

t/12empty_element_suffix.t

t/13multimarkdown-nometadata-firstpara.t

t/14unicode.t

t/15inlinehtmldoenoturnoffmarkdown.t

t/16headingsinlists.t

t/17olulreportedonlist.t

t/18codespanextra.t

t/20fulldocs-text-multimarkdown.t

t/22fulldocs-multimarkdown-todo.t

t/23afulldocs-multimarkdown.t

t/24fulldocs-pythonmarkdown2-tm.t

t/26fulldocs-php-markdown.t

t/27fulldocs-php-markdown-fails.t

t/28fulldocs-php-markdown-extra.t

t/29fulldocs-maruku-unittest.t

t/30disable_multimarkdown_features.t

t/35commandlinemultimarkdown.t

t/36multimarkdown_featurebleed.t

t/38listshorizontalrule.t

t/39listsindentededgecase.t

t/MultiMarkdown.mdtest

t/MultiMarkdown.mdtest/Backslash_escapes.text

t/MultiMarkdown.mdtest/Backslash_escapes.xhtml

t/MultiMarkdown.mdtest/Footnotes.text

t/MultiMarkdown.mdtest/Footnotes.xhtml

t/MultiMarkdown.mdtest/Inline_HTML_(Advanced).text

t/MultiMarkdown.mdtest/Inline_HTML_(Advanced).xhtml

t/MultiMarkdown.mdtest/Inline_HTML_(Simple).text

t/MultiMarkdown.mdtest/Inline_HTML_(Simple).xhtml

t/MultiMarkdown.mdtest/Inline_HTML_comments.text

t/MultiMarkdown.mdtest/Inline_HTML_comments.xhtml

t/MultiMarkdown.mdtest/Links_inline_style.text

t/MultiMarkdown.mdtest/Links_inline_style.xhtml

t/MultiMarkdown.mdtest/Links_reference_style.text

t/MultiMarkdown.mdtest/Links_reference_style.xhtml

t/MultiMarkdown.mdtest/Literal_quotes_in_titles.text

t/MultiMarkdown.mdtest/Literal_quotes_in_titles.xhtml

t/MultiMarkdown.mdtest/Nested_blockquotes.text

t/MultiMarkdown.mdtest/Nested_blockquotes.xhtml

t/MultiMarkdown.mdtest/Ordered_and_unordered_lists.text

t/MultiMarkdown.mdtest/Ordered_and_unordered_lists.xhtml

t/Text-MultiMarkdown.mdtest

t/Text-MultiMarkdown.mdtest/Amps_and_angle_encoding.html

t/Text-MultiMarkdown.mdtest/Amps_and_angle_encoding.text

t/Text-MultiMarkdown.mdtest/Amps_and_angles_encoding-advanced.text

t/Text-MultiMarkdown.mdtest/Amps_and_angles_encoding-advanced.xhtml

t/Text-MultiMarkdown.mdtest/Auto_links.text

t/Text-MultiMarkdown.mdtest/Auto_links.xhtml

t/Text-MultiMarkdown.mdtest/Backslash_escapes.html

t/Text-MultiMarkdown.mdtest/Backslash_escapes.text

t/Text-MultiMarkdown.mdtest/Blockquotes_with_code_blocks.html

t/Text-MultiMarkdown.mdtest/Blockquotes_with_code_blocks.text

t/Text-MultiMarkdown.mdtest/Citations.text

t/Text-MultiMarkdown.mdtest/Citations.xhtml

t/Text-MultiMarkdown.mdtest/Cross-References.text

t/Text-MultiMarkdown.mdtest/Cross-References.xhtml

t/Text-MultiMarkdown.mdtest/Emphasis.text

t/Text-MultiMarkdown.mdtest/Emphasis.xhtml

t/Text-MultiMarkdown.mdtest/Footnotes.text

t/Text-MultiMarkdown.mdtest/Footnotes.xhtml

t/Text-MultiMarkdown.mdtest/HTML-Comment-encoding.text

t/Text-MultiMarkdown.mdtest/HTML-Comment-encoding.xhtml

t/Text-MultiMarkdown.mdtest/Hard-wrapped_paragraphs_with_list-like_lines.text

t/Text-MultiMarkdown.mdtest/Hard-wrapped_paragraphs_with_list-like_lines.xhtml

t/Text-MultiMarkdown.mdtest/Horizontal_rules.text

t/Text-MultiMarkdown.mdtest/Horizontal_rules.xhtml

t/Text-MultiMarkdown.mdtest/Images.text

t/Text-MultiMarkdown.mdtest/Images.xhtml

t/Text-MultiMarkdown.mdtest/Inline_HTML_(Advanced).text

t/Text-MultiMarkdown.mdtest/Inline_HTML_(Advanced).xhtml

t/Text-MultiMarkdown.mdtest/Inline_HTML_(Simple).text

t/Text-MultiMarkdown.mdtest/Inline_HTML_(Simple).xhtml

t/Text-MultiMarkdown.mdtest/Inline_HTML_comments.text

t/Text-MultiMarkdown.mdtest/Inline_HTML_comments.xhtml

t/Text-MultiMarkdown.mdtest/Link_Attributes.text

t/Text-MultiMarkdown.mdtest/Link_Attributes.xhtml

t/Text-MultiMarkdown.mdtest/Links_brackets.text

t/Text-MultiMarkdown.mdtest/Links_brackets.xhtml

t/Text-MultiMarkdown.mdtest/Links_inline_style.text

t/Text-MultiMarkdown.mdtest/Links_inline_style.xhtml

t/Text-MultiMarkdown.mdtest/Links_multiline_bugs_1.html

t/Text-MultiMarkdown.mdtest/Links_multiline_bugs_1.text

t/Text-MultiMarkdown.mdtest/Links_multiline_bugs_2.html

t/Text-MultiMarkdown.mdtest/Links_multiline_bugs_2.text

t/Text-MultiMarkdown.mdtest/Links_reference_style.text

t/Text-MultiMarkdown.mdtest/Links_reference_style.xhtml

t/Text-MultiMarkdown.mdtest/Lists-multilevel-md5-edgecase.text

t/Text-MultiMarkdown.mdtest/Lists-multilevel-md5-edgecase.xhtml

t/Text-MultiMarkdown.mdtest/Lists.text

t/Text-MultiMarkdown.mdtest/Lists.xhtml

t/Text-MultiMarkdown.mdtest/Literal_quotes_in_titles.text

t/Text-MultiMarkdown.mdtest/Literal_quotes_in_titles.xhtml

t/Text-MultiMarkdown.mdtest/Markdown_Documentation_-_Basics.text

t/Text-MultiMarkdown.mdtest/Markdown_Documentation_-_Basics.xhtml

t/Text-MultiMarkdown.mdtest/Markdown_Documentation_-_Syntax.text

t/Text-MultiMarkdown.mdtest/Markdown_Documentation_-_Syntax.xhtml

t/Text-MultiMarkdown.mdtest/Metadata_(Complete).text

t/Text-MultiMarkdown.mdtest/Metadata_(Complete).xhtml

t/Text-MultiMarkdown.mdtest/Metadata_(Snippet).html

t/Text-MultiMarkdown.mdtest/Metadata_(Snippet).text

t/Text-MultiMarkdown.mdtest/Nested_blockquotes.text

t/Text-MultiMarkdown.mdtest/Nested_blockquotes.xhtml

t/Text-MultiMarkdown.mdtest/Ordered_and_unordered_lists.text

t/Text-MultiMarkdown.mdtest/Ordered_and_unordered_lists.xhtml

t/Text-MultiMarkdown.mdtest/PHP-ASP_tags.text

t/Text-MultiMarkdown.mdtest/PHP-ASP_tags.xhtml

t/Text-MultiMarkdown.mdtest/Special_Characters.text

t/Text-MultiMarkdown.mdtest/Special_Characters.xhtml

t/Text-MultiMarkdown.mdtest/Strong_and_em_together.text

t/Text-MultiMarkdown.mdtest/Strong_and_em_together.xhtml

t/Text-MultiMarkdown.mdtest/Tables.text

t/Text-MultiMarkdown.mdtest/Tables.xhtml

t/Text-MultiMarkdown.mdtest/Tabs.text

t/Text-MultiMarkdown.mdtest/Tabs.xhtml

t/Text-MultiMarkdown.mdtest/Warnings-rt34856.text

t/Text-MultiMarkdown.mdtest/Warnings-rt34856.xhtml

t/Text-MultiMarkdown.mdtest/Wiki_Features.text

t/Text-MultiMarkdown.mdtest/Wiki_Features.xhtml

t/Text-MultiMarkdown.mdtest/complex_escaping.text

t/Text-MultiMarkdown.mdtest/complex_escaping.xhtml

t/docs-multimarkdown-todo

t/docs-multimarkdown-todo/Amps_and_angle_encoding.html

t/docs-multimarkdown-todo/Amps_and_angle_encoding.text

t/docs-multimarkdown-todo/Email_Obfuscation.html

t/docs-multimarkdown-todo/Email_Obfuscation.text

t/docs-multimarkdown-todo/Tidyness.html

t/docs-multimarkdown-todo/Tidyness.text

files modified:
Changes

MANIFEST

META.yml

Makefile.PL

debian/changelog

debian/control

debian/copyright

debian/patches/pod_name

debian/rules

debian/watch

inc/Module/Install.pm

inc/Module/Install/Base.pm

inc/Module/Install/Can.pm

inc/Module/Install/Fetch.pm

inc/Module/Install/Makefile.pm

inc/Module/Install/Metadata.pm

inc/Module/Install/Win32.pm

inc/Module/Install/WriteAll.pm

lib/Text/Markdown.pm

t/21fulldocs-text-markdown.t

t/23fulldocs-markdown.t

t/25fulldocs-phpmarkdown.t

Show diffs side-by-side

added added

removed removed

lib/Text/Markdown.pm

use Carp qw(croak);

use base 'Exporter';

our $VERSION = '1.0.24';

our $VERSION = '1.0.25';

our @EXPORT_OK = qw(markdown);

=head1 NAME

=head1 OPTIONS

Text::Markdown supports a number of options to it's processor which control the behaviour of the output document.

Text::Markdown supports a number of options to its processor which control the behaviour of the output document.

These options can be supplied to the constructor, on in a hash with the individual calls to the markdown method.

See the synopsis for examples of both of the above styles.

These options can be supplied to the constructor, or in a hash within individual calls to the markdown() method.

See the synopsis for examples of both styles.

The options for the processor are:

=item empty_element_suffix

This option can be used to generate normal HTML output. By default, it is ' />', which is xHTML, change to '>' for normal HTML.

This option controls the end of empty element tags:

'/>' for XHTML (default)

'>' for HTML

=item tab_width

Controls indent width in the generated markup, defaults to 4

Controls indent width in the generated markup. Defaults to 4.

=item markdown_in_html_blocks

Controls if Markdown is processed when inside HTML blocks. Defaults to 0.

Controls if Markdown is processed when inside HTML blocks. Defaults to 0 in

order to not inadvertently parse as Markdown chunks of HTML that the user may

have pasted in their document (e.g. web counters in a wiki page).

=item trust_list_start_value

102

107

103

108

# Regex to match balanced [brackets]. See Friedl's

104

109

# "Mastering Regular Expressions", 2nd Ed., pp. 328-331.

105

our ($g_nested_brackets, $g_nested_parens);

110

our ($g_nested_brackets, $g_nested_parens);

106

111

$g_nested_brackets = qr{

107

112

(?> # Atomic matching

108

113

[^\[\]]+ # Anything other than brackets

109

114

110

115

111

116

(??{ $g_nested_brackets }) # Recursive set of nested brackets

112

117

114

119

}x;

115

120

# Doesn't allow for whitespace, because we're using it to match URLs:

116

121

$g_nested_parens = qr{

117

(?> # Atomic matching

118

[^()\s]+ # Anything other than parens or whitespace

119

120

121

(??{ $g_nested_parens }) # Recursive set of nested brackets

122

123

122

(?> # Atomic matching

123

[^()\s]+ # Anything other than parens or whitespace

124

125

126

(??{ $g_nested_parens }) # Recursive set of nested brackets

127

128

124

129

}x;

125

130

126

131

# Table of hash values for escaped characters:

139

144

140

145

sub new {

141

146

my ($class, %p) = @_;

142

143

$p{base_url} ||= ''; # This is the base url to be used for WikiLinks

144

147

148

$p{base_url} ||= ''; # This is the base URL to be used for WikiLinks

149

145

150

$p{tab_width} = 4 unless (defined $p{tab_width} and $p{tab_width} =~ m/^\d+$/);

146

151

147

152

$p{empty_element_suffix} ||= ' />'; # Change to ">" for HTML output

148

153

149

154

# Is markdown processed in HTML blocks? See t/15inlinehtmldonotturnoffmarkdown.t

150

155

$p{markdown_in_html_blocks} = $p{markdown_in_html_blocks} ? 1 : 0;

151

156

152

157

$p{trust_list_start_value} = $p{trust_list_start_value} ? 1 : 0;

153

158

154

159

my $self = { params => \%p };

155

160

bless $self, ref($class) || $class;

156

161

return $self;

166

171

sub markdown {

167

172

my ( $self, $text, $options ) = @_;

168

173

169

# Detect functional mode, and create an instance for this run..

174

# Detect functional mode, and create an instance for this run

170

175

unless (ref $self) {

171

176

if ( $self ne __PACKAGE__ ) {

172

177

my $ob = __PACKAGE__->new();

183

188

%$self = (%{ $self->{params} }, %$options, params => $self->{params});

184

189

185

190

$self->_CleanUpRunData($options);

186

191

187

192

return $self->_Markdown($text);

188

193

}

189

194

192

197

# Clear the global hashes. If we don't clear these, you get conflicts

193

198

# from other articles when generating a page which contains more than

194

199

# one article (e.g. an index page that shows the N most recent

195

# articles):

200

# articles).

196

201

$self->{_urls} = $options->{urls} ? $options->{urls} : {}; # FIXME - document passing this option (tested in 05options.t).

197

202

$self->{_titles} = {};

198

203

$self->{_html_blocks} = {};

212

217

my ($self, $text, $options) = @_;

213

218

214

219

$text = $self->_CleanUpDoc($text);

215

216

# Turn block-level HTML blocks into hash entries

220

221

# Turn block-level HTML elements into hash entries if we are not supposed to parse the Markdown in them

217

222

$text = $self->_HashHTMLBlocks($text) unless $self->{markdown_in_html_blocks};

218

223

219

224

$text = $self->_StripLinkDefinitions($text);

220

225

221

226

$text = $self->_RunBlockGamut($text);

222

227

223

228

$text = $self->_UnescapeSpecialChars($text);

224

229

225

230

$text = $self->_ConvertCopyright($text);

226

231

227

232

return $text . "\n";

238

243

239

244

sub urls {

240

245

my ( $self ) = @_;

241

246

242

247

return $self->{_urls};

243

248

}

244

249

245

250

sub _CleanUpDoc {

246

251

my ($self, $text) = @_;

247

252

248

253

# Standardize line endings:

249

254

$text =~ s{\r\n}{\n}g; # DOS to Unix

250

255

$text =~ s{\r}{\n}g; # Mac to Unix

260

265

# match consecutive blank lines with /\n+/ instead of something

261

266

# contorted like /[ \t]*\n+/ .

262

267

$text =~ s/^[ \t]+$//mg;

263

268

264

269

return $text;

265

270

}

266

271

296

301

$self->{_titles}{lc $1} = $3;

297

302

$self->{_titles}{lc $1} =~ s/"/"/g;

298

303

}

299

304

300

305

}

301

306

302

307

return $text;

303

308

}

304

309

305

310

sub _md5_utf8 {

306

# Internal function used to safely MD5sum chunks of the input, which might be Unicode in Perl's internal representation.

307

my $input = shift;

308

return unless defined $input;

309

if (Encode::is_utf8 $input) {

310

return md5_hex(Encode::encode('utf8', $input));

311

}

311

# Internal function used to safely MD5sum chunks of the input, which might be Unicode in Perl's internal representation.

312

my $input = shift;

313

return unless defined $input;

314

if (Encode::is_utf8 $input) {

315

return md5_hex(Encode::encode('utf8', $input));

316

}

312

317

else {

313

318

return md5_hex($input);

314

319

}

318

323

my ($self, $text) = @_;

319

324

my $less_than_tab = $self->{tab_width} - 1;

320

325

321

# Hashify HTML blocks:

322

# We only want to do this for block-level HTML tags, such as headers,

323

# lists, and tables. That's because we still want to wrap <p>s around

324

# "paragraphs" that are wrapped in non-block-level tags, such as anchors,

325

# phrase emphasis, and spans. The list of tags we're looking for is

326

# hard-coded:

327

my $block_tags = qr{

328

(?:

329

p | div | h[1-6] | blockquote | pre | table |

330

dl | ol | ul | script | noscript | form |

331

fieldset | iframe | math | ins | del

332

)

333

}x;

334

335

my $tag_attrs = qr{

336

(?: # Match one attr name/value pair

337

\s+ # There needs to be at least some whitespace

338

# before each attribute name.

339

[\w.:_-]+ # Attribute name

340

\s*=\s*

341

(?:

342

".+?" # "Attribute value"

343

344

'.+?' # 'Attribute value'

345

)

346

)* # Zero or more

347

}x;

348

349

my $empty_tag = qr{< \w+ $tag_attrs \s* />}oxms;

350

my $open_tag = qr{< $block_tags $tag_attrs \s* >}oxms;

351

my $close_tag = undef; # let Text::Balanced handle this

352

353

use Text::Balanced qw(gen_extract_tagged);

354

my $extract_block = gen_extract_tagged($open_tag, $close_tag, undef, { ignore => [$empty_tag] });

355

356

my @chunks;

357

while ($text =~ s{^(([ ]{0,$less_than_tab}<)?.*\n)}{}m) {

358

my $cur_line = $1;

359

if (defined $2) {

360

# current line could be start of code block

361

362

my ($tag, $remainder) = $extract_block->($cur_line . $text);

363

if ($tag) {

364

my $key = _md5_utf8($tag);

365

$self->{_html_blocks}{$key} = $tag;

366

push @chunks, "\n\n" . $key . "\n\n";

367

$text = $remainder;

368

}

369

else {

370

# No tag match, so toss $cur_line into @chunks

371

push @chunks, $cur_line;

372

}

373

}

374

else {

375

# current line could NOT be start of code block

376

push @chunks, $cur_line;

377

}

378

379

}

380

push @chunks, $text; # Whatever is left.

381

382

$text = join '', @chunks;

383

384

# Special case just for <hr />. It was easier to make a special case than

385

# to make the other regex more complicated.

386

$text = $self->_HashHR($text);

387

326

# Hashify HTML blocks:

327

# We only want to do this for block-level HTML tags, such as headers,

328

# lists, and tables. That's because we still want to wrap <p>s around

329

# "paragraphs" that are wrapped in non-block-level tags, such as anchors,

330

# phrase emphasis, and spans. The list of tags we're looking for is

331

# hard-coded:

332

my $block_tags = qr{

333

(?:

334

p | div | h[1-6] | blockquote | pre | table |

335

dl | ol | ul | script | noscript | form |

336

fieldset | iframe | math | ins | del

337

)

338

}x;

339

340

my $tag_attrs = qr{

341

(?: # Match one attr name/value pair

342

\s+ # There needs to be at least some whitespace

343

# before each attribute name.

344

[\w.:_-]+ # Attribute name

345

\s*=\s*

346

(?:

347

".+?" # "Attribute value"

348

349

'.+?' # 'Attribute value'

350

)

351

)* # Zero or more

352

}x;

353

354

my $empty_tag = qr{< \w+ $tag_attrs \s* />}oxms;

355

my $open_tag = qr{< $block_tags $tag_attrs \s* >}oxms;

356

my $close_tag = undef; # let Text::Balanced handle this

357

358

use Text::Balanced qw(gen_extract_tagged);

359

my $extract_block = gen_extract_tagged($open_tag, $close_tag, undef, { ignore => [$empty_tag] });

360

361

my @chunks;

362

# parse each line...

363

while ($text =~ s{^(([ ]{0,$less_than_tab}<)?.*\n)}{}m) {

364

my $cur_line = $1;

365

if (defined $2) {

366

# current line could be start of code block

367

368

my ($tag, $remainder) = $extract_block->($cur_line . $text);

369

if ($tag) {

370

my $key = _md5_utf8($tag);

371

$self->{_html_blocks}{$key} = $tag;

372

push @chunks, "\n\n" . $key . "\n\n";

373

$text = $remainder;

374

}

375

else {

376

# No tag match, so toss $cur_line into @chunks

377

push @chunks, $cur_line;

378

}

379

}

380

else {

381

# current line could NOT be start of code block

382

push @chunks, $cur_line;

383

}

384

385

}

386

push @chunks, $text; # Whatever is left.

387

388

$text = join '', @chunks;

389

390

# Special case just for <hr />. It was easier to make a special case than

391

# to make the other regex more complicated.

392

$text = $self->_HashHR($text);

393

388

394

$text = $self->_HashHTMLComments($text);

389

395

390

396

$text = $self->_HashPHPASPBlocks($text);

391

397

392

return $text;

398

return $text;

393

399

}

394

400

395

401

sub _HashHR {

396

402

my ($self, $text) = @_;

397

403

my $less_than_tab = $self->{tab_width} - 1;

398

399

$text =~ s{

400

(?:

401

(?<=\n\n) # Starting after a blank line

402

| # or

403

\A\n? # the beginning of the doc

404

)

405

( # save in $1

406

[ ]{0,$less_than_tab}

407

<(hr) # start tag = $2

408

\b # word break

409

([^<>])*? #

410

/?> # the matching end tag

411

[ \t]*

412

(?=\n{2,}|\Z) # followed by a blank line or end of document

413

)

414

}{

415

my $key = _md5_utf8($1);

416

$self->{_html_blocks}{$key} = $1;

417

"\n\n" . $key . "\n\n";

418

}egx;

419

420

return $text;

404

405

$text =~ s{

406

(?:

407

(?<=\n\n) # Starting after a blank line

408

| # or

409

\A\n? # the beginning of the doc

410

)

411

( # save in $1

412

[ ]{0,$less_than_tab}

413

<(hr) # start tag = $2

414

\b # word break

415

([^<>])*? #

416

/?> # the matching end tag

417

[ \t]*

418

(?=\n{2,}|\Z) # followed by a blank line or end of document

419

)

420

}{

421

my $key = _md5_utf8($1);

422

$self->{_html_blocks}{$key} = $1;

423

"\n\n" . $key . "\n\n";

424

}egx;

425

426

return $text;

421

427

}

422

428

423

429

sub _HashHTMLComments {

424

430

my ($self, $text) = @_;

425

431

my $less_than_tab = $self->{tab_width} - 1;

426

432

427

433

# Special case for standalone HTML comments:

428

$text =~ s{

429

(?:

430

(?<=\n\n) # Starting after a blank line

431

| # or

432

\A\n? # the beginning of the doc

433

)

434

( # save in $1

435

[ ]{0,$less_than_tab}

436

(?s:

437

438

(--.*?--\s*)+

439

440

)

441

[ \t]*

442

(?=\n{2,}|\Z) # followed by a blank line or end of document

443

)

444

}{

445

my $key = _md5_utf8($1);

446

$self->{_html_blocks}{$key} = $1;

447

"\n\n" . $key . "\n\n";

448

}egx;

449

450

return $text;

434

$text =~ s{

435

(?:

436

(?<=\n\n) # Starting after a blank line

437

| # or

438

\A\n? # the beginning of the doc

439

)

440

( # save in $1

441

[ ]{0,$less_than_tab}

442

(?s:

443

444

(--.*?--\s*)+

445

446

)

447

[ \t]*

448

(?=\n{2,}|\Z) # followed by a blank line or end of document

449

)

450

}{

451

my $key = _md5_utf8($1);

452

$self->{_html_blocks}{$key} = $1;

453

"\n\n" . $key . "\n\n";

454

}egx;

455

456

return $text;

451

457

}

452

458

453

459

sub _HashPHPASPBlocks {

454

460

my ($self, $text) = @_;

455

461

my $less_than_tab = $self->{tab_width} - 1;

456

462

457

463

# PHP and ASP-style processor instructions (<?…?> and <%…%>)

458

$text =~ s{

459

(?:

460

(?<=\n\n) # Starting after a blank line

461

| # or

462

\A\n? # the beginning of the doc

463

)

464

( # save in $1

465

[ ]{0,$less_than_tab}

466

(?s:

467

<([?%]) # $2

468

.*?

469

\2>

470

)

471

[ \t]*

472

(?=\n{2,}|\Z) # followed by a blank line or end of document

473

)

474

}{

475

my $key = _md5_utf8($1);

476

$self->{_html_blocks}{$key} = $1;

477

"\n\n" . $key . "\n\n";

478

}egx;

479

return $text;

464

$text =~ s{

465

(?:

466

(?<=\n\n) # Starting after a blank line

467

| # or

468

\A\n? # the beginning of the doc

469

)

470

( # save in $1

471

[ ]{0,$less_than_tab}

472

(?s:

473

<([?%]) # $2

474

.*?

475

\2>

476

)

477

[ \t]*

478

(?=\n{2,}|\Z) # followed by a blank line or end of document

479

)

480

}{

481

my $key = _md5_utf8($1);

482

$self->{_html_blocks}{$key} = $1;

483

"\n\n" . $key . "\n\n";

484

}egx;

485

return $text;

480

486

}

481

487

482

488

sub _RunBlockGamut {

488

494

489

495

# Do headers first, as these populate cross-refs

490

496

$text = $self->_DoHeaders($text);

491

492

# And now, protect our tables

493

$text = $self->_HashHTMLBlocks($text) unless $self->{markdown_in_html_blocks};

494

497

495

498

# Do Horizontal Rules:

496

499

my $less_than_tab = $self->{tab_width} - 1;

523

526

my ($self, $text) = @_;

524

527

525

528

$text = $self->_DoCodeSpans($text);

526

$text = $self->_EscapeSpecialCharsWithinTagAttributes($text);

529

$text = $self->_EscapeSpecialCharsWithinTagAttributes($text);

527

530

$text = $self->_EscapeSpecialChars($text);

528

531

529

532

# Process anchor and image tags. Images must come first,

583

586

# value; this is likely overkill, but it should prevent us from colliding

584

587

# with the escape values by accident.

585

588

586

my ($self, $text) = @_;

587

my $tokens ||= $self->_TokenizeHTML($text);

588

$text = ''; # rebuild $text from the tokens

589

my ($self, $text) = @_;

590

my $tokens ||= $self->_TokenizeHTML($text);

591

$text = ''; # rebuild $text from the tokens

589

592

590

foreach my $cur_token (@$tokens) {

591

if ($cur_token->[0] eq "tag") {

592

$cur_token->[1] =~ s! \\ !$g_escape_table{'\\'}!gox;

593

$cur_token->[1] =~ s{ (?<=.)</?code>(?=.) }{$g_escape_table{'`'}}gox;

594

$cur_token->[1] =~ s! \* !$g_escape_table{'*'}!gox;

595

$cur_token->[1] =~ s! _ !$g_escape_table{'_'}!gox;

596

}

597

$text .= $cur_token->[1];

598

}

599

return $text;

593

foreach my $cur_token (@$tokens) {

594

if ($cur_token->[0] eq "tag") {

595

$cur_token->[1] =~ s! \\ !$g_escape_table{'\\'}!gox;

596

$cur_token->[1] =~ s{ (?<=.)</?code>(?=.) }{$g_escape_table{'`'}}gox;

597

$cur_token->[1] =~ s! \* !$g_escape_table{'*'}!gox;

598

$cur_token->[1] =~ s! _ !$g_escape_table{'_'}!gox;

599

}

600

$text .= $cur_token->[1];

601

}

602

return $text;

600

603

}

601

604

602

605

sub _DoAnchors {

629

632

if ($link_id eq "") {

630

633

$link_id = lc $link_text; # for shortcut links like [this][].

631

634

}

632

635

633

636

$link_id =~ s{[ ]*\n}{ }g; # turn embedded newlines into spaces

634

637

635

638

$self->_GenerateAnchor($whole_match, $link_text, $link_id);

636

639

}xsge;

637

640

651

654

(['"]) # quote char = $5

652

655

(.*?) # Title = $6

653

656

\5 # matching quote

654

[ \t]* # ignore any spaces/tabs between closing quote and )

657

[ \t]* # ignore any spaces/tabs between closing quote and )

655

658

)? # title is optional

656

659

657

660

)

661

664

my $link_text = $2;

662

665

my $url = $3;

663

666

my $title = $6;

664

667

665

668

$self->_GenerateAnchor($whole_match, $link_text, undef, $url, $title);

666

669

}xsge;

667

668

669

# Last, handle reference-style shortcuts: [link text]

670

# These must come last in case you've also got [link test][1]

671

# or [link test](/foo)

672

673

$text =~ s{

674

( # wrap whole match in $1

675

676

([^\[\]]+) # link text = $2; can't contain '[' or ']'

677

678

)

679

}{

680

my $result;

681

my $whole_match = $1;

682

my $link_text = $2;

683

(my $link_id = lc $2) =~ s{[ ]*\n}{ }g; # lower-case and turn embedded newlines into spaces

670

671

672

# Last, handle reference-style shortcuts: [link text]

673

# These must come last in case you've also got [link test][1]

674

# or [link test](/foo)

675

676

$text =~ s{

677

( # wrap whole match in $1

678

679

([^\[\]]+) # link text = $2; can't contain '[' or ']'

680

681

)

682

}{

683

my $result;

684

my $whole_match = $1;

685

my $link_text = $2;

686

(my $link_id = lc $2) =~ s{[ ]*\n}{ }g; # lower-case and turn embedded newlines into spaces

684

687

685

688

$self->_GenerateAnchor($whole_match, $link_text, $link_id);

686

}xsge;

689

}xsge;

687

690

688

691

return $text;

689

692

}

691

694

sub _GenerateAnchor {

692

695

# FIXME - Fugly, change to named params?

693

696

my ($self, $whole_match, $link_text, $link_id, $url, $title, $attributes) = @_;

694

697

695

698

my $result;

696

699

697

700

$attributes = '' unless defined $attributes;

698

701

699

702

if ( !defined $url && defined $self->{_urls}{$link_id}) {

700

703

$url = $self->{_urls}{$link_id};

701

704

}

702

705

703

706

if (!defined $url) {

704

707

return $whole_match;

705

708

}

706

709

707

710

$url =~ s! \* !$g_escape_table{'*'}!gox; # We've got to encode these to avoid

708

711

$url =~ s! _ !$g_escape_table{'_'}!gox; # conflicting with italics/bold.

709

$url =~ s{^<(.*)>$}{$1}; # Remove <>'s surrounding URL, if present

710

712

$url =~ s{^<(.*)>$}{$1}; # Remove <>'s surrounding URL, if present

713

711

714

$result = qq{<a href="$url"};

712

715

713

716

if ( !defined $title && defined $link_id && defined $self->{_titles}{$link_id} ) {

714

717

$title = $self->{_titles}{$link_id};

715

718

}

716

719

717

720

if ( defined $title ) {

718

721

$title =~ s/"/"/g;

719

722

$title =~ s! \* !$g_escape_table{'*'}!gox;

720

723

$title =~ s! _ !$g_escape_table{'_'}!gox;

721

724

$result .= qq{ title="$title"};

722

725

}

723

726

724

727

$result .= "$attributes>$link_text</a>";

725

728

726

729

return $result;

754

757

my $whole_match = $1;

755

758

my $alt_text = $2;

756

759

my $link_id = lc $3;

757

760

758

761

if ($link_id eq '') {

759

762

$link_id = lc $alt_text; # for shortcut links like ![this][].

760

763

}

761

764

762

765

$self->_GenerateImage($whole_match, $alt_text, $link_id);

763

766

}xsge;

764

767

802

805

sub _GenerateImage {

803

806

# FIXME - Fugly, change to named params?

804

807

my ($self, $whole_match, $alt_text, $link_id, $url, $title, $attributes) = @_;

805

808

806

809

my $result;

807

810

808

811

$attributes = '' unless defined $attributes;

809

812

810

813

$alt_text ||= '';

811

814

$alt_text =~ s/"/"/g;

812

815

# FIXME - how about >

813

816

814

817

if ( !defined $url && defined $self->{_urls}{$link_id}) {

815

818

$url = $self->{_urls}{$link_id};

816

819

}

817

820

818

821

# If there's no such link ID, leave intact:

819

return $whole_match unless defined $url;

820

822

return $whole_match unless defined $url;

823

821

824

$url =~ s! \* !$g_escape_table{'*'}!ogx; # We've got to encode these to avoid

822

825

$url =~ s! _ !$g_escape_table{'_'}!ogx; # conflicting with italics/bold.

823

$url =~ s{^<(.*)>$}{$1}; # Remove <>'s surrounding URL, if present

824

826

$url =~ s{^<(.*)>$}{$1}; # Remove <>'s surrounding URL, if present

827

825

828

if (!defined $title && length $link_id && defined $self->{_titles}{$link_id} && length $self->{_titles}{$link_id}) {

826

829

$title = $self->{_titles}{$link_id};

827

}

830

}

828

831

829

832

$result = qq{<img src="$url" alt="$alt_text"};

830

833

if (defined $title && length $title) {

840

843

841

844

sub _DoHeaders {

842

845

my ($self, $text) = @_;

843

846

844

847

# Setext-style headers:

845

848

# Header 1

846

849

# ========

847

850

848

851

# Header 2

849

852

# --------

850

853

1124

1127

sub _DoCodeBlocks {

1125

1128

1126

1129

# Process Markdown `<pre><code>` blocks.

1127

1130

1128

1131

1129

1132

my ($self, $text) = @_;

1130

1133

1131

$text =~ s{

1132

(?:\n\n|\A)

1133

( # $1 = the code block -- one or more lines, starting with a space/tab

1134

(?:

1135

(?:[ ]{$self->{tab_width}} | \t) # Lines must start with a tab or a tab-width of spaces

1136

.*\n+

1137

1138

)

1139

((?=^[ ]{0,$self->{tab_width}}\S)|\Z) # Lookahead for non-space at line-start, or end of doc

1140

}{

1141

my $codeblock = $1;

1142

my $result; # return value

1143

1144

$codeblock = $self->_EncodeCode($self->_Outdent($codeblock));

1145

$codeblock = $self->_Detab($codeblock);

1146

$codeblock =~ s/\A\n+//; # trim leading newlines

1147

$codeblock =~ s/\n+\z//; # trim trailing newlines

1148

1149

$result = "\n\n<pre><code>" . $codeblock . "\n</code></pre>\n\n";

1150

1151

$result;

1152

}egmx;

1153

1154

return $text;

1134

$text =~ s{

1135

(?:\n\n|\A)

1136

( # $1 = the code block -- one or more lines, starting with a space/tab

1137

(?:

1138

(?:[ ]{$self->{tab_width}} | \t) # Lines must start with a tab or a tab-width of spaces

1139

.*\n+

1140

1141

)

1142

((?=^[ ]{0,$self->{tab_width}}\S)|\Z) # Lookahead for non-space at line-start, or end of doc

1143

}{

1144

my $codeblock = $1;

1145

my $result; # return value

1146

1147

$codeblock = $self->_EncodeCode($self->_Outdent($codeblock));

1148

$codeblock = $self->_Detab($codeblock);

1149

$codeblock =~ s/\A\n+//; # trim leading newlines

1150

$codeblock =~ s/\n+\z//; # trim trailing newlines

1151

1152

$result = "\n\n<pre><code>" . $codeblock . "\n</code></pre>\n\n";

1153

1154

$result;

1155

}egmx;

1156

1157

return $text;

1155

1158

}

1156

1159

1157

1160

sub _DoCodeSpans {

1158

1161

1159

1162

# * Backtick quotes are used for <code></code> spans.

1160

1163

1161

1164

# * You can use multiple backticks as the delimiters if you want to

1162

1165

# include literal backticks in the code span. So, this input:

1163

1166

1164

1167

# Just type ``foo `bar` baz`` at the prompt.

1165

1168

1166

1169

# Will translate to:

1167

1170

1168

1171

# <p>Just type <code>foo `bar` baz</code> at the prompt.</p>

1169

1172

1170

1173

# There's no arbitrary limit to the number of backticks you

1171

1174

# can use as delimters. If you need three consecutive backticks

1172

1175

# in your code, use four for delimiters, etc.

1173

1176

1174

1177

# * You can use spaces to get literal backticks at the edges:

1175

1178

1176

1179

# ... type `` `bar` `` ...

1177

1180

1178

1181

# Turns to:

1179

1182

1180

1183

# ... type <code>`bar`</code> ...

1181

1184

1182

1185

1183

1186

my ($self, $text) = @_;

1184

1187

1185

$text =~ s@

1186

(?<!\\) # Character before opening ` can't be a backslash

1187

(`+) # $1 = Opening run of `

1188

(.+?) # $2 = The code block

1189

(?<!`)

1190

\1 # Matching closer

1191

(?!`)

1192

1193

my $c = "$2";

1194

$c =~ s/^[ \t]*//g; # leading whitespace

1195

$c =~ s/[ \t]*$//g; # trailing whitespace

1196

$c = $self->_EncodeCode($c);

1197

"<code>$c</code>";

1198

@egsx;

1188

$text =~ s@

1189

(?<!\\) # Character before opening ` can't be a backslash

1190

(`+) # $1 = Opening run of `

1191

(.+?) # $2 = The code block

1192

(?<!`)

1193

\1 # Matching closer

1194

(?!`)

1195

1196

my $c = "$2";

1197

$c =~ s/^[ \t]*//g; # leading whitespace

1198

$c =~ s/[ \t]*$//g; # trailing whitespace

1199

$c = $self->_EncodeCode($c);

1200

"<code>$c</code>";

1201

@egsx;

1199

1202

1200

1203

return $text;

1201

1204

}

1208

1211

1209

1212

my $self = shift;

1210

1213

local $_ = shift;

1211

1214

1212

1215

# Encode all ampersands; HTML entities are not

1213

1216

# entities within a Markdown code span.

1214

1217

s/&/&/g;

1218

1221

{

1219

1222

no warnings 'once';

1220

1223

if (defined($blosxom::version)) {

1221

s/\$/$/g;

1224

s/\$/$/g;

1222

1225

}

1223

1226

}

1224

1227

1349

1352

1350

1353

# Encode naked <'s

1351

1354

$text =~ s{<(?![a-z/?\$!])}{<}gi;

1352

1355

1353

1356

# And >'s - added by Fletcher Penney

1354

1357

# $text =~ s{>(?![a-z/?\$!])}{>}gi;

1355

1358

# Causes problems...

1451

1454

if ( $char eq '@' ) {

1452

1455

# this *must* be encoded. I insist.

1453

1456

$char = $encode[int rand 1]->($char);

1454

}

1457

}

1455

1458

elsif ( $char ne ':' ) {

1456

1459

# leave ':' alone (to spot mailto: later)

1457

1460

my $r = rand;

1541

1544

my ($self, $text) = @_;

1542

1545

1543

1546

# FIXME - Better anchor/regex would be quicker.

1544

1547

1545

1548

# Original:

1546

1549

#$text =~ s{(.*?)\t}{$1.(' ' x ($self->{tab_width} - length($1) % $self->{tab_width}))}ge;

1547

1550

1548

1551

# Much swifter, but pretty hateful:

1549

1552

do {} while ($text =~ s{^(.*?)\t}{$1.(' ' x ($self->{tab_width} - length($1) % $self->{tab_width}))}mge);

1550

1553

return $text;

1553

1556

sub _ConvertCopyright {

1554

1557

my ($self, $text) = @_;

1555

1558

# Convert to an XML compatible form of copyright symbol

1556

1559

1557

1560

1558

1561

1559

1562

return $text;

1560

1563

}

1561

1564

1573

1576

1574

1577

=item C - <http://www.pell.portland.or.us/~orc/Code/discount>

1575

1578

1576

Discount - Original Markdown, but in C. Fastest implementation available, and passes MDTest.

1579

Discount - Original Markdown, but in C. Fastest implementation available, and passes MDTest.

1577

1580

Adds it's own set of custom features.

1578

1581

1579

1582

=item python - <http://www.freewisdom.org/projects/python-markdown/>

1586

1589

1587

1590

=item php - <http://michelf.com/projects/php-markdown/>

1588

1591

1589

A direct port of Markdown.pl, also has a separately maintained 'extra' version,

1592

A direct port of Markdown.pl, also has a separately maintained 'extra' version,

1590

1593

which adds a number of features that were borrowed by MultiMarkdown.

1591

1594

1592

1595

=item lua - <http://www.frykholm.se/files/markdown.lua>

1608

1611

To file bug reports or feature requests please send email to:

1609

1612

1610

1613

bug-Text-Markdown@rt.cpan.org

1611

1614

1612

1615

Please include with your report: (1) the example input; (2) the output

1613

1616

you expected; (3) the output Markdown actually produced.

1614

1617

1629

1632

1630

1633

CPAN Module Text::MultiMarkdown (based on Text::Markdown by Sebastian

1631

1634

Riedel) originally by Darren Kulp (http://kulp.ch/)

1632

1635

1633

1636

This module is maintained by: Tomas Doran http://www.bobtfish.net/

1634

1637

1635

1638

=head1 THIS DISTRIBUTION

1636

1639

1637

Please note that this distribution is a fork of John Gruber's original Markdown project,

1640

Please note that this distribution is a fork of John Gruber's original Markdown project,

1638

1641

and it *is not* in any way blessed by him.

1639

1642

1640

Whilst this code aims to be compatible with the original Markdown.pl (and incorporates

1641

and passes the Markdown test suite) whilst fixing a number of bugs in the original -

1643

Whilst this code aims to be compatible with the original Markdown.pl (and incorporates

1644

and passes the Markdown test suite) whilst fixing a number of bugs in the original -

1642

1645

there may be differences between the behaviour of this module and Markdown.pl. If you find

1643

any differences where you believe Text::Markdown behaves contrary to the Markdown spec,

1646

any differences where you believe Text::Markdown behaves contrary to the Markdown spec,

1644

1647

please report them as bugs.

1645

1648

1646

1649

Text::Markdown *does not* extend the markdown dialect in any way from that which is documented at

1647

1650

daringfireball. If you want additional features, you should look at L<Text::MultiMarkdown>.

1648

1651

1652

=head1 SOURCE CODE

1653

1654

You can find the source code repository for L<Text::Markdown> and L<Text::MultiMarkdown>

1655

on GitHub at <http://github.com/bobtfish/text-markdown>.

1656

1649

1657

=head1 COPYRIGHT AND LICENSE

1650

1658

1651

1652

<http://daringfireball.net/>

1653

1654

1655

1656

<http://fletcher.freeshell.org/>

1657

1658

1659

1660

<http://daringfireball.net/>

1661

1662

1663

1664

<http://fletcher.freeshell.org/>

1665

1666

1667

1660

1668

<http://kulp.ch> and Tomas Doran <http://www.bobtfish.net>

1661

1669

1662

1670

Redistribution and use in source and binary forms, with or without

Older »