76
79
<div id="centerContent">
77
80
<div id="contentHeader">
79
<div id="contentHeaderCentre">-- Perl 5.8.8 documentation --</div>
82
<div id="contentHeaderCentre">-- Perl 5.10.0 documentation --</div>
84
<div id="contentBody"><div class="title_container"><div class="page_title">Text::Balanced</div></div><ul><li><a href="#NAME">NAME</a><li><a href="#SYNOPSIS">SYNOPSIS</a><li><a href="#DESCRIPTION">DESCRIPTION</a><ul><li><a href="#General-behaviour-in-list-contexts">General behaviour in list contexts</a><li><a href="#General-behaviour-in-scalar-and-void-contexts">General behaviour in scalar and void contexts</a><li><a href="#A-note-about-prefixes">A note about prefixes</a><li><a href="#'extract_delimited'"><code class="inline">extract_delimited</code>
85
</a><li><a href="#'extract_bracketed'"><code class="inline">extract_bracketed</code>
86
</a><li><a href="#'extract_variable'"><code class="inline">extract_variable</code>
87
</a><li><a href="#'extract_tagged'"><code class="inline">extract_tagged</code>
88
</a><li><a href="#'gen_extract_tagged'"><code class="inline">gen_extract_tagged</code>
89
</a><li><a href="#'extract_quotelike'"><code class="inline">extract_quotelike</code>
90
</a><li><a href="#'extract_quotelike'-and-%22here-documents%22"><code class="inline">extract_quotelike</code>
91
and "here documents"</a><li><a href="#'extract_codeblock'"><code class="inline">extract_codeblock</code>
92
</a><li><a href="#'extract_multiple'"><code class="inline">extract_multiple</code>
93
</a><li><a href="#'gen_delimited_pat'"><code class="inline">gen_delimited_pat</code>
87
<div id="contentBody"><div class="title_container"><div class="page_title">Text::Balanced</div></div><ul><li><a href="#NAME">NAME</a><li><a href="#SYNOPSIS">SYNOPSIS</a><li><a href="#DESCRIPTION">DESCRIPTION</a><ul><li><a href="#General-behaviour-in-list-contexts">General behaviour in list contexts</a><li><a href="#General-behaviour-in-scalar-and-void-contexts">General behaviour in scalar and void contexts</a><li><a href="#A-note-about-prefixes">A note about prefixes</a><li><a href="#'extract_delimited'"><code class="inline"><span class="w">extract_delimited</span></code>
88
</a><li><a href="#'extract_bracketed'"><code class="inline"><span class="w">extract_bracketed</span></code>
89
</a><li><a href="#'extract_variable'"><code class="inline"><span class="w">extract_variable</span></code>
90
</a><li><a href="#'extract_tagged'"><code class="inline"><span class="w">extract_tagged</span></code>
91
</a><li><a href="#'gen_extract_tagged'"><code class="inline"><span class="w">gen_extract_tagged</span></code>
92
</a><li><a href="#'extract_quotelike'"><code class="inline"><span class="w">extract_quotelike</span></code>
93
</a><li><a href="#'extract_quotelike'-and-%22here-documents%22"><code class="inline"><span class="w">extract_quotelike</span></code>
94
and "here documents"</a><li><a href="#'extract_codeblock'"><code class="inline"><span class="w">extract_codeblock</span></code>
95
</a><li><a href="#'extract_multiple'"><code class="inline"><span class="w">extract_multiple</span></code>
96
</a><li><a href="#'gen_delimited_pat'"><code class="inline"><span class="w">gen_delimited_pat</span></code>
97
</a><li><a href="#'delimited_pat'"><code class="inline"><span class="w">delimited_pat</span></code>
94
98
</a></ul><li><a href="#DIAGNOSTICS">DIAGNOSTICS</a><li><a href="#AUTHOR">AUTHOR</a><li><a href="#BUGS-AND-IRRITATIONS">BUGS AND IRRITATIONS</a><li><a href="#COPYRIGHT">COPYRIGHT</a></ul><a name="NAME"></a><h1>NAME</h1>
95
99
<p>Text::Balanced - Extract delimited text sequences from strings.</p>
96
100
<a name="SYNOPSIS"></a><h1>SYNOPSIS</h1>
211
215
. normally doesn't match newlines.</p>
212
216
<p>To overcome this limitation, you need to turn on /s matching within
213
217
the prefix pattern, using the <code class="inline">(?s)</code> directive: '(?s).*?(?=<H1>)'</p>
214
<a name="'extract_delimited'"></a><h2><code class="inline">extract_delimited</code>
218
<a name="'extract_delimited'"></a><h2><code class="inline"><span class="w">extract_delimited</span></code>
216
<p>The <code class="inline">extract_delimited</code>
220
<p>The <code class="inline"><span class="w">extract_delimited</span></code>
217
221
function formalizes the common idiom
218
222
of extracting a single-character-delimited substring from the start of
219
223
a string. For example, to extract a single-quote delimited string, the
220
224
following code is typically used:</p>
221
225
<pre class="verbatim"> <span class="s">(</span><span class="i">$remainder</span> = <span class="i">$text</span><span class="s">)</span> =~ <span class="q">s/\A('(\\.|[^'])*')//s</span><span class="sc">;</span>
222
226
<span class="i">$extracted</span> = <span class="i">$1</span><span class="sc">;</span></pre>
223
<p>but with <code class="inline">extract_delimited</code>
227
<p>but with <code class="inline"><span class="w">extract_delimited</span></code>
224
228
it can be simplified to:</p>
225
229
<pre class="verbatim"> <span class="s">(</span><span class="i">$extracted</span><span class="cm">,</span><span class="i">$remainder</span><span class="s">)</span> = <span class="i">extract_delimited</span><span class="s">(</span><span class="i">$text</span><span class="cm">,</span> <span class="q">"'"</span><span class="s">)</span><span class="sc">;</span></pre>
226
<p><code class="inline">extract_delimited</code>
230
<p><code class="inline"><span class="w">extract_delimited</span></code>
227
231
takes up to four scalars (the input text, the
228
232
delimiters, a prefix pattern to be skipped, and any escape characters)
229
233
and extracts the initial substring of the text that
265
269
<pre class="verbatim"> <span class="c"># Extract a single- or double- quoted substring from the</span>
266
270
<span class="c"># beginning of $text, optionally after some whitespace</span>
267
271
<span class="c"># (note the list context to protect $text from modification):</span></pre>
268
<pre class="verbatim"> <span class="s">(</span><span class="i">$substring</span><span class="s">)</span> = extract_delimited <span class="i">$text</span><span class="cm">,</span> <span class="q">q{"'}</span><span class="sc">;</span></pre>
272
<pre class="verbatim"> <span class="s">(</span><span class="i">$substring</span><span class="s">)</span> = <span class="w">extract_delimited</span> <span class="i">$text</span><span class="cm">,</span> <span class="q">q{"'}</span><span class="sc">;</span></pre>
269
273
<pre class="verbatim"> <span class="c"># Delete the substring delimited by the first '/' in $text:</span></pre>
270
274
<pre class="verbatim"> $text = join '', (extract_delimited($text,'/','[^/]*'))[2,1];</pre><p>Note that this last example is <i>not</i> the same as deleting the first
271
275
quote-like pattern. For instance, if <code class="inline"><span class="i">$text</span></code>
272
276
contained the string:</p>
273
<pre class="verbatim"> "if ('./cmd' =~ m/$UNIXCMD/s) { $cmd = $1; }"
275
then after the deletion it would contain:</pre><pre class="verbatim"> <span class="q">"if ('.$UNIXCMD/s) { $cmd = $1; }"</span></pre>
277
<pre class="verbatim"> <span class="q">"if ('./cmd' =~ m/$UNIXCMD/s) { $cmd = $1; }"</span></pre>
278
<p>then after the deletion it would contain:</p>
279
<pre class="verbatim"> <span class="q">"if ('.$UNIXCMD/s) { $cmd = $1; }"</span></pre>
277
<pre class="verbatim"> <span class="q">"if ('./cmd' =~ ms) { $cmd = $1; }"</span>
281
<pre class="verbatim"> <span class="q">"if ('./cmd' =~ ms) { $cmd = $1; }"</span></pre>
279
282
<p>See <a href="#extract_quotelike">"extract_quotelike"</a> for a (partial) solution to this problem.</p>
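<p>Returning to the basic call form, here is a minimal sketch of <code class="inline">extract_delimited</code> in a list context. The input string is a made-up example, and the default prefix (optional whitespace) and default escape character (backslash) are assumed:</p>
<pre class="verbatim">    use strict;
    use warnings;
    use Text::Balanced qw(extract_delimited);

    # Hypothetical input: a single-quoted string containing an escaped quote.
    my $text = q{'don\'t panic' and carry on};

    # List context, so $text itself is left unmodified.
    my ($extracted, $remainder) = extract_delimited($text, q{'});

    print "extracted: $extracted\n";   # 'don\'t panic'  (delimiters included)
    print "remainder: $remainder\n";   # " and carry on"</pre>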
280
<a name="'extract_bracketed'"></a><h2><code class="inline">extract_bracketed</code>
283
<a name="'extract_bracketed'"></a><h2><code class="inline"><span class="w">extract_bracketed</span></code>
282
285
<p>Like <code class="inline"><span class="q">"extract_delimited"</span></code>
283
, the <code class="inline">extract_bracketed</code>
286
, the <code class="inline"><span class="w">extract_bracketed</span></code>
285
288
up to three optional scalar arguments: a string to extract from, a delimiter
286
289
specifier, and a prefix pattern. As before, a missing prefix defaults to
438
441
<p>The various options that can be specified are:</p>
440
<li><a name="'reject-%3d%3e-%24listref'"></a><b><code class="inline">reject <span class="cm">=></span> <span class="i">$listref</span></code>
443
<li><a name="'reject-%3d%3e-%24listref'"></a><b><code class="inline"><span class="w">reject</span> <span class="cm">=></span> <span class="i">$listref</span></code>
442
445
<p>The list reference contains one or more strings specifying patterns
443
446
that must <i>not</i> appear within the tagged text.</p>
444
447
<p>For example, to extract
445
448
an HTML link (which should not contain nested links) use:</p>
446
<pre class="verbatim"> <span class="i">extract_tagged</span><span class="s">(</span><span class="i">$text</span><span class="cm">,</span> <span class="q">'<A>'</span><span class="cm">,</span> <span class="q">'</A>'</span><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <span class="s">{</span>reject <span class="cm">=></span> <span class="s">[</span><span class="q">'<A>'</span><span class="s">]</span><span class="s">}</span> <span class="s">)</span><span class="sc">;</span></pre>
449
<pre class="verbatim"> <span class="i">extract_tagged</span><span class="s">(</span><span class="i">$text</span><span class="cm">,</span> <span class="q">'<A>'</span><span class="cm">,</span> <span class="q">'</A>'</span><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <span class="s">{</span><span class="w">reject</span> <span class="cm">=></span> <span class="s">[</span><span class="q">'<A>'</span><span class="s">]</span><span class="s">}</span> <span class="s">)</span><span class="sc">;</span></pre>
448
<li><a name="'ignore-%3d%3e-%24listref'"></a><b><code class="inline">ignore <span class="cm">=></span> <span class="i">$listref</span></code>
451
<li><a name="'ignore-%3d%3e-%24listref'"></a><b><code class="inline"><span class="w">ignore</span> <span class="cm">=></span> <span class="i">$listref</span></code>
450
453
<p>The list reference contains one or more strings specifying patterns
451
454
that are <i>not</i> to be treated as nested tags within the tagged text
452
455
(even if they would match the start tag pattern).</p>
453
456
<p>For example, to extract an arbitrary XML tag, but ignore "empty" elements:</p>
454
<pre class="verbatim"> <span class="i">extract_tagged</span><span class="s">(</span><span class="i">$text</span><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <span class="s">{</span>ignore <span class="cm">=></span> <span class="s">[</span><span class="q">'<[^>]*/>'</span><span class="s">]</span><span class="s">}</span> <span class="s">)</span><span class="sc">;</span></pre>
457
<pre class="verbatim"> <span class="i">extract_tagged</span><span class="s">(</span><span class="i">$text</span><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <span class="s">{</span><span class="w">ignore</span> <span class="cm">=></span> <span class="s">[</span><span class="q">'<[^>]*/>'</span><span class="s">]</span><span class="s">}</span> <span class="s">)</span><span class="sc">;</span></pre>
455
458
<p>(also see <a href="#gen_delimited_pat">"gen_delimited_pat"</a> below).</p>
457
<li><a name="'fail-%3d%3e-%24str'"></a><b><code class="inline">fail <span class="cm">=></span> <span class="i">$str</span></code>
460
<li><a name="'fail-%3d%3e-%24str'"></a><b><code class="inline"><span class="w">fail</span> <span class="cm">=></span> <span class="i">$str</span></code>
459
<p>The <code class="inline">fail</code>
462
<p>The <code class="inline"><span class="w">fail</span></code>
460
463
option indicates the action to be taken if a matching end
461
464
tag is not encountered (i.e. before the end of the string or some
462
<code class="inline">reject</code>
465
<code class="inline"><span class="w">reject</span></code>
463
466
pattern matches). By default, a failure to match a closing
464
tag causes <code class="inline">extract_tagged</code>
467
tag causes <code class="inline"><span class="w">extract_tagged</span></code>
465
468
to immediately fail.</p>
466
469
<p>However, if the string value associated with <code class="inline">fail</code> is "MAX", then
467
<code class="inline">extract_tagged</code>
470
<code class="inline"><span class="w">extract_tagged</span></code>
468
471
returns the complete text up to the point of failure.
469
If the string is "PARA", <code class="inline">extract_tagged</code>
472
If the string is "PARA", <code class="inline"><span class="w">extract_tagged</span></code>
470
473
returns only the first paragraph
471
474
after the tag (up to the first line that is either empty or contains
472
475
only whitespace characters).
510
513
<p>On failure, all of these values (except the remaining text) are <code class="inline"><a class="l_k" href="../functions/undef.html">undef</a></code>.</p>
511
<p>In a scalar context, <code class="inline">extract_tagged</code>
514
<p>In a scalar context, <code class="inline"><span class="w">extract_tagged</span></code>
512
515
returns just the complete
513
516
substring that matched a tagged text (including the start and end
514
517
tags). <code class="inline"><a class="l_k" href="../functions/undef.html">undef</a></code> is returned on failure. In addition, the original input
515
518
text has the returned substring (and any prefix) removed from it.</p>
516
519
<p>In a void context, the input text just has the matched substring (and
517
520
any specified prefix) removed.</p>
518
<a name="'gen_extract_tagged'"></a><h2><code class="inline">gen_extract_tagged</code>
521
<a name="'gen_extract_tagged'"></a><h2><code class="inline"><span class="w">gen_extract_tagged</span></code>
520
523
<p>(Note: This subroutine is only available under Perl 5.005 or later)</p>
521
<p><code class="inline">gen_extract_tagged</code>
524
<p><code class="inline"><span class="w">gen_extract_tagged</span></code>
522
525
generates a new anonymous subroutine which
523
526
extracts text between (balanced) specified tags. In other words,
524
it generates a subroutine that behaves identically to <code class="inline">extract_tagged</code>
527
it generates a subroutine that behaves identically to <code class="inline"><span class="w">extract_tagged</span></code>
526
<p>The difference between <code class="inline">extract_tagged</code>
529
<p>The difference between <code class="inline"><span class="w">extract_tagged</span></code>
527
530
and the anonymous
528
531
subroutines generated by
529
<code class="inline">gen_extract_tagged</code>
532
<code class="inline"><span class="w">gen_extract_tagged</span></code>
530
533
, is that those generated subroutines:</p>
533
536
<p>do not have to reparse tag specification or parsing options every time
534
they are called (whereas <code class="inline">extract_tagged</code>
537
they are called (whereas <code class="inline"><span class="w">extract_tagged</span></code>
535
538
has to effectively rebuild
536
539
its tag parser on every call);</p>
539
542
<p>make use of the new qr// construct to pre-compile the regexes they use
540
(whereas <code class="inline">extract_tagged</code>
543
(whereas <code class="inline"><span class="w">extract_tagged</span></code>
541
544
uses standard string variable interpolation
542
545
to create tag-matching patterns).</p>
545
548
<p>The subroutine takes up to four optional arguments (the same set as
546
<code class="inline">extract_tagged</code>
549
<code class="inline"><span class="w">extract_tagged</span></code>
547
550
except for the string to be processed). It returns
548
551
a reference to a subroutine which in turn takes a single argument (the text to
549
552
be extracted from).</p>
550
<p>In other words, the implementation of <code class="inline">extract_tagged</code>
553
<p>In other words, the implementation of <code class="inline"><span class="w">extract_tagged</span></code>
552
555
equivalent to:</p>
553
556
<pre class="verbatim"><a name="extract_tagged"></a> sub <span class="m">extract_tagged</span>
556
559
<span class="i">$extractor</span> = <span class="i">gen_extract_tagged</span><span class="s">(</span><span class="i">@_</span><span class="s">)</span><span class="sc">;</span>
557
560
<a class="l_k" href="../functions/return.html">return</a> <span class="i">$extractor</span>-><span class="s">(</span><span class="i">$text</span><span class="s">)</span><span class="sc">;</span>
558
561
<span class="s">}</span></pre>
559
<p>(although <code class="inline">extract_tagged</code>
562
<p>(although <code class="inline"><span class="w">extract_tagged</span></code>
560
563
is not currently implemented that way, in order
561
564
to preserve pre-5.005 compatibility).</p>
562
<p>Using <code class="inline">gen_extract_tagged</code>
565
<p>Using <code class="inline"><span class="w">gen_extract_tagged</span></code>
563
566
to create extraction functions for specific tags
564
567
is a good idea if those functions are going to be called more than once, since
565
568
their performance is typically twice as good as the more general-purpose
566
<code class="inline">extract_tagged</code>
569
<code class="inline"><span class="w">extract_tagged</span></code>
568
<a name="'extract_quotelike'"></a><h2><code class="inline">extract_quotelike</code>
571
<a name="'extract_quotelike'"></a><h2><code class="inline"><span class="w">extract_quotelike</span></code>
570
<p><code class="inline">extract_quotelike</code>
573
<p><code class="inline"><span class="w">extract_quotelike</span></code>
571
574
attempts to recognize, extract, and segment any
572
575
one of the various Perl quotes and quotelike operators (see
573
576
<i>perlop(3)</i>). Nested backslashed delimiters, embedded balanced bracket
574
577
delimiters (for the quotelike operators), and trailing modifiers are
575
578
all caught. For example, in:</p>
576
<pre class="verbatim"> extract_quotelike 'q # an octothorpe: \# (not the end of the q!) #'
578
extract_quotelike ' "You said, \"Use sed\"." '</pre><pre class="verbatim"> extract_quotelike <span class="q">' s{([A-Z]{1,8}\.[A-Z]{3})} /\L$1\E/; '</span></pre>
579
<pre class="verbatim"> extract_quotelike <span class="q">' tr/\\\/\\\\/\\\//ds; '</span></pre>
579
<pre class="verbatim"> <span class="w">extract_quotelike</span> <span class="q">'q # an octothorpe: \# (not the end of the q!) #'</span></pre>
580
<pre class="verbatim"> <span class="w">extract_quotelike</span> <span class="q">' "You said, \"Use sed\"." '</span></pre>
581
<pre class="verbatim"> <span class="w">extract_quotelike</span> <span class="q">' s{([A-Z]{1,8}\.[A-Z]{3})} /\L$1\E/; '</span></pre>
582
<pre class="verbatim"> <span class="w">extract_quotelike</span> <span class="q">' tr/\\\/\\\\/\\\//ds; '</span></pre>
580
583
<p>the full Perl quotelike operations are all extracted correctly.</p>
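<p>As a minimal sketch, the second string above can be passed to <code class="inline">extract_quotelike</code> in a list context. The trailing text after the quote is made up for illustration, and only the first two return values (the extracted text and the remainder) are used here:</p>
<pre class="verbatim">    use strict;
    use warnings;
    use Text::Balanced qw(extract_quotelike);

    # Hypothetical input: leading whitespace, a double-quoted string with
    # backslash-escaped quotes inside it, then some trailing text.
    my $text = q{ "You said, \"Use sed\"." ; next};

    # The default prefix skips the leading whitespace.
    my ($extracted, $remainder) = extract_quotelike($text);

    print "extracted: $extracted\n";   # "You said, \"Use sed\"."
    print "remainder: $remainder\n";   # " ; next"</pre>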
581
584
<p>Note too that, when using the /x modifier on a regex, any comment
582
585
containing the current pattern delimiter will cause the regex to be
726
729
matching position after the here document, but now the rest of the line
727
730
on which the here document starts is not skipped.</p>
728
731
<p>To prevent <code class="inline">extract_quotelike</code> from mucking about with the input in this way
729
(this is the only case where a list-context <code class="inline">extract_quotelike</code>
732
(this is the only case where a list-context <code class="inline"><span class="w">extract_quotelike</span></code>
731
734
you can pass the input variable as an interpolated literal:</p>
732
735
<pre class="verbatim"> <span class="i">$quotelike</span> = <span class="i">extract_quotelike</span><span class="s">(</span><span class="q">"$var"</span><span class="s">)</span><span class="sc">;</span></pre>
733
<a name="'extract_codeblock'"></a><h2><code class="inline">extract_codeblock</code>
736
<a name="'extract_codeblock'"></a><h2><code class="inline"><span class="w">extract_codeblock</span></code>
735
<p><code class="inline">extract_codeblock</code>
738
<p><code class="inline"><span class="w">extract_codeblock</span></code>
736
739
attempts to recognize and extract a balanced
737
740
bracket delimited substring that may contain unbalanced brackets
738
inside Perl quotes or quotelike operations. That is, <code class="inline">extract_codeblock</code>
741
inside Perl quotes or quotelike operations. That is, <code class="inline"><span class="w">extract_codeblock</span></code>
740
743
is like a combination of <code class="inline"><span class="q">"extract_bracketed"</span></code>
742
745
<code class="inline"><span class="q">"extract_quotelike"</span></code>
744
<p><code class="inline">extract_codeblock</code>
745
takes the same initial three parameters as <code class="inline">extract_bracketed</code>
747
<p><code class="inline"><span class="w">extract_codeblock</span></code>
748
takes the same initial three parameters as <code class="inline"><span class="w">extract_bracketed</span></code>
747
750
a text to process, a set of delimiter brackets to look for, and a prefix to
748
751
match first. It also takes an optional fourth parameter, which allows the
805
808
<pre class="verbatim"> <span class="q"><defer: {if ($count></span></pre>
806
809
<p>because the "less than" operator is interpreted as a closing delimiter.</p>
807
810
<p>But, by extracting the directive using
808
<code class="inline">extract_codeblock($text, '{}', undef, '<>')</code>
811
<code class="inline"><span class="i">extract_codeblock</span><span class="s">(</span><span class="i">$text</span><span class="cm">,</span> <span class="q">'{}'</span><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <span class="q">'<>'</span><span class="s">)</span></code>
809
812
the '>' character is only treated as a delimiter at the outermost
810
813
level of the code block, so the directive is parsed correctly.</p>
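<p>A smaller illustration of the same quote-awareness, assuming only '{}' delimiters and a made-up input: the closing brace inside the double-quoted string is not taken as the end of the block.</p>
<pre class="verbatim">    use strict;
    use warnings;
    use Text::Balanced qw(extract_codeblock);

    # Hypothetical input: a block whose body contains a quoted "}".
    my $text = q[{ print "}"; } trailing text];

    # List context: returns the block and whatever follows it.
    my ($block, $remainder) = extract_codeblock($text, '{}');

    print "block:     $block\n";       # { print "}"; }
    print "remainder: $remainder\n";   # " trailing text"</pre>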
811
<a name="'extract_multiple'"></a><h2><code class="inline">extract_multiple</code>
814
<a name="'extract_multiple'"></a><h2><code class="inline"><span class="w">extract_multiple</span></code>
813
<p>The <code class="inline">extract_multiple</code>
816
<p>The <code class="inline"><span class="w">extract_multiple</span></code>
814
817
subroutine takes a string to be processed and a
815
818
list of extractors (subroutines or regular expressions) to apply to that string.</p>
816
<p>In a list context <code class="inline">extract_multiple</code>
819
<p>In a list context <code class="inline"><span class="w">extract_multiple</span></code>
817
820
returns an array of substrings
818
821
of the original string, as extracted by the specified extractors.
819
In a scalar context, <code class="inline">extract_multiple</code>
822
In a scalar context, <code class="inline"><span class="w">extract_multiple</span></code>
820
823
returns the first
821
824
substring successfully extracted from the original string. In both
822
825
scalar and void contexts the original string has the first successfully
823
826
extracted substring removed from it. In all contexts
824
<code class="inline">extract_multiple</code>
827
<code class="inline"><span class="w">extract_multiple</span></code>
825
828
starts at the current <code class="inline"><a class="l_k" href="../functions/pos.html">pos</a></code> of the string, and
826
829
sets that <code class="inline"><a class="l_k" href="../functions/pos.html">pos</a></code> appropriately after it matches.</p>
827
<p>Hence, the aim of a call to <code class="inline">extract_multiple</code>
830
<p>Hence, the aim of a call to <code class="inline"><span class="w">extract_multiple</span></code>
828
831
in a list context
829
832
is to split the processed string into as many non-overlapping fields as
830
833
possible, by repeatedly applying each of the specified extractors
831
to the remainder of the string. Thus <code class="inline">extract_multiple</code>
834
to the remainder of the string. Thus <code class="inline"><span class="w">extract_multiple</span></code>
833
836
a generalized form of Perl's <code class="inline"><a class="l_k" href="../functions/split.html">split</a></code> subroutine.</p>
834
837
<p>The subroutine takes up to four optional arguments:</p>
934
937
<p>If you wanted the commas preserved as separate fields (i.e. like split
935
938
does if your split pattern has capturing parentheses), you would
936
939
just make the last parameter undefined (or remove it).</p>
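<p>For comparison, here is a minimal sketch of <code class="inline">extract_multiple</code> with a single subroutine extractor. The input is a made-up example, and the true fourth argument is assumed to discard any text that the extractor does not match:</p>
<pre class="verbatim">    use strict;
    use warnings;
    use Text::Balanced qw(extract_multiple extract_bracketed);

    # Hypothetical input: two parenthesised groups with other text between them.
    my $text = 'a (b c) d (e)';

    my @fields = extract_multiple(
        $text,
        [ sub { extract_bracketed($_[0], '()') } ],   # the only extractor
        undef,                                        # no limit on the number of fields
        1,                                            # skip text no extractor matched
    );

    print "$_\n" for @fields;   # prints "(b c)" then "(e)"</pre>
<p>Leaving the fourth argument undefined should instead keep the unmatched text as fields of its own, in the same way as the comma example above.</p>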
937
<a name="'gen_delimited_pat'"></a><h2><code class="inline">gen_delimited_pat</code>
940
<a name="'gen_delimited_pat'"></a><h2><code class="inline"><span class="w">gen_delimited_pat</span></code>
939
<p>The <code class="inline">gen_delimited_pat</code>
942
<p>The <code class="inline"><span class="w">gen_delimited_pat</span></code>
940
943
subroutine takes a single (string) argument and
941
944
builds a Friedl-style optimized regex that matches a string delimited
942
945
by any one of the characters in the single argument. For example:</p>
943
946
<pre class="verbatim"> <span class="i">gen_delimited_pat</span><span class="s">(</span><span class="q">q{'"}</span><span class="s">)</span></pre>
944
947
<p>returns the regex:</p>
945
948
<pre class="verbatim"> (?:\"(?:\\\"|(?!\").)*\"|\'(?:\\\'|(?!\').)*\')</pre><p>Note that the specified delimiters are automatically quotemeta'd.</p>
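<p>Because the return value is an ordinary pattern string, it can be interpolated into a larger regex. A minimal sketch, using a made-up input string:</p>
<pre class="verbatim">    use strict;
    use warnings;
    use Text::Balanced qw(gen_delimited_pat);

    # Pattern matching either a single- or a double-quoted string.
    my $pat = gen_delimited_pat(q{'"});

    # Hypothetical input containing one string of each kind.
    my $text = q{say "hi" or 'bye'};

    # Wrap the pattern in capturing parentheses to collect every match.
    my @strings = $text =~ /($pat)/g;

    print "$_\n" for @strings;   # prints "hi" (with its double quotes), then 'bye'</pre>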
946
<p>A typical use of <code class="inline">gen_delimited_pat</code>
949
<p>A typical use of <code class="inline"><span class="w">gen_delimited_pat</span></code>
947
950
would be to build special purpose tags
948
for <code class="inline">extract_tagged</code>
951
for <code class="inline"><span class="w">extract_tagged</span></code>
949
952
. For example, to properly ignore "empty" XML elements
950
953
(which might contain quoted strings):</p>
951
954
<pre class="verbatim"> <a class="l_k" href="../functions/my.html">my</a> <span class="i">$empty_tag</span> = <span class="q">'<('</span> . <span class="i">gen_delimited_pat</span><span class="s">(</span><span class="q">q{'"}</span><span class="s">)</span> . <span class="q">'|.)+/>'</span><span class="sc">;</span></pre>
952
<pre class="verbatim"> <span class="i">extract_tagged</span><span class="s">(</span><span class="i">$text</span><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <span class="s">{</span>ignore <span class="cm">=></span> <span class="s">[</span><span class="i">$empty_tag</span><span class="s">]</span><span class="s">}</span> <span class="s">)</span><span class="sc">;</span></pre>
953
<p><code class="inline">gen_delimited_pat</code>
955
<pre class="verbatim"> <span class="i">extract_tagged</span><span class="s">(</span><span class="i">$text</span><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <span class="s">{</span><span class="w">ignore</span> <span class="cm">=></span> <span class="s">[</span><span class="i">$empty_tag</span><span class="s">]</span><span class="s">}</span> <span class="s">)</span><span class="sc">;</span></pre>
956
<p><code class="inline"><span class="w">gen_delimited_pat</span></code>
954
957
may also be called with an optional second argument,
955
958
which specifies the "escape" character(s) to be used for each delimiter.
956
959
For example, to match a Pascal-style string (where ' is the delimiter
998
1001
<p>A non-optional prefix was specified but wasn't found at the start of the text.</p>
1000
1003
<li><a name="'Did-not-find-opening-bracket-after-prefix%3a-%22%25s%22'"></a><b><code class="inline">Did not find opening bracket after prefix: "%s"</code></b>
1001
<p><code class="inline">extract_bracketed</code>
1002
or <code class="inline">extract_codeblock</code>
1004
<p><code class="inline"><span class="w">extract_bracketed</span></code>
1005
or <code class="inline"><span class="w">extract_codeblock</span></code>
1003
1006
was expecting a
1004
1007
particular kind of bracket at the start of the text, and didn't find it.</p>
1006
1009
<li><a name="'No-quotelike-operator-found-after-prefix%3a-%22%25s%22'"></a><b><code class="inline">No quotelike operator found after prefix: "%s"</code></b>
1007
<p><code class="inline">extract_quotelike</code>
1010
<p><code class="inline"><span class="w">extract_quotelike</span></code>
1008
1011
didn't find one of the quotelike operators <code class="inline"><a class="l_k" href="../functions/q.html">q</a></code>,
1009
1012
<code class="inline"><a class="l_k" href="../functions/qq.html">qq</a></code>, <code class="inline"><a class="l_k" href="../functions/qw.html">qw</a></code>, <code class="inline"><a class="l_k" href="../functions/qx.html">qx</a></code>, <code class="inline"><a class="l_k" href="../functions/s.html">s</a></code>, <code class="inline"><a class="l_k" href="../functions/tr.html">tr</a></code> or <code class="inline"><a class="l_k" href="../functions/y.html">y</a></code> at the start of the substring
1010
1013
it was extracting.</p>
1012
1015
<li><a name="'Unmatched-closing-bracket%3a-%22%25c%22'"></a><b><code class="inline">Unmatched closing bracket: "%c"</code></b>
1013
<p><code class="inline">extract_bracketed</code>
1014
, <code class="inline">extract_quotelike</code>
1015
or <code class="inline">extract_codeblock</code>
1016
<p><code class="inline"><span class="w">extract_bracketed</span></code>
1017
, <code class="inline"><span class="w">extract_quotelike</span></code>
1018
or <code class="inline"><span class="w">extract_codeblock</span></code>
1017
1020
a closing bracket where none was expected.</p>
1019
1022
<li><a name="'Unmatched-opening-bracket(s)%3a-%22%25s%22'"></a><b><code class="inline">Unmatched opening bracket(s): "%s"</code></b>
1020
<p><code class="inline">extract_bracketed</code>
1021
, <code class="inline">extract_quotelike</code>
1022
or <code class="inline">extract_codeblock</code>
1023
<p><code class="inline"><span class="w">extract_bracketed</span></code>
1024
, <code class="inline"><span class="w">extract_quotelike</span></code>
1025
or <code class="inline"><span class="w">extract_codeblock</span></code>
1024
1027
out of characters in the text before closing one or more levels of nested
<li><a name="'Unmatched-embedded-quote-(%25s)'"></a><b><code class="inline"><span class="w">Unmatched</span> <span class="w">embedded</span> <span class="w">quote</span> <span class="s">(</span><span class="i">%s</span><span class="s">)</span></code></b>
<p><code class="inline"><span class="w">extract_bracketed</span></code>
attempted to match an embedded quoted substring, but
failed to find a closing quote to match it.</p>
<li><a name="'Did-not-find-closing-delimiter-to-match-'%25s''"></a><b><code class="inline"><span class="w">Did</span> not <span class="w">find</span> <span class="w">closing</span> <span class="w">delimiter</span> <span class="w">to</span> <span class="w">match</span> <span class="q">'%s'</span></code></b>
<p><code class="inline"><span class="w">extract_quotelike</span></code>
was unable to find a closing delimiter to match the
one that opened the quote-like operation.</p>
<li><a name="'Mismatched-closing-bracket%3a-expected-%22%25c%22-but-found-%22%25s%22'"></a><b><code class="inline">Mismatched closing bracket: expected "%c" but found "%s"</code></b>
<p><code class="inline"><span class="w">extract_bracketed</span></code>, <code class="inline"><span class="w">extract_quotelike</span></code> or <code class="inline"><span class="w">extract_codeblock</span></code> found
a valid bracket delimiter, but it was the wrong species. This usually
indicates a nesting error, but may indicate incorrect quoting or escaping.</p>
<li><a name="'No-block-delimiter-found-after-quotelike-%22%25s%22'"></a><b><code class="inline"><span class="w">No</span> <span class="w">block</span> <span class="w">delimiter</span> <span class="w">found</span> <span class="w">after</span> <span class="w">quotelike</span> <span class="q">"%s"</span></code></b>
<p><code class="inline"><span class="w">extract_quotelike</span></code> or <code class="inline"><span class="w">extract_codeblock</span></code>
found one of the
quotelike operators <code class="inline"><a class="l_k" href="../functions/q.html">q</a></code>, <code class="inline"><a class="l_k" href="../functions/qq.html">qq</a></code>, <code class="inline"><a class="l_k" href="../functions/qw.html">qw</a></code>, <code class="inline"><a class="l_k" href="../functions/qx.html">qx</a></code>, <code class="inline"><a class="l_k" href="../functions/s.html">s</a></code>, <code class="inline"><a class="l_k" href="../functions/tr.html">tr</a></code> or <code class="inline"><a class="l_k" href="../functions/y.html">y</a></code>
without a suitable block after it.</p>
<li><a name="'Did-not-find-leading-dereferencer'"></a><b><code class="inline"><span class="w">Did</span> not <span class="w">find</span> <span class="w">leading</span> <span class="w">dereferencer</span></code></b>
<p><code class="inline"><span class="w">extract_variable</span></code>
was expecting one of '$', '@', or '%' at the start of
a variable, but didn't find any of them.</p>
<li><a name="'Bad-identifier-after-dereferencer'"></a><b><code class="inline"><span class="w">Bad</span> <span class="w">identifier</span> <span class="w">after</span> <span class="w">dereferencer</span></code></b>
<p><code class="inline"><span class="w">extract_variable</span></code>
found a '$', '@', or '%' indicating a variable, but that
character was not followed by a legal Perl identifier.</p>
<li><a name="'Did-not-find-expected-opening-bracket-at-%25s'"></a><b><code class="inline"><span class="w">Did</span> not <span class="w">find</span> <span class="w">expected</span> <span class="w">opening</span> <span class="w">bracket</span> <span class="w">at</span> <span class="i">%s</span></code></b>
<p><code class="inline"><span class="w">extract_codeblock</span></code>
failed to find any of the outermost opening brackets
that were specified.</p>
<li><a name="'Improperly-nested-codeblock-at-%25s'"></a><b><code class="inline"><span class="w">Improperly</span> <span class="w">nested</span> <span class="w">codeblock</span> <span class="w">at</span> <span class="i">%s</span></code></b>
<p>A nested code block started with a delimiter that was specified
for use only as an outermost bracket.</p>
<li><a name="'Missing-second-block-for-quotelike-%22%25s%22'"></a><b><code class="inline"><span class="w">Missing</span> <span class="w">second</span> <span class="w">block</span> for <span class="w">quotelike</span> <span class="q">"%s"</span></code></b>
<p><code class="inline"><span class="w">extract_codeblock</span></code> or <code class="inline"><span class="w">extract_quotelike</span></code>
found one of the
quotelike operators <code class="inline"><a class="l_k" href="../functions/s.html">s</a></code>, <code class="inline"><a class="l_k" href="../functions/tr.html">tr</a></code> or <code class="inline"><a class="l_k" href="../functions/y.html">y</a></code> followed by only one block.</p>
<li><a name="'No-match-found-for-opening-bracket'"></a><b><code class="inline"><span class="w">No</span> <span class="w">match</span> <span class="w">found</span> for <span class="w">opening</span> <span class="w">bracket</span></code></b>
<p><code class="inline"><span class="w">extract_codeblock</span></code>
failed to find a closing bracket to match the outermost
opening bracket.</p>
<li><a name="'Did-not-find-opening-tag%3a-%2f%25s%2f'"></a><b><code class="inline">Did not find opening tag: /%s/</code></b>
<p><code class="inline"><span class="w">extract_tagged</span></code>
did not find a suitable opening tag (after any specified
prefix was removed).</p>
<li><a name="'Unable-to-construct-closing-tag-to-match%3a-%2f%25s%2f'"></a><b><code class="inline">Unable to construct closing tag to match: /%s/</code></b>
<p><code class="inline"><span class="w">extract_tagged</span></code>
matched the specified opening tag and tried to
modify the matched text to produce a matching closing tag (because
none was specified). It failed to generate the closing tag, almost