~ubuntu-branches/ubuntu/intrepid/perl-doc-html/intrepid

Viewing changes to Text/Balanced.html

  • Committer: Bazaar Package Importer
  • Author(s): Roberto C. Sanchez
  • Date: 2008-05-17 20:14:19 UTC
  • mfrom: (1.1.1 upstream)
  • Revision ID: james.westby@ubuntu.com-20080517201419-qgbuogq2ckkdisyi
Tags: 5.10.0-2
Supersede botched upload of 5.10.0-1.

54
54
      <h2>Links:</h2>
55
55
      <ul>
56
56
        <li><a href="http://search.cpan.org">CPAN</a></li>
 
57
        <li><a href="http://www.perl.org">Perl.org</a></li>
57
58
        <li><a href="http://www.perl.com">Perl.com</a></li>
58
 
        <li><a href="http://www.perl.org">Perl.org</a></li>
 
59
        <li><a href="http://perlbuzz.com">Perl Buzz</a></li>
 
60
        <li><a href="http://www.perlfoundation.org/perl5/index.cgi">Perl 5 Wiki</a></li>
 
61
        <li><a href="http://jobs.perl.org">Perl Jobs</a></li>
59
62
        <li><a href="http://www.pm.org">Perl Mongers</a></li>
60
63
        <li><a href="http://www.perlmonks.org">Perl Monks</a></li>
61
64
        <li><a href="http://planet.perl.org">Planet Perl</a></li>
65
68
      <ul>
66
69
        <li>Site maintained by<br><a href="http://perl.jonallen.info">Jon Allen</a>
67
70
            (<a href="http://perl.jonallen.info">JJ</a>)</li>
68
 
        <li class="spaced">Last updated on<br>23 April 2006</li>
 
71
        <li class="spaced">Last updated on<br>23 December 2007</li>
69
72
        <li class="spaced">See the <a href="http://perl.jonallen.info/projects/perldoc">project page</a> for
70
73
        more details</li>
71
74
      </ul>
76
79
    <div id="centerContent">
77
80
      <div id="contentHeader">
78
81
        <div id="contentHeaderLeft"><a href="#" onClick="showLeft()">Show navigation</a></div>
79
 
        <div id="contentHeaderCentre">-- Perl 5.8.8 documentation --</div>
 
82
        <div id="contentHeaderCentre">-- Perl 5.10.0 documentation --</div>
80
83
        <div id="contentHeaderRight"><a href="#" onClick="showRight()">Show toolbar</a></div>
81
84
      </div>
82
85
      <div id="breadCrumbs"><a href="../index.html">Home</a> &gt; <a href="../index-modules-A.html">Core modules</a> &gt; <a href="../index-modules-T.html">T</a> &gt; Text::Balanced</div>
83
86
      <script language="JavaScript">fromSearch();</script>
84
 
      <div id="contentBody"><div class="title_container"><div class="page_title">Text::Balanced</div></div><ul><li><a href="#NAME">NAME</a><li><a href="#SYNOPSIS">SYNOPSIS</a><li><a href="#DESCRIPTION">DESCRIPTION</a><ul><li><a href="#General-behaviour-in-list-contexts">General behaviour in list contexts</a><li><a href="#General-behaviour-in-scalar-and-void-contexts">General behaviour in scalar and void contexts</a><li><a href="#A-note-about-prefixes">A note about prefixes</a><li><a href="#'extract_delimited'"><code class="inline">extract_delimited</code>
85
 
</a><li><a href="#'extract_bracketed'"><code class="inline">extract_bracketed</code>
86
 
</a><li><a href="#'extract_variable'"><code class="inline">extract_variable</code>
87
 
</a><li><a href="#'extract_tagged'"><code class="inline">extract_tagged</code>
88
 
</a><li><a href="#'gen_extract_tagged'"><code class="inline">gen_extract_tagged</code>
89
 
</a><li><a href="#'extract_quotelike'"><code class="inline">extract_quotelike</code>
90
 
</a><li><a href="#'extract_quotelike'-and-%22here-documents%22"><code class="inline">extract_quotelike</code>
91
 
 and "here documents"</a><li><a href="#'extract_codeblock'"><code class="inline">extract_codeblock</code>
92
 
</a><li><a href="#'extract_multiple'"><code class="inline">extract_multiple</code>
93
 
</a><li><a href="#'gen_delimited_pat'"><code class="inline">gen_delimited_pat</code>
 
87
      <div id="contentBody"><div class="title_container"><div class="page_title">Text::Balanced</div></div><ul><li><a href="#NAME">NAME</a><li><a href="#SYNOPSIS">SYNOPSIS</a><li><a href="#DESCRIPTION">DESCRIPTION</a><ul><li><a href="#General-behaviour-in-list-contexts">General behaviour in list contexts</a><li><a href="#General-behaviour-in-scalar-and-void-contexts">General behaviour in scalar and void contexts</a><li><a href="#A-note-about-prefixes">A note about prefixes</a><li><a href="#'extract_delimited'"><code class="inline"><span class="w">extract_delimited</span></code>
 
88
</a><li><a href="#'extract_bracketed'"><code class="inline"><span class="w">extract_bracketed</span></code>
 
89
</a><li><a href="#'extract_variable'"><code class="inline"><span class="w">extract_variable</span></code>
 
90
</a><li><a href="#'extract_tagged'"><code class="inline"><span class="w">extract_tagged</span></code>
 
91
</a><li><a href="#'gen_extract_tagged'"><code class="inline"><span class="w">gen_extract_tagged</span></code>
 
92
</a><li><a href="#'extract_quotelike'"><code class="inline"><span class="w">extract_quotelike</span></code>
 
93
</a><li><a href="#'extract_quotelike'-and-%22here-documents%22"><code class="inline"><span class="w">extract_quotelike</span></code>
 
94
 and "here documents"</a><li><a href="#'extract_codeblock'"><code class="inline"><span class="w">extract_codeblock</span></code>
 
95
</a><li><a href="#'extract_multiple'"><code class="inline"><span class="w">extract_multiple</span></code>
 
96
</a><li><a href="#'gen_delimited_pat'"><code class="inline"><span class="w">gen_delimited_pat</span></code>
 
97
</a><li><a href="#'delimited_pat'"><code class="inline"><span class="w">delimited_pat</span></code>
94
98
</a></ul><li><a href="#DIAGNOSTICS">DIAGNOSTICS</a><li><a href="#AUTHOR">AUTHOR</a><li><a href="#BUGS-AND-IRRITATIONS">BUGS AND IRRITATIONS</a><li><a href="#COPYRIGHT">COPYRIGHT</a></ul><a name="NAME"></a><h1>NAME</h1>
95
99
<p>Text::Balanced - Extract delimited text sequences from strings.</p>
96
100
<a name="SYNOPSIS"></a><h1>SYNOPSIS</h1>
116
120
<pre class="verbatim"> <span class="c"># Extract the initial substring of $text that is bounded by</span>
117
121
 <span class="c"># a C&lt;BEGIN&gt;...C&lt;END&gt; pair. Don&#39;t allow nested C&lt;BEGIN&gt; tags</span></pre>
118
122
<pre class="verbatim">  <span class="s">(</span><span class="i">$extracted</span><span class="cm">,</span> <span class="i">$remainder</span><span class="s">)</span> =
119
 
                <span class="i">extract_tagged</span><span class="s">(</span><span class="i">$text</span><span class="cm">,</span><span class="q">&quot;BEGIN&quot;</span><span class="cm">,</span><span class="q">&quot;END&quot;</span><span class="cm">,</span><a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span><span class="s">{</span>bad<span class="cm">=&gt;</span><span class="s">[</span><span class="q">&quot;BEGIN&quot;</span><span class="s">]</span><span class="s">}</span><span class="s">)</span><span class="sc">;</span></pre>
 
123
                <span class="i">extract_tagged</span><span class="s">(</span><span class="i">$text</span><span class="cm">,</span><span class="q">&quot;BEGIN&quot;</span><span class="cm">,</span><span class="q">&quot;END&quot;</span><span class="cm">,</span><a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span><span class="s">{</span><span class="w">bad</span><span class="cm">=&gt;</span><span class="s">[</span><span class="q">&quot;BEGIN&quot;</span><span class="s">]</span><span class="s">}</span><span class="s">)</span><span class="sc">;</span></pre>
120
124
<pre class="verbatim"> <span class="c"># Extract the initial substring of $text that represents a</span>
121
125
 <span class="c"># Perl &quot;quote or quote-like operation&quot;</span></pre>
122
126
<pre class="verbatim">  <span class="s">(</span><span class="i">$extracted</span><span class="cm">,</span> <span class="i">$remainder</span><span class="s">)</span> = <span class="i">extract_quotelike</span><span class="s">(</span><span class="i">$text</span><span class="s">)</span><span class="sc">;</span></pre>
145
149
<pre class="verbatim">  <span class="i">$extract_head</span> = <span class="i">gen_extract_tagged</span><span class="s">(</span><span class="q">&#39;&lt;HEAD&gt;&#39;</span><span class="cm">,</span><span class="q">&#39;&lt;/HEAD&gt;&#39;</span><span class="s">)</span><span class="sc">;</span></pre>
146
150
<pre class="verbatim">  <span class="s">(</span><span class="i">$extracted</span><span class="cm">,</span> <span class="i">$remainder</span><span class="s">)</span> = <span class="i">$extract_head</span>-&gt;<span class="s">(</span><span class="i">$text</span><span class="s">)</span><span class="sc">;</span></pre>
147
151
<a name="DESCRIPTION"></a><h1>DESCRIPTION</h1>
148
 
<p>The various <code class="inline">extract_...</code>
 
152
<p>The various <code class="inline"><span class="w">extract_</span>...</code>
149
153
 subroutines may be used to
150
154
extract a delimited substring, possibly after skipping a
151
155
specified prefix string. By default, that prefix is
155
159
<p>The substring to be extracted must appear at the
156
160
current <code class="inline"><a class="l_k" href="../functions/pos.html">pos</a></code> location of the string's variable
157
161
(or at index zero, if no <code class="inline"><a class="l_k" href="../functions/pos.html">pos</a></code> position is defined).
158
 
In other words, the <code class="inline">extract_...</code>
 
162
In other words, the <code class="inline"><span class="w">extract_</span>...</code>
159
163
 subroutines <i>don't</i>
160
 
extract the first occurance of a substring anywhere
 
164
extract the first occurrence of a substring anywhere
161
165
in a string (like an unanchored regex would). Rather,
162
 
they extract an occurance of the substring appearing
 
166
they extract an occurrence of the substring appearing
163
167
immediately at the current matching position in the
164
 
string (like a <code class="inline">\G</code>
 
168
string (like a <code class="inline">\<span class="w">G</span></code>
165
169
-anchored regex would).</p>
166
170
<a name="General-behaviour-in-list-contexts"></a><h2>General behaviour in list contexts</h2>
167
171
<p>In a list context, all the subroutines return a list, the first three
169
173
<ul>
170
174
<li><a name="%5b0%5d"></a><b>[0]</b>
171
175
<p>The extracted string, including the specified delimiters.
172
 
If the extraction fails an empty string is returned.</p>
 
176
If the extraction fails <code class="inline"><a class="l_k" href="../functions/undef.html">undef</a></code> is returned.</p>
173
177
</li>
174
178
<li><a name="%5b1%5d"></a><b>[1]</b>
175
179
<p>The remainder of the input string (i.e. the characters after the
177
181
</li>
178
182
<li><a name="%5b2%5d"></a><b>[2]</b>
179
183
<p>The skipped prefix (i.e. the characters before the extracted string).
180
 
On failure, the empty string is returned.</p>
 
184
On failure, <code class="inline"><a class="l_k" href="../functions/undef.html">undef</a></code> is returned.</p>
181
185
</li>
182
186
</ul>
183
187
<p>Note that in a list context, the contents of the original input text (the first
184
 
argument) are not modified in any way. </p>
 
188
argument) are not modified in any way.</p>
185
189
<p>However, if the input text was passed in a variable, that variable's
186
190
<code class="inline"><a class="l_k" href="../functions/pos.html">pos</a></code> value is updated to point at the first character after the
187
191
extracted text. That means that in a list context the various
211
215
. normally doesn't match newlines.</p>
212
216
<p>To overcome this limitation, you need to turn on /s matching within
213
217
the prefix pattern, using the <code class="inline">(?s)</code> directive: '(?s).*?(?=&lt;H1&gt;)'</p>
214
 
<a name="'extract_delimited'"></a><h2><code class="inline">extract_delimited</code>
 
218
<a name="'extract_delimited'"></a><h2><code class="inline"><span class="w">extract_delimited</span></code>
215
219
</h2>
216
 
<p>The <code class="inline">extract_delimited</code>
 
220
<p>The <code class="inline"><span class="w">extract_delimited</span></code>
217
221
 function formalizes the common idiom
218
222
of extracting a single-character-delimited substring from the start of
219
223
a string. For example, to extract a single-quote delimited string, the
220
224
following code is typically used:</p>
221
225
<pre class="verbatim">  <span class="s">(</span><span class="i">$remainder</span> = <span class="i">$text</span><span class="s">)</span> =~ <span class="q">s/\A(&#39;(\\.|[^&#39;])*&#39;)//s</span><span class="sc">;</span>
222
226
        <span class="i">$extracted</span> = <span class="i">$1</span><span class="sc">;</span></pre>
223
 
<p>but with <code class="inline">extract_delimited</code>
 
227
<p>but with <code class="inline"><span class="w">extract_delimited</span></code>
224
228
 it can be simplified to:</p>
225
229
<pre class="verbatim">  <span class="s">(</span><span class="i">$extracted</span><span class="cm">,</span><span class="i">$remainder</span><span class="s">)</span> = <span class="i">extract_delimited</span><span class="s">(</span><span class="i">$text</span><span class="cm">,</span> <span class="q">&quot;&#39;&quot;</span><span class="s">)</span><span class="sc">;</span></pre>
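A minimal sketch of the call above, assuming a made-up sample string (the input text and the variable names are illustrative only, not taken from the module's documentation). It shows the list-context form, which leaves the input variable untouched:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_delimited);

# Hypothetical sample input: a single-quoted string containing a
# backslash-escaped quote, followed by trailing text.
my $text = q{'it\'s quoted' and the rest};

# List context: the extracted substring (delimiters included), the
# remainder, and the skipped prefix are returned; $text is not modified.
my ($extracted, $remainder, $prefix) = extract_delimited($text, q{'});

print "extracted: $extracted\n";
print "remainder: $remainder\n";
```

Because the default escape character is a backslash, the embedded `\'` is not treated as the closing delimiter.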
226
 
<p><code class="inline">extract_delimited</code>
 
230
<p><code class="inline"><span class="w">extract_delimited</span></code>
227
231
 takes up to four scalars (the input text, the
228
232
delimiters, a prefix pattern to be skipped, and any escape characters)
229
233
and extracts the initial substring of the text that
244
248
 is used. If the text to be processed
245
249
is not specified either, <code class="inline"><span class="i">$_</span></code>
246
250
 is used.</p>
247
 
<p>In list context, <code class="inline">extract_delimited</code>
 
251
<p>In list context, <code class="inline"><span class="w">extract_delimited</span></code>
248
252
 returns an array of three
249
253
elements, the extracted substring (<i>including the surrounding
250
254
delimiters</i>), the remainder of the text, and the skipped prefix (if
265
269
<pre class="verbatim">  <span class="c"># Extract a single- or double- quoted substring from the</span>
266
270
        <span class="c"># beginning of $text, optionally after some whitespace</span>
267
271
        <span class="c"># (note the list context to protect $text from modification):</span></pre>
268
 
<pre class="verbatim">          <span class="s">(</span><span class="i">$substring</span><span class="s">)</span> = extract_delimited <span class="i">$text</span><span class="cm">,</span> <span class="q">q{&quot;&#39;}</span><span class="sc">;</span></pre>
 
272
<pre class="verbatim">          <span class="s">(</span><span class="i">$substring</span><span class="s">)</span> = <span class="w">extract_delimited</span> <span class="i">$text</span><span class="cm">,</span> <span class="q">q{&quot;&#39;}</span><span class="sc">;</span></pre>
269
273
<pre class="verbatim">  <span class="c"># Delete the substring delimited by the first &#39;/&#39; in $text:</span></pre>
270
274
<pre class="verbatim">          $text = join '', (extract_delimited($text,'/','[^/]*'))[2,1];</pre><p>Note that this last example is <i>not</i> the same as deleting the first
271
275
quote-like pattern. For instance, if <code class="inline"><span class="i">$text</span></code>
272
276
 contained the string:</p>
273
 
<pre class="verbatim">  "if ('./cmd' =~ m/$UNIXCMD/s) { $cmd = $1; }"
274
 
        
275
 
then after the deletion it would contain:</pre><pre class="verbatim">   <span class="q">&quot;if (&#39;.$UNIXCMD/s) { $cmd = $1; }&quot;</span></pre>
 
277
<pre class="verbatim">  <span class="q">&quot;if (&#39;./cmd&#39; =~ m/$UNIXCMD/s) { $cmd = $1; }&quot;</span></pre>
 
278
<p>then after the deletion it would contain:</p>
 
279
<pre class="verbatim">  <span class="q">&quot;if (&#39;.$UNIXCMD/s) { $cmd = $1; }&quot;</span></pre>
276
280
<p>not:</p>
277
 
<pre class="verbatim">  <span class="q">&quot;if (&#39;./cmd&#39; =~ ms) { $cmd = $1; }&quot;</span>
278
 
        </pre>
 
281
<pre class="verbatim">  <span class="q">&quot;if (&#39;./cmd&#39; =~ ms) { $cmd = $1; }&quot;</span></pre>
279
282
<p>See <a href="#extract_quotelike">"extract_quotelike"</a> for a (partial) solution to this problem.</p>
280
 
<a name="'extract_bracketed'"></a><h2><code class="inline">extract_bracketed</code>
 
283
<a name="'extract_bracketed'"></a><h2><code class="inline"><span class="w">extract_bracketed</span></code>
281
284
</h2>
282
285
<p>Like <code class="inline"><span class="q">&quot;extract_delimited&quot;</span></code>
283
 
, the <code class="inline">extract_bracketed</code>
 
286
, the <code class="inline"><span class="w">extract_bracketed</span></code>
284
287
 function takes
285
288
up to three optional scalar arguments: a string to extract from, a delimiter
286
289
specifier, and a prefix pattern. As before, a missing prefix defaults to
288
291
. However, a missing
289
292
delimiter specifier defaults to <code class="inline"><span class="q">&#39;{}()[]&lt;&gt;&#39;</span></code>
290
293
 (see below).</p>
291
 
<p><code class="inline">extract_bracketed</code>
 
294
<p><code class="inline"><span class="w">extract_bracketed</span></code>
292
295
 extracts a balanced-bracket-delimited
293
296
substring (using any one (or more) of the user-specified delimiter
294
297
brackets: '(..)', '{..}', '[..]', or '&lt;..&gt;'). Optionally it will also
295
298
respect quoted unbalanced brackets (see below).</p>
296
299
<p>A "delimiter bracket" is a bracket in the list of delimiters passed as
297
 
<code class="inline">extract_bracketed</code>
 
300
<code class="inline"><span class="w">extract_bracketed</span></code>
298
301
's second argument. Delimiter brackets are
299
302
specified by giving either the left or right (or both!) versions
300
303
of the required bracket(s). Note that the order in which
310
313
("non-delimiter") bracket in the substring is ignored.</p>
311
314
<p>For example, given the string:</p>
312
315
<pre class="verbatim">  <span class="i">$text</span> = <span class="q">&quot;{ an &#39;[irregularly :-(] {} parenthesized &gt;:-)&#39; string }&quot;</span><span class="sc">;</span></pre>
313
 
<p>then a call to <code class="inline">extract_bracketed</code>
 
316
<p>then a call to <code class="inline"><span class="w">extract_bracketed</span></code>
314
317
 in a list context:</p>
315
318
<pre class="verbatim">  <span class="i">@result</span> = <span class="i">extract_bracketed</span><span class="s">(</span> <span class="i">$text</span><span class="cm">,</span> <span class="q">&#39;{}&#39;</span> <span class="s">)</span><span class="sc">;</span></pre>
316
319
<p>would return:</p>
360
363
<p>See also: <code class="inline"><span class="q">&quot;extract_quotelike&quot;</span></code>
361
364
 and <code class="inline"><span class="q">&quot;extract_codeblock&quot;</span></code>
362
365
.</p>
363
 
<a name="'extract_variable'"></a><h2><code class="inline">extract_variable</code>
 
366
<a name="'extract_variable'"></a><h2><code class="inline"><span class="w">extract_variable</span></code>
364
367
</h2>
365
 
<p><code class="inline">extract_variable</code>
 
368
<p><code class="inline"><span class="w">extract_variable</span></code>
366
369
 extracts any valid Perl variable or
367
370
variable-involved expression, including scalars, arrays, hashes, array
368
 
accesses, hash look-ups, method calls through objects, subroutine calles
 
371
accesses, hash look-ups, method calls through objects, subroutine calls
369
372
through subroutine references, etc.</p>
370
373
<p>The subroutine takes up to two optional arguments:</p>
371
374
<ol>
392
395
</li>
393
396
</ul>
394
397
<p>On failure, all of these values (except the remaining text) are <code class="inline"><a class="l_k" href="../functions/undef.html">undef</a></code>.</p>
395
 
<p>In a scalar context, <code class="inline">extract_variable</code>
 
398
<p>In a scalar context, <code class="inline"><span class="w">extract_variable</span></code>
396
399
 returns just the complete
397
400
substring that matched a variablish expression. <code class="inline"><a class="l_k" href="../functions/undef.html">undef</a></code> is returned on
398
401
failure. In addition, the original input text has the returned substring
399
402
(and any prefix) removed from it.</p>
400
403
<p>In a void context, the input text just has the matched substring (and
401
404
any specified prefix) removed.</p>
402
 
<a name="'extract_tagged'"></a><h2><code class="inline">extract_tagged</code>
 
405
<a name="'extract_tagged'"></a><h2><code class="inline"><span class="w">extract_tagged</span></code>
403
406
</h2>
404
 
<p><code class="inline">extract_tagged</code>
 
407
<p><code class="inline"><span class="w">extract_tagged</span></code>
405
408
 extracts and segments text between (balanced)
406
 
specified tags. </p>
 
409
specified tags.</p>
407
410
<p>The subroutine takes up to five optional arguments:</p>
408
411
<ol>
409
412
<li>
437
440
</ol>
438
441
<p>The various options that can be specified are:</p>
439
442
<ul>
440
 
<li><a name="'reject-%3d%3e-%24listref'"></a><b><code class="inline">reject <span class="cm">=&gt;</span> <span class="i">$listref</span></code>
 
443
<li><a name="'reject-%3d%3e-%24listref'"></a><b><code class="inline"><span class="w">reject</span> <span class="cm">=&gt;</span> <span class="i">$listref</span></code>
441
444
</b>
442
445
<p>The list reference contains one or more strings specifying patterns
443
446
that must <i>not</i> appear within the tagged text.</p>
444
447
<p>For example, to extract
445
448
an HTML link (which should not contain nested links) use:</p>
446
 
<pre class="verbatim">        <span class="i">extract_tagged</span><span class="s">(</span><span class="i">$text</span><span class="cm">,</span> <span class="q">&#39;&lt;A&gt;&#39;</span><span class="cm">,</span> <span class="q">&#39;&lt;/A&gt;&#39;</span><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <span class="s">{</span>reject <span class="cm">=&gt;</span> <span class="s">[</span><span class="q">&#39;&lt;A&gt;&#39;</span><span class="s">]</span><span class="s">}</span> <span class="s">)</span><span class="sc">;</span></pre>
 
449
<pre class="verbatim">        <span class="i">extract_tagged</span><span class="s">(</span><span class="i">$text</span><span class="cm">,</span> <span class="q">&#39;&lt;A&gt;&#39;</span><span class="cm">,</span> <span class="q">&#39;&lt;/A&gt;&#39;</span><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <span class="s">{</span><span class="w">reject</span> <span class="cm">=&gt;</span> <span class="s">[</span><span class="q">&#39;&lt;A&gt;&#39;</span><span class="s">]</span><span class="s">}</span> <span class="s">)</span><span class="sc">;</span></pre>
447
450
</li>
448
 
<li><a name="'ignore-%3d%3e-%24listref'"></a><b><code class="inline">ignore <span class="cm">=&gt;</span> <span class="i">$listref</span></code>
 
451
<li><a name="'ignore-%3d%3e-%24listref'"></a><b><code class="inline"><span class="w">ignore</span> <span class="cm">=&gt;</span> <span class="i">$listref</span></code>
449
452
</b>
450
453
<p>The list reference contains one or more strings specifying patterns
451
454
that are <i>not</i> to be treated as nested tags within the tagged text
452
455
(even if they would match the start tag pattern).</p>
453
456
<p>For example, to extract an arbitrary XML tag, but ignore "empty" elements:</p>
454
 
<pre class="verbatim">        <span class="i">extract_tagged</span><span class="s">(</span><span class="i">$text</span><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <span class="s">{</span>ignore <span class="cm">=&gt;</span> <span class="s">[</span><span class="q">&#39;&lt;[^&gt;]*/&gt;&#39;</span><span class="s">]</span><span class="s">}</span> <span class="s">)</span><span class="sc">;</span></pre>
 
457
<pre class="verbatim">        <span class="i">extract_tagged</span><span class="s">(</span><span class="i">$text</span><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <span class="s">{</span><span class="w">ignore</span> <span class="cm">=&gt;</span> <span class="s">[</span><span class="q">&#39;&lt;[^&gt;]*/&gt;&#39;</span><span class="s">]</span><span class="s">}</span> <span class="s">)</span><span class="sc">;</span></pre>
455
458
<p>(also see <a href="#gen_delimited_pat">"gen_delimited_pat"</a> below).</p>
456
459
</li>
457
 
<li><a name="'fail-%3d%3e-%24str'"></a><b><code class="inline">fail <span class="cm">=&gt;</span> <span class="i">$str</span></code>
 
460
<li><a name="'fail-%3d%3e-%24str'"></a><b><code class="inline"><span class="w">fail</span> <span class="cm">=&gt;</span> <span class="i">$str</span></code>
458
461
</b>
459
 
<p>The <code class="inline">fail</code>
 
462
<p>The <code class="inline"><span class="w">fail</span></code>
460
463
 option indicates the action to be taken if a matching end
461
464
tag is not encountered (i.e. before the end of the string or some
462
 
<code class="inline">reject</code>
 
465
<code class="inline"><span class="w">reject</span></code>
463
466
 pattern matches). By default, a failure to match a closing
464
 
tag causes <code class="inline">extract_tagged</code>
 
467
tag causes <code class="inline"><span class="w">extract_tagged</span></code>
465
468
 to immediately fail.</p>
466
469
<p>However, if the string value associated with <code class="inline">fail</code> is "MAX", then
467
 
<code class="inline">extract_tagged</code>
 
470
<code class="inline"><span class="w">extract_tagged</span></code>
468
471
 returns the complete text up to the point of failure.
469
 
If the string is "PARA", <code class="inline">extract_tagged</code>
 
472
If the string is "PARA", <code class="inline"><span class="w">extract_tagged</span></code>
470
473
 returns only the first paragraph
471
474
after the tag (up to the first line that is either empty or contains
472
475
only whitespace characters).
482
485
<pre class="verbatim">        <span class="i">$text</span> = <span class="q">&quot;/para line 1\n\nline 3\n/para line 4&quot;</span><span class="sc">;</span></pre>
483
486
<pre class="verbatim">        extract_tagged($text, '/para', '/endpara', undef,
484
487
                        {reject =&gt; '/para', fail =&gt; MAX} );</pre><pre class="verbatim">        <span class="c"># EXTRACTED: &quot;/para line 1\n&quot;</span></pre>
485
 
<p>Note that the specified <code class="inline">fail</code>
 
488
<p>Note that the specified <code class="inline"><span class="w">fail</span></code>
486
489
 behaviour applies to nested tags as well.</p>
487
490
</li>
488
491
</ul>
508
511
</li>
509
512
</ul>
510
513
<p>On failure, all of these values (except the remaining text) are <code class="inline"><a class="l_k" href="../functions/undef.html">undef</a></code>.</p>
511
 
<p>In a scalar context, <code class="inline">extract_tagged</code>
 
514
<p>In a scalar context, <code class="inline"><span class="w">extract_tagged</span></code>
512
515
 returns just the complete
513
516
substring that matched a tagged text (including the start and end
514
517
tags). <code class="inline"><a class="l_k" href="../functions/undef.html">undef</a></code> is returned on failure. In addition, the original input
515
518
text has the returned substring (and any prefix) removed from it.</p>
516
519
<p>In a void context, the input text just has the matched substring (and
517
520
any specified prefix) removed.</p>
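The context-dependent behaviour described above can be sketched as follows. The sample strings and variable names are assumptions made for illustration; only `extract_tagged` and its `reject` option come from the documentation.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_tagged);

# List context: the input variable is left unmodified.
my $text = '<A>link</A> rest';
my ($extracted, $remainder) =
    extract_tagged($text, '<A>', '</A>', undef, { reject => ['<A>'] });

# Scalar context: the matched substring (and any prefix) is removed
# from the variable that was passed in.
my $copy  = '<A>link</A> rest';
my $match = extract_tagged($copy, '<A>', '</A>');

# A nested start tag matches the reject pattern, so the extraction
# fails and no extracted text is returned.
my $nested = '<A>outer <A>inner</A></A>';
my ($failed) =
    extract_tagged($nested, '<A>', '</A>', undef, { reject => ['<A>'] });

print "list:   $extracted | $remainder\n";
print "scalar: $match | $copy\n";
print "nested: ", (defined $failed && length $failed ? $failed : '(failed)'), "\n";
```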
518
 
<a name="'gen_extract_tagged'"></a><h2><code class="inline">gen_extract_tagged</code>
 
521
<a name="'gen_extract_tagged'"></a><h2><code class="inline"><span class="w">gen_extract_tagged</span></code>
519
522
</h2>
520
523
<p>(Note: This subroutine is only available under Perl 5.005 or later.)</p>
521
 
<p><code class="inline">gen_extract_tagged</code>
 
524
<p><code class="inline"><span class="w">gen_extract_tagged</span></code>
522
525
 generates a new anonymous subroutine which
523
526
extracts text between (balanced) specified tags. In other words,
524
 
it generates a function identical in function to <code class="inline">extract_tagged</code>
 
527
it generates a function identical in function to <code class="inline"><span class="w">extract_tagged</span></code>
525
528
.</p>
526
 
<p>The difference between <code class="inline">extract_tagged</code>
 
529
<p>The difference between <code class="inline"><span class="w">extract_tagged</span></code>
527
530
 and the anonymous
528
531
subroutines generated by
529
 
<code class="inline">gen_extract_tagged</code>
 
532
<code class="inline"><span class="w">gen_extract_tagged</span></code>
530
533
, is that those generated subroutines:</p>
531
534
<ul>
532
535
<li>
533
536
<p>do not have to reparse tag specification or parsing options every time
534
 
they are called (whereas <code class="inline">extract_tagged</code>
 
537
they are called (whereas <code class="inline"><span class="w">extract_tagged</span></code>
535
538
 has to effectively rebuild
536
539
its tag parser on every call);</p>
537
540
</li>
538
541
<li>
539
542
<p>make use of the new qr// construct to pre-compile the regexes they use
540
 
(whereas <code class="inline">extract_tagged</code>
 
543
(whereas <code class="inline"><span class="w">extract_tagged</span></code>
541
544
 uses standard string variable interpolation 
542
545
to create tag-matching patterns).</p>
543
546
</li>
544
547
</ul>
545
548
<p>The subroutine takes up to four optional arguments (the same set as
546
 
<code class="inline">extract_tagged</code>
 
549
<code class="inline"><span class="w">extract_tagged</span></code>
547
550
 except for the string to be processed). It returns
548
551
a reference to a subroutine which in turn takes a single argument (the text to
549
552
be extracted from).</p>
550
 
<p>In other words, the implementation of <code class="inline">extract_tagged</code>
 
553
<p>In other words, the implementation of <code class="inline"><span class="w">extract_tagged</span></code>
551
554
 is exactly
552
555
equivalent to:</p>
553
556
<pre class="verbatim"><a name="extract_tagged"></a>        sub <span class="m">extract_tagged</span>
556
559
                <span class="i">$extractor</span> = <span class="i">gen_extract_tagged</span><span class="s">(</span><span class="i">@_</span><span class="s">)</span><span class="sc">;</span>
557
560
                <a class="l_k" href="../functions/return.html">return</a> <span class="i">$extractor</span>-&gt;<span class="s">(</span><span class="i">$text</span><span class="s">)</span><span class="sc">;</span>
558
561
        <span class="s">}</span></pre>
559
 
<p>(although <code class="inline">extract_tagged</code>
 
562
<p>(although <code class="inline"><span class="w">extract_tagged</span></code>
560
563
 is not currently implemented that way, in order
561
564
to preserve pre-5.005 compatibility).</p>
562
 
<p>Using <code class="inline">gen_extract_tagged</code>
 
565
<p>Using <code class="inline"><span class="w">gen_extract_tagged</span></code>
563
566
 to create extraction functions for specific tags 
564
567
is a good idea if those functions are going to be called more than once, since
565
568
their performance is typically twice as good as the more general-purpose
566
 
<code class="inline">extract_tagged</code>
 
569
<code class="inline"><span class="w">extract_tagged</span></code>
567
570
.</p>
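A short sketch of that reuse pattern, assuming hypothetical `<B>` tags and sample inputs chosen purely for illustration: the extractor is generated once and then applied to several strings.

```perl
use strict;
use warnings;
use Text::Balanced qw(gen_extract_tagged);

# Build the extractor once; the tag patterns are parsed here
# rather than on every call.
my $extract_b = gen_extract_tagged('<B>', '</B>');

# Hypothetical inputs; the last one contains no tag at all.
my @inputs = ('<B>one</B> tail', '<B>two</B>', 'no tag here');

my @found;
for my $input (@inputs) {
    # List context: the input string itself is not modified.
    my ($extracted, $remainder) = $extract_b->($input);
    push @found, $extracted if defined $extracted && length $extracted;
}

print join(', ', @found), "\n";
```

Whether the speed-up matters depends on how often the extractor is called; for a one-off extraction the plain `extract_tagged` call is simpler.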
568
 
<a name="'extract_quotelike'"></a><h2><code class="inline">extract_quotelike</code>
 
571
<a name="'extract_quotelike'"></a><h2><code class="inline"><span class="w">extract_quotelike</span></code>
569
572
</h2>
570
 
<p><code class="inline">extract_quotelike</code>
 
573
<p><code class="inline"><span class="w">extract_quotelike</span></code>
571
574
 attempts to recognize, extract, and segment any
572
575
one of the various Perl quotes and quotelike operators (see
573
576
<i>perlop(3)</i>). Nested backslashed delimiters, embedded balanced bracket
574
577
delimiters (for the quotelike operators), and trailing modifiers are
575
578
all caught. For example, in:</p>
576
 
<pre class="verbatim">        extract_quotelike 'q # an octothorpe: \# (not the end of the q!) #'
577
 
        
578
 
        extract_quotelike '  "You said, \"Use sed\"."  '</pre><pre class="verbatim">        extract_quotelike <span class="q">&#39; s{([A-Z]{1,8}\.[A-Z]{3})} /\L$1\E/; &#39;</span></pre>
579
 
<pre class="verbatim">        extract_quotelike <span class="q">&#39; tr/\\\/\\\\/\\\//ds; &#39;</span></pre>
 
579
<pre class="verbatim">        <span class="w">extract_quotelike</span> <span class="q">&#39;q # an octothorpe: \# (not the end of the q!) #&#39;</span></pre>
 
580
<pre class="verbatim">        <span class="w">extract_quotelike</span> <span class="q">&#39;  &quot;You said, \&quot;Use sed\&quot;.&quot;  &#39;</span></pre>
 
581
<pre class="verbatim">        <span class="w">extract_quotelike</span> <span class="q">&#39; s{([A-Z]{1,8}\.[A-Z]{3})} /\L$1\E/; &#39;</span></pre>
 
582
<pre class="verbatim">        <span class="w">extract_quotelike</span> <span class="q">&#39; tr/\\\/\\\\/\\\//ds; &#39;</span></pre>
580
583
<p>the full Perl quotelike operations are all extracted correctly.</p>
581
584
<p>Note too that, when using the /x modifier on a regex, any comment
582
585
containing the current pattern delimiter will cause the regex to be
591
594
                <span class="q">                (?i)            # CASE INSENSITIVE</span>
592
595
                <span class="q">                [a-z_]          # LEADING ALPHABETIC/&#39;</span></pre>
593
596
<p>This behaviour is identical to that of the actual compiler.</p>
594
 
<p><code class="inline">extract_quotelike</code>
 
597
<p><code class="inline"><span class="w">extract_quotelike</span></code>
595
598
 takes two arguments: the text to be processed and
596
599
a prefix to be matched at the very beginning of the text. If no prefix 
597
600
is specified, optional whitespace is the default. If no text is given,
642
645
<p>For each of the fields marked "(if any)" the default value on success is
643
646
an empty string.
644
647
On failure, all of these values (except the remaining text) are <code class="inline"><a class="l_k" href="../functions/undef.html">undef</a></code>.</p>
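<p>A short sketch of a list-context call may help here. The input string and the variable names are assumptions made for illustration; the indices used are the ones the surrounding text describes (extracted text, remainder, operator name, and first block):</p>

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_quotelike);

# Hypothetical input: a substitution followed by more code.
my $text = 's/foo/bar/gi; print "done";';

# List context: $text itself is not modified.
my @fields = extract_quotelike($text);

if (defined $fields[0]) {
    print "matched:   $fields[0]\n";   # the complete quotelike substring
    print "remainder: $fields[1]\n";   # whatever follows it
    print "operator:  $fields[3]\n";   # name of the quotelike operator
    print "pattern:   $fields[5]\n";   # text of the first block
}
else {
    print "no quotelike operator found at the start of the text\n";
}
```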
645
 
<p>In a scalar context, <code class="inline">extract_quotelike</code>
 
648
<p>In a scalar context, <code class="inline"><span class="w">extract_quotelike</span></code>
646
649
 returns just the complete substring
647
650
that matched a quotelike operation (or <code class="inline"><a class="l_k" href="../functions/undef.html">undef</a></code> on failure). In a scalar or
648
651
void context, the input text has the same substring (and any specified
652
655
<pre class="verbatim">                <span class="i">$quotelike</span> = <span class="i">extract_quotelike</span><span class="s">(</span><span class="i">$text</span><span class="cm">,</span><span class="q">&#39;.*?&#39;</span><span class="s">)</span><span class="sc">;</span></pre>
653
656
<pre class="verbatim">        <span class="c"># Replace one or more leading whitespace-separated quotelike</span>
654
657
        <span class="c"># literals in $_ with &quot;&lt;QLL&gt;&quot;</span></pre>
655
 
<pre class="verbatim">                <a class="l_k" href="../functions/do.html">do</a> <span class="s">{</span> <span class="i">$_</span> = <a class="l_k" href="../functions/join.html">join</a> <span class="q">&#39;&lt;QLL&gt;&#39;</span><span class="cm">,</span> <span class="s">(</span>extract_quotelike<span class="s">)</span>[<span class="n">2</span><span class="cm">,</span><span class="n">1</span>] <span class="s">}</span> until <span class="i">$@</span><span class="sc">;</span></pre>
 
658
<pre class="verbatim">                <a class="l_k" href="../functions/do.html">do</a> <span class="s">{</span> <span class="i">$_</span> = <a class="l_k" href="../functions/join.html">join</a> <span class="q">&#39;&lt;QLL&gt;&#39;</span><span class="cm">,</span> <span class="s">(</span><span class="w">extract_quotelike</span><span class="s">)</span>[<span class="n">2</span><span class="cm">,</span><span class="n">1</span>] <span class="s">}</span> until <span class="i">$@</span><span class="sc">;</span></pre>
656
659
<pre class="verbatim">        <span class="c"># Isolate the search pattern in a quotelike operation from $text</span></pre>
657
 
<pre class="verbatim">                <span class="s">(</span><span class="i">$op</span><span class="cm">,</span><span class="i">$pat</span><span class="s">)</span> = <span class="s">(</span>extract_quotelike <span class="i">$text</span><span class="s">)</span>[<span class="n">3</span><span class="cm">,</span><span class="n">5</span>]<span class="sc">;</span>
 
660
<pre class="verbatim">                <span class="s">(</span><span class="i">$op</span><span class="cm">,</span><span class="i">$pat</span><span class="s">)</span> = <span class="s">(</span><span class="w">extract_quotelike</span> <span class="i">$text</span><span class="s">)</span>[<span class="n">3</span><span class="cm">,</span><span class="n">5</span>]<span class="sc">;</span>
658
661
                if <span class="s">(</span><span class="i">$op</span> =~ <span class="q">/[ms]/</span><span class="s">)</span>
659
662
                <span class="s">{</span>
660
663
                        <a class="l_k" href="../functions/print.html">print</a> <span class="q">&quot;search pattern: $pat\n&quot;</span><span class="sc">;</span>
663
666
                <span class="s">{</span>
664
667
                        <a class="l_k" href="../functions/print.html">print</a> <span class="q">&quot;$op is not a pattern matching operation\n&quot;</span><span class="sc">;</span>
665
668
                <span class="s">}</span></pre>
666
 
<a name="'extract_quotelike'-and-%22here-documents%22"></a><h2><code class="inline">extract_quotelike</code>
 
669
<a name="'extract_quotelike'-and-%22here-documents%22"></a><h2><code class="inline"><span class="w">extract_quotelike</span></code>
667
670
 and "here documents"</h2>
668
 
<p><code class="inline">extract_quotelike</code>
 
671
<p><code class="inline"><span class="w">extract_quotelike</span></code>
669
672
 can successfully extract "here documents" from an input
670
673
string, but with an important caveat in list contexts.</p>
671
674
<p>Unlike other types of quote-like literals, a here document is rarely
674
677
<pre class="verbatim">        &lt;&lt;'EOMSG' || die;
675
678
        This is the message.
676
679
        EOMSG
677
 
        exit;</pre><p>Given this as an input string in a scalar context, <code class="inline">extract_quotelike</code>
 
680
        exit;</pre><p>Given this as an input string in a scalar context, <code class="inline"><span class="w">extract_quotelike</span></code>
678
681
 
679
682
would correctly return the string "&lt;&lt;'EOMSG'\nThis is the message.\nEOMSG",
680
683
leaving the string " || die;\nexit;" in the original variable. In other words,
681
684
the two separate pieces of the here document are successfully extracted and
682
685
concatenated.</p>
683
 
<p>In a list context, <code class="inline">extract_quotelike</code>
 
686
<p>In a list context, <code class="inline"><span class="w">extract_quotelike</span></code>
684
687
 would return the list</p>
685
688
<ul>
686
689
<li><a name="%5b0%5d"></a><b>[0]</b>
715
718
which would cause the earlier " || die;\nexit;" to be skipped in any
716
719
sequence of code fragment extractions.</p>
717
720
<p>To avoid this problem, when it encounters a here document whilst
718
 
extracting from a modifiable string, <code class="inline">extract_quotelike</code>
 
721
extracting from a modifiable string, <code class="inline"><span class="w">extract_quotelike</span></code>
719
722
 silently
720
723
rearranges the string to an equivalent piece of Perl:</p>
721
724
<pre class="verbatim">        &lt;&lt;'EOMSG'
726
729
matching position after the here document, but now the rest of the line
727
730
on which the here document starts is not skipped.</p>
728
731
<p>To prevent <code class="inline">extract_quotelike</code> from mucking about with the input in this way
729
 
(this is the only case where a list-context <code class="inline">extract_quotelike</code>
 
732
(this is the only case where a list-context <code class="inline"><span class="w">extract_quotelike</span></code>
730
733
 does so),
731
734
you can pass the input variable as an interpolated literal:</p>
732
735
<pre class="verbatim">        <span class="i">$quotelike</span> = <span class="i">extract_quotelike</span><span class="s">(</span><span class="q">&quot;$var&quot;</span><span class="s">)</span><span class="sc">;</span></pre>
733
 
<a name="'extract_codeblock'"></a><h2><code class="inline">extract_codeblock</code>
 
736
<a name="'extract_codeblock'"></a><h2><code class="inline"><span class="w">extract_codeblock</span></code>
734
737
</h2>
735
 
<p><code class="inline">extract_codeblock</code>
 
738
<p><code class="inline"><span class="w">extract_codeblock</span></code>
736
739
 attempts to recognize and extract a balanced
737
740
bracket delimited substring that may contain unbalanced brackets
738
 
inside Perl quotes or quotelike operations. That is, <code class="inline">extract_codeblock</code>
 
741
inside Perl quotes or quotelike operations. That is, <code class="inline"><span class="w">extract_codeblock</span></code>
739
742
 
740
743
is like a combination of <code class="inline"><span class="q">&quot;extract_bracketed&quot;</span></code>
741
744
 and
742
745
<code class="inline"><span class="q">&quot;extract_quotelike&quot;</span></code>
743
746
.</p>
744
 
<p><code class="inline">extract_codeblock</code>
745
 
 takes the same initial three parameters as <code class="inline">extract_bracketed</code>
 
747
<p><code class="inline"><span class="w">extract_codeblock</span></code>
 
748
 takes the same initial three parameters as <code class="inline"><span class="w">extract_bracketed</span></code>
746
749
:
747
750
a text to process, a set of delimiter brackets to look for, and a prefix to
748
751
match first. It also takes an optional fourth parameter, which allows the
765
768
</li>
766
769
<li>
767
770
<p>Try to match a quote or quotelike operator. If found, call
768
 
<code class="inline">extract_quotelike</code>
769
 
 to eat it. If <code class="inline">extract_quotelike</code>
 
771
<code class="inline"><span class="w">extract_quotelike</span></code>
 
772
 to eat it. If <code class="inline"><span class="w">extract_quotelike</span></code>
770
773
 fails, return
771
774
the error it returned. Otherwise go back to step 1.</p>
772
775
</li>
773
776
<li>
774
777
<p>Try to match an opening delimiter bracket. If found, call
775
 
<code class="inline">extract_codeblock</code>
 
778
<code class="inline"><span class="w">extract_codeblock</span></code>
776
779
 recursively to eat the embedded block. If the
777
780
recursive call fails, return an error. Otherwise, go back to step 1.</p>
778
781
</li>
789
792
                <span class="s">}</span></pre>
790
793
<pre class="verbatim">        <span class="c"># Remove the first round-bracketed list (which may include</span>
791
794
        <span class="c"># round- or curly-bracketed code blocks or quotelike operators)</span></pre>
792
 
<pre class="verbatim">                extract_codeblock <span class="i">$text</span><span class="cm">,</span> <span class="q">&quot;(){}&quot;</span><span class="cm">,</span> <span class="q">&#39;[^(]*&#39;</span><span class="sc">;</span></pre>
 
795
<pre class="verbatim">                <span class="w">extract_codeblock</span> <span class="i">$text</span><span class="cm">,</span> <span class="q">&quot;(){}&quot;</span><span class="cm">,</span> <span class="q">&#39;[^(]*&#39;</span><span class="sc">;</span></pre>
793
796
<p>The ability to specify a different outermost delimiter bracket is useful
794
797
in some circumstances. For example, in the Parse::RecDescent module,
795
798
parser actions which are to be performed only on a successful parse
805
808
<pre class="verbatim">                        <span class="q">&lt;defer: {if ($count&gt;</span></pre>
806
809
<p>because the "less than" operator is interpreted as a closing delimiter.</p>
807
810
<p>But, by extracting the directive using
808
 
<code class="inline">extract_codeblock($text, '{}', undef, '&lt;&gt;')</code>
 
811
<code class="inline"><span class="i">extract_codeblock</span><span class="s">(</span><span class="i">$text</span><span class="cm">,</span> <span class="q">&#39;{}&#39;</span><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <span class="q">&#39;&lt;&gt;&#39;</span><span class="s">)</span></code> 
809
812
the '&gt;' character is only treated as a delimiter at the outermost
810
813
level of the code block, so the directive is parsed correctly.</p>
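<p>A minimal sketch of such a call, assuming a hypothetical directive string held in <code class="inline">$text</code>, might look like this. The directive text and variable names are illustrative only:</p>

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_codeblock);

# Hypothetical directive containing a "greater than" inside a nested block.
my $text = '<defer: {if ($count>10) {$count--}} >';

# Inner delimiters are curly brackets; only the outermost pair is '<>'.
my ($directive, $remainder) = extract_codeblock($text, '{}', undef, '<>');

if (defined $directive) {
    print "directive: $directive\n";
}
else {
    print "extraction failed: $@\n";
}
```

<p>Passing <code class="inline">undef</code> as the third argument leaves the prefix at its default.</p>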
811
 
<a name="'extract_multiple'"></a><h2><code class="inline">extract_multiple</code>
 
814
<a name="'extract_multiple'"></a><h2><code class="inline"><span class="w">extract_multiple</span></code>
812
815
</h2>
813
 
<p>The <code class="inline">extract_multiple</code>
 
816
<p>The <code class="inline"><span class="w">extract_multiple</span></code>
814
817
 subroutine takes a string to be processed and a 
815
818
list of extractors (subroutines or regular expressions) to apply to that string.</p>
816
 
<p>In a list context <code class="inline">extract_multiple</code>
 
819
<p>In a list context <code class="inline"><span class="w">extract_multiple</span></code>
817
820
 returns an array of substrings
818
821
of the original string, as extracted by the specified extractors.
819
 
In a scalar context, <code class="inline">extract_multiple</code>
 
822
In a scalar context, <code class="inline"><span class="w">extract_multiple</span></code>
820
823
 returns the first
821
824
substring successfully extracted from the original string. In both
822
825
scalar and void contexts the original string has the first successfully
823
826
extracted substring removed from it. In all contexts
824
 
<code class="inline">extract_multiple</code>
 
827
<code class="inline"><span class="w">extract_multiple</span></code>
825
828
 starts at the current <code class="inline"><a class="l_k" href="../functions/pos.html">pos</a></code> of the string, and
826
829
sets that <code class="inline"><a class="l_k" href="../functions/pos.html">pos</a></code> appropriately after it matches.</p>
827
 
<p>Hence, the aim of a call to <code class="inline">extract_multiple</code>
 
830
<p>Hence, the aim of a call to <code class="inline"><span class="w">extract_multiple</span></code>
828
831
 in a list context
829
832
is to split the processed string into as many non-overlapping fields as
830
833
possible, by repeatedly applying each of the specified extractors
831
 
to the remainder of the string. Thus <code class="inline">extract_multiple</code>
 
834
to the remainder of the string. Thus <code class="inline"><span class="w">extract_multiple</span></code>
832
835
 is
833
836
a generalized form of Perl's <code class="inline"><a class="l_k" href="../functions/split.html">split</a></code> subroutine.</p>
834
837
<p>The subroutine takes up to four optional arguments:</p>
875
878
match), and a string representing any prefix skipped before the
876
879
extraction (like $` in a pattern match). Note that this is designed
877
880
to facilitate the use of other Text::Balanced subroutines with
878
 
<code class="inline">extract_multiple</code>
 
881
<code class="inline"><span class="w">extract_multiple</span></code>
879
882
. Note too that the value returned by an extractor
880
883
subroutine need not bear any relationship to the corresponding substring
881
884
of the original text (see examples below).</p>
892
895
<p>If an extractor returns a defined value, that value is immediately
893
896
treated as the next extracted field and pushed onto the list of fields.
894
897
If the extractor was specified in a hash reference, the field is also
895
 
blessed into the appropriate class. </p>
 
898
blessed into the appropriate class.</p>
896
899
<p>If the extractor fails to match (in the case of a regex extractor), or returns an empty list or an undefined value (in the case of a subroutine extractor), it is
897
900
assumed to have failed to extract.
898
901
If none of the extractor subroutines succeeds, then one
899
902
character is extracted from the start of the text and the extraction
900
903
subroutines reapplied. Characters which are thus removed are accumulated and
901
904
eventually become the next field (unless the fourth argument is true, in which
902
 
case they are disgarded).</p>
 
905
case they are discarded).</p>
903
906
<p>For example, the following extracts substrings that are valid Perl variables:</p>
904
907
<pre class="verbatim">        <span class="i">@fields</span> = <span class="i">extract_multiple</span><span class="s">(</span><span class="i">$text</span><span class="cm">,</span>
905
908
                                   <span class="s">[</span> <a class="l_k" href="../functions/sub.html">sub</a> <span class="s">{</span> <span class="i">extract_variable</span><span class="s">(</span><span class="i">$_</span>[<span class="n">0</span>]<span class="s">)</span> <span class="s">}</span> <span class="s">]</span><span class="cm">,</span>
909
912
parts are also blessed to identify them (the "anything else" is unblessed):</p>
910
913
<pre class="verbatim">        <span class="i">@fields</span> = <span class="i">extract_multiple</span><span class="s">(</span><span class="i">$text</span><span class="cm">,</span>
911
914
                   <span class="s">[</span>
912
 
                        <span class="s">{</span> Delim <span class="cm">=&gt;</span> <a class="l_k" href="../functions/sub.html">sub</a> <span class="s">{</span> <span class="i">extract_delimited</span><span class="s">(</span><span class="i">$_</span>[<span class="n">0</span>]<span class="cm">,</span><span class="q">q{&#39;&quot;}</span><span class="s">)</span> <span class="s">}</span> <span class="s">}</span><span class="cm">,</span>
913
 
                        <span class="s">{</span> Brack <span class="cm">=&gt;</span> <a class="l_k" href="../functions/sub.html">sub</a> <span class="s">{</span> <span class="i">extract_bracketed</span><span class="s">(</span><span class="i">$_</span>[<span class="n">0</span>]<span class="cm">,</span><span class="q">&#39;{}&#39;</span><span class="s">)</span> <span class="s">}</span> <span class="s">}</span><span class="cm">,</span>
 
915
                        <span class="s">{</span> <span class="w">Delim</span> <span class="cm">=&gt;</span> <a class="l_k" href="../functions/sub.html">sub</a> <span class="s">{</span> <span class="i">extract_delimited</span><span class="s">(</span><span class="i">$_</span>[<span class="n">0</span>]<span class="cm">,</span><span class="q">q{&#39;&quot;}</span><span class="s">)</span> <span class="s">}</span> <span class="s">}</span><span class="cm">,</span>
 
916
                        <span class="s">{</span> <span class="w">Brack</span> <span class="cm">=&gt;</span> <a class="l_k" href="../functions/sub.html">sub</a> <span class="s">{</span> <span class="i">extract_bracketed</span><span class="s">(</span><span class="i">$_</span>[<span class="n">0</span>]<span class="cm">,</span><span class="q">&#39;{}&#39;</span><span class="s">)</span> <span class="s">}</span> <span class="s">}</span><span class="cm">,</span>
914
917
                   <span class="s">]</span><span class="s">)</span><span class="sc">;</span></pre>
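<p>To make the blessing concrete, here is a small sketch that walks the returned fields and reports which class, if any, each one belongs to. It assumes blessed fields are references to the extracted string, so they are dereferenced with <code class="inline">$$field</code>; the input text and variable names are hypothetical:</p>

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_multiple extract_delimited extract_bracketed);

# Hypothetical input mixing plain text, a quoted string, and a block.
my $text = q{say "hi" {block} end};

my @fields = extract_multiple($text,
    [
        { Delim => sub { extract_delimited($_[0], q{'"}) } },
        { Brack => sub { extract_bracketed($_[0], '{}') } },
    ]);

for my $field (@fields) {
    # Fields from hash-specified extractors are blessed; the rest are plain strings.
    my $class  = ref($field) || 'unblessed';
    my $string = ref($field) ? $$field : $field;
    print "$class: [$string]\n";
}
```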
915
918
<p>This call extracts the next single substring that is a valid Perl quotelike
916
919
operator (and removes it from $text):</p>
934
937
<p>If you wanted the commas preserved as separate fields (i.e. like split
935
938
does if your split pattern has capturing parentheses), you would
936
939
just make the last parameter undefined (or remove it).</p>
937
 
<a name="'gen_delimited_pat'"></a><h2><code class="inline">gen_delimited_pat</code>
 
940
<a name="'gen_delimited_pat'"></a><h2><code class="inline"><span class="w">gen_delimited_pat</span></code>
938
941
</h2>
939
 
<p>The <code class="inline">gen_delimited_pat</code>
 
942
<p>The <code class="inline"><span class="w">gen_delimited_pat</span></code>
940
943
 subroutine takes a single (string) argument and
941
944
builds a Friedl-style optimized regex that matches a string delimited
942
945
by any one of the characters in the single argument. For example:</p>
943
946
<pre class="verbatim">        <span class="i">gen_delimited_pat</span><span class="s">(</span><span class="q">q{&#39;&quot;}</span><span class="s">)</span></pre>
944
947
<p>returns the regex:</p>
945
948
<pre class="verbatim">        (?:\"(?:\\\"|(?!\").)*\"|\'(?:\\\'|(?!\').)*\')</pre><p>Note that the specified delimiters are automatically quotemeta'd.</p>
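<p>As a rough illustration, the returned pattern can be interpolated into an ordinary match to pull quoted strings out of a line. The sample line and the variable names below are assumptions for the sake of the example:</p>

```perl
use strict;
use warnings;
use Text::Balanced qw(gen_delimited_pat);

# Pattern matching either a single- or a double-quoted string.
my $quoted = gen_delimited_pat(q{'"});

# Hypothetical input containing an escaped delimiter.
my $line = q{name = "O\"Brien", nick = 'Bob'};

# Capture every delimited string in the line.
my @strings = $line =~ /($quoted)/g;
print "$_\n" for @strings;
```

<p>Because the pattern is interpolated into a larger regex, the same approach works whether it is used alone or combined with other alternatives.</p>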
946
 
<p>A typical use of <code class="inline">gen_delimited_pat</code>
 
949
<p>A typical use of <code class="inline"><span class="w">gen_delimited_pat</span></code>
947
950
 would be to build special purpose tags
948
 
for <code class="inline">extract_tagged</code>
 
951
for <code class="inline"><span class="w">extract_tagged</span></code>
949
952
. For example, to properly ignore "empty" XML elements
950
953
(which might contain quoted strings):</p>
951
954
<pre class="verbatim">        <a class="l_k" href="../functions/my.html">my</a> <span class="i">$empty_tag</span> = <span class="q">&#39;&lt;(&#39;</span> . <span class="i">gen_delimited_pat</span><span class="s">(</span><span class="q">q{&#39;&quot;}</span><span class="s">)</span> . <span class="q">&#39;|.)+/&gt;&#39;</span><span class="sc">;</span></pre>
952
 
<pre class="verbatim">        <span class="i">extract_tagged</span><span class="s">(</span><span class="i">$text</span><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <span class="s">{</span>ignore <span class="cm">=&gt;</span> <span class="s">[</span><span class="i">$empty_tag</span><span class="s">]</span><span class="s">}</span> <span class="s">)</span><span class="sc">;</span></pre>
953
 
<p><code class="inline">gen_delimited_pat</code>
 
955
<pre class="verbatim">        <span class="i">extract_tagged</span><span class="s">(</span><span class="i">$text</span><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span> <span class="s">{</span><span class="w">ignore</span> <span class="cm">=&gt;</span> <span class="s">[</span><span class="i">$empty_tag</span><span class="s">]</span><span class="s">}</span> <span class="s">)</span><span class="sc">;</span></pre>
 
956
<p><code class="inline"><span class="w">gen_delimited_pat</span></code>
954
957
 may also be called with an optional second argument,
955
958
which specifies the "escape" character(s) to be used for each delimiter.
956
959
For example, to match a Pascal-style string (where ' is the delimiter
963
966
<p>If more delimiters than escape chars are specified, the last escape char
964
967
is used for the remaining delimiters.
965
968
If no escape char is specified for a given specified delimiter, '\' is used.</p>
966
 
<p>Note that 
967
 
<code class="inline">gen_delimited_pat</code>
968
 
 was previously called
969
 
<code class="inline">delimited_pat</code>
970
 
. That name may still be used, but is now deprecated.
971
 
        </p>
 
969
<a name="'delimited_pat'"></a><h2><code class="inline"><span class="w">delimited_pat</span></code>
 
970
</h2>
 
971
<p>Note that <code class="inline"><span class="w">gen_delimited_pat</span></code>
 
972
 was previously called <code class="inline"><span class="w">delimited_pat</span></code>
 
973
.
 
974
That name may still be used, but is now deprecated.</p>
972
975
<a name="DIAGNOSTICS"></a><h1>DIAGNOSTICS</h1>
973
976
<p>In a list context, all the functions return <code class="inline"><span class="s">(</span><a class="l_k" href="../functions/undef.html">undef</a><span class="cm">,</span><span class="i">$original_text</span><span class="s">)</span></code>
974
977
 
976
979
(in this case the input text is not modified in any way).</p>
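<p>A brief sketch of checking for failure in a list context follows. It uses <code class="inline">extract_bracketed</code> on a hypothetical string with no brackets, and reads the <code class="inline">error</code> and <code class="inline">pos</code> entries of <code class="inline">$@</code> that this section describes; the variable names are illustrative:</p>

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical input that cannot match: it contains no round brackets.
my $text = 'no brackets here';

my ($extracted, $remainder) = extract_bracketed($text, '()');

if (!defined $extracted) {
    # On failure the remainder is the original text, and $@ carries the details.
    print "failed:    $@->{error}\n";
    print "at offset: $@->{pos}\n";
    print "remainder: $remainder\n";
}
```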
977
980
<p>In addition, on failure in <i>any</i> context, the <code class="inline"><span class="i">$@</span></code>
978
981
 variable is set.
979
 
Accessing <code class="inline"><span class="i">$@</span>-&gt;{error}</code>
 
982
Accessing <code class="inline"><span class="i">$@</span>-&gt;{<span class="w">error</span>}</code>
980
983
 returns one of the error diagnostics listed
981
984
below.
982
 
Accessing <code class="inline"><span class="i">$@</span>-&gt;{pos}</code>
 
985
Accessing <code class="inline"><span class="i">$@</span>-&gt;{<span class="w">pos</span>}</code>
983
986
 returns the offset into the original string at
984
987
which the error was detected (although not necessarily where it occurred!).
985
988
Printing <code class="inline"><span class="i">$@</span></code>
989
992
<p>The available diagnostics are:</p>
990
993
<ul>
991
994
<li><a name="'Did-not-find-a-suitable-bracket%3a-%22%25s%22'"></a><b><code class="inline">Did not find a suitable bracket: "%s"</code></b>
992
 
<p>The delimiter provided to <code class="inline">extract_bracketed</code>
 
995
<p>The delimiter provided to <code class="inline"><span class="w">extract_bracketed</span></code>
993
996
 was not one of
994
997
<code class="inline"><span class="q">&#39;()[]&lt;&gt;{}&#39;</span></code>
995
998
.</p>
998
1001
<p>A non-optional prefix was specified but wasn't found at the start of the text.</p>
999
1002
</li>
1000
1003
<li><a name="'Did-not-find-opening-bracket-after-prefix%3a-%22%25s%22'"></a><b><code class="inline">Did not find opening bracket after prefix: "%s"</code></b>
1001
 
<p><code class="inline">extract_bracketed</code>
1002
 
 or <code class="inline">extract_codeblock</code>
 
1004
<p><code class="inline"><span class="w">extract_bracketed</span></code>
 
1005
 or <code class="inline"><span class="w">extract_codeblock</span></code>
1003
1006
 was expecting a
1004
1007
particular kind of bracket at the start of the text, and didn't find it.</p>
1005
1008
</li>
1006
1009
<li><a name="'No-quotelike-operator-found-after-prefix%3a-%22%25s%22'"></a><b><code class="inline">No quotelike operator found after prefix: "%s"</code></b>
1007
 
<p><code class="inline">extract_quotelike</code>
 
1010
<p><code class="inline"><span class="w">extract_quotelike</span></code>
1008
1011
 didn't find one of the quotelike operators <code class="inline"><a class="l_k" href="../functions/q.html">q</a></code>,
1009
1012
<code class="inline"><a class="l_k" href="../functions/qq.html">qq</a></code>, <code class="inline"><a class="l_k" href="../functions/qw.html">qw</a></code>, <code class="inline"><a class="l_k" href="../functions/qx.html">qx</a></code>, <code class="inline"><a class="l_k" href="../functions/s.html">s</a></code>, <code class="inline"><a class="l_k" href="../functions/tr.html">tr</a></code> or <code class="inline"><a class="l_k" href="../functions/y.html">y</a></code> at the start of the substring
1010
1013
it was extracting.</p>
1011
1014
</li>
1012
1015
<li><a name="'Unmatched-closing-bracket%3a-%22%25c%22'"></a><b><code class="inline">Unmatched closing bracket: "%c"</code></b>
1013
 
<p><code class="inline">extract_bracketed</code>
1014
 
, <code class="inline">extract_quotelike</code>
1015
 
 or <code class="inline">extract_codeblock</code>
 
1016
<p><code class="inline"><span class="w">extract_bracketed</span></code>
 
1017
, <code class="inline"><span class="w">extract_quotelike</span></code>
 
1018
 or <code class="inline"><span class="w">extract_codeblock</span></code>
1016
1019
 encountered
1017
1020
a closing bracket where none was expected.</p>
1018
1021
</li>
1019
1022
<li><a name="'Unmatched-opening-bracket(s)%3a-%22%25s%22'"></a><b><code class="inline">Unmatched opening bracket(s): "%s"</code></b>
1020
 
<p><code class="inline">extract_bracketed</code>
1021
 
, <code class="inline">extract_quotelike</code>
1022
 
 or <code class="inline">extract_codeblock</code>
 
1023
<p><code class="inline"><span class="w">extract_bracketed</span></code>
 
1024
, <code class="inline"><span class="w">extract_quotelike</span></code>
 
1025
 or <code class="inline"><span class="w">extract_codeblock</span></code>
1023
1026
 ran 
1024
1027
out of characters in the text before closing one or more levels of nested
1025
1028
brackets.</p>
1026
1029
</li>
1027
 
<li><a name="'Unmatched-embedded-quote-(%25s)'"></a><b><code class="inline">Unmatched embedded quote <span class="s">(</span><span class="i">%s</span><span class="s">)</span></code>
 
1030
<li><a name="'Unmatched-embedded-quote-(%25s)'"></a><b><code class="inline"><span class="w">Unmatched</span> <span class="w">embedded</span> <span class="w">quote</span> <span class="s">(</span><span class="i">%s</span><span class="s">)</span></code>
1028
1031
</b>
1029
 
<p><code class="inline">extract_bracketed</code>
 
1032
<p><code class="inline"><span class="w">extract_bracketed</span></code>
1030
1033
 attempted to match an embedded quoted substring, but
1031
1034
failed to find a closing quote to match it.</p>
1032
1035
</li>
1033
 
<li><a name="'Did-not-find-closing-delimiter-to-match-'%25s''"></a><b><code class="inline">Did not find closing delimiter to match <span class="q">&#39;%s&#39;</span></code>
 
1036
<li><a name="'Did-not-find-closing-delimiter-to-match-'%25s''"></a><b><code class="inline"><span class="w">Did</span> not <span class="w">find</span> <span class="w">closing</span> <span class="w">delimiter</span> <span class="w">to</span> <span class="w">match</span> <span class="q">&#39;%s&#39;</span></code>
1034
1037
</b>
1035
 
<p><code class="inline">extract_quotelike</code>
 
1038
<p><code class="inline"><span class="w">extract_quotelike</span></code>
1036
1039
 was unable to find a closing delimiter to match the
1037
1040
one that opened the quote-like operation.</p>
1038
1041
</li>
1039
1042
<li><a name="'Mismatched-closing-bracket%3a-expected-%22%25c%22-but-found-%22%25s%22'"></a><b><code class="inline">Mismatched closing bracket: expected "%c" but found "%s"</code></b>
1040
 
<p><code class="inline">extract_bracketed</code>
1041
 
, <code class="inline">extract_quotelike</code>
1042
 
 or <code class="inline">extract_codeblock</code>
 
1043
<p><code class="inline"><span class="w">extract_bracketed</span></code>
 
1044
, <code class="inline"><span class="w">extract_quotelike</span></code>
 
1045
 or <code class="inline"><span class="w">extract_codeblock</span></code>
1043
1046
 found
1044
1047
a valid bracket delimiter, but it was the wrong species. This usually
1045
1048
indicates a nesting error, but may indicate incorrect quoting or escaping.</p>
1046
1049
</li>
<li><a name="'No-block-delimiter-found-after-quotelike-%22%25s%22'"></a><b><code class="inline"><span class="w">No</span> <span class="w">block</span> <span class="w">delimiter</span> <span class="w">found</span> <span class="w">after</span> <span class="w">quotelike</span> <span class="q">&quot;%s&quot;</span></code>
</b>
<p><code class="inline"><span class="w">extract_quotelike</span></code>
 or <code class="inline"><span class="w">extract_codeblock</span></code>
 found one of the
quotelike operators <code class="inline"><a class="l_k" href="../functions/q.html">q</a></code>, <code class="inline"><a class="l_k" href="../functions/qq.html">qq</a></code>, <code class="inline"><a class="l_k" href="../functions/qw.html">qw</a></code>, <code class="inline"><a class="l_k" href="../functions/qx.html">qx</a></code>, <code class="inline"><a class="l_k" href="../functions/s.html">s</a></code>, <code class="inline"><a class="l_k" href="../functions/tr.html">tr</a></code> or <code class="inline"><a class="l_k" href="../functions/y.html">y</a></code>
without a suitable block after it.</p>
</li>
<li><a name="'Did-not-find-leading-dereferencer'"></a><b><code class="inline"><span class="w">Did</span> not <span class="w">find</span> <span class="w">leading</span> <span class="w">dereferencer</span></code>
</b>
<p><code class="inline"><span class="w">extract_variable</span></code>
 was expecting one of '$', '@', or '%' at the start of
a variable, but didn't find any of them.</p>
</li>
<li><a name="'Bad-identifier-after-dereferencer'"></a><b><code class="inline"><span class="w">Bad</span> <span class="w">identifier</span> <span class="w">after</span> <span class="w">dereferencer</span></code>
</b>
<p><code class="inline"><span class="w">extract_variable</span></code>
 found a '$', '@', or '%' indicating a variable, but that
character was not followed by a legal Perl identifier.</p>
</li>
<li><a name="'Did-not-find-expected-opening-bracket-at-%25s'"></a><b><code class="inline"><span class="w">Did</span> not <span class="w">find</span> <span class="w">expected</span> <span class="w">opening</span> <span class="w">bracket</span> <span class="w">at</span> <span class="i">%s</span></code>
</b>
<p><code class="inline"><span class="w">extract_codeblock</span></code>
 failed to find any of the outermost opening brackets
that were specified.</p>
</li>
<li><a name="'Improperly-nested-codeblock-at-%25s'"></a><b><code class="inline"><span class="w">Improperly</span> <span class="w">nested</span> <span class="w">codeblock</span> <span class="w">at</span> <span class="i">%s</span></code>
</b>
<p>A nested code block was found that started with a delimiter
specified for use only as an outermost bracket.</p>
</li>
<li><a name="'Missing-second-block-for-quotelike-%22%25s%22'"></a><b><code class="inline"><span class="w">Missing</span> <span class="w">second</span> <span class="w">block</span> for <span class="w">quotelike</span> <span class="q">&quot;%s&quot;</span></code>
</b>
<p><code class="inline"><span class="w">extract_codeblock</span></code>
 or <code class="inline"><span class="w">extract_quotelike</span></code>
 found one of the
quotelike operators <code class="inline"><a class="l_k" href="../functions/s.html">s</a></code>, <code class="inline"><a class="l_k" href="../functions/tr.html">tr</a></code> or <code class="inline"><a class="l_k" href="../functions/y.html">y</a></code> followed by only one block.</p>
</li>
<li><a name="'No-match-found-for-opening-bracket'"></a><b><code class="inline"><span class="w">No</span> <span class="w">match</span> <span class="w">found</span> for <span class="w">opening</span> <span class="w">bracket</span></code>
</b>
<p><code class="inline"><span class="w">extract_codeblock</span></code>
 failed to find a closing bracket to match the outermost
opening bracket.</p>
</li>
<li><a name="'Did-not-find-opening-tag%3a-%2f%25s%2f'"></a><b><code class="inline">Did not find opening tag: /%s/</code></b>
<p><code class="inline"><span class="w">extract_tagged</span></code>
 did not find a suitable opening tag (after any specified
prefix was removed).</p>
</li>
<li><a name="'Unable-to-construct-closing-tag-to-match%3a-%2f%25s%2f'"></a><b><code class="inline">Unable to construct closing tag to match: /%s/</code></b>
<p><code class="inline"><span class="w">extract_tagged</span></code>
 matched the specified opening tag and tried to
modify the matched text to produce a matching closing tag (because
none was specified). It failed to generate the closing tag, almost
certainly because the opening tag did not start with a
bracket of some kind.</p>
</li>
<li><a name="'Found-invalid-nested-tag%3a-%25s'"></a><b><code class="inline">Found invalid nested tag: %s</code></b>
<p><code class="inline"><span class="w">extract_tagged</span></code>
 found a nested tag that appeared in the "reject" list
(and the failure mode was not "MAX" or "PARA").</p>
</li>
<li><a name="'Found-unbalanced-nested-tag%3a-%25s'"></a><b><code class="inline">Found unbalanced nested tag: %s</code></b>
<p><code class="inline"><span class="w">extract_tagged</span></code>
 found a nested opening tag that was not matched by a
corresponding nested closing tag (and the failure mode was not "MAX" or "PARA").</p>
</li>
<li><a name="'Did-not-find-closing-tag'"></a><b><code class="inline"><span class="w">Did</span> not <span class="w">find</span> <span class="w">closing</span> <span class="w">tag</span></code>
</b>
<p><code class="inline"><span class="w">extract_tagged</span></code>
 reached the end of the text without finding a closing tag
to match the original opening tag (and the failure mode was not
"MAX" or "PARA").</p>
<a name="BUGS-AND-IRRITATIONS"></a><h1>BUGS AND IRRITATIONS</h1>
<p>There are undoubtedly serious bugs lurking somewhere in this code, if
only because parts of it give the impression of understanding a great deal
more about Perl than they really do.</p>
<p>Bug reports and other feedback are most welcome.</p>
<a name="COPYRIGHT"></a><h1>COPYRIGHT</h1>
<pre class="verbatim"> Copyright (c) 1997-2001, Damian Conway. All Rights Reserved.
          <!--<select name="r"><option value="1" selected>Go to top result<option value="0">Show results list</select>-->
        </form>
      </p>
      <script language="JavaScript" type="text/javascript" src="/perl-version.js"></script>
      <h2>Labels:</h2>
      <p>
        <a href="#" onClick="addLabel('Text::Balanced','Text/Balanced.html')">Add this page</a>