317
317
strings: a single CR (carriage return) character, a single LF (linefeed)
318
318
character, the two-character sequence CRLF, any of the three preceding, or any
319
319
Unicode newline sequence. The Unicode newline sequences are the three just
320
mentioned, plus the single characters VT (vertical tab, U+000B), FF (formfeed,
320
mentioned, plus the single characters VT (vertical tab, U+000B), FF (form feed,
321
321
U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
322
322
(paragraph separator, U+2029).
524
524
the pattern, the contents of the <i>options</i> argument specify their
525
525
settings at the start of compilation and execution. The PCRE_ANCHORED,
526
526
PCRE_BSR_<i>xxx</i>, PCRE_NEWLINE_<i>xxx</i>, PCRE_NO_UTF8_CHECK, and
527
PCRE_NO_START_OPT options can be set at the time of matching as well as at
527
PCRE_NO_START_OPTIMIZE options can be set at the time of matching as well as at
644
If this bit is set, whitespace data characters in the pattern are totally
645
ignored except when escaped or inside a character class. Whitespace does not
644
If this bit is set, white space data characters in the pattern are totally
645
ignored except when escaped or inside a character class. White space does not
646
646
include the VT character (code 11). In addition, characters between an
647
647
unescaped # outside a character class and the next newline, inclusive, are also
648
648
ignored. This is equivalent to Perl's /x option, and it can be changed within a
661
661
This option makes it possible to include comments inside complicated patterns.
662
Note, however, that this applies only to data characters. Whitespace characters
662
Note, however, that this applies only to data characters. White space characters
663
663
may never appear within special character sequences in a pattern, for example
664
664
within the sequence (?( that introduces a conditional subpattern.
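The effect of PCRE_EXTENDED on data characters can be illustrated with a small stand-alone sketch. This is not PCRE's own code: the function name <b>strip_extended()</b> is hypothetical, the newline that ends a comment is assumed to be LF, and constructs such as \Q...\E, POSIX classes like [[:alpha:]], and a literal ] at the start of a class are deliberately not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* White space as described above: space, HT, LF, FF, and CR, but not VT. */
static int is_pattern_space(char c)
{
    return c != '\0' && strchr(" \t\n\f\r", c) != NULL;
}

/* Hypothetical helper: copies "in" to "out", dropping what an extended
   pattern would ignore. "out" must have room for strlen(in) + 1 bytes.
   Assumption: a comment ends at the next LF. */
void strip_extended(const char *in, char *out)
{
    int in_class = 0;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        char c = *in++;

        if (c == '\\' && *in != '\0') {   /* escaped character: keep both */
            *out++ = c;
            *out++ = *in++;
            continue;
        }
        if (in_class) {                   /* inside [...]: keep everything */
            if (c == ']')
                in_class = 0;
            *out++ = c;
            continue;
        }
        if (c == '[') {
            in_class = 1;
            *out++ = c;
            continue;
        }
        if (is_pattern_space(c))          /* unescaped white space: drop */
            continue;
        if (c == '#') {                   /* comment: drop through the LF */
            while (*in != '\0' && *in != '\n')
                in++;
            if (*in == '\n')
                in++;
            continue;
        }
        *out++ = c;
    }
    *out = '\0';
}
```

White space and # are kept when they are escaped or appear inside a character class, and a VT character is always kept because it does not count as white space here.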
745
745
preceding sequences should be recognized. Setting PCRE_NEWLINE_ANY specifies
746
746
that any Unicode newline sequence should be recognized. The Unicode newline
747
747
sequences are the three just mentioned, plus the single characters VT (vertical
748
tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line
748
tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
749
749
separator, U+2028), and PS (paragraph separator, U+2029). For the 8-bit
750
750
library, the last two are recognized only in UTF-8 mode.
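The set of sequences accepted under PCRE_NEWLINE_ANY can be sketched as a small classifier. This is an illustration only, not PCRE's implementation: the function name <b>unicode_newline_len()</b> is hypothetical, and the input is assumed to be already decoded into code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the length, in code points, of the Unicode
   newline sequence that starts at s[0], or 0 if none starts there. */
size_t unicode_newline_len(const uint32_t *s, size_t n)
{
    assert(s != NULL || n == 0);
    if (n == 0)
        return 0;

    switch (s[0]) {
    case 0x000D:                       /* CR, or CRLF if LF follows */
        return (n > 1 && s[1] == 0x000A) ? 2 : 1;
    case 0x000A:                       /* LF */
    case 0x000B:                       /* VT */
    case 0x000C:                       /* FF */
    case 0x0085:                       /* NEL */
    case 0x2028:                       /* LS */
    case 0x2029:                       /* PS */
        return 1;
    default:
        return 0;
    }
}
```

CRLF is reported as one two-character sequence rather than as two separate newlines, which is why CR has to look at the following code point.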
761
761
The only time that a line break in a pattern is specially recognized when
762
compiling is when PCRE_EXTENDED is set. CR and LF are whitespace characters,
762
compiling is when PCRE_EXTENDED is set. CR and LF are white space characters,
763
763
and so are ignored in this mode. Also, an unescaped # outside a character class
764
764
indicates a comment that lasts until after the next line break sequence. In
765
765
other circumstances, line break sequences in patterns are treated as literal
916
916
72 too many forward references
917
917
73 disallowed Unicode code point (>= 0xd800 && <= 0xdfff)
918
918
74 invalid UTF-16 string (specifically UTF-16)
919
75 name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
920
76 character value in \u.... sequence is too large
920
922
The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may
921
923
be used if the limits were changed when PCRE was built.
949
951
<b>pcre_dfa_exec()</b>, it must set up its own <b>pcre_extra</b> block.
952
The second argument of <b>pcre_study()</b> contains option bits. There is only
953
one option: PCRE_STUDY_JIT_COMPILE. If this is set, and the just-in-time
954
compiler is available, the pattern is further compiled into machine code that
955
executes much faster than the <b>pcre_exec()</b> matching function. If
956
the just-in-time compiler is not available, this option is ignored. All other
957
bits in the <i>options</i> argument must be zero.
954
The second argument of <b>pcre_study()</b> contains option bits. There are three
957
PCRE_STUDY_JIT_COMPILE
958
PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
959
PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
961
If any of these are set, and the just-in-time compiler is available, the
962
pattern is further compiled into machine code that executes much faster than
963
the <b>pcre_exec()</b> interpretive matching function. If the just-in-time
964
compiler is not available, these options are ignored. All other bits in the
965
<i>options</i> argument must be zero.
960
968
JIT compilation is a heavyweight optimization. It can take some time for
979
987
study data by calling <b>pcre_free_study()</b>. This function was added to the
980
988
API for release 8.20. For earlier versions, the memory could be freed with
981
989
<b>pcre_free()</b>, just like the pattern itself. This will still work in cases
982
where PCRE_STUDY_JIT_COMPILE is not used, but it is advisable to change to the
983
new function when convenient.
990
where JIT optimization is not used, but it is advisable to change to the new
991
function when convenient.
986
994
This is a typical way in which <b>pcre_study()</b> is used (except that in a
1018
1026
These two optimizations apply to both <b>pcre_exec()</b> and
1019
<b>pcre_dfa_exec()</b>. However, they are not used by <b>pcre_exec()</b> if
1020
<b>pcre_study()</b> is called with the PCRE_STUDY_JIT_COMPILE option, and
1021
just-in-time compiling is successful. The optimizations can be disabled by
1022
setting the PCRE_NO_START_OPTIMIZE option when calling <b>pcre_exec()</b> or
1023
<b>pcre_dfa_exec()</b>. You might want to do this if your pattern contains
1024
callouts or (*MARK) (which cannot be handled by the JIT compiler), and you want
1025
to make use of these facilities in cases where matching fails. See the
1026
discussion of PCRE_NO_START_OPTIMIZE
1027
<b>pcre_dfa_exec()</b>, and the information is also used by the JIT compiler.
1028
The optimizations can be disabled by setting the PCRE_NO_START_OPTIMIZE option
1029
when calling <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>, but if this is done,
1030
JIT execution is also disabled. You might want to do this if your pattern
1031
contains callouts or (*MARK) and you want to make use of these facilities in
1032
cases where matching fails. See the discussion of PCRE_NO_START_OPTIMIZE
1027
1033
<a href="#execoptions">below.</a>
1028
1034
<a name="localesupport"></a></P>
1029
1035
<br><a name="SEC14" href="#TOC1">LOCALE SUPPORT</a><br>
1202
Return 1 if the pattern was studied with the PCRE_STUDY_JIT_COMPILE option, and
1208
Return 1 if the pattern was studied with one of the JIT options, and
1203
1209
just-in-time compiling was successful. The fourth argument should point to an
1204
1210
<b>int</b> variable. A return value of 0 means that JIT support is not available
1205
in this version of PCRE, or that the pattern was not studied with the
1206
PCRE_STUDY_JIT_COMPILE option, or that the JIT compiler could not handle this
1207
particular pattern. See the
1211
in this version of PCRE, or that the pattern was not studied with a JIT option,
1212
or that the JIT compiler could not handle this particular pattern. See the
1208
1213
<a href="pcrejit.html"><b>pcrejit</b></a>
1209
1214
documentation for details of what can and cannot be handled.
1211
1216
PCRE_INFO_JITSIZE
1213
If the pattern was successfully studied with the PCRE_STUDY_JIT_COMPILE option,
1214
return the size of the JIT compiled code, otherwise return zero. The fourth
1215
argument should point to a <b>size_t</b> variable.
1218
If the pattern was successfully studied with a JIT option, return the size of
1219
the JIT compiled code, otherwise return zero. The fourth argument should point
1220
to a <b>size_t</b> variable.
1217
1222
PCRE_INFO_LASTLITERAL
1224
1229
/^a\d+z\d+/ the returned value is "z", but for /^a\dz\d/ the returned value
1232
PCRE_INFO_MAXLOOKBEHIND
1234
Return the number of characters (NB not bytes) in the longest lookbehind
1235
assertion in the pattern. Note that the simple assertions \b and \B require a
1236
one-character lookbehind. This information is useful when doing multi-segment
1237
matching using the partial matching facilities.
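Because the value is counted in characters rather than bytes, a caller that retains the tail of one segment for lookbehind in the next has to convert it to a byte offset. The following is a minimal sketch of that conversion for UTF-8 data. The function name <b>lookbehind_keep_offset()</b> is hypothetical, and the buffer is assumed to hold valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the byte offset in buf[0..len) at which the
   last max_chars UTF-8 characters begin, or 0 if the buffer holds fewer
   characters than that. Assumes buf contains valid UTF-8. */
size_t lookbehind_keep_offset(const unsigned char *buf, size_t len,
                              size_t max_chars)
{
    size_t pos = len;

    assert(buf != NULL || len == 0);
    while (pos > 0 && max_chars > 0) {
        pos--;
        /* Step back over continuation bytes (10xxxxxx) to the lead byte. */
        while (pos > 0 && (buf[pos] & 0xC0) == 0x80)
            pos--;
        max_chars--;
    }
    return pos;
}
```

A caller might pass the value obtained for PCRE_INFO_MAXLOOKBEHIND as max_chars and keep the bytes from the returned offset to the end of the segment.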
1227
1239
PCRE_INFO_MINLENGTH
1229
1241
If the pattern was studied and a minimum length for matching subject strings
1439
1451
"PCRE_UCHAR16 **".
1442
The <i>flags</i> field is a bitmap that specifies which of the other fields
1443
are set. The flag bits are:
1454
The <i>flags</i> field is used to specify which of the other fields are set. The
1445
PCRE_EXTRA_STUDY_DATA
1457
PCRE_EXTRA_CALLOUT_DATA
1446
1458
PCRE_EXTRA_EXECUTABLE_JIT
1447
1460
PCRE_EXTRA_MATCH_LIMIT
1448
1461
PCRE_EXTRA_MATCH_LIMIT_RECURSION
1449
PCRE_EXTRA_CALLOUT_DATA
1462
PCRE_EXTRA_STUDY_DATA
1450
1463
PCRE_EXTRA_TABLES
1453
1465
Other flag bits should be set to zero. The <i>study_data</i> field and sometimes
1454
1466
the <i>executable_jit</i> field are set in the <b>pcre_extra</b> block that is
1455
1467
returned by <b>pcre_study()</b>, together with the appropriate flag bits. You
1456
should not set these yourself, but you may add to the block by setting the
1457
other fields and their corresponding flag bits.
1468
should not set these yourself, but you may add to the block by setting other
1469
fields and their corresponding flag bits.
1460
1472
The <i>match_limit</i> field provides a means of preventing PCRE from using up a
1474
1486
When <b>pcre_exec()</b> is called with a pattern that was successfully studied
1475
with the PCRE_STUDY_JIT_COMPILE option, the way that the matching is executed
1476
is entirely different. However, there is still the possibility of runaway
1477
matching that goes on for a very long time, and so the <i>match_limit</i> value
1478
is also used in this case (but in a different way) to limit how long the
1479
matching can continue.
1487
with a JIT option, the way that the matching is executed is entirely different.
1488
However, there is still the possibility of runaway matching that goes on for a
1489
very long time, and so the <i>match_limit</i> value is also used in this case
1490
(but in a different way) to limit how long the matching can continue.
1482
1493
The default value for the limit can be set when PCRE is built; the default
1497
1508
Limiting the recursion depth limits the amount of machine stack that can be
1498
1509
used, or, when PCRE has been compiled to use memory on the heap instead of the
1499
1510
stack, the amount of heap memory that can be used. This limit is not relevant,
1500
and is ignored, if the pattern was successfully studied with
1501
PCRE_STUDY_JIT_COMPILE.
1511
and is ignored, when matching is done using JIT compiled code.
1504
1514
The default value for <i>match_limit_recursion</i> can be set when PCRE is
1549
1559
The unused bits of the <i>options</i> argument for <b>pcre_exec()</b> must be
1550
1560
zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_<i>xxx</i>,
1551
1561
PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART,
1552
PCRE_NO_START_OPTIMIZE, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_SOFT, and
1562
PCRE_NO_START_OPTIMIZE, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_HARD, and
1556
If the pattern was successfully studied with the PCRE_STUDY_JIT_COMPILE option,
1557
the only supported options for JIT execution are PCRE_NO_UTF8_CHECK,
1558
PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NOTEMPTY_ATSTART. Note in
1559
particular that partial matching is not supported. If an unsupported option is
1560
used, JIT execution is disabled and the normal interpretive code in
1561
<b>pcre_exec()</b> is run.
1566
If the pattern was successfully studied with one of the just-in-time (JIT)
1567
compile options, the only supported options for JIT execution are
1568
PCRE_NO_UTF8_CHECK, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY,
1569
PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_HARD, and PCRE_PARTIAL_SOFT. If an
1570
unsupported option is used, JIT execution is disabled and the normal
1571
interpretive code in <b>pcre_exec()</b> is run.
1681
1691
"no match", the callouts do occur, and that items such as (*COMMIT) and (*MARK)
1682
1692
are considered at every possible starting position in the subject string. If
1683
1693
PCRE_NO_START_OPTIMIZE is set at compile time, it cannot be unset at matching
1694
time. The use of PCRE_NO_START_OPTIMIZE disables JIT execution; when it is set,
1695
matching is always done interpretively.
1687
1698
Setting PCRE_NO_START_OPTIMIZE can change the outcome of a matching operation.
1716
1727
When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8
1717
1728
string is automatically checked when <b>pcre_exec()</b> is subsequently called.
1718
The value of <i>startoffset</i> is also checked to ensure that it points to the
1719
start of a UTF-8 character. There is a discussion about the validity of UTF-8
1729
The entire string is checked before any other processing takes place. The value
1730
of <i>startoffset</i> is also checked to ensure that it points to the start of a
1731
UTF-8 character. There is a discussion about the
1732
<a href="pcreunicode.html#utf8strings">validity of UTF-8 strings</a>
1721
1734
<a href="pcreunicode.html"><b>pcreunicode</b></a>
1722
1735
page. If an invalid sequence of bytes is found, <b>pcre_exec()</b> returns the
1723
1736
error PCRE_ERROR_BADUTF8 or, if PCRE_PARTIAL_HARD is set and the problem is a
1869
1882
If the vector is too small to hold all the captured substring offsets, it is
1870
1883
used as far as possible (up to two-thirds of its length), and the function
1871
returns a value of zero. If neither the actual string matched not any captured
1884
returns a value of zero. If neither the actual string matched nor any captured
1872
1885
substrings are of interest, <b>pcre_exec()</b> may be called with <i>ovector</i>
1873
1886
passed as NULL and <i>ovecsize</i> as zero. However, if the pattern contains
1874
1887
back references and the <i>ovector</i> is not big enough to remember the related
2068
2081
PCRE_ERROR_JIT_STACKLIMIT (-27)
2070
This error is returned when a pattern that was successfully studied using the
2071
PCRE_STUDY_JIT_COMPILE option is being matched, but the memory available for
2072
the just-in-time processing stack is not large enough. See the
2083
This error is returned when a pattern that was successfully studied using a
2084
JIT compile option is being matched, but the memory available for the
2085
just-in-time processing stack is not large enough. See the
2073
2086
<a href="pcrejit.html"><b>pcrejit</b></a>
2074
2087
documentation for more details.
2076
PCRE_ERROR_BADMODE (-28)
2089
PCRE_ERROR_BADMODE (-28)
2078
2091
This error is given if a pattern that was compiled by the 8-bit library is
2079
2092
passed to a 16-bit library function, or vice versa.
2081
PCRE_ERROR_BADENDIANNESS (-29)
2094
PCRE_ERROR_BADENDIANNESS (-29)
2083
2096
This error is given if a pattern that was compiled and saved is reloaded on a
2084
2097
host with different endianness. The utility function
2086
2099
so that it runs on the new host.
2089
Error numbers -16 to -20 and -22 are not used by <b>pcre_exec()</b>.
2102
Error numbers -16 to -20, -22, and -30 are not used by <b>pcre_exec()</b>.
2090
2103
<a name="badutf8reasons"></a></P>
2092
2105
Reason codes for invalid UTF-8 strings
2581
2594
recursively, using private vectors for <i>ovector</i> and <i>workspace</i>. This
2582
2595
error is given if the output vector is not large enough. This should be
2583
2596
extremely rare, as a vector of size 1000 is used.
2598
PCRE_ERROR_DFA_BADRESTART (-30)
2600
When <b>pcre_dfa_exec()</b> is called with the <b>PCRE_DFA_RESTART</b> option,
2601
some plausibility checks are made on the contents of the workspace, which
2602
should contain data about the previous partial match. If any of these checks
2603
fail, this error is given.
2585
2605
<br><a name="SEC24" href="#TOC1">SEE ALSO</a><br>