16
16
<li><a name="TOC1" href="#SEC1">PCRE BUILD-TIME OPTIONS</a>
17
<li><a name="TOC2" href="#SEC2">C++ SUPPORT</a>
18
<li><a name="TOC3" href="#SEC3">UTF-8 SUPPORT</a>
19
<li><a name="TOC4" href="#SEC4">UNICODE CHARACTER PROPERTY SUPPORT</a>
20
<li><a name="TOC5" href="#SEC5">CODE VALUE OF NEWLINE</a>
21
<li><a name="TOC6" href="#SEC6">WHAT \R MATCHES</a>
22
<li><a name="TOC7" href="#SEC7">BUILDING SHARED AND STATIC LIBRARIES</a>
23
<li><a name="TOC8" href="#SEC8">POSIX MALLOC USAGE</a>
24
<li><a name="TOC9" href="#SEC9">HANDLING VERY LARGE PATTERNS</a>
25
<li><a name="TOC10" href="#SEC10">AVOIDING EXCESSIVE STACK USAGE</a>
26
<li><a name="TOC11" href="#SEC11">LIMITING PCRE RESOURCE USAGE</a>
27
<li><a name="TOC12" href="#SEC12">CREATING CHARACTER TABLES AT BUILD TIME</a>
28
<li><a name="TOC13" href="#SEC13">USING EBCDIC CODE</a>
29
<li><a name="TOC14" href="#SEC14">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
30
<li><a name="TOC15" href="#SEC15">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a>
31
<li><a name="TOC16" href="#SEC16">SEE ALSO</a>
32
<li><a name="TOC17" href="#SEC17">AUTHOR</a>
33
<li><a name="TOC18" href="#SEC18">REVISION</a>
17
<li><a name="TOC2" href="#SEC2">BUILDING 8-BIT and 16-BIT LIBRARIES</a>
18
<li><a name="TOC3" href="#SEC3">BUILDING SHARED AND STATIC LIBRARIES</a>
19
<li><a name="TOC4" href="#SEC4">C++ SUPPORT</a>
20
<li><a name="TOC5" href="#SEC5">UTF-8 and UTF-16 SUPPORT</a>
21
<li><a name="TOC6" href="#SEC6">UNICODE CHARACTER PROPERTY SUPPORT</a>
22
<li><a name="TOC7" href="#SEC7">JUST-IN-TIME COMPILER SUPPORT</a>
23
<li><a name="TOC8" href="#SEC8">CODE VALUE OF NEWLINE</a>
24
<li><a name="TOC9" href="#SEC9">WHAT \R MATCHES</a>
25
<li><a name="TOC10" href="#SEC10">POSIX MALLOC USAGE</a>
26
<li><a name="TOC11" href="#SEC11">HANDLING VERY LARGE PATTERNS</a>
27
<li><a name="TOC12" href="#SEC12">AVOIDING EXCESSIVE STACK USAGE</a>
28
<li><a name="TOC13" href="#SEC13">LIMITING PCRE RESOURCE USAGE</a>
29
<li><a name="TOC14" href="#SEC14">CREATING CHARACTER TABLES AT BUILD TIME</a>
30
<li><a name="TOC15" href="#SEC15">USING EBCDIC CODE</a>
31
<li><a name="TOC16" href="#SEC16">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
32
<li><a name="TOC17" href="#SEC17">PCREGREP BUFFER SIZE</a>
33
<li><a name="TOC18" href="#SEC18">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a>
34
<li><a name="TOC19" href="#SEC19">SEE ALSO</a>
35
<li><a name="TOC20" href="#SEC20">AUTHOR</a>
36
<li><a name="TOC21" href="#SEC21">REVISION</a>
35
38
<br><a name="SEC1" href="#TOC1">PCRE BUILD-TIME OPTIONS</a><br>
61
64
--enable and --disable always come in pairs, so the complementary option always
62
65
exists as well, but as it specifies the default, it is not described.
64
<br><a name="SEC2" href="#TOC1">C++ SUPPORT</a><br>
66
By default, the <b>configure</b> script will search for a C++ compiler and C++
67
header files. If it finds them, it automatically builds the C++ wrapper library
68
for PCRE. You can disable this by adding
67
<br><a name="SEC2" href="#TOC1">BUILDING 8-BIT and 16-BIT LIBRARIES</a><br>
69
By default, a library called <b>libpcre</b> is built, containing functions that
70
take string arguments contained in vectors of bytes, either as single-byte
71
characters, or interpreted as UTF-8 strings. You can also build a separate
72
library, called <b>libpcre16</b>, in which strings are contained in vectors of
73
16-bit data units and interpreted either as single-unit characters or UTF-16
78
to the <b>configure</b> command. If you do not want the 8-bit library, add
82
as well. At least one of the two libraries must be built. Note that the C++ and
83
POSIX wrappers are for the 8-bit library only, and that <b>pcregrep</b> is an
84
8-bit program. None of these are built if you select only the 16-bit library.
86
<br><a name="SEC3" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
88
The PCRE building process uses <b>libtool</b> to build both shared and static
89
Unix libraries by default. You can suppress one of these by adding one of
94
to the <b>configure</b> command, as required.
96
<br><a name="SEC4" href="#TOC1">C++ SUPPORT</a><br>
98
By default, if the 8-bit library is being built, the <b>configure</b> script
99
will search for a C++ compiler and C++ header files. If it finds them, it
100
automatically builds the C++ wrapper library (which supports only 8-bit
101
strings). You can disable this by adding
72
105
to the <b>configure</b> command.
74
<br><a name="SEC3" href="#TOC1">UTF-8 SUPPORT</a><br>
107
<br><a name="SEC5" href="#TOC1">UTF-8 and UTF-16 SUPPORT</a><br>
76
To build PCRE with support for UTF-8 Unicode character strings, add
109
To build PCRE with support for UTF Unicode character strings, add
80
to the <b>configure</b> command. Of itself, this does not make PCRE treat
81
strings as UTF-8. As well as compiling PCRE with this option, you also have
82
to set the PCRE_UTF8 option when you call the <b>pcre_compile()</b>
83
or <b>pcre_compile2()</b> functions.
86
If you set --enable-utf8 when compiling in an EBCDIC environment, PCRE expects
113
to the <b>configure</b> command. This setting applies to both libraries, adding
114
support for UTF-8 to the 8-bit library and support for UTF-16 to the 16-bit
115
library. There are no separate options for enabling UTF-8 and UTF-16
116
independently because that would allow ridiculous settings such as requesting
117
UTF-16 support while building only the 8-bit library. It is not possible to
118
build one library with UTF support and the other without in the same
119
configuration. (For backwards compatibility, --enable-utf8 is a synonym of
123
Of itself, this setting does not make PCRE treat strings as UTF-8 or UTF-16. As
124
well as compiling PCRE with this option, you also have to set the
125
PCRE_UTF8 or PCRE_UTF16 option when you call one of the pattern compiling
129
If you set --enable-utf when compiling in an EBCDIC environment, PCRE expects
87
130
its input to be either ASCII or UTF-8 (depending on the runtime option). It is
88
131
not possible to support both EBCDIC and UTF-8 codes in the same version of the
89
library. Consequently, --enable-utf8 and --enable-ebcdic are mutually
132
library. Consequently, --enable-utf and --enable-ebcdic are mutually
92
<br><a name="SEC4" href="#TOC1">UNICODE CHARACTER PROPERTY SUPPORT</a><br>
135
<br><a name="SEC6" href="#TOC1">UNICODE CHARACTER PROPERTY SUPPORT</a><br>
94
UTF-8 support allows PCRE to process character values greater than 255 in the
95
strings that it handles. On its own, however, it does not provide any
137
UTF support allows the libraries to process character codepoints up to 0x10ffff
138
in the strings that they handle. On its own, however, it does not provide any
96
139
facilities for accessing the properties of such characters. If you want to be
97
140
able to use the pattern escapes \P, \p, and \X, which refer to Unicode
98
141
character properties, you must add
100
143
--enable-unicode-properties
102
to the <b>configure</b> command. This implies UTF-8 support, even if you have
145
to the <b>configure</b> command. This implies UTF support, even if you have
103
146
not explicitly requested it.
109
152
<a href="pcrepattern.html"><b>pcrepattern</b></a>
112
<br><a name="SEC5" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
155
<br><a name="SEC7" href="#TOC1">JUST-IN-TIME COMPILER SUPPORT</a><br>
157
Just-in-time compiler support is included in the build by specifying
161
This support is available only for certain hardware architectures. If this
162
option is set for an unsupported architecture, a compile-time error occurs.
164
<a href="pcrejit.html"><b>pcrejit</b></a>
165
documentation for a discussion of JIT usage. When JIT support is enabled,
166
pcregrep automatically makes use of it, unless you add
168
--disable-pcregrep-jit
170
to the <b>configure</b> command.
172
<br><a name="SEC8" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
114
174
By default, PCRE interprets the linefeed (LF) character as indicating the end
115
175
of a line. This is the normal newline character on Unix-like systems. You can
153
213
selected when PCRE is built can be overridden when the library functions are
156
<br><a name="SEC7" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
158
The PCRE building process uses <b>libtool</b> to build both shared and static
159
Unix libraries by default. You can suppress one of these by adding one of
164
to the <b>configure</b> command, as required.
166
<br><a name="SEC8" href="#TOC1">POSIX MALLOC USAGE</a><br>
168
When PCRE is called through the POSIX interface (see the
216
<br><a name="SEC10" href="#TOC1">POSIX MALLOC USAGE</a><br>
218
When the 8-bit library is called through the POSIX interface (see the
169
219
<a href="pcreposix.html"><b>pcreposix</b></a>
170
220
documentation), additional working storage is required for holding the pointers
171
221
to capturing substrings, because PCRE requires three integers per substring,
180
230
to the <b>configure</b> command.
182
<br><a name="SEC9" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
232
<br><a name="SEC11" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
184
234
Within a compiled pattern, offset values are used to point from one part to
185
235
another (for example, from an opening parenthesis to an alternation
186
236
metacharacter). By default, two-byte values are used for these offsets, leading
187
237
to a maximum size for a compiled pattern of around 64K. This is sufficient to
188
238
handle all but the most gigantic patterns. Nevertheless, some people do want to
189
process truyl enormous patterns, so it is possible to compile PCRE to use
239
process truly enormous patterns, so it is possible to compile PCRE to use
190
240
three-byte or four-byte offsets by adding a setting such as
192
242
--with-link-size=3
194
to the <b>configure</b> command. The value given must be 2, 3, or 4. Using
195
longer offsets slows down the operation of PCRE because it has to load
196
additional bytes when handling them.
244
to the <b>configure</b> command. The value given must be 2, 3, or 4. For the
245
16-bit library, a value of 3 is rounded up to 4. Using longer offsets slows
246
down the operation of PCRE because it has to load additional data when handling
198
<br><a name="SEC10" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
249
<br><a name="SEC12" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
200
251
When matching with the <b>pcre_exec()</b> function, PCRE implements backtracking
201
252
by making recursive calls to an internal function called <b>match()</b>. In
299
350
relevant libraries are installed on your system. Configuration will fail if
302
<br><a name="SEC15" href="#TOC1">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a><br>
353
<br><a name="SEC17" href="#TOC1">PCREGREP BUFFER SIZE</a><br>
355
<b>pcregrep</b> uses an internal buffer to hold a "window" on the file it is
356
scanning, in order to be able to output "before" and "after" lines when it
357
finds a match. The size of the buffer is controlled by a parameter whose
358
default value is 20K. The buffer itself is three times this size, but because
359
of the way it is used for holding "before" lines, the longest line that is
360
guaranteed to be processable is the parameter size. You can change the default
361
parameter value by adding, for example,
363
--with-pcregrep-bufsize=50K
365
to the <b>configure</b> command. The caller of <b>pcregrep</b> can, however,
366
override this value by specifying a run-time option.
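<p>
As an illustration only, several of the settings described in this document
can be given together in one invocation of <b>configure</b>. The combination
below is an arbitrary example rather than a recommendation, and it assumes
that the script is run from the top of the PCRE source tree:
</p>
<pre>
  ./configure --enable-unicode-properties \
              --with-link-size=3 \
              --with-pcregrep-bufsize=50K
</pre>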
368
<br><a name="SEC18" href="#TOC1">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a><br>