~ubuntu-branches/ubuntu/trusty/pcre3/trusty

Committer: Package Import Robot
Author(s): Mark Baker
Date: 2012-03-23 22:34:54 UTC
mfrom: (23.1.9 sid)
Revision ID: package-import@ubuntu.com-20120323223454-grhqqolk8a7x1h24

Tags: 1:8.30-4

* Reluctantly using an epoch, as it seems the funny version number with
extra dots causes problems
* Bumped standard version to 3.9.3. No changes needed
* Converted to use new source format / quilt
* Put back obsolete pcre_info() API that upstream have dropped (Closes:
#665300, #665356)
* Don't include pcregrep binary in debug package

Thanks to Elimar Riesebieter for the conversion to the new source format.

files added:
.pc

.pc/.version

.pc/PCRE6_compatible_API.patch

.pc/PCRE6_compatible_API.patch/pcrecpp.cc

.pc/PCRE6_compatible_API.patch/pcrecpp.h

.pc/PCRE6_compatible_API.patch/pcretest.c

.pc/applied-patches

.pc/pcre_info.patch

.pc/pcre_info.patch/Makefile.am

.pc/pcre_info.patch/Makefile.in

.pc/pcre_info.patch/pcre_info.c

.pc/pcregrep.1-patch

.pc/pcregrep.1-patch/doc

.pc/pcregrep.1-patch/doc/pcregrep.1

.pc/pcreposix.patch

.pc/pcreposix.patch/pcreposix.h

.pc/soname.patch

.pc/soname.patch/configure

CheckMan

debian/patches

debian/patches/PCRE6_compatible_API.patch

debian/patches/pcre_info.patch

debian/patches/pcregrep.1-patch

debian/patches/pcreposix.patch

debian/patches/series

debian/patches/soname.patch

debian/source/options

doc/html/pcre16.html

doc/html/pcre_assign_jit_stack.html

doc/html/pcre_free_study.html

doc/html/pcre_jit_stack_alloc.html

doc/html/pcre_jit_stack_free.html

doc/html/pcre_pattern_to_host_byte_order.html

doc/html/pcre_utf16_to_host_byte_order.html

doc/html/pcrejit.html

doc/html/pcrelimits.html

doc/html/pcreunicode.html

doc/pcre16.3

doc/pcre_assign_jit_stack.3

doc/pcre_free_study.3

doc/pcre_jit_stack_alloc.3

doc/pcre_jit_stack_free.3

doc/pcre_pattern_to_host_byte_order.3

doc/pcre_utf16_to_host_byte_order.3

doc/pcrejit.3

doc/pcrelimits.3

doc/pcreunicode.3

libpcre16.pc.in

pcre16_byte_order.c

pcre16_chartables.c

pcre16_compile.c

pcre16_config.c

pcre16_dfa_exec.c

pcre16_exec.c

pcre16_fullinfo.c

pcre16_get.c

pcre16_globals.c

pcre16_jit_compile.c

pcre16_maketables.c

pcre16_newline.c

pcre16_ord2utf16.c

pcre16_printint.c

pcre16_refcount.c

pcre16_string_utils.c

pcre16_study.c

pcre16_tables.c

pcre16_ucd.c

pcre16_utf16_utils.c

pcre16_valid_utf16.c

pcre16_version.c

pcre16_xclass.c

pcre_byte_order.c

pcre_jit_compile.c

pcre_jit_test.c

pcre_printint.c

pcre_string_utils.c

sljit

sljit/sljitConfig.h

sljit/sljitConfigInternal.h

sljit/sljitExecAllocator.c

sljit/sljitLir.c

sljit/sljitLir.h

sljit/sljitNativeARM_Thumb2.c

sljit/sljitNativeARM_v5.c

sljit/sljitNativeMIPS_32.c

sljit/sljitNativeMIPS_common.c

sljit/sljitNativePPC_32.c

sljit/sljitNativePPC_64.c

sljit/sljitNativePPC_common.c

sljit/sljitNativeX86_32.c

sljit/sljitNativeX86_64.c

sljit/sljitNativeX86_common.c

sljit/sljitUtils.c

testdata/greppatN4

testdata/saved16

testdata/saved16BE-1

testdata/saved16BE-2

testdata/saved16LE-1

testdata/saved16LE-2

testdata/saved8

testdata/testinput13

testdata/testinput14

testdata/testinput15

testdata/testinput16

testdata/testinput17

testdata/testinput18

testdata/testinput19

testdata/testinput20

testdata/testinput21

testdata/testinput22

testdata/testoutput11-16

testdata/testoutput11-8

testdata/testoutput13

testdata/testoutput14

testdata/testoutput15

testdata/testoutput16

testdata/testoutput17

testdata/testoutput18

testdata/testoutput19

testdata/testoutput20

testdata/testoutput21

testdata/testoutput22

files removed:
doc/html/pcre_info.html

doc/pcre_info.3

pcre_printint.src

pcre_try_flipped.c

testdata/testoutput11

files modified:
AUTHORS

CMakeLists.txt

ChangeLog

HACKING

LICENCE

Makefile.am

Makefile.in

NEWS

NON-UNIX-USE

PrepareRelease

README

RunGrepTest

RunTest

RunTest.bat

aclocal.m4

config-cmake.h.in

config.guess

config.h.generic

config.h.in

config.sub

configure

configure.ac

debian/changelog

debian/control

debian/rules

debian/source/format

dftables.c

doc/html/index.html

doc/html/pcre-config.html

doc/html/pcre.html

doc/html/pcre_compile.html

doc/html/pcre_compile2.html

doc/html/pcre_config.html

doc/html/pcre_copy_named_substring.html

doc/html/pcre_copy_substring.html

doc/html/pcre_dfa_exec.html

doc/html/pcre_exec.html

doc/html/pcre_free_substring.html

doc/html/pcre_free_substring_list.html

doc/html/pcre_fullinfo.html

doc/html/pcre_get_named_substring.html

doc/html/pcre_get_stringnumber.html

doc/html/pcre_get_stringtable_entries.html

doc/html/pcre_get_substring.html

doc/html/pcre_get_substring_list.html

doc/html/pcre_maketables.html

doc/html/pcre_refcount.html

doc/html/pcre_study.html

doc/html/pcre_version.html

doc/html/pcreapi.html

doc/html/pcrebuild.html

doc/html/pcrecallout.html

doc/html/pcrecompat.html

doc/html/pcrecpp.html

doc/html/pcregrep.html

doc/html/pcrematching.html

doc/html/pcrepartial.html

doc/html/pcrepattern.html

doc/html/pcreperform.html

doc/html/pcreposix.html

doc/html/pcreprecompile.html

doc/html/pcresample.html

doc/html/pcrestack.html

doc/html/pcresyntax.html

doc/html/pcretest.html

doc/index.html.src

doc/pcre-config.1

doc/pcre-config.txt

doc/pcre.3

doc/pcre.txt

doc/pcre_compile.3

doc/pcre_compile2.3

doc/pcre_config.3

doc/pcre_copy_named_substring.3

doc/pcre_copy_substring.3

doc/pcre_dfa_exec.3

doc/pcre_exec.3

doc/pcre_free_substring.3

doc/pcre_free_substring_list.3

doc/pcre_fullinfo.3

doc/pcre_get_named_substring.3

doc/pcre_get_stringnumber.3

doc/pcre_get_stringtable_entries.3

doc/pcre_get_substring.3

doc/pcre_get_substring_list.3

doc/pcre_maketables.3

doc/pcre_refcount.3

doc/pcre_study.3

doc/pcre_version.3

doc/pcreapi.3

doc/pcrebuild.3

doc/pcrecallout.3

doc/pcrecompat.3

doc/pcrecpp.3

doc/pcregrep.1

doc/pcregrep.txt

doc/pcrematching.3

doc/pcrepartial.3

doc/pcrepattern.3

doc/pcreperform.3

doc/pcreposix.3

doc/pcreprecompile.3

doc/pcresample.3

doc/pcrestack.3

doc/pcresyntax.3

doc/pcretest.1

doc/pcretest.txt

doc/perltest.txt

libpcre.pc.in

ltmain.sh

makevp_c.txt

makevp_l.txt

pcre-config.in

pcre.h.generic

pcre.h.in

pcre_chartables.c.dist

pcre_compile.c

pcre_config.c

pcre_dfa_exec.c

pcre_exec.c

pcre_fullinfo.c

pcre_get.c

pcre_globals.c

pcre_info.c

pcre_internal.h

pcre_maketables.c

pcre_newline.c

pcre_ord2utf8.c

pcre_refcount.c

pcre_scanner_unittest.cc

pcre_study.c

pcre_tables.c

pcre_ucd.c

pcre_valid_utf8.c

pcre_version.c

pcre_xclass.c

pcrecpp.cc

pcrecpp_unittest.cc

pcregrep.c

pcreposix.c

pcreposix.h

pcretest.c

perltest.pl

testdata/grepinput

testdata/grepoutput

testdata/testinput1

testdata/testinput10

testdata/testinput11

testdata/testinput12

testdata/testinput2

testdata/testinput4

testdata/testinput5

testdata/testinput6

testdata/testinput7

testdata/testinput8

testdata/testinput9

testdata/testoutput1

testdata/testoutput10

testdata/testoutput12

testdata/testoutput2

testdata/testoutput4

testdata/testoutput5

testdata/testoutput6

testdata/testoutput7

testdata/testoutput8

testdata/testoutput9

ucp.h

doc/html/pcre.html

<ul>

<li><a name="TOC1" href="#SEC1">INTRODUCTION</a>

<li><a name="TOC2" href="#SEC2">USER DOCUMENTATION</a>

<li><a name="TOC3" href="#SEC3">LIMITATIONS</a>

<li><a name="TOC4" href="#SEC4">UTF-8 AND UNICODE PROPERTY SUPPORT</a>

<li><a name="TOC5" href="#SEC5">AUTHOR</a>

<li><a name="TOC6" href="#SEC6">REVISION</a>

<li><a name="TOC3" href="#SEC3">AUTHOR</a>

<li><a name="TOC4" href="#SEC4">REVISION</a>

</ul>

<a name="SEC1" href="#TOC1">INTRODUCTION</a>

for requesting some minor changes that give better JavaScript compatibility.

Starting with release 8.30, it is possible to compile two separate PCRE

libraries: the original, which supports 8-bit character strings (including

UTF-8 strings), and a second library that supports 16-bit character strings

(including UTF-16 strings). The build process allows either one or both to be

built. The majority of the work to make this possible was done by Zoltan

Herczeg.

The two libraries contain identical sets of functions, except that the names in

the 16-bit library start with pcre16_ instead of pcre_. To avoid

over-complication and reduce the documentation maintenance load, most of the

documentation describes the 8-bit library, with the differences for the 16-bit

library described separately in the

<a href="pcre16.html">pcre16</a>

page. References to functions or structures of the form pcre[16]_xxx

should be read as meaning "pcre_xxx when using the 8-bit library and

pcre16_xxx when using the 16-bit library".

The current implementation of PCRE corresponds approximately with Perl 5.12,

including support for UTF-8 encoded strings and Unicode general category

properties. However, UTF-8 and Unicode support has to be explicitly enabled; it

is not the default. The Unicode tables correspond to Unicode release 5.2.0.

including support for UTF-8/16 encoded strings and Unicode general category

properties. However, UTF-8/16 and Unicode support has to be explicitly enabled;

it is not the default. The Unicode tables correspond to Unicode release 6.0.0.

In addition to the Perl-compatible matching function, PCRE contains an

PCRE is written in C and released as a C library. A number of people have

written wrappers and interfaces of various kinds. In particular, Google Inc.

have provided a comprehensive C++ wrapper. This is now included as part of the

PCRE distribution. The

have provided a comprehensive C++ wrapper for the 8-bit library. This is now

included as part of the PCRE distribution. The

<a href="pcrecpp.html">pcrecpp</a>

page has details of this interface. Other people's contributions can be found

in the Contrib directory at the primary FTP site, which is:

distribution.

The library contains a number of undocumented internal functions and data

The libraries contain a number of undocumented internal functions and data

tables that are used by more than one of the exported external functions, but

which are not intended for use by external callers. Their names all begin with

"_pcre_", which hopefully will not provoke any name clashes. In some

environments, it is possible to control which external symbols are exported

when a shared library is built, and in these cases the undocumented symbols are

not exported.

"_pcre_" or "_pcre16_", which hopefully will not provoke any name clashes. In

some environments, it is possible to control which external symbols are

exported when a shared library is built, and in these cases the undocumented

symbols are not exported.

<a name="SEC2" href="#TOC1">USER DOCUMENTATION</a>

of searching. The sections are as follows:

<pre>

pcre this document

pcre16 details of the 16-bit library

pcre-config show PCRE installation configuration information

pcreapi details of PCRE's native C API

pcrebuild options for building PCRE

pcrecallout details of the callout feature

pcrecompat discussion of Perl compatibility

pcrecpp details of the C++ wrapper

pcrecpp details of the C++ wrapper for the 8-bit library

pcredemo a demonstration C program that uses PCRE

pcregrep description of the pcregrep command

pcregrep description of the pcregrep command (8-bit only)

pcrejit discussion of the just-in-time optimization support

pcrelimits details of size and other limits

pcrematching discussion of the two matching algorithms

pcrepartial details of the partial matching facility

pcrepattern syntax and semantics of supported regular expressions

pcreperform discussion of performance issues

pcreposix the POSIX-compatible C API

pcreposix the POSIX-compatible C API for the 8-bit library

pcreprecompile details of saving and re-using precompiled patterns

pcresample discussion of the pcredemo program

pcrestack discussion of stack usage

pcresyntax quick syntax reference

pcretest description of the pcretest testing command

pcreunicode discussion of Unicode and UTF-8/16 support

</pre>

In addition, in the "man" and HTML formats, there is a short page for each

C library function, listing its arguments and results.

<a name="SEC3" href="#TOC1">LIMITATIONS</a>

There are some size limitations in PCRE but it is hoped that they will never in

practice be relevant.

The maximum length of a compiled pattern is 65539 (sic) bytes if PCRE is

compiled with the default internal linkage size of 2. If you want to process

regular expressions that are truly enormous, you can compile PCRE with an

internal linkage size of 3 or 4 (see the README file in the source

distribution and the

<a href="pcrebuild.html">pcrebuild</a>

documentation for details). In these cases the limit is substantially larger.

However, the speed of execution is slower.

All values in repeating quantifiers must be less than 65536.

There is no limit to the number of parenthesized subpatterns, but there can be

no more than 65535 capturing subpatterns.

The maximum length of name for a named subpattern is 32 characters, and the

maximum number of named subpatterns is 10000.

The maximum length of a subject string is the largest positive number that an

integer variable can hold. However, when using the traditional matching

function, PCRE uses recursion to handle subpatterns and indefinite repetition.

This means that the available stack space may limit the size of a subject

string that can be processed by certain patterns. For a discussion of stack

issues, see the

<a href="pcrestack.html">pcrestack</a>

documentation.

<a name="SEC4" href="#TOC1">UTF-8 AND UNICODE PROPERTY SUPPORT</a>

From release 3.3, PCRE has had some support for character strings encoded in

the UTF-8 format. For release 4.0 this was greatly extended to cover most

common requirements, and in release 5.0 additional support for Unicode general

category properties was added.

In order to process UTF-8 strings, you must build PCRE to include UTF-8 support in

the code, and, in addition, you must call

<a href="pcre_compile.html">pcre_compile()</a>

with the PCRE_UTF8 option flag, or the pattern must start with the sequence

(*UTF8). When either of these is the case, both the pattern and any subject

strings that are matched against it are treated as UTF-8 strings instead of

strings of 1-byte characters.

If you compile PCRE with UTF-8 support, but do not use it at run time, the

library will be a bit bigger, but the additional run time overhead is limited

to testing the PCRE_UTF8 flag occasionally, so should not be very big.

If PCRE is built with Unicode character property support (which implies UTF-8

support), the escape sequences \p{..}, \P{..}, and \X are supported.

The available properties that can be tested are limited to the general

category properties such as Lu for an upper case letter or Nd for a decimal

number, the Unicode script names such as Arabic or Han, and the derived

properties Any and L&. A full list is given in the

<a href="pcrepattern.html">pcrepattern</a>

documentation. Only the short names for properties are supported. For example,

\p{L} matches a letter. Its Perl synonym, \p{Letter}, is not supported.

Furthermore, in Perl, many properties may optionally be prefixed by "Is", for

compatibility with Perl 5.6. PCRE does not support this.

Validity of UTF-8 strings

When you set the PCRE_UTF8 flag, the strings passed as patterns and subjects

are (by default) checked for validity on entry to the relevant functions. From

release 7.3 of PCRE, the check is according to the rules of RFC 3629, which are

themselves derived from the Unicode specification. Earlier releases of PCRE

followed the rules of RFC 2279, which allows the full range of 31-bit values (0

to 0x7FFFFFFF). The current check allows only values in the range U+0 to

U+10FFFF, excluding U+D800 to U+DFFF.

The excluded code points are the "Low Surrogate Area" of Unicode, of which the

Unicode Standard says this: "The Low Surrogate Area does not contain any

character assignments, consequently no character code charts or namelists are

provided for this area. Surrogates are reserved for use with UTF-16 and then

must be used in pairs." The code points that are encoded by UTF-16 pairs are

available as independent code points in the UTF-8 encoding. (In other words,

the whole surrogate thing is a fudge for UTF-16 which unfortunately messes up

UTF-8.)
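
As a rough illustration of the range rule quoted above (values U+0 to U+10FFFF, excluding U+D800 to U+DFFF), the following Python sketch tests a code point against that range and uses Python's own strict UTF-8 decoder as a stand-in for an RFC 3629 validity check. It is not PCRE's implementation, and the function names are invented for this example.

```python
def is_allowed_code_point(cp: int) -> bool:
    """True if cp lies in U+0..U+10FFFF and is not a surrogate (U+D800..U+DFFF)."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def is_valid_utf8(data: bytes) -> bool:
    """Assumption: Python's strict decoder is used here in place of PCRE's check."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

For instance, the byte sequence ED A0 80 (which would encode U+D800) and the sequence F4 90 80 80 (which would encode 0x110000) are both rejected by `is_valid_utf8`.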

If an invalid UTF-8 string is passed to PCRE, an error return

(PCRE_ERROR_BADUTF8) is given. In some situations, you may already know that

your strings are valid, and therefore want to skip these checks in order to

improve performance. If you set the PCRE_NO_UTF8_CHECK flag at compile time or

at run time, PCRE assumes that the pattern or subject it is given

(respectively) contains only valid UTF-8 codes. In this case, it does not

diagnose an invalid UTF-8 string.

If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, what

happens depends on why the string is invalid. If the string conforms to the

"old" definition of UTF-8 (RFC 2279), it is processed as a string of characters

in the range 0 to 0x7FFFFFFF. In other words, apart from the initial validity

test, PCRE (when in UTF-8 mode) handles strings according to the more liberal

rules of RFC 2279. However, if the string does not even conform to RFC 2279,

the result is undefined. Your program may crash.

If you want to process strings of values in the full range 0 to 0x7FFFFFFF,

encoded in a UTF-8-like manner as per the old RFC, you can set

PCRE_NO_UTF8_CHECK to bypass the more restrictive test. However, in this

situation, you will have to apply your own validity check.
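
To make the "UTF-8-like manner as per the old RFC" concrete, here is a minimal Python sketch of the RFC 2279 byte layout, which extends the familiar UTF-8 forms to five- and six-byte sequences so that any value up to 0x7FFFFFFF can be encoded. The table and function name are assumptions made for illustration; nothing here is taken from the PCRE sources.

```python
# (largest value, total bytes, lead-byte marker) for each RFC 2279 form
_FORMS = [
    (0x7F, 1, 0x00),
    (0x7FF, 2, 0xC0),
    (0xFFFF, 3, 0xE0),
    (0x1FFFFF, 4, 0xF0),
    (0x3FFFFFF, 5, 0xF8),
    (0x7FFFFFFF, 6, 0xFC),
]


def encode_rfc2279(value: int) -> bytes:
    """Encode a 31-bit value using the old, more liberal RFC 2279 layout."""
    if not 0 <= value <= 0x7FFFFFFF:
        raise ValueError("value outside the 31-bit range")
    # Pick the shortest form that can hold the value.
    for limit, length, marker in _FORMS:
        if value <= limit:
            break
    # Peel off six bits at a time for the continuation bytes (10xxxxxx).
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (value & 0x3F))
        value >>= 6
    # The remaining high bits go into the lead byte.
    return bytes([marker | value] + tail[::-1])
```

For values up to U+10FFFF outside the surrogate range, the result coincides with ordinary UTF-8; larger values such as 0x110000 yield sequences that the RFC 3629 check rejects.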

General comments about UTF-8 mode

1. An unbraced hexadecimal escape sequence (such as \xb3) matches a two-byte

UTF-8 character if the value is greater than 127.

2. Octal numbers up to \777 are recognized, and match two-byte UTF-8

characters for values greater than \177.

3. Repeat quantifiers apply to complete UTF-8 characters, not to individual

bytes, for example: \x{100}{3}.

4. The dot metacharacter matches one UTF-8 character instead of a single byte.

5. The escape sequence \C can be used to match a single byte in UTF-8 mode,

but its use can lead to some strange effects. This facility is not available in

the alternative matching function, pcre_dfa_exec().

6. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly

test characters of any code value, but, by default, the characters that PCRE

recognizes as digits, spaces, or word characters remain the same set as before,

all with values less than 256. This remains true even when PCRE is built to

include Unicode property support, because to do otherwise would slow down PCRE

in many common cases. Note in particular that this applies to \b and \B,

because they are defined in terms of \w and \W. If you really want to test

for a wider sense of, say, "digit", you can use explicit Unicode property tests

such as \p{Nd}. Alternatively, if you set the PCRE_UCP option, the way that

the character escapes work is changed so that Unicode properties are used to

determine which characters match. There are more details in the section on

<a href="pcrepattern.html#genericchartypes">generic character types</a>

in the

<a href="pcrepattern.html">pcrepattern</a>

documentation.

7. Similarly, characters that match the POSIX named character classes are all

low-valued characters, unless the PCRE_UCP option is set.

8. However, the horizontal and vertical whitespace matching escapes (\h, \H,

\v, and \V) do match all the appropriate Unicode characters, whether or not

PCRE_UCP is set.

9. Case-insensitive matching applies only to characters whose values are less

than 128, unless PCRE is built with Unicode property support. Even when Unicode

property support is available, PCRE still uses its own character tables when

checking the case of low-valued characters, so as not to degrade performance.

The Unicode property information is used only for characters with higher

values. Furthermore, PCRE supports case-insensitive matching only when there is

a one-to-one mapping between a letter's cases. There are a small number of

many-to-one mappings in Unicode; these are not supported by PCRE.
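
Points 1 to 4 in the list above turn on the difference between bytes and characters. The Python sketch below is only an analogy: it uses Python's `str` and `re` module, which already work on whole code points, rather than PCRE itself. It shows that the values mentioned (\xb3, \777, \x{100}) each occupy two bytes in UTF-8 while counting as a single character for the dot and for quantifiers. The helper name `utf8_length` is hypothetical.

```python
import re


def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for one (non-surrogate) code point."""
    return len(chr(code_point).encode("utf-8"))


# Values above 127 need more than one byte; 0xB3, octal 777 and 0x100 need two.
byte_counts = {hex(cp): utf8_length(cp) for cp in (0x7F, 0xB3, 0o777, 0x100)}

# Three copies of U+0100: three characters, six bytes.
subject = "\u0100" * 3

# A quantifier counts characters, and the dot consumes one whole character.
three_chars = re.fullmatch("\u0100{3}", subject) is not None
one_dot = re.fullmatch(".", "\u0100") is not None
```

Matching the same subject byte by byte would instead need a repeat count of six, which is the behaviour that UTF-8 mode is described as avoiding.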

<a name="SEC5" href="#TOC1">AUTHOR</a>

8-bit C library function, listing its arguments and results.

<a name="SEC3" href="#TOC1">AUTHOR</a>

Philip Hazel

taken it away. If you want to email me, use my two initials, followed by the

two digits 10, at the domain cam.ac.uk.

<a name="SEC6" href="#TOC1">REVISION</a>

<a name="SEC4" href="#TOC1">REVISION</a>

Last updated: 13 November 2010

Last updated: 10 January 2012

Return to the <a href="index.html">PCRE index page</a>.
