~ubuntu-branches/ubuntu/trusty/pcre3/trusty

Committer: Package Import Robot
Author(s): Mark Baker
Date: 2012-03-23 22:34:54 UTC
mfrom: (23.1.9 sid)
Revision ID: package-import@ubuntu.com-20120323223454-grhqqolk8a7x1h24

Tags: 1:8.30-4

* Reluctantly using an epoch, as it seems the funny version number with
extra dots causes problems
* Bumped standard version to 3.9.3. No changes needed
* Converted to use new source format / quilt
* Put back obsolete pcre_info() API that upstream have dropped (Closes:
#665300, #665356)
* Don't include pcregrep binary in debug package

Thanks to Elimar Riesebieter for the conversion to the new source format.

files added:
.pc

.pc/.version

.pc/PCRE6_compatible_API.patch

.pc/PCRE6_compatible_API.patch/pcrecpp.cc

.pc/PCRE6_compatible_API.patch/pcrecpp.h

.pc/PCRE6_compatible_API.patch/pcretest.c

.pc/applied-patches

.pc/pcre_info.patch

.pc/pcre_info.patch/Makefile.am

.pc/pcre_info.patch/Makefile.in

.pc/pcre_info.patch/pcre_info.c

.pc/pcregrep.1-patch

.pc/pcregrep.1-patch/doc

.pc/pcregrep.1-patch/doc/pcregrep.1

.pc/pcreposix.patch

.pc/pcreposix.patch/pcreposix.h

.pc/soname.patch

.pc/soname.patch/configure

CheckMan

debian/patches

debian/patches/PCRE6_compatible_API.patch

debian/patches/pcre_info.patch

debian/patches/pcregrep.1-patch

debian/patches/pcreposix.patch

debian/patches/series

debian/patches/soname.patch

debian/source/options

doc/html/pcre16.html

doc/html/pcre_assign_jit_stack.html

doc/html/pcre_free_study.html

doc/html/pcre_jit_stack_alloc.html

doc/html/pcre_jit_stack_free.html

doc/html/pcre_pattern_to_host_byte_order.html

doc/html/pcre_utf16_to_host_byte_order.html

doc/html/pcrejit.html

doc/html/pcrelimits.html

doc/html/pcreunicode.html

doc/pcre16.3

doc/pcre_assign_jit_stack.3

doc/pcre_free_study.3

doc/pcre_jit_stack_alloc.3

doc/pcre_jit_stack_free.3

doc/pcre_pattern_to_host_byte_order.3

doc/pcre_utf16_to_host_byte_order.3

doc/pcrejit.3

doc/pcrelimits.3

doc/pcreunicode.3

libpcre16.pc.in

pcre16_byte_order.c

pcre16_chartables.c

pcre16_compile.c

pcre16_config.c

pcre16_dfa_exec.c

pcre16_exec.c

pcre16_fullinfo.c

pcre16_get.c

pcre16_globals.c

pcre16_jit_compile.c

pcre16_maketables.c

pcre16_newline.c

pcre16_ord2utf16.c

pcre16_printint.c

pcre16_refcount.c

pcre16_string_utils.c

pcre16_study.c

pcre16_tables.c

pcre16_ucd.c

pcre16_utf16_utils.c

pcre16_valid_utf16.c

pcre16_version.c

pcre16_xclass.c

pcre_byte_order.c

pcre_jit_compile.c

pcre_jit_test.c

pcre_printint.c

pcre_string_utils.c

sljit

sljit/sljitConfig.h

sljit/sljitConfigInternal.h

sljit/sljitExecAllocator.c

sljit/sljitLir.c

sljit/sljitLir.h

sljit/sljitNativeARM_Thumb2.c

sljit/sljitNativeARM_v5.c

sljit/sljitNativeMIPS_32.c

sljit/sljitNativeMIPS_common.c

sljit/sljitNativePPC_32.c

sljit/sljitNativePPC_64.c

sljit/sljitNativePPC_common.c

sljit/sljitNativeX86_32.c

sljit/sljitNativeX86_64.c

sljit/sljitNativeX86_common.c

sljit/sljitUtils.c

testdata/greppatN4

testdata/saved16

testdata/saved16BE-1

testdata/saved16BE-2

testdata/saved16LE-1

testdata/saved16LE-2

testdata/saved8

testdata/testinput13

testdata/testinput14

testdata/testinput15

testdata/testinput16

testdata/testinput17

testdata/testinput18

testdata/testinput19

testdata/testinput20

testdata/testinput21

testdata/testinput22

testdata/testoutput11-16

testdata/testoutput11-8

testdata/testoutput13

testdata/testoutput14

testdata/testoutput15

testdata/testoutput16

testdata/testoutput17

testdata/testoutput18

testdata/testoutput19

testdata/testoutput20

testdata/testoutput21

testdata/testoutput22

files removed:
doc/html/pcre_info.html

doc/pcre_info.3

pcre_printint.src

pcre_try_flipped.c

testdata/testoutput11

files modified:
AUTHORS

CMakeLists.txt

ChangeLog

HACKING

LICENCE

Makefile.am

Makefile.in

NEWS

NON-UNIX-USE

PrepareRelease

README

RunGrepTest

RunTest

RunTest.bat

aclocal.m4

config-cmake.h.in

config.guess

config.h.generic

config.h.in

config.sub

configure

configure.ac

debian/changelog

debian/control

debian/rules

debian/source/format

dftables.c

doc/html/index.html

doc/html/pcre-config.html

doc/html/pcre.html

doc/html/pcre_compile.html

doc/html/pcre_compile2.html

doc/html/pcre_config.html

doc/html/pcre_copy_named_substring.html

doc/html/pcre_copy_substring.html

doc/html/pcre_dfa_exec.html

doc/html/pcre_exec.html

doc/html/pcre_free_substring.html

doc/html/pcre_free_substring_list.html

doc/html/pcre_fullinfo.html

doc/html/pcre_get_named_substring.html

doc/html/pcre_get_stringnumber.html

doc/html/pcre_get_stringtable_entries.html

doc/html/pcre_get_substring.html

doc/html/pcre_get_substring_list.html

doc/html/pcre_maketables.html

doc/html/pcre_refcount.html

doc/html/pcre_study.html

doc/html/pcre_version.html

doc/html/pcreapi.html

doc/html/pcrebuild.html

doc/html/pcrecallout.html

doc/html/pcrecompat.html

doc/html/pcrecpp.html

doc/html/pcregrep.html

doc/html/pcrematching.html

doc/html/pcrepartial.html

doc/html/pcrepattern.html

doc/html/pcreperform.html

doc/html/pcreposix.html

doc/html/pcreprecompile.html

doc/html/pcresample.html

doc/html/pcrestack.html

doc/html/pcresyntax.html

doc/html/pcretest.html

doc/index.html.src

doc/pcre-config.1

doc/pcre-config.txt

doc/pcre.3

doc/pcre.txt

doc/pcre_compile.3

doc/pcre_compile2.3

doc/pcre_config.3

doc/pcre_copy_named_substring.3

doc/pcre_copy_substring.3

doc/pcre_dfa_exec.3

doc/pcre_exec.3

doc/pcre_free_substring.3

doc/pcre_free_substring_list.3

doc/pcre_fullinfo.3

doc/pcre_get_named_substring.3

doc/pcre_get_stringnumber.3

doc/pcre_get_stringtable_entries.3

doc/pcre_get_substring.3

doc/pcre_get_substring_list.3

doc/pcre_maketables.3

doc/pcre_refcount.3

doc/pcre_study.3

doc/pcre_version.3

doc/pcreapi.3

doc/pcrebuild.3

doc/pcrecallout.3

doc/pcrecompat.3

doc/pcrecpp.3

doc/pcregrep.1

doc/pcregrep.txt

doc/pcrematching.3

doc/pcrepartial.3

doc/pcrepattern.3

doc/pcreperform.3

doc/pcreposix.3

doc/pcreprecompile.3

doc/pcresample.3

doc/pcrestack.3

doc/pcresyntax.3

doc/pcretest.1

doc/pcretest.txt

doc/perltest.txt

libpcre.pc.in

ltmain.sh

makevp_c.txt

makevp_l.txt

pcre-config.in

pcre.h.generic

pcre.h.in

pcre_chartables.c.dist

pcre_compile.c

pcre_config.c

pcre_dfa_exec.c

pcre_exec.c

pcre_fullinfo.c

pcre_get.c

pcre_globals.c

pcre_info.c

pcre_internal.h

pcre_maketables.c

pcre_newline.c

pcre_ord2utf8.c

pcre_refcount.c

pcre_scanner_unittest.cc

pcre_study.c

pcre_tables.c

pcre_ucd.c

pcre_valid_utf8.c

pcre_version.c

pcre_xclass.c

pcrecpp.cc

pcrecpp_unittest.cc

pcregrep.c

pcreposix.c

pcreposix.h

pcretest.c

perltest.pl

testdata/grepinput

testdata/grepoutput

testdata/testinput1

testdata/testinput10

testdata/testinput11

testdata/testinput12

testdata/testinput2

testdata/testinput4

testdata/testinput5

testdata/testinput6

testdata/testinput7

testdata/testinput8

testdata/testinput9

testdata/testoutput1

testdata/testoutput10

testdata/testoutput12

testdata/testoutput2

testdata/testoutput4

testdata/testoutput5

testdata/testoutput6

testdata/testoutput7

testdata/testoutput8

testdata/testoutput9

ucp.h


doc/pcreunicode.3

.TH PCREUNICODE 3

.SH NAME

PCRE - Perl-compatible regular expressions

.SH "UTF-8, UTF-16, AND UNICODE PROPERTY SUPPORT"

.rs

.sp

From Release 8.30, in addition to its previous UTF-8 support, PCRE also

supports UTF-16 by means of a separate 16-bit library. This can be built as

well as, or instead of, the 8-bit library.

.SH "UTF-8 SUPPORT"

.rs

.sp

In order to process UTF-8 strings, you must build PCRE's 8-bit library with UTF

support, and, in addition, you must call

.\" HREF

\fBpcre_compile()\fP

.\"

with the PCRE_UTF8 option flag, or the pattern must start with the sequence

(*UTF8). When either of these is the case, both the pattern and any subject

strings that are matched against it are treated as UTF-8 strings instead of

strings of 1-byte characters.

.SH "UTF-16 SUPPORT"

.rs

.sp

In order to process UTF-16 strings, you must build PCRE's 16-bit library with UTF

support, and, in addition, you must call

.\" HTML <a href="pcre_compile.html">

.\" </a>

\fBpcre16_compile()\fP

.\"

with the PCRE_UTF16 option flag, or the pattern must start with the sequence

(*UTF16). When either of these is the case, both the pattern and any subject

strings that are matched against it are treated as UTF-16 strings instead of

strings of 16-bit characters.

.SH "UTF SUPPORT OVERHEAD"

.rs

.sp

If you compile PCRE with UTF support, but do not use it at run time, the

library will be a bit bigger, but the additional run time overhead is limited

to testing the PCRE_UTF8/16 flag occasionally, so should not be very big.

.SH "UNICODE PROPERTY SUPPORT"

.rs

.sp

If PCRE is built with Unicode character property support (which implies UTF

support), the escape sequences \ep{..}, \eP{..}, and \eX can be used.

The available properties that can be tested are limited to the general

category properties such as Lu for an upper case letter or Nd for a decimal

number, the Unicode script names such as Arabic or Han, and the derived

properties Any and L&. A full list is given in the

.\" HREF

\fBpcrepattern\fP

.\"

documentation. Only the short names for properties are supported. For example,

\ep{L} matches a letter. Its Perl synonym, \ep{Letter}, is not supported.

Furthermore, in Perl, many properties may optionally be prefixed by "Is", for

compatibility with Perl 5.6. PCRE does not support this.

.\" HTML <a name="utf8strings"></a>

.SS "Validity of UTF-8 strings"

.rs

.sp

When you set the PCRE_UTF8 flag, the byte strings passed as patterns and

subjects are (by default) checked for validity on entry to the relevant

functions. From release 7.3 of PCRE, the check is according to the rules of RFC

3629, which are themselves derived from the Unicode specification. Earlier

releases of PCRE followed the rules of RFC 2279, which allows the full range of

31-bit values (0 to 0x7FFFFFFF). The current check allows only values in the

range U+0 to U+10FFFF, excluding U+D800 to U+DFFF.

The excluded code points are the "Surrogate Area" of Unicode. They are reserved

for use by UTF-16, where they are used in pairs to encode codepoints with

values greater than 0xFFFF. The code points that are encoded by UTF-16 pairs

are available independently in the UTF-8 encoding. (In other words, the whole

surrogate thing is a fudge for UTF-16 which unfortunately messes up UTF-8.)
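The range rules above can be illustrated with a small standalone checker. This is only a sketch written for this page, not PCRE's own validity code: the function name is_valid_utf8 is hypothetical, and the rejection of overlong encodings is taken from RFC 3629 rather than from the text above. It accepts values U+0 to U+10FFFF, excluding U+D800 to U+DFFF, and rejects the 5- and 6-byte forms that only RFC 2279 allowed.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the PCRE API.
 * Returns 1 if buf[0..len) is well-formed UTF-8 under RFC 3629, else 0. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        unsigned long cp;   /* decoded code point */
        unsigned long min;  /* smallest value allowed for this length */
        size_t extra;       /* number of continuation bytes expected */
        size_t k;

        if (c < 0x80) {                 /* 1-byte (ASCII) character */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            cp = c & 0x1F; extra = 1; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            cp = c & 0x0F; extra = 2; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            cp = c & 0x07; extra = 3; min = 0x10000;
        } else {
            return 0;   /* stray continuation byte, or a 5/6-byte lead */
        }

        if (len - i <= extra)
            return 0;   /* string ends in the middle of a character */

        for (k = 1; k <= extra; k++) {
            unsigned char t = buf[i + k];
            if ((t & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */
            cp = (cp << 6) | (t & 0x3F);
        }

        if (cp < min)
            return 0;   /* overlong encoding */
        if (cp > 0x10FFFF)
            return 0;   /* beyond the Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;   /* surrogate area, reserved for UTF-16 */

        i += extra + 1;
    }
    return 1;
}
```

A caller that wants to set PCRE_NO_UTF8_CHECK could run a check of this kind once on its input before handing the string to the matching functions.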

If an invalid UTF-8 string is passed to PCRE, an error return is given. At

compile time, the only additional information is the offset to the first byte

of the failing character. The runtime functions \fBpcre_exec()\fP and

\fBpcre_dfa_exec()\fP also pass back this information, as well as a more

detailed reason code if the caller has provided memory in which to do this.

In some situations, you may already know that your strings are valid, and

therefore want to skip these checks in order to improve performance. If you set

the PCRE_NO_UTF8_CHECK flag at compile time or at run time, PCRE assumes that

the pattern or subject it is given (respectively) contains only valid UTF-8

codes. In this case, it does not diagnose an invalid UTF-8 string.

If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, what

happens depends on why the string is invalid. If the string conforms to the

"old" definition of UTF-8 (RFC 2279), it is processed as a string of characters

in the range 0 to 0x7FFFFFFF by \fBpcre_dfa_exec()\fP and the interpreted
version of \fBpcre_exec()\fP. In other words, apart from the initial validity
test, these functions (when in UTF-8 mode) handle strings according to the more
liberal rules of RFC 2279. However, the just-in-time (JIT) optimization for
\fBpcre_exec()\fP supports only RFC 3629. If you are using JIT optimization, or
if the string does not even conform to RFC 2279, the result is undefined. Your
program may crash.

If you want to process strings of values in the full range 0 to 0x7FFFFFFF,
encoded in a UTF-8-like manner as per the old RFC, you can set
PCRE_NO_UTF8_CHECK to bypass the more restrictive test. However, in this
situation, you will have to apply your own validity check, and avoid the use of
JIT optimization.

.\" HTML <a name="utf16strings"></a>

.SS "Validity of UTF-16 strings"
.rs
.sp
When you set the PCRE_UTF16 flag, the strings of 16-bit data units that are
passed as patterns and subjects are (by default) checked for validity on entry
to the relevant functions. Values other than those in the surrogate range
U+D800 to U+DFFF are independent code points. Values in the surrogate range
must be used in pairs in the correct manner.

If an invalid UTF-16 string is passed to PCRE, an error return is given. At
compile time, the only additional information is the offset to the first data
unit of the failing character. The runtime functions \fBpcre16_exec()\fP and
\fBpcre16_dfa_exec()\fP also pass back this information, as well as a more
detailed reason code if the caller has provided memory in which to do this.

In some situations, you may already know that your strings are valid, and
therefore want to skip these checks in order to improve performance. If you set
the PCRE_NO_UTF16_CHECK flag at compile time or at run time, PCRE assumes that
the pattern or subject it is given (respectively) contains only valid UTF-16
sequences. In this case, it does not diagnose an invalid UTF-16 string.
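
The pairing rule for surrogates can be sketched in a few lines of C. This is an illustration written for this page, not PCRE's own check: the function name is_valid_utf16 is hypothetical, and it assumes that "the correct manner" means a value in U+D800 to U+DBFF immediately followed by a value in U+DC00 to U+DFFF, as in the UTF-16 encoding form.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the PCRE API.
 * Returns 1 if every surrogate in units[0..len) belongs to a correctly
 * ordered high/low pair, else 0. */
int is_valid_utf16(const uint16_t *units, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned int u = units[i];
        unsigned int next;

        if (u < 0xD800 || u > 0xDFFF) {
            i++;                /* independent code point */
            continue;
        }
        if (u >= 0xDC00)
            return 0;           /* low surrogate with no high one before it */
        if (i + 1 >= len)
            return 0;           /* high surrogate at the end of the string */

        next = units[i + 1];
        if (next < 0xDC00 || next > 0xDFFF)
            return 0;           /* high surrogate not followed by a low one */

        i += 2;                 /* a complete surrogate pair */
    }
    return 1;
}
```

A check of this kind is what a caller would need to have done already before setting PCRE_NO_UTF16_CHECK.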

.SS "General comments about UTF modes"
.rs
.sp
1. Codepoints less than 256 can be specified by either braced or unbraced
hexadecimal escape sequences (for example, \ex{b3} or \exb3). Larger values
have to use braced sequences.

2. Octal numbers up to \e777 are recognized, and in UTF-8 mode, they match
two-byte characters for values greater than \e177.

3. Repeat quantifiers apply to complete UTF characters, not to individual
data units, for example: \ex{100}{3}.

4. The dot metacharacter matches one UTF character instead of a single data
unit.

5. The escape sequence \eC can be used to match a single byte in UTF-8 mode, or
a single 16-bit data unit in UTF-16 mode, but its use can lead to some strange
effects because it breaks up multi-unit characters (see the description of \eC
in the
.\" HREF
\fBpcrepattern\fP
.\"
documentation). The use of \eC is not supported in the alternative matching
function \fBpcre[16]_dfa_exec()\fP, nor is it supported in UTF mode by the JIT
optimization of \fBpcre[16]_exec()\fP. If JIT optimization is requested for a
UTF pattern that contains \eC, it will not succeed, and so the matching will
be carried out by the normal interpretive function.

6. The character escapes \eb, \eB, \ed, \eD, \es, \eS, \ew, and \eW correctly
test characters of any code value, but, by default, the characters that PCRE
recognizes as digits, spaces, or word characters remain the same set as in
non-UTF mode, all with values less than 256. This remains true even when PCRE
is built to include Unicode property support, because to do otherwise would
slow down PCRE in many common cases. Note in particular that this applies to
\eb and \eB, because they are defined in terms of \ew and \eW. If you really
want to test for a wider sense of, say, "digit", you can use explicit Unicode
property tests such as \ep{Nd}. Alternatively, if you set the PCRE_UCP option,
the way that the character escapes work is changed so that Unicode properties
are used to determine which characters match. There are more details in the
section on
.\" HTML <a href="pcrepattern.html#genericchartypes">
.\" </a>
generic character types
.\"
in the
.\" HREF
\fBpcrepattern\fP
.\"
documentation.

7. Similarly, characters that match the POSIX named character classes are all
low-valued characters, unless the PCRE_UCP option is set.

8. However, the horizontal and vertical whitespace matching escapes (\eh, \eH,
\ev, and \eV) do match all the appropriate Unicode characters, whether or not
PCRE_UCP is set.

9. Case-insensitive matching applies only to characters whose values are less
than 128, unless PCRE is built with Unicode property support. Even when Unicode
property support is available, PCRE still uses its own character tables when
checking the case of low-valued characters, so as not to degrade performance.
The Unicode property information is used only for characters with higher
values. Furthermore, PCRE supports case-insensitive matching only when there is
a one-to-one mapping between a letter's cases. There are a small number of
many-to-one mappings in Unicode; these are not supported by PCRE.

.SH AUTHOR

.rs
.sp
.nf
Philip Hazel
University Computing Service
Cambridge CB2 3QH, England.
.fi

.SH REVISION
.rs
.sp
.nf
Last updated: 13 January 2012
.fi
