1. The table for translating pcre_compile() error codes into POSIX error codes
was out-of-date, and there was no check on the pcre_compile() error code
being within the table. This could lead to an OK return being given in
2. Changed the call to open a subject file in pcregrep from fopen(pathname,
"r") to fopen(pathname, "rb"), which fixed a problem with some of the tests
in a Windows environment.
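To illustrate why the mode string matters, here is a minimal sketch of opening a subject file in binary mode. It is not pcregrep's actual code, and the helper name count_bytes is hypothetical. On Windows, text mode translates CR LF pairs to LF and treats Ctrl-Z as end-of-file, so "rb" is what keeps the byte stream intact; on POSIX systems the "b" has no effect.

```c
#include <stdio.h>

/* Hypothetical helper, not taken from pcregrep: count the bytes in a file,
   opening it in binary mode so that no newline translation takes place. */
long count_bytes(const char *pathname)
{
  FILE *in = fopen(pathname, "rb");   /* "rb", not "r" */
  long n = 0;
  if (in == NULL) return -1;          /* the file could not be opened */
  while (fgetc(in) != EOF) n++;
  fclose(in);
  return n;
}
```

With "rb", a file containing "a\r\nb\r\n" is reported as 6 bytes on every platform.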
3. The pcregrep --count option prints the count for each file even when it is
zero, as does GNU grep. However, pcregrep was also printing all files when
--files-with-matches was added. Now, when both options are given, it prints
counts only for those files that have at least one match. (GNU grep just
prints the file name in this circumstance, but including the count seems
more useful - otherwise, why use --count?) Also ensured that the
combination -clh just lists non-zero counts, with no names.
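The rule for combining these options can be sketched as follows. This only illustrates the behaviour described in item 3 and is not pcregrep's source; the function name, its parameters, and the simplified "name:count" output format are assumptions.

```c
#include <stdio.h>

/* Hypothetical sketch of the item 3 rule. Writes the --count line for one
   file into buf and returns 1, or returns 0 if nothing should be printed.
   files_with_matches corresponds to -l, no_filename to -h. */
int format_count_line(char *buf, size_t size, const char *name, int count,
                      int files_with_matches, int no_filename)
{
  /* With -l, files that have no match are omitted entirely. */
  if (files_with_matches && count == 0) return 0;

  if (no_filename)
    snprintf(buf, size, "%d", count);            /* -clh: count only */
  else
    snprintf(buf, size, "%s:%d", name, count);   /* -c or -cl: name and count */
  return 1;
}
```

Without -l a zero count is still printed, which matches the first sentence of the item.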
4. The long form of the pcregrep -F option was incorrectly implemented as
--fixed_strings instead of --fixed-strings. This is an incompatible change,
but it seems right to fix it, and I didn't think it was worth preserving
5. The command line items --regex=pattern and --regexp=pattern were not
recognized by pcregrep, which required --regex pattern or --regexp pattern
(with a space rather than an '='). The man page documented the '=' forms,
which are compatible with GNU grep; these now work.
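Accepting both forms of a long option can be sketched like this. The helper long_option_value is hypothetical and is not how pcregrep parses its command line; it only shows the two spellings being treated alike.

```c
#include <string.h>

/* Hypothetical helper: if arg is the long option "--<name>" or
   "--<name>=value", return a pointer to its value, otherwise NULL.
   For the space-separated form the value is next (the following
   command line item), which may itself be NULL. */
const char *long_option_value(const char *arg, const char *name,
                              const char *next)
{
  size_t len = strlen(name);
  if (strncmp(arg, "--", 2) != 0) return NULL;
  if (strncmp(arg + 2, name, len) != 0) return NULL;
  if (arg[2 + len] == '=') return arg + 2 + len + 1;   /* --name=value */
  if (arg[2 + len] == '\0') return next;               /* --name value */
  return NULL;   /* a longer option, e.g. --regexp when name is "regex" */
}
```

A caller would try the names "regex" and "regexp" in turn, and skip the next command line item whenever the space-separated form was used.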
6. No libpcreposix.pc file was created for pkg-config; there was just
libpcre.pc and libpcrecpp.pc. The omission has been rectified.
7. Added #ifndef SUPPORT_UCP into the pcre_ucd.c module, to reduce its size
when UCP support is not needed, by modifying the Python script that
generates it from Unicode data files. This should not matter if the module
is correctly used as a library, but I received one complaint about 50K of
unwanted data. My guess is that the person linked everything into his
program rather than using a library. Anyway, it does no harm.
8. A pattern such as /\x{123}{2,2}+/8 was incorrectly compiled; the trigger
was a minimum greater than 1 for a wide character in a possessive
repetition. The same bug could also affect patterns like /(\x{ff}{0,2})*/8
which had an unlimited repeat of a nested, fixed maximum repeat of a wide
character. Chaos in the form of incorrect output or a compiling loop could
9. The restrictions on what a pattern can contain when partial matching is
requested for pcre_exec() have been removed. All patterns can now be
partially matched by this function. In addition, if there are at least two
slots in the offset vector, the offset of the earliest inspected character
for the match and the offset of the end of the subject are set in them when
PCRE_ERROR_PARTIAL is returned.
10. Partial matching has been split into two forms: PCRE_PARTIAL_SOFT, which is
synonymous with PCRE_PARTIAL, for backwards compatibility, and
PCRE_PARTIAL_HARD, which causes a partial match to supersede a full match,
and may be more useful for multi-segment matching.
11. Partial matching with pcre_exec() is now more intuitive. A partial match
used to be given if ever the end of the subject was reached; now it is
given only if matching could not proceed because another character was
needed. This makes a difference in some odd cases such as Z(*FAIL) with the
string "Z", which now yields "no match" instead of "partial match". In the
case of pcre_dfa_exec(), "no match" is given if every matching path for the
final character ended with (*FAIL).
12. Restarting a match using pcre_dfa_exec() after a partial match did not work
if the pattern had a "must contain" character that was already found in the
earlier partial match, unless partial matching was again requested. For
example, with the pattern /dog.(body)?/, the "must contain" character is
"g". If the first part-match was for the string "dog", restarting with
"sbody" failed. This bug has been fixed.
13. The string returned by pcre_dfa_exec() after a partial match has been
changed so that it starts at the first inspected character rather than the
first character of the match. This makes a difference only if the pattern
starts with a lookbehind assertion or \b or \B (\K is not supported by
pcre_dfa_exec()). It's an incompatible change, but it makes the two
matching functions compatible, and I think it's the right thing to do.
14. Added a pcredemo man page, created automatically from the pcredemo.c file,
so that the demonstration program is easily available in environments where
PCRE has not been installed from source.
15. Arranged to add -DPCRE_STATIC to cflags in libpcre.pc, libpcreposix.pc,
libpcrecpp.pc and pcre-config when PCRE is not compiled as a shared
16. Added REG_UNGREEDY to the pcreposix interface, at the request of a user.
It maps to PCRE_UNGREEDY. It is not, of course, POSIX-compatible, but it
is not the first non-POSIX option to be added. Clearly some people find
17. If a caller to the POSIX matching function regexec() passes a non-zero
value for nmatch with a NULL value for pmatch, the value of
nmatch is forced to zero.
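The guard in item 17 amounts to a one-line check. The sketch below applies the same idea in a hypothetical wrapper, guarded_regexec, around the system's POSIX regexec() from <regex.h>; that header is an assumption made to keep the example self-contained, whereas the change described above is inside the pcreposix version of regexec() itself.

```c
#include <stddef.h>
#include <regex.h>

/* Hypothetical wrapper showing the same guard around the standard POSIX
   regexec(): a NULL pmatch means there is nowhere to store offsets, so
   nmatch is treated as zero. */
int guarded_regexec(const regex_t *preg, const char *string, size_t nmatch,
                    regmatch_t pmatch[], int eflags)
{
  if (pmatch == NULL) nmatch = 0;
  return regexec(preg, string, nmatch, pmatch, eflags);
}
```

A caller that only wants a yes/no answer can then pass NULL for pmatch without worrying about the value of nmatch.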
18. RunGrepTest did not have a test for the availability of the -u option of
the diff command, as RunTest does. It now checks in the same way as
RunTest, and also checks for the -b option.
19. If an odd number of negated classes containing just a single character
interposed, within parentheses, between a forward reference to a named
subpattern and the definition of the subpattern, compilation crashed with
an internal error, complaining that it could not find the referenced
subpattern. An example of a crashing pattern is /(?&A)(([^m])(?<A>))/.
[The bug was that it was starting one character too far in when skipping
over the character class, thus treating the ] as data rather than
terminating the class. This meant it could skip too much.]
20. Added PCRE_NOTEMPTY_ATSTART in order to be able to correctly implement the
/g option in pcretest when the pattern contains \K, which makes it possible
to have an empty string match not at the start, even when the pattern is
anchored. Updated pcretest and pcredemo to use this option.
21. If the maximum number of capturing subpatterns in a recursion was greater
than the maximum at the outer level, the higher number was returned, but
with unset values at the outer level. The correct (outer level) value is
22. If (*ACCEPT) appeared inside capturing parentheses, previous releases of
PCRE did not set those parentheses (unlike Perl). I have now found a way to
make it do so. The string so far is captured, making this feature
compatible with Perl.
23. The tests have been re-organized, adding tests 11 and 12, to make it
possible to check the Perl 5.10 features against Perl 5.10.
24. Perl 5.10 allows subroutine calls in lookbehinds, as long as the subroutine
pattern matches a fixed length string. PCRE did not allow this; now it
does. Neither allows recursion.
25. I finally figured out how to implement a request to provide the minimum
length of subject string that was needed in order to match a given pattern.
(It was back references and recursion that I had previously got hung up
on.) This code has now been added to pcre_study(); it finds a lower bound
to the length of subject needed. It is not necessarily the greatest lower
bound, but using it to avoid searching strings that are too short does give
some useful speed-ups. The value is available to calling programs via
26. While implementing 25, I discovered to my embarrassment that pcretest had
not been passing the result of pcre_study() to pcre_dfa_exec(), so the
study optimizations had never been tested with that matching function.
Oops. What is worse, even when it was passed study data, there was a bug in
pcre_dfa_exec() that meant it never actually used it. Double oops. There
were also very few tests of studied patterns with pcre_dfa_exec().
27. If (?| is used to create subpatterns with duplicate numbers, they are now
allowed to have the same name, even if PCRE_DUPNAMES is not set. However,
on the other side of the coin, they are no longer allowed to have different
names, because these cannot be distinguished in PCRE, and this has caused
confusion. (This is a difference from Perl.)
28. When duplicate subpattern names are present (necessarily with different
numbers, as required by 27 above), and a test is made by name in a
conditional pattern, either for a subpattern having been matched, or for
recursion in such a pattern, all the associated numbered subpatterns are
tested, and the overall condition is true if the condition is true for any
one of them. This is the way Perl works, and is also more like the way
testing by number works.
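The "true for any one of them" rule in item 28 can be pictured with a small sketch. The real test happens inside the matching engine; the function below is hypothetical and only borrows the offset-vector layout that pcre_exec() uses for its results, where group n starts at ovector[2*n] and an unset group has -1.

```c
/* Hypothetical sketch: return 1 if any of the capturing groups that share
   a name has been set. numbers[] holds the group numbers for that name;
   ovector holds offset pairs, with -1 marking an unset group. */
int any_group_set(const int *ovector, const int *numbers, int count)
{
  int i;
  for (i = 0; i < count; i++)
    if (ovector[2 * numbers[i]] >= 0) return 1;
  return 0;
}
```

If a name maps to groups 1 and 2 and only group 2 has matched, the condition is true, just as a test by number on group 2 would be.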
Version 7.9 11-Apr-09
---------------------