This is Info file regex.info, produced by Makeinfo-1.52 from the input
file .././doc/regex.texi.

This file documents the GNU regular expression library.

Copyright (C) 1992, 1993 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that the
section entitled "GNU General Public License" is included exactly as in
the original, and provided that the entire resulting derived work is
distributed under the terms of a permission notice identical to this
one.

Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that the section entitled "GNU General Public License"
may be included in a translation approved by the Free Software
Foundation instead of in the original English.

File: regex.info, Node: Top, Next: Overview, Prev: (dir), Up: (dir)

Regular Expression Library
**************************

This manual documents how to program with the GNU regular expression
library. This is edition 0.12a of the manual, 19 September 1992.

The first part of this master menu lists the major nodes in this Info
document, including the index. The rest of the menu lists all the
lower level nodes in the document.

* Menu:

* Overview::
* Regular Expression Syntax::
* Common Operators::
* GNU Operators::
* GNU Emacs Operators::
* What Gets Matched?::
* Programming with Regex::
* Copying::                     Copying and sharing Regex.
* Index::                       General index.

 -- The Detailed Node Listing --

Regular Expression Syntax

* Syntax Bits::
* Predefined Syntaxes::
* Collating Elements vs. Characters::
* The Backslash Character::

Common Operators

* Match-self Operator::                 Ordinary characters.
* Match-any-character Operator::        .
* Concatenation Operator::              Juxtaposition.
* Repetition Operators::                * + ? {}
* Alternation Operator::                |
* List Operators::                      [...] [^...]
* Grouping Operators::                  (...)
* Back-reference Operator::             \digit
* Anchoring Operators::                 ^ $

Repetition Operators

* Match-zero-or-more Operator::         *
* Match-one-or-more Operator::          +
* Match-zero-or-one Operator::          ?
* Interval Operators::                  {}

List Operators (`[' ... `]' and `[^' ... `]')

* Character Class Operators::           [:class:]
* Range Operator::                      start-end

Anchoring Operators

* Match-beginning-of-line Operator::    ^
* Match-end-of-line Operator::          $

GNU Operators

* Word Operators::
* Buffer Operators::

Word Operators

* Non-Emacs Syntax Tables::
* Match-word-boundary Operator::        \b
* Match-within-word Operator::          \B
* Match-beginning-of-word Operator::    \<
* Match-end-of-word Operator::          \>
* Match-word-constituent Operator::     \w
* Match-non-word-constituent Operator:: \W

Buffer Operators

* Match-beginning-of-buffer Operator::  \`
* Match-end-of-buffer Operator::        \'

GNU Emacs Operators

* Syntactic Class Operators::

Syntactic Class Operators

* Emacs Syntax Tables::
* Match-syntactic-class Operator::      \sCLASS
* Match-not-syntactic-class Operator::  \SCLASS

Programming with Regex

* GNU Regex Functions::
* POSIX Regex Functions::
* BSD Regex Functions::

GNU Regex Functions

* GNU Pattern Buffers::                 The re_pattern_buffer type.
* GNU Regular Expression Compiling::    re_compile_pattern ()
* GNU Matching::                        re_match ()
* GNU Searching::                       re_search ()
* Matching/Searching with Split Data::  re_match_2 (), re_search_2 ()
* Searching with Fastmaps::             re_compile_fastmap ()
* GNU Translate Tables::                The `translate' field.
* Using Registers::                     The re_registers type and related fns.
* Freeing GNU Pattern Buffers::         regfree ()

POSIX Regex Functions

* POSIX Pattern Buffers::               The regex_t type.
* POSIX Regular Expression Compiling::  regcomp ()
* POSIX Matching::                      regexec ()
* Reporting Errors::                    regerror ()
* Using Byte Offsets::                  The regmatch_t type.
* Freeing POSIX Pattern Buffers::       regfree ()

BSD Regex Functions

* BSD Regular Expression Compiling::    re_comp ()
* BSD Searching::                       re_exec ()

File: regex.info, Node: Overview, Next: Regular Expression Syntax, Prev: Top, Up: Top

Overview
********

A "regular expression" (or "regexp", or "pattern") is a text string
that describes some (mathematical) set of strings. A regexp R
"matches" a string S if S is in the set of strings described by R.

Using the Regex library, you can:

   * see if a string matches a specified pattern as a whole, and

   * search within a string for a substring matching a specified
     pattern.

Some regular expressions match only one string, i.e., the set they
describe has only one member. For example, the regular expression
`foo' matches the string `foo' and no others. Other regular
expressions match more than one string, i.e., the set they describe has
more than one member. For example, the regular expression `f*' matches
the set of strings made up of any number (including zero) of `f's. As
you can see, some characters in regular expressions match themselves
(such as `f') and some don't (such as `*'); the ones that don't match
themselves instead let you specify patterns that describe many
different strings.

To either match or search for a regular expression with the Regex
library functions, you must first compile it with a Regex pattern
compiling function. A "compiled pattern" is a regular expression
converted to the internal format used by the library functions. Once
you've compiled a pattern, you can use it for matching or searching any
number of times.

The Regex library consists of two source files: `regex.h' and
`regex.c'. Regex provides three groups of functions with which you can
operate on regular expressions. One group--the GNU group--is more
powerful but not completely compatible with the other two, namely the
POSIX and Berkeley UNIX groups; its interface was designed specifically
for GNU. The other groups have the same interfaces as do the regular
expression functions in POSIX and Berkeley UNIX.

We wrote this chapter with programmers in mind, not users of
programs--such as Emacs--that use Regex. We describe the Regex library
in its entirety, not how to write regular expressions that a particular
program understands.
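
As a minimal sketch of the compile-then-use sequence described in this
node, the following C fragment uses the POSIX group of functions
(`regcomp', `regexec' and `regfree'). The helper names
`count_containing' and `matches_whole' are hypothetical and not part of
Regex; the whole-string test assumes that `regexec' reports the
leftmost, longest match, as POSIX specifies.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compile PATTERN once, then search each of the N
   STRINGS with the same compiled pattern.  Returns how many strings
   contain a match, or -1 if PATTERN does not compile.  */
static int
count_containing (const char *pattern, const char *const *strings, size_t n)
{
  regex_t compiled;
  size_t i;
  int count = 0;

  assert (pattern != NULL && strings != NULL);
  if (regcomp (&compiled, pattern, REG_NOSUB) != 0)
    return -1;
  for (i = 0; i < n; i++)
    if (regexec (&compiled, strings[i], 0, NULL, 0) == 0)
      count++;
  regfree (&compiled);
  return count;
}

/* Hypothetical helper: 1 if PATTERN matches S as a whole, 0 if it does
   not, -1 if PATTERN does not compile.  Relies on regexec reporting
   the leftmost, longest match.  */
static int
matches_whole (const char *pattern, const char *s)
{
  regex_t compiled;
  regmatch_t m[1];
  int whole = 0;

  assert (pattern != NULL && s != NULL);
  if (regcomp (&compiled, pattern, 0) != 0)
    return -1;
  if (regexec (&compiled, s, 1, m, 0) == 0)
    whole = m[0].rm_so == 0 && (size_t) m[0].rm_eo == strlen (s);
  regfree (&compiled);
  return whole;
}
```

For instance, `matches_whole ("f*", "fff")' yields 1, whereas
`matches_whole ("f*", "ffg")' yields 0 because the match stops before
the `g'.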

File: regex.info, Node: Regular Expression Syntax, Next: Common Operators, Prev: Overview, Up: Top

Regular Expression Syntax
*************************

"Characters" are things you can type. "Operators" are things in a
regular expression that match one or more characters. You compose
regular expressions from operators, which in turn you specify using one
or more characters.

Most characters represent what we call the match-self operator, i.e.,
they match themselves; we call these characters "ordinary". Other
characters represent either all or parts of fancier operators; e.g.,
`.' represents what we call the match-any-character operator (which, no
surprise, matches (almost) any character); we call these characters
"special". Two different things determine what characters represent
what operators:

  1. the regular expression syntax your program has told the Regex
     library to recognize, and

  2. the context of the character in the regular expression.

In the following sections, we describe these things in more detail.

* Menu:

* Syntax Bits::
* Predefined Syntaxes::
* Collating Elements vs. Characters::
* The Backslash Character::

File: regex.info, Node: Syntax Bits, Next: Predefined Syntaxes, Up: Regular Expression Syntax

Syntax Bits
===========

In any particular syntax for regular expressions, some characters are
always special, others are sometimes special, and others are never
special. The particular syntax that Regex recognizes for a given
regular expression depends on the value in the `syntax' field of the
pattern buffer of that regular expression.

You get a pattern buffer by compiling a regular expression. *Note
GNU Pattern Buffers::, and *Note POSIX Pattern Buffers::, for more
information on pattern buffers. *Note GNU Regular Expression
Compiling::, *Note POSIX Regular Expression Compiling::, and *Note BSD
Regular Expression Compiling::, for more information on compiling.

Regex considers the value of the `syntax' field to be a collection of
bits; we refer to these bits as "syntax bits". In most cases, they
affect what characters represent what operators. We describe the
meanings of the operators to which we refer in *Note Common Operators::,
*Note GNU Operators::, and *Note GNU Emacs Operators::.

For reference, here is the complete list of syntax bits, in
alphabetical order:

`RE_BACKSLASH_ESCAPE_IN_LISTS'
     If this bit is set, then `\' inside a list (*note List Operators::.)
     quotes (makes ordinary, if it's special) the following character;
     if this bit isn't set, then `\' is an ordinary character inside
     lists. (*Note The Backslash Character::, for what `\' does
     outside of lists.)

`RE_BK_PLUS_QM'
     If this bit is set, then `\+' represents the match-one-or-more
     operator and `\?' represents the match-zero-or-one operator; if
     this bit isn't set, then `+' represents the match-one-or-more
     operator and `?' represents the match-zero-or-one operator. This
     bit is irrelevant if `RE_LIMITED_OPS' is set.

`RE_CHAR_CLASSES'
     If this bit is set, then you can use character classes in lists;
     if this bit isn't set, then you can't.

`RE_CONTEXT_INDEP_ANCHORS'
     If this bit is set, then `^' and `$' are special anywhere outside
     a list; if this bit isn't set, then these characters are special
     only in certain contexts. *Note Match-beginning-of-line
     Operator::, and *Note Match-end-of-line Operator::.

`RE_CONTEXT_INDEP_OPS'
     If this bit is set, then certain characters are special anywhere
     outside a list; if this bit isn't set, then those characters are
     special only in some contexts and are ordinary elsewhere.
     Specifically, if this bit isn't set then `*', and (if the syntax
     bit `RE_LIMITED_OPS' isn't set) `+' and `?' (or `\+' and `\?',
     depending on the syntax bit `RE_BK_PLUS_QM') represent repetition
     operators only if they're not first in a regular expression or
     just after an open-group or alternation operator. The same holds
     for `{' (or `\{', depending on the syntax bit `RE_NO_BK_BRACES') if
     it is the beginning of a valid interval and the syntax bit
     `RE_INTERVALS' is set.

`RE_CONTEXT_INVALID_OPS'
     If this bit is set, then repetition and alternation operators
     can't be in certain positions within a regular expression.
     Specifically, the regular expression is invalid if it has:

        * a repetition operator first in the regular expression or just
          after a match-beginning-of-line, open-group, or alternation
          operator; or

        * an alternation operator first or last in the regular
          expression, just before a match-end-of-line operator, or just
          after an alternation or open-group operator.

     If this bit isn't set, then you can put the characters
     representing the repetition and alternation operators anywhere in
     a regular expression. Whether or not they will in fact be
     operators in certain positions depends on other syntax bits.

`RE_DOT_NEWLINE'
     If this bit is set, then the match-any-character operator matches
     a newline; if this bit isn't set, then it doesn't.

`RE_DOT_NOT_NULL'
     If this bit is set, then the match-any-character operator doesn't
     match a null character; if this bit isn't set, then it does.

`RE_INTERVALS'
     If this bit is set, then Regex recognizes interval operators; if
     this bit isn't set, then it doesn't.

`RE_LIMITED_OPS'
     If this bit is set, then Regex doesn't recognize the
     match-one-or-more, match-zero-or-one or alternation operators; if
     this bit isn't set, then it does.

`RE_NEWLINE_ALT'
     If this bit is set, then newline represents the alternation
     operator; if this bit isn't set, then newline is ordinary.

`RE_NO_BK_BRACES'
     If this bit is set, then `{' represents the open-interval operator
     and `}' represents the close-interval operator; if this bit isn't
     set, then `\{' represents the open-interval operator and `\}'
     represents the close-interval operator. This bit is relevant only
     if `RE_INTERVALS' is set.

`RE_NO_BK_PARENS'
     If this bit is set, then `(' represents the open-group operator and
     `)' represents the close-group operator; if this bit isn't set,
     then `\(' represents the open-group operator and `\)' represents
     the close-group operator.

`RE_NO_BK_REFS'
     If this bit is set, then Regex doesn't recognize `\'DIGIT as the
     back reference operator; if this bit isn't set, then it does.

`RE_NO_BK_VBAR'
     If this bit is set, then `|' represents the alternation operator;
     if this bit isn't set, then `\|' represents the alternation
     operator. This bit is irrelevant if `RE_LIMITED_OPS' is set.

`RE_NO_EMPTY_RANGES'
     If this bit is set, then a regular expression with a range whose
     ending point collates lower than its starting point is invalid; if
     this bit isn't set, then Regex considers such a range to be empty.

`RE_UNMATCHED_RIGHT_PAREN_ORD'
     If this bit is set and the regular expression has no matching
     open-group operator, then Regex considers what would otherwise be
     a close-group operator (based on how `RE_NO_BK_PARENS' is set) to
     match `)'.

File: regex.info, Node: Predefined Syntaxes, Next: Collating Elements vs. Characters, Prev: Syntax Bits, Up: Regular Expression Syntax

Predefined Syntaxes
===================

If you're programming with Regex, you can set a pattern buffer's
(*note GNU Pattern Buffers::., and *Note POSIX Pattern Buffers::)
`syntax' field either to an arbitrary combination of syntax bits (*note
Syntax Bits::.) or else to the configurations defined by Regex. These
configurations define the syntaxes used by certain programs--GNU Emacs,
POSIX Awk, traditional Awk, Grep, Egrep--in addition to syntaxes for
POSIX basic and extended regular expressions.

The predefined syntaxes--taken directly from `regex.h'--are:

     #define RE_SYNTAX_EMACS 0

     #define RE_SYNTAX_AWK \
       (RE_BACKSLASH_ESCAPE_IN_LISTS | RE_DOT_NOT_NULL \
        | RE_NO_BK_PARENS | RE_NO_BK_REFS \
        | RE_NO_BK_VBAR | RE_NO_EMPTY_RANGES \
        | RE_UNMATCHED_RIGHT_PAREN_ORD)

     #define RE_SYNTAX_POSIX_AWK \
       (RE_SYNTAX_POSIX_EXTENDED | RE_BACKSLASH_ESCAPE_IN_LISTS)

     #define RE_SYNTAX_GREP \
       (RE_BK_PLUS_QM | RE_CHAR_CLASSES \
        | RE_HAT_LISTS_NOT_NEWLINE | RE_INTERVALS \
        | RE_NEWLINE_ALT)

     #define RE_SYNTAX_EGREP \
       (RE_CHAR_CLASSES | RE_CONTEXT_INDEP_ANCHORS \
        | RE_CONTEXT_INDEP_OPS | RE_HAT_LISTS_NOT_NEWLINE \
        | RE_NEWLINE_ALT | RE_NO_BK_PARENS \
        | RE_NO_BK_VBAR)

     #define RE_SYNTAX_POSIX_EGREP \
       (RE_SYNTAX_EGREP | RE_INTERVALS | RE_NO_BK_BRACES)

     /* P1003.2/D11.2, section 4.20.7.1, lines 5078ff. */
     #define RE_SYNTAX_ED RE_SYNTAX_POSIX_BASIC

     #define RE_SYNTAX_SED RE_SYNTAX_POSIX_BASIC

     /* Syntax bits common to both basic and extended POSIX regex syntax. */
     #define _RE_SYNTAX_POSIX_COMMON \
       (RE_CHAR_CLASSES | RE_DOT_NEWLINE | RE_DOT_NOT_NULL \
        | RE_INTERVALS | RE_NO_EMPTY_RANGES)

     #define RE_SYNTAX_POSIX_BASIC \
       (_RE_SYNTAX_POSIX_COMMON | RE_BK_PLUS_QM)

     /* Differs from ..._POSIX_BASIC only in that RE_BK_PLUS_QM becomes
        RE_LIMITED_OPS, i.e., \? \+ \| are not recognized. Actually, this
        isn't minimal, since other operators, such as \`, aren't disabled. */
     #define RE_SYNTAX_POSIX_MINIMAL_BASIC \
       (_RE_SYNTAX_POSIX_COMMON | RE_LIMITED_OPS)

     #define RE_SYNTAX_POSIX_EXTENDED \
       (_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS \
        | RE_CONTEXT_INDEP_OPS | RE_NO_BK_BRACES \
        | RE_NO_BK_PARENS | RE_NO_BK_VBAR \
        | RE_UNMATCHED_RIGHT_PAREN_ORD)

     /* Differs from ..._POSIX_EXTENDED in that RE_CONTEXT_INVALID_OPS
        replaces RE_CONTEXT_INDEP_OPS and RE_NO_BK_REFS is added. */
     #define RE_SYNTAX_POSIX_MINIMAL_EXTENDED \
       (_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS \
        | RE_CONTEXT_INVALID_OPS | RE_NO_BK_BRACES \
        | RE_NO_BK_PARENS | RE_NO_BK_REFS \
        | RE_NO_BK_VBAR | RE_UNMATCHED_RIGHT_PAREN_ORD)
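
The practical effect of two of these bits, `RE_NO_BK_PARENS' and
`RE_NO_BK_VBAR', can be observed through the POSIX interface. The
sketch below assumes that `regcomp' selects the POSIX extended syntax
when given `REG_EXTENDED' and the POSIX basic syntax otherwise; the
helper `syntax_search' is hypothetical and not part of Regex.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if PATTERN, compiled with CFLAGS,
   matches somewhere in S; 0 if it does not; -1 if PATTERN does not
   compile.  */
static int
syntax_search (const char *pattern, int cflags, const char *s)
{
  regex_t compiled;
  int found;

  assert (pattern != NULL && s != NULL);
  /* REG_NOSUB: we only need a yes/no answer, not match offsets.  */
  if (regcomp (&compiled, pattern, cflags | REG_NOSUB) != 0)
    return -1;
  found = regexec (&compiled, s, 0, NULL, 0) == 0;
  regfree (&compiled);
  return found;
}
```

Under these assumptions, `syntax_search ("(ab)", REG_EXTENDED, "ab")'
finds a match because `(' and `)' are grouping operators in the
extended syntax, while `syntax_search ("(ab)", 0, "ab")' does not,
because in the basic syntax the parentheses are ordinary and the
grouping operators are written `\(' and `\)'.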

File: regex.info, Node: Collating Elements vs. Characters, Next: The Backslash Character, Prev: Predefined Syntaxes, Up: Regular Expression Syntax

Collating Elements vs. Characters
=================================

POSIX generalizes the notion of a character to that of a collating
element. It defines a "collating element" to be "a sequence of one or
more bytes defined in the current collating sequence as a unit of
collation."

This generalizes the notion of a character in two ways. First, a
single character can map into two or more collating elements. For
example, the German "es-zet" collates as the collating element `s'
followed by another collating element `s'. Second, two or more
characters can map into one collating element. For example, the
Spanish `ll' collates after `l' and before `m'.

Since POSIX's "collating element" preserves the essential idea of a
"character," we use the latter, more familiar, term in this document.

File: regex.info, Node: The Backslash Character, Prev: Collating Elements vs. Characters, Up: Regular Expression Syntax

The Backslash Character
=======================

The `\' character has one of four different meanings, depending on
the context in which you use it and what syntax bits are set (*note
Syntax Bits::.). It can: 1) stand for itself, 2) quote the next
character, 3) introduce an operator, or 4) do nothing.

  1. It stands for itself inside a list (*note List Operators::.) if
     the syntax bit `RE_BACKSLASH_ESCAPE_IN_LISTS' is not set. For
     example, `[\]' would match `\'.

  2. It quotes (makes ordinary, if it's special) the next character
     when you use it either:

        * outside a list,(1) or

        * inside a list and the syntax bit
          `RE_BACKSLASH_ESCAPE_IN_LISTS' is set.

  3. It introduces an operator when followed by certain ordinary
     characters--sometimes only when certain syntax bits are set. See
     the cases `RE_BK_PLUS_QM', `RE_NO_BK_BRACES', `RE_NO_BK_VBAR',
     `RE_NO_BK_PARENS', `RE_NO_BK_REFS' in *Note Syntax Bits::. Also:

        * `\b' represents the match-word-boundary operator (*note
          Match-word-boundary Operator::.).

        * `\B' represents the match-within-word operator (*note
          Match-within-word Operator::.).

        * `\<' represents the match-beginning-of-word operator
          (*note Match-beginning-of-word Operator::.).

        * `\>' represents the match-end-of-word operator (*note
          Match-end-of-word Operator::.).

        * `\w' represents the match-word-constituent operator (*note
          Match-word-constituent Operator::.).

        * `\W' represents the match-non-word-constituent operator
          (*note Match-non-word-constituent Operator::.).

        * `\`' represents the match-beginning-of-buffer operator and
          `\'' represents the match-end-of-buffer operator (*note
          Buffer Operators::.).

        * If Regex was compiled with the C preprocessor symbol `emacs'
          defined, then `\sCLASS' represents the match-syntactic-class
          operator and `\SCLASS' represents the
          match-not-syntactic-class operator (*note Syntactic Class
          Operators::.).

  4. In all other cases, Regex ignores `\'. For example, `\n' matches
     `n'.

---------- Footnotes ----------

(1) Sometimes you don't have to explicitly quote special characters
to make them ordinary. For instance, most characters lose any special
meaning inside a list (*note List Operators::.). In addition, if the
syntax bits `RE_CONTEXT_INVALID_OPS' and `RE_CONTEXT_INDEP_OPS' aren't
set, then (for historical reasons) the matcher considers special
characters ordinary if they are in contexts where the operations they
represent make no sense; for example, the match-zero-or-more
operator (represented by `*') matches itself in the regular expression
`*foo' because there is no preceding expression on which it can
operate. It is poor practice, however, to depend on this behavior; if
you want a special character to be ordinary outside a list, it's better
to always quote it, regardless.
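
The first two meanings of `\', and the `*foo' case from the footnote,
can be tried out with the POSIX functions. This is only a sketch: it
assumes that `regcomp' without `REG_EXTENDED' selects the POSIX basic
syntax, in which `RE_BACKSLASH_ESCAPE_IN_LISTS' and
`RE_CONTEXT_INDEP_OPS' are not set, and the helper `bre_search' is
hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if the basic regular expression
   PATTERN matches somewhere in S, 0 if it does not, -1 if PATTERN
   does not compile.  */
static int
bre_search (const char *pattern, const char *s)
{
  regex_t compiled;
  int found;

  assert (pattern != NULL && s != NULL);
  if (regcomp (&compiled, pattern, REG_NOSUB) != 0)
    return -1;
  found = regexec (&compiled, s, 0, NULL, 0) == 0;
  regfree (&compiled);
  return found;
}
```

Remember that the C compiler also treats `\' specially in string
literals, so the regular expression `a\.b' is written `"a\\.b"' in C
source. With that in mind, `bre_search ("a\\.b", "axb")' yields 0
because the quoted `.' is ordinary, and `bre_search ("*foo", "*foo")'
yields 1 because the leading `*' has nothing to operate on.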

File: regex.info, Node: Common Operators, Next: GNU Operators, Prev: Regular Expression Syntax, Up: Top

Common Operators
****************

You compose regular expressions from operators. In the following
sections, we describe the regular expression operators specified by
POSIX; GNU also uses these. Most operators have more than one
representation as characters. *Note Regular Expression Syntax::, for
what characters represent what operators under what circumstances.

For most operators that can be represented in two ways, one
representation is a single character and the other is that character
preceded by `\'. For example, either `(' or `\(' represents the
open-group operator. Which one does depends on the setting of a syntax
bit, in this case `RE_NO_BK_PARENS'. Why is this so? Historical
reasons dictate some of the varying representations, while POSIX
dictates others.

Finally, almost all characters lose any special meaning inside a list
(*note List Operators::.).

* Menu:

* Match-self Operator::                 Ordinary characters.
* Match-any-character Operator::        .
* Concatenation Operator::              Juxtaposition.
* Repetition Operators::                * + ? {}
* Alternation Operator::                |
* List Operators::                      [...] [^...]
* Grouping Operators::                  (...)
* Back-reference Operator::             \digit
* Anchoring Operators::                 ^ $

File: regex.info, Node: Match-self Operator, Next: Match-any-character Operator, Up: Common Operators

The Match-self Operator (ORDINARY CHARACTER)
============================================

This operator matches the character itself. All ordinary characters
(*note Regular Expression Syntax::.) represent this operator. For
example, `f' is always an ordinary character, so the regular expression
`f' matches only the string `f'. In particular, it does *not* match
the string `ff'.

File: regex.info, Node: Match-any-character Operator, Next: Concatenation Operator, Prev: Match-self Operator, Up: Common Operators

The Match-any-character Operator (`.')
======================================

This operator matches any single printing or nonprinting character
except it won't match a:

newline
     if the syntax bit `RE_DOT_NEWLINE' isn't set.

null
     if the syntax bit `RE_DOT_NOT_NULL' is set.

The `.' (period) character represents this operator. For example,
`a.b' matches any three-character string beginning with `a' and ending
with `b'.
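
The newline exception can be seen with the POSIX functions. The sketch
below assumes, as POSIX specifies for `regcomp', that `.' matches a
newline unless the pattern is compiled with `REG_NEWLINE'; that this
flag corresponds to leaving `RE_DOT_NEWLINE' unset is our reading, not
something this node states. The helper `dot_matches_newline' is
hypothetical. The null exception is not shown, since `regexec' takes a
null-terminated string.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile `a.b' with CFLAGS and report whether it
   matches the three-character string `a', newline, `b'.  Returns 1 on
   a match, 0 on no match, -1 if compilation fails.  */
static int
dot_matches_newline (int cflags)
{
  regex_t compiled;
  int found;

  if (regcomp (&compiled, "a.b", cflags | REG_NOSUB) != 0)
    return -1;
  found = regexec (&compiled, "a\nb", 0, NULL, 0) == 0;
  regfree (&compiled);
  assert (found == 0 || found == 1);
  return found;
}
```

Calling `dot_matches_newline (0)' yields 1, and
`dot_matches_newline (REG_NEWLINE)' yields 0.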

File: regex.info, Node: Concatenation Operator, Next: Repetition Operators, Prev: Match-any-character Operator, Up: Common Operators

The Concatenation Operator
==========================

This operator concatenates two regular expressions A and B. No
character represents this operator; you simply put B after A. The
result is a regular expression that will match a string if A matches
its first part and B matches the rest. For example, `xy' (two
match-self operators) matches `xy'.

File: regex.info, Node: Repetition Operators, Next: Alternation Operator, Prev: Concatenation Operator, Up: Common Operators

Repetition Operators
====================

Repetition operators repeat the preceding regular expression a
specified number of times.

* Menu:

* Match-zero-or-more Operator::         *
* Match-one-or-more Operator::          +
* Match-zero-or-one Operator::          ?
* Interval Operators::                  {}

File: regex.info, Node: Match-zero-or-more Operator, Next: Match-one-or-more Operator, Up: Repetition Operators

The Match-zero-or-more Operator (`*')
-------------------------------------

This operator repeats the smallest possible preceding regular
expression as many times as necessary (including zero) to match the
pattern. `*' represents this operator. For example, `o*' matches any
string made up of zero or more `o's. Since this operator operates on
the smallest preceding regular expression, `fo*' has a repeating `o',
not a repeating `fo'. So, `fo*' matches `f', `fo', `foo', and so on.

Since the match-zero-or-more operator is a suffix operator, it may be
useless as such when no regular expression precedes it. This is the
case when it:

   * is first in a regular expression, or

   * follows a match-beginning-of-line, open-group, or alternation
     operator.

Three different things can happen in these cases:

  1. If the syntax bit `RE_CONTEXT_INVALID_OPS' is set, then the
     regular expression is invalid.

  2. If `RE_CONTEXT_INVALID_OPS' isn't set, but `RE_CONTEXT_INDEP_OPS'
     is, then `*' represents the match-zero-or-more operator (which
     then operates on the empty string).

  3. Otherwise, `*' is ordinary.

The matcher processes a match-zero-or-more operator by first matching
as many repetitions of the smallest preceding regular expression as it
can. Then it continues to match the rest of the pattern.

If it can't match the rest of the pattern, it backtracks (as many
times as necessary), each time discarding one of the matches until it
can either match the entire pattern or be certain that it cannot get a
match. For example, when matching `ca*ar' against `caaar', the matcher
first matches all three `a's of the string with the `a*' of the regular
expression. However, it cannot then match the final `ar' of the
regular expression against the final `r' of the string. So it
backtracks, discarding the match of the last `a' in the string. It can
then match the remaining `ar'.
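
The outcome of the `ca*ar' example can be checked with the POSIX
functions, whatever strategy a given matcher uses internally. The
sketch below is an illustration under that assumption; the helper
`match_span' is hypothetical, and it reports the byte offsets at which
the match starts and ends.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: search S for the basic regular expression
   PATTERN.  On a match, store the offsets of its first character and
   of the character after its last in *START and *END and return 1.
   Return 0 if there is no match, -1 if PATTERN does not compile.  */
static int
match_span (const char *pattern, const char *s, int *start, int *end)
{
  regex_t compiled;
  regmatch_t m[1];
  int found;

  assert (pattern != NULL && s != NULL && start != NULL && end != NULL);
  if (regcomp (&compiled, pattern, 0) != 0)
    return -1;
  found = regexec (&compiled, s, 1, m, 0) == 0;
  if (found)
    {
      *start = (int) m[0].rm_so;
      *end = (int) m[0].rm_eo;
    }
  regfree (&compiled);
  return found;
}
```

With `ca*ar' and `caaar' the reported span is 0 to 5, i.e., the whole
string: `a*' ends up with two of the three `a's. With `fo*' and
`fofo' the span is 0 to 2, showing that only the `o' repeats.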

File: regex.info, Node: Match-one-or-more Operator, Next: Match-zero-or-one Operator, Prev: Match-zero-or-more Operator, Up: Repetition Operators

The Match-one-or-more Operator (`+' or `\+')
--------------------------------------------

If the syntax bit `RE_LIMITED_OPS' is set, then Regex doesn't
recognize this operator. Otherwise, if the syntax bit `RE_BK_PLUS_QM'
isn't set, then `+' represents this operator; if it is, then `\+' does.

This operator is similar to the match-zero-or-more operator except
that it repeats the preceding regular expression at least once; *note
Match-zero-or-more Operator::., for what it operates on, how some
syntax bits affect it, and how Regex backtracks to match it.

For example, supposing that `+' represents the match-one-or-more
operator; then `ca+r' matches, e.g., `car' and `caaaar', but not `cr'.

File: regex.info, Node: Match-zero-or-one Operator, Next: Interval Operators, Prev: Match-one-or-more Operator, Up: Repetition Operators

The Match-zero-or-one Operator (`?' or `\?')
--------------------------------------------

If the syntax bit `RE_LIMITED_OPS' is set, then Regex doesn't
recognize this operator. Otherwise, if the syntax bit `RE_BK_PLUS_QM'
isn't set, then `?' represents this operator; if it is, then `\?' does.

This operator is similar to the match-zero-or-more operator except
that it repeats the preceding regular expression once or not at all;
*note Match-zero-or-more Operator::., to see what it operates on, how
some syntax bits affect it, and how Regex backtracks to match it.

For example, supposing that `?' represents the match-zero-or-one
operator; then `ca?r' matches both `car' and `cr', but nothing else.
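
The `ca+r' and `ca?r' examples can be reproduced with the POSIX
functions. This sketch assumes the POSIX extended syntax, in which `+'
and `?' represent the two operators; the helper `ere_matches_whole' is
hypothetical, and it wraps the pattern in `^(' and `)$' so that only a
match of the whole string counts.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if the extended regular expression
   PATTERN matches S as a whole, 0 if it does not, -1 if the anchored
   pattern is too long or does not compile.  */
static int
ere_matches_whole (const char *pattern, const char *s)
{
  char anchored[256];
  regex_t compiled;
  int n, matched;

  assert (pattern != NULL && s != NULL);
  /* Anchor the pattern at both ends of the string.  */
  n = snprintf (anchored, sizeof anchored, "^(%s)$", pattern);
  if (n < 0 || (size_t) n >= sizeof anchored)
    return -1;
  if (regcomp (&compiled, anchored, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  matched = regexec (&compiled, s, 0, NULL, 0) == 0;
  regfree (&compiled);
  return matched;
}
```

So `ere_matches_whole ("ca+r", "cr")' yields 0, since `+' requires at
least one `a', while `ere_matches_whole ("ca?r", "cr")' yields 1.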

File: regex.info, Node: Interval Operators, Prev: Match-zero-or-one Operator, Up: Repetition Operators

Interval Operators (`{' ... `}' or `\{' ... `\}')
-------------------------------------------------

If the syntax bit `RE_INTERVALS' is set, then Regex recognizes
"interval expressions". They repeat the smallest possible preceding
regular expression a specified number of times.

If the syntax bit `RE_NO_BK_BRACES' is set, `{' represents the
"open-interval operator" and `}' represents the "close-interval
operator"; otherwise, `\{' and `\}' do.

Specifically, supposing that `{' and `}' represent the open-interval
and close-interval operators; then:

`{'COUNT`}'
     matches exactly COUNT occurrences of the preceding regular
     expression.

`{'MIN`,}'
     matches MIN or more occurrences of the preceding regular
     expression.

`{'MIN`,'MAX`}'
     matches at least MIN but no more than MAX occurrences of the
     preceding regular expression.

The interval expression (but not necessarily the regular expression
that contains it) is invalid if:

   * MIN is greater than MAX, or

   * any of COUNT, MIN, or MAX is outside the range zero to
     `RE_DUP_MAX' (which symbol `regex.h' defines).

If the interval expression is invalid and the syntax bit
`RE_NO_BK_BRACES' is set, then Regex considers all the characters in
the would-be interval to be ordinary. If that bit isn't set, then the
regular expression is invalid.

If the interval expression is valid but there is no preceding regular
expression on which to operate, then if the syntax bit
`RE_CONTEXT_INVALID_OPS' is set, the regular expression is invalid. If
that bit isn't set, then Regex considers all the characters--other than
backslashes, which it ignores--in the would-be interval to be ordinary.
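
The three valid forms of interval can be exercised with the POSIX
functions. This is a sketch under two assumptions: that `regcomp'
recognizes intervals written `{' ... `}' when given `REG_EXTENDED' and
`\{' ... `\}' otherwise, and that `regexec' reports the leftmost,
longest match. The helper `accepts_run' is hypothetical; it only
covers valid intervals, not the invalid cases described above.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if PATTERN, compiled with CFLAGS,
   matches as a whole a string made of N `a's (N must be below 64),
   0 if it does not, -1 if PATTERN does not compile.  */
static int
accepts_run (const char *pattern, int cflags, size_t n)
{
  char run[64];
  regex_t compiled;
  regmatch_t m[1];
  int whole = 0;

  assert (pattern != NULL && n < sizeof run);
  memset (run, 'a', n);
  run[n] = '\0';
  if (regcomp (&compiled, pattern, cflags) != 0)
    return -1;
  /* The match must cover the run from its first to its last `a'.  */
  if (regexec (&compiled, run, 1, m, 0) == 0)
    whole = m[0].rm_so == 0 && (size_t) m[0].rm_eo == n;
  regfree (&compiled);
  return whole;
}
```

For example, `accepts_run ("a{2,3}", REG_EXTENDED, 4)' yields 0 because
four occurrences exceed MAX, and `accepts_run ("a\\{2,3\\}", 0, 3)'
yields 1 using the backslashed representation.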

File: regex.info, Node: Alternation Operator, Next: List Operators, Prev: Repetition Operators, Up: Common Operators

The Alternation Operator (`|' or `\|')
======================================

If the syntax bit `RE_LIMITED_OPS' is set, then Regex doesn't
recognize this operator. Otherwise, if the syntax bit `RE_NO_BK_VBAR'
is set, then `|' represents this operator; otherwise, `\|' does.

Alternatives match one of a choice of regular expressions: if you put
the character(s) representing the alternation operator between any two
regular expressions A and B, the result matches the union of the
strings that A and B match. For example, supposing that `|' is the
alternation operator, then `foo|bar|quux' would match any of `foo',
`bar' or `quux'.

The alternation operator operates on the *largest* possible
surrounding regular expressions. (Put another way, it has the lowest
precedence of any regular expression operator.) Thus, the only way you
can delimit its arguments is to use grouping. For example, if `(' and
`)' are the open and close-group operators, then `fo(o|b)ar' would
match either `fooar' or `fobar'. (`foo|bar' would match `foo' or
`bar'.)

The matcher usually tries all combinations of alternatives so as to
match the longest possible string. For example, when matching
`(fooq|foo)*(qbarquux|bar)' against `fooqbarquux', it cannot take, say,
the first ("depth-first") combination it could match, since then it
would be content to match just `fooqbar'.
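
Both points of this node, the low precedence of alternation and the
preference for the longest match, can be observed with the POSIX
functions. The sketch below assumes the POSIX extended syntax (so `|',
`(' and `)' are operators) and a `regexec' that reports the longest
match, as POSIX requires; the helper `leading_match_length' is
hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return the length of the match of the extended
   regular expression PATTERN in S if that match begins at the start
   of S.  Return -1 if there is no such match or PATTERN does not
   compile.  */
static int
leading_match_length (const char *pattern, const char *s)
{
  regex_t compiled;
  regmatch_t m[1];
  int length = -1;

  assert (pattern != NULL && s != NULL);
  if (regcomp (&compiled, pattern, REG_EXTENDED) != 0)
    return -1;
  if (regexec (&compiled, s, 1, m, 0) == 0 && m[0].rm_so == 0)
    length = (int) m[0].rm_eo;
  regfree (&compiled);
  return length;
}
```

Under these assumptions, matching `(fooq|foo)*(qbarquux|bar)' against
`fooqbarquux' gives a length of 11, the whole string, rather than the 7
characters of `fooqbar'.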

File: regex.info, Node: List Operators, Next: Grouping Operators, Prev: Alternation Operator, Up: Common Operators

List Operators (`[' ... `]' and `[^' ... `]')
=============================================

"Lists", also called "bracket expressions", are a set of one or more
items. An "item" is a character, a character class expression, or a
range expression. The syntax bits affect which kinds of items you can
put in a list. We explain the last two items in subsections below.
Empty lists are invalid.

A "matching list" matches a single character represented by one of
the list items. You form a matching list by enclosing one or more items
within an "open-matching-list operator" (represented by `[') and a
"close-list operator" (represented by `]').

For example, `[ab]' matches either `a' or `b'. `[ad]*' matches the
empty string and any string composed of just `a's and `d's in any
order. Regex considers invalid a regular expression with a `[' but no
matching `]'.

"Nonmatching lists" are similar to matching lists except that they
match a single character *not* represented by one of the list items.
You use an "open-nonmatching-list operator" (represented by `[^'(1))
instead of an open-matching-list operator to start a nonmatching list.

For example, `[^ab]' matches any character except `a' or `b'.

If the `posix_newline' field in the pattern buffer (*note GNU Pattern
Buffers::.) is set, then nonmatching lists do not match a newline.

Most characters lose any special meaning inside a list. The special
characters inside a list follow.

`]'
     ends the list if it's not the first list item. So, if you want to
     make the `]' character a list item, you must put it first.

`\'
     quotes the next character if the syntax bit
     `RE_BACKSLASH_ESCAPE_IN_LISTS' is set.

`[:'
     represents the open-character-class operator (*note Character
     Class Operators::.) if the syntax bit `RE_CHAR_CLASSES' is set and
     what follows is a valid character class expression.

`:]'
     represents the close-character-class operator if the syntax bit
     `RE_CHAR_CLASSES' is set and what precedes it is an
     open-character-class operator followed by a valid character class
     name.

`-'
     represents the range operator (*note Range Operator::.) if it's
     not first or last in a list or the ending point of a range.

All other characters are ordinary. For example, `[.*]' matches `.' and
`*'.
856
* Character Class Operators:: [:class:]
857
* Range Operator:: start-end
859
---------- Footnotes ----------
861
(1) Regex therefore doesn't consider the `^' to be the first
862
character in the list. If you put a `^' character first in (what you
863
think is) a matching list, you'll turn it into a nonmatching list.

File: regex.info, Node: Character Class Operators, Next: Range Operator, Up: List Operators

Character Class Operators (`[:' ... `:]')
-----------------------------------------

If the syntax bit `RE_CHAR_CLASSES' is set, then Regex
recognizes character class expressions inside lists. A "character
class expression" matches one character from a given class. You form a
character class expression by putting a character class name between an
"open-character-class operator" (represented by `[:') and a
"close-character-class operator" (represented by `:]'). The character
class names and their meanings are:

`alnum'
     letters and digits

`alpha'
     letters

`blank'
     system-dependent; for GNU, a space or tab

`cntrl'
     control characters (in the ASCII encoding, code 0177 and codes
     less than 040)

`digit'
     digits

`graph'
     same as `print' except omits space

`lower'
     lowercase letters

`print'
     printable characters (in the ASCII encoding, space through tilde,
     codes 040 through 0176)

`punct'
     neither control nor alphanumeric characters

`space'
     space, carriage return, newline, vertical tab, and form feed

`upper'
     uppercase letters

`xdigit'
     hexadecimal digits: `0'-`9', `a'-`f', `A'-`F'

These correspond to the definitions in the C library's `<ctype.h>'
facility. For example, `[:alpha:]' corresponds to the standard
facility `isalpha'. Regex recognizes character class expressions only
inside of lists; so `[[:alpha:]]' matches any letter, but `[:alpha:]'
outside of a bracket expression and not followed by a repetition
operator matches just itself.
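
As an illustration, the following C sketch matches a few character class
expressions. It assumes the GNU interface of `<regex.h>' as shipped with
the GNU C library, and `RE_SYNTAX_POSIX_EXTENDED', which has
`RE_CHAR_CLASSES' set. The helper name `match_length_at' is hypothetical.

```c
#define _GNU_SOURCE           /* assumption: exposes the GNU interface */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: return how many characters of STRING the pattern
   matches starting at index START, -1 for no match, -2 if PATTERN
   cannot be compiled.  */
int
match_length_at (const char *pattern, const char *string, int start)
{
  struct re_pattern_buffer buf;
  reg_syntax_t old_syntax;
  const char *error;
  int result;

  assert (pattern != NULL && string != NULL);
  memset (&buf, 0, sizeof buf);

  /* RE_SYNTAX_POSIX_EXTENDED includes the RE_CHAR_CLASSES bit.  */
  old_syntax = re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);
  error = re_compile_pattern (pattern, strlen (pattern), &buf);
  re_set_syntax (old_syntax);
  if (error != NULL)
    return -2;

  result = (int) re_match (&buf, string, (int) strlen (string), start, NULL);
  regfree (&buf);
  return result;
}
```

For the string `abc123', `[[:alpha:]]+' should match three characters at
index 0 and fail at index 3, where `[[:digit:]]+' should match three
characters instead.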

File: regex.info, Node: Range Operator, Prev: Character Class Operators, Up: List Operators

The Range Operator (`-')
------------------------

Regex recognizes "range expressions" inside a list. They represent
those characters that fall between two elements in the current
collating sequence. You form a range expression by putting a "range
operator" between two characters.(1) `-' represents the range operator.
For example, `a-f' within a list represents all the characters from `a'
through `f' inclusively.

If the syntax bit `RE_NO_EMPTY_RANGES' is set, then if the range's
ending point collates less than its starting point, the range (and the
regular expression containing it) is invalid. For example, the regular
expression `[z-a]' would be invalid. If this bit isn't set, then Regex
considers such a range to be empty.

Since `-' represents the range operator, if you want to make a `-'
character itself a list item, you must do one of the following:

* Put the `-' either first or last in the list.

* Include a range whose starting point collates strictly lower than
  `-' and whose ending point collates equal or higher. Unless a
  range is the first item in a list, a `-' can't be its starting
  point, but *can* be its ending point. That is because Regex
  considers `-' to be the range operator unless it is preceded by
  another `-'. For example, in the ASCII encoding, `)', `*', `+',
  `,', `-', `.', and `/' are contiguous characters in the collating
  sequence. You might think that `[)-+--/]' has two ranges: `)-+'
  and `--/'. Rather, it has the ranges `)-+' and `+--', plus the
  character `/', so it matches, e.g., `,', not `.'.

* Put a range whose starting point is `-' first in the list.

For example, `[-a-z]' matches a lowercase letter or a hyphen (in
the ASCII encoding).
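
The C sketch below exercises a few of these rules. It assumes the GNU
interface of `<regex.h>' as shipped with the GNU C library, and it
compiles with `RE_SYNTAX_POSIX_EXTENDED', which has `RE_NO_EMPTY_RANGES'
set. The helper name `range_match_length' is hypothetical. Other
implementations may treat the less obvious cases above, such as
`[)-+--/]', differently, so the sketch sticks to the simple ones.

```c
#define _GNU_SOURCE           /* assumption: exposes the GNU interface */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: length of the match of PATTERN at the start of
   STRING, -1 for no match, -2 if PATTERN is rejected by the compiler.  */
int
range_match_length (const char *pattern, const char *string)
{
  struct re_pattern_buffer buf;
  reg_syntax_t old_syntax;
  const char *error;
  int result;

  assert (pattern != NULL && string != NULL);
  memset (&buf, 0, sizeof buf);

  /* RE_SYNTAX_POSIX_EXTENDED includes the RE_NO_EMPTY_RANGES bit.  */
  old_syntax = re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);
  error = re_compile_pattern (pattern, strlen (pattern), &buf);
  re_set_syntax (old_syntax);
  if (error != NULL)
    return -2;                /* e.g. an invalid range such as [z-a] */

  result = (int) re_match (&buf, string, (int) strlen (string), 0, NULL);
  regfree (&buf);
  return result;
}
```

Here `[a-f]+' should match all four letters of `cafe!'; `[-a-z]+' and
`[a-z-]+' should both match the first three characters of `a-b!',
because a `-' that is first or last in the list is an ordinary item; and
`[z-a]' should be rejected at compile time.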

---------- Footnotes ----------

(1) You can't use a character class for the starting or ending point
of a range, since a character class is not a single character.

File: regex.info, Node: Grouping Operators, Next: Back-reference Operator, Prev: List Operators, Up: Common Operators

Grouping Operators (`(' ... `)' or `\(' ... `\)')
=================================================

A "group", also known as a "subexpression", consists of an
"open-group operator", any number of other operators, and a
"close-group operator". Regex treats this sequence as a unit, just as
mathematics and programming languages treat a parenthesized expression
as a unit.

Therefore, using "groups", you can:

* delimit the argument(s) to an alternation operator (*note
  Alternation Operator::.) or a repetition operator (*note
  Repetition Operators::.).

* keep track of the indices of the substring that matched a given
  group. *Note Using Registers::, for a precise explanation. This
  lets you:

     * use the back-reference operator (*note Back-reference
       Operator::.).

     * use registers (*note Using Registers::.).

If the syntax bit `RE_NO_BK_PARENS' is set, then `(' represents the
open-group operator and `)' represents the close-group operator;
otherwise, `\(' and `\)' do.

If the syntax bit `RE_UNMATCHED_RIGHT_PAREN_ORD' is set and a
close-group operator has no matching open-group operator, then Regex
considers it to match `)'.

File: regex.info, Node: Back-reference Operator, Next: Anchoring Operators, Prev: Grouping Operators, Up: Common Operators

The Back-reference Operator ("\"DIGIT)
======================================

If the syntax bit `RE_NO_BK_REFS' isn't set, then Regex recognizes
back references. A back reference matches a specified preceding group.
The back reference operator is represented by `\DIGIT' anywhere after
the end of a regular expression's DIGIT-th group (*note Grouping
Operators::.).

DIGIT must be between `1' and `9'. The matcher assigns numbers 1
through 9 to the first nine groups it encounters. By using one of `\1'
through `\9' after the corresponding group's close-group operator, you
can match a substring identical to the one that the group does.

Back references match according to the following (in all examples
below, `(' represents the open-group, `)' the close-group, `{' the
open-interval and `}' the close-interval operator):

* If the group matches a substring, the back reference matches an
  identical substring. For example, `(a)\1' matches `aa' and
  `(bana)na\1bo\1' matches `bananabanabobana'. Likewise, `(.*)\1'
  matches any (newline-free if the syntax bit `RE_DOT_NEWLINE' isn't
  set) string that is composed of two identical halves; the `(.*)'
  matches the first half and the `\1' matches the second half.

* If the group matches more than once (as it might if followed by,
  e.g., a repetition operator), then the back reference matches the
  substring the group *last* matched. For example, `((a*)b)*\1\2'
  matches `aabababa'; first group 1 (the outer one) matches `aab'
  and group 2 (the inner one) matches `aa'. Then group 1 matches
  `ab' and group 2 matches `a'. So, `\1' matches `ab' and `\2'
  matches `a'.

* If the group doesn't participate in a match, i.e., it is part of an
  alternative not taken or a repetition operator allows zero
  repetitions of it, then the back reference makes the whole match
  fail. For example, `(one()|two())-and-(three\2|four\3)' matches
  `one-and-three' and `two-and-four', but not `one-and-four' or
  `two-and-three'. To see why, suppose the pattern has matched
  `one-and-'; then its group 2 matches the empty string and its
  group 3 doesn't participate in the match. If it then matches
  `four', it must next try to back reference group 3 (because `\3'
  follows the `four'), and the match fails because group 3 didn't
  participate in the match.

You can use a back reference as an argument to a repetition operator.
For example, `(a(b))\2*' matches `a' followed by one or more `b's.
Similarly, `(a(b))\2{3}' matches `abbbb'.

If there is no preceding DIGIT-th subexpression, the regular
expression is invalid.
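
The simpler examples above can be tried with the following C sketch. It
assumes the GNU interface of `<regex.h>' as shipped with the GNU C
library, and `RE_SYNTAX_POSIX_EXTENDED', in which `(' and `)' are the
group operators and `RE_NO_BK_REFS' is not set. The helper name
`backref_match_length' is hypothetical. Note that each backslash is
doubled inside a C string literal.

```c
#define _GNU_SOURCE           /* assumption: exposes the GNU interface */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: length of the match of PATTERN at the start of
   STRING, -1 for no match, -2 if PATTERN cannot be compiled.  */
int
backref_match_length (const char *pattern, const char *string)
{
  struct re_pattern_buffer buf;
  reg_syntax_t old_syntax;
  const char *error;
  int result;

  assert (pattern != NULL && string != NULL);
  memset (&buf, 0, sizeof buf);

  old_syntax = re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);
  error = re_compile_pattern (pattern, strlen (pattern), &buf);
  re_set_syntax (old_syntax);
  if (error != NULL)
    return -2;

  result = (int) re_match (&buf, string, (int) strlen (string), 0, NULL);
  regfree (&buf);
  return result;
}
```

For instance, `backref_match_length ("(a)\\1", "aa")' should return 2,
the same pattern should fail on `ab', and `(bana)na\1bo\1' should match
all 16 characters of `bananabanabobana'.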

File: regex.info, Node: Anchoring Operators, Prev: Back-reference Operator, Up: Common Operators

Anchoring Operators
===================

These operators can constrain a pattern to match only at the
beginning or end of the entire string or at the beginning or end of a
line.

* Menu:

* Match-beginning-of-line Operator::  ^
* Match-end-of-line Operator::        $

File: regex.info, Node: Match-beginning-of-line Operator, Next: Match-end-of-line Operator, Up: Anchoring Operators

The Match-beginning-of-line Operator (`^')
------------------------------------------

This operator can match the empty string either at the beginning of
the string or after a newline character. Thus, it is said to "anchor"
the pattern to the beginning of a line.

In the cases following, `^' represents this operator. (Otherwise,
`^' is ordinary.)

* It (the `^') is first in the pattern, as in `^foo'.

* The syntax bit `RE_CONTEXT_INDEP_ANCHORS' is set, and it is outside
  a bracket expression.

* It follows an open-group or alternation operator, as in `a\(^b\)'
  and `a\|^b'. *Note Grouping Operators::, and *Note Alternation
  Operator::.

These rules imply that some valid patterns containing `^' cannot be
matched; for example, `foo^bar' if `RE_CONTEXT_INDEP_ANCHORS' is set.

If the `not_bol' field is set in the pattern buffer (*note GNU
Pattern Buffers::.), then `^' fails to match at the beginning of the
string. *Note POSIX Matching::, for when you might find this useful.

If the `newline_anchor' field is not set in the pattern buffer, then
`^' fails to match after a newline. This is useful when you do not regard
the string to be matched as broken into lines.
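
The effect of the `not_bol' and `newline_anchor' fields on `^' can be
sketched in C as follows. The sketch assumes the GNU interface of
`<regex.h>' as shipped with the GNU C library, where these fields are
visible under these names when `_GNU_SOURCE' is defined. It sets both
fields after compiling, on the assumption that the compiling function
may assign them itself. The helper name `search_with_flags' is
hypothetical.

```c
#define _GNU_SOURCE           /* assumption: exposes the GNU interface */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: search STRING for PATTERN with the given values
   of the `not_bol' and `newline_anchor' fields.  Returns the index of
   the match, -1 if there is none, -2 on a compile or internal error.  */
int
search_with_flags (const char *pattern, const char *string,
                   int not_bol, int newline_anchor)
{
  struct re_pattern_buffer buf;
  reg_syntax_t old_syntax;
  const char *error;
  int size, result;

  assert (pattern != NULL && string != NULL);
  memset (&buf, 0, sizeof buf);

  old_syntax = re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);
  error = re_compile_pattern (pattern, strlen (pattern), &buf);
  re_set_syntax (old_syntax);
  if (error != NULL)
    return -2;

  /* Set the fields after compiling so that our values are the ones
     the searcher sees.  */
  buf.not_bol = not_bol ? 1 : 0;
  buf.newline_anchor = newline_anchor ? 1 : 0;

  size = (int) strlen (string);
  result = (int) re_search (&buf, string, size, 0, size, NULL);
  regfree (&buf);
  return result;
}
```

With `newline_anchor' set, `^bar' should be found at index 4 of
`foo\nbar' (where `\n' is a newline); with it clear, the search should
fail. With `not_bol' set, `^foo' should skip the start of `foo\nfoo' and
be found at index 4 instead of 0.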

File: regex.info, Node: Match-end-of-line Operator, Prev: Match-beginning-of-line Operator, Up: Anchoring Operators

The Match-end-of-line Operator (`$')
------------------------------------

This operator can match the empty string either at the end of the
string or before a newline character in the string. Thus, it is said
to "anchor" the pattern to the end of a line.

It is always represented by `$'. For example, `foo$' usually
matches, e.g., `foo' and, e.g., the first three characters of
a string in which `foo' is immediately followed by a newline.

Its interaction with the syntax bits and pattern buffer fields is
exactly the dual of `^''s; see the previous section. (That is,
"beginning" becomes "end", "next" becomes "previous", and "after"
becomes "before".)

File: regex.info, Node: GNU Operators, Next: GNU Emacs Operators, Prev: Common Operators, Up: Top

GNU Operators
*************

Following are operators that GNU defines (and POSIX doesn't).

* Menu:

* Word Operators::
* Buffer Operators::

File: regex.info, Node: Word Operators, Next: Buffer Operators, Up: GNU Operators

Word Operators
==============

The operators in this section require Regex to recognize parts of
words. Regex uses a syntax table to determine whether or not a
character is part of a word, i.e., whether or not it is
"word-constituent".

* Menu:

* Non-Emacs Syntax Tables::
* Match-word-boundary Operator::          \b
* Match-within-word Operator::            \B
* Match-beginning-of-word Operator::      \<
* Match-end-of-word Operator::            \>
* Match-word-constituent Operator::       \w
* Match-non-word-constituent Operator::   \W

File: regex.info, Node: Non-Emacs Syntax Tables, Next: Match-word-boundary Operator, Up: Word Operators

Non-Emacs Syntax Tables
-----------------------

A "syntax table" is an array indexed by the characters in your
character set. In the ASCII encoding, therefore, a syntax table has
256 elements. Regex always uses a `char *' variable `re_syntax_table'
as its syntax table. In some cases, it initializes this variable and
in others it expects you to initialize it.

* If Regex is compiled with the preprocessor symbols `emacs' and
  `SYNTAX_TABLE' both undefined, then Regex allocates
  `re_syntax_table' and initializes an element I either to `Sword'
  (which it defines) if I is a letter, number, or `_', or to zero if
  it's not.

* If Regex is compiled with `emacs' undefined but `SYNTAX_TABLE'
  defined, then Regex expects you to define a `char *' variable
  `re_syntax_table' to be a valid syntax table.

* *Note Emacs Syntax Tables::, for what happens when Regex is
  compiled with the preprocessor symbol `emacs' defined.
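
The default initialization described in the first case above could be
sketched as follows. This is an illustration only, not Regex's own code:
`example_syntax_table', `EXAMPLE_SWORD' and the two functions are
hypothetical stand-ins for `re_syntax_table', `Sword' and whatever Regex
does internally, and the value 1 chosen for `EXAMPLE_SWORD' is an
assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define EXAMPLE_CHAR_SET_SIZE 256   /* one element per character code */
#define EXAMPLE_SWORD 1             /* hypothetical stand-in for Sword */

/* Hypothetical stand-in for re_syntax_table.  */
char example_syntax_table[EXAMPLE_CHAR_SET_SIZE];

/* Mark letters, digits and `_' as word-constituent; all other
   elements stay zero.  */
void
init_example_syntax_table (void)
{
  int c;

  memset (example_syntax_table, 0, sizeof example_syntax_table);
  for (c = 0; c < EXAMPLE_CHAR_SET_SIZE; c++)
    if (isalnum (c) || c == '_')
      example_syntax_table[c] = EXAMPLE_SWORD;
}

/* Return nonzero if C is word-constituent according to the table.  */
int
example_is_word_constituent (int c)
{
  assert (c >= 0 && c < EXAMPLE_CHAR_SET_SIZE);
  return example_syntax_table[c] == EXAMPLE_SWORD;
}
```

After `init_example_syntax_table ()' has run, looking up `a', `7' or `_'
yields `EXAMPLE_SWORD', while a space or `-' yields zero.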

File: regex.info, Node: Match-word-boundary Operator, Next: Match-within-word Operator, Prev: Non-Emacs Syntax Tables, Up: Word Operators

The Match-word-boundary Operator (`\b')
---------------------------------------

This operator (represented by `\b') matches the empty string at
either the beginning or the end of a word. For example, `\brat\b'
matches the separate word `rat'.

File: regex.info, Node: Match-within-word Operator, Next: Match-beginning-of-word Operator, Prev: Match-word-boundary Operator, Up: Word Operators

The Match-within-word Operator (`\B')
-------------------------------------

This operator (represented by `\B') matches the empty string within a
word. For example, `c\Brat\Be' matches `crate', but `dirty \Brat'
doesn't match `dirty rat'.
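
The `\b' and `\B' examples can be tried with a C sketch like the one
below. It assumes the GNU interface of `<regex.h>' as shipped with the
GNU C library, whose matcher treats letters, digits and `_' as
word-constituent. The helper name `word_search' is hypothetical.

```c
#define _GNU_SOURCE           /* assumption: exposes the GNU interface */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: return the index in STRING where PATTERN first
   matches, -1 if it matches nowhere, -2 on a compile or internal error.  */
int
word_search (const char *pattern, const char *string)
{
  struct re_pattern_buffer buf;
  reg_syntax_t old_syntax;
  const char *error;
  int size, result;

  assert (pattern != NULL && string != NULL);
  memset (&buf, 0, sizeof buf);

  old_syntax = re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);
  error = re_compile_pattern (pattern, strlen (pattern), &buf);
  re_set_syntax (old_syntax);
  if (error != NULL)
    return -2;

  size = (int) strlen (string);
  result = (int) re_search (&buf, string, size, 0, size, NULL);
  regfree (&buf);
  return result;
}
```

For example, `word_search ("\\brat\\b", "a rat here")' should return 2,
while the same pattern should not be found in `crate'. Conversely,
`c\Brat\Be' should be found at index 0 of `crate', and `dirty \Brat'
should not be found in `dirty rat'.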

File: regex.info, Node: Match-beginning-of-word Operator, Next: Match-end-of-word Operator, Prev: Match-within-word Operator, Up: Word Operators

The Match-beginning-of-word Operator (`\<')
-------------------------------------------

This operator (represented by `\<') matches the empty string at the
beginning of a word.

File: regex.info, Node: Match-end-of-word Operator, Next: Match-word-constituent Operator, Prev: Match-beginning-of-word Operator, Up: Word Operators

The Match-end-of-word Operator (`\>')
-------------------------------------

This operator (represented by `\>') matches the empty string at the
end of a word.

File: regex.info, Node: Match-word-constituent Operator, Next: Match-non-word-constituent Operator, Prev: Match-end-of-word Operator, Up: Word Operators

The Match-word-constituent Operator (`\w')
------------------------------------------

This operator (represented by `\w') matches any word-constituent
character.

File: regex.info, Node: Match-non-word-constituent Operator, Prev: Match-word-constituent Operator, Up: Word Operators

The Match-non-word-constituent Operator (`\W')
----------------------------------------------

This operator (represented by `\W') matches any character that is not
word-constituent.

File: regex.info, Node: Buffer Operators, Prev: Word Operators, Up: GNU Operators

Buffer Operators
================

Following are operators which work on buffers. In Emacs, a "buffer"
is, naturally, an Emacs buffer. For other programs, Regex considers the
entire string to be matched as the buffer.

* Menu:

* Match-beginning-of-buffer Operator::  \`
* Match-end-of-buffer Operator::        \'

File: regex.info, Node: Match-beginning-of-buffer Operator, Next: Match-end-of-buffer Operator, Up: Buffer Operators

The Match-beginning-of-buffer Operator (`\`')
---------------------------------------------

This operator (represented by `\`') matches the empty string at the
beginning of the buffer.

File: regex.info, Node: Match-end-of-buffer Operator, Prev: Match-beginning-of-buffer Operator, Up: Buffer Operators

The Match-end-of-buffer Operator (`\'')
---------------------------------------

This operator (represented by `\'') matches the empty string at the
end of the buffer.
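
The difference between the buffer operators and the line anchors `^' and
`$' shows up when the string contains a newline. The C sketch below
assumes the GNU interface of `<regex.h>' as shipped with the GNU C
library, with newline anchoring enabled by the compiling function. The
helper name `buffer_search' is hypothetical.

```c
#define _GNU_SOURCE           /* assumption: exposes the GNU interface */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: return the index in STRING where PATTERN first
   matches, -1 if it matches nowhere, -2 on a compile or internal error.  */
int
buffer_search (const char *pattern, const char *string)
{
  struct re_pattern_buffer buf;
  reg_syntax_t old_syntax;
  const char *error;
  int size, result;

  assert (pattern != NULL && string != NULL);
  memset (&buf, 0, sizeof buf);

  old_syntax = re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);
  error = re_compile_pattern (pattern, strlen (pattern), &buf);
  re_set_syntax (old_syntax);
  if (error != NULL)
    return -2;

  size = (int) strlen (string);
  result = (int) re_search (&buf, string, size, 0, size, NULL);
  regfree (&buf);
  return result;
}
```

In the two-line string `foo\nbar' (where `\n' is a newline), `^bar'
should be found at index 4 but `\`bar' should not be found at all, since
`bar' does not start the buffer. Likewise `foo$' should be found at index
0 but `foo\'' should not, whereas `bar\'' should be found at index 4.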

File: regex.info, Node: GNU Emacs Operators, Next: What Gets Matched?, Prev: GNU Operators, Up: Top

GNU Emacs Operators
*******************

Following are operators that GNU defines (and POSIX doesn't) that you
can use only when Regex is compiled with the preprocessor symbol
`emacs' defined.

* Menu:

* Syntactic Class Operators::

File: regex.info, Node: Syntactic Class Operators, Up: GNU Emacs Operators

Syntactic Class Operators
=========================

The operators in this section require Regex to recognize the syntactic
classes of characters. Regex uses a syntax table to determine this.

* Menu:

* Emacs Syntax Tables::
* Match-syntactic-class Operator::      \sCLASS
* Match-not-syntactic-class Operator::  \SCLASS

File: regex.info, Node: Emacs Syntax Tables, Next: Match-syntactic-class Operator, Up: Syntactic Class Operators

Emacs Syntax Tables
-------------------

A "syntax table" is an array indexed by the characters in your
character set. In the ASCII encoding, therefore, a syntax table has
256 elements.

If Regex is compiled with the preprocessor symbol `emacs' defined,
then Regex expects you to define and initialize the variable
`re_syntax_table' to be an Emacs syntax table. Emacs' syntax tables
are more complicated than Regex's own (*note Non-Emacs Syntax
Tables::.). *Note Syntax: (emacs)Syntax, for a description of Emacs'
syntax tables.

File: regex.info, Node: Match-syntactic-class Operator, Next: Match-not-syntactic-class Operator, Prev: Emacs Syntax Tables, Up: Syntactic Class Operators

The Match-syntactic-class Operator (`\s'CLASS)
----------------------------------------------

This operator matches any character whose syntactic class is
represented by a specified character. `\sCLASS' represents this
operator where CLASS is the character representing the syntactic class
you want. For example, `w' represents the syntactic class of
word-constituent characters, so `\sw' matches any word-constituent
character.

File: regex.info, Node: Match-not-syntactic-class Operator, Prev: Match-syntactic-class Operator, Up: Syntactic Class Operators

The Match-not-syntactic-class Operator (`\S'CLASS)
--------------------------------------------------

This operator is similar to the match-syntactic-class operator except
that it matches any character whose syntactic class is *not*
represented by the specified character. `\SCLASS' represents this
operator. For example, `w' represents the syntactic class of
word-constituent characters, so `\Sw' matches any character that is not
word-constituent.

File: regex.info, Node: What Gets Matched?, Next: Programming with Regex, Prev: GNU Emacs Operators, Up: Top

What Gets Matched?
******************

Regex usually matches strings according to the "leftmost longest"
rule; that is, it chooses the longest of the leftmost matches. This
does not mean that, for a regular expression containing subexpressions,
it simply chooses the longest match for each subexpression, left to
right; the overall match must also be the longest possible one.

For example, `(ac*)(c*d[ac]*)\1' matches `acdacaaa', not `acdac', as
it would if it were to choose the longest match for the first
subexpression.

File: regex.info, Node: Programming with Regex, Next: Copying, Prev: What Gets Matched?, Up: Top

Programming with Regex
**********************

Here we describe how you use the Regex data structures and functions
in C programs. Regex has three interfaces: one designed for GNU, one
compatible with POSIX and one compatible with Berkeley UNIX.

* Menu:

* GNU Regex Functions::
* POSIX Regex Functions::
* BSD Regex Functions::

File: regex.info, Node: GNU Regex Functions, Next: POSIX Regex Functions, Up: Programming with Regex

GNU Regex Functions
===================

If you're writing code that doesn't need to be compatible with either
POSIX or Berkeley UNIX, you can use these functions. They provide more
options than the other interfaces.

* Menu:

* GNU Pattern Buffers::                 The re_pattern_buffer type.
* GNU Regular Expression Compiling::    re_compile_pattern ()
* GNU Matching::                        re_match ()
* GNU Searching::                       re_search ()
* Matching/Searching with Split Data::  re_match_2 (), re_search_2 ()
* Searching with Fastmaps::             re_compile_fastmap ()
* GNU Translate Tables::                The `translate' field.
* Using Registers::                     The re_registers type and related fns.
* Freeing GNU Pattern Buffers::         regfree ()

File: regex.info, Node: GNU Pattern Buffers, Next: GNU Regular Expression Compiling, Up: GNU Regex Functions

GNU Pattern Buffers
-------------------

To compile, match, or search for a given regular expression, you must
supply a pattern buffer. A "pattern buffer" holds one compiled regular
expression.(1)

You can have several different pattern buffers simultaneously, each
holding a compiled pattern for a different regular expression.

`regex.h' defines the pattern buffer `struct' as follows:

             /* Space that holds the compiled pattern.  It is declared as
               `unsigned char *' because its elements are
                sometimes used as array indexes.  */
       unsigned char *buffer;

             /* Number of bytes to which `buffer' points.  */
       unsigned long allocated;

             /* Number of bytes actually used in `buffer'.  */
       unsigned long used;

             /* Syntax setting with which the pattern was compiled.  */
       reg_syntax_t syntax;

             /* Pointer to a fastmap, if any, otherwise zero.  re_search uses
                the fastmap, if there is one, to skip over impossible
                starting points for matches.  */
       char *fastmap;

             /* Either a translate table to apply to all characters before
                comparing them, or zero for no translation.  The translation
                is applied to a pattern when it is compiled and to a string
                when it is matched.  */
       char *translate;

             /* Number of subexpressions found by the compiler.  */
       size_t re_nsub;

             /* Zero if this pattern cannot match the empty string, one else.
                Well, in truth it's used only in `re_search_2', to see
                whether or not we should use the fastmap, so we don't set
                this absolutely perfectly; see `re_compile_fastmap' (the
                `duplicate' case).  */
       unsigned can_be_null : 1;

             /* If REGS_UNALLOCATED, allocate space in the `regs' structure
                  for `max (RE_NREGS, re_nsub + 1)' groups.
                If REGS_REALLOCATE, reallocate space if necessary.
                If REGS_FIXED, use what's there.  */
     #define REGS_UNALLOCATED 0
     #define REGS_REALLOCATE 1
     #define REGS_FIXED 2
       unsigned regs_allocated : 2;

             /* Set to zero when `regex_compile' compiles a pattern; set to one
                by `re_compile_fastmap' if it updates the fastmap.  */
       unsigned fastmap_accurate : 1;

             /* If set, `re_match_2' does not return information about
                subexpressions.  */
       unsigned no_sub : 1;

             /* If set, a beginning-of-line anchor doesn't match at the
                beginning of the string.  */
       unsigned not_bol : 1;

             /* Similarly for an end-of-line anchor.  */
       unsigned not_eol : 1;

             /* If true, an anchor at a newline matches.  */
       unsigned newline_anchor : 1;

---------- Footnotes ----------

(1) Regular expressions are also referred to as "patterns," hence
the name "pattern buffer."

File: regex.info, Node: GNU Regular Expression Compiling, Next: GNU Matching, Prev: GNU Pattern Buffers, Up: GNU Regex Functions

GNU Regular Expression Compiling
--------------------------------

In GNU, you can both match and search for a given regular expression.
To do either, you must first compile it in a pattern buffer (*note GNU
Pattern Buffers::.).

Regular expressions match according to the syntax with which they were
compiled; with GNU, you indicate what syntax you want by setting the
variable `re_syntax_options' (declared in `regex.h' and defined in
`regex.c') before calling the compiling function, `re_compile_pattern'
(see below). *Note Syntax Bits::, and *Note Predefined Syntaxes::.

You can change the value of `re_syntax_options' at any time.
Usually, however, you set its value once and then never change it.

`re_compile_pattern' takes a pattern buffer as an argument. You must
initialize the following fields:

`translate'
     Initialize this to point to a translate table if you want one, or
     to zero if you don't. We explain translate tables in *Note GNU
     Translate Tables::.

`fastmap'
     Initialize this to nonzero if you want a fastmap, or to zero if you
     don't.

`buffer'
`allocated'
     If you want `re_compile_pattern' to allocate memory for the
     compiled pattern, set both of these to zero. If you have an
     existing block of memory (allocated with `malloc') you want Regex
     to use, set `buffer' to its address and `allocated' to its size (in
     bytes).

     `re_compile_pattern' uses `realloc' to extend the space for the
     compiled pattern as necessary.

To compile a regular expression into a pattern buffer, use:

     char *
     re_compile_pattern (const char *REGEX, const int REGEX_SIZE,
                         struct re_pattern_buffer *PATTERN_BUFFER)

REGEX is the regular expression's address, REGEX_SIZE is its length,
and PATTERN_BUFFER is the pattern buffer's address.

If `re_compile_pattern' successfully compiles the regular expression,
it returns zero and sets `*PATTERN_BUFFER' to the compiled pattern. It
sets the pattern buffer's fields as follows:

`buffer'
     to the compiled pattern.

`used'
     to the number of bytes the compiled pattern in `buffer' occupies.

`syntax'
     to the current value of `re_syntax_options'.

`re_nsub'
     to the number of subexpressions in REGEX.

`fastmap_accurate'
     to zero on the theory that the pattern you're compiling is
     different from the one previously compiled into `buffer'; in that
     case (since you can't make a fastmap without a compiled pattern),
     `fastmap' would either contain an incompatible fastmap, or nothing
     at all.

If `re_compile_pattern' can't compile REGEX, it returns an error
string corresponding to one of the errors listed in *Note POSIX Regular
Expression Compiling::.
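
The compile sequence described above (initialize the fields, choose a
syntax, call `re_compile_pattern', check the result) might look like the
following C sketch. It assumes the GNU interface of `<regex.h>' as
shipped with the GNU C library, where the syntax is set through
`re_set_syntax' and the return type is `const char *'. The helper name
`count_groups' is hypothetical.

```c
#define _GNU_SOURCE           /* assumption: exposes the GNU interface */
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: compile PATTERN with the POSIX extended syntax
   and return the number of subexpressions recorded in `re_nsub', or -1
   (after printing the error string) if it cannot be compiled.  */
int
count_groups (const char *pattern)
{
  struct re_pattern_buffer buf;
  reg_syntax_t old_syntax;
  const char *error;
  int groups;

  assert (pattern != NULL);

  /* Zero `buffer' and `allocated' so the library allocates the space,
     and zero `translate' and `fastmap' because we want neither.  */
  memset (&buf, 0, sizeof buf);

  old_syntax = re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);
  error = re_compile_pattern (pattern, strlen (pattern), &buf);
  re_set_syntax (old_syntax);

  if (error != NULL)
    {
      fprintf (stderr, "cannot compile `%s': %s\n", pattern, error);
      return -1;
    }

  groups = (int) buf.re_nsub;
  regfree (&buf);             /* release the compiled pattern */
  return groups;
}
```

Under these assumptions `count_groups ("a(b)c")' should return 1, while
`count_groups ("a(b")' should print an error message and return -1
because the group is never closed.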

File: regex.info, Node: GNU Matching, Next: GNU Searching, Prev: GNU Regular Expression Compiling, Up: GNU Regex Functions

GNU Matching
------------

Matching the GNU way means trying to match as much of a string as
possible starting at a position within it you specify. Once you've
compiled a pattern into a pattern buffer (*note GNU Regular Expression
Compiling::.), you can ask the matcher to match that pattern against a
string using:

     int
     re_match (struct re_pattern_buffer *PATTERN_BUFFER,
               const char *STRING, const int SIZE,
               const int START, struct re_registers *REGS)

PATTERN_BUFFER is the address of a pattern buffer containing a compiled
pattern. STRING is the string you want to match; it can contain
newline and null characters. SIZE is the length of that string. START
is the string index at which you want to begin matching; the first
character of STRING is at index zero. *Note Using Registers::, for an
explanation of REGS; you can safely pass zero.

`re_match' matches the regular expression in PATTERN_BUFFER against
the string STRING according to the syntax in PATTERN_BUFFER's `syntax'
field. (*Note GNU Regular Expression Compiling::, for how to set it.)
The function returns -1 if the compiled pattern does not match any part
of STRING and -2 if an internal error happens; otherwise, it returns
how many (possibly zero) characters of STRING the pattern matched.

An example: suppose PATTERN_BUFFER points to a pattern buffer
containing the compiled pattern for `a*', and STRING points to `aaaaab'
(whereupon SIZE should be 6). Then if START is 2, `re_match' returns 3,
i.e., `a*' would have matched the last three `a's in STRING. If START
is 0, `re_match' returns 5, i.e., `a*' would have matched all the `a's
in STRING. If START is either 5 or 6, it returns zero.

If START is not between zero and SIZE, then `re_match' returns -1.
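
The `a*' example can be written out as a C sketch. It assumes the GNU
interface of `<regex.h>' as shipped with the GNU C library and leaves
`re_syntax_options' at its initial value, under which `*' is already the
repetition operator. The helper name `match_at' is hypothetical.

```c
#define _GNU_SOURCE           /* assumption: exposes the GNU interface */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: compile PATTERN with the current syntax and
   return the result of `re_match' on STRING starting at START, or -2
   if PATTERN cannot be compiled.  */
int
match_at (const char *pattern, const char *string, int start)
{
  struct re_pattern_buffer buf;
  int result;

  assert (pattern != NULL && string != NULL);
  memset (&buf, 0, sizeof buf);

  if (re_compile_pattern (pattern, strlen (pattern), &buf) != NULL)
    return -2;

  /* Passing zero for REGS: we only want the length of the match.  */
  result = (int) re_match (&buf, string, (int) strlen (string), start, NULL);
  regfree (&buf);
  return result;
}
```

Following the text above, `match_at ("a*", "aaaaab", 2)' should return
3, a START of 0 should give 5, a START of 5 should give 0, and a START
outside the string, such as 7, should give -1.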

File: regex.info, Node: GNU Searching, Next: Matching/Searching with Split Data, Prev: GNU Matching, Up: GNU Regex Functions

GNU Searching
-------------

"Searching" means trying to match starting at successive positions
within a string. The function `re_search' does this.

Before calling `re_search', you must compile your regular expression.
*Note GNU Regular Expression Compiling::.

Here is the function declaration:

     int
     re_search (struct re_pattern_buffer *PATTERN_BUFFER,
                const char *STRING, const int SIZE,
                const int START, const int RANGE,
                struct re_registers *REGS)

whose arguments are the same as those to `re_match' (*note GNU
Matching::.) except that the two arguments START and RANGE replace
`re_match''s argument START.

If RANGE is positive, then `re_search' attempts a match starting
first at index START, then at START + 1 if that fails, and so on, up to
START + RANGE; if RANGE is negative, then it attempts a match starting
first at index START, then at START - 1 if that fails, and so on.

If START is not between zero and SIZE, then `re_search' returns -1.
When RANGE is positive, `re_search' adjusts RANGE so that START + RANGE
- 1 is between zero and SIZE, if necessary; that way it won't search
outside of STRING. Similarly, when RANGE is negative, `re_search'
adjusts RANGE so that START + RANGE + 1 is between zero and SIZE, if
necessary.

If the `fastmap' field of PATTERN_BUFFER is zero, `re_search' matches
starting at consecutive positions; otherwise, it uses `fastmap' to make
the search more efficient. *Note Searching with Fastmaps::.

If no match is found, `re_search' returns -1. If a match is found,
it returns the index where the match began. If an internal error
happens, it returns -2.
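
A forward and a backward search could be sketched in C as follows. The
sketch assumes the GNU interface of `<regex.h>' as shipped with the GNU C
library and uses no fastmap. The helper name `search_range' is
hypothetical.

```c
#define _GNU_SOURCE           /* assumption: exposes the GNU interface */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: search STRING for PATTERN beginning at START and
   covering RANGE positions (backwards if RANGE is negative).  Returns
   what `re_search' returns, or -2 if PATTERN cannot be compiled.  */
int
search_range (const char *pattern, const char *string, int start, int range)
{
  struct re_pattern_buffer buf;
  int result;

  assert (pattern != NULL && string != NULL);
  memset (&buf, 0, sizeof buf);   /* `fastmap' is zero: no fastmap */

  if (re_compile_pattern (pattern, strlen (pattern), &buf) != NULL)
    return -2;

  result = (int) re_search (&buf, string, (int) strlen (string),
                            start, range, NULL);
  regfree (&buf);
  return result;
}
```

In the string `abcabc', searching for `abc' forward from index 1 with a
RANGE of 5 should find the second occurrence at index 3. Searching
backward from index 5 with a RANGE of -5 should find the same occurrence
first, since it tries indices 5, 4 and then 3.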

File: regex.info, Node: Matching/Searching with Split Data, Next: Searching with Fastmaps, Prev: GNU Searching, Up: GNU Regex Functions

Matching and Searching with Split Data
--------------------------------------

Using the functions `re_match_2' and `re_search_2', you can match or
search in data that is divided into two strings.

The function:

     int
     re_match_2 (struct re_pattern_buffer *BUFFER,
                 const char *STRING1, const int SIZE1,
                 const char *STRING2, const int SIZE2,
                 const int START,
                 struct re_registers *REGS,
                 const int STOP)

is similar to `re_match' (*note GNU Matching::.) except that you pass
*two* data strings and sizes, and an index STOP beyond which you don't
want the matcher to try matching. As with `re_match', if it succeeds,
`re_match_2' returns how many characters of STRING it matched. Regard
STRING1 and STRING2 as concatenated when you set the arguments START and
STOP and use the contents of REGS; `re_match_2' never returns a value
larger than SIZE1 + SIZE2.

The function:

     int
     re_search_2 (struct re_pattern_buffer *BUFFER,
                  const char *STRING1, const int SIZE1,
                  const char *STRING2, const int SIZE2,
                  const int START, const int RANGE,
                  struct re_registers *REGS,
                  const int STOP)

is similarly related to `re_search'.
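
The following C sketch matches and searches across two strings. It
assumes the GNU interface of `<regex.h>' as shipped with the GNU C
library, and in both calls it sets STOP to the combined length so that
the whole of the data is considered. The helper names `match_split' and
`search_split' are hypothetical.

```c
#define _GNU_SOURCE           /* assumption: exposes the GNU interface */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: match PATTERN at index 0 of the data formed by
   S1 followed by S2.  Returns what `re_match_2' returns, or -2 if
   PATTERN cannot be compiled.  */
int
match_split (const char *pattern, const char *s1, const char *s2)
{
  struct re_pattern_buffer buf;
  int size1, size2, result;

  assert (pattern != NULL && s1 != NULL && s2 != NULL);
  memset (&buf, 0, sizeof buf);
  if (re_compile_pattern (pattern, strlen (pattern), &buf) != NULL)
    return -2;

  size1 = (int) strlen (s1);
  size2 = (int) strlen (s2);
  result = (int) re_match_2 (&buf, s1, size1, s2, size2,
                             0, NULL, size1 + size2);
  regfree (&buf);
  return result;
}

/* Hypothetical helper: search the same split data for PATTERN, trying
   every starting index.  Returns what `re_search_2' returns, or -2.  */
int
search_split (const char *pattern, const char *s1, const char *s2)
{
  struct re_pattern_buffer buf;
  int size1, size2, result;

  assert (pattern != NULL && s1 != NULL && s2 != NULL);
  memset (&buf, 0, sizeof buf);
  if (re_compile_pattern (pattern, strlen (pattern), &buf) != NULL)
    return -2;

  size1 = (int) strlen (s1);
  size2 = (int) strlen (s2);
  result = (int) re_search_2 (&buf, s1, size1, s2, size2,
                              0, size1 + size2, NULL, size1 + size2);
  regfree (&buf);
  return result;
}
```

With the data split as `foo' and `bar', `match_split ("foobar", "foo",
"bar")' should return 6 and `search_split ("oba", "foo", "bar")' should
return 2: the match straddles the boundary, and the index is counted as
if the two strings were concatenated.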

File: regex.info, Node: Searching with Fastmaps, Next: GNU Translate Tables, Prev: Matching/Searching with Split Data, Up: GNU Regex Functions

Searching with Fastmaps
-----------------------

If you're searching through a long string, you should use a fastmap.
Without one, the searcher tries to match at consecutive positions in the
string. Generally, most of the characters in the string could not start
a match. It takes much longer to try matching at a given position in
the string than it does to check in a table whether or not the
character at that position could start a match. A "fastmap" is such a
table.

More specifically, a fastmap is an array indexed by the characters in
your character set. With 8-bit characters (as when ASCII text is stored
in `char'), a fastmap therefore has 256 elements. If you want the
searcher to use a fastmap with a given pattern buffer, you must allocate
the array and assign the array's address to the pattern buffer's
`fastmap' field. You can either compile the fastmap yourself or have
`re_search' do it for you; when `fastmap' is nonzero, `re_search'
automatically compiles a fastmap the first time you search using a
particular compiled pattern.

To compile a fastmap yourself, use:

     int
     re_compile_fastmap (struct re_pattern_buffer *PATTERN_BUFFER)

PATTERN_BUFFER is the address of a pattern buffer. If the character C
could start a match for the pattern, `re_compile_fastmap' makes
`PATTERN_BUFFER->fastmap[C]' nonzero. It returns 0 if it can compile a
fastmap and -2 if there is an internal error. For example, if `|' is
the alternation operator and PATTERN_BUFFER holds the compiled pattern
for `a|b', then `re_compile_fastmap' sets `fastmap['a']' and
`fastmap['b']' (and no others).
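
The `a|b' case can be tried out with a short sketch like the one below.
It assumes a glibc-style `regex.h' that exposes the GNU functions under
`_GNU_SOURCE'. The helper name `build_fastmap' is hypothetical. The
fastmap storage belongs to the caller, so the sketch detaches it from
the pattern buffer before freeing the buffer.

```c
#define _GNU_SOURCE   /* assumption: glibc-style <regex.h> with the GNU functions */
#include <regex.h>
#include <string.h>

/* Compile PATTERN (POSIX extended syntax) and build its fastmap into
   FASTMAP, which must have room for 256 entries.  Returns 0 on
   success, -1 if the pattern does not compile, or whatever nonzero
   value re_compile_fastmap reports.  */
static int
build_fastmap (const char *pattern, char fastmap[256])
{
  struct re_pattern_buffer buf;
  int status;

  memset (&buf, 0, sizeof buf);
  buf.fastmap = fastmap;                      /* caller-owned storage */
  re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);   /* makes `|' the alternation operator */
  if (re_compile_pattern (pattern, strlen (pattern), &buf) != NULL)
    return -1;

  status = re_compile_fastmap (&buf);

  /* Detach the caller's array so that freeing the pattern buffer
     does not try to free it.  */
  buf.fastmap = NULL;
  regfree (&buf);
  return status;
}
```

After a successful call with the pattern `a|b', the entries for `a' and
`b' should be nonzero and the entry for a character such as `c' should
be zero.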

`re_search' uses a fastmap as it moves along in the string: it checks
the string's characters until it finds one that's in the fastmap. Then
it tries matching at that character. If the match fails, it repeats
the process. So, by using a fastmap, `re_search' doesn't waste time
trying to match at positions in the string that couldn't start a match.
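
The scanning loop just described can be imitated in a few lines. The
sketch below is only a toy model and not Regex's implementation: the
"pattern" is a list of non-empty literal alternatives, and the names
`toy_compile_fastmap' and `toy_search' are hypothetical. It counts how
often the expensive matching step runs, to show what the table lookup
saves.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a compiled pattern: ALTS is a NULL-terminated list
   of non-empty literal alternatives.  Mark each possible first
   character in FASTMAP.  */
static void
toy_compile_fastmap (const char *const *alts, char fastmap[256])
{
  memset (fastmap, 0, 256);
  for (; *alts != NULL; alts++)
    fastmap[(unsigned char) (*alts)[0]] = 1;
}

/* Return the index of the first match in TEXT, or -1 if there is none.
   *ATTEMPTS receives the number of positions at which a full match
   was actually tried.  */
static int
toy_search (const char *const *alts, const char *text, int *attempts)
{
  char fastmap[256];
  size_t i, len = strlen (text);

  toy_compile_fastmap (alts, fastmap);
  *attempts = 0;
  for (i = 0; i < len; i++)
    {
      const char *const *a;

      if (!fastmap[(unsigned char) text[i]])
        continue;               /* cheap lookup: no match can start here */
      ++*attempts;              /* expensive step: try matching here */
      for (a = alts; *a != NULL; a++)
        if (strncmp (text + i, *a, strlen (*a)) == 0)
          return (int) i;
    }
  return -1;
}
```

Searching `a nice cow' for the alternatives `cat' and `cow' finds the
match at index 7 after only two matching attempts (at the two `c'
characters); the other positions are rejected by the table alone.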

If you don't want `re_search' to use a fastmap, store zero in the
`fastmap' field of the pattern buffer before calling `re_search'.

Once you've initialized a pattern buffer's `fastmap' field, you need
never do so again--even if you compile a new pattern in it--provided
the way the field is set still reflects whether or not you want a
fastmap. `re_search' will still either do nothing if `fastmap' is null
or, if it isn't, compile a new fastmap for the new pattern.

File: regex.info, Node: GNU Translate Tables, Next: Using Registers, Prev: Searching with Fastmaps, Up: GNU Regex Functions

GNU Translate Tables
--------------------

If you set the `translate' field of a pattern buffer to a translate
table, then the GNU Regex functions to which you've passed that pattern
buffer use it to apply a simple transformation to all the regular
expression and string characters at which they look.

A "translate table" is an array indexed by the characters in your
character set. With 8-bit characters (as when ASCII text is stored in
`char'), a translate table therefore has 256 elements. The array's
elements are also characters in your character set. When the Regex
functions see a character C, they use `translate[C]' in its place, with
one exception: the character after a `\' is not translated. (This
ensures that operators such as `\B' and `\b' are always
distinguishable.)

For example, a table that maps all lowercase letters to the
corresponding uppercase ones would cause the matcher to ignore
differences in case.(1) Such a table would map all characters except
lowercase letters to themselves, and lowercase letters to the
corresponding uppercase ones. Under the ASCII encoding, here's how you
could initialize such a table (we'll call it `case_fold'):

     for (i = 0; i < 256; i++)
       case_fold[i] = i;
     for (i = 'a'; i <= 'z'; i++)
       case_fold[i] = i - ('a' - 'A');
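
For readers who want to try the table on its own, here is a
self-contained version of that initialization together with a small
comparison routine that applies the table the way a matcher would. It
is only a sketch: it relies on the ASCII encoding, as the text above
does, and the names `init_case_fold' and `equal_after_translate' are
hypothetical.

```c
/* Fill TABLE so that it maps lowercase ASCII letters to uppercase and
   every other character to itself.  */
static void
init_case_fold (unsigned char table[256])
{
  int i;

  for (i = 0; i < 256; i++)
    table[i] = (unsigned char) i;
  for (i = 'a'; i <= 'z'; i++)
    table[i] = (unsigned char) (i - ('a' - 'A'));
}

/* Return 1 if S and T are equal once every character has been passed
   through TABLE, 0 otherwise.  */
static int
equal_after_translate (const unsigned char table[256],
                       const char *s, const char *t)
{
  while (*s != '\0' && *t != '\0')
    {
      if (table[(unsigned char) *s] != table[(unsigned char) *t])
        return 0;
      s++;
      t++;
    }
  return *s == *t;   /* equal only if both strings ended together */
}
```

With the table initialized this way, `Regex' and `REGEX' compare equal,
while characters such as digits are left untouched.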

You tell Regex to use a translate table on a given pattern buffer by
assigning that table's address to the `translate' field of that buffer.
If you don't want Regex to do any translation, put zero into this
field. Results are unpredictable if you change the table's contents
at any time between compiling the pattern buffer, compiling its fastmap,
and matching or searching with the pattern buffer.

---------- Footnotes ----------

(1) A table that maps all uppercase letters to the corresponding
lowercase ones would work just as well for this purpose.

File: regex.info, Node: Using Registers, Next: Freeing GNU Pattern Buffers, Prev: GNU Translate Tables, Up: GNU Regex Functions

Using Registers
---------------

A group in a regular expression can match a (possibly empty)
substring of the string that regular expression as a whole matched.
The matcher remembers the beginning and end of the substring matched by
each group.

To find out what they matched, pass a nonzero REGS argument to a GNU
matching or searching function (*note GNU Matching::. and *Note GNU
Searching::), i.e., the address of a structure of this type, as defined
in `regex.h':

     struct re_registers
     {
       unsigned num_regs;
       regoff_t *start;
       regoff_t *end;
     };

Except for (possibly) the NUM_REGS'th element (see below), the Ith
element of the `start' and `end' arrays records information about the
Ith group in the pattern. (They're declared as C pointers, but this is
only because not all C compilers accept zero-length arrays;
conceptually, it is simplest to think of them as arrays.)

The `start' and `end' arrays are allocated in various ways, depending
on the value of the `regs_allocated' field in the pattern buffer passed
to the matcher.

The simplest and perhaps most useful is to let the matcher
(re)allocate enough space to record information for all the groups in
the regular expression. If `regs_allocated' is `REGS_UNALLOCATED', the
matcher allocates 1 + RE_NSUB elements (`re_nsub' is another field in
the pattern buffer; *note GNU Pattern Buffers::.), sets the extra
element to -1, and sets `regs_allocated' to `REGS_REALLOCATE'. Then on
subsequent calls with the same pattern buffer and REGS arguments, the
matcher reallocates more space if necessary.

It would perhaps be more logical to make the `regs_allocated' field
part of the `re_registers' structure, instead of part of the pattern
buffer. But in that case the caller would be forced to initialize the
structure before passing it. Much existing code doesn't do this
initialization, and it's arguably better to avoid it anyway.

`re_compile_pattern' sets `regs_allocated' to `REGS_UNALLOCATED', so
if you use the GNU regular expression functions, you get this behavior
by default.

(The function `re_set_registers' is not yet documented here.)

POSIX, on the other hand, requires a different interface: the caller
is supposed to pass in a fixed-length array which the matcher fills.
Therefore, if `regs_allocated' is `REGS_FIXED' the matcher simply fills
that array.

The following examples illustrate the information recorded in the
`re_registers' structure. (In all of them, `(' represents the
open-group and `)' the close-group operator. The first character in
the string STRING is at index 0.)

   * If the regular expression has an I-th group not contained within
     another group that matches a substring of STRING, then the
     function sets `REGS->start[I]' to the index in STRING where the
     substring matched by the I-th group begins, and `REGS->end[I]' to
     the index just beyond that substring's end. The function sets
     `REGS->start[0]' and `REGS->end[0]' to analogous information about
     the entire pattern.

     For example, when you match `((a)(b))' against `ab', you get:

        * 0 in `REGS->start[0]' and 2 in `REGS->end[0]'
        * 0 in `REGS->start[1]' and 2 in `REGS->end[1]'
        * 0 in `REGS->start[2]' and 1 in `REGS->end[2]'
        * 1 in `REGS->start[3]' and 2 in `REGS->end[3]'

   * If a group matches more than once (as it might if followed by,
     e.g., a repetition operator), then the function reports the
     information about what the group *last* matched.

     For example, when you match the pattern `(a)*' against the string
     `aa', you get:

        * 0 in `REGS->start[0]' and 2 in `REGS->end[0]'
        * 1 in `REGS->start[1]' and 2 in `REGS->end[1]'

   * If the I-th group does not participate in a successful match,
     e.g., it is an alternative not taken or a repetition operator
     allows zero repetitions of it, then the function sets
     `REGS->start[I]' and `REGS->end[I]' to -1.

     For example, when you match the pattern `(a)*b' against the string
     `b', you get:

        * 0 in `REGS->start[0]' and 1 in `REGS->end[0]'
        * -1 in `REGS->start[1]' and -1 in `REGS->end[1]'

   * If the I-th group matches a zero-length string, then the function
     sets `REGS->start[I]' and `REGS->end[I]' to the index just beyond
     that zero-length string.

     For example, when you match the pattern `(a*)b' against the string
     `b', you get:

        * 0 in `REGS->start[0]' and 1 in `REGS->end[0]'
        * 0 in `REGS->start[1]' and 0 in `REGS->end[1]'

   * If an I-th group contains a J-th group in turn not contained
     within any other group within group I and the function reports a
     match of the I-th group, then it records in `REGS->start[J]' and
     `REGS->end[J]' the last match (if it matched) of the J-th group.

     For example, when you match the pattern `((a*)b)*' against the
     string `abb', group 2 last matches the empty string, so you get
     that empty match:

        * 0 in `REGS->start[0]' and 3 in `REGS->end[0]'
        * 2 in `REGS->start[1]' and 3 in `REGS->end[1]'
        * 2 in `REGS->start[2]' and 2 in `REGS->end[2]'

     When you match the pattern `((a)*b)*' against the string `abb',
     group 2 doesn't participate in the last match, so you get what it
     previously matched:

        * 0 in `REGS->start[0]' and 3 in `REGS->end[0]'
        * 2 in `REGS->start[1]' and 3 in `REGS->end[1]'
        * 0 in `REGS->start[2]' and 1 in `REGS->end[2]'

   * If an I-th group contains a J-th group in turn not contained
     within any other group within group I and the function sets
     `REGS->start[I]' and `REGS->end[I]' to -1, then it also sets
     `REGS->start[J]' and `REGS->end[J]' to -1.

     For example, when you match the pattern `((a)*b)*c' against the
     string `c', you get:

        * 0 in `REGS->start[0]' and 1 in `REGS->end[0]'
        * -1 in `REGS->start[1]' and -1 in `REGS->end[1]'
        * -1 in `REGS->start[2]' and -1 in `REGS->end[2]'
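
Several of the simpler cases above can be checked with a sketch along
the following lines. It assumes a glibc-style `regex.h' that declares
the GNU functions under `_GNU_SOURCE', and it lets the matcher allocate
the register arrays itself. The helper name `group_span' is
hypothetical. Other implementations may report different values for the
nested-repetition cases, so the sketch is best tried on the simpler
patterns.

```c
#define _GNU_SOURCE   /* assumption: glibc-style <regex.h> with the GNU functions */
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Search TEXT for PATTERN (POSIX extended syntax) and copy the offsets
   recorded for group GROUP into *START and *END.  Returns 0 if the
   search succeeds and GROUP is within the registers, -1 otherwise.  */
static int
group_span (const char *pattern, const char *text, unsigned group,
            int *start, int *end)
{
  struct re_pattern_buffer buf;
  struct re_registers regs;
  int len = (int) strlen (text);
  int status = -1;

  memset (&buf, 0, sizeof buf);
  memset (&regs, 0, sizeof regs);
  re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);   /* `(' and `)' group */
  if (re_compile_pattern (pattern, strlen (pattern), &buf) != NULL)
    return -1;

  /* The buffer starts out with regs_allocated == REGS_UNALLOCATED, so
     the matcher allocates regs.start and regs.end on this call.  */
  if (re_search (&buf, text, len, 0, len, &regs) >= 0
      && group < regs.num_regs)
    {
      *start = (int) regs.start[group];
      *end = (int) regs.end[group];
      status = 0;
    }

  free (regs.start);   /* the arrays were allocated by the matcher */
  free (regs.end);
  regfree (&buf);
  return status;
}
```

For instance, asking for group 3 of `((a)(b))' matched against `ab'
should yield 1 and 2, and group 1 of `(a)*b' matched against `b' should
yield -1 and -1.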

File: regex.info, Node: Freeing GNU Pattern Buffers, Prev: Using Registers, Up: GNU Regex Functions

Freeing GNU Pattern Buffers
---------------------------

To free any allocated fields of a pattern buffer, you can use the
POSIX function described in *Note Freeing POSIX Pattern Buffers::,
since the type `regex_t'--the type for POSIX pattern buffers--is
equivalent to the type `re_pattern_buffer'. After freeing a pattern
buffer, you need to again compile a regular expression in it (*note GNU
Regular Expression Compiling::.) before passing it to a matching or
searching function.

File: regex.info, Node: POSIX Regex Functions, Next: BSD Regex Functions, Prev: GNU Regex Functions, Up: Programming with Regex

POSIX Regex Functions
=====================

If you're writing code that has to be POSIX compatible, you'll need
to use these functions. Their interfaces are as specified by POSIX.

* Menu:

* POSIX Pattern Buffers::               The regex_t type.
* POSIX Regular Expression Compiling::  regcomp ()
* POSIX Matching::                      regexec ()
* Reporting Errors::                    regerror ()
* Using Byte Offsets::                  The regmatch_t type.
* Freeing POSIX Pattern Buffers::       regfree ()

File: regex.info, Node: POSIX Pattern Buffers, Next: POSIX Regular Expression Compiling, Up: POSIX Regex Functions

POSIX Pattern Buffers
---------------------

To compile or match a given regular expression the POSIX way, you
must supply a pattern buffer exactly the way you do for GNU (*note GNU
Pattern Buffers::.). POSIX pattern buffers have type `regex_t', which
is equivalent to the GNU pattern buffer type `re_pattern_buffer'.

File: regex.info, Node: POSIX Regular Expression Compiling, Next: POSIX Matching, Prev: POSIX Pattern Buffers, Up: POSIX Regex Functions

POSIX Regular Expression Compiling
----------------------------------

With POSIX, you can only search for a given regular expression; you
can't match it. To do this, you must first compile it in a pattern
buffer, using `regcomp'.

To compile a pattern buffer, use:

     int
     regcomp (regex_t *PREG, const char *REGEX, int CFLAGS)

PREG is the initialized pattern buffer's address, REGEX is the regular
expression's address, and CFLAGS holds the compilation flags, which
Regex considers as a collection of bits. Here are the valid bits, as
defined in `regex.h':

`REG_EXTENDED'
     says to use POSIX Extended Regular Expression syntax; if this isn't
     set, then says to use POSIX Basic Regular Expression syntax.
     `regcomp' sets PREG's `syntax' field accordingly.

`REG_ICASE'
     says to ignore case; `regcomp' sets PREG's `translate' field to a
     translate table which ignores case, replacing anything you've put
     there.

`REG_NOSUB'
     says to set PREG's `no_sub' field; *note POSIX Matching::., for
     what this means.

`REG_NEWLINE'
     says that a:

        * match-any-character operator (*note Match-any-character
          Operator::.) doesn't match a newline.

        * nonmatching list not containing a newline (*note List
          Operators::.) doesn't match a newline.

        * match-beginning-of-line operator (*note
          Match-beginning-of-line Operator::.) matches the empty string
          immediately after a newline, regardless of how `REG_NOTBOL'
          is set (*note POSIX Matching::., for an explanation of
          `REG_NOTBOL').

        * match-end-of-line operator (*note Match-end-of-line
          Operator::.) matches the empty string immediately before a
          newline, regardless of how `REG_NOTEOL' is set (*note POSIX
          Matching::., for an explanation of `REG_NOTEOL').

If `regcomp' successfully compiles the regular expression, it returns
zero and sets `*PREG' to the compiled pattern. Except for
`syntax' (which it sets as explained above), it also sets the same
fields the same way as does the GNU compiling function (*note GNU
Regular Expression Compiling::.).
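
To see the effect of the compilation flags, a small helper like the one
below can be used. It is a sketch rather than a recommended interface:
the name `posix_matches' is hypothetical, and it always adds
`REG_NOSUB' because it only reports whether a match exists.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if PATTERN, compiled with CFLAGS, matches somewhere in
   TEXT; 0 if it does not; -1 if the pattern fails to compile.  */
static int
posix_matches (const char *pattern, const char *text, int cflags)
{
  regex_t preg;
  int status;

  /* REG_NOSUB: no byte offsets are wanted, so NMATCH and PMATCH
     can be zero and null below.  */
  if (regcomp (&preg, pattern, cflags | REG_NOSUB) != 0)
    return -1;
  status = regexec (&preg, text, 0, NULL, 0);
  regfree (&preg);
  return status == 0;
}
```

For example, the pattern `regex' matches `GNU Regex' only when
`REG_ICASE' is given, and `^two' matches the two-line string
`one\ntwo' only when `REG_NEWLINE' is given.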

If `regcomp' can't compile the regular expression, it returns one of
the error codes listed here. (Except when noted differently, the
syntax in all examples below is basic regular expression syntax.)

`REG_BADRPT'
     For example, the consecutive repetition operators `**' in `a**'
     are invalid. As another example, if the syntax is extended
     regular expression syntax, then the repetition operator `*' with
     nothing on which to operate in `*' is invalid.

`REG_BADBR'
     For example, the COUNT `-1' in `a\{-1' is invalid.

`REG_EBRACE'
     For example, `a\{1' is missing a close-interval operator.

`REG_EBRACK'
     For example, `[a' is missing a close-list operator.

`REG_ERANGE'
     For example, the range ending point `z' that collates lower than
     does its starting point `a' in `[z-a]' is invalid. Also, a
     range with the character class `[:alpha:]' as its starting point
     is invalid.

`REG_ECTYPE'
     For example, the character class name `foo' in `[[:foo:]' is
     invalid.

`REG_EPAREN'
     For example, `a\)' is missing an open-group operator and `\(a' is
     missing a close-group operator.

`REG_ESUBREG'
     For example, the back reference `\2' that refers to a nonexistent
     subexpression in `\(a\)\2' is invalid.

`REG_EEND'
     Returned when a regular expression causes no other more specific
     error.

`REG_EESCAPE'
     For example, the trailing backslash `\' in `a\' is invalid.

`REG_BADPAT'
     For example, in the extended regular expression syntax, the empty
     group `()' in `a()b' is invalid.

`REG_ESIZE'
     Returned when a regular expression needs a pattern buffer larger
     than Regex allows.

`REG_ESPACE'
     Returned when a regular expression makes Regex run out of
     memory.

File: regex.info, Node: POSIX Matching, Next: Reporting Errors, Prev: POSIX Regular Expression Compiling, Up: POSIX Regex Functions

POSIX Matching
--------------

Matching the POSIX way means trying to match a null-terminated string
starting at its first character. Once you've compiled a pattern into a
pattern buffer (*note POSIX Regular Expression Compiling::.), you can
ask the matcher to match that pattern against a string using:

     int
     regexec (const regex_t *PREG, const char *STRING,
              size_t NMATCH, regmatch_t PMATCH[], int EFLAGS)

PREG is the address of a pattern buffer for a compiled pattern. STRING
is the string you want to match.

*Note Using Byte Offsets::, for an explanation of PMATCH. If you
pass zero for NMATCH or you compiled PREG with the compilation flag
`REG_NOSUB' set, then `regexec' will ignore PMATCH; otherwise, you must
allocate it to have at least NMATCH elements. `regexec' will record
NMATCH byte offsets in PMATCH, and set to -1 any unused elements up to
`PMATCH[NMATCH - 1]'.

EFLAGS specifies "execution flags"--namely, the two bits `REG_NOTBOL'
and `REG_NOTEOL' (defined in `regex.h'). If you set `REG_NOTBOL', then
the match-beginning-of-line operator (*note Match-beginning-of-line
Operator::.) always fails to match. This lets you match against pieces
of a line, as you would need to if, say, searching for repeated
instances of a given pattern in a line; it would work correctly for
patterns both with and without match-beginning-of-line operators.
`REG_NOTEOL' works analogously for the match-end-of-line operator
(*note Match-end-of-line Operator::.); it exists for symmetry.

`regexec' tries to find a match for PREG in STRING according to the
syntax in PREG's `syntax' field. (*Note POSIX Regular Expression
Compiling::, for how to set it.) The function returns zero if the
compiled pattern matches STRING and `REG_NOMATCH' (defined in
`regex.h') if it doesn't.
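
The use of `REG_NOTBOL' for finding repeated instances of a pattern in
a line might look like the following sketch. The function name
`count_matches' is hypothetical, and the way it steps past empty
matches is just one reasonable choice. After the first match it
restarts `regexec' on the remainder of the line and sets `REG_NOTBOL'
so that a match-beginning-of-line operator cannot match in the middle
of the line.

```c
#include <regex.h>
#include <stddef.h>

/* Count the non-overlapping matches of PATTERN (POSIX extended
   syntax) in LINE.  Returns -1 if the pattern fails to compile.  */
static int
count_matches (const char *pattern, const char *line)
{
  regex_t preg;
  regmatch_t m[1];
  const char *p = line;
  int eflags = 0;
  int count = 0;

  if (regcomp (&preg, pattern, REG_EXTENDED) != 0)
    return -1;

  while (regexec (&preg, p, 1, m, eflags) == 0)
    {
      count++;
      if (m[0].rm_eo > m[0].rm_so)
        p += m[0].rm_eo;             /* resume just past the match */
      else if (p[m[0].rm_eo] == '\0')
        break;                       /* empty match at end of line */
      else
        p += m[0].rm_eo + 1;         /* step over one character */
      eflags = REG_NOTBOL;           /* P no longer starts the line */
    }

  regfree (&preg);
  return count;
}
```

With this helper, `a' is found three times in `aaa', but `^a' is found
only once, because the later calls pass `REG_NOTBOL'.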

File: regex.info, Node: Reporting Errors, Next: Using Byte Offsets, Prev: POSIX Matching, Up: POSIX Regex Functions

Reporting Errors
----------------

If either `regcomp' or `regexec' fails, it returns a nonzero error
code, the possibilities for which are defined in `regex.h'. *Note
POSIX Regular Expression Compiling::, and *Note POSIX Matching::, for
what these codes mean. To get an error string corresponding to these
codes, you can use:

     size_t
     regerror (int ERRCODE,
               const regex_t *PREG,
               char *ERRBUF,
               size_t ERRBUF_SIZE)

ERRCODE is an error code, PREG is the address of the pattern buffer
which provoked the error, ERRBUF is the error buffer, and ERRBUF_SIZE
is its size.

`regerror' returns the size in bytes of the error string
corresponding to ERRCODE (including its terminating null). If ERRBUF
and ERRBUF_SIZE are nonzero, it also returns in ERRBUF the first
ERRBUF_SIZE - 1 characters of the error string, followed by a null.
ERRBUF_SIZE must be a nonnegative number less than or equal to the size
in bytes of ERRBUF.

You can call `regerror' with a null ERRBUF and a zero ERRBUF_SIZE to
determine how large ERRBUF need be to accommodate `regerror''s error
string.
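
The two-call pattern just described (first ask for the size, then fetch
the string) can be sketched as follows. The helper name
`compile_error_message' is hypothetical. The wording of the message
itself depends on the implementation, so the sketch does not rely on
it.

```c
#include <regex.h>
#include <stdlib.h>

/* Compile PATTERN as a POSIX basic regular expression and store
   regcomp's result in *CODE.  On success, return NULL.  On failure,
   return a malloc'd copy of the error string (the caller frees it),
   or NULL if memory is exhausted.  */
static char *
compile_error_message (const char *pattern, int *code)
{
  regex_t preg;
  size_t needed;
  char *msg;

  *code = regcomp (&preg, pattern, 0);
  if (*code == 0)
    {
      regfree (&preg);
      return NULL;
    }

  /* First call: null ERRBUF and zero ERRBUF_SIZE, to learn the size
     including the terminating null.  */
  needed = regerror (*code, &preg, NULL, 0);
  msg = malloc (needed);
  if (msg != NULL)
    regerror (*code, &preg, msg, needed);   /* second call fills MSG */
  return msg;
}
```

For the pattern `[a', `*CODE' should come back as `REG_EBRACK' and the
returned string should be nonempty.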

File: regex.info, Node: Using Byte Offsets, Next: Freeing POSIX Pattern Buffers, Prev: Reporting Errors, Up: POSIX Regex Functions

Using Byte Offsets
------------------

In POSIX, variables of type `regmatch_t' hold information analogous
to, but not identical to, GNU's registers (*note Using Registers::.).
To get information about registers in POSIX, pass to `regexec' a
nonzero PMATCH of type `regmatch_t', i.e., the address of a structure
of this type, defined in `regex.h':

     typedef struct
     {
       regoff_t rm_so;
       regoff_t rm_eo;
     } regmatch_t;

When reading in *Note Using Registers::, about how the matching
function stores the information into the registers, substitute PMATCH
for REGS, `PMATCH[I].rm_so' for `REGS->start[I]' and
`PMATCH[I].rm_eo' for `REGS->end[I]'.
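
As an illustration of that substitution, the sketch below runs
`regexec' with a small fixed-size PMATCH array. The limit `MAX_GROUPS'
and the helper name `match_offsets' are hypothetical choices made for
this example only.

```c
#include <regex.h>
#include <stddef.h>

#define MAX_GROUPS 8   /* illustrative limit, not part of the interface */

/* Match PATTERN (POSIX extended syntax) against TEXT and record up to
   MAX_GROUPS pairs of byte offsets in PMATCH.  Returns regexec's
   result, or -1 if the pattern fails to compile.  */
static int
match_offsets (const char *pattern, const char *text,
               regmatch_t pmatch[MAX_GROUPS])
{
  regex_t preg;
  int status;

  if (regcomp (&preg, pattern, REG_EXTENDED) != 0)
    return -1;
  status = regexec (&preg, text, MAX_GROUPS, pmatch, 0);
  regfree (&preg);
  return status;
}
```

Matching `((a)(b))' against `ab' fills `pmatch[0]' through `pmatch[3]'
with the same offsets listed for the GNU registers, and sets the unused
elements, such as `pmatch[4]', to -1.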

File: regex.info, Node: Freeing POSIX Pattern Buffers, Prev: Using Byte Offsets, Up: POSIX Regex Functions

Freeing POSIX Pattern Buffers
-----------------------------

To free any allocated fields of a pattern buffer, use:

     void
     regfree (regex_t *PREG)

PREG is the pattern buffer whose allocated fields you want freed.
`regfree' also sets PREG's `allocated' and `used' fields to zero.
After freeing a pattern buffer, you need to again compile a regular
expression in it (*note POSIX Regular Expression Compiling::.) before
passing it to the matching function (*note POSIX Matching::.).

File: regex.info, Node: BSD Regex Functions, Prev: POSIX Regex Functions, Up: Programming with Regex

BSD Regex Functions
===================

If you're writing code that has to be Berkeley UNIX compatible,
you'll need to use these functions, whose interfaces are the same as
those in Berkeley UNIX.

* Menu:

* BSD Regular Expression Compiling::    re_comp ()
* BSD Searching::                       re_exec ()

File: regex.info, Node: BSD Regular Expression Compiling, Next: BSD Searching, Up: BSD Regex Functions

BSD Regular Expression Compiling
--------------------------------

With Berkeley UNIX, you can only search for a given regular
expression; you can't match one. To search for it, you must first
compile it. Before you compile it, you must indicate the regular
expression syntax you want it compiled according to by setting the
variable `re_syntax_options' (declared in `regex.h') to some syntax
(*note Regular Expression Syntax::.).

To compile a regular expression use:

     char *
     re_comp (char *REGEX)

REGEX is the address of a null-terminated regular expression.
`re_comp' uses an internal pattern buffer, so you can use only the most
recently compiled pattern buffer. This means that if you want to use a
given regular expression that you've already compiled--but it isn't the
latest one you've compiled--you'll have to recompile it. If you call
`re_comp' with a null pointer (*not* the empty string) as the
argument, it doesn't change the contents of the pattern buffer.

If `re_comp' successfully compiles the regular expression, it returns
zero. If it can't compile the regular expression, it returns an error
string. `re_comp''s error messages are identical to those of
`re_compile_pattern' (*note GNU Regular Expression Compiling::.).

File: regex.info, Node: BSD Searching, Prev: BSD Regular Expression Compiling, Up: BSD Regex Functions

BSD Searching
-------------

Searching the Berkeley UNIX way means searching in a string starting
at its first character and trying successive positions within it to
find a match. Once you've compiled a pattern using `re_comp' (*note
BSD Regular Expression Compiling::.), you can ask Regex to search for
that pattern in a string using:

     int
     re_exec (char *STRING)

STRING is the address of the null-terminated string in which you want
to search.

`re_exec' returns either 1 for success or 0 for failure. It
automatically uses a GNU fastmap (*note Searching with Fastmaps::.).

File: regex.info, Node: Copying, Next: Index, Prev: Programming with Regex, Up: Top

GNU GENERAL PUBLIC LICENSE
**************************

     Version 2, June 1991

     Copyright (C) 1989, 1991 Free Software Foundation, Inc.
     675 Mass Ave, Cambridge, MA 02139, USA

     Everyone is permitted to copy and distribute verbatim copies
     of this license document, but changing it is not allowed.

Preamble
========

The licenses for most software are designed to take away your freedom
to share and change it. By contrast, the GNU General Public License is
intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Library General Public License instead.) You can apply it to
your programs, too.

When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it in
new free programs; and that you know you can do these things.

To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.

For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.

We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.

Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.

Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.

The precise terms and conditions for copying, distribution and
modification follow.

TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

  1. This License applies to any program or other work which contains a
     notice placed by the copyright holder saying it may be distributed
     under the terms of this General Public License. The "Program",
     below, refers to any such program or work, and a "work based on
     the Program" means either the Program or any derivative work under
     copyright law: that is to say, a work containing the Program or a
     portion of it, either verbatim or with modifications and/or
     translated into another language. (Hereinafter, translation is
     included without limitation in the term "modification".) Each
     licensee is addressed as "you".

     Activities other than copying, distribution and modification are
     not covered by this License; they are outside its scope. The act
     of running the Program is not restricted, and the output from the
     Program is covered only if its contents constitute a work based on
     the Program (independent of having been made by running the
     Program). Whether that is true depends on what the Program does.

  2. You may copy and distribute verbatim copies of the Program's
     source code as you receive it, in any medium, provided that you
     conspicuously and appropriately publish on each copy an appropriate
     copyright notice and disclaimer of warranty; keep intact all the
     notices that refer to this License and to the absence of any
     warranty; and give any other recipients of the Program a copy of
     this License along with the Program.

     You may charge a fee for the physical act of transferring a copy,
     and you may at your option offer warranty protection in exchange
     for a fee.

  3. You may modify your copy or copies of the Program or any portion
     of it, thus forming a work based on the Program, and copy and
     distribute such modifications or work under the terms of Section 1
     above, provided that you also meet all of these conditions:

       a. You must cause the modified files to carry prominent notices
          stating that you changed the files and the date of any change.

       b. You must cause any work that you distribute or publish, that
          in whole or in part contains or is derived from the Program
          or any part thereof, to be licensed as a whole at no charge
          to all third parties under the terms of this License.

       c. If the modified program normally reads commands interactively
          when run, you must cause it, when started running for such
          interactive use in the most ordinary way, to print or display
          an announcement including an appropriate copyright notice and
          a notice that there is no warranty (or else, saying that you
          provide a warranty) and that users may redistribute the
          program under these conditions, and telling the user how to
          view a copy of this License. (Exception: if the Program
          itself is interactive but does not normally print such an
          announcement, your work based on the Program is not required
          to print an announcement.)

     These requirements apply to the modified work as a whole. If
     identifiable sections of that work are not derived from the
     Program, and can be reasonably considered independent and separate
     works in themselves, then this License, and its terms, do not
     apply to those sections when you distribute them as separate
     works. But when you distribute the same sections as part of a
     whole which is a work based on the Program, the distribution of
     the whole must be on the terms of this License, whose permissions
     for other licensees extend to the entire whole, and thus to each
     and every part regardless of who wrote it.

     Thus, it is not the intent of this section to claim rights or
     contest your rights to work written entirely by you; rather, the
     intent is to exercise the right to control the distribution of
     derivative or collective works based on the Program.

     In addition, mere aggregation of another work not based on the
     Program with the Program (or with a work based on the Program) on
     a volume of a storage or distribution medium does not bring the
     other work under the scope of this License.

  4. You may copy and distribute the Program (or a work based on it,
     under Section 2) in object code or executable form under the terms
     of Sections 1 and 2 above provided that you also do one of the
     following:

       a. Accompany it with the complete corresponding machine-readable
          source code, which must be distributed under the terms of
          Sections 1 and 2 above on a medium customarily used for
          software interchange; or,

       b. Accompany it with a written offer, valid for at least three
          years, to give any third party, for a charge no more than your
          cost of physically performing source distribution, a complete
          machine-readable copy of the corresponding source code, to be
          distributed under the terms of Sections 1 and 2 above on a
          medium customarily used for software interchange; or,

       c. Accompany it with the information you received as to the offer
          to distribute corresponding source code. (This alternative is
          allowed only for noncommercial distribution and only if you
          received the program in object code or executable form with
          such an offer, in accord with Subsection b above.)

     The source code for a work means the preferred form of the work for
     making modifications to it. For an executable work, complete
     source code means all the source code for all modules it contains,
     plus any associated interface definition files, plus the scripts
     used to control compilation and installation of the executable.
     However, as a special exception, the source code distributed need
     not include anything that is normally distributed (in either
     source or binary form) with the major components (compiler,
     kernel, and so on) of the operating system on which the executable
     runs, unless that component itself accompanies the executable.

     If distribution of executable or object code is made by offering
     access to copy from a designated place, then offering equivalent
     access to copy the source code from the same place counts as
     distribution of the source code, even though third parties are not
     compelled to copy the source along with the object code.
2457
5. You may not copy, modify, sublicense, or distribute the Program
2458
except as expressly provided under this License. Any attempt
2459
otherwise to copy, modify, sublicense or distribute the Program is
2460
void, and will automatically terminate your rights under this
2461
License. However, parties who have received copies, or rights,
2462
from you under this License will not have their licenses
2463
terminated so long as such parties remain in full compliance.
6. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify
or distribute the Program or its derivative works. These actions
are prohibited by law if you do not accept this License.
Therefore, by modifying or distributing the Program (or any work
based on the Program), you indicate your acceptance of this
License to do so, and all its terms and conditions for copying,
distributing or modifying the Program or works based on it.

7. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program
subject to these terms and conditions. You may not impose any
further restrictions on the recipients' exercise of the rights
granted herein. You are not responsible for enforcing compliance
by third parties to this License.

8. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent
issues), conditions are imposed on you (whether by court order,
agreement or otherwise) that contradict the conditions of this
License, they do not excuse you from the conditions of this
License. If you cannot distribute so as to satisfy simultaneously
your obligations under this License and any other pertinent
obligations, then as a consequence you may not distribute the
Program at all. For example, if a patent license would not permit
royalty-free redistribution of the Program by all those who
receive copies directly or indirectly through you, then the only
way you could satisfy both it and this License would be to refrain
entirely from distribution of the Program.

If any portion of this section is held invalid or unenforceable
under any particular circumstance, the balance of the section is
intended to apply and the section as a whole is intended to apply
in other circumstances.

It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of
any such claims; this section has the sole purpose of protecting
the integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is
willing to distribute software through any other system and a
licensee cannot impose that choice.

This section is intended to make thoroughly clear what is believed
to be a consequence of the rest of this License.

9. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces,
the original copyright holder who places the Program under this
License may add an explicit geographical distribution limitation
excluding those countries, so that distribution is permitted only
in or among countries not thus excluded. In such case, this
License incorporates the limitation as if written in the body of
this License.

10. The Free Software Foundation may publish revised and/or new
versions of the General Public License from time to time. Such
new versions will be similar in spirit to the present version, but
may differ in detail to address new problems or concerns.

Each version is given a distinguishing version number. If the
Program specifies a version number of this License which applies
to it and "any later version", you have the option of following
the terms and conditions either of that version or of any later
version published by the Free Software Foundation. If the Program
does not specify a version number of this License, you may choose
any version ever published by the Free Software Foundation.

11. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the
author to ask for permission. For software which is copyrighted
by the Free Software Foundation, write to the Free Software
Foundation; we sometimes make exceptions for this. Our decision
will be guided by the two goals of preserving the free status of
all derivatives of our free software and of promoting the sharing
and reuse of software generally.

NO WARRANTY

12. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO
WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE
LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT
WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT
NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE
QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY
SERVICING, REPAIR OR CORRECTION.

13. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY
MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE
LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL,
INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR
INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU
OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY
OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN
ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

END OF TERMS AND CONDITIONS

Appendix: How to Apply These Terms to Your New Programs
=======================================================

If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these
terms.

To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.

     ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES.
     Copyright (C) 19YY NAME OF AUTHOR

     This program is free software; you can redistribute it and/or modify
     it under the terms of the GNU General Public License as published by
     the Free Software Foundation; either version 2 of the License, or
     (at your option) any later version.

     This program is distributed in the hope that it will be useful,
     but WITHOUT ANY WARRANTY; without even the implied warranty of
     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
     GNU General Public License for more details.

     You should have received a copy of the GNU General Public License
     along with this program; if not, write to the Free Software
     Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Also add information on how to contact you by electronic and paper
mail.

If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:

     Gnomovision version 69, Copyright (C) 19YY NAME OF AUTHOR
     Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
     This is free software, and you are welcome to redistribute it
     under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the
appropriate parts of the General Public License. Of course, the
commands you use may be called something other than `show w' and `show
c'; they could even be mouse-clicks or menu items--whatever suits your
program.

You should also get your employer (if you work as a programmer) or
your school, if any, to sign a "copyright disclaimer" for the program,
if necessary. Here is a sample; alter the names:

     Yoyodyne, Inc., hereby disclaims all copyright interest in the program
     `Gnomovision' (which makes passes at compilers) written by James Hacker.

     SIGNATURE OF TY COON, 1 April 1989
     Ty Coon, President of Vice

This General Public License does not permit incorporating your
program into proprietary programs. If your program is a subroutine
library, you may consider it more useful to permit linking proprietary
applications with the library. If this is what you want to do, use the
GNU Library General Public License instead of this License.

File: regex.info, Node: Index, Prev: Copying, Up: Top

Index
*****

* Menu:

* $: Match-end-of-line Operator.
* (: Grouping Operators.
* ): Grouping Operators.
* *: Match-zero-or-more Operator.
* +: Match-one-or-more Operator.
* -: List Operators.
* .: Match-any-character Operator.
* :] in regex: Character Class Operators.
* ?: Match-zero-or-one Operator.
* {: Interval Operators.
* }: Interval Operators.
* [: in regex: Character Class Operators.
* [^: List Operators.
* [: List Operators.
* \': Match-end-of-buffer Operator.
* \<: Match-beginning-of-word Operator.
* \>: Match-end-of-word Operator.
* \{: Interval Operators.
* \}: Interval Operators.
* \b: Match-word-boundary Operator.
* \B: Match-within-word Operator.
* \s: Match-syntactic-class Operator.
* \S: Match-not-syntactic-class Operator.
* \w: Match-word-constituent Operator.
* \W: Match-non-word-constituent Operator.
* \`: Match-beginning-of-buffer Operator.
* \: List Operators.
* ]: List Operators.
* ^: List Operators.
* allocated initialization: GNU Regular Expression Compiling.
* alternation operator: Alternation Operator.
* alternation operator and ^: Match-beginning-of-line Operator.
* anchoring: Anchoring Operators.
* anchors: Match-end-of-line Operator.
* anchors: Match-beginning-of-line Operator.
* Awk: Predefined Syntaxes.
* back references: Back-reference Operator.
* backtracking: Match-zero-or-more Operator.
* backtracking: Alternation Operator.
* beginning-of-line operator: Match-beginning-of-line Operator.
* bracket expression: List Operators.
* buffer field, set by re_compile_pattern: GNU Regular Expression Compiling.
* buffer initialization: GNU Regular Expression Compiling.
* character classes: Character Class Operators.
* Egrep: Predefined Syntaxes.
* Emacs: Predefined Syntaxes.
* end in struct re_registers: Using Registers.
* end-of-line operator: Match-end-of-line Operator.
* fastmap initialization: GNU Regular Expression Compiling.
* fastmaps: Searching with Fastmaps.
* fastmap_accurate field, set by re_compile_pattern: GNU Regular Expression Compiling.
* Grep: Predefined Syntaxes.
* grouping: Grouping Operators.
* ignoring case: POSIX Regular Expression Compiling.
* interval expression: Interval Operators.
* matching list: List Operators.
* matching newline: List Operators.
* matching with GNU functions: GNU Matching.
* newline_anchor field in pattern buffer: Match-beginning-of-line Operator.
* nonmatching list: List Operators.
* not_bol field in pattern buffer: Match-beginning-of-line Operator.
* num_regs in struct re_registers: Using Registers.
* open-group operator and ^: Match-beginning-of-line Operator.
* or operator: Alternation Operator.
* parenthesizing: Grouping Operators.
* pattern buffer initialization: GNU Regular Expression Compiling.
* pattern buffer, definition of: GNU Pattern Buffers.
* POSIX Awk: Predefined Syntaxes.
* range argument to re_search: GNU Searching.
* regex.c: Overview.
* regex.h: Overview.
* regexp anchoring: Anchoring Operators.
* regmatch_t: Using Byte Offsets.
* regs_allocated: Using Registers.
* REGS_FIXED: Using Registers.
* REGS_REALLOCATE: Using Registers.
* REGS_UNALLOCATED: Using Registers.
* regular expressions, syntax of: Regular Expression Syntax.
* REG_EXTENDED: POSIX Regular Expression Compiling.
* REG_ICASE: POSIX Regular Expression Compiling.
* REG_NEWLINE: POSIX Regular Expression Compiling.
* REG_NOSUB: POSIX Regular Expression Compiling.
* RE_BACKSLASH_ESCAPE_IN_LIST: Syntax Bits.
* RE_BK_PLUS_QM: Syntax Bits.
* RE_CHAR_CLASSES: Syntax Bits.
* RE_CONTEXT_INDEP_ANCHORS: Syntax Bits.
* RE_CONTEXT_INDEP_ANCHORS (and ^): Match-beginning-of-line Operator.
* RE_CONTEXT_INDEP_OPS: Syntax Bits.
* RE_CONTEXT_INVALID_OPS: Syntax Bits.
* RE_DOT_NEWLINE: Syntax Bits.
* RE_DOT_NOT_NULL: Syntax Bits.
* RE_INTERVALS: Syntax Bits.
* RE_LIMITED_OPS: Syntax Bits.
* RE_NEWLINE_ALT: Syntax Bits.
* RE_NO_BK_BRACES: Syntax Bits.
* RE_NO_BK_PARENS: Syntax Bits.
* RE_NO_BK_REFS: Syntax Bits.
* RE_NO_BK_VBAR: Syntax Bits.
* RE_NO_EMPTY_RANGES: Syntax Bits.
* re_nsub field, set by re_compile_pattern: GNU Regular Expression Compiling.
* re_pattern_buffer definition: GNU Pattern Buffers.
* re_registers: Using Registers.
* re_syntax_options initialization: GNU Regular Expression Compiling.
* RE_UNMATCHED_RIGHT_PAREN_ORD: Syntax Bits.
* searching with GNU functions: GNU Searching.
* start argument to re_search: GNU Searching.
* start in struct re_registers: Using Registers.
* struct re_pattern_buffer definition: GNU Pattern Buffers.
* subexpressions: Grouping Operators.
* syntax field, set by re_compile_pattern: GNU Regular Expression Compiling.
* syntax bits: Syntax Bits.
* syntax initialization: GNU Regular Expression Compiling.
* syntax of regular expressions: Regular Expression Syntax.
* translate initialization: GNU Regular Expression Compiling.
* used field, set by re_compile_pattern: GNU Regular Expression Compiling.
* word boundaries, matching: Match-word-boundary Operator.
* \: The Backslash Character.
* \(: Grouping Operators.
* \): Grouping Operators.
* \|: Alternation Operator.
* ^: Match-beginning-of-line Operator.
* |: Alternation Operator.