@setfilename regex.info

@c \\{fill-paragraph} works better (for me, anyway) if the text in the
@c source file isn't indented.

@c Define a new index for our magic constants.

@c Put everything in one index (arbitrarily chosen to be the concept index).

@c Here is what we use in the Info `dir' file:
@c * Regex: (regex). Regular expression library.
This file documents the GNU regular expression library.

Copyright (C) 1992, 1993 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to process this file through TeX and print the
results, provided the printed document carries a copying permission
notice identical to this one except for the removal of this paragraph
(this paragraph not being relevant to the printed manual).

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that the
section entitled ``GNU General Public License'' is included exactly as
in the original, and provided that the entire resulting derived work is
distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that the section entitled ``GNU General Public License'' may be
included in a translation approved by the Free Software Foundation
instead of in the original English.
@subtitle edition 0.12a
@subtitle 19 September 1992
@author Kathryn A. Hargreaves

@vskip 0pt plus 1filll
Copyright @copyright{} 1992 Free Software Foundation.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that the
section entitled ``GNU General Public License'' is included exactly as
in the original, and provided that the entire resulting derived work is
distributed under the terms of a permission notice identical to this
one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that the section entitled ``GNU General Public License'' may be
included in a translation approved by the Free Software Foundation
instead of in the original English.
@node Top, Overview, (dir), (dir)
@top Regular Expression Library

This manual documents how to program with the GNU regular expression
library. This is edition 0.12a of the manual, 19 September 1992.

The first part of this master menu lists the major nodes in this Info
document, including the index. The rest of the menu lists all the
lower level nodes in the document.
@menu
* Overview::
* Regular Expression Syntax::
* Common Operators::
* GNU Operators::
* GNU Emacs Operators::
* What Gets Matched?::
* Programming with Regex::
* Copying::                     Copying and sharing Regex.
* Index::                       General index.
 --- The Detailed Node Listing ---

Regular Expression Syntax

* Syntax Bits::
* Predefined Syntaxes::
* Collating Elements vs. Characters::
* The Backslash Character::

Common Operators

* Match-self Operator::                 Ordinary characters.
* Match-any-character Operator::        .
* Concatenation Operator::              Juxtaposition.
* Repetition Operators::                * + ? @{@}
* Alternation Operator::                |
* List Operators::                      [...] [^...]
* Grouping Operators::                  (...)
* Back-reference Operator::             \digit
* Anchoring Operators::                 ^ $

Repetition Operators

* Match-zero-or-more Operator::         *
* Match-one-or-more Operator::          +
* Match-zero-or-one Operator::          ?
* Interval Operators::                  @{@}

List Operators (@code{[} @dots{} @code{]} and @code{[^} @dots{} @code{]})

* Character Class Operators::           [:class:]
* Range Operator::                      start-end

Anchoring Operators

* Match-beginning-of-line Operator::    ^
* Match-end-of-line Operator::          $

GNU Operators

* Word Operators::
* Buffer Operators::

Word Operators

* Non-Emacs Syntax Tables::
* Match-word-boundary Operator::        \b
* Match-within-word Operator::          \B
* Match-beginning-of-word Operator::    \<
* Match-end-of-word Operator::          \>
* Match-word-constituent Operator::     \w
* Match-non-word-constituent Operator:: \W

Buffer Operators

* Match-beginning-of-buffer Operator::  \`
* Match-end-of-buffer Operator::        \'

GNU Emacs Operators

* Syntactic Class Operators::

Syntactic Class Operators

* Emacs Syntax Tables::
* Match-syntactic-class Operator::      \sCLASS
* Match-not-syntactic-class Operator::  \SCLASS

Programming with Regex

* GNU Regex Functions::
* POSIX Regex Functions::
* BSD Regex Functions::

GNU Regex Functions

* GNU Pattern Buffers::                 The re_pattern_buffer type.
* GNU Regular Expression Compiling::    re_compile_pattern ()
* GNU Matching::                        re_match ()
* GNU Searching::                       re_search ()
* Matching/Searching with Split Data::  re_match_2 (), re_search_2 ()
* Searching with Fastmaps::             re_compile_fastmap ()
* GNU Translate Tables::                The `translate' field.
* Using Registers::                     The re_registers type and related fns.
* Freeing GNU Pattern Buffers::         regfree ()

POSIX Regex Functions

* POSIX Pattern Buffers::               The regex_t type.
* POSIX Regular Expression Compiling::  regcomp ()
* POSIX Matching::                      regexec ()
* Reporting Errors::                    regerror ()
* Using Byte Offsets::                  The regmatch_t type.
* Freeing POSIX Pattern Buffers::       regfree ()

BSD Regex Functions

* BSD Regular Expression Compiling::    re_comp ()
* BSD Searching::                       re_exec ()
@end menu

@node Overview, Regular Expression Syntax, Top, Top
@chapter Overview

A @dfn{regular expression} (or @dfn{regexp}, or @dfn{pattern}) is a text
string that describes some (mathematical) set of strings. A regexp
@var{r} @dfn{matches} a string @var{s} if @var{s} is in the set of
strings described by @var{r}.

Using the Regex library, you can:

@itemize @bullet

@item
see if a string matches a specified pattern as a whole, and

@item
search within a string for a substring matching a specified pattern.

@end itemize

Some regular expressions match only one string, i.e., the set they
describe has only one member. For example, the regular expression
@samp{foo} matches the string @samp{foo} and no others. Other regular
expressions match more than one string, i.e., the set they describe has
more than one member. For example, the regular expression @samp{f*}
matches the set of strings made up of any number (including zero) of
@samp{f}s. As you can see, some characters in regular expressions match
themselves (such as @samp{f}) and some don't (such as @samp{*}); the
ones that don't match themselves instead let you specify patterns that
describe many different strings.

To either match or search for a regular expression with the Regex
library functions, you must first compile it with a Regex pattern
compiling function. A @dfn{compiled pattern} is a regular expression
converted to the internal format used by the library functions. Once
you've compiled a pattern, you can use it for matching or searching any
number of times.
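
The compile-once, use-many-times pattern might look like the following
sketch. It assumes the @sc{posix} group of functions (@code{regcomp},
@code{regexec}, @code{regfree}) and the @code{REG_EXTENDED} and
@code{REG_NOSUB} flags as found in a typical @file{regex.h}; the helper
name @code{count_matching} is hypothetical and exists only for
illustration.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile PATTERN once, then count how many of the
   N strings in TEXTS contain a substring that matches it.
   Returns -1 if PATTERN does not compile.  */
int count_matching(const char *pattern, const char *const *texts, size_t n)
{
    regex_t compiled;
    size_t i;
    int count = 0;

    /* Compile a single time; REG_NOSUB says we only want yes/no answers.  */
    if (regcomp(&compiled, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* Reuse the same compiled pattern for every search.  */
    for (i = 0; i < n; i++)
        if (regexec(&compiled, texts[i], 0, NULL, 0) == 0)
            count++;

    regfree(&compiled);
    return count;
}
```

Since @samp{f*} also matches the empty string, a call such as
@code{count_matching ("f*", texts, n)} counts every string in
@code{texts}.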

The Regex library consists of two source files: @file{regex.h} and
@file{regex.c}.
Regex provides three groups of functions with which you can operate on
regular expressions. One group---the @sc{gnu} group---is more powerful
but not completely compatible with the other two, namely the @sc{posix}
and Berkeley @sc{unix} groups; its interface was designed specifically
for @sc{gnu}. The other groups have the same interfaces as do the
regular expression functions in @sc{posix} and Berkeley
@sc{unix}.

We wrote this chapter with programmers in mind, not users of
programs---such as Emacs---that use Regex. We describe the Regex
library in its entirety, not how to write regular expressions that a
particular program understands.


@node Regular Expression Syntax, Common Operators, Overview, Top
@chapter Regular Expression Syntax

@cindex regular expressions, syntax of
@cindex syntax of regular expressions

@dfn{Characters} are things you can type. @dfn{Operators} are things in
a regular expression that match one or more characters. You compose
regular expressions from operators, which in turn you specify using one
or more characters.

Most characters represent what we call the match-self operator, i.e.,
they match themselves; we call these characters @dfn{ordinary}. Other
characters represent either all or parts of fancier operators; e.g.,
@samp{.} represents what we call the match-any-character operator
(which, no surprise, matches (almost) any character); we call these
characters @dfn{special}. Two different things determine what
characters represent what operators:

@enumerate
@item
the regular expression syntax your program has told the Regex library to
recognize, and

@item
the context of the character in the regular expression.
@end enumerate

In the following sections, we describe these things in more detail.

@menu
* Syntax Bits::
* Predefined Syntaxes::
* Collating Elements vs. Characters::
* The Backslash Character::
@end menu


@node Syntax Bits, Predefined Syntaxes, , Regular Expression Syntax
@section Syntax Bits

In any particular syntax for regular expressions, some characters are
always special, others are sometimes special, and others are never
special. The particular syntax that Regex recognizes for a given
regular expression depends on the value in the @code{syntax} field of
the pattern buffer of that regular expression.

You get a pattern buffer by compiling a regular expression. @xref{GNU
Pattern Buffers}, and @ref{POSIX Pattern Buffers}, for more information
on pattern buffers. @xref{GNU Regular Expression Compiling}, @ref{POSIX
Regular Expression Compiling}, and @ref{BSD Regular Expression
Compiling}, for more information on compiling.

Regex considers the value of the @code{syntax} field to be a collection
of bits; we refer to these bits as @dfn{syntax bits}. In most cases,
they affect what characters represent what operators. We describe the
meanings of the operators to which we refer in @ref{Common Operators},
@ref{GNU Operators}, and @ref{GNU Emacs Operators}.

For reference, here is the complete list of syntax bits, in alphabetical
order:

@table @code

@cnindex RE_BACKSLASH_ESCAPE_IN_LISTS
@item RE_BACKSLASH_ESCAPE_IN_LISTS
If this bit is set, then @samp{\} inside a list (@pxref{List Operators})
quotes (makes ordinary, if it's special) the following character; if
this bit isn't set, then @samp{\} is an ordinary character inside lists.
(@xref{The Backslash Character}, for what @samp{\} does outside of lists.)

@cnindex RE_BK_PLUS_QM
@item RE_BK_PLUS_QM
If this bit is set, then @samp{\+} represents the match-one-or-more
operator and @samp{\?} represents the match-zero-or-one operator; if
this bit isn't set, then @samp{+} represents the match-one-or-more
operator and @samp{?} represents the match-zero-or-one operator. This
bit is irrelevant if @code{RE_LIMITED_OPS} is set.

@cnindex RE_CHAR_CLASSES
@item RE_CHAR_CLASSES
If this bit is set, then you can use character classes in lists; if this
bit isn't set, then you can't.

@cnindex RE_CONTEXT_INDEP_ANCHORS
@item RE_CONTEXT_INDEP_ANCHORS
If this bit is set, then @samp{^} and @samp{$} are special anywhere outside
a list; if this bit isn't set, then these characters are special only in
certain contexts. @xref{Match-beginning-of-line Operator}, and
@ref{Match-end-of-line Operator}.

@cnindex RE_CONTEXT_INDEP_OPS
@item RE_CONTEXT_INDEP_OPS
If this bit is set, then certain characters are special anywhere outside
a list; if this bit isn't set, then those characters are special only in
some contexts and are ordinary elsewhere. Specifically, if this bit
isn't set, then @samp{*}, and (if the syntax bit @code{RE_LIMITED_OPS}
isn't set) @samp{+} and @samp{?} (or @samp{\+} and @samp{\?}, depending
on the syntax bit @code{RE_BK_PLUS_QM}) represent repetition operators
only if they're not first in a regular expression or just after an
open-group or alternation operator. The same holds for @samp{@{} (or
@samp{\@{}, depending on the syntax bit @code{RE_NO_BK_BRACES}) if
it is the beginning of a valid interval and the syntax bit
@code{RE_INTERVALS} is set.

@cnindex RE_CONTEXT_INVALID_OPS
@item RE_CONTEXT_INVALID_OPS
If this bit is set, then repetition and alternation operators can't be
in certain positions within a regular expression. Specifically, the
regular expression is invalid if it has:

@itemize @bullet

@item
a repetition operator first in the regular expression or just after a
match-beginning-of-line, open-group, or alternation operator; or

@item
an alternation operator first or last in the regular expression, just
before a match-end-of-line operator, or just after an alternation or
open-group operator.

@end itemize

If this bit isn't set, then you can put the characters representing the
repetition and alternation operators anywhere in a regular expression.
Whether or not they will in fact be operators in certain positions
depends on other syntax bits.

@cnindex RE_DOT_NEWLINE
@item RE_DOT_NEWLINE
If this bit is set, then the match-any-character operator matches
a newline; if this bit isn't set, then it doesn't.

@cnindex RE_DOT_NOT_NULL
@item RE_DOT_NOT_NULL
If this bit is set, then the match-any-character operator doesn't match
a null character; if this bit isn't set, then it does.

@cnindex RE_INTERVALS
@item RE_INTERVALS
If this bit is set, then Regex recognizes interval operators; if this bit
isn't set, then it doesn't.

@cnindex RE_LIMITED_OPS
@item RE_LIMITED_OPS
If this bit is set, then Regex doesn't recognize the match-one-or-more,
match-zero-or-one or alternation operators; if this bit isn't set, then
it does.

@cnindex RE_NEWLINE_ALT
@item RE_NEWLINE_ALT
If this bit is set, then newline represents the alternation operator; if
this bit isn't set, then newline is ordinary.

@cnindex RE_NO_BK_BRACES
@item RE_NO_BK_BRACES
If this bit is set, then @samp{@{} represents the open-interval operator
and @samp{@}} represents the close-interval operator; if this bit isn't
set, then @samp{\@{} represents the open-interval operator and
@samp{\@}} represents the close-interval operator. This bit is relevant
only if @code{RE_INTERVALS} is set.

@cnindex RE_NO_BK_PARENS
@item RE_NO_BK_PARENS
If this bit is set, then @samp{(} represents the open-group operator and
@samp{)} represents the close-group operator; if this bit isn't set, then
@samp{\(} represents the open-group operator and @samp{\)} represents
the close-group operator.

@cnindex RE_NO_BK_REFS
@item RE_NO_BK_REFS
If this bit is set, then Regex doesn't recognize @samp{\}@var{digit} as
the back reference operator; if this bit isn't set, then it does.

@cnindex RE_NO_BK_VBAR
@item RE_NO_BK_VBAR
If this bit is set, then @samp{|} represents the alternation operator;
if this bit isn't set, then @samp{\|} represents the alternation
operator. This bit is irrelevant if @code{RE_LIMITED_OPS} is set.

@cnindex RE_NO_EMPTY_RANGES
@item RE_NO_EMPTY_RANGES
If this bit is set, then a regular expression with a range whose ending
point collates lower than its starting point is invalid; if this bit
isn't set, then Regex considers such a range to be empty.

@cnindex RE_UNMATCHED_RIGHT_PAREN_ORD
@item RE_UNMATCHED_RIGHT_PAREN_ORD
If this bit is set and the regular expression has no matching open-group
operator, then Regex considers what would otherwise be a close-group
operator (based on how @code{RE_NO_BK_PARENS} is set) to match @samp{)}.

@end table
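
To see individual syntax bits at work, one could compile the same
pattern under different bit combinations, as in the following sketch.
It assumes a C library (such as the @sc{gnu} C library) whose
@file{regex.h} exposes the @sc{gnu} entry points
@code{re_set_syntax}, @code{re_compile_pattern} and @code{re_match}
once @code{_GNU_SOURCE} is defined; @code{re_set_syntax} is not
described in this part of the manual, and the helper
@code{match_with_syntax} and the constant @code{PATTERN_INVALID} are
hypothetical names.

```c
#define _GNU_SOURCE             /* assumption: needed for the GNU interface */
#include <regex.h>
#include <string.h>

#define PATTERN_INVALID (-100)  /* hypothetical error value for this sketch */

/* Hypothetical helper: compile PATTERN under the syntax bits in SYNTAX and
   try to match it at the start of TEXT.  Returns the number of characters
   matched, -1 if there is no match, or PATTERN_INVALID if the pattern
   does not compile.  */
int match_with_syntax(reg_syntax_t syntax, const char *pattern,
                      const char *text)
{
    struct re_pattern_buffer buf;
    reg_syntax_t saved;
    const char *error;
    int result;

    memset(&buf, 0, sizeof buf);    /* no buffer, fastmap or translate table */
    saved = re_set_syntax(syntax);  /* returns the previous setting */
    error = re_compile_pattern(pattern, strlen(pattern), &buf);
    re_set_syntax(saved);           /* restore the global syntax */
    if (error != NULL)
        return PATTERN_INVALID;

    result = (int) re_match(&buf, text, (int) strlen(text), 0, NULL);
    regfree(&buf);
    return result;
}
```

With @code{RE_NO_BK_PARENS} set, @samp{(a)} should group and match the
one-character string @samp{a}; with no bits set, the same pattern
should match only the literal three-character string @samp{(a)}.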


@node Predefined Syntaxes, Collating Elements vs. Characters, Syntax Bits, Regular Expression Syntax
@section Predefined Syntaxes

If you're programming with Regex, you can set a pattern buffer's
(@pxref{GNU Pattern Buffers}, and @ref{POSIX Pattern Buffers})
@code{syntax} field either to an arbitrary combination of syntax bits
(@pxref{Syntax Bits}) or else to one of the configurations defined by Regex.
These configurations define the syntaxes used by certain
programs---@sc{gnu} Emacs, @sc{posix} Awk, traditional Awk, Grep, and
Egrep---in addition to syntaxes for @sc{posix} basic and extended
regular expressions.

The predefined syntaxes---taken directly from @file{regex.h}---are:

@example
#define RE_SYNTAX_EMACS 0

#define RE_SYNTAX_AWK \
  (RE_BACKSLASH_ESCAPE_IN_LISTS | RE_DOT_NOT_NULL \
   | RE_NO_BK_PARENS | RE_NO_BK_REFS \
   | RE_NO_BK_VBAR | RE_NO_EMPTY_RANGES \
   | RE_UNMATCHED_RIGHT_PAREN_ORD)

#define RE_SYNTAX_POSIX_AWK \
  (RE_SYNTAX_POSIX_EXTENDED | RE_BACKSLASH_ESCAPE_IN_LISTS)

#define RE_SYNTAX_GREP \
  (RE_BK_PLUS_QM | RE_CHAR_CLASSES \
   | RE_HAT_LISTS_NOT_NEWLINE | RE_INTERVALS \
   | RE_NEWLINE_ALT)

#define RE_SYNTAX_EGREP \
  (RE_CHAR_CLASSES | RE_CONTEXT_INDEP_ANCHORS \
   | RE_CONTEXT_INDEP_OPS | RE_HAT_LISTS_NOT_NEWLINE \
   | RE_NEWLINE_ALT | RE_NO_BK_PARENS \
   | RE_NO_BK_VBAR)

#define RE_SYNTAX_POSIX_EGREP \
  (RE_SYNTAX_EGREP | RE_INTERVALS | RE_NO_BK_BRACES)

/* P1003.2/D11.2, section 4.20.7.1, lines 5078ff. */
#define RE_SYNTAX_ED RE_SYNTAX_POSIX_BASIC

#define RE_SYNTAX_SED RE_SYNTAX_POSIX_BASIC

/* Syntax bits common to both basic and extended POSIX regex syntax. */
#define _RE_SYNTAX_POSIX_COMMON \
  (RE_CHAR_CLASSES | RE_DOT_NEWLINE | RE_DOT_NOT_NULL \
   | RE_INTERVALS | RE_NO_EMPTY_RANGES)

#define RE_SYNTAX_POSIX_BASIC \
  (_RE_SYNTAX_POSIX_COMMON | RE_BK_PLUS_QM)

/* Differs from ..._POSIX_BASIC only in that RE_BK_PLUS_QM becomes
   RE_LIMITED_OPS, i.e., \? \+ \| are not recognized. Actually, this
   isn't minimal, since other operators, such as \`, aren't disabled. */
#define RE_SYNTAX_POSIX_MINIMAL_BASIC \
  (_RE_SYNTAX_POSIX_COMMON | RE_LIMITED_OPS)

#define RE_SYNTAX_POSIX_EXTENDED \
  (_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS \
   | RE_CONTEXT_INDEP_OPS | RE_NO_BK_BRACES \
   | RE_NO_BK_PARENS | RE_NO_BK_VBAR \
   | RE_UNMATCHED_RIGHT_PAREN_ORD)

/* Differs from ..._POSIX_EXTENDED in that RE_CONTEXT_INVALID_OPS
   replaces RE_CONTEXT_INDEP_OPS and RE_NO_BK_REFS is added. */
#define RE_SYNTAX_POSIX_MINIMAL_EXTENDED \
  (_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS \
   | RE_CONTEXT_INVALID_OPS | RE_NO_BK_BRACES \
   | RE_NO_BK_PARENS | RE_NO_BK_REFS \
   | RE_NO_BK_VBAR | RE_UNMATCHED_RIGHT_PAREN_ORD)
@end example
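
The practical difference between the @sc{posix} basic and extended
syntaxes can be observed through the @sc{posix} functions, as in this
sketch. It assumes that @code{regcomp} selects the extended syntax when
given the @code{REG_EXTENDED} flag and the basic syntax otherwise; the
helper name @code{posix_search} is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if PATTERN, compiled with CFLAGS, matches
   somewhere in TEXT; 0 if it does not; -1 if PATTERN is invalid.  */
int posix_search(const char *pattern, int cflags, const char *text)
{
    regex_t compiled;
    int status;

    if (regcomp(&compiled, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    status = regexec(&compiled, text, 0, NULL, 0);
    regfree(&compiled);
    return status == 0;
}
```

Under the extended syntax (where @code{RE_NO_BK_BRACES} and
@code{RE_NO_BK_PARENS} are set), @samp{a@{2@}} is an interval and
@samp{(a|b)c} contains a group; under the basic syntax the same
characters are ordinary, and you would write @samp{a\@{2\@}} instead.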


@node Collating Elements vs. Characters, The Backslash Character, Predefined Syntaxes, Regular Expression Syntax
@section Collating Elements vs.@: Characters

@sc{posix} generalizes the notion of a character to that of a
collating element. It defines a @dfn{collating element} to be ``a
sequence of one or more bytes defined in the current collating sequence
as a unit of collation.''

This generalizes the notion of a character in
two ways. First, a single character can map into two or more collating
elements. For example, the German @samp{@ss{}} (``es-zet'')
collates as the collating element @samp{s} followed by another collating
element @samp{s}. Second, two or more characters can map into one
collating element. For example, the Spanish @samp{ll} collates after
@samp{l} and before @samp{m}.

Since @sc{posix}'s ``collating element'' preserves the essential idea of
a ``character,'' we use the latter, more familiar, term in this document.

@node The Backslash Character, , Collating Elements vs. Characters, Regular Expression Syntax
@section The Backslash Character

The @samp{\} character has one of four different meanings, depending on
the context in which you use it and what syntax bits are set
(@pxref{Syntax Bits}). It can: 1) stand for itself, 2) quote the next
character, 3) introduce an operator, or 4) do nothing.

@enumerate
@item
It stands for itself inside a list
(@pxref{List Operators}) if the syntax bit
@code{RE_BACKSLASH_ESCAPE_IN_LISTS} is not set. For example, @samp{[\]}
would match @samp{\}.

@item
It quotes (makes ordinary, if it's special) the next character when you
use it either:

@itemize @bullet
@item
outside a list,@footnote{Sometimes
you don't have to explicitly quote special characters to make
them ordinary. For instance, most characters lose any special meaning
inside a list (@pxref{List Operators}). In addition, if the syntax bits
@code{RE_CONTEXT_INVALID_OPS} and @code{RE_CONTEXT_INDEP_OPS}
aren't set, then (for historical reasons) the matcher considers special
characters ordinary if they are in contexts where the operations they
represent make no sense; for example, the match-zero-or-more
operator (represented by @samp{*}) then matches itself in the regular
expression @samp{*foo} because there is no preceding expression on which
it can operate. It is poor practice, however, to depend on this
behavior; if you want a special character to be ordinary outside a list,
it's better to always quote it, regardless.} or

@item
inside a list and the syntax bit @code{RE_BACKSLASH_ESCAPE_IN_LISTS} is set.

@end itemize

@item
It introduces an operator when followed by certain ordinary
characters---sometimes only when certain syntax bits are set. See the
cases @code{RE_BK_PLUS_QM}, @code{RE_NO_BK_BRACES}, @code{RE_NO_BK_VBAR},
@code{RE_NO_BK_PARENS}, @code{RE_NO_BK_REFS} in @ref{Syntax Bits}. Also:

@itemize @bullet
@item
@samp{\b} represents the match-word-boundary operator
(@pxref{Match-word-boundary Operator}).

@item
@samp{\B} represents the match-within-word operator
(@pxref{Match-within-word Operator}).

@item
@samp{\<} represents the match-beginning-of-word operator @*
(@pxref{Match-beginning-of-word Operator}).

@item
@samp{\>} represents the match-end-of-word operator
(@pxref{Match-end-of-word Operator}).

@item
@samp{\w} represents the match-word-constituent operator
(@pxref{Match-word-constituent Operator}).

@item
@samp{\W} represents the match-non-word-constituent operator
(@pxref{Match-non-word-constituent Operator}).

@item
@samp{\`} represents the match-beginning-of-buffer
operator and @samp{\'} represents the match-end-of-buffer operator
(@pxref{Buffer Operators}).

@item
If Regex was compiled with the C preprocessor symbol @code{emacs}
defined, then @samp{\s@var{class}} represents the match-syntactic-class
operator and @samp{\S@var{class}} represents the
match-not-syntactic-class operator (@pxref{Syntactic Class Operators}).

@end itemize

@item
In all other cases, Regex ignores @samp{\}. For example,
@samp{\n} matches @samp{n}.

@end enumerate
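
The first two meanings can be checked with a short sketch like the one
below. It assumes the @sc{posix} functions with the @code{REG_EXTENDED}
flag, i.e., a syntax in which @code{RE_BACKSLASH_ESCAPE_IN_LISTS} is not
set; the helper name @code{matches_whole} is hypothetical. Remember that
each backslash meant for Regex must be doubled inside a C string
literal.

```c
#include <regex.h>

/* Hypothetical helper: return 1 if PATTERN matches TEXT as a whole,
   0 if it does not, -1 if PATTERN is invalid.  */
int matches_whole(const char *pattern, const char *text)
{
    regex_t compiled;
    regmatch_t m;
    int whole;

    if (regcomp(&compiled, pattern, REG_EXTENDED) != 0)
        return -1;
    /* The match is "whole" if it starts at offset 0 and ends at the
       terminating null of TEXT.  */
    whole = regexec(&compiled, text, 1, &m, 0) == 0
            && m.rm_so == 0 && text[m.rm_eo] == '\0';
    regfree(&compiled);
    return whole;
}
```

Here @code{matches_whole ("a\\.b", "axb")} should be 0 because the
quoted @samp{.} is ordinary, while @code{matches_whole ("[\\]", "\\")}
should be 1 because @samp{\} stands for itself inside the list.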


@node Common Operators, GNU Operators, Regular Expression Syntax, Top
@chapter Common Operators

You compose regular expressions from operators. In the following
sections, we describe the regular expression operators specified by
@sc{posix}; @sc{gnu} also uses these. Most operators have more than one
representation as characters. @xref{Regular Expression Syntax}, for
what characters represent what operators under what circumstances.

For most operators that can be represented in two ways, one
representation is a single character and the other is that character
preceded by @samp{\}. For example, either @samp{(} or @samp{\(}
represents the open-group operator. Which one does depends on the
setting of a syntax bit, in this case @code{RE_NO_BK_PARENS}. Why is
this so? Historical reasons dictate some of the varying
representations, while @sc{posix} dictates others.

Finally, almost all characters lose any special meaning inside a list
(@pxref{List Operators}).

@menu
* Match-self Operator::                 Ordinary characters.
* Match-any-character Operator::        .
* Concatenation Operator::              Juxtaposition.
* Repetition Operators::                * + ? @{@}
* Alternation Operator::                |
* List Operators::                      [...] [^...]
* Grouping Operators::                  (...)
* Back-reference Operator::             \digit
* Anchoring Operators::                 ^ $
@end menu


@node Match-self Operator, Match-any-character Operator, , Common Operators
@section The Match-self Operator (@var{ordinary character})

This operator matches the character itself. All ordinary characters
(@pxref{Regular Expression Syntax}) represent this operator. For
example, @samp{f} is always an ordinary character, so the regular
expression @samp{f} matches only the string @samp{f}. In
particular, it does @emph{not} match the string @samp{ff}.

@node Match-any-character Operator, Concatenation Operator, Match-self Operator, Common Operators
@section The Match-any-character Operator (@code{.})

This operator matches any single printing or nonprinting character
except it won't match a:

@table @asis
@item newline
if the syntax bit @code{RE_DOT_NEWLINE} isn't set.

@item null
if the syntax bit @code{RE_DOT_NOT_NULL} is set.

@end table

The @samp{.} (period) character represents this operator. For example,
@samp{a.b} matches any three-character string beginning with @samp{a}
and ending with @samp{b}.
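
The newline exception can be observed with the @sc{posix} functions, as
in the following sketch. It assumes that @code{regcomp} by default uses
a syntax with @code{RE_DOT_NEWLINE} set and that its
@code{REG_NEWLINE} flag (not described in this chapter) turns that bit
off; the helper name @code{dot_matches_whole} is hypothetical.

```c
#include <regex.h>

/* Hypothetical helper: return 1 if PATTERN, compiled with the extended
   syntax plus EXTRA_FLAGS, matches TEXT as a whole; 0 if it does not;
   -1 if PATTERN is invalid.  */
int dot_matches_whole(const char *pattern, int extra_flags, const char *text)
{
    regex_t compiled;
    regmatch_t m;
    int whole;

    if (regcomp(&compiled, pattern, REG_EXTENDED | extra_flags) != 0)
        return -1;
    whole = regexec(&compiled, text, 1, &m, 0) == 0
            && m.rm_so == 0 && text[m.rm_eo] == '\0';
    regfree(&compiled);
    return whole;
}
```

For instance, @samp{a.b} should match the three-character string made
of @samp{a}, a newline and @samp{b} when no extra flag is given, but
not when @code{REG_NEWLINE} is passed.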

@node Concatenation Operator, Repetition Operators, Match-any-character Operator, Common Operators
@section The Concatenation Operator

This operator concatenates two regular expressions @var{a} and @var{b}.
No character represents this operator; you simply put @var{b} after
@var{a}. The result is a regular expression that will match a string if
@var{a} matches its first part and @var{b} matches the rest. For
example, @samp{xy} (two match-self operators) matches @samp{xy}.

@node Repetition Operators, Alternation Operator, Concatenation Operator, Common Operators
@section Repetition Operators

Repetition operators repeat the preceding regular expression a specified
number of times.

@menu
* Match-zero-or-more Operator::         *
* Match-one-or-more Operator::          +
* Match-zero-or-one Operator::          ?
* Interval Operators::                  @{@}
@end menu


@node Match-zero-or-more Operator, Match-one-or-more Operator, , Repetition Operators
@subsection The Match-zero-or-more Operator (@code{*})

This operator repeats the smallest possible preceding regular expression
as many times as necessary (including zero) to match the pattern.
@samp{*} represents this operator. For example, @samp{o*}
matches any string made up of zero or more @samp{o}s. Since this
operator operates on the smallest preceding regular expression,
@samp{fo*} has a repeating @samp{o}, not a repeating @samp{fo}. So,
@samp{fo*} matches @samp{f}, @samp{fo}, @samp{foo}, and so on.

Since the match-zero-or-more operator is a suffix operator, it may be
useless as such when no regular expression precedes it. This is the
case when it:

@itemize @bullet
@item
is first in a regular expression, or

@item
follows a match-beginning-of-line, open-group, or alternation
operator.

@end itemize

@noindent
Three different things can happen in these cases:

@enumerate
@item
If the syntax bit @code{RE_CONTEXT_INVALID_OPS} is set, then the
regular expression is invalid.

@item
If @code{RE_CONTEXT_INVALID_OPS} isn't set, but
@code{RE_CONTEXT_INDEP_OPS} is, then @samp{*} represents the
match-zero-or-more operator (which then operates on the empty string).

@item
Otherwise, @samp{*} is ordinary.

@end enumerate

The matcher processes a match-zero-or-more operator by first matching as
many repetitions of the smallest preceding regular expression as it can.
Then it continues to match the rest of the pattern.

If it can't match the rest of the pattern, it backtracks (as many times
as necessary), each time discarding one of the matches until it can
either match the entire pattern or be certain that it cannot get a
match. For example, when matching @samp{ca*ar} against @samp{caaar},
the matcher first matches all three @samp{a}s of the string with the
@samp{a*} of the regular expression. However, it cannot then match the
final @samp{ar} of the regular expression against the final @samp{r} of
the string. So it backtracks, discarding the match of the last @samp{a}
in the string. It can then match the remaining @samp{ar}.
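
The backtracking just described can be imitated by a toy routine for
the special case of one repeated character followed by a literal tail.
This is only a sketch of the idea, not how Regex itself is implemented;
the function name @code{match_star_then_tail} and its interface are
hypothetical.

```c
#include <string.h>

/* Toy matcher for a pattern of the form REP* followed by the literal
   string TAIL, matched against the whole of TEXT.  Returns the number of
   REP characters the star finally keeps, or -1 if there is no match.
   *DISCARDS receives the number of repetitions that were given back.  */
int match_star_then_tail(char rep, const char *tail, const char *text,
                         int *discards)
{
    size_t taken = 0;

    *discards = 0;

    /* First match as many repetitions as possible.  */
    while (text[taken] == rep)
        taken++;

    for (;;) {
        /* Try to match the rest of the pattern.  */
        if (strcmp(text + taken, tail) == 0)
            return (int) taken;
        if (taken == 0)
            return -1;          /* nothing left to discard */
        /* Backtrack: discard one repetition and try again.  */
        taken--;
        (*discards)++;
    }
}
```

For the @samp{a*ar} part of @samp{ca*ar} and the remainder
@samp{aaar} of the string @samp{caaar}, the routine first takes three
@samp{a}s, discards one, and succeeds with two.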


@node Match-one-or-more Operator, Match-zero-or-one Operator, Match-zero-or-more Operator, Repetition Operators
@subsection The Match-one-or-more Operator (@code{+} or @code{\+})

If the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't recognize
this operator. Otherwise, if the syntax bit @code{RE_BK_PLUS_QM} isn't
set, then @samp{+} represents this operator; if it is, then @samp{\+}
does.

This operator is similar to the match-zero-or-more operator except that
it repeats the preceding regular expression at least once;
see @ref{Match-zero-or-more Operator}, for what it operates on, how some
syntax bits affect it, and how Regex backtracks to match it.

For example, supposing that @samp{+} represents the match-one-or-more
operator, @samp{ca+r} matches, e.g., @samp{car} and
@samp{caaaar}, but not @samp{cr}.

@node Match-zero-or-one Operator, Interval Operators, Match-one-or-more Operator, Repetition Operators
@subsection The Match-zero-or-one Operator (@code{?} or @code{\?})

If the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't
recognize this operator. Otherwise, if the syntax bit
@code{RE_BK_PLUS_QM} isn't set, then @samp{?} represents this operator;
if it is, then @samp{\?} does.

This operator is similar to the match-zero-or-more operator except that
it repeats the preceding regular expression once or not at all;
see @ref{Match-zero-or-more Operator}, to see what it operates on, how
some syntax bits affect it, and how Regex backtracks to match it.

For example, supposing that @samp{?} represents the match-zero-or-one
operator, @samp{ca?r} matches both @samp{car} and @samp{cr}, but
nothing else.
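
The three simple repetition operators can be compared side by side with
a sketch such as the following. It assumes the @sc{posix} functions
with the @code{REG_EXTENDED} flag, under which @samp{+} and @samp{?}
need no backslash; the helper name @code{count_whole_matches} is
hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count how many of the N strings in CANDIDATES
   are matched as a whole by PATTERN.  Returns -1 if PATTERN is invalid.  */
int count_whole_matches(const char *pattern,
                        const char *const *candidates, size_t n)
{
    regex_t compiled;
    regmatch_t m;
    size_t i;
    int count = 0;

    if (regcomp(&compiled, pattern, REG_EXTENDED) != 0)
        return -1;
    for (i = 0; i < n; i++)
        if (regexec(&compiled, candidates[i], 1, &m, 0) == 0
            && m.rm_so == 0 && candidates[i][m.rm_eo] == '\0')
            count++;
    regfree(&compiled);
    return count;
}
```

Given the candidates @samp{cr}, @samp{car}, @samp{caar} and
@samp{caaaar}, we would expect @samp{ca*r} to match all four,
@samp{ca+r} all but @samp{cr}, and @samp{ca?r} only @samp{cr} and
@samp{car}.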

@node Interval Operators, , Match-zero-or-one Operator, Repetition Operators
@subsection Interval Operators (@code{@{} @dots{} @code{@}} or @code{\@{} @dots{} @code{\@}})

@cindex interval expression

If the syntax bit @code{RE_INTERVALS} is set, then Regex recognizes
@dfn{interval expressions}. They repeat the smallest possible preceding
regular expression a specified number of times.

If the syntax bit @code{RE_NO_BK_BRACES} is set, @samp{@{} represents
the @dfn{open-interval operator} and @samp{@}} represents the
@dfn{close-interval operator}; otherwise, @samp{\@{} and @samp{\@}} do.

Specifically, supposing that @samp{@{} and @samp{@}} represent the
open-interval and close-interval operators; then:

@table @code
@item @{@var{count}@}
matches exactly @var{count} occurrences of the preceding regular
expression.

@item @{@var{min},@}
matches @var{min} or more occurrences of the preceding regular
expression.

@item @{@var{min},@var{max}@}
matches at least @var{min} but no more than @var{max} occurrences of
the preceding regular expression.

@end table

The interval expression (but not necessarily the regular expression that
contains it) is invalid if:

@itemize @bullet
@item
@var{min} is greater than @var{max}, or

@item
any of @var{count}, @var{min}, or @var{max} is outside the range
zero to @code{RE_DUP_MAX} (which symbol @file{regex.h}
defines).

@end itemize

If the interval expression is invalid and the syntax bit
@code{RE_NO_BK_BRACES} is set, then Regex considers all the
characters in the would-be interval to be ordinary. If that bit
isn't set, then the regular expression is invalid.

If the interval expression is valid but there is no preceding regular
expression on which to operate, then if the syntax bit
@code{RE_CONTEXT_INVALID_OPS} is set, the regular expression is invalid.
If that bit isn't set, then Regex considers all the characters---other
than backslashes, which it ignores---in the would-be interval to be
ordinary.
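
The three interval forms can be exercised with a sketch like the one
below. It assumes the @sc{posix} functions with the
@code{REG_EXTENDED} flag, so that plain braces delimit intervals; the
helper name @code{interval_matches_whole} is hypothetical. How an
invalid interval is treated depends on the syntax bits in effect, so
the sketch reports a failed compilation separately rather than assuming
one outcome.

```c
#include <regex.h>

/* Hypothetical helper: return 1 if PATTERN matches TEXT as a whole,
   0 if it does not, -1 if the library rejects PATTERN as invalid.  */
int interval_matches_whole(const char *pattern, const char *text)
{
    regex_t compiled;
    regmatch_t m;
    int whole;

    if (regcomp(&compiled, pattern, REG_EXTENDED) != 0)
        return -1;
    whole = regexec(&compiled, text, 1, &m, 0) == 0
            && m.rm_so == 0 && text[m.rm_eo] == '\0';
    regfree(&compiled);
    return whole;
}
```

For example, @samp{a@{2,3@}} should match @samp{aa} and @samp{aaa} as
a whole, but neither @samp{a} nor @samp{aaaa}.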
906
@node Alternation Operator, List Operators, Repetition Operators, Common Operators
907
@section The Alternation Operator (@code{|} or @code{\|})
911
@cindex alternation operator
914
If the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't
915
recognize this operator. Otherwise, if the syntax bit
916
@code{RE_NO_BK_VBAR} is set, then @samp{|} represents this operator;
917
otherwise, @samp{\|} does.
919
Alternatives match one of a choice of regular expressions:
920
if you put the character(s) representing the alternation operator between
921
any two regular expressions @var{a} and @var{b}, the result matches
922
the union of the strings that @var{a} and @var{b} match. For
923
example, supposing that @samp{|} is the alternation operator, then
@samp{foo|bar|quux} would match any of @samp{foo}, @samp{bar} or
@samp{quux}.

@ignore
@c Nobody needs to disallow empty alternatives any more.
If the syntax bit @code{RE_NO_EMPTY_ALTS} is set, then if either of the regular
expressions @var{a} or @var{b} is empty, the
regular expression is invalid. More precisely, if this syntax bit is
set, then the alternation operator can't:

@itemize @bullet
@item
be first or last in a regular expression;

@item
follow either another alternation operator or an open-group operator
(@pxref{Grouping Operators}); or

@item
precede a close-group operator.

@end itemize

For example, supposing @samp{(} and @samp{)} represent the open and
close-group operators, then @samp{|foo}, @samp{foo|}, @samp{foo||bar},
@samp{foo(|bar)}, and @samp{(foo|)bar} would all be invalid.
@end ignore
953
The alternation operator operates on the @emph{largest} possible
954
surrounding regular expressions. (Put another way, it has the lowest
955
precedence of any regular expression operator.)
956
Thus, the only way you can
957
delimit its arguments is to use grouping. For example, if @samp{(} and
958
@samp{)} are the open and close-group operators, then @samp{fo(o|b)ar}
959
would match either @samp{fooar} or @samp{fobar}. (@samp{foo|bar} would
960
match @samp{foo} or @samp{bar}.)
963
The matcher usually tries all combinations of alternatives so as to
964
match the longest possible string. For example, when matching
965
@samp{(fooq|foo)*(qbarquux|bar)} against @samp{fooqbarquux}, it cannot
966
take, say, the first (``depth-first'') combination it could match, since
967
then it would be content to match just @samp{fooqbar}.
969
@comment xx something about leftmost-longest
972
@node List Operators, Grouping Operators, Alternation Operator, Common Operators
973
@section List Operators (@code{[} @dots{} @code{]} and @code{[^} @dots{} @code{]})
975
@cindex matching list
982
@cindex nonmatching list
983
@cindex matching newline
984
@cindex bracket expression
986
@dfn{Lists}, also called @dfn{bracket expressions}, are a set of one or
more items. An @dfn{item} is a character,
@ignore
(These get added when they get implemented.)
a collating symbol, an equivalence class expression,
@end ignore
a character class expression, or a range expression. The syntax bits
affect which kinds of items you can put in a list. We explain the last
two items in subsections below. Empty lists are invalid.
996
A @dfn{matching list} matches a single character represented by one of
997
the list items. You form a matching list by enclosing one or more items
998
within an @dfn{open-matching-list operator} (represented by @samp{[})
999
and a @dfn{close-list operator} (represented by @samp{]}).
1001
For example, @samp{[ab]} matches either @samp{a} or @samp{b}.
1002
@samp{[ad]*} matches the empty string and any string composed of just
1003
@samp{a}s and @samp{d}s in any order. Regex considers a regular
expression with a @samp{[} but no matching
@samp{]} to be invalid.
1007
@dfn{Nonmatching lists} are similar to matching lists except that they
1008
match a single character @emph{not} represented by one of the list
1009
items. You use an @dfn{open-nonmatching-list operator} (represented by
1010
@samp{[^}@footnote{Regex therefore doesn't consider the @samp{^} to be
1011
the first character in the list. If you put a @samp{^} character first
1012
in (what you think is) a matching list, you'll turn it into a
1013
nonmatching list.}) instead of an open-matching-list operator to start a
nonmatching list.

For example, @samp{[^ab]} matches any character except @samp{a} or
@samp{b}.
1019
If the @code{posix_newline} field in the pattern buffer (@pxref{GNU
Pattern Buffers}) is set, then nonmatching lists do not match a newline.
1022
Most characters lose any special meaning inside a list. The special
characters inside a list follow.

@table @samp
@item ]
ends the list if it's not the first list item. So, if you want to make
the @samp{]} character a list item, you must put it first.

@item \
quotes the next character if the syntax bit @code{RE_BACKSLASH_ESCAPE_IN_LISTS} is
set.

@ignore
Put these in if they get implemented.

@item [.
represents the open-collating-symbol operator (@pxref{Collating Symbol
Operators}).

@item .]
represents the close-collating-symbol operator.

@item [=
represents the open-equivalence-class operator (@pxref{Equivalence Class
Operators}).

@item =]
represents the close-equivalence-class operator.

@end ignore

@item [:
represents the open-character-class operator (@pxref{Character Class
Operators}) if the syntax bit @code{RE_CHAR_CLASSES} is set and what
follows is a valid character class expression.

@item :]
represents the close-character-class operator if the syntax bit
@code{RE_CHAR_CLASSES} is set and what precedes it is an
open-character-class operator followed by a valid character class name.

@item -
represents the range operator (@pxref{Range Operator}) if it's
not first or last in a list or the ending point of a range.

@end table
1070
All other characters are ordinary. For example, @samp{[.*]} matches
1071
@samp{.} and @samp{*}.
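
The list behavior described in this section can be tried out with a
small C sketch. It assumes the @sc{posix} interface (@code{regcomp} and
@code{regexec}) of a @sc{posix}-conforming C library; the helper name
@code{list_matches} is invented for the example, and the anchors
@samp{^} and @samp{$} in the sample patterns only serve to make each
list account for the whole string.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if PATTERN (POSIX extended syntax)
   matches somewhere in TEXT, 0 if it does not, and -1 if PATTERN
   fails to compile (for instance, a `[' with no matching `]').  */
int
list_matches (const char *pattern, const char *text)
{
  regex_t compiled;
  int result;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&compiled, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  result = regexec (&compiled, text, 0, NULL, 0) == 0;
  regfree (&compiled);
  return result;
}
```

For example, under these assumptions @samp{^[ad]*$} matches both the
empty string and @samp{adda}, @samp{^[^ab]$} matches @samp{c} but not
@samp{a}, @samp{^[]a]$} matches @samp{]}, and the unterminated
@samp{[ab} is rejected at compile time.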
1074
* Character Class Operators:: [:class:]
1075
* Range Operator:: start-end
1079
@ignore
(If collating symbols and equivalence class expressions get implemented,
then add this.)

node Collating Symbol Operators
subsubsection Collating Symbol Operators (@code{[.} @dots{} @code{.]})

If the syntax bit @code{XX} is set, then you can represent
collating symbols inside lists. You form a @dfn{collating symbol} by
putting a collating element between an @dfn{open-collating-symbol
operator} and a @dfn{close-collating-symbol operator}. @samp{[.}
represents the open-collating-symbol operator and @samp{.]} represents
the close-collating-symbol operator. For example, if @samp{ll} is a
collating element, then @samp{[[.ll.]]} would match @samp{ll}.

node Equivalence Class Operators
subsubsection Equivalence Class Operators (@code{[=} @dots{} @code{=]})
@cindex equivalence class expression in regex
@cindex @samp{[=} in regex
@cindex @samp{=]} in regex

If the syntax bit @code{XX} is set, then Regex recognizes equivalence class
expressions inside lists. An @dfn{equivalence class expression} is a set
of collating elements which all belong to the same equivalence class.
You form an equivalence class expression by putting a collating
element between an @dfn{open-equivalence-class operator} and a
@dfn{close-equivalence-class operator}. @samp{[=} represents the
open-equivalence-class operator and @samp{=]} represents the
close-equivalence-class operator. For example, if @samp{a} and @samp{A}
were an equivalence class, then both @samp{[[=a=]]} and @samp{[[=A=]]}
would match both @samp{a} and @samp{A}. If the collating element in an
equivalence class expression isn't part of an equivalence class, then
the matcher considers the equivalence class expression to be a collating
symbol.

@end ignore
1115
@node Character Class Operators, Range Operator, , List Operators
1116
@subsection Character Class Operators (@code{[:} @dots{} @code{:]})
1118
@cindex character classes
1119
@cindex @samp{[:} in regex
1120
@cindex @samp{:]} in regex
1122
If the syntax bit @code{RE_CHAR_CLASSES} is set, then Regex
1123
recognizes character class expressions inside lists. A @dfn{character
1124
class expression} matches one character from a given class. You form a
1125
character class expression by putting a character class name between an
1126
@dfn{open-character-class operator} (represented by @samp{[:}) and a
1127
@dfn{close-character-class operator} (represented by @samp{:]}). The
1128
character class names and their meanings are:

@table @code
@item alnum
letters and digits

@item alpha
letters

@item blank
system-dependent; for @sc{gnu}, a space or tab

@item cntrl
control characters (in the @sc{ascii} encoding, code 0177 and codes
less than 040)

@item digit
digits

@item graph
same as @code{print} except omits space

@item lower
lowercase letters

@item print
printable characters (in the @sc{ascii} encoding, space through
tilde---codes 040 through 0176)

@item punct
printable characters that are neither alphanumeric nor a space

@item space
space, tab, carriage return, newline, vertical tab, and form feed

@item upper
uppercase letters

@item xdigit
hexadecimal digits: @code{0}--@code{9}, @code{a}--@code{f}, @code{A}--@code{F}

@end table
1173
These correspond to the definitions in the C library's @file{<ctype.h>}
1174
facility. For example, @samp{[:alpha:]} corresponds to the standard
1175
facility @code{isalpha}. Regex recognizes character class expressions
1176
only inside of lists; so @samp{[[:alpha:]]} matches any letter, but
1177
@samp{[:alpha:]} outside of a bracket expression and not followed by a
1178
repetition operator matches just itself.
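
The correspondence with @file{<ctype.h>} can be checked with a sketch
like the following. It assumes the @sc{posix} interface
(@code{regcomp} and @code{regexec}), the default @samp{C} locale, and
the @sc{ascii} encoding; the helper name
@code{count_class_disagreements} is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile the bracket expression PATTERN, try it
   on each one-character string for the ASCII codes 1 through 127, and
   count how often the answer differs from PREDICATE (one of the
   <ctype.h> functions).  Return -1 if PATTERN fails to compile.  */
int
count_class_disagreements (const char *pattern, int (*predicate) (int))
{
  regex_t compiled;
  int c;
  int disagreements = 0;

  assert (pattern != NULL && predicate != NULL);
  if (regcomp (&compiled, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  for (c = 1; c < 128; c++)
    {
      char text[2];
      int by_regex, by_ctype;

      text[0] = (char) c;
      text[1] = '\0';
      by_regex = regexec (&compiled, text, 0, NULL, 0) == 0;
      by_ctype = predicate (c) != 0;
      if (by_regex != by_ctype)
        disagreements++;
    }
  regfree (&compiled);
  return disagreements;
}
```

Under those assumptions,
@code{count_class_disagreements ("[[:alpha:]]", isalpha)} returns zero,
and likewise for @samp{[[:digit:]]} with @code{isdigit}.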
1180
@node Range Operator, , Character Class Operators, List Operators
1181
@subsection The Range Operator (@code{-})
1183
Regex recognizes @dfn{range expressions} inside a list. They represent
those characters
that fall between two elements in the current collating sequence. You
form a range expression by putting a @dfn{range operator} between two
@ignore
(If these get implemented, then substitute this for ``characters.'')
of any of the following: characters, collating elements, collating symbols,
and equivalence class expressions. The starting point of the range and
the ending point of the range don't have to be the same kind of item,
e.g., the starting point could be a collating element and the ending
point could be an equivalence class expression. If a range's ending
point is an equivalence class, then all the collating elements in that
class will be in the range.
@end ignore
characters.@footnote{You can't use a character class for the starting
or ending point of a range, since a character class is not a single
character.} @samp{-} represents the range operator. For example,
@samp{a-f} within a list represents all the characters from @samp{a}
through @samp{f} inclusive.
1204
If the syntax bit @code{RE_NO_EMPTY_RANGES} is set, then if the range's
1205
ending point collates less than its starting point, the range (and the
1206
regular expression containing it) is invalid. For example, the regular
1207
expression @samp{[z-a]} would be invalid. If this bit isn't set, then
1208
Regex considers such a range to be empty.
1210
Since @samp{-} represents the range operator, if you want to make a
1211
@samp{-} character itself
1212
a list item, you must do one of the following:
1216
Put the @samp{-} either first or last in the list.
1219
Include a range whose starting point collates strictly lower than
1220
@samp{-} and whose ending point collates equal or higher. Unless a
1221
range is the first item in a list, a @samp{-} can't be its starting
1222
point, but @emph{can} be its ending point. That is because Regex
1223
considers @samp{-} to be the range operator unless it is preceded by
1224
another @samp{-}. For example, in the @sc{ascii} encoding, @samp{)},
1225
@samp{*}, @samp{+}, @samp{,}, @samp{-}, @samp{.}, and @samp{/} are
1226
contiguous characters in the collating sequence. You might think that
1227
@samp{[)-+--/]} has two ranges: @samp{)-+} and @samp{--/}. Rather, it
1228
has the ranges @samp{)-+} and @samp{+--}, plus the character @samp{/}, so
1229
it matches, e.g., @samp{,}, not @samp{.}.
1232
Put a range whose starting point is @samp{-} first in the list.
1236
For example, @samp{[-a-z]} matches a lowercase letter or a hyphen (in
1237
English, in @sc{ascii}).
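
To see which characters a given list covers, one can enumerate them, as
in the following sketch. It assumes the @sc{posix} interface
(@code{regcomp} and @code{regexec}), the @samp{C} locale, and the
@sc{ascii} encoding; the helper name @code{range_members} is made up
here. The @sc{posix} syntaxes are expected to reject a range whose
ending point collates before its starting point, which is why the
helper allows for a compilation failure.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: return the printable ASCII characters that the
   bracket expression LIST matches, in increasing code order, as a
   string held in a static buffer.  Return NULL if LIST fails to
   compile.  */
const char *
range_members (const char *list)
{
  static char members[128];
  regex_t compiled;
  size_t count = 0;
  int c;

  assert (list != NULL);
  memset (members, 0, sizeof members);
  if (regcomp (&compiled, list, REG_EXTENDED | REG_NOSUB) != 0)
    return NULL;
  for (c = 32; c < 127; c++)
    {
      char text[2];

      text[0] = (char) c;
      text[1] = '\0';
      if (regexec (&compiled, text, 0, NULL, 0) == 0)
        members[count++] = (char) c;
    }
  regfree (&compiled);
  return members;
}
```

With these assumptions, @samp{[a-f]} yields @samp{abcdef}, both
@samp{[-a-c]} and @samp{[a-c-]} yield @samp{-abc}, and @samp{[z-a]}
fails to compile.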
1240
@node Grouping Operators, Back-reference Operator, List Operators, Common Operators
1241
@section Grouping Operators (@code{(} @dots{} @code{)} or @code{\(} @dots{} @code{\)})
1248
@cindex subexpressions
1249
@cindex parenthesizing
1251
A @dfn{group}, also known as a @dfn{subexpression}, consists of an
1252
@dfn{open-group operator}, any number of other operators, and a
1253
@dfn{close-group operator}. Regex treats this sequence as a unit, just
1254
as mathematics and programming languages treat a parenthesized
1255
expression as a unit.
1257
Therefore, using groups, you can:

@itemize @bullet
@item
delimit the argument(s) to an alternation operator (@pxref{Alternation
Operator}) or a repetition operator (@pxref{Repetition
Operators}).

@item
keep track of the indices of the substring that matched a given group.
@xref{Using Registers}, for a precise explanation.
This lets you:

@itemize @bullet
@item
use the back-reference operator (@pxref{Back-reference Operator}).

@item
use registers (@pxref{Using Registers}).

@end itemize

@end itemize
1281
If the syntax bit @code{RE_NO_BK_PARENS} is set, then @samp{(} represents
1282
the open-group operator and @samp{)} represents the
1283
close-group operator; otherwise, @samp{\(} and @samp{\)} do.
1285
If the syntax bit @code{RE_UNMATCHED_RIGHT_PAREN_ORD} is set and a
1286
close-group operator has no matching open-group operator, then Regex
1287
considers it to match @samp{)}.
1290
@node Back-reference Operator, Anchoring Operators, Grouping Operators, Common Operators
1291
@section The Back-reference Operator (@code{\}@var{digit})
1293
@cindex back references
1295
If the syntax bit @code{RE_NO_BK_REF} isn't set, then Regex recognizes
1296
back references. A back reference matches a specified preceding group.
1297
The back reference operator is represented by @samp{\@var{digit}}
1298
anywhere after the end of a regular expression's @w{@var{digit}-th}
1299
group (@pxref{Grouping Operators}).
1301
@var{digit} must be between @samp{1} and @samp{9}. The matcher assigns
1302
numbers 1 through 9 to the first nine groups it encounters. By using
1303
one of @samp{\1} through @samp{\9} after the corresponding group's
1304
close-group operator, you can match a substring identical to the
1305
one that the group does.
1307
Back references match according to the following (in all examples below,
1308
@samp{(} represents the open-group, @samp{)} the close-group, @samp{@{}
1309
the open-interval and @samp{@}} the close-interval operator):
1313
If the group matches a substring, the back reference matches an
1314
identical substring. For example, @samp{(a)\1} matches @samp{aa} and
1315
@samp{(bana)na\1bo\1} matches @samp{bananabanabobana}. Likewise,
1316
@samp{(.*)\1} matches any (newline-free if the syntax bit
1317
@code{RE_DOT_NEWLINE} isn't set) string that is composed of two
1318
identical halves; the @samp{(.*)} matches the first half and the
1319
@samp{\1} matches the second half.
1322
If the group matches more than once (as it might if followed
1323
by, e.g., a repetition operator), then the back reference matches the
1324
substring the group @emph{last} matched. For example,
1325
@samp{((a*)b)*\1\2} matches @samp{aabababa}; first @w{group 1} (the
1326
outer one) matches @samp{aab} and @w{group 2} (the inner one) matches
1327
@samp{aa}. Then @w{group 1} matches @samp{ab} and @w{group 2} matches
1328
@samp{a}. So, @samp{\1} matches @samp{ab} and @samp{\2} matches
@samp{a}.
1332
If the group doesn't participate in a match, i.e., it is part of an
1333
alternative not taken or a repetition operator allows zero repetitions
1334
of it, then the back reference makes the whole match fail. For example,
1335
@samp{(one()|two())-and-(three\2|four\3)} matches @samp{one-and-three}
1336
and @samp{two-and-four}, but not @samp{one-and-four} or
1337
@samp{two-and-three}. For example, if the pattern matches
1338
@samp{one-and-}, then its @w{group 2} matches the empty string and its
1339
@w{group 3} doesn't participate in the match. So, if it then matches
1340
@samp{four}, then when it tries to back reference @w{group 3}---which it
1341
will attempt to do because @samp{\3} follows the @samp{four}---the match
1342
will fail because @w{group 3} didn't participate in the match.
1346
You can use a back reference as an argument to a repetition operator. For
example, @samp{(a(b))\2*} matches @samp{a} followed by one or more
@samp{b}s. Similarly, @samp{(a(b))\2@{3@}} matches @samp{abbbb}.
1350
If there is no preceding @w{@var{digit}-th} subexpression, the regular
1351
expression is invalid.
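
Some of the back-reference examples above can be exercised with the
following sketch. It assumes the @sc{posix} interface with the basic
syntax (@code{regcomp} called without @code{REG_EXTENDED}), in which
@samp{\(} and @samp{\)} are the group operators and @samp{\@{} and
@samp{\@}} the interval operators; the helper name
@code{backref_matches} is hypothetical, and the anchors in the sample
patterns only force the pattern to account for the whole string.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if PATTERN (POSIX basic syntax)
   matches somewhere in TEXT, 0 if it does not, and -1 if PATTERN
   fails to compile (for instance, because it refers to a group that
   does not exist).  */
int
backref_matches (const char *pattern, const char *text)
{
  regex_t compiled;
  int result;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&compiled, pattern, REG_NOSUB) != 0)
    return -1;
  result = regexec (&compiled, text, 0, NULL, 0) == 0;
  regfree (&compiled);
  return result;
}
```

In C source each backslash must itself be escaped, so the pattern
@samp{^\(.*\)\1$} is written @code{"^\\(.*\\)\\1$"}; under the
assumptions above it matches @samp{abab} but not @samp{aba}.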
1354
@node Anchoring Operators, , Back-reference Operator, Common Operators
1355
@section Anchoring Operators
1358
@cindex regexp anchoring
1360
These operators can constrain a pattern to match only at the beginning or
1361
end of the entire string or at the beginning or end of a line.
1364
* Match-beginning-of-line Operator:: ^
1365
* Match-end-of-line Operator:: $
1369
@node Match-beginning-of-line Operator, Match-end-of-line Operator, , Anchoring Operators
1370
@subsection The Match-beginning-of-line Operator (@code{^})
1373
@cindex beginning-of-line operator
1376
This operator can match the empty string either at the beginning of the
1377
string or after a newline character. Thus, it is said to @dfn{anchor}
1378
the pattern to the beginning of a line.
1380
In the cases following, @samp{^} represents this operator. (Otherwise,
1381
@samp{^} is ordinary.)
1386
It (the @samp{^}) is first in the pattern, as in @samp{^foo}.
1388
@cnindex RE_CONTEXT_INDEP_ANCHORS @r{(and @samp{^})}
1390
The syntax bit @code{RE_CONTEXT_INDEP_ANCHORS} is set, and it is outside
1391
a bracket expression.
1393
@cindex open-group operator and @samp{^}
1394
@cindex alternation operator and @samp{^}
1396
It follows an open-group or alternation operator, as in @samp{a\(^b\)}
and @samp{a\|^b}. @xref{Grouping Operators}, and @ref{Alternation
Operator}.
1402
These rules imply that some valid patterns containing @samp{^} cannot be
matched; for example, @samp{foo^bar} if @code{RE_CONTEXT_INDEP_ANCHORS}
is set.
1406
@vindex not_bol @r{field in pattern buffer}
1407
If the @code{not_bol} field is set in the pattern buffer (@pxref{GNU
1408
Pattern Buffers}), then @samp{^} fails to match at the beginning of the
1409
string. @xref{POSIX Matching}, for when you might find this useful.
1411
@vindex newline_anchor @r{field in pattern buffer}
1412
If the @code{newline_anchor} field isn't set in the pattern buffer, then
@samp{^} fails to match after a newline. This is useful when you do not
regard the string to be matched as broken into lines.
1417
@node Match-end-of-line Operator, , Match-beginning-of-line Operator, Anchoring Operators
1418
@subsection The Match-end-of-line Operator (@code{$})
1421
@cindex end-of-line operator
1424
This operator can match the empty string either at the end of
1425
the string or before a newline character in the string. Thus, it is
1426
said to @dfn{anchor} the pattern to the end of a line.
1428
It is always represented by @samp{$}. For example, @samp{foo$} usually
matches @samp{foo} and also the first three characters of
@samp{foo\nbar}.
1432
Its interaction with the syntax bits and pattern buffer fields is
1433
exactly the dual of @samp{^}'s; see the previous section. (That is,
1434
``beginning'' becomes ``end'', ``next'' becomes ``previous'', and
1435
``after'' becomes ``before''.)
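
The effect of line boundaries on both anchors can be sketched through
the @sc{posix} interface, assuming a @sc{posix}-conforming C library:
there, the compile flag @code{REG_NEWLINE} asks that the anchors also
match next to newlines, and the match flags @code{REG_NOTBOL} and
@code{REG_NOTEOL} declare that the start or end of the string is not a
line boundary. The helper name @code{anchor_matches} is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile PATTERN (POSIX basic syntax) with the
   extra compile flags CFLAGS and match it against TEXT with the match
   flags EFLAGS.  Return 1 on a match, 0 on no match, and -1 if
   PATTERN fails to compile.  */
int
anchor_matches (const char *pattern, const char *text,
                int cflags, int eflags)
{
  regex_t compiled;
  int result;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&compiled, pattern, cflags | REG_NOSUB) != 0)
    return -1;
  result = regexec (&compiled, text, 0, NULL, eflags) == 0;
  regfree (&compiled);
  return result;
}
```

For instance, under these assumptions @samp{^bar} matches in the string
consisting of @samp{foo}, a newline, and @samp{bar} only when
@code{REG_NEWLINE} is given, and @samp{^foo} stops matching @samp{foo}
once @code{REG_NOTBOL} is passed.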
1438
@node GNU Operators, GNU Emacs Operators, Common Operators, Top
1439
@chapter GNU Operators
1441
Following are operators that @sc{gnu} defines (and @sc{posix} doesn't).
1445
* Buffer Operators::
1448
@node Word Operators, Buffer Operators, , GNU Operators
1449
@section Word Operators
1451
The operators in this section require Regex to recognize parts of words.
1452
Regex uses a syntax table to determine whether or not a character is
1453
part of a word, i.e., whether or not it is @dfn{word-constituent}.
1456
* Non-Emacs Syntax Tables::
1457
* Match-word-boundary Operator:: \b
1458
* Match-within-word Operator:: \B
1459
* Match-beginning-of-word Operator:: \<
1460
* Match-end-of-word Operator:: \>
1461
* Match-word-constituent Operator:: \w
1462
* Match-non-word-constituent Operator:: \W
1465
@node Non-Emacs Syntax Tables, Match-word-boundary Operator, , Word Operators
1466
@subsection Non-Emacs Syntax Tables
1468
A @dfn{syntax table} is an array indexed by the characters in your
1469
character set. In the @sc{ascii} encoding, therefore, a syntax table
1470
has 256 elements. Regex always uses a @code{char *} variable
1471
@code{re_syntax_table} as its syntax table. In some cases, it
1472
initializes this variable and in others it expects you to initialize it.
1476
If Regex is compiled with the preprocessor symbols @code{emacs} and
1477
@code{SYNTAX_TABLE} both undefined, then Regex allocates
1478
@code{re_syntax_table} and initializes an element @var{i} either to
1479
@code{Sword} (which it defines) if @var{i} is a letter, number, or
1480
@samp{_}, or to zero if it's not.
1483
If Regex is compiled with @code{emacs} undefined but @code{SYNTAX_TABLE}
1484
defined, then Regex expects you to define a @code{char *} variable
1485
@code{re_syntax_table} to be a valid syntax table.
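
The default initialization described in the first case above might look
roughly like the following sketch. The names @code{SKETCH_SWORD},
@code{sketch_syntax_table}, @code{init_sketch_syntax_table}, and
@code{is_word_constituent} are invented for this illustration, the
value chosen for @code{SKETCH_SWORD} is an assumption, and the sketch
relies on the @samp{C} locale; it is not the code Regex itself uses.

```c
#include <assert.h>
#include <ctype.h>

/* Stand-in for the `Sword' value that Regex defines; the actual
   value is an assumption made for this sketch.  */
#define SKETCH_SWORD 1

/* One entry per possible `unsigned char' value.  */
static char sketch_syntax_table[256];

/* Mark letters, digits, and `_' as word-constituent and everything
   else as zero.  Return the number of word-constituent entries.  */
int
init_sketch_syntax_table (void)
{
  int c;
  int words = 0;

  for (c = 0; c < 256; c++)
    {
      int is_word = isalnum (c) || c == '_';

      sketch_syntax_table[c] = is_word ? SKETCH_SWORD : 0;
      words += is_word;
    }
  return words;
}

/* Return nonzero if character code C (0 through 255) is
   word-constituent according to the table.  */
int
is_word_constituent (int c)
{
  assert (c >= 0 && c < 256);
  return sketch_syntax_table[c] == SKETCH_SWORD;
}
```

A table built this way treats @samp{a}, @samp{7}, and @samp{_} as
word-constituent, but not @samp{-} or a space.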
1488
@xref{Emacs Syntax Tables}, for what happens when Regex is compiled with
1489
the preprocessor symbol @code{emacs} defined.
1493
@node Match-word-boundary Operator, Match-within-word Operator, Non-Emacs Syntax Tables, Word Operators
1494
@subsection The Match-word-boundary Operator (@code{\b})
1497
@cindex word boundaries, matching
1499
This operator (represented by @samp{\b}) matches the empty string at
1500
either the beginning or the end of a word. For example, @samp{\brat\b}
1501
matches the separate word @samp{rat}.
1503
@node Match-within-word Operator, Match-beginning-of-word Operator, Match-word-boundary Operator, Word Operators
1504
@subsection The Match-within-word Operator (@code{\B})
1508
This operator (represented by @samp{\B}) matches the empty string within
1509
a word. For example, @samp{c\Brat\Be} matches @samp{crate}, but
1510
@samp{dirty \Brat} doesn't match @samp{dirty rat}.
1512
@node Match-beginning-of-word Operator, Match-end-of-word Operator, Match-within-word Operator, Word Operators
1513
@subsection The Match-beginning-of-word Operator (@code{\<})
1517
This operator (represented by @samp{\<}) matches the empty string at the
1518
beginning of a word.
1520
@node Match-end-of-word Operator, Match-word-constituent Operator, Match-beginning-of-word Operator, Word Operators
1521
@subsection The Match-end-of-word Operator (@code{\>})
1525
This operator (represented by @samp{\>}) matches the empty string at the
end of a word.
1528
@node Match-word-constituent Operator, Match-non-word-constituent Operator, Match-end-of-word Operator, Word Operators
1529
@subsection The Match-word-constituent Operator (@code{\w})
1533
This operator (represented by @samp{\w}) matches any word-constituent
character.
1536
@node Match-non-word-constituent Operator, , Match-word-constituent Operator, Word Operators
1537
@subsection The Match-non-word-constituent Operator (@code{\W})
1541
This operator (represented by @samp{\W}) matches any character that is
1542
not word-constituent.
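
The word operators of this section can be tried with a sketch such as
the one below. It assumes a C library whose @code{regcomp} accepts
these @sc{gnu} operators in addition to the @sc{posix} syntax, as the
@sc{gnu} C Library's does; other libraries may reject or reinterpret
them. The helper name @code{word_op_matches} is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if PATTERN (POSIX extended syntax
   plus the GNU word operators) matches somewhere in TEXT, 0 if it
   does not, and -1 if PATTERN fails to compile.  */
int
word_op_matches (const char *pattern, const char *text)
{
  regex_t compiled;
  int result;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&compiled, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  result = regexec (&compiled, text, 0, NULL, 0) == 0;
  regfree (&compiled);
  return result;
}
```

Under that assumption, @samp{\brat\b} matches in @samp{a rat here} but
not in @samp{crate}, while @samp{c\Brat\Be} matches @samp{crate}.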
1545
@node Buffer Operators, , Word Operators, GNU Operators
1546
@section Buffer Operators
1548
Following are operators which work on buffers. In Emacs, a @dfn{buffer}
1549
is, naturally, an Emacs buffer. For other programs, Regex considers the
1550
entire string to be matched as the buffer.
1553
* Match-beginning-of-buffer Operator:: \`
1554
* Match-end-of-buffer Operator:: \'
1558
@node Match-beginning-of-buffer Operator, Match-end-of-buffer Operator, , Buffer Operators
1559
@subsection The Match-beginning-of-buffer Operator (@code{\`})
1563
This operator (represented by @samp{\`}) matches the empty string at the
1564
beginning of the buffer.
1566
@node Match-end-of-buffer Operator, , Match-beginning-of-buffer Operator, Buffer Operators
1567
@subsection The Match-end-of-buffer Operator (@code{\'})
1571
This operator (represented by @samp{\'}) matches the empty string at the
end of the buffer.
1575
@node GNU Emacs Operators, What Gets Matched?, GNU Operators, Top
1576
@chapter GNU Emacs Operators
1578
Following are operators that @sc{gnu} defines (and @sc{posix} doesn't)
1579
that you can use only when Regex is compiled with the preprocessor
1580
symbol @code{emacs} defined.
1583
* Syntactic Class Operators::
1587
@node Syntactic Class Operators, , , GNU Emacs Operators
1588
@section Syntactic Class Operators
1590
The operators in this section require Regex to recognize the syntactic
1591
classes of characters. Regex uses a syntax table to determine this.
1594
* Emacs Syntax Tables::
1595
* Match-syntactic-class Operator:: \sCLASS
1596
* Match-not-syntactic-class Operator:: \SCLASS
1599
@node Emacs Syntax Tables, Match-syntactic-class Operator, , Syntactic Class Operators
1600
@subsection Emacs Syntax Tables
1602
A @dfn{syntax table} is an array indexed by the characters in your
character set. In the @sc{ascii} encoding, therefore, a syntax table
has 256 elements.
1606
If Regex is compiled with the preprocessor symbol @code{emacs} defined,
1607
then Regex expects you to define and initialize the variable
1608
@code{re_syntax_table} to be an Emacs syntax table. Emacs' syntax
1609
tables are more complicated than Regex's own (@pxref{Non-Emacs Syntax
1610
Tables}). @xref{Syntax, , Syntax, emacs, The GNU Emacs User's Manual},
1611
for a description of Emacs' syntax tables.
1613
@node Match-syntactic-class Operator, Match-not-syntactic-class Operator, Emacs Syntax Tables, Syntactic Class Operators
1614
@subsection The Match-syntactic-class Operator (@code{\s}@var{class})
1618
This operator matches any character whose syntactic class is represented
1619
by a specified character. @samp{\s@var{class}} represents this operator
1620
where @var{class} is the character representing the syntactic class you
1621
want. For example, @samp{w} represents the syntactic
1622
class of word-constituent characters, so @samp{\sw} matches any
1623
word-constituent character.
1625
@node Match-not-syntactic-class Operator, , Match-syntactic-class Operator, Syntactic Class Operators
1626
@subsection The Match-not-syntactic-class Operator (@code{\S}@var{class})
1630
This operator is similar to the match-syntactic-class operator except
1631
that it matches any character whose syntactic class is @emph{not}
1632
represented by the specified character. @samp{\S@var{class}} represents
1633
this operator. For example, @samp{w} represents the syntactic class of
1634
word-constituent characters, so @samp{\Sw} matches any character that is
1635
not word-constituent.
1638
@node What Gets Matched?, Programming with Regex, GNU Emacs Operators, Top
1639
@chapter What Gets Matched?
1641
Regex usually matches strings according to the ``leftmost longest''
1642
rule; that is, it chooses the longest of the leftmost matches. This
does not mean that, for a regular expression containing subexpressions,
it simply chooses the longest match for each subexpression, left to
right; the overall match must also be the longest possible one.
1647
For example, @samp{(ac*)(c*d[ac]*)\1} matches @samp{acdacaaa}, not
1648
@samp{acdac}, as it would if it were to choose the longest match for the
1649
first subexpression.
1652
@node Programming with Regex, Copying, What Gets Matched?, Top
1653
@chapter Programming with Regex
1655
Here we describe how you use the Regex data structures and functions in
1656
C programs. Regex has three interfaces: one designed for @sc{gnu}, one
1657
compatible with @sc{posix} and one compatible with Berkeley @sc{unix}.
1660
* GNU Regex Functions::
1661
* POSIX Regex Functions::
1662
* BSD Regex Functions::
1666
@node GNU Regex Functions, POSIX Regex Functions, , Programming with Regex
1667
@section GNU Regex Functions
1669
If you're writing code that doesn't need to be compatible with either
1670
@sc{posix} or Berkeley @sc{unix}, you can use these functions. They
1671
provide more options than the other interfaces.
1674
* GNU Pattern Buffers:: The re_pattern_buffer type.
1675
* GNU Regular Expression Compiling:: re_compile_pattern ()
1676
* GNU Matching:: re_match ()
1677
* GNU Searching:: re_search ()
1678
* Matching/Searching with Split Data:: re_match_2 (), re_search_2 ()
1679
* Searching with Fastmaps:: re_compile_fastmap ()
1680
* GNU Translate Tables:: The `translate' field.
1681
* Using Registers:: The re_registers type and related fns.
1682
* Freeing GNU Pattern Buffers:: regfree ()
1686
@node GNU Pattern Buffers, GNU Regular Expression Compiling, , GNU Regex Functions
1687
@subsection GNU Pattern Buffers
1689
@cindex pattern buffer, definition of
1690
@tindex re_pattern_buffer @r{definition}
1691
@tindex struct re_pattern_buffer @r{definition}
1693
To compile, match, or search for a given regular expression, you must
1694
supply a pattern buffer. A @dfn{pattern buffer} holds one compiled
1695
regular expression.@footnote{Regular expressions are also referred to as
1696
``patterns,'' hence the name ``pattern buffer.''}
1698
You can have several different pattern buffers simultaneously, each
1699
holding a compiled pattern for a different regular expression.
1701
@file{regex.h} defines the pattern buffer @code{struct} as follows:

@example
struct re_pattern_buffer
@{
  /* Space that holds the compiled pattern.  It is declared as
     `unsigned char *' because its elements are
     sometimes used as array indexes.  */
  unsigned char *buffer;

  /* Number of bytes to which `buffer' points.  */
  unsigned long allocated;

  /* Number of bytes actually used in `buffer'.  */
  unsigned long used;

  /* Syntax setting with which the pattern was compiled.  */
  reg_syntax_t syntax;

  /* Pointer to a fastmap, if any, otherwise zero.  re_search uses
     the fastmap, if there is one, to skip over impossible
     starting points for matches.  */
  char *fastmap;

  /* Either a translate table to apply to all characters before
     comparing them, or zero for no translation.  The translation
     is applied to a pattern when it is compiled and to a string
     when it is matched.  */
  char *translate;

  /* Number of subexpressions found by the compiler.  */
  size_t re_nsub;

  /* Zero if this pattern cannot match the empty string, one else.
     Well, in truth it's used only in `re_search_2', to see
     whether or not we should use the fastmap, so we don't set
     this absolutely perfectly; see `re_compile_fastmap' (the
     `duplicate' case).  */
  unsigned can_be_null : 1;

  /* If REGS_UNALLOCATED, allocate space in the `regs' structure
     for `max (RE_NREGS, re_nsub + 1)' groups.
     If REGS_REALLOCATE, reallocate space if necessary.
     If REGS_FIXED, use what's there.  */
#define REGS_UNALLOCATED 0
#define REGS_REALLOCATE 1
#define REGS_FIXED 2
  unsigned regs_allocated : 2;

  /* Set to zero when `regex_compile' compiles a pattern; set to one
     by `re_compile_fastmap' if it updates the fastmap.  */
  unsigned fastmap_accurate : 1;

  /* If set, `re_match_2' does not return information about
     subexpressions.  */
  unsigned no_sub : 1;

  /* If set, a beginning-of-line anchor doesn't match at the
     beginning of the string.  */
  unsigned not_bol : 1;

  /* Similarly for an end-of-line anchor.  */
  unsigned not_eol : 1;

  /* If true, an anchor at a newline matches.  */
  unsigned newline_anchor : 1;
@};
@end example
1769
@node GNU Regular Expression Compiling, GNU Matching, GNU Pattern Buffers, GNU Regex Functions
1770
@subsection GNU Regular Expression Compiling
1772
In @sc{gnu}, you can both match and search for a given regular
1773
expression. To do either, you must first compile it in a pattern buffer
1774
(@pxref{GNU Pattern Buffers}).
1776
@cindex syntax initialization
1777
@vindex re_syntax_options @r{initialization}
1778
Regular expressions match according to the syntax with which they were
1779
compiled; with @sc{gnu}, you indicate what syntax you want by setting
1780
the variable @code{re_syntax_options} (declared in @file{regex.h} and
1781
defined in @file{regex.c}) before calling the compiling function,
1782
@code{re_compile_pattern} (see below). @xref{Syntax Bits}, and
1783
@ref{Predefined Syntaxes}.
1785
You can change the value of @code{re_syntax_options} at any time.
1786
Usually, however, you set its value once and then never change it.
1788
@cindex pattern buffer initialization
1789
@code{re_compile_pattern} takes a pattern buffer as an argument. You
1790
must initialize the following fields:

@table @code
@item translate
@vindex translate @r{initialization}
Initialize this to point to a translate table if you want one, or to
zero if you don't. We explain translate tables in @ref{GNU Translate
Tables}.

@item fastmap
@vindex fastmap @r{initialization}
Initialize this to nonzero if you want a fastmap, or to zero if you
don't.

@item buffer
@itemx allocated
@vindex buffer @r{initialization}
@vindex allocated @r{initialization}
If you want @code{re_compile_pattern} to allocate memory for the
compiled pattern, set both of these to zero. If you have an existing
block of memory (allocated with @code{malloc}) you want Regex to use,
set @code{buffer} to its address and @code{allocated} to its size (in
bytes).

@code{re_compile_pattern} uses @code{realloc} to extend the space for
the compiled pattern as necessary.

@end table
1823
To compile a pattern buffer, use:
1825
@findex re_compile_pattern
@example
char *
re_compile_pattern (const char *@var{regex}, const int @var{regex_size},
                    struct re_pattern_buffer *@var{pattern_buffer})
@end example
1833
@var{regex} is the regular expression's address, @var{regex_size} is its
1834
length, and @var{pattern_buffer} is the pattern buffer's address.
1836
If @code{re_compile_pattern} successfully compiles the regular
1837
expression, it returns zero and sets @code{*@var{pattern_buffer}} to the
1838
compiled pattern. It sets the pattern buffer's fields as follows:
1842
@table @code

@item buffer
@vindex buffer @r{field, set by @code{re_compile_pattern}}
to the compiled pattern.

@item used
@vindex used @r{field, set by @code{re_compile_pattern}}
to the number of bytes the compiled pattern in @code{buffer} occupies.

@item syntax
@vindex syntax @r{field, set by @code{re_compile_pattern}}
to the current value of @code{re_syntax_options}.

@item re_nsub
@vindex re_nsub @r{field, set by @code{re_compile_pattern}}
to the number of subexpressions in @var{regex}.

@item fastmap_accurate
@vindex fastmap_accurate @r{field, set by @code{re_compile_pattern}}
to zero on the theory that the pattern you're compiling is different
from the one previously compiled into @code{buffer}; in that case (since
you can't make a fastmap without a compiled pattern),
@code{fastmap} would either contain an incompatible fastmap, or nothing
at all.

@end table
1868
If @code{re_compile_pattern} can't compile @var{regex}, it returns an
1869
error string corresponding to one of the errors listed in @ref{POSIX
1870
Regular Expression Compiling}.
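
The initialization and compilation steps described in this section can be
sketched as follows.  This is only a minimal illustration: it assumes
the GNU C library's @file{regex.h} (which exposes the @sc{gnu} interface
when @code{_GNU_SOURCE} is defined), and the helper name
@code{compile_gnu} is hypothetical, not part of Regex.

```c
#define _GNU_SOURCE          /* assumption: glibc, which then declares the GNU interface */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: prepare *PB as described above, then compile
   REGEX under the current value of re_syntax_options.  Returns NULL on
   success, or the library's error string on failure.  */
static const char *
compile_gnu (struct re_pattern_buffer *pb, const char *regex)
{
  assert (pb != NULL && regex != NULL);
  memset (pb, 0, sizeof *pb);   /* clear every field first */
  pb->translate = NULL;         /* no translate table */
  pb->fastmap = NULL;           /* no fastmap */
  pb->buffer = NULL;            /* let Regex allocate the space ... */
  pb->allocated = 0;            /* ... for the compiled pattern */
  return re_compile_pattern (regex, strlen (regex), pb);
}
```

A caller would set @code{re_syntax_options} first, call the helper, and
treat any non-null return value as an error message to report.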
1873
@node GNU Matching, GNU Searching, GNU Regular Expression Compiling, GNU Regex Functions
1874
@subsection GNU Matching
1876
@cindex matching with GNU functions
1878
Matching the @sc{gnu} way means trying to match as much of a string as
1879
possible starting at a position within it you specify. Once you've compiled
1880
a pattern into a pattern buffer (@pxref{GNU Regular Expression
1881
Compiling}), you can ask the matcher to match that pattern against a
string using:

@example
int
re_match (struct re_pattern_buffer *@var{pattern_buffer},
          const char *@var{string}, const int @var{size},
          const int @var{start}, struct re_registers *@var{regs})
@end example
1893
@var{pattern_buffer} is the address of a pattern buffer containing a
1894
compiled pattern. @var{string} is the string you want to match; it can
1895
contain newline and null characters. @var{size} is the length of that
1896
string. @var{start} is the string index at which you want to
1897
begin matching; the first character of @var{string} is at index zero.
1898
@xref{Using Registers}, for an explanation of @var{regs}; you can safely
pass zero.
1901
@code{re_match} matches the regular expression in @var{pattern_buffer}
1902
against the string @var{string} according to the syntax in
1903
@var{pattern_buffer}'s @code{syntax} field.  (@xref{GNU Regular
1904
Expression Compiling}, for how to set it.) The function returns
1905
@math{-1} if the compiled pattern does not match any part of
1906
@var{string} and @math{-2} if an internal error happens; otherwise, it
1907
returns how many (possibly zero) characters of @var{string} the pattern
matched.
1910
An example: suppose @var{pattern_buffer} points to a pattern buffer
1911
containing the compiled pattern for @samp{a*}, and @var{string} points
1912
to @samp{aaaaab} (whereupon @var{size} should be 6). Then if @var{start}
1913
is 2, @code{re_match} returns 3, i.e., @samp{a*} would have matched the
1914
last three @samp{a}s in @var{string}. If @var{start} is 0,
1915
@code{re_match} returns 5, i.e., @samp{a*} would have matched all the
1916
@samp{a}s in @var{string}.  If @var{start} is either 5 or 6, it returns
zero.
1919
If @var{start} is not between zero and @var{size}, then
1920
@code{re_match} returns @math{-1}.
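
The @samp{a*} example can be tried with a small wrapper such as the one
below.  It is a sketch under assumptions: it relies on the GNU C
library's @file{regex.h} with @code{_GNU_SOURCE} defined, it compiles
under whatever @code{re_syntax_options} currently holds, and
@code{match_at} is a hypothetical name.

```c
#define _GNU_SOURCE          /* assumption: glibc's GNU regex interface */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: compile REGEX and try to match it against STRING
   starting at index START.  Returns what re_match returns, or -1 if
   REGEX doesn't compile.  */
static int
match_at (const char *regex, const char *string, int start)
{
  struct re_pattern_buffer pb;
  int result;

  assert (regex != NULL && string != NULL);
  memset (&pb, 0, sizeof pb);   /* translate, fastmap, buffer, allocated: zero */
  if (re_compile_pattern (regex, strlen (regex), &pb) != NULL)
    return -1;
  /* No register information is wanted here, so REGS is null.  */
  result = re_match (&pb, string, strlen (string), start, NULL);
  regfree (&pb);
  return result;
}
```

With this helper, @code{match_at ("a*", "aaaaab", 2)} corresponds to the
case described above in which @var{start} is 2.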
1923
@node GNU Searching, Matching/Searching with Split Data, GNU Matching, GNU Regex Functions
1924
@subsection GNU Searching
1926
@cindex searching with GNU functions
1928
@dfn{Searching} means trying to match starting at successive positions
1929
within a string. The function @code{re_search} does this.
1931
Before calling @code{re_search}, you must compile your regular
1932
expression. @xref{GNU Regular Expression Compiling}.
1934
Here is the function declaration:
1939
@example
int
re_search (struct re_pattern_buffer *@var{pattern_buffer},
           const char *@var{string}, const int @var{size},
           const int @var{start}, const int @var{range},
           struct re_registers *@var{regs})
@end example
1946
@vindex start @r{argument to @code{re_search}}
1947
@vindex range @r{argument to @code{re_search}}
1948
whose arguments are the same as those to @code{re_match} (@pxref{GNU
1949
Matching}) except that the two arguments @var{start} and @var{range}
1950
replace @code{re_match}'s argument @var{start}.
1952
If @var{range} is positive, then @code{re_search} attempts a match
1953
starting first at index @var{start}, then at @math{@var{start} + 1} if
1954
that fails, and so on, up to @math{@var{start} + @var{range}}; if
1955
@var{range} is negative, then it attempts a match starting first at
1956
index @var{start}, then at @math{@var{start} - 1} if that fails, and so
on.
1959
If @var{start} is not between zero and @var{size}, then @code{re_search}
1960
returns @math{-1}. When @var{range} is positive, @code{re_search}
1961
adjusts @var{range} so that @math{@var{start} + @var{range} - 1} is
1962
between zero and @var{size}, if necessary; that way it won't search
1963
outside of @var{string}. Similarly, when @var{range} is negative,
1964
@code{re_search} adjusts @var{range} so that @math{@var{start} +
1965
@var{range} + 1} is between zero and @var{size}, if necessary.
1967
If the @code{fastmap} field of @var{pattern_buffer} is zero,
1968
@code{re_search} matches starting at consecutive positions; otherwise,
1969
it uses @code{fastmap} to make the search more efficient.
1970
@xref{Searching with Fastmaps}.
1972
If no match is found, @code{re_search} returns @math{-1}. If
1973
a match is found, it returns the index where the match began. If an
1974
internal error happens, it returns @math{-2}.
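
The effect of the sign of @var{range} can be illustrated with the
following sketch.  It assumes the GNU C library's @file{regex.h} with
@code{_GNU_SOURCE} defined; @code{search_from} is a hypothetical helper
name.

```c
#define _GNU_SOURCE          /* assumption: glibc's GNU regex interface */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: compile REGEX and search STRING for it, starting
   at START and covering RANGE positions (negative RANGE searches
   backward).  Returns what re_search returns, or -1 if REGEX doesn't
   compile.  */
static int
search_from (const char *regex, const char *string, int start, int range)
{
  struct re_pattern_buffer pb;
  int result;

  assert (regex != NULL && string != NULL);
  memset (&pb, 0, sizeof pb);   /* no translate table, no fastmap */
  if (re_compile_pattern (regex, strlen (regex), &pb) != NULL)
    return -1;
  result = re_search (&pb, string, strlen (string), start, range, NULL);
  regfree (&pb);
  return result;
}
```

Searching @samp{xxabxxab} for @samp{ab} forward from index 0 finds the
first occurrence; searching backward from index 8 with a @var{range} of
@math{-8} finds the last one.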
1977
@node Matching/Searching with Split Data, Searching with Fastmaps, GNU Searching, GNU Regex Functions
1978
@subsection Matching and Searching with Split Data
1980
Using the functions @code{re_match_2} and @code{re_search_2}, you can
1981
match or search in data that is divided into two strings.
1988
The function:

@example
int
re_match_2 (struct re_pattern_buffer *@var{buffer},
            const char *@var{string1}, const int @var{size1},
            const char *@var{string2}, const int @var{size2},
            const int @var{start},
            struct re_registers *@var{regs},
            const int @var{stop})
@end example
1997
is similar to @code{re_match} (@pxref{GNU Matching}) except that you
1998
pass @emph{two} data strings and sizes, and an index @var{stop} beyond
1999
which you don't want the matcher to try matching. As with
2000
@code{re_match}, if it succeeds, @code{re_match_2} returns how many
2001
characters it matched.  Regard @var{string1} and
2002
@var{string2} as concatenated when you set the arguments @var{start} and
2003
@var{stop} and use the contents of @var{regs}; @code{re_match_2} never
2004
returns a value larger than @math{@var{size1} + @var{size2}}.
2011
The function:

@example
int
re_search_2 (struct re_pattern_buffer *@var{buffer},
             const char *@var{string1}, const int @var{size1},
             const char *@var{string2}, const int @var{size2},
             const int @var{start}, const int @var{range},
             struct re_registers *@var{regs},
             const int @var{stop})
@end example
2020
is similarly related to @code{re_search}.
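
As an illustration of searching split data, the sketch below looks for
a pattern that straddles the boundary between two strings.  It assumes
the GNU C library's @file{regex.h} with @code{_GNU_SOURCE} defined, and
@code{search_split} is a hypothetical helper name.

```c
#define _GNU_SOURCE          /* assumption: glibc's GNU regex interface */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: search for REGEX in the data formed by S1
   followed by S2.  Returns the index (in the concatenated data) where
   the match begins, or a negative value as re_search_2 does.  */
static int
search_split (const char *regex, const char *s1, const char *s2)
{
  struct re_pattern_buffer pb;
  int len1, len2, result;

  assert (regex != NULL && s1 != NULL && s2 != NULL);
  len1 = strlen (s1);
  len2 = strlen (s2);
  memset (&pb, 0, sizeof pb);
  if (re_compile_pattern (regex, strlen (regex), &pb) != NULL)
    return -1;
  /* START, RANGE and STOP are all expressed as if S1 and S2 were one
     string of length LEN1 + LEN2.  */
  result = re_search_2 (&pb, s1, len1, s2, len2,
                        0, len1 + len2, NULL, len1 + len2);
  regfree (&pb);
  return result;
}
```

For instance, with @samp{hel} and @samp{lo world} as the two strings,
the pattern @samp{lo w} begins in the first string and ends in the
second.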
2023
@node Searching with Fastmaps, GNU Translate Tables, Matching/Searching with Split Data, GNU Regex Functions
2024
@subsection Searching with Fastmaps
2027
If you're searching through a long string, you should use a fastmap.
2028
Without one, the searcher tries to match at consecutive positions in the
2029
string. Generally, most of the characters in the string could not start
2030
a match. It takes much longer to try matching at a given position in the
2031
string than it does to check in a table whether or not the character at
2032
that position could start a match. A @dfn{fastmap} is such a table.
2034
More specifically, a fastmap is an array indexed by the characters in
2035
your character set. Under the @sc{ascii} encoding, therefore, a fastmap
2036
has 256 elements. If you want the searcher to use a fastmap with a
2037
given pattern buffer, you must allocate the array and assign the array's
2038
address to the pattern buffer's @code{fastmap} field. You either can
2039
compile the fastmap yourself or have @code{re_search} do it for you;
2040
when @code{fastmap} is nonzero, it automatically compiles a fastmap the
2041
first time you search using a particular compiled pattern.
2043
To compile a fastmap yourself, use:
2045
@findex re_compile_fastmap
2048
@example
int
re_compile_fastmap (struct re_pattern_buffer *@var{pattern_buffer})
@end example
2052
@var{pattern_buffer} is the address of a pattern buffer. If the
2053
character @var{c} could start a match for the pattern,
2054
@code{re_compile_fastmap} makes
2055
@code{@var{pattern_buffer}->fastmap[@var{c}]} nonzero. It returns
2056
@math{0} if it can compile a fastmap and @math{-2} if there is an
2057
internal error. For example, if @samp{|} is the alternation operator
2058
and @var{pattern_buffer} holds the compiled pattern for @samp{a|b}, then
2059
@code{re_compile_fastmap} sets @code{fastmap['a']} and
2060
@code{fastmap['b']} (and no others).
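
The @samp{a|b} example might look like this in practice.  The sketch
assumes the GNU C library's @file{regex.h} with @code{_GNU_SOURCE}
defined, selects the @sc{posix} extended syntax so that @samp{|} is the
alternation operator, and uses a hypothetical helper name,
@code{compile_with_fastmap}.  The fastmap is obtained from
@code{malloc} on the assumption (true of the GNU C library) that
freeing the pattern buffer also frees the fastmap.

```c
#define _GNU_SOURCE          /* assumption: glibc's GNU regex interface */
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: compile REGEX into *PB with a 256-element
   fastmap and build the fastmap right away.  Returns 0 on success, -1
   if REGEX doesn't compile or memory runs out, and otherwise whatever
   re_compile_fastmap returns.  */
static int
compile_with_fastmap (struct re_pattern_buffer *pb, const char *regex)
{
  assert (pb != NULL && regex != NULL);
  re_syntax_options = RE_SYNTAX_POSIX_EXTENDED;   /* `|' is alternation */
  memset (pb, 0, sizeof *pb);
  pb->fastmap = malloc (256);   /* one element per character */
  if (pb->fastmap == NULL)
    return -1;
  if (re_compile_pattern (regex, strlen (regex), pb) != NULL)
    {
      free (pb->fastmap);
      pb->fastmap = NULL;
      return -1;
    }
  return re_compile_fastmap (pb);
}
```

After a successful call, @code{pb->fastmap['a']} and
@code{pb->fastmap['b']} can be inspected directly, and later calls to
@code{re_search} with the same buffer use the table.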
2062
@code{re_search} uses a fastmap as it moves along in the string: it
2063
checks the string's characters until it finds one that's in the fastmap.
2064
Then it tries matching at that character. If the match fails, it
2065
repeats the process. So, by using a fastmap, @code{re_search} doesn't
2066
waste time trying to match at positions in the string that couldn't
start a match.
2069
If you don't want @code{re_search} to use a fastmap,
2070
store zero in the @code{fastmap} field of the pattern buffer before
2071
calling @code{re_search}.
2073
Once you've initialized a pattern buffer's @code{fastmap} field, you
2074
need never do so again---even if you compile a new pattern in
2075
it---provided the way the field is set still reflects whether or not you
2076
want a fastmap. @code{re_search} will still either do nothing if
2077
@code{fastmap} is null or, if it isn't, compile a new fastmap for the
new pattern.
2080
@node GNU Translate Tables, Using Registers, Searching with Fastmaps, GNU Regex Functions
2081
@subsection GNU Translate Tables
2083
If you set the @code{translate} field of a pattern buffer to a translate
2084
table, then the @sc{gnu} Regex functions to which you've passed that
2085
pattern buffer use it to apply a simple transformation
2086
to all the regular expression and string characters at which they look.
2088
A @dfn{translate table} is an array indexed by the characters in your
2089
character set. Under the @sc{ascii} encoding, therefore, a translate
2090
table has 256 elements. The array's elements are also characters in
2091
your character set. When the Regex functions see a character @var{c},
2092
they use @code{translate[@var{c}]} in its place, with one exception: the
2093
character after a @samp{\} is not translated.  (This ensures that the
2094
operators, e.g., @samp{\B} and @samp{\b}, are always distinguishable.)
2096
For example, a table that maps all lowercase letters to the
2097
corresponding uppercase ones would cause the matcher to ignore
2098
differences in case.@footnote{A table that maps all uppercase letters to
2099
the corresponding lowercase ones would work just as well for this
2100
purpose.} Such a table would map all characters except lowercase letters
2101
to themselves, and lowercase letters to the corresponding uppercase
2102
ones. Under the @sc{ascii} encoding, here's how you could initialize
2103
such a table (we'll call it @code{case_fold}):
2106
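@example
/* Map every character to itself ...  */
for (i = 0; i < 256; i++)
  case_fold[i] = i;
/* ... then map each lowercase letter to its uppercase counterpart.  */
for (i = 'a'; i <= 'z'; i++)
  case_fold[i] = i - ('a' - 'A');
@end example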
2112
You tell Regex to use a translate table on a given pattern buffer by
2113
assigning that table's address to the @code{translate} field of that
2114
buffer. If you don't want Regex to do any translation, put zero into
2115
this field. You'll get weird results if you change the table's contents
2116
anytime between compiling the pattern buffer, compiling its fastmap, and
2117
matching or searching with the pattern buffer.
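
Putting the pieces together, a case-insensitive match using a translate
table could be sketched as below.  The sketch assumes the GNU C
library's @file{regex.h} with @code{_GNU_SOURCE} defined (where the
@code{translate} field points to @code{unsigned char}), and
@code{match_ignoring_case} is a hypothetical helper name.  Because the
table here is static, the sketch clears the @code{translate} field
before freeing the buffer, on the assumption that freeing would
otherwise try to release the table too.

```c
#define _GNU_SOURCE          /* assumption: glibc's GNU regex interface */
#include <assert.h>
#include <regex.h>
#include <string.h>

static unsigned char case_fold[256];

/* Hypothetical helper: match REGEX against the start of STRING,
   ignoring case (ASCII encoding assumed).  Returns what re_match
   returns, or -1 if REGEX doesn't compile.  */
static int
match_ignoring_case (const char *regex, const char *string)
{
  struct re_pattern_buffer pb;
  int i, result;

  assert (regex != NULL && string != NULL);
  for (i = 0; i < 256; i++)
    case_fold[i] = i;                      /* identity by default */
  for (i = 'a'; i <= 'z'; i++)
    case_fold[i] = i - ('a' - 'A');        /* lowercase to uppercase */

  memset (&pb, 0, sizeof pb);
  pb.translate = case_fold;                /* set before compiling */
  if (re_compile_pattern (regex, strlen (regex), &pb) != NULL)
    return -1;
  result = re_match (&pb, string, strlen (string), 0, NULL);
  pb.translate = NULL;                     /* static table: keep it out of regfree */
  regfree (&pb);
  return result;
}
```

Note that the table stays unchanged between compiling and matching, as
required above.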
2119
@node Using Registers, Freeing GNU Pattern Buffers, GNU Translate Tables, GNU Regex Functions
2120
@subsection Using Registers
2122
A group in a regular expression can match a (possibly empty) substring
of the string that regular expression as a whole matched.  The matcher
remembers the beginning and end of the substring matched by
each group.
2127
To find out what they matched, pass a nonzero @var{regs} argument to a
2128
@sc{gnu} matching or searching function (@pxref{GNU Matching} and
2129
@ref{GNU Searching}), i.e., the address of a structure of this type, as
2130
defined in @file{regex.h}:
2132
@c We don't bother to include this directly from regex.h,
2133
@c since it changes so rarely.
2135
@tindex re_registers
2136
@vindex num_regs @r{in @code{struct re_registers}}
2137
@vindex start @r{in @code{struct re_registers}}
2138
@vindex end @r{in @code{struct re_registers}}
2147
Except for (possibly) the @var{num_regs}'th element (see below), the
2148
@var{i}th element of the @code{start} and @code{end} arrays records
2149
information about the @var{i}th group in the pattern. (They're declared
2150
as C pointers, but this is only because not all C compilers accept
2151
zero-length arrays; conceptually, it is simplest to think of them as
arrays.)
2154
The @code{start} and @code{end} arrays are allocated in various ways,
2155
depending on the value of the @code{regs_allocated}
2156
@vindex regs_allocated
2157
field in the pattern buffer passed to the matcher.
2159
The simplest and perhaps most useful is to let the matcher (re)allocate
2160
enough space to record information for all the groups in the regular
2161
expression.  If @code{regs_allocated} is @code{REGS_UNALLOCATED},
@vindex REGS_UNALLOCATED
the matcher allocates @math{1 + @var{re_nsub}} elements (@code{re_nsub}
is another field in the pattern buffer; @pxref{GNU Pattern Buffers}),
sets the extra element to @math{-1}, and sets @code{regs_allocated} to
@code{REGS_REALLOCATE}.
@vindex REGS_REALLOCATE
2167
Then on subsequent calls with the same pattern buffer and @var{regs}
2168
arguments, the matcher reallocates more space if necessary.
2170
It would perhaps be more logical to make the @code{regs_allocated} field
2171
part of the @code{re_registers} structure, instead of part of the
2172
pattern buffer. But in that case the caller would be forced to
2173
initialize the structure before passing it. Much existing code doesn't
2174
do this initialization, and it's arguably better to avoid it anyway.
2176
@code{re_compile_pattern} sets @code{regs_allocated} to
2177
@code{REGS_UNALLOCATED},
2178
so if you use the GNU regular expression
2179
functions, you get this behavior by default.
2181
@c xx document re_set_registers
2183
@sc{posix}, on the other hand, requires a different interface: the
2184
caller is supposed to pass in a fixed-length array which the matcher
2185
fills.  Therefore, if @code{regs_allocated} is @code{REGS_FIXED},
the matcher simply fills that array.
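
A caller that wants to supply its own fixed-length arrays might proceed
roughly as follows.  This is a sketch under assumptions: it targets the
GNU C library's @file{regex.h} with @code{_GNU_SOURCE} defined, it sets
@code{regs_allocated} by hand after compiling, and the helper name
@code{match_fixed} is hypothetical.  The arrays must have room for the
whole match plus every group.

```c
#define _GNU_SOURCE          /* assumption: glibc's GNU regex interface */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: match REGEX against STRING and record register
   information in the caller's arrays STARTS and ENDS, each of NREGS
   elements.  Returns what re_match returns, or -1 if REGEX doesn't
   compile.  */
static int
match_fixed (const char *regex, const char *string,
             regoff_t *starts, regoff_t *ends, unsigned nregs)
{
  struct re_pattern_buffer pb;
  struct re_registers regs;
  int result;

  assert (regex != NULL && string != NULL && starts != NULL && ends != NULL);
  re_syntax_options = RE_SYNTAX_POSIX_EXTENDED;   /* `(' and `)' group */
  memset (&pb, 0, sizeof pb);
  if (re_compile_pattern (regex, strlen (regex), &pb) != NULL)
    return -1;

  regs.num_regs = nregs;
  regs.start = starts;
  regs.end = ends;
  pb.regs_allocated = REGS_FIXED;   /* the matcher must use what's there */

  result = re_match (&pb, string, strlen (string), 0, &regs);
  regfree (&pb);
  return result;
}
```

Since the arrays belong to the caller, nothing needs to be freed for
the registers afterward.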
2189
The following examples illustrate the information recorded in the
2190
@code{re_registers} structure. (In all of them, @samp{(} represents the
2191
open-group and @samp{)} the close-group operator. The first character
2192
in the string @var{string} is at index 0.)
2194
@c xx i'm not sure this is all true anymore.
2199
If the regular expression has an @w{@var{i}-th}
2200
group not contained within another group that matches a
2201
substring of @var{string}, then the function sets
2202
@code{@w{@var{regs}->}start[@var{i}]} to the index in @var{string} where
2203
the substring matched by the @w{@var{i}-th} group begins, and
2204
@code{@w{@var{regs}->}end[@var{i}]} to the index just beyond that
2205
substring's end. The function sets @code{@w{@var{regs}->}start[0]} and
2206
@code{@w{@var{regs}->}end[0]} to analogous information about the entire
pattern.
2209
For example, when you match @samp{((a)(b))} against @samp{ab}, you get:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 2 in @code{@w{@var{regs}->}end[0]}

@item
0 in @code{@w{@var{regs}->}start[1]} and 2 in @code{@w{@var{regs}->}end[1]}

@item
0 in @code{@w{@var{regs}->}start[2]} and 1 in @code{@w{@var{regs}->}end[2]}

@item
1 in @code{@w{@var{regs}->}start[3]} and 2 in @code{@w{@var{regs}->}end[3]}
@end itemize
2226
If a group matches more than once (as it might if followed by,
2227
e.g., a repetition operator), then the function reports the information
2228
about what the group @emph{last} matched.
2230
For example, when you match the pattern @samp{(a)*} against the string
@samp{aa}, you get:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 2 in @code{@w{@var{regs}->}end[0]}

@item
1 in @code{@w{@var{regs}->}start[1]} and 2 in @code{@w{@var{regs}->}end[1]}
@end itemize
2242
If the @w{@var{i}-th} group does not participate in a
2243
successful match, e.g., it is an alternative not taken or a
2244
repetition operator allows zero repetitions of it, then the function
2245
sets @code{@w{@var{regs}->}start[@var{i}]} and
2246
@code{@w{@var{regs}->}end[@var{i}]} to @math{-1}.
2248
For example, when you match the pattern @samp{(a)*b} against
the string @samp{b}, you get:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]}

@item
@math{-1} in @code{@w{@var{regs}->}start[1]} and @math{-1} in @code{@w{@var{regs}->}end[1]}
@end itemize
2260
If the @w{@var{i}-th} group matches a zero-length string, then the
2261
function sets @code{@w{@var{regs}->}start[@var{i}]} and
2262
@code{@w{@var{regs}->}end[@var{i}]} to the index just beyond that
zero-length string.

For example, when you match the pattern @samp{(a*)b} against the string
@samp{b}, you get:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]}

@item
0 in @code{@w{@var{regs}->}start[1]} and 0 in @code{@w{@var{regs}->}end[1]}
@end itemize
2277
The function sets @code{@w{@var{regs}->}start[0]} and
2278
@code{@w{@var{regs}->}end[0]} to analogous information about the entire
pattern.

For example, when you match the pattern @samp{(a*)} against the empty
string, you get:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 0 in @code{@w{@var{regs}->}end[0]}

@item
0 in @code{@w{@var{regs}->}start[1]} and 0 in @code{@w{@var{regs}->}end[1]}
@end itemize
2294
If an @w{@var{i}-th} group contains a @w{@var{j}-th} group
2295
in turn not contained within any other group within group @var{i} and
2296
the function reports a match of the @w{@var{i}-th} group, then it
2297
records in @code{@w{@var{regs}->}start[@var{j}]} and
2298
@code{@w{@var{regs}->}end[@var{j}]} the last match (if it matched) of
2299
the @w{@var{j}-th} group.
2301
For example, when you match the pattern @samp{((a*)b)*} against the
2302
string @samp{abb}, @w{group 2} last matches the empty string, so you
2303
get what it previously matched:
2307
@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 3 in @code{@w{@var{regs}->}end[0]}

@item
2 in @code{@w{@var{regs}->}start[1]} and 3 in @code{@w{@var{regs}->}end[1]}

@item
2 in @code{@w{@var{regs}->}start[2]} and 2 in @code{@w{@var{regs}->}end[2]}
@end itemize
2316
When you match the pattern @samp{((a)*b)*} against the string
2317
@samp{abb}, @w{group 2} doesn't participate in the last match, so you
get:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 3 in @code{@w{@var{regs}->}end[0]}

@item
2 in @code{@w{@var{regs}->}start[1]} and 3 in @code{@w{@var{regs}->}end[1]}

@item
0 in @code{@w{@var{regs}->}start[2]} and 1 in @code{@w{@var{regs}->}end[2]}
@end itemize
2332
If an @w{@var{i}-th} group contains a @w{@var{j}-th} group
2333
in turn not contained within any other group within group @var{i}
2334
and the function sets
2335
@code{@w{@var{regs}->}start[@var{i}]} and
2336
@code{@w{@var{regs}->}end[@var{i}]} to @math{-1}, then it also sets
2337
@code{@w{@var{regs}->}start[@var{j}]} and
2338
@code{@w{@var{regs}->}end[@var{j}]} to @math{-1}.
2340
For example, when you match the pattern @samp{((a)*b)*c} against the
string @samp{c}, you get:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]}

@item
@math{-1} in @code{@w{@var{regs}->}start[1]} and @math{-1} in @code{@w{@var{regs}->}end[1]}

@item
@math{-1} in @code{@w{@var{regs}->}start[2]} and @math{-1} in @code{@w{@var{regs}->}end[2]}
@end itemize
2356
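
The first and third examples in this section (@samp{((a)(b))} against
@samp{ab}, and @samp{(a)*b} against @samp{b}) can be reproduced with a
sketch like the one below.  It assumes the GNU C library's
@file{regex.h} with @code{_GNU_SOURCE} defined, lets the matcher
allocate the register arrays (the @code{REGS_UNALLOCATED} case), and
uses the hypothetical helper names @code{match_groups} and
@code{free_registers}.

```c
#define _GNU_SOURCE          /* assumption: glibc's GNU regex interface */
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: match REGEX against STRING from index 0 and let
   the matcher allocate REGS->start and REGS->end.  Returns what
   re_match returns, or -1 if REGEX doesn't compile.  The arrays are
   only allocated when the match succeeds.  */
static int
match_groups (const char *regex, const char *string,
              struct re_registers *regs)
{
  struct re_pattern_buffer pb;
  int result;

  assert (regex != NULL && string != NULL && regs != NULL);
  re_syntax_options = RE_SYNTAX_POSIX_EXTENDED;   /* `(' and `)' group */
  memset (&pb, 0, sizeof pb);   /* regs_allocated is REGS_UNALLOCATED */
  if (re_compile_pattern (regex, strlen (regex), &pb) != NULL)
    return -1;
  result = re_match (&pb, string, strlen (string), 0, regs);
  regfree (&pb);
  return result;
}

/* Release the arrays the matcher allocated for REGS.  */
static void
free_registers (struct re_registers *regs)
{
  free (regs->start);
  free (regs->end);
}
```

After a successful call, @code{regs.start[@var{i}]} and
@code{regs.end[@var{i}]} hold the values listed in the examples, with
@math{-1} marking a group that did not participate.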
@node Freeing GNU Pattern Buffers, , Using Registers, GNU Regex Functions
2357
@subsection Freeing GNU Pattern Buffers
2359
To free any allocated fields of a pattern buffer, you can use the
2360
@sc{posix} function described in @ref{Freeing POSIX Pattern Buffers},
2361
since the type @code{regex_t}---the type for @sc{posix} pattern
2362
buffers---is equivalent to the type @code{re_pattern_buffer}. After
2363
freeing a pattern buffer, you need to again compile a regular expression
2364
in it (@pxref{GNU Regular Expression Compiling}) before passing it to
2365
a matching or searching function.
2368
@node POSIX Regex Functions, BSD Regex Functions, GNU Regex Functions, Programming with Regex
2369
@section POSIX Regex Functions
2371
If you're writing code that has to be @sc{posix} compatible, you'll need
2372
to use these functions. Their interfaces are as specified by @sc{posix},
2376
@menu
* POSIX Pattern Buffers::               The regex_t type.
* POSIX Regular Expression Compiling::  regcomp ()
* POSIX Matching::                      regexec ()
* Reporting Errors::                    regerror ()
* Using Byte Offsets::                  The regmatch_t type.
* Freeing POSIX Pattern Buffers::       regfree ()
@end menu
2385
@node POSIX Pattern Buffers, POSIX Regular Expression Compiling, , POSIX Regex Functions
2386
@subsection POSIX Pattern Buffers
2388
To compile or match a given regular expression the @sc{posix} way, you
2389
must supply a pattern buffer exactly the way you do for @sc{gnu}
2390
(@pxref{GNU Pattern Buffers}). @sc{posix} pattern buffers have type
2391
@code{regex_t}, which is equivalent to the @sc{gnu} pattern buffer
2392
type @code{re_pattern_buffer}.
2395
@node POSIX Regular Expression Compiling, POSIX Matching, POSIX Pattern Buffers, POSIX Regex Functions
2396
@subsection POSIX Regular Expression Compiling
2398
With @sc{posix}, you can only search for a given regular expression; you
2399
can't match it. To do this, you must first compile it in a
2400
pattern buffer, using @code{regcomp}.
2403
Before calling @code{regcomp}, you must initialize this pattern buffer
2404
as you do for @sc{gnu} (@pxref{GNU Regular Expression Compiling}). See
2405
below, however, for how to choose a syntax with which to compile.
2408
To compile a pattern buffer, use:
2413
@example
int
regcomp (regex_t *@var{preg}, const char *@var{regex}, int @var{cflags})
@end example
2417
@var{preg} is the initialized pattern buffer's address, @var{regex} is
2418
the regular expression's address, and @var{cflags} is the compilation
2419
flags, which Regex considers as a collection of bits. Here are the
2420
valid bits, as defined in @file{regex.h}:
2425
@table @code

@item REG_EXTENDED
@vindex REG_EXTENDED
says to use @sc{posix} Extended Regular Expression syntax; if this isn't
set, @code{regcomp} uses @sc{posix} Basic Regular Expression syntax.
@code{regcomp} sets @var{preg}'s @code{syntax} field accordingly.

@item REG_ICASE
@vindex REG_ICASE
@cindex ignoring case
says to ignore case; @code{regcomp} sets @var{preg}'s @code{translate}
field to a translate table which ignores case, replacing anything you've
stored there.

@item REG_NOSUB
@vindex REG_NOSUB
says to set @var{preg}'s @code{no_sub} field; @pxref{POSIX Matching},
for what this means.

@item REG_NEWLINE
@vindex REG_NEWLINE
says that a:

@itemize @bullet
@item
match-any-character operator (@pxref{Match-any-character
Operator}) doesn't match a newline.

@item
nonmatching list not containing a newline (@pxref{List
Operators}) doesn't match a newline.

@item
match-beginning-of-line operator (@pxref{Match-beginning-of-line
Operator}) matches the empty string immediately after a newline,
regardless of how @code{REG_NOTBOL} is set (@pxref{POSIX Matching}, for
an explanation of @code{REG_NOTBOL}).

@item
match-end-of-line operator (@pxref{Match-end-of-line
Operator}) matches the empty string immediately before a newline,
regardless of how @code{REG_NOTEOL} is set (@pxref{POSIX Matching},
for an explanation of @code{REG_NOTEOL}).
@end itemize

@end table
2472
If @code{regcomp} successfully compiles the regular expression, it
2473
returns zero and sets @code{*@var{preg}} to the compiled
2474
pattern. Except for @code{syntax} (which it sets as explained above), it
2475
also sets the same fields the same way as does the @sc{gnu} compiling
2476
function (@pxref{GNU Regular Expression Compiling}).
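
A call to @code{regcomp} might be wrapped as in the following sketch.
It uses only the standard @sc{posix} interface; the helper name
@code{compile_posix} is hypothetical, and clearing the buffer first
simply follows the initialization advice given above.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: clear *PREG, then compile REGEX into it with the
   compilation flags CFLAGS.  Returns zero on success or one of the
   regcomp error codes.  */
static int
compile_posix (regex_t *preg, const char *regex, int cflags)
{
  assert (preg != NULL && regex != NULL);
  memset (preg, 0, sizeof *preg);   /* no translate table, no old pattern */
  return regcomp (preg, regex, cflags);
}
```

Several flags can be combined with bitwise or, for example
@code{REG_EXTENDED | REG_ICASE}; on success the @code{re_nsub} field
holds the number of groups.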
2478
If @code{regcomp} can't compile the regular expression, it returns one
2479
of the error codes listed here. (Except when noted differently, the
2480
syntax in all examples below is basic regular expression syntax.)
2484
@table @code

@comment repetitions
@item REG_BADRPT
For example, the consecutive repetition operators @samp{**} in
@samp{a**} are invalid.  As another example, if the syntax is extended
regular expression syntax, then the repetition operator @samp{*} with
nothing on which to operate in @samp{*} is invalid.

@item REG_BADBR
For example, the @var{count} @samp{-1} in @samp{a\@{-1} is invalid.

@item REG_EBRACE
For example, @samp{a\@{1} is missing a close-interval operator.

@item REG_EBRACK
For example, @samp{[a} is missing a close-list operator.

@item REG_ERANGE
For example, the range ending point @samp{a} that collates lower than
does its starting point @samp{z} in @samp{[z-a]} is invalid.  Also, the
range with the character class @samp{[:alpha:]} as its starting point in
@samp{[[:alpha:]-|]} is invalid.

@item REG_ECTYPE
For example, the character class name @samp{foo} in @samp{[[:foo:]]} is
invalid.

@item REG_EPAREN
For example, @samp{a\)} is missing an open-group operator and @samp{\(a}
is missing a close-group operator.

@item REG_ESUBREG
For example, the back reference @samp{\2} that refers to a nonexistent
subexpression in @samp{\(a\)\2} is invalid.

@comment unfinished business
@item REG_EEND
Returned when a regular expression causes no other more specific error.

@item REG_EESCAPE
For example, the trailing backslash @samp{\} in @samp{a\} is invalid, as
is the one in @samp{\}.

@comment kitchen sink
@item REG_BADPAT
For example, in the extended regular expression syntax, the empty group
@samp{()} in @samp{a()b} is invalid.

@item REG_ESIZE
Returned when a regular expression needs a pattern buffer larger than
Regex supports.

@item REG_ESPACE
Returned when a regular expression makes Regex run out of memory.

@end table
2545
@node POSIX Matching, Reporting Errors, POSIX Regular Expression Compiling, POSIX Regex Functions
2546
@subsection POSIX Matching
2548
Matching the @sc{posix} way means trying to match a null-terminated
2549
string starting at its first character. Once you've compiled a pattern
2550
into a pattern buffer (@pxref{POSIX Regular Expression Compiling}), you
2551
can ask the matcher to match that pattern against a string using:
2556
@example
int
regexec (const regex_t *@var{preg}, const char *@var{string},
         size_t @var{nmatch}, regmatch_t @var{pmatch}[], int @var{eflags})
@end example
2561
@var{preg} is the address of a pattern buffer for a compiled pattern.
2562
@var{string} is the string you want to match.
2564
@xref{Using Byte Offsets}, for an explanation of @var{pmatch}. If you
2565
pass zero for @var{nmatch} or you compiled @var{preg} with the
2566
compilation flag @code{REG_NOSUB} set, then @code{regexec} will ignore
2567
@var{pmatch}; otherwise, you must allocate it to have at least
2568
@var{nmatch} elements.  @code{regexec} will record @var{nmatch} byte
offsets in @var{pmatch}, and set to @math{-1} any unused elements up to
@code{@var{pmatch}[@var{nmatch} - 1]}.
2572
@var{eflags} specifies @dfn{execution flags}---namely, the two bits
2573
@code{REG_NOTBOL} and @code{REG_NOTEOL} (defined in @file{regex.h}). If
2574
you set @code{REG_NOTBOL}, then the match-beginning-of-line operator
2575
(@pxref{Match-beginning-of-line Operator}) always fails to match.
2576
This lets you match against pieces of a line, as you would need to if,
2577
say, searching for repeated instances of a given pattern in a line; it
2578
would work correctly for patterns both with and without
2579
match-beginning-of-line operators. @code{REG_NOTEOL} works analogously
2580
for the match-end-of-line operator (@pxref{Match-end-of-line
2581
Operator}); it exists for symmetry.
2583
@code{regexec} tries to find a match for @var{preg} in @var{string}
2584
according to the syntax in @var{preg}'s @code{syntax} field.
2585
(@xref{POSIX Regular Expression Compiling}, for how to set it.) The
2586
function returns zero if the compiled pattern matches @var{string} and
2587
@code{REG_NOMATCH} (defined in @file{regex.h}) if it doesn't.
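
The use of @code{REG_NOTBOL} for finding repeated instances of a
pattern in a line, as mentioned above, could be sketched like this.  It
uses only the standard @sc{posix} interface; @code{count_matches} is a
hypothetical helper name, and the handling of empty matches is one
possible choice rather than anything Regex prescribes.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count the non-overlapping matches of the
   extended regular expression REGEX in LINE.  Returns -1 if REGEX
   doesn't compile.  */
static int
count_matches (const char *regex, const char *line)
{
  regex_t preg;
  regmatch_t m[1];
  int count = 0;
  int eflags = 0;

  assert (regex != NULL && line != NULL);
  if (regcomp (&preg, regex, REG_EXTENDED) != 0)
    return -1;
  while (regexec (&preg, line, 1, m, eflags) == 0)
    {
      size_t step = m[0].rm_eo;

      count++;
      if (m[0].rm_so == m[0].rm_eo)   /* empty match: force progress */
        {
          if (line[step] == '\0')
            break;
          step++;
        }
      line += step;
      /* What remains is no longer the beginning of the line.  */
      eflags = REG_NOTBOL;
    }
  regfree (&preg);
  return count;
}
```

Without @code{REG_NOTBOL} on the later calls, a pattern such as
@samp{^a} would wrongly match at the start of each remaining piece.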
2589
@node Reporting Errors, Using Byte Offsets, POSIX Matching, POSIX Regex Functions
2590
@subsection Reporting Errors
2592
If either @code{regcomp} or @code{regexec} fails, it returns a nonzero
2593
error code, the possibilities for which are defined in @file{regex.h}.
2594
@xref{POSIX Regular Expression Compiling}, and @ref{POSIX Matching}, for
2595
what these codes mean.  To get an error string corresponding to these
codes, use:

@example
size_t
regerror (int @var{errcode},
          const regex_t *@var{preg},
          char *@var{errbuf},
          size_t @var{errbuf_size})
@end example
2608
@var{errcode} is an error code, @var{preg} is the address of the pattern
2609
buffer which provoked the error, @var{errbuf} is the error buffer, and
2610
@var{errbuf_size} is @var{errbuf}'s size.
2612
@code{regerror} returns the size in bytes of the error string
2613
corresponding to @var{errcode} (including its terminating null). If
2614
@var{errbuf} and @var{errbuf_size} are nonzero, it also returns in
2615
@var{errbuf} the first @math{@var{errbuf_size} - 1} characters of the
2616
error string, followed by a null.
2617
@var{errbuf_size} must be a nonnegative number less than or equal to the
2618
size in bytes of @var{errbuf}.
2620
You can call @code{regerror} with a null @var{errbuf} and a zero
2621
@var{errbuf_size} to determine how large @var{errbuf} need be to
2622
accommodate @code{regerror}'s error string.
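
The two-call technique just described might be written as follows.
This is a sketch using only the standard @sc{posix} interface;
@code{error_message} is a hypothetical helper name.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: return a freshly allocated copy of the error
   string for ERRCODE (as returned by regcomp or regexec on PREG), or
   NULL if memory runs out.  The caller frees the result.  */
static char *
error_message (int errcode, const regex_t *preg)
{
  size_t size;
  char *errbuf;

  assert (errcode != 0);
  /* First call: null buffer and zero size, just to learn the length
     (which includes the terminating null).  */
  size = regerror (errcode, preg, NULL, 0);
  errbuf = malloc (size);
  if (errbuf != NULL)
    regerror (errcode, preg, errbuf, size);   /* second call fills it */
  return errbuf;
}
```

Sizing the buffer this way avoids truncating the message, at the cost
of one extra call.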
2624
@node Using Byte Offsets, Freeing POSIX Pattern Buffers, Reporting Errors, POSIX Regex Functions
2625
@subsection Using Byte Offsets
2627
In @sc{posix}, variables of type @code{regmatch_t} hold information
analogous to, but not identical with, @sc{gnu}'s registers (@pxref{Using
Registers}).  To get information about registers in @sc{posix}, pass to
@code{regexec} a nonzero @var{pmatch} of type @code{regmatch_t}, i.e.,
the address of a structure of this type, defined in @file{regex.h}.
2643
When reading in @ref{Using Registers}, about how the matching function
2644
stores the information into the registers, substitute @var{pmatch} for
2645
@var{regs}, @code{@var{pmatch}[@var{i}].rm_so} for
@code{@w{@var{regs}->}start[@var{i}]} and
@code{@var{pmatch}[@var{i}].rm_eo} for
2648
@code{@w{@var{regs}->}end[@var{i}]}.
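
For instance, the byte offsets for the @samp{((a)(b))} example from
@ref{Using Registers} could be obtained with a sketch like this one.
It uses only the standard @sc{posix} interface; @code{group_offsets} is
a hypothetical helper name.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile the extended regular expression REGEX,
   search STRING for it, and record up to NMATCH byte-offset pairs in
   PMATCH.  Returns zero on a match, REG_NOMATCH if there is none, or a
   regcomp error code.  */
static int
group_offsets (const char *regex, const char *string,
               regmatch_t *pmatch, size_t nmatch)
{
  regex_t preg;
  int err;

  assert (regex != NULL && string != NULL && pmatch != NULL);
  err = regcomp (&preg, regex, REG_EXTENDED);
  if (err != 0)
    return err;
  err = regexec (&preg, string, nmatch, pmatch, 0);
  regfree (&preg);
  return err;
}
```

Element 0 of the array describes the entire match and element @var{i}
the @var{i}th group, with @math{-1} in both fields for a group that did
not participate.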
2650
@node Freeing POSIX Pattern Buffers, , Using Byte Offsets, POSIX Regex Functions
2651
@subsection Freeing POSIX Pattern Buffers
2653
To free any allocated fields of a pattern buffer, use:
2658
@example
void
regfree (regex_t *@var{preg})
@end example
2662
@var{preg} is the pattern buffer whose allocated fields you want freed.
2663
@code{regfree} also sets @var{preg}'s @code{allocated} and @code{used}
2664
fields to zero. After freeing a pattern buffer, you need to again
2665
compile a regular expression in it (@pxref{POSIX Regular Expression
2666
Compiling}) before passing it to the matching function (@pxref{POSIX
2670

@node BSD Regex Functions, , POSIX Regex Functions, Programming with Regex
@section BSD Regex Functions

If you're writing code that has to be Berkeley @sc{unix} compatible,
you'll need to use these functions, whose interfaces are the same as those
in Berkeley @sc{unix}.

* BSD Regular Expression Compiling:: re_comp ()
* BSD Searching:: re_exec ()

@node BSD Regular Expression Compiling, BSD Searching, , BSD Regex Functions
@subsection BSD Regular Expression Compiling

With Berkeley @sc{unix}, you can only search for a given regular
expression; you can't match one. To search for it, you must first
compile it. Before you compile it, you must indicate the regular
expression syntax you want by setting the
variable @code{re_syntax_options} (declared in @file{regex.h}) to some
syntax (@pxref{Regular Expression Syntax}).

To compile a regular expression, use:

re_comp (char *@var{regex})

@var{regex} is the address of a null-terminated regular expression.
@code{re_comp} uses an internal pattern buffer, so you can use only the
most recently compiled pattern buffer. This means that if you want to
use a given regular expression that you've already compiled---but it
isn't the latest one you've compiled---you'll have to recompile it. If
you call @code{re_comp} with the null string (@emph{not} the empty
string) as the argument, it doesn't change the contents of the pattern
buffer.

If @code{re_comp} successfully compiles the regular expression, it
returns zero. If it can't compile the regular expression, it returns
an error string. @code{re_comp}'s error messages are identical to those
of @code{re_compile_pattern} (@pxref{GNU Regular Expression

@node BSD Searching, , BSD Regular Expression Compiling, BSD Regex Functions
@subsection BSD Searching

Searching the Berkeley @sc{unix} way means searching in a string
starting at its first character and trying successive positions within
it to find a match. Once you've compiled a pattern using @code{re_comp}
(@pxref{BSD Regular Expression Compiling}), you can ask Regex
to search for that pattern in a string using:

re_exec (char *@var{string})

@var{string} is the address of the null-terminated string in which you
want to search.

@code{re_exec} returns either 1 for success or 0 for failure. It
automatically uses a @sc{gnu} fastmap (@pxref{Searching with Fastmaps}).
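
The way @code{re_comp} and @code{re_exec} share one internal pattern
buffer can be pictured with the following sketch, which imitates the two
functions on top of the @sc{posix} interface. It is an illustration
under stated assumptions, not the library's implementation: the names
@code{sketch_re_comp} and @code{sketch_re_exec} are hypothetical, the
sketch ignores @code{re_syntax_options} and the fastmap, and its
handling of a null argument before any successful compilation is an
assumption.

```c
#include <regex.h>
#include <stddef.h>

static regex_t sketch_buf;       /* the one internal pattern buffer */
static int sketch_buf_valid;     /* nonzero once a pattern is compiled */
static char sketch_errmsg[128];  /* holds the most recent error string */

/* Hypothetical stand-in for re_comp: return a null pointer on success
   or an error string on failure.  */
const char *
sketch_re_comp (const char *regex)
{
  int rc;

  /* A null argument leaves the current pattern buffer untouched.
     Reporting an error when nothing was compiled yet is an assumption
     of this sketch.  */
  if (regex == NULL)
    return sketch_buf_valid ? NULL : "no previous regular expression";

  /* Compiling a new pattern discards the previous one.  */
  if (sketch_buf_valid)
    {
      regfree (&sketch_buf);
      sketch_buf_valid = 0;
    }

  rc = regcomp (&sketch_buf, regex, REG_NOSUB);
  if (rc != 0)
    {
      regerror (rc, &sketch_buf, sketch_errmsg, sizeof sketch_errmsg);
      return sketch_errmsg;
    }

  sketch_buf_valid = 1;
  return NULL;
}

/* Hypothetical stand-in for re_exec: return 1 if the most recently
   compiled pattern matches somewhere in STRING, 0 otherwise.  */
int
sketch_re_exec (const char *string)
{
  if (!sketch_buf_valid)
    return 0;
  return regexec (&sketch_buf, string, 0, NULL, 0) == 0;
}
```

Because only one buffer exists, a program that alternates between two
patterns has to recompile each time it switches, which is the cost
described in @ref{BSD Regular Expression Compiling}.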

@node Copying, Index, Programming with Regex, Top
@appendix GNU GENERAL PUBLIC LICENSE
@center Version 2, June 1991

Copyright @copyright{} 1989, 1991 Free Software Foundation, Inc.
675 Mass Ave, Cambridge, MA 02139, USA

Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

@unnumberedsec Preamble

The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software---to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Library General Public License instead.) You can apply it to
your programs, too.

When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.

To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.

For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.

We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.

Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.

Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.

The precise terms and conditions for copying, distribution and
modification follow.

@unnumberedsec TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

@center TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The ``Program'', below,
refers to any such program or work, and a ``work based on the Program''
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term ``modification''.) Each licensee is addressed as ``you''.

Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.

You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.

You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.

You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:

You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.

You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.

If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)

These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.

Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.

In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.

You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:

Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,

Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,

Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)

The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.

If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.

You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.

You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.

Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.

If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.

If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.

It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.

This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.

If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.

The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and ``any
later version'', you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.

If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.

@heading NO WARRANTY

BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM ``AS IS'' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

@heading END OF TERMS AND CONDITIONS

@center END OF TERMS AND CONDITIONS

@unnumberedsec Appendix: How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.

To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the ``copyright'' line and a pointer to where the full notice is found.

@var{one line to give the program's name and a brief idea of what it does.}
Copyright (C) 19@var{yy} @var{name of author}

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:

Gnomovision version 69, Copyright (C) 19@var{yy} @var{name of author}
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.

The hypothetical commands @samp{show w} and @samp{show c} should show
the appropriate parts of the General Public License. Of course, the
commands you use may be called something other than @samp{show w} and
@samp{show c}; they could even be mouse-clicks or menu items---whatever
suits your program.

You should also get your employer (if you work as a programmer) or your
school, if any, to sign a ``copyright disclaimer'' for the program, if
necessary. Here is a sample; alter the names:

Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.

@var{signature of Ty Coon}, 1 April 1989
Ty Coon, President of Vice

This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Library General
Public License instead of this License.

@node Index, , Copying, Top