924
926
* Array Functions:: Functions for working with arrays.
925
927
* Flattening Arrays:: How to flatten arrays.
926
928
* Creating Arrays:: How to create and populate arrays.
927
* Redirection API:: How to access and manipulate redirections.
929
* Redirection API:: How to access and manipulate
928
931
* Extension API Variables:: Variables provided by the API.
929
932
* Extension Versioning:: API Version information.
930
933
* Extension API Informational Variables:: Variables providing information about
988
991
* Configuration Philosophy:: How it's all supposed to work.
989
992
* Non-Unix Installation:: Installation on Other Operating
991
* PC Installation:: Installing and Compiling @command{gawk} on
994
* PC Installation:: Installing and Compiling
995
@command{gawk} on Microsoft Windows.
993
996
* PC Binary Installation:: Installing a prepared distribution.
994
* PC Compiling:: Compiling @command{gawk} for Windows32.
997
* PC Compiling:: Compiling @command{gawk} for
995
999
* PC Using:: Running @command{gawk} on Windows32.
996
1000
* Cygwin:: Building and running @command{gawk}
6265
6268
compatibility mode (@pxref{Options}).
6266
6269
Case is always significant in compatibility mode.
6268
@node Strong Regexp Constants
6269
@section Strongly Typed Regexp Constants
6271
This @value{SECTION} describes a @command{gawk}-specific feature.
6273
Regexp constants (@code{/@dots{}/}) hold a strange position in the
6274
@command{awk} language. In most contexts, they act like an expression:
6275
@samp{$0 ~ /@dots{}/}. In other contexts, they denote only a regexp to
6276
be matched. In no case are they really a ``first class citizen'' of the
6277
language. That is, you cannot define a scalar variable whose type is
6278
``regexp'' in the same sense that you can define a variable to be a
6282
num = 42 @ii{Numeric variable}
6283
str = "hi" @ii{String variable}
6284
re = /foo/ @ii{Wrong!} re @ii{is the result of} $0 ~ /foo/
6287
6271
@node Regexp Summary
6288
6272
@section Summary
10926
10910
@node Using Constant Regexps
10927
10911
@subsection Using Regular Expression Constants
10913
Regular expression constants consist of text describing
10914
a regular expression enclosed in slashes (such as @code{/the +answer/}).
10915
This @value{SECTION} describes how such constants work in
10916
POSIX @command{awk} and @command{gawk}, and then goes on to describe
10917
@dfn{strongly typed regexp constants}, which are a @command{gawk} extension.
10920
* Standard Regexp Constants:: Regexp constants in standard @command{awk}.
10921
* Strong Regexp Constants:: Strongly typed regexp constants.
10924
@node Standard Regexp Constants
10925
@subsubsection Standard Regular Expression Constants
10929
10927
@cindex dark corner, regexp constants
10930
10928
When used on the righthand side of the @samp{~} or @samp{!~}
10931
10929
operators, a regexp constant merely stands for the regexp that is to be
11033
11031
a parameter to a user-defined function, because passing a truth value in
11034
11032
this way is probably not what was intended.
11034
@node Strong Regexp Constants
11035
@subsubsection Strongly Typed Regexp Constants
11037
This @value{SECTION} describes a @command{gawk}-specific feature.
11039
As we saw in the previous @value{SECTION},
11040
regexp constants (@code{/@dots{}/}) hold a strange position in the
11041
@command{awk} language. In most contexts, they act like an expression:
11042
@samp{$0 ~ /@dots{}/}. In other contexts, they denote only a regexp to
11043
be matched. In no case are they really a ``first class citizen'' of the
11044
language. That is, you cannot define a scalar variable whose type is
11045
``regexp'' in the same sense that you can define a variable to be a
11046
number or a string:
11049
num = 42 @ii{Numeric variable}
11050
str = "hi" @ii{String variable}
11051
re = /foo/ @ii{Wrong!} re @ii{is the result of} $0 ~ /foo/
11054
For a number of more advanced use cases,
11055
it would be nice to have regexp constants that
11056
are @dfn{strongly typed}; in other words, that denote a regexp useful
11057
for matching, and not an expression.
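To see the limitation concretely, consider the following small sketch
(the variable name @code{re} is only illustrative). A bare regexp constant
on the righthand side of an assignment is evaluated as a match against
@code{$0}, so the variable ends up holding a truth value, not a regexp:

@example
@{
    re = /foo/      # equivalent to: re = ($0 ~ /foo/)
    print re        # 1 if the record contains "foo", 0 otherwise
@}
@end example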
11059
@command{gawk} provides this feature. A strongly typed regexp constant
11060
looks almost like a regular regexp constant, except that it is preceded
11061
by an @samp{@@} sign:
11064
re = @@/foo/ @ii{Regexp variable}
11067
Strongly typed regexp constants @emph{cannot} be used everywhere that a
11068
regular regexp constant can, because this would make the language even more
11069
confusing. Instead, you may use them only in certain contexts:
11073
On the righthand side of the @samp{~} and @samp{!~} operators: @samp{some_var ~ @@/foo/}
11074
(@pxref{Regexp Usage}).
11077
In the @code{case} part of a @code{switch} statement
11078
(@pxref{Switch Statement}).
11081
As an argument to one of the built-in functions that accept regexp constants:
11089
(@pxref{String Functions}).
11092
As a parameter in a call to a user-defined function
11093
(@pxref{User-defined}).
11096
On the righthand side of an assignment to a variable: @samp{some_var = @@/foo/}.
11097
In this case, the type of @code{some_var} is regexp. Additionally, @code{some_var}
11098
can be used with @samp{~} and @samp{!~}, passed to one of the built-in functions
11099
listed above, or passed as a parameter to a user-defined function.
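The following sketch exercises several of the contexts just listed.
It is an illustration only: @code{count_matches()} is a hypothetical helper,
and the variable names and sample strings are made up for the example.

@example
# count_matches() is a hypothetical helper; its regex parameter
# receives a regexp-typed value from the caller.
function count_matches(regex, text,    copy)
@{
    copy = text
    return gsub(regex, "&", copy)    # gsub() returns the number of substitutions
@}

BEGIN @{
    digits = @@/^[[:digit:]]+$/      # assignment: digits has regexp type

    if ("12345" ~ digits)            # righthand side of ~
        print "all digits"

    s = "12345"
    n = gsub(@@/[24]/, "x", s)       # argument to a built-in function
    print n, s                       # prints: 2 1x3x5

    print count_matches(@@/[aeiou]/, "strongly typed")    # prints: 2
@}
@end example

Replacing each match with @samp{&} leaves the copied text unchanged, so the
helper uses @code{gsub()} only for its return value.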
11102
You may use the @code{typeof()} built-in function
11103
(@pxref{Type Functions})
11104
to determine if a variable or function parameter is a regexp variable.
11107
The true power of this feature comes from the ability to create variables that
11108
have regexp type. Such variables can be passed on to user-defined functions,
11109
without the confusing aspects of computed regular expressions created from
11110
strings or string constants. They may also be passed through indirect function
11111
calls (@pxref{Indirect Calls})
11112
and on to the built-in functions that accept regexp constants.
11114
When used in numeric conversions, strongly typed regexp variables convert
11115
to zero. When used in string conversions, they convert to the string
11116
value of the original regexp text.
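As a brief sketch of the type test and the two conversions (the variable
name @code{re} is illustrative), one might write:

@example
BEGIN @{
    re = @@/foo/
    print typeof(re)    # prints: regexp
    print re + 0        # numeric conversion, prints: 0
    print re ""         # string conversion, prints: foo
@}
@end example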
11036
11118
@node Variables
11037
11119
@subsection Variables
12213
12295
program runs, from @dfn{untyped} before any use,@footnote{@command{gawk}
12214
12296
calls this @dfn{unassigned}, as the following example shows.} to string
12215
12297
or number, and then from string to number or number to string, as the
12216
program progresses.
12298
program progresses. (@command{gawk} also provides regexp-typed scalars,
12299
but let's ignore that for now; @pxref{Strong Regexp Constants}.)
12218
12301
You can't do much with untyped variables, other than tell that they
12219
12302
are untyped. The following program tests @code{a} against @code{""}
32380
32468
@quotation NOTE
32381
32469
String values passed to an extension by @command{gawk} are always
32382
@sc{NUL}-terminated. Thus it is safe to pass such string values to
32470
@sc{nul}-terminated. Thus it is safe to pass such string values to
32383
32471
standard library and system routines. However, because
32384
@command{gawk} allows embedded @sc{NUL} characters in string data,
32472
@command{gawk} allows embedded @sc{nul} characters in string data,
32385
32473
you should check that @samp{strlen(@var{some_string})} matches
32386
32474
the length for that string passed to the extension before using
32387
32475
it as a regular C string.