* Array Functions:: Functions for working with arrays.
* Flattening Arrays:: How to flatten arrays.
* Creating Arrays:: How to create and populate arrays.
* Redirection API:: How to access and manipulate redirections.
* Extension API Variables:: Variables provided by the API.
* Extension Versioning:: API Version information.
* Extension API Informational Variables:: Variables providing information about
* Configuration Philosophy:: How it's all supposed to work.
* Non-Unix Installation:: Installation on Other Operating Systems.
* PC Installation:: Installing and Compiling @command{gawk} on Microsoft Windows.
* PC Binary Installation:: Installing a prepared distribution.
* PC Compiling:: Compiling @command{gawk} for Windows32.
* PC Using:: Running @command{gawk} on Windows32.
* Cygwin:: Building and running @command{gawk}
compatibility mode (@pxref{Options}).
Case is always significant in compatibility mode.
@node Regexp Summary
@section Summary
@node Using Constant Regexps
@subsection Using Regular Expression Constants
Regular expression constants consist of text describing
a regular expression enclosed in slashes (such as @code{/the +answer/}).
This @value{SECTION} describes how such constants work in
POSIX @command{awk} and @command{gawk}, and then goes on to describe
@dfn{strongly typed regexp constants}, which are a @command{gawk} extension.
@menu
* Standard Regexp Constants:: Regexp constants in standard @command{awk}.
* Strong Regexp Constants:: Strongly typed regexp constants.
@end menu

@node Standard Regexp Constants
@subsubsection Standard Regular Expression Constants

@cindex dark corner, regexp constants
When used on the righthand side of the @samp{~} or @samp{!~}
operators, a regexp constant merely stands for the regexp that is to be
a parameter to a user-defined function, because passing a truth value in
this way is probably not what was intended.

@node Strong Regexp Constants
@subsubsection Strongly Typed Regexp Constants

This @value{SECTION} describes a @command{gawk}-specific feature.

As we saw in the previous @value{SECTION},
regexp constants (@code{/@dots{}/}) hold a strange position in the
@command{awk} language. In most contexts, they act like an expression:
@samp{$0 ~ /@dots{}/}. In other contexts, they denote only a regexp to
be matched. In no case are they really a ``first class citizen'' of the
language. That is, you cannot define a scalar variable whose type is
``regexp'' in the same sense that you can define a variable to be a
number or a string:

@example
num = 42        @ii{Numeric variable}
str = "hi"      @ii{String variable}
re = /foo/      @ii{Wrong!} re @ii{is the result of} $0 ~ /foo/
@end example

For a number of more advanced use cases,
it would be nice to have regexp constants that
are @dfn{strongly typed}; in other words, that denote a regexp useful
for matching, and not an expression.

@command{gawk} provides this feature. A strongly typed regexp constant
looks almost like a regular regexp constant, except that it is preceded
by an @samp{@@} sign:

@example
re = @@/foo/     @ii{Regexp variable}
@end example

Strongly typed regexp constants @emph{cannot} be used everywhere that a
regular regexp constant can, because this would make the language even more
confusing. Instead, you may use them only in certain contexts:

@itemize @bullet
@item
On the righthand side of the @samp{~} and @samp{!~} operators: @samp{some_var ~ @@/foo/}
(@pxref{Regexp Usage}).

@item
In the @code{case} part of a @code{switch} statement
(@pxref{Switch Statement}).

@item
As an argument to one of the built-in functions that accept regexp constants:
@code{gensub()},
@code{gsub()},
@code{match()},
@code{patsplit()},
@code{split()},
and
@code{sub()}
(@pxref{String Functions}).

@item
As a parameter in a call to a user-defined function
(@pxref{User-defined}).

@item
On the righthand side of an assignment to a variable: @samp{some_var = @@/foo/}.
In this case, the type of @code{some_var} is regexp. Additionally, @code{some_var}
can be used with @samp{~} and @samp{!~}, passed to one of the built-in functions
listed above, or passed as a parameter to a user-defined function.
@end itemize

You may use the @code{typeof()} built-in function
(@pxref{Type Functions})
to determine if a variable or function parameter is
a regexp variable.

The true power of this feature comes from the ability to create variables that
have regexp type. Such variables can be passed on to user-defined functions
without the confusing aspects of computed regular expressions created from
strings or string constants. They may also be passed through indirect function
calls (@pxref{Indirect Calls})
and on to the built-in functions that accept regexp constants.

When used in numeric conversions, strongly typed regexp variables convert
to zero. When used in string conversions, they convert to the string
value of the original regexp text.

@node Variables
@subsection Variables
program runs, from @dfn{untyped} before any use,@footnote{@command{gawk}
calls this @dfn{unassigned}, as the following example shows.} to string
or number, and then from string to number or number to string, as the
program progresses. (@command{gawk} also provides regexp-typed scalars,
but let's ignore that for now; @pxref{Strong Regexp Constants}.)

You can't do much with untyped variables, other than tell that they
are untyped. The following program tests @code{a} against @code{""}
@quotation NOTE
String values passed to an extension by @command{gawk} are always
@sc{nul}-terminated. Thus it is safe to pass such string values to
standard library and system routines. However, because
@command{gawk} allows embedded @sc{nul} characters in string data,
you should check that @samp{strlen(@var{some_string})} matches
the length that @command{gawk} supplies for that string before using
it as a regular C string.
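The check described in this note can be sketched in C. This is a minimal sketch, not part of the extension API: @code{struct counted_string} and @code{is_plain_c_string()} are hypothetical names, and the structure only loosely mirrors the pointer-plus-length shape of the string values @command{gawk} hands to extensions. A real extension should use the types declared in @file{gawkapi.h}.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a string value received from gawk:
 * a NUL-terminated buffer plus the length gawk reports for it.
 * (Assumption: loosely modeled on the pointer-plus-length strings in
 * gawkapi.h; a real extension should use that header's types.)
 */
struct counted_string {
    const char *str;  /* always NUL-terminated by gawk */
    size_t len;       /* full length, counting any embedded NULs */
};

/*
 * Return true if the value can be treated as an ordinary C string,
 * i.e., strlen() reaches the terminating NUL and not an embedded one.
 */
static bool is_plain_c_string(const struct counted_string *s)
{
    assert(s != NULL && s->str != NULL);  /* precondition: valid value */
    return strlen(s->str) == s->len;
}
```

When the two lengths differ, the data contains an embedded @sc{nul}, and routines such as @code{strcpy()} or @code{printf()} would stop at it, so the extension must work with the reported length instead.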