1291
1298
Perform simple network communications
1294
Profile and debug @command{awk} programs.
1301
Profile and debug @command{awk} programs
1297
Extend the language with functions written in C or C++.
1304
Extend the language with functions written in C or C++
1300
1307
This @value{DOCUMENT} teaches you about the @command{awk} language and
1301
1308
how you can use it effectively. You should already be familiar with basic
1302
system commands, such as @command{cat} and @command{ls},@footnote{These commands
1309
system commands, such as @command{cat} and @command{ls},@footnote{These utilities
1303
1310
are available on POSIX-compliant systems, as well as on traditional
1304
1311
Unix-based systems. If you are using some other operating system, you still need to
1305
1312
be familiar with the ideas of I/O redirection and pipes.} as well as basic shell
1650
1656
@uref{http://www.gnu.org/software/gawk/manual/html_node/Notes.html,
1651
1657
The appendix on implementation notes}
1652
describes how to disable @command{gawk}'s extensions, as
1653
well as how to contribute new code to @command{gawk},
1654
and some possible future directions for @command{gawk} development.
1658
describes how to disable @command{gawk}'s extensions, how to contribute
1659
new code to @command{gawk}, where to find information on some possible
1660
future directions for @command{gawk} development, and the design decisions
1661
behind the extension API.
1656
1663
@uref{http://www.gnu.org/software/gawk/manual/html_node/Basic-Concepts.html,
1657
1664
The appendix on basic concepts}
2373
2380
reading any input. If there are no other statements in your program,
2374
2381
as is the case here, @command{awk} just stops, instead of trying to read
2375
2382
input it doesn't know how to process.
2376
The @samp{\47} is a magic way of getting a single quote into
2383
The @samp{\47} is a magic way (explained later) of getting a single quote into
2377
2384
the program, without having to engage in ugly shell quoting tricks.
2379
2386
@quotation NOTE
2380
As a side note, if you use Bash as your shell, you should execute the
2387
If you use Bash as your shell, you should execute the
2381
2388
command @samp{set +H} before running this program interactively, to
2382
2389
disable the C shell-style command history, which treats @samp{!} as a
2383
2390
special character. We recommend putting this command into your personal
2521
2530
interpreter with the given argument and the full argument list of the
2522
2531
executed program. The first argument in the list is the full @value{FN}
2523
2532
of the @command{awk} program. The rest of the argument list contains
2524
either options to @command{awk}, or @value{DF}s, or both. Note that on
2533
either options to @command{awk}, or @value{DF}s, or both. (Note that on
2525
2534
many systems @command{awk} may be found in @file{/usr/bin} instead of
2526
in @file{/bin}. Caveat Emptor.
2528
2537
Some systems limit the length of the interpreter name to 32 characters.
2529
2538
Often, this can be dealt with by using a symbolic link.
2571
2580
interpreter with the given argument and the full argument list of the
2572
2581
executed program. The first argument in the list is the full @value{FN}
2573
2582
of the @command{awk} program. The rest of the argument list contains
2574
either options to @command{awk}, or @value{DF}s, or both. Note that on
2583
either options to @command{awk}, or @value{DF}s, or both. (Note that on
2575
2584
many systems @command{awk} may be found in @file{/usr/bin} instead of
2576
in @file{/bin}. Caveat Emptor.
2578
2587
Some systems limit the length of the interpreter name to 32 characters.
2579
2588
Often, this can be dealt with by using a symbolic link.
3230
3248
@cindex line continuations, with C shell
3231
3249
The first field contains read-write permissions, the second field contains
3232
the number of links to the file, and the third field identifies the owner of
3233
the file. The fourth field identifies the group of the file.
3234
The fifth field contains the size of the file in bytes. The
3250
the number of links to the file, and the third field identifies the file's owner.
3251
The fourth field identifies the file's group.
3252
The fifth field contains the file's size in bytes. The
3235
3253
sixth, seventh, and eighth fields contain the month, day, and time,
3236
3254
respectively, that the file was last modified. Finally, the ninth field
3237
contains the @value{FN}.@footnote{The @samp{LC_ALL=C} is
3238
needed to produce this traditional-style output from @command{ls}.}
3255
contains the @value{FN}.
3240
3257
@c @cindex automatic initialization
3241
3258
@cindex initialization, automatic
4133
4153
input files to be processed in the order specified. However, an
4134
4154
argument that has the form @code{@var{var}=@var{value}}, assigns
4135
4155
the value @var{value} to the variable @var{var}---it does not specify a
4138
@ref{Assignment Options}.)
4156
file at all. (See @ref{Assignment Options}.) In the following example,
4157
@samp{count=1} is a variable assignment, not a @value{FN}:
4160
awk -f program.awk file1 count=1 file2
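The effect of such an assignment can be checked with a small sketch.
The helper name @code{demo_assign} and the temporary files are made up
for this illustration; only @file{file1}, @file{file2}, and @code{count}
come from the example above. Because the assignment is processed when
@command{awk} reaches it in the argument list, @code{count} should still
be unset while @file{file1} is read:

```shell
# Hypothetical helper; the temporary directory is created only for this demo.
demo_assign() {
    dir=$(mktemp -d) || return 1
    printf 'a\n' > "$dir/file1"
    printf 'b\n' > "$dir/file2"
    # count is unset (numeric value 0) while file1 is read, and 1 for file2.
    (cd "$dir" && awk '{ print FILENAME, count + 0 }' file1 count=1 file2)
    rm -rf "$dir"
}
demo_assign
```

This prints @samp{file1 0} followed by @samp{file2 1}.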
4140
4163
@cindex @command{gawk}, @code{ARGIND} variable in
4141
4164
@cindex @code{ARGIND} variable, command-line arguments
4142
4165
@cindex @code{ARGV} array, indexing into
4143
4166
@cindex @code{ARGC}/@code{ARGV} variables, command-line arguments
4144
All these arguments are made available to your @command{awk} program in the
4167
All the command-line arguments are made available to your @command{awk} program in the
4145
4168
@code{ARGV} array (@pxref{Built-in Variables}). Command-line options
4146
4169
and the program text (if present) are omitted from @code{ARGV}.
4147
4170
All other arguments, including variable assignments, are
4272
4295
@samp{.:/usr/local/share/awk}.@footnote{Your version of @command{gawk}
4273
4296
may use a different directory; it
4274
4297
will depend upon how @command{gawk} was built and installed. The actual
4275
directory is the value of @samp{$(datadir)} generated when
4298
directory is the value of @code{$(datadir)} generated when
4276
4299
@command{gawk} was configured. You probably don't need to worry about this,
4279
4302
The search path feature is particularly helpful for building libraries
4280
4303
of useful @command{awk} functions. The library files can be placed in a
4281
4304
standard directory in the default path and then specified on
4282
the command line with a short @value{FN}. Otherwise, the full @value{FN}
4283
would have to be typed for each file.
4305
the command line with a short @value{FN}. Otherwise, you would have to
4306
type the full @value{FN} for each file.
4285
4308
By using the @option{-i} option, or the @option{-e} and @option{-f} options, your command-line
4286
4309
@command{awk} programs can use facilities in @command{awk} library files
4289
4312
This is true for both @option{--traditional} and @option{--posix}.
4290
4313
@xref{Options}.
4292
If the source code is not found after the initial search, the path is searched
4315
If the source code file is not found after the initial search, the path is searched
4293
4316
again after adding the default @samp{.awk} suffix to the @value{FN}.
4297
@c using @samp{.} to get quotes, since @file{} no longer supplies them.
4299
the current directory in the path, either place
4300
@samp{.} explicitly in the path or write a null entry in the
4301
path. (A null entry is indicated by starting or ending the path with a
4302
colon or by placing two colons next to each other [@samp{::}].)
4303
This path search mechanism is similar
4318
@command{gawk}'s path search mechanism is similar
4304
4319
to the shell's.
4305
4320
(See @uref{http://www.gnu.org/software/bash/manual/,
4306
@cite{The Bourne-Again SHell manual}.})
4321
@cite{The Bourne-Again SHell manual}}.)
4322
It treats a null entry in the path as indicating the current
4324
(A null entry is indicated by starting or ending the path with a
4325
colon or by placing two colons next to each other [@samp{::}].)
4308
However, @command{gawk} always looks in the current directory @emph{before}
4309
searching @env{AWKPATH}, so there is no real reason to include
4310
the current directory in the search path.
4328
@command{gawk} always looks in the current directory @emph{before}
4329
searching @env{AWKPATH}. Thus, while you can include the current directory
4330
in the search path, either explicitly or with a null entry, there is no
4331
real reason to do so.
4311
4332
@c Prior to 4.0, gawk searched the current directory after the
4312
4333
@c path search, but it's not worth documenting it.
4397
4418
where I/O is performed in records, not in blocks.
4399
4420
@item GAWK_MSG_SRC
4400
If this variable exists, @command{gawk} includes the source file
4401
name and line number from which warning and/or fatal messages
4421
If this variable exists, @command{gawk} includes the file
4422
name and line number within the @command{gawk} source code
4423
from which warning and/or fatal messages
4402
4424
are generated. Its purpose is to help isolate the source of a
4403
message, since there can be multiple places which produce the
4425
message, because there are multiple places that produce the
4404
4426
same warning or error message.
4406
4428
@item GAWK_NO_DFA
5596
5619
echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}'
5599
This example uses the @code{sub()} function (which we haven't discussed yet;
5600
@pxref{String Functions})
5601
to make a change to the input record. Here, the regexp @code{/a+/}
5602
indicates ``one or more @samp{a} characters,'' and the replacement
5622
This example uses the @code{sub()} function to make a change to the input
5623
record. (@code{sub()} replaces the first instance of any text matched
5624
by the first argument with the string provided as the second argument;
5625
@pxref{String Functions}). Here, the regexp @code{/a+/} indicates ``one
5626
or more @samp{a} characters,'' and the replacement text is @samp{<A>}.
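As a quick check of this behavior, the following sketch runs the command
shown above next to a variant that uses the regexp @code{/a/}. The
function name @code{demo_sub} is hypothetical and exists only to group
the two commands:

```shell
# Hypothetical wrapper around the example command and one variant.
demo_sub() {
    # /a+/ matches the whole leading run of "a" characters.
    echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
    # /a/ matches a single "a"; sub() replaces only the first match.
    echo aaaabcd | awk '{ sub(/a/, "<A>"); print }'
}
demo_sub
```

The first command prints @samp{<A>bcd} and the second prints
@samp{<A>aaabcd}.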
5605
5628
The input contains four @samp{a} characters.
5606
5629
@command{awk} (and POSIX) regular expressions always match
6457
6481
It happens that recent versions of @command{mawk} can use the @value{NUL}
6458
6482
character as a record separator. However, this is a special case:
6459
6483
@command{mawk} does not allow embedded @value{NUL} characters in strings.
6484
(This may change in a future version of @command{mawk}.)
6461
6486
@cindex records, treating files as
6462
6487
@cindex treating files, as single records
6463
@xref{Readfile Function}, for an interesting, portable way to read
6488
@xref{Readfile Function}, for an interesting way to read
6464
6489
whole files. If you are using @command{gawk}, see @ref{Extension Sample
6465
6490
Readfile}, for another option.
6507
6532
It happens that recent versions of @command{mawk} can use the @value{NUL}
6508
6533
character as a record separator. However, this is a special case:
6509
6534
@command{mawk} does not allow embedded @value{NUL} characters in strings.
6535
(This may change in a future version of @command{mawk}.)
6511
6537
@cindex records, treating files as
6512
6538
@cindex treating files, as single records
6513
@xref{Readfile Function}, for an interesting, portable way to read
6539
@xref{Readfile Function}, for an interesting way to read
6514
6540
whole files. If you are using @command{gawk}, see @ref{Extension Sample
6515
6541
Readfile}, for another option.
6594
6620
This example prints each record in the file @file{mail-list} whose first
6595
field contains the string @samp{li}. The operator @samp{~} is called a
6596
@dfn{matching operator}
6597
(@pxref{Regexp Usage});
6598
it tests whether a string (here, the field @code{$1}) matches a given regular
6621
field contains the string @samp{li}.
6601
By contrast, the following example
6602
looks for @samp{li} in @emph{the entire record} and prints the first
6603
field and the last field for each matching input record:
6623
By contrast, the following example looks for @samp{li} in @emph{the
6624
entire record} and prints the first and last fields for each matching
6606
6628
$ @kbd{awk '/li/ @{ print $1, $NF @}' mail-list}
7086
7108
The first @code{print} statement prints the record as it was read,
7087
7109
with leading whitespace intact. The assignment to @code{$2} rebuilds
7088
7110
@code{$0} by concatenating @code{$1} through @code{$NF} together,
7089
separated by the value of @code{OFS}. Because the leading whitespace
7090
was ignored when finding @code{$1}, it is not part of the new @code{$0}.
7091
Finally, the last @code{print} statement prints the new @code{$0}.
7111
separated by the value of @code{OFS} (which is a space by default).
7112
Because the leading whitespace was ignored when finding @code{$1},
7113
it is not part of the new @code{$0}. Finally, the last @code{print}
7114
statement prints the new @code{$0}.
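The rebuilding of @code{$0} can be reproduced with a short sketch. The
input line and the function name @code{demo_rebuild} are illustrative
assumptions, not part of the original example:

```shell
# Hypothetical demo: assigning to a field rebuilds $0 using OFS.
demo_rebuild() {
    # First print shows the record as read; second shows the rebuilt record.
    echo '   a b c d' | awk '{ print; $2 = $2; print }'
    # With a different OFS, the rebuilt record uses that separator.
    echo '   a b c d' | awk 'BEGIN { OFS = "-" } { $2 = $2; print }'
}
demo_rebuild
```

The output is the original line with its three leading spaces, then
@samp{a b c d}, then @samp{a-b-c-d}.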
7093
7116
@cindex @code{FS}, containing @code{^}
7094
7117
@cindex @code{^} (caret), in @code{FS}
7174
7197
sets @code{FS} to the @samp{,} character. Notice that the option uses
7175
7198
an uppercase @samp{F} instead of a lowercase @samp{f}. The latter
7176
option (@option{-f}) specifies a file
7177
containing an @command{awk} program. Case is significant in command-line
7179
the @option{-F} and @option{-f} options have nothing to do with each other.
7180
You can use both options at the same time to set the @code{FS} variable
7181
@emph{and} get an @command{awk} program from a file.
7199
option (@option{-f}) specifies a file containing an @command{awk} program.
7183
7201
The value used for the argument to @option{-F} is processed in exactly the
7184
7202
same way as assignments to the built-in variable @code{FS}.
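A minimal sketch of this equivalence follows. The bracket expression
@samp{[,;]} and the function name @code{demo_fs} are assumptions chosen
for the illustration:

```shell
# Hypothetical demo: -F and an assignment to FS should behave the same.
demo_fs() {
    # Field separator given on the command line.
    echo 'a,b;c' | awk -F'[,;]' '{ print $3 }'
    # The same separator assigned to FS before any input is read.
    echo 'a,b;c' | awk 'BEGIN { FS = "[,;]" } { print $3 }'
}
demo_fs
```

Both commands print @samp{c}.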
8507
8519
@float Table,table-getline-variants
8508
8520
@caption{@code{getline} Variants and What They Set}
8509
8521
@multitable @columnfractions .33 .38 .27
8510
@headitem Variant @tab Effect @tab Standard / Extension
8511
@item @code{getline} @tab Sets @code{$0}, @code{NF}, @code{FNR}, @code{NR}, and @code{RT} @tab Standard
8512
@item @code{getline} @var{var} @tab Sets @var{var}, @code{FNR}, @code{NR}, and @code{RT} @tab Standard
8513
@item @code{getline <} @var{file} @tab Sets @code{$0}, @code{NF}, and @code{RT} @tab Standard
8514
@item @code{getline @var{var} < @var{file}} @tab Sets @var{var} and @code{RT} @tab Standard
8515
@item @var{command} @code{| getline} @tab Sets @code{$0}, @code{NF}, and @code{RT} @tab Standard
8516
@item @var{command} @code{| getline} @var{var} @tab Sets @var{var} and @code{RT} @tab Standard
8517
@item @var{command} @code{|& getline} @tab Sets @code{$0}, @code{NF}, and @code{RT} @tab Extension
8518
@item @var{command} @code{|& getline} @var{var} @tab Sets @var{var} and @code{RT} @tab Extension
8522
@headitem Variant @tab Effect @tab @command{awk} / @command{gawk}
8523
@item @code{getline} @tab Sets @code{$0}, @code{NF}, @code{FNR}, @code{NR}, and @code{RT} @tab @command{awk}
8524
@item @code{getline} @var{var} @tab Sets @var{var}, @code{FNR}, @code{NR}, and @code{RT} @tab @command{awk}
8525
@item @code{getline <} @var{file} @tab Sets @code{$0}, @code{NF}, and @code{RT} @tab @command{awk}
8526
@item @code{getline @var{var} < @var{file}} @tab Sets @var{var} and @code{RT} @tab @command{awk}
8527
@item @var{command} @code{| getline} @tab Sets @code{$0}, @code{NF}, and @code{RT} @tab @command{awk}
8528
@item @var{command} @code{| getline} @var{var} @tab Sets @var{var} and @code{RT} @tab @command{awk}
8529
@item @var{command} @code{|& getline} @tab Sets @code{$0}, @code{NF}, and @code{RT} @tab @command{gawk}
8530
@item @var{command} @code{|& getline} @var{var} @tab Sets @var{var} and @code{RT} @tab @command{gawk}
8519
8531
@end multitable
8521
8533
@c ENDOFRANGE getl
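Two rows of the table can be checked with the following sketch. The
temporary file, the variable names, and the function name
@code{demo_getline} are hypothetical:

```shell
# Hypothetical demo of which variables two getline variants set.
demo_getline() {
    tmp=$(mktemp) || return 1
    printf 'first\nsecond\n' > "$tmp"
    # "getline var < file" sets only var, so NR stays at 0 in BEGIN.
    awk -v f="$tmp" 'BEGIN {
        while ((getline line < f) > 0)
            n++
        close(f)
        print n, NR
    }'
    rm -f "$tmp"
    # Plain getline reads the next record into $0 and increments NR.
    printf 'x\ny\n' | awk 'NR == 1 { getline; print NR, $0 }'
}
demo_getline
```

This prints @samp{2 0} and then @samp{2 y}.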
8692
8708
Field splitting is more complicated than record splitting.
8694
@multitable @columnfractions .40 .40 .20
8710
@multitable @columnfractions .40 .45 .15
8695
8711
@headitem Field separator value @tab Fields are split @dots{} @tab @command{awk} / @command{gawk}
8696
8712
@item @code{FS == " "} @tab On runs of whitespace @tab @command{awk}
8697
8713
@item @code{FS == @var{any single character}} @tab On that character @tab @command{awk}
8698
8714
@item @code{FS == @var{regexp}} @tab On text matching the regexp @tab @command{awk}
8699
8715
@item @code{FS == ""} @tab Each individual character is a separate field @tab @command{gawk}
8700
8716
@item @code{FIELDWIDTHS == @var{list of columns}} @tab Based on character position @tab @command{gawk}
8701
@item @code{FPAT == @var{regexp}} @tab On text around text matching the regexp @tab @command{gawk}
8717
@item @code{FPAT == @var{regexp}} @tab On the text surrounding text matching the regexp @tab @command{gawk}
8702
8718
@end multitable
8704
8721
Using @samp{FS = "\n"} causes the entire record to be a single field
8705
8722
(assuming that newlines separate records).
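The difference from the default field separator can be seen in a small
sketch; the input text and the function name @code{demo_fs_newline} are
made up for the illustration:

```shell
# Hypothetical demo: with FS = "\n", a one-line record is a single field.
demo_fs_newline() {
    # Default FS: the record is split on runs of whitespace.
    echo 'a b c' | awk '{ print NF }'
    # FS is a newline: nothing in the record matches, so $1 is the whole record.
    echo 'a b c' | awk 'BEGIN { FS = "\n" } { print NF ": " $1 }'
}
demo_fs_newline
```

The first command prints @samp{3}; the second prints @samp{1: a b c}.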
8837
8855
single @code{print} statement can make any number of lines this way.
8839
8857
@cindex newlines, printing
8840
The following is an example of printing a string that contains embedded newlines
8841
(the @samp{\n} is an escape sequence, used to represent the newline
8842
character; @pxref{Escape Sequences}):
8858
The following is an example of printing a string that contains embedded
8861
(the @samp{\n} is an escape sequence, used to represent the newline
8862
character; @pxref{Escape Sequences}):
8866
(the @samp{\n} is an escape sequence, used to represent the newline
8867
character; @pxref{Escape Sequences}):
8845
8876
$ @kbd{awk 'BEGIN @{ print "line one\nline two\nline three" @}'}
9019
9050
@cindexawkfunc{sprintf}
9020
9051
@cindex @code{OFMT} variable
9021
9052
@cindex output, format specifier@comma{} @code{OFMT}
9022
The built-in variable @code{OFMT} contains the default format specification
9053
The built-in variable @code{OFMT} contains the format specification
9023
9054
that @code{print} uses with @code{sprintf()} when it wants to convert a
9024
9055
number to a string for printing.
9025
9056
The default value of @code{OFMT} is @code{"%.6g"}.
9026
9057
The way @code{print} prints numbers can be changed
9027
by supplying different format specifications
9028
as the value of @code{OFMT}, as shown in the following example:
9058
by supplying a different format specification
9059
for the value of @code{OFMT}, as shown in the following example:
9031
9062
$ @kbd{awk 'BEGIN @{}
9080
The entire list of arguments may optionally be enclosed in parentheses. The
9081
parentheses are necessary if any of the item expressions use the @samp{>}
9082
relational operator; otherwise, it can be confused with an output redirection
9083
(@pxref{Redirection}).
9109
As with @code{print}, the entire list of arguments may optionally be
9110
enclosed in parentheses. Here too, the parentheses are necessary if any
9111
of the item expressions use the @samp{>} relational operator; otherwise,
9112
it can be confused with an output redirection (@pxref{Redirection}).
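The following sketch contrasts the two readings of @samp{>}. It runs in
a throwaway directory because the unparenthesized form is expected to
create a file named @file{1}; the function name @code{demo_parens} and
the values are assumptions for the illustration:

```shell
# Hypothetical demo: ">" as a redirection versus ">" as a comparison.
demo_parens() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        # Without parentheses: "2" is written to a file named "1".
        awk 'BEGIN { printf "%d\n", 2 > 1 }'
        # With parentheses: the comparison 2 > 1 is evaluated and printed.
        awk 'BEGIN { printf("%d\n", 2 > 1) }'
        # Show what the first command wrote.
        cat 1
    )
    rm -rf "$dir"
}
demo_parens
```

The parenthesized command prints @samp{1}, and the file created by the
first command contains @samp{2}.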
9085
9114
@cindex format specifiers
9086
9115
The difference between @code{printf} and @code{print} is the @var{format}
9195
9224
(@pxref{Math Definitions}).
9197
9226
@item @code{%F}
9198
Like @samp{%f} but the infinity and ``not a number'' values are spelled
9227
Like @code{%f} but the infinity and ``not a number'' values are spelled
9199
9228
using uppercase letters.
9201
The @samp{%F} format is a POSIX extension to ISO C; not all systems
9202
support it. On those that don't, @command{gawk} uses @samp{%f} instead.
9230
The @code{%F} format is a POSIX extension to ISO C; not all systems
9231
support it. On those that don't, @command{gawk} uses @code{%f} instead.
9204
9233
@item @code{%g}, @code{%G}
9205
9234
Print a number in either scientific notation or in floating-point
9206
9235
notation, whichever uses fewer characters; if the result is printed in
9207
scientific notation, @samp{%G} uses @samp{E} instead of @samp{e}.
9236
scientific notation, @code{%G} uses @samp{E} instead of @samp{e}.
9209
9238
@item @code{%o}
9210
9239
Print an unsigned octal integer
9312
9341
Use an ``alternate form'' for certain control letters.
9313
For @samp{%o}, supply a leading zero.
9314
For @samp{%x} and @samp{%X}, supply a leading @samp{0x} or @samp{0X} for
9342
For @code{%o}, supply a leading zero.
9343
For @code{%x} and @code{%X}, supply a leading @samp{0x} or @samp{0X} for
9315
9344
a nonzero result.
9316
For @samp{%e}, @samp{%E}, @samp{%f}, and @samp{%F}, the result always
9345
For @code{%e}, @code{%E}, @code{%f}, and @code{%F}, the result always
9317
9346
contains a decimal point.
9318
For @samp{%g} and @samp{%G}, trailing zeros are not removed from the result.
9347
For @code{%g} and @code{%G}, trailing zeros are not removed from the result.
9321
A leading @samp{0} (zero) acts as a flag that indicates that output should be
9350
A leading @samp{0} (zero) acts as a flag indicating that output should be
9322
9351
padded with zeros instead of spaces.
9323
9352
This applies only to the numeric output formats.
9324
9353
This flag only has an effect when the field width is wider than the
9649
9678
report = "mail bug-system"
9650
print "Awk script failed:", $0 | report
9651
m = ("at record number " FNR " of " FILENAME)
9679
print("Awk script failed:", $0) | report
9680
print("at record number", FNR, "of", FILENAME) | report
9656
The message is built using string concatenation and saved in the variable
9657
@code{m}. It's then sent down the pipeline to the @command{mail} program.
9658
(The parentheses group the items to concatenate---see
9659
@ref{Concatenation}.)
9661
9684
The @code{close()} function is called here because it's a good idea to close
9662
9685
the pipe as soon as all the intended output has been sent to it.
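The effect of closing an output pipe can be sketched without a mail
system. Here @command{sort} stands in for the @command{mail} command of
the example, which is an assumption made so that the sketch runs
anywhere; the function name @code{demo_pipe_close} is likewise
hypothetical:

```shell
# Hypothetical demo: close() makes the piped command finish before awk goes on.
demo_pipe_close() {
    awk 'BEGIN {
        cmd = "sort"
        print "pear" | cmd
        print "apple" | cmd
        close(cmd)      # sort sees end of input and writes its output here
        print "done"
    }'
}
demo_pipe_close
```

Because the pipe is closed before the final @code{print}, the sorted
lines @samp{apple} and @samp{pear} appear before @samp{done}.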
9663
9686
@xref{Close Files And Pipes},
9827
9835
@cindex files, descriptors, See file descriptors
9829
9837
Running programs conventionally have three input and output streams
9830
already available to them for reading and writing. These are known as
9831
the @dfn{standard input}, @dfn{standard output}, and @dfn{standard error
9832
output}. These streams are, by default, connected to your keyboard and screen, but
9838
already available to them for reading and writing. These are known
9839
as the @dfn{standard input}, @dfn{standard output}, and @dfn{standard
9840
error output}. These open streams (and any other open file or pipe)
9841
are often referred to by the technical term @dfn{file descriptors}.
9843
These streams are, by default, connected to your keyboard and screen, but
9833
9844
they are often redirected with the shell, via the @samp{<}, @samp{<<},
9834
9845
@samp{>}, @samp{>>}, @samp{>&}, and @samp{|} operators. Standard error
9835
9846
is typically used for writing error messages; the reason there are two separate
9864
9875
``terminal,''@footnote{The ``tty'' in @file{/dev/tty} stands for
9865
9876
``Teletype,'' a serial terminal.} which on modern systems is a keyboard
9866
9877
and screen, not a serial console.)
9867
This usually has the same effect but not always: although the
9878
This generally has the same effect but not always: although the
9868
9879
standard error stream is usually the screen, it can be redirected; when
9869
9880
that happens, writing to the screen is not correct. In fact, if
9870
9881
@command{awk} is run from a background job, it may not have a
9871
9882
terminal at all.
9872
9883
Then opening @file{/dev/tty} fails.
9874
@command{gawk} provides special @value{FN}s for accessing the three standard
9875
streams. @value{COMMONEXT} It also provides syntax for accessing
9876
any other inherited open files. If the @value{FN} matches
9877
one of these special names when @command{gawk} redirects input or output,
9878
then it directly uses the stream that the @value{FN} stands for.
9879
These special @value{FN}s work for all operating systems that @command{gawk}
9885
@command{gawk}, BWK @command{awk}, and @command{mawk} provide
9886
special @value{FN}s for accessing the three standard streams.
9887
If the @value{FN} matches one of these special names when @command{gawk}
9888
(or one of the others) redirects input or output, then it directly uses
9889
the descriptor that the @value{FN} stands for. These special
9890
@value{FN}s work for all operating systems that @command{gawk}
9880
9891
has been ported to, not just those that are POSIX-compliant:
9882
9893
@cindex common extensions, @code{/dev/stdin} special file
9899
9910
@item /dev/stderr
9900
9911
The standard error output (file descriptor 2).
9914
With these facilities,
9915
the proper way to write an error message then becomes:
9918
print "Serious error detected!" > "/dev/stderr"
9921
@cindex troubleshooting, quotes with file names
9922
Note the use of quotes around the @value{FN}.
9923
Like any other redirection, the value must be a string.
9924
It is a common error to omit the quotes, which leads
9925
to confusing results.
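A short sketch shows that the message really goes to standard error and
not to standard output. The function name @code{demo_stderr} and the
@samp{normal output} text are assumptions for the illustration:

```shell
# Hypothetical demo: one line to standard output, one to standard error.
demo_stderr() {
    awk 'BEGIN {
        print "normal output"
        print "Serious error detected!" > "/dev/stderr"
    }'
}
# Discarding standard error leaves only the regular output.
demo_stderr 2>/dev/null
```

With standard error discarded, only @samp{normal output} is printed.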
9927
@command{gawk} does not treat these @value{FN}s as special when
9928
in POSIX compatibility mode. However, since BWK @command{awk}
9929
supports them, @command{gawk} does support them even when
9930
invoked with the @option{--traditional} option (@pxref{Options}).
9933
@section Special @value{FFN}s in @command{gawk}
9935
@cindex @command{gawk}, file names in
9937
Besides access to standard input, standard output, and standard error,
9938
@command{gawk} provides access to any open file descriptor.
9939
Additionally, there are special @value{FN}s reserved for
9943
* Other Inherited Files:: Accessing other open files with
9945
* Special Network:: Special files for network communications.
9946
* Special Caveats:: Things to watch out for.
9949
@node Other Inherited Files
9950
@subsection Accessing Other Open Files With @command{gawk}
9952
Besides the @code{/dev/stdin}, @code{/dev/stdout}, and @code{/dev/stderr}
9953
special @value{FN}s mentioned earlier, @command{gawk} provides syntax
9954
for accessing any other inherited open file:
9902
9957
@item /dev/fd/@var{N}
9903
9958
The file associated with file descriptor @var{N}. Such a file must
9904
9959
be opened by the program initiating the @command{awk} execution (typically
9909
9964
The @value{FN}s @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr}
9910
are aliases for @file{/dev/fd/0}, @file{/dev/fd/1}, and @file{/dev/fd/2},
9911
respectively. However, they are more self-explanatory.
9912
The proper way to write an error message in a @command{gawk} program
9913
is to use @file{/dev/stderr}, like this:
9916
print "Serious error detected!" > "/dev/stderr"
9919
@cindex troubleshooting, quotes with file names
9920
Note the use of quotes around the @value{FN}.
9921
Like any other redirection, the value must be a string.
9922
It is a common error to omit the quotes, which leads
9923
to confusing results.
9925
Finally, using the @code{close()} function on a @value{FN} of the
9965
are essentially aliases for @file{/dev/fd/0}, @file{/dev/fd/1}, and
9966
@file{/dev/fd/2}, respectively. However, those names are more self-explanatory.
9968
Note that using @code{close()} on a @value{FN} of the
9926
9969
form @code{"/dev/fd/@var{N}"}, for file descriptor numbers
9927
9970
above two, does actually close the given file descriptor.
9929
The @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr}
9930
special files are also recognized internally by several other
9931
versions of @command{awk}.
9933
9972
@node Special Network
9934
9973
@subsection Special Files for Network Communications
9935
9974
@cindex networks, support for
9958
9997
@node Special Caveats
9959
9998
@subsection Special @value{FFN} Caveats
9961
Here is a list of things to bear in mind when using the
10000
Here are some things to bear in mind when using the
9962
10001
special @value{FN}s that @command{gawk} provides:
9964
10003
@itemize @value{BULLET}
9965
10004
@cindex compatibility mode (@command{gawk}), file names
9966
10005
@cindex file names, in compatibility mode
9968
Recognition of these special @value{FN}s is disabled if @command{gawk} is in
9969
compatibility mode (@pxref{Options}).
10007
Recognition of the @value{FN}s for the three standard pre-opened
10008
files is disabled only in POSIX mode.
10011
Recognition of the other special @value{FN}s is disabled if @command{gawk} is in
10012
compatibility mode (either @option{--traditional} or @option{--posix};
9972
10016
@command{gawk} @emph{always}
10287
10332
Output from both @code{print} and @code{printf} may be redirected to
10288
files, pipes, and co-processes.
10333
files, pipes, and coprocesses.
10291
10336
@command{gawk} provides special file names for access to standard input,
10292
10337
output, and error, and for network communications.
10295
Use @code{close()} to close open file, pipe and co-process redirections.
10296
For co-processes, it is possible to close only one direction of the
10340
Use @code{close()} to close open file, pipe, and coprocess redirections.
10341
For coprocesses, it is possible to close only one direction of the
10297
10342
communications.
12048
12092
The following examples print @samp{1} when the comparison between
12049
12093
the two different constants is true, @samp{0} otherwise:
12095
@c 22.9.2014: Tested with mawk and BWK awk, got same results.
12052
$ @kbd{echo ' +3.14' | gawk '@{ print $0 == " +3.14" @}'} @ii{True}
12054
$ @kbd{echo ' +3.14' | gawk '@{ print $0 == "+3.14" @}'} @ii{False}
12056
$ @kbd{echo ' +3.14' | gawk '@{ print $0 == "3.14" @}'} @ii{False}
12058
$ @kbd{echo ' +3.14' | gawk '@{ print $0 == 3.14 @}'} @ii{True}
12060
$ @kbd{echo ' +3.14' | gawk '@{ print $1 == " +3.14" @}'} @ii{False}
12062
$ @kbd{echo ' +3.14' | gawk '@{ print $1 == "+3.14" @}'} @ii{True}
12064
$ @kbd{echo ' +3.14' | gawk '@{ print $1 == "3.14" @}'} @ii{False}
12066
$ @kbd{echo ' +3.14' | gawk '@{ print $1 == 3.14 @}'} @ii{True}
12097
$ @kbd{echo ' +3.14' | awk '@{ print($0 == " +3.14") @}'} @ii{True}
12099
$ @kbd{echo ' +3.14' | awk '@{ print($0 == "+3.14") @}'} @ii{False}
12101
$ @kbd{echo ' +3.14' | awk '@{ print($0 == "3.14") @}'} @ii{False}
12103
$ @kbd{echo ' +3.14' | awk '@{ print($0 == 3.14) @}'} @ii{True}
12105
$ @kbd{echo ' +3.14' | awk '@{ print($1 == " +3.14") @}'} @ii{False}
12107
$ @kbd{echo ' +3.14' | awk '@{ print($1 == "+3.14") @}'} @ii{True}
12109
$ @kbd{echo ' +3.14' | awk '@{ print($1 == "3.14") @}'} @ii{False}
12111
$ @kbd{echo ' +3.14' | awk '@{ print($1 == 3.14) @}'} @ii{True}
12799
12846
Locales can affect how dates and times are formatted (@pxref{Time
12800
12847
Functions}). For example, a common way to abbreviate the date September
12801
12848
4, 2015 in the United States is ``9/4/15.'' In many countries in
12802
Europe, however, it is abbreviated ``4.9.15.'' Thus, the @samp{%x}
12849
Europe, however, it is abbreviated ``4.9.15.'' Thus, the @code{%x}
12803
12850
specification in a @code{"US"} locale might produce @samp{9/4/15},
12804
12851
while in a @code{"EUROPE"} locale, it might produce @samp{4.9.15}.
12842
12889
@command{awk} provides the usual arithmetic operators (addition,
12843
12890
subtraction, multiplication, division, modulus), and unary plus and minus.
12844
It also provides comparison operators, boolean operators, and regexp
12891
It also provides comparison operators, boolean operators, array membership
12892
testing, and regexp
12845
12893
matching operators. String concatenation is accomplished by placing
12846
12894
two expressions next to each other; there is no explicit operator.
12847
12895
The three-operand @samp{?:} operator provides an ``if-else'' test within
13026
13074
@cindex regexp constants, as patterns
13027
13075
@cindex patterns, regexp constants as
13028
13076
A regexp constant as a pattern is also a special case of an expression
13029
pattern. The expression @code{/li/} has the value one if @samp{li}
13030
appears in the current input record. Thus, as a pattern, @code{/li/}
13077
pattern. The expression @samp{/li/} has the value one if @samp{li}
13078
appears in the current input record. Thus, as a pattern, @samp{/li/}
13031
13079
matches any record containing @samp{li}.
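Both uses of the regexp constant can be tried in a small sketch. The
input names and the function name @code{demo_pattern} are made up for
the illustration:

```shell
# Hypothetical demo: /li/ as a pattern and as an expression.
demo_pattern() {
    # As a pattern, /li/ selects the records that contain "li".
    printf 'alice\nbob\ncharlie\n' | awk '/li/'
    # As an expression, /li/ is 1 or 0 depending on the current record.
    printf 'alice\nbob\n' | awk '{ m = /li/; print m ": " $0 }'
}
demo_pattern
```

The first command prints @samp{alice} and @samp{charlie}; the second
prints @samp{1: alice} and @samp{0: bob}.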
13033
13081
@cindex Boolean expressions, as patterns
13289
13337
rule. It contains the number of fields from the last input record.
13290
13338
Most probably due to an oversight, the standard does not say that @code{$0}
13291
13339
is also preserved, although logically one would think that it should be.
13292
In fact, @command{gawk} does preserve the value of @code{$0} for use in
13293
@code{END} rules. Be aware, however, that BWK @command{awk}, and possibly
13294
other implementations, do not.
13340
In fact, all of BWK @command{awk}, @command{mawk}, and @command{gawk}
13341
preserve the value of @code{$0} for use in @code{END} rules. Be aware,
13342
however, that some other implementations and many older versions
13343
of Unix @command{awk} do not.
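
A quick way to check how a particular implementation behaves is sketched
below. It assumes that the @command{awk} on your search path is one of the
implementations named above; on an implementation that does not preserve
@code{$0}, the output would be an empty line instead.

```shell
# Print $0 from the END rule; with gawk, mawk, or BWK awk this is the
# last record that was read.
result=$(printf 'first\nlast\n' | awk 'END { print $0 }')
echo "$result"
```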
13296
13345
The third point follows from the first two. The meaning of @samp{print}
13297
13346
inside a @code{BEGIN} or @code{END} rule is the same as always:
13708
13757
is not zero and not a null string.)
13710
13759
After @var{body} has been executed,
13711
@var{condition} is tested again, and if it is still true, @var{body} is
13712
executed again. This process repeats until the @var{condition} is no longer
13713
true. If the @var{condition} is initially false, the body of the loop is
13714
never executed and @command{awk} continues with the statement following
13760
@var{condition} is tested again, and if it is still true, @var{body}
13761
executes again. This process repeats until the @var{condition} is no longer
13762
true. If the @var{condition} is initially false, the body of the loop
13763
never executes and @command{awk} continues with the statement following
13716
13765
This example prints the first three fields of each record, one per line:
13725
13775
@}' inventory-shipped
14163
14214
@cindex functions, user-defined, @code{next}/@code{nextfile} statements and
14164
14215
According to the POSIX standard, the behavior is undefined if the
14165
14216
@code{next} statement is used in a @code{BEGIN} or @code{END} rule.
14166
@command{gawk} treats it as a syntax error. Although POSIX permits it,
14217
@command{gawk} treats it as a syntax error. Although POSIX does not disallow it,
14167
14218
most other @command{awk} implementations don't allow the @code{next}
14168
14219
statement inside function bodies (@pxref{User-defined}). Just as with any
14169
14220
other @code{next} statement, a @code{next} statement inside a function
14227
14278
@cindex @code{nextfile} statement, user-defined functions and
14228
14279
@cindex Brian Kernighan's @command{awk}
14229
14280
@cindex @command{mawk} utility
14230
The current version of BWK @command{awk}, and @command{mawk} (@pxref{Other
14231
Versions}) also support @code{nextfile}. However, they don't allow the
14281
The current version of BWK @command{awk} and @command{mawk}
14282
also support @code{nextfile}. However, they don't allow the
14232
14283
@code{nextfile} statement inside function bodies (@pxref{User-defined}).
14233
14284
@command{gawk} does; a @code{nextfile} inside a function body reads the
14234
14285
next record and starts processing it with the first rule in the program,
14900
14951
@itemize @value{BULLET}
14953
It may be used to provide a timeout when reading from any
14954
open input file, pipe, or coprocess.
14955
@xref{Read Timeout}, for more information.
14902
14958
It may be used to cause coprocesses to communicate over pseudo-ttys
14903
14959
instead of through two-way pipes; this is discussed further in
14904
14960
@ref{Two-way I/O}.
14907
It may be used to provide a timeout when reading from any
14908
open input file, pipe, or coprocess.
14909
@xref{Read Timeout}, for more information.
14912
14963
@cindex @code{RLENGTH} variable
15245
15302
from an action (or function body) it transfers control to the
15246
15303
@code{END} statements. From an @code{END} statement body, it exits
15247
15304
immediately. You may pass an optional numeric value to be used
15248
at @command{awk}'s exit status.
15305
as @command{awk}'s exit status.
15251
15308
Some built-in variables provide control over @command{awk}, mainly for I/O.
15252
15309
Other variables convey information from @command{awk} to your program.
15312
@code{ARGC} and @code{ARGV} make the command-line arguments available
15313
to your program. Manipulating them from a @code{BEGIN} rule lets you
15314
control how @command{awk} will process the provided @value{DF}s.
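
As a minimal sketch of this last point, the following shell fragment blanks
out one element of @code{ARGV} from a @code{BEGIN} rule so that
@command{awk} skips the corresponding file. The scratch directory and the
@value{FN}s are hypothetical and exist only for the illustration.

```shell
# Create two small data files in a scratch directory (hypothetical names).
dir=$(mktemp -d)
printf 'first\n'  > "$dir/a.txt"
printf 'second\n' > "$dir/b.txt"

# The BEGIN rule sets ARGV[1] to the null string, so awk skips a.txt
# and reads only b.txt.
result=$(awk 'BEGIN { ARGV[1] = "" } { print }' "$dir/a.txt" "$dir/b.txt")
echo "$result"

rm -rf "$dir"
```

Setting an element to the null string, rather than deleting it, leaves
@code{ARGC} and the positions of the remaining arguments unchanged.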
15271
15333
for sorting arrays, and ends with a brief description of @command{gawk}'s
15272
15334
ability to support true arrays of arrays.
15274
@cindex variables, names of
15275
@cindex functions, names of
15276
@cindex arrays, names of, and names of functions/variables
15277
@cindex names, arrays/variables
15278
@cindex namespace issues
15279
@command{awk} maintains a single set
15280
of names that may be used for naming variables, arrays, and functions
15281
(@pxref{User-defined}).
15282
Thus, you cannot have a variable and an array with the same name in the
15283
same @command{awk} program.
15286
15337
* Array Basics:: The basics of arrays.
15338
* Numeric Array Subscripts:: How to use numbers as subscripts in
15340
* Uninitialized Subscripts:: Using uninitialized variables as subscripts.
15287
15341
* Delete:: The @code{delete} statement removes an element
15288
15342
from an array.
15289
* Numeric Array Subscripts:: How to use numbers as subscripts in
15291
* Uninitialized Subscripts:: Using uninitialized variables as subscripts.
15292
15343
* Multidimensional:: Emulating multidimensional arrays in
15293
15344
@command{awk}.
15294
15345
* Arrays of Arrays:: True multidimensional arrays.
16035
16088
In addition, @command{gawk} provides built-in functions for
16036
16089
sorting arrays; see @ref{Array Sorting Functions}.
16039
@section The @code{delete} Statement
16040
@cindex @code{delete} statement
16041
@cindex deleting elements in arrays
16042
@cindex arrays, elements, deleting
16043
@cindex elements in arrays, deleting
16045
To remove an individual element of an array, use the @code{delete}
16049
delete @var{array}[@var{index-expression}]
16052
Once an array element has been deleted, any value the element once
16053
had is no longer available. It is as if the element had never
16054
been referred to or been given a value.
16055
The following is an example of deleting elements in an array:
16058
for (i in frequencies)
16059
delete frequencies[i]
16063
This example removes all the elements from the array @code{frequencies}.
16064
Once an element is deleted, a subsequent @code{for} statement to scan the array
16065
does not report that element and the @code{in} operator to check for
16066
the presence of that element returns zero (i.e., false):
16071
print "This will never be printed"
16074
@cindex null strings, and deleting array elements
16075
It is important to note that deleting an element is @emph{not} the
16076
same as assigning it a null value (the empty string, @code{""}).
16082
print "This is printed, even though foo[4] is empty"
16085
@cindex lint checking, array elements
16086
It is not an error to delete an element that does not exist.
16087
However, if @option{--lint} is provided on the command line
16089
@command{gawk} issues a warning message when an element that
16090
is not in the array is deleted.
16092
@cindex common extensions, @code{delete} to delete entire arrays
16093
@cindex extensions, common@comma{} @code{delete} to delete entire arrays
16094
@cindex arrays, deleting entire contents
16095
@cindex deleting entire arrays
16096
@cindex @code{delete} @var{array}
16097
@cindex differences in @command{awk} and @command{gawk}, array elements, deleting
16098
All the elements of an array may be deleted with a single statement
16099
by leaving off the subscript in the @code{delete} statement,
16107
Using this version of the @code{delete} statement is about three times
16108
more efficient than the equivalent loop that deletes each element one
16111
@cindex Brian Kernighan's @command{awk}
16114
using @code{delete} without a subscript was a @command{gawk} extension.
16115
As of September, 2012, it was accepted for
16116
inclusion into the POSIX standard. See @uref{http://austingroupbugs.net/view.php?id=544,
16117
the Austin Group website}. This form of the @code{delete} statement is also supported
16118
by BWK @command{awk} and @command{mawk}, as well as
16119
by a number of other implementations (@pxref{Other Versions}).
16122
@cindex portability, deleting array elements
16123
@cindex Brennan, Michael
16124
The following statement provides a portable but nonobvious way to clear
16125
out an array:@footnote{Thanks to Michael Brennan for pointing this out.}
16131
@cindex @code{split()} function, array elements@comma{} deleting
16132
The @code{split()} function
16133
(@pxref{String Functions})
16134
clears out the target array first. This call asks it to split
16135
apart the null string. Because there is no data to split out, the
16136
function simply clears the array and then returns.
16139
Deleting an array does not change its type; you cannot
16140
delete an array and then use the array's name as a scalar
16141
(i.e., a regular variable). For example, the following does not work:
16150
16091
@node Numeric Array Subscripts
16151
16092
@section Using Numbers to Subscript Arrays
16274
16215
if @option{--lint} is provided
16275
16216
on the command line (@pxref{Options}).
16219
@section The @code{delete} Statement
16220
@cindex @code{delete} statement
16221
@cindex deleting elements in arrays
16222
@cindex arrays, elements, deleting
16223
@cindex elements in arrays, deleting
16225
To remove an individual element of an array, use the @code{delete}
16229
delete @var{array}[@var{index-expression}]
16232
Once an array element has been deleted, any value the element once
16233
had is no longer available. It is as if the element had never
16234
been referred to or been given a value.
16235
The following is an example of deleting elements in an array:
16238
for (i in frequencies)
16239
delete frequencies[i]
16243
This example removes all the elements from the array @code{frequencies}.
16244
Once an element is deleted, a subsequent @code{for} statement to scan the array
16245
does not report that element and the @code{in} operator to check for
16246
the presence of that element returns zero (i.e., false):
16251
print "This will never be printed"
16254
@cindex null strings, and deleting array elements
16255
It is important to note that deleting an element is @emph{not} the
16256
same as assigning it a null value (the empty string, @code{""}).
16262
print "This is printed, even though foo[4] is empty"
16265
@cindex lint checking, array elements
16266
It is not an error to delete an element that does not exist.
16267
However, if @option{--lint} is provided on the command line
16269
@command{gawk} issues a warning message when an element that
16270
is not in the array is deleted.
16272
@cindex common extensions, @code{delete} to delete entire arrays
16273
@cindex extensions, common@comma{} @code{delete} to delete entire arrays
16274
@cindex arrays, deleting entire contents
16275
@cindex deleting entire arrays
16276
@cindex @code{delete} @var{array}
16277
@cindex differences in @command{awk} and @command{gawk}, array elements, deleting
16278
All the elements of an array may be deleted with a single statement
16279
by leaving off the subscript in the @code{delete} statement,
16287
Using this version of the @code{delete} statement is about three times
16288
more efficient than the equivalent loop that deletes each element one
16291
This form of the @code{delete} statement is also supported
16292
by BWK @command{awk} and @command{mawk}, as well as
16293
by a number of other implementations.
16295
@cindex Brian Kernighan's @command{awk}
16297
For many years, using @code{delete} without a subscript was a common
16298
extension. In September, 2012, it was accepted for inclusion into the
16299
POSIX standard. See @uref{http://austingroupbugs.net/view.php?id=544,
16300
the Austin Group website}.
16303
@cindex portability, deleting array elements
16304
@cindex Brennan, Michael
16305
The following statement provides a portable but nonobvious way to clear
16306
out an array:@footnote{Thanks to Michael Brennan for pointing this out.}
16312
@cindex @code{split()} function, array elements@comma{} deleting
16313
The @code{split()} function
16314
(@pxref{String Functions})
16315
clears out the target array first. This call asks it to split
16316
apart the null string. Because there is no data to split out, the
16317
function simply clears the array and then returns.
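
The effect can be checked with a short sketch such as the following. The
array name @code{arr} and the sample string are arbitrary choices made for
the illustration.

```shell
result=$(awk 'BEGIN {
    n = split("a b c", arr)     # arr now holds three elements
    split("", arr)              # splitting the null string empties arr
    count = 0
    for (i in arr)              # count whatever elements remain
        count++
    print n, count
}')
echo "$result"
```

The first number printed is the element count before clearing, and the
second is the count afterwards.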
16320
Deleting all the elements from an array does not change its type; you cannot
16321
clear an array and then use the array's name as a scalar
16322
(i.e., a regular variable). For example, the following does not work:
16277
16331
@node Multidimensional
16278
16332
@section Multidimensional Arrays
16460
16514
Each subarray and the main array can be of different length. In fact, the
16461
16515
elements of an array or its subarray do not all have to have the same
16462
16516
type. This means that the main array and any of its subarrays can be
16463
non-rectangular, or jagged in structure. One can assign a scalar value to
16464
the index @code{4} of the main array @code{a}:
16517
non-rectangular, or jagged in structure. You can assign a scalar value to
16518
the index @code{4} of the main array @code{a}, even though @code{a[1]}
16519
is itself an array and not a scalar:
16467
16522
a[4] = "An element in a jagged array"
17347
17407
@code{@var{array}[1]}, the second piece in @code{@var{array}[2]}, and so
17348
17408
forth. The string value of the third argument, @var{fieldsep}, is
17349
17409
a regexp describing where to split @var{string} (much as @code{FS} can
17350
be a regexp describing where to split input records;
17351
@pxref{Regexp Field Splitting}).
17410
be a regexp describing where to split input records).
17352
17411
If @var{fieldsep} is omitted, the value of @code{FS} is used.
17353
17412
@code{split()} returns the number of elements created.
17354
17413
@var{seps} is a @command{gawk} extension with @code{@var{seps}[@var{i}]}
17643
17702
@code{toupper("MiXeD cAsE 123")} returns @code{"MIXED CASE 123"}.
17705
@cindex sidebar, Matching the Null String
17708
<sidebar><title>Matching the Null String</title>
17711
@cindex matching, null strings
17712
@cindex null strings, matching
17713
@cindex @code{*} (asterisk), @code{*} operator, null strings@comma{} matching
17714
@cindex asterisk (@code{*}), @code{*} operator, null strings@comma{} matching
17716
In @command{awk}, the @samp{*} operator can match the null string.
17717
This is particularly important for the @code{sub()}, @code{gsub()},
17718
and @code{gensub()} functions. For example:
17721
$ @kbd{echo abc | awk '@{ gsub(/m*/, "X"); print @}'}
17726
Although this makes a certain amount of sense, it can be surprising.
17735
@center @b{Matching the Null String}
17738
@cindex matching, null strings
17739
@cindex null strings, matching
17740
@cindex @code{*} (asterisk), @code{*} operator, null strings@comma{} matching
17741
@cindex asterisk (@code{*}), @code{*} operator, null strings@comma{} matching
17743
In @command{awk}, the @samp{*} operator can match the null string.
17744
This is particularly important for the @code{sub()}, @code{gsub()},
17745
and @code{gensub()} functions. For example:
17748
$ @kbd{echo abc | awk '@{ gsub(/m*/, "X"); print @}'}
17753
Although this makes a certain amount of sense, it can be surprising.
17646
17758
@node Gory Details
17647
17759
@subsubsection More About @samp{\} and @samp{&} with @code{sub()}, @code{gsub()}, and @code{gensub()}
17947
18059
we recommend the use of @command{gawk} and @code{gensub()} when you have
17948
18060
to do substitutions.
17950
@cindex sidebar, Matching the Null String
17953
<sidebar><title>Matching the Null String</title>
17956
@cindex matching, null strings
17957
@cindex null strings, matching
17958
@cindex @code{*} (asterisk), @code{*} operator, null strings@comma{} matching
17959
@cindex asterisk (@code{*}), @code{*} operator, null strings@comma{} matching
17961
In @command{awk}, the @samp{*} operator can match the null string.
17962
This is particularly important for the @code{sub()}, @code{gsub()},
17963
and @code{gensub()} functions. For example:
17966
$ @kbd{echo abc | awk '@{ gsub(/m*/, "X"); print @}'}
17971
Although this makes a certain amount of sense, it can be surprising.
17980
@center @b{Matching the Null String}
17983
@cindex matching, null strings
17984
@cindex null strings, matching
17985
@cindex @code{*} (asterisk), @code{*} operator, null strings@comma{} matching
17986
@cindex asterisk (@code{*}), @code{*} operator, null strings@comma{} matching
17988
In @command{awk}, the @samp{*} operator can match the null string.
17989
This is particularly important for the @code{sub()}, @code{gsub()},
17990
and @code{gensub()} functions. For example:
17993
$ @kbd{echo abc | awk '@{ gsub(/m*/, "X"); print @}'}
17998
Although this makes a certain amount of sense, it can be surprising.
18002
18062
@node I/O Functions
18003
18063
@subsection Input/Output Functions
18004
18064
@cindex input/output functions
18052
18112
@cindex extensions, common@comma{} @code{fflush()} function
18053
18113
@cindex Brian Kernighan's @command{awk}
18054
@code{fflush()} was added to BWK @command{awk} in
18055
April of 1992. For two decades, it was not part of the POSIX standard.
18056
As of December, 2012, it was accepted for inclusion into the POSIX
18114
Brian Kernighan added @code{fflush()} to his @command{awk} in April
18115
of 1992. For two decades, it was a common extension. In December,
18116
2012, it was accepted for inclusion into the POSIX standard.
18058
18117
See @uref{http://austingroupbugs.net/view.php?id=634, the Austin Group website}.
18060
18119
POSIX standardizes @code{fflush()} as follows: If there
18910
18969
right by three bits, you end up with @samp{00010111}.@footnote{This example
18911
18970
shows that 0's come in on the left side. For @command{gawk}, this is
18912
18971
always true, but in some languages, it's possible to have the left side
18913
fill with 1's. Caveat emptor.}
18914
18973
@c Purposely decided to use 0's and 1's here. 2/2001.
18916
again with @samp{10111001} and shift it left by three bits, you end up
18917
with @samp{11001000}.
18918
@command{gawk} provides built-in functions that implement the
18919
bitwise operations just described. They are:
18974
If you start over again with @samp{10111001} and shift it left by three
18975
bits, you end up with @samp{11001000}. The following list describes
18976
@command{gawk}'s built-in functions that implement the bitwise operations.
18977
Optional parameters are enclosed in square brackets ([ ]):
18921
18979
@cindex @command{gawk}, bitwise operations in
18923
18981
@cindexgawkfunc{and}
18924
18982
@cindex bitwise AND
18925
@item @code{and(@var{v1}, @var{v2}} [@code{,} @dots{}]@code{)}
18983
@item @code{and(}@var{v1}@code{,} @var{v2} [@code{,} @dots{}]@code{)}
18926
18984
Return the bitwise AND of the arguments. There must be at least two.
18928
18986
@cindexgawkfunc{compl}
19086
19144
(not discussed yet; @pxref{User-defined}), to test if a parameter is an
19087
19145
array or not.
19089
Note, however, that using @code{isarray()} at the global level to test
19148
Using @code{isarray()} at the global level to test
19090
19149
variables makes no sense. Since you are the one writing the program, you
19091
19150
are supposed to know if your variables are arrays or not. And in fact,
19092
19151
due to the way @command{gawk} works, if you pass the name of a variable
19093
19152
that has not been previously used to @code{isarray()}, @command{gawk}
19094
will end up turning it into a scalar.
19153
ends up turning it into a scalar.
19096
19156
@node I18N Functions
19097
19157
@subsection String-Translation Functions
20216
20282
POSIX @command{awk} provides three kinds of built-in functions: numeric,
20217
string, and I/O. @command{gawk} provides functions that work with values
20218
representing time, do bit manipulation, sort arrays, and internationalize
20219
and localize programs. @command{gawk} also provides several extensions to
20220
some of the standard functions, typically in the form of additional arguments.
20283
string, and I/O. @command{gawk} provides functions that sort arrays, work
20284
with values representing time, do bit manipulation, determine variable
20285
type (array vs.@: scalar), and internationalize and localize programs.
20286
@command{gawk} also provides several extensions to some of the standard
20287
functions, typically in the form of additional arguments.
20223
20290
Functions accept zero or more arguments and return a value. The
20591
20659
The function first looks for C-style octal numbers (base 8).
20592
20660
If the input string matches a regular expression describing octal
20593
20661
numbers, then @code{mystrtonum()} loops through each character in the
20594
string. It sets @code{k} to the index in @code{"01234567"} of the current
20595
octal digit. Since the return value is one-based, the @samp{k--}
20596
adjusts @code{k} so it can be used in computing the return value.
20662
string. It sets @code{k} to the index in @code{"1234567"} of the current
20664
The return value will either be the same number as the digit, or zero
20665
if the character is not there, which will be true for a @samp{0}.
20666
This is safe, since the regexp test in the @code{if} ensures that
20667
only octal values are converted.
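
The digit lookup described above can be sketched in isolation as follows.
The helper name @code{oct2dec()} and the @code{"NaN"} return value are
assumptions made for this illustration; this is not the
@code{mystrtonum()} code itself, only the same @code{index()} technique.

```shell
result=$(awk '
# oct2dec(): hypothetical helper illustrating the digit lookup; it is not
# the mystrtonum() code from the manual.
function oct2dec(str,    n, i, k, ret)
{
    if (str !~ /^0[0-7]*$/)          # accept only C-style octal strings
        return "NaN"
    n = length(str)
    ret = 0
    for (i = 1; i <= n; i++) {
        # index() returns 1..7 for the digits 1..7 and 0 for the digit "0"
        k = index("1234567", substr(str, i, 1))
        ret = ret * 8 + k
    }
    return ret
}
BEGIN { print oct2dec("0123"), oct2dec("017"), oct2dec("089") }
')
echo "$result"
```

Because the regexp test rejects anything that is not an octal string, a
zero returned by @code{index()} can only mean that the digit was
@samp{0}.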
20598
20669
Similar logic applies to the code that checks for and converts a
20599
20670
hexadecimal value, which starts with @samp{0x} or @samp{0X}.
21431
This code relies on the @code{ARGIND} variable
21432
(@pxref{Auto-set}),
21433
which is specific to @command{gawk}.
21434
If you are not using
21435
@command{gawk}, you can use ideas presented in
21437
the previous @value{SECTION}
21440
@ref{Filetrans Function},
21442
to either update @code{ARGIND} on your own
21443
or modify this code as appropriate.
21445
The @code{rewind()} function also relies on the @code{nextfile} keyword
21446
(@pxref{Nextfile Statement}). Because of this, you should not call it
21447
from an @code{ENDFILE} rule. (This isn't necessary anyway, since as soon
21448
as an @code{ENDFILE} rule finishes @command{gawk} goes to the next file!)
21504
The @code{rewind()} function relies on the @code{ARGIND} variable
21505
(@pxref{Auto-set}), which is specific to @command{gawk}. It also
21506
relies on the @code{nextfile} keyword (@pxref{Nextfile Statement}).
21507
Because of this, you should not call it from an @code{ENDFILE} rule.
21508
(This isn't necessary anyway, since as soon as an @code{ENDFILE} rule
21509
finishes @command{gawk} goes to the next file!)
21450
21511
@node File Checking
21451
21512
@subsection Checking for Readable @value{DDF}s
21964
22030
etc., as its own options.
21966
22032
@quotation NOTE
21967
After @code{getopt()} is through, it is the responsibility of the
21968
user-level code to clear out all the elements of @code{ARGV} from 1
22033
After @code{getopt()} is through,
22034
user-level code must clear out all the elements of @code{ARGV} from 1
21969
22035
to @code{Optind}, so that @command{awk} does not try to process the
21970
22036
command-line options as @value{FN}s.
21971
22037
@end quotation
22039
Using @samp{#!} with the @option{-E} option may help avoid
22040
conflicts between your program's options and @command{gawk}'s options,
22041
since @option{-E} causes @command{gawk} to abandon processing of
22043
(@pxref{Executable Scripts}, and @pxref{Options}).
21973
22045
Several of the sample programs presented in
21974
22046
@ref{Sample Programs},
21975
22047
use @code{getopt()} to process their arguments.
22214
22286
routine, we have chosen to put it in @file{/usr/local/libexec/awk};
22215
22287
however, you might want it to be in a different directory on your system.
22217
The function @code{_pw_init()} keeps three copies of the user information
22218
in three associative arrays. The arrays are indexed by username
22289
The function @code{_pw_init()} loads three copies of the user information
22290
into three associative arrays. The arrays are indexed by username
22219
22291
(@code{_pw_byname}), by user ID number (@code{_pw_byuid}), and by order of
22220
22292
occurrence (@code{_pw_bycount}).
22221
22293
The variable @code{_pw_inited} is used for efficiency, since @code{_pw_init()}
22222
22294
needs to be called only once.
22296
@cindex @code{PROCINFO} array, testing the field splitting
22224
22297
@cindex @code{getline} command, @code{_pw_init()} function
22225
22298
Because this function uses @code{getline} to read information from
22226
22299
@command{pwcat}, it first saves the values of @code{FS}, @code{RS}, and @code{$0}.
22228
22301
with @code{FIELDWIDTHS} is in effect or not.
22229
22302
Doing so is necessary, since these functions could be called
22230
22303
from anywhere within a user's program, and the user may have his
22232
own way of splitting records and fields.
22234
@cindex @code{PROCINFO} array, testing the field splitting
22235
The @code{using_fw} variable checks @code{PROCINFO["FS"]}, which
22236
is @code{"FIELDWIDTHS"} if field splitting is being done with
22237
@code{FIELDWIDTHS}. This makes it possible to restore the correct
22304
or her own way of splitting records and fields.
22305
This makes it possible to restore the correct
22238
22306
field-splitting mechanism later. The test can only be true for
22239
22307
@command{gawk}. It is false if using @code{FS} or @code{FPAT},
22240
22308
or on some other @command{awk} implementation.
23064
23131
# Requires getopt() and join() library functions
23067
function usage( e1, e2)
23069
e1 = "usage: cut [-f list] [-d c] [-s] [files...]"
23070
e2 = "usage: cut [-c list] [files...]"
23071
print e1 > "/dev/stderr"
23072
print e2 > "/dev/stderr"
23136
print("usage: cut [-f list] [-d c] [-s] [files...]") > "/dev/stderr"
23137
print("usage: cut [-c list] [files...]") > "/dev/stderr"
23080
The variables @code{e1} and @code{e2} are used so that the function
23081
fits nicely on the @value{PAGE}.
23083
23144
@cindex @code{BEGIN} pattern, running @command{awk} programs and
23084
23145
@cindex @code{FS} variable, running @command{awk} programs and
23085
23146
Next comes a @code{BEGIN} rule that parses the command-line options.
24160
24210
The following function, @code{are_equal()}, compares the current line,
24162
previous line, @code{last}. It handles skipping fields and characters.
24163
If no field count and no character count are specified, @code{are_equal()}
24164
simply returns one or zero depending upon the result of a simple string
24165
comparison of @code{last} and @code{$0}. Otherwise, things get more
24167
If fields have to be skipped, each line is broken into an array using
24169
(@pxref{String Functions});
24170
the desired fields are then joined back into a line using @code{join()}.
24171
The joined lines are stored in @code{clast} and @code{cline}.
24172
If no fields are skipped, @code{clast} and @code{cline} are set to
24173
@code{last} and @code{$0}, respectively.
24174
Finally, if characters are skipped, @code{substr()} is used to strip off the
24175
leading @code{charcount} characters in @code{clast} and @code{cline}. The
24176
two strings are then compared and @code{are_equal()} returns the result:
24211
@code{$0}, to the previous line, @code{last}. It handles skipping fields
24212
and characters. If no field count and no character count are specified,
24213
@code{are_equal()} returns one or zero depending upon the result of a
24214
simple string comparison of @code{last} and @code{$0}.
24216
Otherwise, things get more complicated. If fields have to be skipped,
24217
each line is broken into an array using @code{split()} (@pxref{String
24218
Functions}); the desired fields are then joined back into a line
24219
using @code{join()}. The joined lines are stored in @code{clast} and
24220
@code{cline}. If no fields are skipped, @code{clast} and @code{cline}
24221
are set to @code{last} and @code{$0}, respectively. Finally, if
24222
characters are skipped, @code{substr()} is used to strip off the leading
24223
@code{charcount} characters in @code{clast} and @code{cline}. The two
24224
strings are then compared and @code{are_equal()} returns the result:
24179
24227
@c file eg/prog/uniq.awk
24961
25016
Most of the work is done in the @code{printpage()} function.
24962
25017
The label lines are stored sequentially in the @code{line} array. But they
24963
25018
have to print horizontally; @code{line[1]} next to @code{line[6]},
24964
@code{line[2]} next to @code{line[7]}, and so on. Two loops are used to
25019
@code{line[2]} next to @code{line[7]}, and so on. Two loops
24965
25020
accomplish this. The outer loop, controlled by @code{i}, steps through
24966
25021
every 10 lines of data; this is each row of labels. The inner loop,
24967
25022
controlled by @code{j}, goes through the lines within the row.
25292
25347
This @value{DOCUMENT} is written in @uref{http://www.gnu.org/software/texinfo/, Texinfo},
25293
25348
the GNU project's document formatting language.
25294
25349
A single Texinfo source file can be used to produce both
25295
printed and online documentation.
25350
printed documentation, with @TeX{}, and online documentation.
25297
Texinfo is fully documented in the book
25352
(Texinfo is fully documented in the book
25298
25353
@cite{Texinfo---The GNU Documentation Format},
25299
25354
available from the Free Software Foundation,
25300
and also available @uref{http://www.gnu.org/software/texinfo/manual/texinfo/, online}.
25355
and also available @uref{http://www.gnu.org/software/texinfo/manual/texinfo/, online}.)
25301
25356
@end ifnotinfo
25303
The Texinfo language is described fully, starting with
25304
@inforef{Top, , Texinfo, texinfo,Texinfo---The GNU Documentation Format}.
25358
(The Texinfo language is described fully, starting with
25359
@inforef{Top, , Texinfo, texinfo,Texinfo---The GNU Documentation Format}.)
25307
25362
For our purposes, it is enough to know three things about Texinfo input
26119
26170
the same letters
26120
26171
(for example, ``babbling'' and ``blabbing'').
26122
An elegant algorithm is presented in Column 2, Problem C of
26123
Jon Bentley's @cite{Programming Pearls}, second edition.
26124
The idea is to give words that are anagrams a common signature,
26125
sort all the words together by their signature, and then print them.
26126
Dr.@: Bentley observes that taking the letters in each word and
26127
sorting them produces that common signature.
26173
Column 2, Problem C of Jon Bentley's @cite{Programming Pearls}, second
26174
edition, presents an elegant algorithm. The idea is to give words that
26175
are anagrams a common signature, sort all the words together by their
26176
signature, and then print them. Dr.@: Bentley observes that taking the
26177
letters in each word and sorting them produces that common signature.
26129
26179
The following program uses arrays of arrays to bring together
26130
26180
words with the same signature and array sorting to print the words
26635
26685
@cindex constants, nondecimal
26637
26687
If you run @command{gawk} with the @option{--non-decimal-data} option,
26638
you can have nondecimal constants in your input data:
26688
you can have nondecimal values in your input data:
26640
@c line break here for small book format
26642
26691
$ @kbd{echo 0123 123 0x123 |}
26643
> @kbd{gawk --non-decimal-data '@{ printf "%d, %d, %d\n",}
26644
> @kbd{$1, $2, $3 @}'}
26692
> @kbd{gawk --non-decimal-data '@{ printf "%d, %d, %d\n", $1, $2, $3 @}'}
26645
26693
@print{} 83, 123, 291