.TH GAWK 1 "Dec 28 1995" "Free Software Foundation" "Utility Commands"
gawk \- pattern scanning and processing language
[ POSIX or GNU style options ]
[ POSIX or GNU style options ]
is the GNU Project's implementation of the AWK programming language.
It conforms to the definition of the language in
the \*(PX 1003.2 Command Language And Utilities Standard.
This version in turn is based on the description in
.IR "The AWK Programming Language" ,
by Aho, Kernighan, and Weinberger,
with the additional features found in the System V Release 4 version
also provides more recent Bell Labs
extensions, and some GNU-specific extensions.
The command line consists of options to
itself, the AWK program text (if not supplied via the
options), and values to be made
pre-defined AWK variables.
options may be either the traditional \*(PX one-letter options,
or the GNU-style long options. \*(PX options start with a single ``\-'',
while long options start with ``\-\^\-''.
Long options are provided for both GNU-specific features and
for \*(PX-mandated features.
Following the \*(PX standard,
options are supplied via arguments to the
options may be supplied, or multiple arguments may be supplied together
if they are separated by commas, or enclosed in quotes and separated
Case is ignored in arguments to the
option has a corresponding long option, as detailed below.
Arguments to long options are either joined with the option
sign, with no intervening spaces, or they may be provided in the
next command line argument.
Long options may be abbreviated, as long as the abbreviation
accepts the following options.
.BI \-\^\-field-separator " fs"
for the input field separator (the value of the
\fB\-v\fI var\fB\^=\^\fIval\fR
\fB\-\^\-assign \fIvar\fB\^=\^\fIval\fR
before execution of the program begins.
Such variable values are available to the
block of an AWK program.
.BI \-f " program-file"
.BI \-\^\-file " program-file"
Read the AWK program source from the file
instead of from the first command line argument.
Set various memory limits to the value
flag sets the maximum number of fields, and the
flag sets the maximum record size. These two flags and the
option are from the Bell Labs research version of \*(UX
has no pre-defined limits.
mode. In compatibility mode,
behaves identically to \*(UX
none of the GNU-specific extensions are recognized.
is preferred over the other forms of this option.
.BR "GNU EXTENSIONS" ,
below, for more information.
Print the short version of the GNU copyright information message on
Print a relatively short summary of the available options on
.IR "GNU Coding Standards" ,
these options cause an immediate, successful exit.)
Provide warnings about constructs that are
dubious or non-portable to other AWK implementations.
Provide warnings about constructs that are
not portable to the original version of Unix
.\" This option is left undocumented, on purpose.
Provide a moment of nostalgia for long time
mode, with the following additional restrictions:
escape sequences are not recognized.
cannot be used in place of
function is not available.
.B "\-W re\-interval"
.B \-\^\-re\-interval
.I "interval expressions"
in regular expression matching
.BR "Regular Expressions" ,
Interval expressions were not traditionally available in the
AWK language. The POSIX standard added them, to make
consistent with each other.
However, their use is likely
to break old AWK programs, so
only provides them if they are requested with this option, or when
.BI "\-W source " program-text
.BI \-\^\-source " program-text"
as AWK program source code.
This option allows the easy intermixing of library functions (used via the
options) with source code entered on the command line.
It is intended primarily for medium to large AWK programs used
form of this option uses the rest of the command line argument for
will be recognized in the same argument.
Print version information for this particular copy of
This is useful mainly for knowing whether the current copy of
is up to date with respect to whatever the Free Software Foundation
This is also useful when reporting bugs.
.IR "GNU Coding Standards" ,
these options cause an immediate, successful exit.)
Signal the end of options. This is useful to allow further arguments to the
AWK program itself to start with a ``\-''.
This is mainly for consistency with the argument parsing convention used
by most other \*(PX programs.
In compatibility mode,
any other options are flagged as invalid, but are otherwise ignored.
In normal operation, as long as program text has been supplied, unknown
options are passed on to the AWK program in the
array for processing. This is particularly useful for running AWK
programs via the ``#!'' executable interpreter mechanism.
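.PP
A minimal sketch of how the short options combine on one command line,
assuming a POSIX-compatible
.I awk
is on the search path (substitute
.I gawk
if it is installed under that name).
The sample records, the shell function name
.BR opts_demo ,
and the variable name
.B min
are invented for illustration:

```shell
# Hypothetical data: colon-separated name:password:uid records.
# -F sets the field separator; -v assigns min before the program starts.
opts_demo() {
    printf 'root:x:0\nalice:x:1000\n' |
        awk -F: -v min=1000 '$3 >= min { print $1 }'
}
opts_demo    # prints the names whose third field is at least min
```

.PP
With
.IR gawk ,
the same call could presumably be written with the long forms
.B \-\^\-field-separator
and
.B \-\^\-assign
listed above.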
.SH AWK PROGRAM EXECUTION
An AWK program consists of a sequence of pattern-action statements
and optional function definitions.
\fIpattern\fB { \fIaction statements\fB }\fR
\fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements\fB }\fR
first reads the program source from the
or from the first non-option argument on the command line.
options may be used multiple times on the command line.
will read the program text as if all the
and command line source texts
had been concatenated together. This is useful for building libraries
of AWK functions, without having to include them in each new AWK
program that uses them. It also provides the ability to mix library
functions with command line programs.
The environment variable
specifies a search path to use when finding source files named with
option. If this variable does not exist, the default path is
\fB".:/usr/local/share/awk"\fR.
(The actual directory may vary, depending upon how
was built and installed.)
If a file name given to the
option contains a ``/'' character, no path search is performed.
executes AWK programs in the following order.
all variable assignments specified via the
option are performed.
compiles the program into an internal form.
executes the code in the
.B BEGIN
block(s) (if any),
and then proceeds to read
each file named in the
If there are no files named on the command line,
reads the standard input.
If a filename on the command line has the form
it is treated as a variable assignment. The variable
will be assigned the value
(This happens after any
block(s) have been run.)
Command line variable assignment
is most useful for dynamically assigning values to the variables
AWK uses to control how input is broken into fields and records. It
is also useful for controlling state if multiple passes are needed over
If the value of a particular element of
For each record in the input,
tests to see if it matches any
For each pattern that the record matches, the associated
The patterns are tested in the order they occur in the program.
Finally, after all the input is exhausted,
executes the code in the
.B END
block(s) (if any).
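.PP
The execution order described in this section can be sketched as follows.
This is an illustration only: it assumes a POSIX-compatible
.I awk
and the
.I mktemp
utility, and the names
.BR order_demo ,
.BR early ,
and
.B pass
are hypothetical.
The same data file is named twice, with a command line assignment before each
occurrence, to imitate two passes over the input:

```shell
order_demo() {
    tmp=$(mktemp) || return 1
    printf 'a\nb\n' > "$tmp"
    # early is set with -v, so it is visible in BEGIN;
    # pass is set on the command line, so it is still empty in BEGIN.
    awk -v early=1 '
        BEGIN { print "BEGIN: early=" early ", pass=" pass }
              { print "pass " pass ": " $0 }
        END   { print "END: NR=" NR }
    ' pass=1 "$tmp" pass=2 "$tmp"
    rm -f "$tmp"
}
order_demo
```

.PP
Each assignment takes effect only when the argument scan reaches it, which is
why the first two records see a different value of
.B pass
than the last two.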
.SH VARIABLES, RECORDS AND FIELDS
AWK variables are dynamic; they come into existence when they are
first used. Their values are either floating-point numbers or strings,
depending upon how they are used. AWK also has one-dimensional
arrays; arrays with multiple dimensions may be simulated.
Several pre-defined variables are set as a program
runs; these will be described as needed and summarized below.
Normally, records are separated by newline characters. You can control how
records are separated by assigning values to the built-in variable
is any single character, that character separates records.
is a regular expression. Text in the input that matches this
regular expression separates records.
However, in compatibility mode,
only the first character of its string
value is used for separating records.
is set to the null string, then records are separated by
is set to the null string, the newline character always acts as
a field separator, in addition to whatever value
As each input record is read,
splits the record into
using the value of the
variable as the field separator.
is a single character, fields are separated by that character.
is the null string, then each individual character becomes a
is expected to be a full regular expression.
In the special case that
is a single space, fields are separated
by runs of spaces and/or tabs.
Note that the value of
(see below) will also affect how fields are split when
is a regular expression, and how records are separated when
is a regular expression.
variable is set to a space-separated list of numbers, each field is
expected to have a fixed width, and
will split up the record using the specified widths. The value of
Assigning a new value to
and restores the default behavior.
Each field in the input record may be referenced by its position,
is the whole record. The value of a field may be assigned to as well.
Fields need not be referenced by constants:
prints the fifth field in the input record.
is set to the total number of fields in the input record.
References to non-existent fields (i.e. fields after
produce the null string. However, assigning to a non-existent field
will increase the value of
create any intervening fields with the null string as their value, and
to be recomputed, with the fields being separated by the value of
References to negative-numbered fields cause a fatal error.
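.PP
As a rough illustration of the record and field rules above, the sketch below
assumes a POSIX-compatible
.IR awk ;
the function names and sample input are invented.
The first function references a field through a variable and then assigns
past the last field; the second sets the record separator to the null string
so that blank lines separate records and newlines also separate fields:

```shell
fields_demo() {
    # $n with n = 2 is the second field; assigning $5 raises NF to 5
    # and rebuilds $0 using the output field separator.
    echo 'a b c' |
        awk 'BEGIN { OFS = "-" } { n = 2; print $n; $5 = "e"; print NF; print $0 }'
}

paragraph_demo() {
    # With RS set to the null string, "a b" and "c" belong to one record.
    printf 'a b\nc\n\nd\n' |
        awk 'BEGIN { RS = "" } { print NR ": " NF " fields" }'
}

fields_demo
paragraph_demo
```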
.SS Built-in Variables
built-in variables are:
.TP \w'\fBFIELDWIDTHS\fR'u+1n
The number of command line arguments (does not include options to
or the program source).
of the current file being processed.
Array of command line arguments. The array is indexed from
Dynamically changing the contents of
can control the files used for data.
The conversion format for numbers, \fB"%.6g"\fR, by default.
An array containing the values of the current environment.
The array is indexed by the environment variables, each element being
the value of that variable (e.g., \fBENVIRON["HOME"]\fP might be
Changing this array does not affect the environment seen by programs which
spawns via redirection or the
(This may change in a future version of
.\" but don't hold your breath...
If a system error occurs either doing a redirection for
a string describing the error.
A whitespace-separated list of field widths. When set,
parses the input into fields of fixed width, instead of using the
variable as the field separator.
The fixed field width facility is still experimental; the
semantics may change as
The name of the current input file.
If no files are specified on the command line, the value of
is undefined inside the
The input record number in the current input file.
The input field separator, a space by default. See
Controls the case-sensitivity of all regular expression
and string operations. If
has a non-zero value, then string comparisons and
pattern matching in rules,
record separating with
pre-defined functions will all ignore case when doing regular expression
is not equal to zero,
matches all of the strings \fB"ab"\fP, \fB"aB"\fP, \fB"Ab"\fP,
As with all AWK variables, the initial value of
is zero, so all regular expression and string
operations are normally case-sensitive.
Under Unix, the full ISO 8859-1 Latin-1 character set is used
only affected regular expression operations. It now affects string
The number of fields in the current input record.
The total number of input records seen so far.
The output format for numbers, \fB"%.6g"\fR, by default.
The output field separator, a space by default.
The output record separator, by default a newline.
The input record separator, by default a newline.
The record terminator.
to the input text that matched the character or regular expression
The index of the first character matched by
The length of the string matched by
The character used to separate multiple subscripts in array
elements, by default \fB"\e034"\fR.
Arrays are subscripted with an expression between square brackets
If the expression is an expression list
.RI ( expr ", " expr " ...)"
then the array subscript is a string consisting of the
concatenation of the (string) value of each expression,
separated by the value of the
This facility is used to simulate multiply dimensioned
.RS
.nf
i = "A";\^ j = "B";\^ k = "C"
x[i, j, k] = "hello, world\en"
.fi
.RE
assigns the string \fB"hello, world\en"\fR to the element of the array
which is indexed by the string \fB"A\e034B\e034C"\fR. All arrays in AWK
are associative, i.e. indexed by string values.
statement to see if an array has an index consisting of a particular
If the array has multiple subscripts, use
.BR "(i, j) in array" .
construct may also be used in a
loop to iterate over all the elements of an array.
An element may be deleted from an array using the
statement may also be used to delete the entire contents of an array,
just by specifying the array name without a subscript.
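.PP
The following sketch exercises the array facilities just described, assuming
a POSIX-compatible
.IR awk .
The array name
.BR x ,
the helper array
.BR part ,
and the function name are hypothetical.
It stores one element under a two-part subscript, tests for it with the
parenthesized
.B in
form, splits the combined index on the subscript separator, and deletes the
element again:

```shell
array_demo() {
    awk 'BEGIN {
        x["A", "B"] = "hello"                 # index is "A" SUBSEP "B"
        if (("A", "B") in x) print "found"
        for (key in x) {
            n = split(key, part, SUBSEP)      # recover the two subscripts
            print n, part[1], part[2]
        }
        delete x["A", "B"]
        if (!(("A", "B") in x)) print "deleted"
    }'
}
array_demo
```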
.SS Variable Typing And Conversion
may be (floating point) numbers, or strings, or both. How the
value of a variable is interpreted depends upon its context. If used in
a numeric expression, it will be treated as a number; if used as a string,
it will be treated as a string.
To force a variable to be treated as a number, add 0 to it; to force it
to be treated as a string, concatenate it with the null string.
When a string must be converted to a number, the conversion is accomplished
A number is converted to a string by using the value of
as a format string for
with the numeric value of the variable as the argument.
However, even though all numbers in AWK are floating-point,
converted as integers. Thus, given
has a string value of \fB"12"\fR and not \fB"12.00"\fR.
performs comparisons as follows:
If two variables are numeric, they are compared numerically.
If one value is numeric and the other has a string value that is a
``numeric string,'' then comparisons are also done numerically.
Otherwise, the numeric value is converted to a string and a string
comparison is performed.
Two strings are compared, of course, as strings.
According to the \*(PX standard, even if two strings are
numeric strings, a numeric comparison is performed. However, this is
clearly incorrect, and
Note that string constants, such as \fB"57"\fP, are
numeric strings, they are string constants. The idea of ``numeric string''
only applies to fields,
elements and the elements of an array created by
that are numeric strings.
The basic idea is that
and only user input, that looks numeric,
should be treated that way.
Uninitialized variables have the numeric value 0 and the string value ""
(the null, or empty, string).
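.PP
A small sketch of these conversion and comparison rules, assuming a
POSIX-compatible
.IR awk ;
the function name and the sample values are invented.
Concatenating with the null string forces a string conversion, and the last
line compares a numeric-looking field and a string constant against the same
number:

```shell
convert_demo() {
    echo 10 | awk '{
        CONVFMT = "%2.2f"
        a = 12;   print (a "")      # integral value: converted as an integer
        b = 12.5; print (b "")      # non-integral value: formatted with CONVFMT
        # The field is a numeric string, so the first test is numeric;
        # "10" is a string constant, so the second test compares strings.
        print ($1 > 9), ("10" > 9)
    }'
}
convert_demo
```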
.SH PATTERNS AND ACTIONS
AWK is a line-oriented language. The pattern comes first, and then the
action. Action statements are enclosed in
Either the pattern may be missing, or the action may be missing, but,
of course, not both. If the pattern is missing, the action will be
executed for every single record of input.
A missing action is equivalent to
which prints the entire record.
Comments begin with the ``#'' character, and continue until the
Blank lines may be used to separate statements.
Normally, a statement ends with a newline; however, this is not the
case for lines ending in
also have their statements automatically continued on the following line.
In other cases, a line can be continued by ending it with a ``\e'',
in which case the newline will be ignored.
Multiple statements may
be put on one line by separating them with a ``;''.
This applies to both the statements within the action part of a
pattern-action pair (the usual case),
and to the pattern-action statements themselves.
AWK patterns may be one of the following:
.BI / "regular expression" /
.I "relational expression"
.IB pattern " && " pattern
.IB pattern " || " pattern
.IB pattern " ? " pattern " : " pattern
.IB pattern1 ", " pattern2
are two special kinds of patterns which are not tested against
The action parts of all
patterns are merged as if all the statements had
been written in a single
block. They are executed before any
of the input is read. Similarly, all the
and executed when all the input is exhausted (or when an
statement is executed).
patterns cannot be combined with other patterns in pattern expressions.
patterns cannot have missing action parts.
.BI / "regular expression" /
patterns, the associated statement is executed for each input record that matches
the regular expression.
Regular expressions are the same as those in
and are summarized below.
.I "relational expression"
may use any of the operators defined below in the section on actions.
These generally test whether certain fields match certain regular expressions.
operators are logical AND, logical OR, and logical NOT, respectively, as in C.
They do short-circuit evaluation, also as in C, and are used for combining
more primitive pattern expressions. As in most languages, parentheses
may be used to change the order of evaluation.
operator is like the same operator in C. If the first pattern is true
then the pattern used for testing is the second pattern, otherwise it is
the third. Only one of the second and third patterns is evaluated.
.IB pattern1 ", " pattern2
form of an expression is called a
.IR "range pattern" .
It matches all input records starting with a record that matches
and continuing until a record that matches
inclusive. It does not combine with any other sort of pattern expression.
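.PP
The pattern forms above can be sketched as follows, assuming a
POSIX-compatible
.IR awk ;
the input lines and the function name are invented for illustration.
The range pattern selects every record from the first match through the
second, inclusive, while the two special patterns run before and after the
input:

```shell
range_demo() {
    printf 'a\nstart\nb\nstop\nc\n' |
        awk 'BEGIN { print "begin" }
             /start/, /stop/ { print NR ": " $0 }
             END { print "end" }'
}
range_demo
```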
.SS Regular Expressions
Regular expressions are the extended kind found in
They are composed of characters as follows:
.TP \w'\fB[^\fIabc...\fB]\fR'u+2n
matches the non-metacharacter
matches the literal character
matches any character
matches the beginning of a string.
matches the end of a string.
character list, matches any of the characters
negated character list, matches any character except
alternation: matches either
concatenation: matches
matches zero or more
One or two numbers inside braces denote an
.IR "interval expression" .
If there is one number in the braces, the preceding regexp
times. If there are two numbers separated by a comma,
If there is one number followed by a comma, then
is repeated at least
Interval expressions are only available if either
.B \-\^\-re\-interval
is specified on the command line.
matches the empty string at either the beginning or the
matches the empty string within a word.
matches the empty string at the beginning of a word.
matches the empty string at the end of a word.
matches any word-constituent character (letter, digit, or underscore).
matches any character that is not word-constituent.
matches the empty string at the beginning of a buffer (string).
matches the empty string at the end of a buffer.
The escape sequences that are valid in string constants (see below)
are also legal in regular expressions.
.I "Character classes"
are a new feature introduced in the POSIX standard.
A character class is a special notation for describing
lists of characters that have a specific attribute, but where the
actual characters themselves can vary from country to country and/or
from character set to character set. For example, the notion of what
is an alphabetic character differs in the USA and in France.
A character class is only valid in a regexp
the brackets of a character list. Character classes consist of
a keyword denoting the class, and
Here are the character
classes defined by the POSIX standard.
Alphanumeric characters.
Alphabetic characters.
Space or tab characters.
Characters that are both printable and visible.
(A space is printable, but not visible, while an
Lower-case alphabetic characters.
Printable characters (characters that are not control characters).
Punctuation characters (characters that are not letters, digits,
control characters, or space characters).
Space characters (such as space, tab, and formfeed, to name a few).
Upper-case alphabetic characters.
Characters that are hexadecimal digits.
For example, before the POSIX standard, to match alphanumeric
characters, you would have had to write
.BR /[A\-Za\-z0\-9]/ .
If your character set had other alphabetic characters in it, this would not
match them. With the POSIX character classes, you can write
.BR /[[:alnum:]]/ ,
and this will match
the alphabetic and numeric characters in your character set.
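.PP
As a hedged illustration, the sketch below uses the POSIX alphanumeric class
inside a character list.
It assumes an
.I awk
whose regexp matcher recognizes POSIX character classes (some older
implementations do not); the input string and function name are invented:

```shell
class_demo() {
    # Replace each run of alphanumeric characters with "W" and count the runs.
    echo 'abc123 --- x9' |
        awk '{ n = gsub(/[[:alnum:]]+/, "W"); print n, $0 }'
}
class_demo
```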
Two additional special sequences can appear in character lists.
These apply to non-ASCII character sets, which can have single symbols
.IR "collating elements" )
that are represented with more than one
character, as well as several characters that are equivalent for
or sorting, purposes. (E.g., in French, a plain ``e''
and a grave-accented e\` are equivalent.)
A collating symbol is a multi-character collating element enclosed in
is a collating element, then
is a regexp that matches this collating element, while
is a regexp that matches either
An equivalence class is a list of equivalent characters enclosed in
is a regexp that matches either
These features are very valuable in non-English-speaking locales.
The library functions that
uses for regular expression matching
currently only recognize POSIX character classes; they do not recognize
collating symbols or equivalence classes.
operators are specific to
they are extensions based on facilities in the GNU regexp libraries.
The various command line options
interprets characters in regexps.
In the default case,
provides all the facilities of
POSIX regexps and the GNU regexp operators described above.
However, interval expressions are not supported.
Only POSIX regexps are supported; the GNU operators are not special.
Interval expressions are allowed.
.B \-\^\-traditional
regexps are matched. The GNU operators
are not special, interval expressions are not available, and neither
are the POSIX character classes
Characters described by octal and hexadecimal escape sequences are
treated literally, even if they represent regexp metacharacters.
.B \-\^\-re\-interval
Allow interval expressions in regexps, even if
.B \-\^\-traditional
Action statements are enclosed in braces,
Action statements consist of the usual assignment, conditional, and looping
statements found in most languages. The operators, control statements,
and input/output statements
available are patterned after those in C.
The operators in AWK, in order of decreasing precedence, are
.TP "\w'\fB*= /= %= ^=\fR'u+1n"
Increment and decrement, both prefix and postfix.
Exponentiation (\fB**\fR may also be used, and \fB**=\fR for
the assignment operator).
Unary plus, unary minus, and logical negation.
Multiplication, division, and modulus.
Addition and subtraction.
String concatenation.
The regular relational operators.
Regular expression match, negated match.
Do not use a constant regular expression
on the left-hand side of a
Only use one on the right-hand side. The expression
has the same meaning as \fB(($0 ~ /foo/) ~ \fIexp\fB)\fR.
The C conditional expression. This has the form
.IB expr1 " ? " expr2 " : " expr3\c
is true, the value of the expression is
Assignment. Both absolute assignment
.BI ( var " = " value )
and operator-assignment (the other forms) are supported.
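.PP
A few of these precedence rules can be checked with the sketch below, which
assumes a POSIX-compatible
.IR awk ;
the function name and values are invented.
It relies on exponentiation grouping from right to left, on addition binding
more tightly than string concatenation, and on the conditional expression:

```shell
ops_demo() {
    awk 'BEGIN {
        print 2 ^ 3 ^ 2                  # 2 ^ (3 ^ 2)
        print 1 " " 2 + 3                # 1 " " (2 + 3)
        x = 5; x += 2                    # operator-assignment
        print (x > 6 ? "big" : "small")  # conditional expression
    }'
}
ops_demo
```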
.SS Control Statements
The control statements are
.RS
.nf
\fBif (\fIcondition\fB) \fIstatement\fR [ \fBelse\fI statement \fR]
\fBwhile (\fIcondition\fB) \fIstatement \fR
\fBdo \fIstatement \fBwhile (\fIcondition\fB)\fR
\fBfor (\fIexpr1\fB; \fIexpr2\fB; \fIexpr3\fB) \fIstatement\fR
\fBfor (\fIvar \fBin\fI array\fB) \fIstatement\fR
\fBdelete \fIarray\^\fB[\^\fIindex\^\fB]\fR
\fBdelete \fIarray\^\fR
\fBexit\fR [ \fIexpression\fR ]
\fB{ \fIstatements \fB}\fR
.fi
.RE
.SS "I/O Statements"
The input/output statements are as follows:
.TP "\w'\fBprintf \fIfmt, expr-list\fR'u+1n"
Close file (or pipe, see below).
from next input record; set
.BI "getline <" file
from next input record; set
.BI getline " var" " <" file
Stop processing the current input record. The next input record
is read and processing starts over with the first pattern in the
AWK program. If the end of the input data is reached, the
block(s), if any, are executed.
Stop processing the current input file. The next input record read
comes from the next input file.
is reset to 1, and processing starts over with the first pattern in the
AWK program. If the end of the input data is reached, the
block(s), if any, are executed.
Earlier versions of gawk used
as two words. While this usage is still recognized, it generates a
warning message and will eventually be removed.
Prints the current record.
The output record is terminated with the value of the
.BI print " expr-list"
Each expression is separated by the value of the
The output record is terminated with the value of the
.BI print " expr-list" " >" file
Prints expressions on
Each expression is separated by the value of the
variable. The output record is terminated with the value of the
.BI printf " fmt, expr-list"
.BI printf " fmt, expr-list" " >" file
.BI system( cmd-line )
and return the exit status.
(This may not be available on non-\*(PX systems.)
\&\fBfflush(\fR[\fIfile\^\fR]\fB)\fR
Flush any buffers associated with the open output file or pipe
is missing, then standard output is flushed.
then all open output files and pipes
have their buffers flushed.
Other input/output redirections are also allowed. For
appends output to the
In a similar fashion,
.IB command " | getline"
command will return 0 on end of file, and \-1 on an error.
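.PP
The return values just described can be observed with the sketch below.
It assumes a POSIX-compatible
.I awk
and that the path
.B /nonexistent/file
really does not exist; the command string, variable names, and function name
are invented:

```shell
getline_demo() {
    awk 'BEGIN {
        cmd = "echo one; echo two"
        while ((cmd | getline line) > 0)   # positive while records remain
            n++
        close(cmd)                         # release the pipe when done
        print n, line
        # Reading from a file that cannot be opened is reported as an error.
        print (getline junk < "/nonexistent/file")
    }'
}
getline_demo
```

.PP
Testing for a result greater than zero, rather than merely non-zero, keeps
the loop from spinning forever when the read fails.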
.SS The \fIprintf\fP\^ Statement
The AWK versions of the
accept the following conversion specification formats:
An \s-1ASCII\s+1 character.
If the argument used for
is numeric, it is treated as a character and printed.
Otherwise, the argument is assumed to be a string, and only the first
character of that string is printed.
A decimal number (the integer part).
A floating point number of the form
.BR [\-]d.dddddde[+\^\-]dd .
A floating point number of the form
.BR [\-]ddd.dddddd .
conversion, whichever is shorter, with nonsignificant zeros suppressed.
An unsigned octal number (again, an integer).
An unsigned hexadecimal number (an integer).
character; no argument is converted.
There are optional, additional parameters that may lie between the
and the control letter:
The expression should be left-justified within its field.
For numeric conversions, prefix positive values with a space, and
negative values with a minus sign.
The plus sign, used before the width modifier (see below),
says to always supply a sign for numeric conversions, even if the data
to be formatted is positive. The
overrides the space modifier.
Use an ``alternate form'' for certain control letters.
supply a leading zero.
the result will always contain a
trailing zeros are not removed from the result.
(zero) acts as a flag that indicates output should be
padded with zeroes instead of spaces.
This applies even to non-numeric output formats.
This flag only has an effect when the field width is wider than the
value to be printed.
The field should be padded to this width. The field is normally padded
flag has been used, it is padded with zeroes.
A number that specifies the precision to use when printing.
formats, this specifies the
number of digits you want printed to the right of the decimal point.
formats, it specifies the maximum number
of significant digits. For the
formats, it specifies the minimum number of
digits to print. For a string, it specifies the maximum number of
characters from the string that should be printed.
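.PP
A short sketch of several of these formats and modifiers, assuming a
POSIX-compatible
.IR awk ;
the function name and arguments are invented.
It combines left justification, a width with a precision, zero padding, a
numeric argument printed as a character, and a precision that truncates a
string:

```shell
printf_demo() {
    awk 'BEGIN {
        printf "%-5s|%5.2f|%03d|%c|%.2s\n", "ab", 3.14159, 7, 65, "hello"
    }'
}
printf_demo
```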
capabilities of the \*(AN C
routines are supported.
in place of either the
specifications will cause their values to be taken from
the argument list to
.SS Special File Names
When doing I/O redirection from either
recognizes certain special filenames internally. These filenames
allow access to open file descriptors inherited from
parent process (usually the shell).
Other special filenames provide access to information about the running
.TP \w'\fB/dev/stdout\fR'u+1n
Reading this file returns the process ID of the current process,
in decimal, terminated with a newline.
Reading this file returns the parent process ID of the current process,
in decimal, terminated with a newline.
Reading this file returns the process group ID of the current process,
in decimal, terminated with a newline.
Reading this file returns a single record terminated with a newline.
The fields are separated with spaces.
If there are any additional fields, they are the group IDs returned by
Multiple groups may not be supported on all systems.
The standard output.
The standard error output.
The file associated with the open file descriptor
These are particularly useful for error messages. For example:
.RS
.nf
print "You blew it!" > "/dev/stderr"
.fi
.RE
whereas you would otherwise have to use
.RS
.nf
print "You blew it!" | "cat 1>&2"
.fi
.RE
These file names may also be used on the command line to name data files.
.SS Numeric Functions
AWK has the following pre-defined arithmetic functions:
.TP \w'\fBsrand(\fR[\fIexpr\^\fR]\fB)\fR'u+1n
.BI atan2( y , " x" )
returns the arctangent of
returns the cosine of its argument, which is in radians.
the exponential function.
truncates to integer.
the natural logarithm function.
returns a random number between 0 and 1.
returns the sine of its argument, which is in radians.
the square root function.
\&\fBsrand(\fR[\fIexpr\^\fR]\fB)\fR
as a new seed for the random number generator. If no
is provided, the time of day will be used.
The return value is the previous seed for the random
number generator.
.SS String Functions
has the following pre-defined string functions:
.TP "\w'\fBsprintf(\^\fIfmt\fB\^, \fIexpr-list\^\fB)\fR'u+1n"
\fBgensub(\fIr\fB, \fIs\fB, \fIh \fR[\fB, \fIt\fR]\fB)\fR
search the target string
for matches of the regular expression
is a string beginning with
then replace all matches of
is a number indicating which match of
Within the replacement text
is a digit from 1 to 9, may be used to indicate just the text that
parenthesized subexpression. The sequence
represents the entire matched text, as does the character
the modified string is returned as the result of the function,
and the original target string is
.TP "\w'\fBsprintf(\^\fIfmt\fB\^, \fIexpr-list\^\fB)\fR'u+1n"
\fBgsub(\fIr\fB, \fIs \fR[\fB, \fIt\fR]\fB)\fR
for each substring matching the regular expression
substitute the string
and return the number of substitutions.
is not supplied, use
in the replacement text is replaced with the text that was actually matched.
.I "AWK Language Programming"
for a fuller discussion of the rules for
and backslashes in the replacement text of
1913
.BI index( s , " t" )
1914
returns the index of the string
1922
\fBlength(\fR[\fIs\fR]\fB)
1923
returns the length of the string
1931
.BI match( s , " r" )
1932
returns the position in
1934
where the regular expression
1938
is not present, and sets the values of
1943
\fBsplit(\fIs\fB, \fIa \fR[\fB, \fIr\fR]\fB)\fR
1948
on the regular expression
1950
and returns the number of fields. If
1958
Splitting behaves identically to field splitting, described above.
1960
.BI sprintf( fmt , " expr-list" )
1965
and returns the resulting string.
1967
\fBsub(\fIr\fB, \fIs \fR[\fB, \fIt\fR]\fB)\fR
1970
but only the first matching substring is replaced.
1972
\fBsubstr(\fIs\fB, \fIi \fR[\fB, \fIn\fR]\fB)\fR
1981
is omitted, the rest of
1986
returns a copy of the string
1988
with all the upper-case characters in
1990
translated to their corresponding lower-case counterparts.
1991
Non-alphabetic characters are left unchanged.
1994
returns a copy of the string
1996
with all the lower-case characters in
1998
translated to their corresponding upper-case counterparts.
1999
Non-alphabetic characters are left unchanged.
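.PP
The following sketch exercises several of the string functions listed above.
It uses only functions available in \*(PX AWK, so it leaves out
\fBgensub()\fR; the sample strings and the shell variable name
\fBstr_out\fR are assumptions chosen for illustration.
.RS
.nf
```shell
str_out=$(awk 'BEGIN {
    s = "hello, world"
    n = gsub(/o/, "0", s)            # replace every "o"; n counts the changes
    print n, s                       # 2 hell0, w0rld
    print index(s, "w0")             # 8
    print length(s)                  # 12
    print substr(s, 1, 5)            # hell0
    count = split("a:b:c", parts, ":")
    print count, parts[1], parts[3]  # 3 a c
    print toupper(substr(s, 8))      # W0RLD
    print match(s, /w0r/), RSTART, RLENGTH   # 8 8 3
    t = "cat"
    sub(/a/, "[&]", t)               # & stands for the matched text
    print t                          # c[a]t
}')
echo "$str_out"
```
.fi
.RE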
2002
Since one of the primary uses of AWK programs is processing log files
2003
that contain time stamp information,
2005
provides the following two functions for obtaining time stamps and
2008
.TP "\w'\fBsystime()\fR'u+1n"
2010
returns the current time of day as the number of seconds since the Epoch
2011
(midnight UTC, January 1, 1970, on \*(PX systems).
2013
\fBstrftime(\fR[\fIformat \fR[\fB, \fItimestamp\fR]]\fB)\fR
2016
according to the specification in
2020
should be of the same form as returned by
2024
is missing, the current time of day is used.
2027
is missing, a default format equivalent to the output of
2030
See the specification for the
2032
function in \*(AN C for the format conversions that are
2033
guaranteed to be available.
2034
A public-domain version of
2036
and a man page for it come with
2038
if that version was used to build
2040
then all of the conversions described in that man page are available to
2042
.SS String Constants
2044
String constants in AWK are sequences of characters enclosed
2045
between double quotes (\fB"\fR). Within strings, certain
2046
.I "escape sequences"
2047
are recognized, as in C. These are:
2049
.TP \w'\fB\e\^\fIddd\fR'u+1n
2051
A literal backslash.
2054
The ``alert'' character; usually the \s-1ASCII\s+1 \s-1BEL\s+1 character.
2074
.BI \ex "\^hex digits"
2075
The character represented by the string of hexadecimal digits following
2078
As in \*(AN C, all following hexadecimal digits are considered part of
2079
the escape sequence.
2080
(This feature should tell us something about language design by committee.)
2081
E.g., \fB"\ex1B"\fR is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character.
2084
The character represented by the 1-, 2-, or 3-digit sequence of octal
2085
digits. E.g., \fB"\e033"\fR is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character.
2088
The literal character
2091
The escape sequences may also be used inside constant regular expressions
2093
.B "/[\ \et\ef\en\er\ev]/"
2094
matches whitespace characters).
2096
In compatibility mode, the characters represented by octal and
2097
hexadecimal escape sequences are treated literally when used in
2098
regexp constants. Thus,
2103
Functions in AWK are defined as follows:
2106
\fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements \fB}\fR
2109
Functions are executed when they are called from within expressions
2110
in either patterns or actions. Actual parameters supplied in the function
2111
call are used to instantiate the formal parameters declared in the function.
2112
Arrays are passed by reference; other variables are passed by value.
2114
Since functions were not originally part of the AWK language, the provision
2115
for local variables is rather clumsy: they are declared as extra parameters
2116
in the parameter list. The convention is to separate local variables from
2117
real parameters by extra spaces in the parameter list. For example:
2122
function f(p, q,     a, b)   # a & b are local
2127
/abc/ { ... ; f(1, 2) ; ... }
2132
The left parenthesis in a function call is required
2133
to immediately follow the function name,
2134
without any intervening white space.
2135
This is to avoid a syntactic ambiguity with the concatenation operator.
2136
This restriction does not apply to the built-in functions listed above.
2138
Functions may call each other and may be recursive.
2139
Function parameters used as local variables are initialized
2140
to the null string and the number zero upon function invocation.
2146
will warn about calls to undefined functions at parse time,
2147
instead of at run time.
2148
Calling an undefined function at run time is a fatal error.
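.PP
As a hedged illustration of the rules above, the sketch below defines one
recursive function and one function that fills an array.
The function names \fBfact\fR and \fBfill\fR, the array \fBsq\fR, and the
shell variable \fBfn_out\fR are hypothetical and serve only as examples.
.RS
.nf
```shell
fn_out=$(awk '
# fact is recursive; the parameter r, set off by extra spaces, is a local.
function fact(n,    r) {
    if (n <= 1) return 1
    r = n * fact(n - 1)
    return r
}
# Arrays are passed by reference, so fill changes the array of the caller.
# The scalar n is passed by value, so resetting it here has no outside effect.
function fill(arr, n,    i) {
    for (i = 1; i <= n; i++) arr[i] = i * i
    n = 0
}
BEGIN {
    print fact(5)                  # 120
    k = 3
    fill(sq, k)
    print k, sq[1], sq[2], sq[3]   # 3 1 4 9
}')
echo "$fn_out"
```
.fi
.RE
.PP
Note that each call is written with the left parenthesis directly after the
function name, as user-defined functions require.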
2152
may be used in place of
2156
Print and sort the login names of all users:
2160
{ print $1 | "sort" }
2163
Count lines in a file:
2167
END { print nlines }
2170
Precede each line by its number in the file:
2176
Concatenate and line number (a variation on a theme):
2193
.IR "The AWK Programming Language" ,
2194
Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger,
2195
Addison-Wesley, 1988. ISBN 0-201-07981-X.
2197
.IR "AWK Language Programming" ,
2198
Edition 1.0, published by the Free Software Foundation, 1995.
2199
.SH POSIX COMPATIBILITY
2202
is compatibility with the \*(PX standard, as well as with the
2203
latest version of \*(UX
2207
incorporates the following user-visible
2208
features which are not described in the AWK book,
2209
but are part of the Bell Labs version of
2211
and are in the \*(PX standard.
2215
option for assigning variables before program execution starts is new.
2216
The book indicates that command line variable assignment happens when
2218
would otherwise open the argument as a file, which is after the
2220
block is executed. However, in earlier implementations, when such an
2221
assignment appeared before any file names, the assignment would happen
2225
block was run. Applications came to depend on this ``feature.''
2228
was changed to match its documentation, this option was added to
2229
accommodate applications that depended upon the old behavior.
2230
(This feature was agreed upon by both the AT&T and GNU developers.)
2234
option for implementation-specific features is from the \*(PX standard.
2236
When processing arguments,
2238
uses the special option ``\fB\-\^\-\fP'' to signal the end of
2240
In compatibility mode, it will warn about, but otherwise ignore,
2242
In normal operation, such arguments are passed on to the AWK program for
2245
The AWK book does not define the return value of
2248
has it return the seed it was using, to allow keeping track
2249
of random number sequences. Therefore
2253
also returns its current seed.
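.PP
The seed-returning behavior can be sketched as follows.
The seed values 42 and 7 and the shell variable name \fBseed_out\fR are
arbitrary choices for illustration, and the sketch assumes an AWK whose
\fBsrand()\fR returns the previous seed as described above.
.RS
.nf
```shell
seed_out=$(awk 'BEGIN {
    srand(42)              # install a known seed
    prev = srand(7)        # returns the seed that was in effect
    print prev             # 42
    srand(42); a = rand()  # the same seed ...
    srand(42); b = rand()  # ... restarts the same sequence
    verdict = (a == b) ? "same" : "different"
    print verdict          # same
}')
echo "$seed_out"
```
.fi
.RE
.PP
Saving the value returned by \fBsrand()\fR is what makes it possible to
replay a sequence of random numbers later.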
2255
Other new features are:
2266
escape sequences (done originally in
2268
and fed back into AT&T's); the
2272
built-in functions (from AT&T); and the \*(AN C conversion specifications in
2274
(done first in AT&T's version).
2277
has a number of extensions to \*(PX
2279
They are described in this section. All the extensions described here
2284
.B \-\^\-traditional
2287
The following features of
2289
are not available in
2317
The special file names available for I/O redirection are not recognized.
2325
variables are not special.
2330
variable and its side-effects are not available.
2335
variable and fixed-width field splitting.
2340
as a regular expression.
2343
The ability to split out individual characters using the null string
2346
and as the third argument to
2350
No path search is performed for files named via the
2352
option. Therefore the
2354
environment variable is not special.
2359
to abandon processing of the current input file.
2364
to delete the entire contents of an array.
2367
The AWK book does not define the return value of the
2372
returns the value from
2376
when closing a file or pipe, respectively.
2381
.B \-\^\-traditional
2387
option is ``t'', then
2389
will be set to the tab character.
2390
Since this is a rather ugly special case, it is not the default behavior.
2391
This behavior also does not occur if
2398
was compiled for debugging, it will
2399
accept the following additional options:
2410
debugging output during program parsing.
2411
This option should only be of interest to the
2413
maintainers, and may not even be compiled into
2416
.SH HISTORICAL FEATURES
2417
There are two features of historical AWK implementations that
2420
First, it is possible to call the
2422
built-in function not only with no argument, but even without parentheses!
2427
a = length # Holy Algol 60, Batman!
2431
is the same as either of
2441
This feature is marked as ``deprecated'' in the \*(PX standard, and
2443
will issue a warning about its use if
2445
is specified on the command line.
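.PP
A small sketch of the three equivalent spellings, assuming a one-word input
line and a hypothetical shell variable \fBlen_out\fR.
The form without parentheses is shown only to illustrate the historical
behavior; as noted above, the standard marks it as deprecated.
.RS
.nf
```shell
# All three assignments measure the current input record.
len_out=$(echo hello | awk '{
    a = length
    b = length()
    c = length($0)
    print a, b, c
}')
echo "$len_out"
```
.fi
.RE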
2447
The other feature is the use of either the
2451
statements outside the body of a
2456
loop. Traditional AWK implementations have treated such usage as
2461
will support this usage if
2462
.B \-\^\-traditional
2464
.SH ENVIRONMENT VARIABLES
2467
exists in the environment, then
2469
behaves exactly as if
2471
had been specified on the command line.
2476
will issue a warning message to this effect.
2480
environment variable can be used to provide a list of directories that
2482
will search when looking for files named via the
2490
option is not necessary given the command line variable assignment feature;
2491
it remains only for backwards compatibility.
2493
If your system actually has support for
2500
files, you may get different output from
2502
than you would get on a system without those files. When
2504
interprets these files internally, it synchronizes output to the standard
2505
output with output to
2507
while on a system with those files, the output is actually to different
2511
Syntactically invalid single-character programs tend to overflow
2512
the parse stack, generating a rather unhelpful message. Such programs
2513
are surprisingly difficult to diagnose in the completely general case,
2514
and the effort to do so really is not worth it.
2516
The word ``GNU'' is incorrectly capitalized in at least one file
2518
.SH VERSION INFORMATION
2519
This man page documents
2523
The original version of \*(UX
2525
was designed and implemented by Alfred Aho,
2526
Peter Weinberger, and Brian Kernighan of AT&T Bell Labs. Brian Kernighan
2527
continues to maintain and enhance it.
2529
Paul Rubin and Jay Fenlason,
2530
of the Free Software Foundation, wrote
2532
to be compatible with the original version of
2534
distributed in Seventh Edition \*(UX.
2535
John Woods contributed a number of bug fixes.
2536
David Trueman, with contributions
2537
from Arnold Robbins, made
2539
compatible with the new version of \*(UX
2541
Arnold Robbins is the current maintainer.
2543
The initial DOS port was done by Conrad Kwok and Scott Garfinkle.
2544
Scott Deifik is the current DOS maintainer. Pat Rankin did the
2545
port to VMS, and Michal Jaegermann did the port to the Atari ST.
2546
The port to OS/2 was done by Kai Uwe Rommel, with contributions and
2547
help from Darrel Hankerson. Fred Fish supplied support for the Amiga.
2549
If you find a bug in
2551
please send electronic mail to
2552
.BR bug-gnu-utils@prep.ai.mit.edu ,
2555
.BR arnold@gnu.ai.mit.edu .
2556
Please include your operating system and its revision, the version of
2558
what C compiler you used to compile it, and a test program
2559
and data that are as small as possible for reproducing the problem.
2561
Before sending a bug report, please do two things. First, verify that
2562
you have the latest version of
2564
Many bugs (usually subtle ones) are fixed at each release, and if
2565
yours is out of date, the problem may already have been solved.
2566
Second, please read this man page and the reference manual carefully to
2567
be sure that what you think is a bug really is, instead of just a quirk
2572
post a bug report in
2576
developers occasionally read this newsgroup, posting bug reports there
2577
is an unreliable way to report bugs. Instead, please use the electronic mail
2578
addresses given above.
2579
.SH ACKNOWLEDGEMENTS
2580
Brian Kernighan of Bell Labs
2581
provided valuable assistance during testing and debugging.