! Gawk.Hlp
!                                                       Pat Rankin, Jun'90
!                                                          revised, Jun'91
!                                                          revised, Jul'92
!                                                          revised, Jan'95
!                                                          revised, Apr'97
!                                                          revised, Jan'03
!                                                          revised, May'11
!   Online help for GAWK.
!
1 GAWK
 GAWK is GNU awk, the Free Software Foundation's implementation of
 the awk programming language.  awk is an interpretive language which
 can handle many data-reformatting jobs with just a few lines of code.
 It has powerful string manipulation and pattern matching capabilities
 built in.  This version is compatible with POSIX 1003.2 awk.

 The VMS version of GAWK supports both the original UN*X-style command
 interface and a DCL interface.  The only setup requirement for GAWK
 is to define it as a 'foreign' command:  a DCL symbol with a value
 which begins with '$'.
       $ GAWK :== $disk:[directory]GAWK
2 GNU_syntax
 GAWK's UN*X-style interface uses the 'dash' convention for specifying
 options and uses spaces to separate multiple arguments.

 There are two main alternatives, depending on how the awk program is
 to be passed to GAWK.  Both alternatives share most options.

 Usage: $ gawk [-Wopts] [-F fs] [-v var=val] -f progfile [--] file ...
    or  $ gawk [-Wopts] [-F fs] [-v var=val] [--] "program" file ...

 The options are case-sensitive.  On VMS, the DCL command interpreter
 converts unquoted text into uppercase before passing it to the running
 program.  However, GAWK is written in 'C' and the C Run-Time Library
 (VAXCRTL or DECC$SHR) converts unquoted text into *lowercase*.
 Therefore, the -Fval and -W options must be enclosed in quotes.
3 options
 -d[file]        dump variable values into file (default is awkvars.out
                 if not specified) upon program completion
 -e program_text additional program text, as a quoted string, for use
                 in combination with -f
 -f file         use the specified file as the awk program source; if
                 more than one instance of -f is used, each file will
                 be read in succession
 -Fstring        define a value for the FS variable (field separator)
 -O              optimize; of limited use
 -p[file]        write program execution profiling into file (default
                 is awkprof.out if not specified)
 -v var=val      assign a value of 'val' to the variable 'var'
 -W'options'     additional gawk-specific options; multiple values may
                 be separated by commas, or by spaces if they're quoted,
                 or multiple occurrences of -W may be used.
 -Wcopyright     display an abbreviated version of the GNU copyright
                 information
 -Whelp          list command line options (supersedes -Wusage)
 -Wlint          warn about suspect or non-portable awk program code
 -Wlint=fatal    treat lint warnings as errors
 -Wlint-old      warn about constructs not available in original awk
 -Wposix         traditional mode with additional restrictions
 -Wre-interval   evaluate '{' and '}' as intervals in regular expressions
 -Wtraditional   use awk compatibility mode to disable GAWK extensions
                 and get the behavior of UN*X awk.
 -Wversion       display program version number
 --              don't check further arguments for leading dash
3 program_text
 If the '-f file' option is not used on the command line, then the
 first "non-dash" argument is assumed to be a string of text containing
 the awk source program.  Here is a complete sample program:
       $ gawk -- "BEGIN {print ""\nHello, World!\n""}"
 This program would print a blank line (based on first "\n"), followed
 by a line reading "Hello, World!", followed by another blank line
 (since awk's 'print' statement includes the trailing 'newline').

 On VMS, to include a quote character inside of a quoted string, two
 successive quotes ("") must be used.
3 data_files
 After all dash-options are examined, and after the program text if
 there were no occurrences of the -f option, remaining (space separated)
 command line arguments are considered to be data files for the awk
 program to process.  If any of these actually contains an equals sign
 (=), then it is interpreted as a variable assignment instead of a data
 file.  The syntax is 'variable_name=value'.  For example, the command
       $ gawk -f myprog.awk infile.one flag=2 start=0 infile.two
 would read file 'infile.one' for the program in 'myprog.awk', then it
 would set 'flag' to 2 and 'start' to 0, and finally it would read file
 'infile.two' for the program.  Note that in a case like this, the two
 assignments actually occur after the first file has been processed,
 not at program startup when the command line is first scanned.
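
 The timing of command line assignments can be checked with a short
 test.  The sketch below assumes a UN*X-style shell and a POSIX awk
 rather than DCL, so the quoting differs from what VMS requires; the
 file names one.txt and two.txt are made up for the illustration.

```shell
# Work in a scratch directory so the sample files don't clobber anything.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'a\n' > one.txt
printf 'b\n' > two.txt

# 'flag=2' sits between the two data files, so it takes effect only
# after one.txt has been read; while one.txt is processed, flag is unset.
out=$(awk '{ print FILENAME ": flag=[" flag "]" }' one.txt flag=2 two.txt)
printf '%s\n' "$out"

# Remove the scratch directory again.
cd / && rm -rf "$dir"
```

 To have a value in place before the first record is read, use the
 -v option instead of a 'variable_name=value' argument.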
3 IO_redirection
 The command parsing in the VMS implementation of GAWK does some
 emulation of a UN*X-style shell, where certain characters on the
 command line have special meaning.  In particular, the symbols '<',
 '>', '|', '*', and '?' receive special handling before the main part
 of the program has a chance to see them.  The symbols '<' and '>'
 perform some file manipulation from the command line:

 <ifile     open file 'ifile' (readonly) as 'stdin' [SYS$INPUT]
 >nfile     create 'nfile' as 'stdout' [SYS$OUTPUT], in stream-lf format
 >>ofile    append to 'ofile' for 'stdout'; create it if necessary
 >&efile    point 'stderr' [SYS$ERROR] at 'efile', but don't open it yet
 >$vfile    create 'vfile' as 'stdout', using RMS attributes appropriate
            for a standard text file (variable length records with
            implied carriage control)
 >+bfile    create 'bfile' as 'stdout' using binary mode
 2>&1       route error messages into the regular output stream
 1>&2       send output data to the error destination
 <<sentinel error; reading stdin until 'sentinel' not supported
 <-, >-     error; closure of stdin or stdout from cmd line not supported
 >>$vfile   incorrect; would be interpreted as file "$vfile" in stream-lf
            format rather than as file "vfile" in RMS 'text' format
 |          error; command line pipes not supported
3 wildcard_expansion
 The command parsing in the VMS implementation of GAWK does some
 emulation of a UN*X-style shell, where certain characters on the
 command line have special meaning.  In particular, the symbols '<',
 '>', '*', '%', and '?' receive special handling before the main part
 of the program has a chance to see them.  The symbols '*', '%' and '?'
 are used as wildcards in filenames.  '*' and '%' have their usual VMS
 meanings of multiple character and single character wildcards,
 respectively, and '?' is also treated as a single character wildcard.
 Wildcard expansion only works for filenames specified in native VMS
 filename syntax (eg, "[-.sibling]*"), not for ones specified in
 pseudo-Unix syntax (eg, "../sibling/*").

 When a command line argument that should be a filename contains any
 of the wildcard characters, a directory lookup is attempted for files
 which match the specified pattern.  If one or more matching files are
 found, those filenames are put into the command line in place of the
 original pattern.  If no matching files are found, the original
 pattern is left in place.
2 DCL_syntax
 GAWK's DCL-style interface is more or less a standard DCL command, with
 one required parameter.  Multiple values--when present--are separated
 by commas.

 There are two main alternatives, depending on how the awk program is
 to be passed to GAWK.  Both alternatives share most options.

 Usage:  GAWK  /COMMANDS="awk program text"  data_file[,data_file,...]
    or   GAWK  /INPUT=awk_file  data_file[,"Var=value",data_file,...]
 (  or   GAWK  /INPUT=(awk_file1,awk_file2,...)  data_file[,...]       )
3 Parameter
 data_file[,data_file,...]      (data_file data_file ...)
 data_file[,"Var=value",...,data_file,...]      (data_file Var=value &c)

  Data file(s) for the awk program to process.  If any of these
  actually contains an equals sign (=), then it is interpreted as
  a variable assignment instead of a data file.  The syntax is
  "variable_name=value".  Quotes are required for non-file parameters.

  For example, the command
       $ gawk/input=myprog.awk infile.one,"flag=2","start=0",infile.two
  would read file 'infile.one' for the program in 'myprog.awk', then it
  would set 'flag' to 2 and 'start' to 0, and finally it would read file
  'infile.two' for the program.  Note that in a case like this, the two
  assignments actually occur after the first file has been processed,
  not at program startup when the command line is first scanned.

  Wildcard file lookups are attempted on data file specifications.  See
  subtopic 'GAWK GNU_syntax wildcard_expansion' for details.

  At least one data_file parameter value is required.  An exception is
  made if /usage, /version, or /copyright is specified *and* if GAWK is
  defined as a 'foreign' command rather than a 'native' DCL command.
3 Qualifiers
/COMMANDS
 /COMMANDS="awk program text"   (-- "awk program text")

  For short programs, it is possible to include the complete program
  on the command line.  The quotes are required.  Here is a complete
  sample program:
       $ gawk/commands="BEGIN {print ""\nHello, World!\n""}" NL:
  This program would print a blank line (based on first "\n"), followed
  by a line reading "Hello, World!", followed by another blank line
  (since awk's 'print' statement includes the trailing 'newline').

  To include a quote character inside of a quoted string, two
  successive quotes ("") must be used.

  Either /COMMANDS or /INPUT (but not both) must be supplied.
/INPUT
 /INPUT=(awk_file1,awk_file2)   (-f awk_file1 -f awk_file2)

  Used to specify one or more files containing the source code of
  the awk program.  If more than one file is used, separate them
  with commas and enclose the list in parentheses.

  Multiple source files are processed in order as if they had been
  concatenated together.

  Either /INPUT or /COMMANDS (but not both) must be supplied unless
  one of /VERSION, /COPYRIGHT, and /USAGE is used.
/EXTRA_COMMANDS
 /EXTRA_COMMANDS="awk program text"     (-e "awk program text")

  Add more program text, for use in combination with /INPUT.  Unlike
  the UN*X-style (GNU syntax) processing of VMS GAWK, where multiple
  instances of -f file and -e text can be interspersed, DCL command
  processing of VMS GAWK allows only one /EXTRA_COMMANDS="text"
  qualifier and handles it before /INPUT=(file,...).
/FIELD_SEPARATOR
 /FIELD_SEPARATOR="FS_value"    (-F"FS_value")

  Assign a value to the built in variable FS (field separator).
/VARIABLES
 /VARIABLES=("Var1=val1","Var2=val2",...)  (-v Var1=val1 -v Var2=val2)

  Assign value(s) to the specified variable(s).
/OPTIMIZE
 /[NO]OPTIMIZE          (-"O" option)

  Perform some relatively minor optimizations on the source code as it
  is read in; primarily constant folding.  Default is /NOOPTIMIZE but
  presently optimization is always enabled and explicitly negating it
  has no effect.  This may change when/if more elaborate optimizations
  are implemented.
/PROFILE
 /PROFILE[=file]        (-p[file])

  Write profiling feedback into the specified file.  If no file name is
  specified, awkprof.out in the current directory is used.
/DUMP_VARIABLES
 /DUMP_VARIABLES[=file] (-d[file])

  Print a sorted list of global variables, their types, and final values
  to the specified file.  If no file name is specified, awkvars.out in
  the current directory is used.
!-/REG_EXPR
!- /REG_EXPR={AWK | EGREP | POSIX}   (-a vs -e options [obsolete])
!-
!-  This qualifier is obsolete and has no effect.
/POSIX
 /[NO]POSIX             (-"Wposix" option)

  Use POSIX compatibility mode (/posix) and suppress GAWK extensions.
  The default is /NOPOSIX.  Slightly more restrictive than /strict.
/TRADITIONAL
 /[NO]TRADITIONAL       (-"Wtraditional" option)

  Use strict awk compatibility mode (/traditional) and suppress GAWK
  extensions.  Supersedes /STRICT.  The default is /NOTRADITIONAL.
/STRICT
 /[NO]STRICT            (-"Wtraditional" option)

  Use strict awk compatibility mode (/strict) and suppress GAWK
  extensions.  Superseded by /TRADITIONAL.  The default is /NOSTRICT.
/RE_INTERVAL
 /RE_INTERVAL           (-"Wre-interval" option)

  Allow interval expressions in regexps (regular expressions).  GAWK
  always accepts intervals in normal mode; /RE_INTERVAL can be used to
  enable them in strict (/TRADITIONAL) compatibility mode.
/SANDBOX
 /SANDBOX               (-"Wsandbox" option)

  Disables the system() function, input redirections with getline,
  output redirections with print and printf, and dynamic extensions.
/NON_DECIMAL_DATA
 /NON_DECIMAL_DATA      (-"Wnon-decimal-data" option)

  Enable automatic interpretation of octal and hexadecimal values in
  input data.  Use with care.
/LINT
 /[NO]LINT[=(WARN,OLD,FATAL)]   (-"Wlint" and -"Wlint-old" options)

  Check the awk program carefully for potential problems that might
  be encountered if it were to be used with other awk implementations,
  and print warnings for anything found.  The default is /NOLINT.

  /LINT without a value is equivalent to /LINT=WARN.  /LINT=OLD warns
  about constructs which wouldn't work with /TRADITIONAL.  /LINT=FATAL
  turns lint warnings into errors which cause GAWK to terminate.
!-  /LINT=INVALID is accepted but isn't documented here.
!three undocumented qualifiers; judged not useful for VMS
!-  /CHARACTERS_AS_BYTES
!-   /CHARACTERS_AS_BYTES     (-"Wcharacters-as-bytes" option)
!-  /USE_LC_NUMERIC
!-   /USE_LC_NUMERIC          (-"Wuse-lc-numeric" option)
!-  /GEN_POT
!-   /GEN_POT                 (-"Wgen-pot" option)
/VERSION
 /VERSION               (-"Wversion" option)

  Print GAWK's version number and then terminate.  Includes copyright
  notice.
/COPYRIGHT
 /COPYRIGHT             (-"Wcopyright" option)

  Print a brief version of GAWK's copyright notice and then terminate.
/USAGE
 /USAGE                 (comparable to -"Whelp" option)

  Print a compact summary of the command line options.

  After the 'usage' message is printed, GAWK terminates regardless
  of any other command line options.
/OUTPUT
 /OUTPUT=out_file       (>$out_file)

  Write program output into 'out_file'.  The default is SYS$OUTPUT.
2 awk_language
 An awk program consists of one or more pattern-action pairs, sometimes
 referred to as "rules".  For each record of an input (data) file, the
 rules are checked sequentially.  Any pattern which matches the input
 record triggers that rule's action.  Actions are instructions which
 resemble statements in the 'C' programming language.  Patterns come
 in several varieties, including field comparisons, regular expression
 matching, and special cases defined by reserved keywords.

 All awk keywords and variables are case-sensitive.  Text matching is
 also sensitive to character case unless the builtin variable IGNORECASE
 is set to a non-zero value.
3 rules
 The syntax for a pattern-action 'rule' is simply
       PATTERN { ACTION }
 where the braces ({}) are required punctuation for the action.
 Semicolons (;) or 'newlines' (ie, having the text on a separate line)
 delimit multiple rules and also multiple actions within a given rule.
 Either the pattern or the action may be omitted.  An empty pattern
 matches every record of the input file.  A missing action (as opposed
 to an empty action inside of braces) is an implicit request to print
 the current record; an empty action (ie, {}) is legal but not very
 useful.
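
 As a rough illustration of omitted patterns and omitted actions, the
 sketch below runs two tiny programs over the same made-up input.  It
 assumes a UN*X-style shell and a POSIX awk; the sample data and the
 shell variable names are hypothetical.

```shell
input='alpha 1
beta 20
gamma 300'

# Pattern only: the missing action means "print the record".
picked=$(printf '%s\n' "$input" | awk '$2 > 10')
printf '%s\n' "$picked"

# Action only: the missing pattern matches every record.
# Two statements in one action are separated by a semicolon.
total=$(printf '%s\n' "$input" | awk '{ n++; sum += $2 } END { print n, sum }')
printf '%s\n' "$total"
```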
3 patterns
 There are several types of patterns available for awk rules.

  expression  an 'expression' is something to be evaluated (perhaps
                         a comparison or function call) which will
                         be considered true if non-zero (for numeric
                         results) or if non-null (for strings)
  /regular_expression/ slashes (/) delimit a regular expression
                         which is used as a pattern
  pattern1, pattern2   a pair of patterns separated by a comma (,),
                         which causes a range of records to trigger
                         the associated action; the records which
                         match the patterns are included in the range
  <null>      an omitted pattern (in this text, the string '<null>'
                         is displayed, but in an awk program, it
                         would really be blank) matches every record
  BEGIN       keyword for specifying a rule to be executed prior to
                         reading the 1st record of the 1st input file
  END         keyword for specifying a rule to be executed after
                         handling the last input record of the last file
  BEGINFILE   gawk-specific keyword for specifying a rule to be
                         executed when a file from the command line
                         has just been opened, before attempting to
                         read its first record
  ENDFILE     gawk-specific keyword for specifying a rule to be
                         executed after the last record of a file
                         from the command line has been processed by
                         any other patterns and actions
4 BEGINFILE
 Normally a file open attempt which fails will generate an error
 and cause GAWK to terminate.  However, if your program has a
 BEGINFILE rule, failed open attempts will set ERRNO to a non-null
 value and execute the BEGINFILE rule's actions.  You can check
 for that condition and use the 'nextfile' statement to skip files
 which couldn't be opened.  Note that when executing the BEGINFILE
 rule for a failed open attempt, allowing the actions to finish
 without using 'nextfile' will result in an error just like for a
 program which has no BEGINFILE rule.
4 examples
 Some example patterns (mostly with the corresponding actions omitted)

 NF > 0     # comparison expression:  matches non-null records
 $0         # implied comparison:  also matches non-null records
 $2 > 1000 && sum <= 999999     # slightly more elaborate expression
 /x/        # regular expression matching any record with an 'x' in it
 /^ /       # reg-expr matching records beginning with a space
 $1 == "start", $NF == "stop"   # range pattern for input in which
                some data lines begin with 'start' and/or end with
                'stop' in order to collect groups of records
        { sum += $1 }   # null pattern:  its action (add field #1 to
                variable 'sum') would be executed for every record
 BEGIN  { sum = 0 }     # keyword 'BEGIN':  perform this action before
                reading the input file (note: initialization to 0 is
                unnecessary in awk)
 END    { print "total =", sum }    # keyword 'END':  perform this
                action after the last input record has been processed
 # two different ways to handle the start of an input file:
 FNR == 1 { print FILENAME }  # print name after reading first record
 BEGINFILE { print FILENAME } # print name before reading first record
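
 The range pattern shown above can be tried on a few lines of made-up
 input.  This is a sketch only, assuming a UN*X-style shell and a POSIX
 awk; the sample records are hypothetical.

```shell
# Sample input: two 'start' ... 'stop' groups with unrelated lines around them.
data='skip me
start one
middle
end stop
skip too
start stop'

# The range opens on a record whose first field is "start" and closes on
# a record whose last field is "stop"; both boundary records are included.
# A single record can open and close a range (the last sample line).
out=$(printf '%s\n' "$data" | awk '$1 == "start", $NF == "stop" { print NR ": " $0 }')
printf '%s\n' "$out"
```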
3 actions
 An 'action' is something to do when a given record has matched the
 corresponding pattern in a rule.  In general, actions resemble 'C'
 statements and expressions.  The action in a rule must be enclosed
 in braces ({}).

 Each action can contain more than one statement or expression to be
 executed, provided that they're separated by semicolons (;) and/or
 on separate lines.

 An omitted action is equivalent to
       { print $0 }
 which prints the current record.
3 operators
 Relational operators
    ==    compare for equality
    !=    compare for inequality
    <, <=, >, >=  numerical or lexical comparison (less than, less or
                    equal, greater than, greater or equal, respectively)
    ~     match against a regular expression
    !~    match against a regular expression, but accept failed matches
            instead of successful ones
 Arithmetic operators
    +     addition
    -     subtraction
    *     multiplication
    /     division
    %     remainder
    ^, ** exponentiation ('**' is a synonym for '^', unless POSIX
            compatibility is specified, in which case it's invalid)
 Boolean operators (aka Logical operators)
          a value is considered false if it's 0 or a null string,
            it is true otherwise; the result of a boolean operation
            (and also of a comparison operation) will be 0 when false
            or 1 when true
    ||    or [expression (a || b) is true if either a is true or b
            is true or both a and b are true; it is false otherwise;
            b is not evaluated unless a is false (ie, short-circuit)]
    &&    and [expression (a && b) is true if both a and b are true;
            it is false otherwise; b is only evaluated if a is true]
    !     not [expression (!a) is true if a is false, false otherwise]
    in    array membership; the keyword 'in' tests whether the value
            on the left represents a current subscript in the array
            named on the right
 Conditional operator
    ? :   the conditional operator takes three operands; the first is
            an expression to evaluate, the second is the expression to
            use if the first was true, the third is the expression to
            use if it was false [simple example (a < b ? b : a) gives
            the maximum of a and b]
 Assignment operators
    =     store the value on the right into the variable or array slot
            on the left [expression (a = b) stores the value of b in a]
    +=, -=, *=, /=, %=, ^=, **=  perform the indicated arithmetic
            operation using the current value of the variable or array
            element on the left side and the expression on the right
            side, then store the result in the left side
    ++    increment by 1 [expression (++a) gets the current value of
            a and adds 1 to it, stores that back in a, and returns the
            new value; expression (a++) gets the current value of a,
            adds 1 to it, stores that back in a, but returns the
            original value of a]
    --    decrement by 1 (analogous to increment)
 String operators
          there is no explicit operator for string concatenation;
            two values and/or variables side-by-side are implicitly
            concatenated into a string (numeric values are first
            converted into their string equivalents)
 Conversion between numeric and string values
          there is no explicit operator for conversion; adding 0
            to a string will force it to be converted to a number
            (the numeric value will be 0 if the string does not
            represent an integer or floating point number); the
            reverse, converting a number into a string, is done by
            concatenating a null string ("") to it [the expression
            (5.75 "") evaluates to "5.75"]
 Field 'operator'
    $     prefixing a number or variable with a dollar sign ($)
            causes the appropriate record field to be returned [($2)
            gives the second field of the record, ($NF) gives the
            last field (since the builtin variable NF is set to the
            number of fields in the current record)]
 Array subscript operator
    ,     multi-dimensional arrays are simulated by using comma (,)
            separated array indices; the actual index is generated
            by replacing commas with the value of builtin SUBSEP,
            then concatenating the expression into a string index
          [comma is also used to separate arguments in function
            calls and user-defined function definitions]
          [comma is *also* used to indicate a range pattern in an
            awk rule]
 Escape 'operator'
    \     In quoted character strings, the backslash (\) character
            causes the following character to be interpreted in a
            special manner [string "one\ntwo" has an embedded newline
            character (linefeed on VMS, but treated as if it were both
            carriage-return and linefeed); string "\033[" has an ASCII
            'escape' character (which has octal value 033) followed by
            a 'left-bracket' character]
          Backslash is also used in regular expressions
 Redirection operators
    <     Read-from -- valid with 'getline'
    >     Write-to (create new file) -- valid with 'print' and 'printf'
    >>    Append-to (create file if it doesn't already exist)
    |     Pipe-from/to -- valid with 'getline', 'print', and 'printf'
4 precedence
 Operator precedence, listed from highest to lowest.  Assignment,
 conditional, and exponentiation operators group from right to left;
 all others group from left to right.  Parentheses may be used to
 override the normal order.

     field ($)
     increment (++), decrement (--)
     exponentiation (^, **)
     unary plus (+), unary minus (-), boolean not (!)
     multiplication (*), division (/), remainder (%)
     addition (+), subtraction (-)
     concatenation (no special symbol; implied by context)
     relational (==, !=, <, >=, etc), and redirection (<, >, >>, |)
       Relational and redirection operators have the same precedence
       and use similar symbols; context distinguishes between them
     matching (~, !~)
     array membership ('in')
     boolean and (&&)
     boolean or (||)
     conditional (? :)
     assignment (=, +=, etc)
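
 A few of the rules above can be checked with a short sketch.  It
 is illustrative only; it runs entirely in a BEGIN rule so that no
 input is read, and the variable names are arbitrary.

```awk
 BEGIN {
     print -2 ^ 2        # exponentiation before unary minus: -4
     print 2 ^ 3 ^ 2     # groups right to left, 2^(3^2): 512
     print 2 + 3 * 4     # multiplication before addition: 14
     print 1 " " 2 + 3   # addition before concatenation: 1 5
     x = 0
     y = x ? "yes" : "no"    # conditional before assignment
     print y             # no
 }
```

 When in doubt, parentheses make the intended grouping explicit.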
4 escaped_characters
 Inside of a quoted string or constant regular expression, the
 backslash (\) character gives special meaning to the character(s)
 after it.  Special character letters are case sensitive.
    \\    results in one backslash in the string
    \a    is an 'alert' (<ctrl/G>, the ASCII <bell> character)
    \b    is a backspace (BS, <ctrl/H>)
    \f    is a form feed (FF, <ctrl/L>)
    \n    'newline' (<ctrl/J>) [line feed treated as CR+LF]
    \r    carriage return (CR, <ctrl/M>) [re-positions at the
            beginning of the current line]
    \t    tab (HT, <ctrl/I>)
    \v    vertical tab (VT, <ctrl/K>)
    \###  is an arbitrary character, where '###' represents 1 to 3
            octal (ie, 0 thru 7) digits
    \x##  is an alternate arbitrary character, where '##' represents
            1 or more hexadecimal (ie, 0 thru 9 and/or A through F
            and/or a through f) digits; if more than two digits
            follow, the result is undefined; not recognized if POSIX
            compatibility mode is specified.

 When a regular expression is represented in string form ("regex"
 as opposed to /regex/), backslashes need to be paired.  The first
 one quotes the second during string processing, and the second one
 remains to be used to quote whatever follows in regular expression
 processing.  For example, to match variable `xxx' against a period
 character, use (xxx ~ "\\.") or (xxx ~ /\./); if you tried to use
 (xxx ~ "\."), after string processing it would operate as (xxx ~ /./)
 and end up matching any single character rather than just a period.
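
 The pairing rule can be seen in a small sketch.  The function
 name 'has_period' is hypothetical and exists only for this
 illustration; an ASCII character set is assumed for the octal
 and hexadecimal escapes.

```awk
 # Hypothetical helper: returns 1 if s contains a literal period.
 function has_period(s) { return s ~ "\\." }

 BEGIN {
     print has_period("a.b")      # 1
     print has_period("ab")       # 0
     print ("ab" ~ /\./)          # 0, regex constant needs one backslash
     print ("ab" ~ ".")           # 1, a bare period matches any character
     print "\101\x42"             # octal 101 is A, hexadecimal 42 is B
 }
```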
3 statements
 A statement refers to a unit of instruction found in the action
 part of an awk rule, and also found in the definition of a function.
 The distinction between action, statement, and expression usually
 won't matter to an awk programmer.

 Compound statements consist of multiple statements separated by
 semicolons or newlines and enclosed within braces ({}).  They are
 sometimes referred to as 'blocks'.
4 expressions
 An expression such as 'a = 10' or 'n += i++' is a valid statement.

 Function invocations such as 'reformat_field($3)' are also valid
 statements.
4 if-then-else
 A conditional statement in awk uses the same syntax as for the 'C'
 programming language:  the 'if' keyword, followed by an expression
 in parentheses, followed by a statement--or block of statements
 enclosed within braces ({})--which will be executed if the expression
 is true but skipped if it's false.  This can optionally be followed
 by the 'else' keyword and another statement--or block of statements--
 which will be executed if (and only if) the expression was false.
5 examples
 Simple example showing a statement used to control how many numbers
 are printed on a given line.
       if ( ++i <= 10 )     #check whether this would be the 11th
              printf(" %5d", k)     #print on current line if not
       else {
              printf("\n %5d", k)   #print on next line if so
              i = 1                 #and reset the counter
       }
 Another example ('next' is described under 'action-controls')
       if ($1 > $2) { print "rejected"; next } else diff = $2 - $1
4 switch-case
 A gawk extension provides an alternative for conditional execution
 to the if-then-else construct.  The switch statement takes a value
 to use to decide which of one or more case clauses to execute,
 similar to the same construct in C and C++.  The main difference
 is that in those languages, the case values must be constant
 integers, whereas in awk they can be numbers, strings, or regular
 expressions.  Like in C/C++, an optional 'default' clause can be
 specified to serve as a catch-all for values which don't match
 any of the cases.

 The first case which matches the switch value is the one which
 will be executed.  If it doesn't use one of 'break', 'continue',
 'next', 'nextfile', 'return', or 'exit', then execution will
 continue into the body of the next case.  (Note that 'continue'
 doesn't operate as an explicit request to do such; rather, it
 causes execution of an enclosing for, while, or do-while
 statement to jump to the end of its loop.)
5 example
 In this example, the value of variable 'x' is examined.  It
 contains a mistake that someone coming from a background of
 programming in Pascal might accidentally make.

 switch (x) {
 case 1:     print "x is 1"; break;
 case 2:     print "x is 2"
 case "two": print "x is \"two\""; break;
 default:    print "x is neither 1 nor 2"; break
 }

 Note that if the value is '2', after printing "x is 2" it will
 continue into the next case and also print "x is \"two\"", which
 was probably not intended.  The 'break' statement is needed to
 jump out of the switch statement instead of falling through
 into the subsequent clause.  For the very last one, 'default'
 in this example, 'break' is optional; reaching the closing
 brace of a switch statement also breaks out of the statement.
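
 For comparison, here is a sketch in which no clause falls
 through.  It is not taken from the gawk distribution; the
 function name 'describe' is hypothetical, and each clause ends
 with 'return' (which, like 'break', leaves the switch).  A
 regular expression case is included to show that form as well.

```awk
 # Hypothetical function; requires gawk since switch is an extension.
 function describe(x) {
     switch (x) {
     case 1:          return "x is 1"
     case 2:          return "x is 2"
     case "two":      return "x is \"two\""
     case /^[0-9]+$/: return "x is some other whole number"
     default:         return "x is something else"
     }
 }

 BEGIN {
     print describe(2)        # x is 2
     print describe("two")    # x is "two"
     print describe(77)       # x is some other whole number
     print describe("abc")    # x is something else
 }
```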
4 loops
 Three types of loop statements are available in awk.  Each uses
 the same syntax as 'C'.  The simplest of the three is the 'while'
 statement.  It consists of the 'while' keyword, followed by an
 expression enclosed within parentheses, followed by a statement--or
 block of statements in braces ({})--which will be executed if the
 expression evaluates to true.  The expression is evaluated before
 attempting to execute the statement; if it's true, the statement is
 executed (the entire block of statements if there is a block) and
 then the expression is re-evaluated.

 The second type of loop is the do-while loop.  It consists of the
 'do' keyword, followed by a statement (usually a block of statements
 enclosed within braces), followed by the 'while' keyword, followed
 by a test expression enclosed within parentheses.  The statement--or
 block--is always executed at least once.  Then the test expression
 is evaluated, and the statement(s) re-executed if the result was
 true (followed by re-evaluation of the test, and so on).

 The most complex of the three loops is the 'for' statement, and it
 has a second variant that is not found in 'C'.  The ordinary for-loop
 consists of the 'for' keyword, followed by three semicolon-separated
 expressions enclosed within parentheses, followed by a statement or
 brace-enclosed block of statements.  The first of the three
 expressions is an initialization clause; it is done before starting
 the loop.  The second expression is used as a test, just like the
 expression in a while-loop.  It is checked before attempting to
 execute the statement block, and then re-checked after each execution
 (if any) of the block.  The third expression is an 'increment' clause;
 it is evaluated after an execution of the statement block and before
 re-evaluation of the test (2nd) expression.  Normally, the increment
 clause will change a variable used in the test clause, in such a
 fashion that the test clause will eventually evaluate to false and
 cause the loop to finish.

 Note to 'C' programmers:  the comma (,) operator commonly used in
 'C' for-loop expressions is not valid in awk.

 The awk-specific variant of the for-loop is used for processing
 arrays.  Its syntax is 'for' keyword, followed by variable_name 'in'
 array_name (where 'var in array' is enclosed in parentheses),
 followed by a statement (or block).  Each valid subscript value for
 the array in question is successively placed--in no particular
 order--into the specified 'index' variable.  Order can optionally
 be controlled by assigning a sort mode to PROCINFO["sorted_in"].
5 while_example
 # strip fields from the input record until there's nothing left
 while (NF > 0) {
     $1 = ""    #this will affect the value of $0
     $0 = $0    #this causes $0 and NF to be re-evaluated
     print
 }
5 do_while_example
 # This is a variation of the while_example; it gives a slightly
 #   different display due to the order of operation.
 # echo input record until all fields have been stripped
 do {
     print      #output $0
     $1 = ""    #this will affect the value of $0
     $0 = $0    #this causes $0 and NF to be re-evaluated
 } while (NF > 0)
5 for_example
 # echo command line arguments (won't include option switches)
 for ( i = 0; i < ARGC; i++ )  print ARGV[i]

 # display contents of builtin environment array
 for (itm in ENVIRON)
     print itm, ENVIRON[itm]
5 for_index_in_array_sorting
 Normally indices in an array are processed in an arbitrary
 order when using the 'for (index in array)' statement,
 but a gawk-extension allows you to control that order.
 Assign a value to the "sorted_in" element of the PROCINFO[]
 array to accomplish this.  The value may be a comparison
 function which accepts four arguments (index and value of one
 element, then index and value of another), or a special value
 which specifies one of several built-in comparison functions.
 These functions are used to compare pairs of array elements
 and their result controls which of each pair comes before the
 other.
6 comparison_function
 A function assigned to PROCINFO["sorted_in"] should be
 prepared to accept four arguments and to return a numeric
 value, negative if the element specified by the first two
 arguments (its index and its value, respectively) is less
 than the element specified by the second pair of arguments,
 zero if they compare equal, and positive if the first
 element is greater than the second.  Here's an example:

 function my_compare(idx1, val1, idx2, val2)
 {
   if (val1 < val2) return -1
   if (val1 > val2) return 1
   # the two values are equal
   return (idx1 < idx2) ? -1 : (idx1 > idx2)
 }

 This compares the two values and returns either negative
 or positive if they're different.  If they're the same,
 it compares the two indices as a tie-breaker instead of
 simply returning zero.

 You can force values to be numeric or to be string, as
 needed, and use more elaborate ordering criteria.  Just
 be sure that the results are consistent; returning a
 positive value when idx1,val1 is compared to idx2,val2
 and then also returning a positive value if idx2,val2
 gets compared to idx1,val1 will likely confuse the sort
 routine and produce strange results.

 If you plan to sort arrays which contain sub-arrays (array
 elements which contain their own arrays) and you're sorting
 by value rather than by index, your compare routine should
 use the isarray() function to check for them (test second
 and fourth arguments to see whether they're arrays) and
 handle them appropriately.  The basic comparison operators
 like '<' will produce an error if used on arrays.
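
 As a sketch of how such a function might be put to use (the
 names 'by_value_desc', 'list_indices', and 'count' are
 hypothetical, and the values are assumed to be scalars rather
 than sub-arrays):

```awk
 # Hypothetical comparison function: largest numeric value first,
 # with ties broken by comparing the indices as strings.
 function by_value_desc(idx1, val1, idx2, val2) {
     if (val1 + 0 != val2 + 0)
         return (val1 + 0 > val2 + 0) ? -1 : 1
     return (idx1 "" < idx2 "") ? -1 : (idx1 "" > idx2 "")
 }

 # Hypothetical helper: collect the indices of arr in traversal order.
 function list_indices(arr,    i, out) {
     for (i in arr)
         out = (out == "" ? i : out " " i)
     return out
 }

 BEGIN {
     count["pear"] = 3; count["fig"] = 7; count["kiwi"] = 3
     PROCINFO["sorted_in"] = "by_value_desc"
     print list_indices(count)       # fig kiwi pear
 }
```

 Adding 0 forces a numeric comparison and concatenating ""
 forces a string comparison, so the result does not depend on
 how the elements happened to be assigned.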
6 built-in_comparisons
 Here is a list of built-in compare routines that can be
 assigned to PROCINFO["sorted_in"].  They are strings
 and start with '@' so that these names can't be confused
 with actual functions.

 "@ind_str_asc"   order by indices compared as strings
                  (all array indices are strings internally,
                  even when they were assigned as numbers)
 "@ind_num_asc"   order by indices compared as numbers
                  (non-numeric ones end up with value 0)
 "@val_type_asc"  order by values using assigned type
                  (if a mixture of strings and numbers is
                  present, numbers come first, then strings)
 "@val_str_asc"   order by values compared as strings
 "@val_num_asc"   order by values compared as numbers
 "@ind_str_desc"  \
 "@ind_num_desc"   \
 "@val_type_desc"   descending versions of the above
 "@val_str_desc"   /
 "@val_num_desc"  /
 "@unsorted"      explicitly specify arbitrary order
                  (same as deleting the "sorted_in" element
                  from the PROCINFO[] array, or never having
                  assigned it a value in the first place)

 All the ascending sorts put sub-arrays--if any--last, and
 descending ones place them first.  When multiple sub-arrays
 are present, they tie with each other without regard to
 their contents; such ties are then disambiguated by
 comparing their indices.
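
 A brief sketch of three of these modes applied to the same
 array; the array name 'size' and its contents are made up for
 the illustration.

```awk
 BEGIN {
     size["b"] = 10; size["a"] = 9; size["c"] = 100

     PROCINFO["sorted_in"] = "@ind_str_asc"
     for (k in size) printf "%s ", k      # a b c
     print ""

     PROCINFO["sorted_in"] = "@val_num_desc"
     for (k in size) printf "%s ", k      # c b a
     print ""

     PROCINFO["sorted_in"] = "@val_str_asc"
     for (k in size) printf "%s ", k      # b c a ("10" < "100" < "9")
     print ""

     delete PROCINFO["sorted_in"]         # back to arbitrary order
 }
```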
6 processing_order
 Sorting of the array takes place as the 'for (index in array)'
 statement is about to start executing.  Changing the value of
 PROCINFO["sorted_in"] during the course of the loop will not
 affect traversal order, and could be used to control ordering
 of sub-arrays using different criteria.

 After the loop finishes, any ordering imposed on the indices
 is forgotten.  A subsequent 'for (index in array)' traversal
 of the same array will yield whatever order is specified by
 PROCINFO["sorted_in"] at that time, including reverting to
 arbitrary if it no longer has a value.
4 loop-controls
 There are two special statements--both from 'C'--for changing the
 behavior of loop execution.  The 'continue' statement is useful in
 a compound (block) statement; when executed, it effectively skips
 the rest of the block so that the increment-expression (only for
 for-loops) and loop-termination expression can be re-evaluated.

 The 'break' statement, when executed, effectively skips the rest
 of the block and also treats the test expression as if it were
 false (instead of actually re-evaluating it).  In this case, the
 increment-expression of a for-loop is also skipped.

 Inside nested loops, both 'break' and 'continue' only apply to the
 innermost loop.  When in compatibility mode, 'break' or 'continue'
 may be used outside of a loop; either will be treated like 'next'
 (see action-controls).
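
 The difference between the two statements shows up in the
 final value of the loop variable, as in this small sketch:

```awk
 BEGIN {
     for (i = 1; i <= 10; i++) {
         if (i % 2 == 0)
             continue        # skip even numbers; i++ still runs
         if (i > 7)
             break           # leave the loop; i++ is not run
         printf "%d ", i     # 1 3 5 7
     }
     print ""
     print i                 # 9, the value at which 'break' executed
 }
```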
4 action-controls
 There are three special statements for controlling statement execution.
 The 'next' statement, when executed, causes the rest of the current
 action and all further pattern-action rules to be skipped, so that
 the next input record will be immediately processed.  This is useful
 if an early action knows that the current record will fail all the
 remaining patterns; skipping those rules will reduce processing time.

 A GAWK extension, 'nextfile', is also available.  It causes the
 remainder of the current file to be skipped, the ENDFILE action, if
 applicable, to be performed, and then the next input file will be
 processed.  If there is no next input file, the END action will be
 performed.  'nextfile' is not available in traditional awk.

 The 'exit' statement causes GAWK execution to terminate.  No further
 input is processed; the END rule, if any, is executed, and then all
 open files are closed.  'exit' takes an optional numeric value as an
 argument which is used as an exit status value, so that some sort
 of indication of why execution has stopped can be passed on to the
 user's environment.
4 other_statements
 The delete statement is used to remove an element from an array.
 The syntax is 'delete' keyword followed by array name, followed
 by index value enclosed in square brackets ([]).  As a gawk
 extension, 'delete' may also be used on an array name without any
 index specified, to delete all its elements in a single operation.
 (The array itself will continue to exist as an array, even though
 it no longer contains any elements.)

 The return statement is used in user-defined functions.  The syntax
 is the keyword 'return' optionally followed by a string or numeric
 expression.

 See also subtopic 'functions IO_functions' for a description of
 'print', 'printf', and 'getline'.
3 fields
 When an input record is read, it is automatically split into fields
 based on the current values of FS (builtin variable defining field
 separator expression) and RS (builtin variable defining record
 separator character).  The default value of FS is an expression
 which matches one or more spaces and tabs; the default for RS is
 newline.  If the FIELDWIDTHS variable is set to a space separated
 list of numbers (as in ``FIELDWIDTHS = "2 3 2"'') then the input
 is treated as if it had fixed-width fields of the indicated sizes
 and the FS value will be ignored.

 The field prefix operator ($) is used to reference a particular
 field.  For example, $3 designates the third field of the current
 record.  The entire record can be referenced via $0 (and it holds
 the actual input record, not the values of $1, $2, ... concatenated
 together, so multiple spaces--when present--remain intact, unless
 a new value gets assigned).

 The builtin variable NF holds the number of fields in the current
 record.  $NF is therefore the value of the last field.  Attempts to
 access fields beyond NF result in null values (if a record contained
 3 fields, the value of $5 would be "").

 Assigning a new value to $0 causes all the other field values (and NF)
 to be re-evaluated.  Changing a specific field will cause $0 to receive
 a new value once it's re-evaluated, but until then the other existing
 fields remain unchanged.
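
 These rules can be tried without any input file by assigning
 to $0 in a BEGIN rule, as in this sketch (the sample text is
 arbitrary):

```awk
 BEGIN {
     $0 = "alpha   beta gamma"    # assigning $0 splits it into fields
     print NF                     # 3
     print $NF                    # gamma
     print ($5 == "")             # 1, fields beyond NF are null
     print $0                     # multiple spaces are still intact

     $2 = "BETA"                  # changing a field rebuilds $0 using OFS
     print $0                     # alpha BETA gamma
 }
```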
4 field_separation
 Three built-in variables control separating input lines into fields,
 and the most recently assigned of those three is the one which has
 effect.  PROCINFO["FS"] can be used to determine which one that is.

 FS is a character, string, or regular expression specifying what
 separates fields.  It is available in all implementations of awk so
 is the most widely used.  The default value is an explicit space and
 behaves as if the value was /[ \t\n]+/ to treat any number of spaces
 and tabs (and newlines, if RS isn't using them as record separators)
 as the separator.  (Explicitly using that regular expression
 actually produces different results if the input happens to have
 leading and/or trailing whitespace.  The default skips such space;
 the regexp increases NF by 1 and produces an empty $1 if there is
 leading whitespace and it increases NF by 1 and produces an empty $NF
 if there is trailing whitespace.  To actually force the separator to
 be a single space, use the regular expression / /.)

 FIELDWIDTHS is a string containing a space-separated list of numbers
 which indicate how wide each field is.  It is a gawk-extension and
 used to be considered experimental, but it has been in place for many
 years without significant changes.  There is no default value, nor is
 there any way to specify a repeat count the way a Fortran FORMAT
 statement could.

 FPAT is a regular expression which specifies field values rather than
 the separation between fields.  It is also a gawk-extension and is
 new with version 4.0.0.

 A gawk-extension makes setting FS to "" force each input character
 to be a separate field, similar to FIELDWIDTHS="1 1 1 1 1 1"(...) if
 you were able to supply an unlimited number of 1's.
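
 The following sketch switches between the three variables and
 reports which one is in effect.  It assumes gawk 4.0.0 or later
 (for FPAT) and assigns sample text to $0 instead of reading
 input.

```awk
 BEGIN {
     FS = ":"
     $0 = "root:x:0"
     print NF, $3, PROCINFO["FS"]         # 3 0 FS

     FIELDWIDTHS = "2 3 2"
     $0 = "aabbbcc"
     print $1, $2, $3, PROCINFO["FS"]     # aa bbb cc FIELDWIDTHS

     FPAT = "[0-9]+"
     $0 = "ab12cd345"
     print NF, $1, $2, PROCINFO["FS"]     # 2 12 345 FPAT

     FS = ""                              # every character is a field
     $0 = "xyz"
     print NF, $2                         # 3 y
 }
```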
3 variables
 Variables in awk can hold both numeric and string values and do not
 have to be pre-declared.  In fact, there is no way to explicitly
 declare them at all.  Variable names consist of a leading letter
 (either upper or lower case, which are distinct from each other)
 or underscore (_) character followed by any number of letters,
 digits, or underscores.

 When a variable that didn't previously exist is referenced, it is
 created and given a null value.  A null value is treated as 0 when
 used as a number, and is a string of zero characters in length if
 used as a string.
4 builtin_variables
 GAWK maintains several 'built-in' variables.  All have default values;
 some are updated automatically.  All the builtins have uppercase-only
 names.

 These builtin variables control how awk behaves
   FS  input field separator; default is a single space, which is
         treated as if it were a regular expression for matching
         one or more spaces and/or tabs and/or newlines; a value
         of " " also has a second special-case side-effect of
         causing leading and/or trailing blanks to be ignored
         instead of producing a null first and/or last field;
         initial value can be specified on the command line with
         the -F option (or /field_separator); the value can be a
         regular expression; as a gawk extension, if the value is
         an empty string (""), every character becomes a separate
         field
   RS  input record separator; default value is a newline ("\n");
         the value can be multiple characters or a regular expression
   OFS output field separator; value to place between variables in
         a 'print' statement; default is one space; can be arbitrary
         string
   ORS output record separator; value to implicitly terminate 'print'
         statement with; default is newline ("\n"); can be arbitrary
         string
   OFMT default output format used for printing numbers; default
         value is "%.6g"
   CONVFMT conversion format used for number-to-string conversions;
         default value is also "%.6g", like OFMT; not used when the
         number has a value which may be represented internally as
         an exact integer (typically within -2147483648 to 2147483647)
   SUBSEP subscript separator for array indices; used when an array
         subscript is specified as a comma separated list of values:
         the comma is replaced by SUBSEP and the resulting index
         is a concatenation of the values and SUBSEP(s); default
         value is "\034"; value may be arbitrary string
   IGNORECASE string and regular expression matching flag; if true
         (non-zero) matching ignores differences between upper and
         lower case letters; affects the '~' and '!~' operators,
         the 'index', 'match', 'split', 'sub', and 'gsub' functions,
         and field splitting based on FS; default value is false (0);
         has no effect if GAWK is in strict compatibility mode
   FIELDWIDTHS space or tab separated list of width sizes; takes
         precedence over FS when set, but is cleared if FS has a
         value assigned to it; [note: the current implementation
         of fixed-field input is considered experimental and is
         expected to evolve over time]
   FPAT an alternate way to specify fields, with a regexp pattern
         which defines field values rather than field separator
         [assigning a value to any of FS, FIELDWIDTHS, or FPAT
         causes the other two to be deactivated; the value of
         PROCINFO["FS"] can be used to determine which one is
         currently in use]
   BINMODE can be used to force input and/or output files to be processed
         using binary I/O; a value of 1 or "r" forces binary mode when
         reading input, a value of 2 or "w" forces binary mode when
         writing output, and a value of 3 or "rw" causes GAWK to use
         binary mode for both input and output; BINMODE has no effect
         on reading from stdin or writing to stdout; they'll have
         already been opened in text mode before you assign a value
   LINT setting or unsetting this can dynamically toggle the --lint
         command line option on or off

 These builtin variables provide useful information
   NF  number of fields in the current record
   NR  record number (accumulated over all files when more than one
         input file is processed by the same program)
   FNR current record number of the current input file; reset to 0
         each time an input file is completed
   RT  record terminator, the input text which matched RS; not
         available when the `-Wtraditional' option is used
   RSTART starting position of substring matched by last invocation
         of the 'match' function; set to 0 if a match fails and at
         the start of each input record
   RLENGTH length of substring matched by the last invocation of the
         'match' function; set to -1 if a match fails
   FILENAME name of the input file currently being processed; the
         special name "-" is used to represent the standard input
   ENVIRON array of miscellaneous user environment values; the VMS
         implementation of GAWK provides values for ["USER"] (the
         username), ["PATH"] (current default directory), ["HOME"]
         (the user's login directory), and ["TERM"] (terminal type
         if available) [all info provided by C RTL's environ]
   PROCINFO miscellaneous process information and assorted GAWK
         extensions which don't fit in elsewhere
   ERRNO information about the cause of failure for 'getline' or
         'close' or for file open during a BEGINFILE rule; it is
         only set if an error has occurred, it isn't reset when
         any subsequent operation succeeds; the only exception is
         that it is reset prior to attempting to open a file so
         that BEGINFILE rule actions can distinguish between
         success and failure
   ARGC number of elements in the ARGV array, counting [0] which is
         the program name (ie, "gawk")
   ARGV array of command-line arguments (in [0] to [ARGC-1]); the
         program name (ie, "gawk") is held in ARGV[0]; command line
         parameters (data files and "var=value" expressions, but not
         program options or the awk program text string if present)
         are stored in ARGV[1] through ARGV[ARGC-1]; the awk program
         can change values of ARGC and ARGV[] during execution in
         order to alter which files are processed or which between-
         file assignments are made
   ARGIND current index into ARGV[]
4 arrays
 awk supports associative arrays to collect data into tables.  Array
 elements can be either numeric or string, as can the indices used to
 access them.  Each array must have a unique name, but a given array
 can hold both string and numeric elements at the same time.  Arrays
 are one-dimensional only, but multi-dimensional arrays can be
 simulated using comma (,) separated indices, whereby a single index
 value gets created by replacing commas with SUBSEP and concatenating
 the resulting expression into a single string.

 Referencing an array element is done with the expression
       Array[Index]
 where 'Array' represents the array's name and 'Index' represents a
 value or expression used for a subscript.  If the requested array
 element did not exist, it will be created and assigned an initial
 null value.  To check whether an element exists without creating it,
 use the 'in' boolean operator.
       Index in Array
 would check 'Array' for element 'Index' and return 1 if it existed
 or 0 otherwise.  To remove an element from an array, use the 'delete'
 statement
       delete Array[Index]
 To remove all array elements at once, use
       delete Array
 Note:  the latter is a gawk extension; also, there is no way to
 delete an ordinary variable or an entire array; 'delete' only works
 on array elements.

 To process all elements of an array (in succession) when their
 subscripts might be unknown, use the 'in' variant of the for-loop
       for (Index in Array) { ... }
 (See the "awk_language statements loops" entry for a way to control
 the order of traversal with this construct.)

 Starting with version 4.0.0 array values can contain arrays, sometimes
 referred to as sub-arrays.  They're created by assigning a value using
 multiple instances of subscripting:  'a[1][2] = 3' would create array
 a if it didn't already exist, create array element a[1] if it didn't
 already exist, create sub-array element a[1][2] if it didn't exist,
 then assign it the value 3.  You can't directly assign an existing
 array to be a subarray:  'a[1] = 2; a[3] = 4; b["a"] = a' would get
 rejected.  But you can produce the same effect by traversing the array
 and assigning it element by element:
 'a[1] = 2; a[3] = 4; for (i in a) b["a"][i] = a[i]'.
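
 A short sketch tying these operations together.  The array
 names are arbitrary; 'length' applied to an array and the
 'isarray' function are assumed to be available, as they are in
 gawk 4.0.0 and later.

```awk
 BEGIN {
     seen["apple"] = 1
     print ("apple" in seen)      # 1
     print ("pear" in seen)       # 0, and seen["pear"] is not created
     print length(seen)           # 1

     if (seen["plum"] == "")      # a plain reference creates the element
         print length(seen)       # 2

     delete seen["plum"]
     print length(seen)           # 1

     grid[1][2] = 3               # creates sub-array grid[1]
     print isarray(grid[1]), grid[1][2]    # 1 3

     a[1] = 2; a[3] = 4
     for (i in a) b["a"][i] = a[i]         # copy element by element
     print b["a"][3]              # 4
 }
```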
3 functions
 awk supports both built-in and user-defined functions.  A function
 may be considered a 'black-box' which accepts zero or more input
 parameters, performs some calculations or other manipulations based
 on them, and returns a single result.

 The syntax for calling a function consists of the function name
 immediately followed by an open parenthesis (left parenthesis '('),
 followed by an argument list, followed by a closing parenthesis
 (right parenthesis ')').  The argument list is a sequence of values
 (numbers, strings, variables, array references, or expressions
 involving the above and/or nested function calls), separated by
 commas and optional white space.

 The parentheses are required punctuation, except for the 'print' and
 'printf' builtin IO functions, where they're optional, and for the
 builtin IO function 'getline', where they're not allowed.  Some
 functions support optional [trailing] arguments which can be simply
 omitted (along with the corresponding comma if applicable).
4 numeric_functions
 Builtin numeric functions
   int(n)      returns the value of 'n' with any fraction truncated
                 [truncation of negative values is towards 0]
   sqrt(n)     the square root of n
   exp(n)      the exponential of n ('e' raised to the 'n'th power)
   log(n)      natural logarithm of n
   sin(n)      sine of n (in radians)
   cos(n)      cosine of n (radians)
   atan2(m,n)  arctangent of m/n (radians)
   rand()      random number r in the range 0 <= r < 1
   srand(s)    sets the random number 'seed' to s, so that a sequence
                 of 'random' numbers can be repeated; returns the
                 previous seed value; srand() [argument omitted] sets
                 the seed to an 'unpredictable' value (based on date
                 and time, for instance, so should be unrepeatable)
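
 A few of these functions in use; the seed value 42 in this
 sketch is arbitrary.

```awk
 BEGIN {
     print int(3.9), int(-3.9)        # 3 -3, truncation is towards 0
     print sqrt(16), exp(0), log(1)   # 4 1 0
     pi = atan2(0, -1)                # arctangent of 0/-1 is pi
     printf "%.4f\n", pi              # 3.1416

     srand(42); first = rand()
     srand(42); again = rand()
     print (first == again)           # 1, same seed repeats the sequence
     print (first >= 0 && first < 1)  # 1
 }
```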
4 string_functions
 Builtin string functions
   index(s,t)  search string s for substring t; result is 1-based
                 offset of t within s, or 0 if not found
   length(s)   returns the length of string s; either 'length()'
                 with its argument omitted or 'length' without any
                 parenthesized argument list will return length of $0
   match(s,r)  search string s for regular expression r; the offset
                 of the longest, left-most substring which matches
                 is returned, or 0 if no match was found; the builtin
                 variables RSTART and RLENGTH are also set [RSTART to
                 the return value and RLENGTH to the size of the
                 matching substring, or to -1 if no match was found]
   split(s,a,f,x) break string s into components based on field
                 separator f and store them in array a (into elements
                 [1], [2], and so on); the third argument is optional,
                 if omitted, the value of FS is used; the fourth one
                 is optional too, and is a gawk extension; when
                 specified it should be an array which will receive
                 the separators between the corresponding fields; the
                 return value is the number of components found
   patsplit(s,a,p,x) similar to split, but p is a regexp pattern
                 specifying field contents rather than a separator;
                 if not specified, the value of FPAT is used; this
                 function is a gawk extension
   sprintf(f,e,...) format expression(s) e using format string f and
                 return the result as a string; formatting is similar
                 to the printf function
   sub(r,t,s)  search string target s for regular expression r, and
                 if a match is found, replace the matching text with
                 substring t, then store the result back in s; if s
                 is omitted, use $0 for the string; the result is
                 either 1 if a match+substitution was made, or 0
                 otherwise; if substring t contains the character
                 '&', the text which matched the regular expression
                 is used instead of '&' [to suppress this feature
                 of '&', 'quote' it with a backslash (\); since this
                 will be inside a quoted string which will receive
                 'backslash' processing before being passed to sub(),
                 *two* consecutive backslashes will be needed "\\&"]
   gsub(r,t,s) similar to sub(), but gsub() replaces all nonoverlapping
                 substrings instead of just the first, and the return
                 value is the number of substitutions made
   gensub(r,t,n,s) search string s ($0 if omitted) for regexp r and
                 replace the n'th occurrence with substring t; the
                 result is the new string and s (or $0) remains
                 unchanged; if n begins with letter "g" or "G" then
                 all matches are replaced instead of just the n'th;
                 if r has parenthesized subexpressions in it, t may
                 contain the special sequences \\0, \\1, through \\9
                 which expand into the value of the corresponding
                 subexpression; this function is a gawk extension
   substr(s,p,l) extract a substring l characters long starting at
                 offset p in string s; l is optional, if omitted then
                 the remainder of the string (p thru end) is returned
   tolower(s)  return a copy of string s in which every uppercase
                 letter has been converted into lowercase
   toupper(s)  analogous to tolower(); convert lowercase to uppercase
   strtonum(s) convert string s into the corresponding number; if s
                 begins with "0x", the rest of the string will be
                 considered to be hexadecimal digits, otherwise if it
                 begins with "0" (not "o"), the rest will be treated
                 as octal digits; this function is a gawk extension
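 The substitution functions are easiest to follow with a small
 example.  The sketch below is only illustrative; the helper names
 bracket_digits, literal_amp, and swap_pair are hypothetical and not
 part of gawk.  It shows '&' in a replacement string, the "\\&" form
 that suppresses it, and the \\1 and \\2 sequences of gensub().

```awk
# Hypothetical helpers; none of these names are part of gawk.

# Wrap every run of digits in brackets; '&' stands for the matched text.
function bracket_digits(s) {
    gsub(/[0-9]+/, "[&]", s)
    return s
}

# Replace the first "cat" with a literal '&'; "\\&" suppresses the
# special meaning of '&'.
function literal_amp(s) {
    sub(/cat/, "\\&", s)
    return s
}

# Swap two colon-separated words; \\1 and \\2 refer to the
# parenthesized subexpressions.  gensub() leaves s unchanged.
function swap_pair(s) {
    return gensub(/([a-z]+):([a-z]+)/, "\\2:\\1", 1, s)
}

BEGIN {
    print bracket_digits("a1b22c")    # prints: a[1]b[22]c
    print literal_amp("cat")          # prints: &
    print swap_pair("key:value")      # prints: value:key
    # prints: 4 bar 31
    print index("foobar", "bar"), substr("foobar", 4), strtonum("0x1F")
}
```

 Because sub() and gsub() modify their third argument in place, the
 helpers work on the function's own copy of s and return it.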
4 array_functions
   isarray(a)  returns 1 if a is an array, 0 otherwise; most useful
                 when traversing an array which might contain array
                 values (sub-arrays)
   split(s,a[,f[,x]]) break string s into components based on field
                 separator f and store them in array a (into elements
                 [1], [2], and so on); the third argument is optional,
                 if omitted, the value of FS is used; the fourth one
                 is optional too, and is a gawk extension; when
                 specified it should be an array which will receive
                 the separators between the corresponding fields; the
                 return value is the number of components found
   patsplit(s,a[,p[,x]]) similar to split, but p is a regexp pattern
                 specifying field contents rather than a separator;
                 if not specified, the value of FPAT is used; this
                 function is a gawk extension
   asort(s[,d[,m]]) sort the contents of array s, replacing the index
                 values with an integer sequence of 1 to N; if d is
                 specified, leave the indices of s intact and put the
                 values and sequence index into d; if m is specified,
                 it should be a string containing "ascending" or
                 "descending" to control order, or "string" or "number"
                 to control how comparisons are performed, or a
                 combination of the two; m can also be a comparison
                 function similar to ones used by PROCINFO["sorted_in"]
   asorti(s[,d[,m]]) sort the indices of array s, replacing the values
                 with an integer sequence of 1 to N; if d is specified,
                 leave the values of s intact and put the indices and
                 sequence values into d; m is the same as for asort()
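 A brief sketch of these functions follows.  It uses only the one-
 and two-array forms of asort() and asorti(); the helper join_elems
 and the array names are hypothetical, chosen for the illustration.

```awk
# Hypothetical helper: join elements 1..n of arr with sep.
function join_elems(arr, n, sep,    i, out) {
    for (i = 1; i <= n; i++)
        out = out (i > 1 ? sep : "") arr[i]
    return out
}

BEGIN {
    score["carol"] = 30
    score["alice"] = 10
    score["bob"]   = 20

    n = asort(score, by_value)        # sorted values go into by_value
    print join_elems(by_value, n, " ")    # prints: 10 20 30

    n = asorti(score, by_name)        # sorted indices go into by_name
    print join_elems(by_name, n, " ")     # prints: alice bob carol

    n = split("a:b:c", parts, ":", seps)
    print n, parts[3], seps[1]        # prints: 3 c :

    n = patsplit("ab12cd345", nums, /[0-9]+/)
    print n, nums[1], nums[2]         # prints: 2 12 345

    print isarray(score), isarray(n)  # prints: 1 0
}
```

 Passing a second array keeps the original array intact, which is
 usually what is wanted when the original indices still matter.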
4 time_functions
 Builtin time functions
   systime()   return the current time of day as the number of seconds
                 since some reference point; on VMS the reference point
                 is January 1, 1970, at 12 AM local time (not UTC)
   mktime(s)   convert string s into number of seconds since the
                 reference point; s should contain a value of the form
                 "yyyy mm dd hh mm ss[ dst]" where yyyy is a four digit
                 year, mm a month number from 1 to 12, dd day-of-month
                 number from 1 to 31, hh hour 0 to 23, mm minute 0 to
                 59, ss second 0 to 60, and [ dst] is an optional flag
                 to handle daylight saving time: if dst is positive,
                 then daylight saving time is in effect, if zero, then
                 it isn't, and if negative or omitted, gawk attempts
                 to determine whether it was--or will be--at specified
                 date and time
   strftime(f,t,u) format time value t using format f; if f is omitted
                 then PROCINFO["strftime"] is used; if t is omitted,
                 the default is systime(); if u is present and non-zero
                 then t is treated as a UTC value, otherwise it is
                 considered to be local time
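 The following sketch shows how these functions fit together.  It
 assumes a system whose reference point is 1970-01-01 00:00:00 UTC
 (as noted above, VMS uses local time instead), so the first result
 may differ there; the date used is an arbitrary sample.

```awk
BEGIN {
    # Time value 0 is the reference point; a non-zero third argument
    # asks strftime() to treat the value as UTC.
    epoch = strftime("%Y-%m-%d %H:%M:%S", 0, 1)
    print epoch         # 1970-01-01 00:00:00 under the assumption above

    # mktime() interprets its argument as local time, so formatting
    # the result as local time reproduces the same fields.
    t = mktime("2011 06 15 12 30 00")
    print strftime("%Y %m %d %H %M %S", t)    # prints: 2011 06 15 12 30 00
}
```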

5 time_logical_names
 Gawk needs the SYS$TIMEZONE_RULE or TZ logical names defined or it will
 output the time in the GMT timezone.
5 time_formats
 Formatting directives similar to the 'printf' & 'sprintf' functions
 (each is introduced in the format string by preceding it with a
 percent sign (%)); the directive is substituted by the corresponding
 value
   a   abbreviated weekday name (Sun,Mon,Tue,Wed,Thu,Fri,Sat)
   A   full weekday name
   b   abbreviated month name (Jan,Feb,...)
   B   full month name
   c   date and time (Unix-style "aaa bbb dd HH:MM:SS YYYY" format)
   C   century prefix (19 or 20) [not century number, ie 20th]
   d   day of month as two digit decimal number (01-31)
   D   date in mm/dd/yy format
   e   day of month with leading space instead of leading 0 ( 1-31)
   E   ignored; following format character used
   H   hour (24 hour clock) as two digit number (00-23)
   h   abbreviated month name (Jan,Feb,...) [same as %b]
   I   hour (12 hour clock) as two digit number (01-12)
   j   day of year as three digit number (001-366)
   m   month as two digit number (01-12)
   M   minute as two digit number (00-59)
   n   'newline' (ie, treat %n as \n)
   O   ignored; following format character used
   p   AM/PM designation for 12 hour clock
   r   time in AM/PM format ("II:MM:SS p")
   R   time without seconds ("HH:MM")
   S   second as two digit number (00-59)
   t   tab (ie, treat %t as \t)
   T   time ("HH:MM:SS")
   U   week of year (00-53) [first Sunday is first day of week 1]
   V   date (VMS-style "dd-bbb-YYYY" with 'bbb' forced to uppercase)
   w   weekday as decimal digit (0 [Sunday] through 6 [Saturday])
   W   week of year (00-53) [first _Monday_ is first day of week 1]
   x   date ("aaa bbb dd YYYY")
   X   time ("HH:MM:SS")
   y   year without century (00-99)
   Y   year with century (19yy-20yy)
   Z   time zone name (always "local" for VMS)
   %   literal percent sign (%)
4 IO_functions
 Builtin I/O functions
   print x,... print the values of one or more expressions; if none
                 are listed, $0 is used; parentheses are optional;
                 when multiple values are printed, the current value
                 of builtin OFS (default is 1 space) is used to
                 separate them; the print line is implicitly
                 terminated with the current value of ORS (default
                 is newline); print does not have a return value
   printf(f,x,...) print the values of one or more expressions, using
                 the specified format string; null strings are used
                 to supply missing values (if any); no field separators
                 or trailing newline are printed, so they
                 should be specified within the format string; the
                 argument-enclosing parentheses are optional;
                 printf does not have a return value
   getline v   read a record into variable v; if v is omitted, $0 is
                 used (and NF, NR, and FNR are updated); if v is
                 specified, then field-splitting won't be performed;
                 note:  parentheses around the argument are *not*
                 allowed; return value is 1 for successful read, 0
                 if end of file is encountered, or -1 if some sort
                 of error occurred; [see 'redirection' for several
                 variants]
   close(s)    close a file or pipe specified by the string s; the
                 string used should have the same value as the one
                 used in a getline or print/printf redirection
   fflush(s)   flush output stream s; if s is omitted, stdout is
                 flushed; if it is specified but its value is an
                 empty string, all output streams are flushed
   system(s)   pass string s to be executed by the operating system;
                 the command string is executed in a subprocess
5 redirection
 Both getline and print/printf support variant forms which use
 redirection and pipes.

 To read from a file (instead of from the primary input file), use
     getline var < "file"
 or  getline < "file"    (read into $0)
 where the string "file" represents either an actual file name (in
 quotes) or a variable which contains a file name string value or an
 expression which evaluates to a string filename.

 To create a pipe executing some command and read the result into
 a variable (or into $0), use
     "command" | getline var
 or  "command" | getline    (read into $0)
 where "command" is a literal string containing an operating system
 command or a variable with a string value representing such a
 command.

 To output into a file other than the primary output, use
     print x,... > "file"    (or >> "file")
 or  printf(f,x,...) > "file"    (or >> "file")
 similar to the 'getline' example above.  '>>' causes output to be
 appended to an existing file if it exists, or create the file if
 it doesn't already exist.  '>' always creates a new file.  The
 alternate redirection method of '>$' (for RMS text file attributes)
 is *only* available on the command line, not with 'print' or
 'printf' in the current release.

 To output an error message, use 'print' or 'printf' and redirect
 the output to file "/dev/stderr" (or equivalently to "SYS$ERROR:"
 on VMS).  'stderr' will normally be the user's terminal, even if
 ordinary output is being redirected into a file.

 To feed awk output into another command, use
     print x,... | "command"    (similarly for 'printf')
 similar to the second 'getline' example.  In this case, output
 from awk will be passed as input to the specified operating system
 command.  The command must be capable of reading input from 'stdin'
 ("SYS$INPUT:" on VMS) in order to receive data in this manner.

 The 'close' function operates on the "file" or "command" argument
 specified here (either a literal string or a variable or expression
 resulting in a string value).  It completely closes the file or
 pipe so that further references to the same file or command string
 would re-open that file or command at the beginning.  Closing a
 pipe or redirection also releases some file-oriented resources.

 Note:  the VMS implementation of GAWK uses temporary files to
 simulate pipes, so a command must finish before 'getline' can get
 any input from it, and 'close' must be called for an output pipe
 before any data can be passed to the specified command.
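
 The interplay of output redirection, redirected getline, and close
 can be sketched as follows.  The helper count_records and the
 scratch file name gawk_demo.tmp are hypothetical, chosen only for
 this illustration.

```awk
# Hypothetical helper: count the records in a file by reading it with
# a redirected getline, then close it so a later read starts over.
# Returns -1 if the file cannot be read.
function count_records(file,    line, n, status) {
    n = 0
    while ((status = (getline line < file)) > 0)
        n++
    close(file)
    return (status < 0) ? -1 : n
}

BEGIN {
    tmp = "gawk_demo.tmp"             # hypothetical scratch file name
    print "first"  > tmp
    print "second" > tmp              # same open file, so this follows "first"
    close(tmp)                        # finish writing before reading
    print count_records(tmp)          # prints: 2
}
```

 Testing the getline result with '> 0' rather than treating it as a
 plain truth value avoids an endless loop when the read fails and -1
 is returned.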
5 formats
 Formatting characters used by the 'printf' and 'sprintf' functions
 (each is introduced in the format string by preceding it with a
 percent sign (%))
   %   include a literal percent sign (%) in the result
   c   format the next argument as a single ASCII character
         (prints first character of string argument, or corresponding
         ASCII character if numeric argument, e.g. 65 is 'A')
   s   format the next argument as a string (numeric arguments are
         converted into strings on demand)
   d   decimal number (ie, integer value in base 10)
   i   integer (equivalent to decimal)
   o   octal number (integer in base 8)
   x   hexadecimal number (integer in base 16) [lowercase]
   X   hexadecimal number [digits 'A' thru 'F' in uppercase]
   f   floating point number (digits, decimal point, fraction digits)
   e   exponential (scientific notation) number (digit, decimal
         point, fraction digits, letter 'e', sign '+' or '-',
         exponent digits)
   g   'fractional' number in either 'e' or 'f' format, whichever
         produces shorter result

 Several optional modifiers can be placed between the initiating
 percent sign and the format character (doesn't apply to %%).
   -   left justify (only matters when width specifier is present)
     (space) for numeric specifiers, prefix nonnegative values with
         a space and negative values with a minus sign
   +   for numeric specifiers, prefix nonnegative values with a plus
         sign and negative values with a minus sign
   #   alternate form applicable to several of the format characters
       (o, x, X, e, E, f, g, G)
   NN  width ['NN' represents 1 or more decimal digits]; actually
         minimum width to use, longer items will not be truncated; a
         leading 0 will cause right-justified numbers to be padded on
         the left with zeroes instead of spaces when they're aligned
   .MM precision [decimal point followed by 1 or more digits]; used
         as maximum width for strings (causing truncation if they're
         actually longer) or as number of fraction digits for 'f' or
         'e' numeric formats, or number of significant digits for 'g'
         numeric format
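
 A few of the format characters and modifiers described above can
 be tried with sprintf().  This is a minimal sketch; the variable
 names are arbitrary and the brackets are only there to make the
 padding visible.

```awk
BEGIN {
    line1 = sprintf("[%5d] [%-5d] [%05d]", 42, 42, 42)
    line2 = sprintf("[%+d] [% d]", 7, 7)
    line3 = sprintf("[%.2f] [%.3s] [%8.3f]", 3.14159, "abcdef", 2.5)
    line4 = sprintf("[%x] [%X] [%o] [%c]", 255, 255, 8, 65)

    print line1     # prints: [   42] [42   ] [00042]
    print line2     # prints: [+7] [ 7]
    print line3     # prints: [3.14] [abc] [   2.500]
    print line4     # prints: [ff] [FF] [10] [A]
}
```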
4 bitwise_functions
   Bitwise functions operate on bits (binary digits) of integer
   numeric values.  Non-integer numbers are converted into integers
   before their bits are accessed.

   and(x,y)    x AND y, where result contains 1 for bits that both x
         and y have set, 0 for other bits
   or(x,y)     x OR y, where the result contains 1 for any bits that
         either x or y or both have set, 0 for other bits
   xor(x,y)    x XOR y, where the result contains 1 for bits that x
         has set but y has clear or vice versa, 0 for other bits
   compl(x)    NOT x, where the result contains 1 for bits that x
         has clear and 0 for bits that it has set
   lshift(x,n) x << n, shift the bits of x by n positions left,
         approximately the same as x * 2^n
   rshift(x,n) x >> n, shift the bits of x by n positions right,
         approximately the same as int(x / 2^n)

   The set of bitwise functions is a gawk extension.
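
   As a small illustration, the sketch below applies these functions
   to the sample values 12 (binary 1100) and 10 (binary 1010); the
   variable names are arbitrary.  compl() is left out because its
   result depends on how many bits gawk uses for integers.

```awk
BEGIN {
    x = 12                            # binary 1100
    y = 10                            # binary 1010
    print and(x, y), or(x, y), xor(x, y)    # prints: 8 14 6
    print lshift(1, 4), rshift(256, 4)      # prints: 16 16
}
```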
4 user_defined_functions
 User-defined functions may be created as needed to simplify awk
 programs or to collect commonly used code into one place.  The
 general syntax of a user-defined function is the 'function' keyword
 followed by a unique function name, followed by a comma-separated
 parameter list enclosed in parentheses, followed by statement(s)
 enclosed within braces ({}).  A 'return' statement is customary
 but is not required.
       function FuncName(arg1,arg2) {
           # arbitrary statements
           return (arg1 + arg2) / 2
       }
 If a function does not use 'return' to specify an output value, the
 result received by the caller will be unpredictable.

 Functions may be placed in an awk program before, between, or after
 the pattern-action rules.  The abbreviation 'func' may be used in
 place of 'function', unless POSIX compatibility mode is in effect.
4 indirect_function_calls
 A gawk extension allows you to assign a string containing the name
 of a function to a variable, then call the function by preceding
 the variable with @ (at-sign) and following with the parenthesized
 argument list.  For example

 function my_max(x, y) { return (x > y) ? x : y }
 function my_min(x, y) { return (x < y) ? x : y }
 ...
 max_or_min = some_criterion ? "my_max" : "my_min"
 ...
 c = @max_or_min(a, b)

 would call either my_max() or my_min() depending upon the value of
 some_criterion at the time max_or_min was assigned.

 Indirect function calls only operate on user-defined functions, not
 on built-in ones.  If you need to use one of the latter, create a
 user-defined function to call the built-in function; this is often
 referred to as a "wrapper" function.
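
 A wrapper of that kind might look like the following sketch.  The
 names my_length, my_upper, and apply are hypothetical; they simply
 give the built-in length() and toupper() user-defined front ends
 that an indirect call can reach.

```awk
# Hypothetical wrappers around built-in functions.
function my_length(s) { return length(s) }
function my_upper(s)  { return toupper(s) }

# Call the user-defined function whose name is held in fname.
function apply(fname, s) {
    return @fname(s)
}

BEGIN {
    print apply("my_length", "hello")     # prints: 5
    print apply("my_upper", "hello")      # prints: HELLO
}
```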
3 regular_expressions
 A regular expression is a shorthand way of specifying a 'wildcard'
 type of string comparison.  Regular expression matching is very
 fundamental to awk's operation.

 Meta symbols
   ^   matches beginning of line or beginning of string; note that
         embedded newlines ('\n') create multi-line strings, so
         beginning of line is not necessarily beginning of string
   $   matches end of line or end of string
   .   any single character (except newline)
   [ ] set of characters; [ABC] matches either 'A' or 'B' or 'C'; a
         dash (other than first or last of the set) denotes a range
         of characters: [A-Z] matches any upper case letter; if the
         first character of the set is '^', then the sense of match
         is reversed: [^0-9] matches any non-digit; several
         characters need to be quoted with backslash (\) if they
         occur in a set:  '\', ']', '-', and '^'; within sets,
         various special character class designations are recognized,
         such as [:digit:] and [:punct:], as per POSIX
   |   alternation (similar to boolean 'or'); match either of two
         patterns [for example "^start|stop$" matches leading 'start'
         or trailing 'stop']
   ( ) grouping, alter normal precedence [for example, "^(start|stop)$"
         matches lines reading either 'start' or 'stop']
   *   repeated matching; when placed after a pattern, indicates that
         the pattern should match any number of times [for example,
         "[a-z][0-9]*" matches a lower case letter followed by zero or
         more digits]
   +   repeated matching; when placed after a pattern, indicates that
         the pattern should match one or more times ["[0-9]+" matches
         any non-empty sequence of digits]
   ?   optional matching; indicates that the pattern can match zero or
         one times ["[a-z][0-9]?" matches lower case letter alone or
         followed by a single digit]
   { } interval specification; {n} to match n times or {m,n} to match
         at least m but not more than n times; only functional when
         either the `-Wposix' or `-Wre-interval' options are used
   \   quote; prevent the character which follows from having special
         meaning; if the regexp is specified as a string, then the
         backslash itself will need to be quoted by preceding it with
         another backslash

 A regular expression which matches a string or line will match against
 the first (left-most) substring which meets the pattern and include
 the longest sequence of characters which still meets that pattern.
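
 The left-most, longest rule can be observed with match(), which
 records the position and size of the match in RSTART and RLENGTH.
 The strings used in this sketch are arbitrary samples.

```awk
BEGIN {
    s = "xxabcabcxx"
    if (match(s, /(abc)+/))
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
    # prints: 3 6 abcabc

    # Both alternatives match at offset 1; the longer one is used.
    match("foobar", /foo|foobar/)
    print RSTART, RLENGTH             # prints: 1 6
}
```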
3 comments
 Comments in awk programs are introduced with '#'.  Anything after
 '#' on a line is ignored by GAWK.  It's a good idea to include an
 explanation of what an awk program is doing and also who wrote it
 and when.
3 further_information
 For complete documentation on GAWK, see "Effective AWK Programming"
 by Arnold Robbins.  The second edition (ISBN 1-57831-000-8) is jointly
 published by SSC and the FSF (http://www.ssc.com).

 Source text for it is present in the file GAWK.TEXI.  A postscript
 version is available via anonymous FTP from host gnudist.gnu.org in
 directory /gnu/gawk, file gawk-{version}-doc.tar.gz where {version}
 would be the current version number, such as 3.0.6.

 Another source of documentation is "The AWK Programming Language"
 by Aho, Weinberger, and Kernighan (1988), published by Addison-Wesley.
 ISBN code is 0-201-07981-X.

 Each of these works contains both a reference on the awk language
 and a tutorial on awk's use, with many sample programs.
3 authors
 The awk programming language was originally created by Alfred V. Aho,
 Peter J. Weinberger, and Brian W. Kernighan in 1977.  The language
 was revised and enhanced in a new version which was released in 1985.

 GAWK, the GNU implementation of awk, was written in 1986 by Paul Rubin
 and Jay Fenlason, with advice from Richard Stallman, and with
 contributions from John Woods.  In 1988 and 1989, David Trueman and
 Arnold Robbins revised GAWK for compatibility with the newer awk.
 Arnold Robbins is the current maintainer.

 GAWK version 2.11.1 was ported to VMS by Pat Rankin in November, 1989,
 with further revisions in the Spring of 1990.  The VMS port was
 incorporated into the official GNU distribution of version 2.13 in
 Spring 1991.  (Version 2.12 was never publicly released.)
2 release_notes
 GAWK 4.0.0 has many changes from 3.1.8, and these release_notes were
 not updated for any of the 3.1.* releases, so some information is
 probably missing or out of date.  In particular, the known_problems
 subtopic hasn't been touched in many years.
3 AWK_LIBRARY
 GAWK uses a built in search path when looking for a program file
 specified by the -f option (or the /input qualifier) when that file
 name does not include a device and/or directory.  GAWK will first
 look in the current default directory, then if the file wasn't found
 it will look in the directory specified by the translation of logical
 name "AWK_LIBRARY".
3 known_problems
 There are several known problems with GAWK running on VMS.  Some can
 be ignored, others require work-arounds.
4 file_formats
 If a file having the RMS attribute "Fortran carriage control" is
 read as input, it will generate an empty first record if the first
 actual record begins with a space (leading space becomes a newline).
 Also, the last record of the file will give a "record not terminated"
 warning.  Both of these minor problems are due to the way that the
 C Run-Time Library (VAXCRTL) converts record attributes.

 Another poor feature without a work-around is that there's no way to
 specify "append if possible, create with RMS text attributes if not"
 with the current command line I/O redirection.  '>>$' isn't supported.
 Ditto for binary output; '>>+' isn't supported.
4 RS_peculiarities
 Changing the record separator to something other than newline ('\n')
 will produce anomalous results for ordinary files.  For example,
 using RS = "\f" and FS = "\n" with the following input
       |rec 1, line 1
       |rec 1, line 2
       |^L    (form feed)
       |rec 2, line 1
       |rec 2, line 2
       |^L    (form feed)
       |rec 3, line 1
       |rec 3, line 2
       |(end of file)
 will produce two fields for record 1, but three fields each for
 records 2 and 3.  This is because the form-feed record delimiter is
 on its own line, so awk sees a newline after it.  Since newline is
 now a field separator, records 2 and 3 will have null first fields.
 The following awk code will work around this problem by inserting
 a null first field in the first record, so that all records can be
 handled the same by subsequent processing.
       # fix up for first record (RS != "\n")
       FNR == 1  { if ( $0 == "" )     #leading separator
                     next              #skip its null record
                   else                #otherwise,
                     $0 = FS $0        #realign fields
                 }
 There is a second problem with this same example.  It will always
 trigger a "record not terminated" warning when it reaches the end of
 file.  In the sample shown, there is no final separator; however, if
 a trailing form-feed were present, it would produce a spurious final
 record with two null fields.  This occurs because the I/O system
 sees an implicit newline at the end of the last record, so awk sees
 a pair of null fields separated by that newline.  The following code
 fragment will fix that provided there are no null records (in this
 case, that would be two consecutive lines containing just form-feeds).
       # fix up for last record (RS != "\n")
       $0 == FS  { next }      #drop spurious final record
 Note that the "record not terminated" warning will persist.
4 cmd_inconsistency
 The DCL qualifier /OUTPUT is internally equivalent to '>$' output
 redirection, but the qualifier /INPUT corresponds to the -f option
 rather than to '<' input redirection.
4 exit
 The exit statement can optionally pass a final status value to the
 operating system.  GAWK expects a UN*X-style value instead of a
 VMS status value, so 0 indicates success.  A failure is indicated
 by 1 and VMS will set the ERROR status.  A fatal error is indicated
 by 2 and VMS will set the FATAL status.  All other values will have
 the SUCCESS status.  The exit value is encoded to comply with VMS
 coding standards and will have the C_FACILITY_NO of 0x350000 with
 the constant 0xA000 added to the number shifted over by 3 bits to
 make room for the severity codes.

 To extract the actual gawk exit code from the VMS status use:
     unix_status = (vms_status .and. %x7f8) / 8

 A C program that uses exec() to call gawk will get the original
 UN*X-style exit value.

 Older versions of Gawk treated Unix exit code 0 as 1, a failure as
 2, and a fatal error as 4, and passed all the other numbers through.
 This violated the VMS exit status coding requirements.
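
 The same extraction can be written with gawk's bitwise functions,
 for instance when a status value has been captured as a number.
 This is a sketch only: the helper name vms_to_unix is hypothetical,
 and the sample value is assembled from the encoding described above
 under the assumption that severity value 2 denotes ERROR.

```awk
# Hypothetical helper: recover the gawk exit code from a VMS status
# value by masking with 0x7f8 and shifting right by 3 bits.
function vms_to_unix(vms_status) {
    return rshift(and(vms_status, 0x7f8), 3)
}

BEGIN {
    # Assumed sample: facility 0x350000, constant 0xA000, exit code 1
    # shifted left by 3 bits, plus severity bits of 2 (assumed ERROR).
    sample = 0x350000 + 0xA000 + lshift(1, 3) + 2
    print vms_to_unix(sample)         # prints: 1
}
```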

4 rounding
 VAX/VMS floating point uses unbiased rounding.  This is different than
 what portable gawk programs expect.

3 changes
 Changes between version 4.0.0 and earlier versions

   [This 'changes' section hasn't been updated in many releases.  Some
   features mentioned here may have become available in versions 3.1.*.]

   General
     dgawk.exe does interactive debugging of awk programs
     pgawk.exe does comprehensive execution profiling of awk programs
     pgawk.exe is not currently supplied for VMS.
     -d[file] and -p[file] options added
     -Wcompat and -Wusage options dropped; use -Wtraditional and -Whelp
     BEGINFILE and ENDFILE built-in rule patterns
     nextfile statement skips remainder of current input file
     switch-case statement performs an alternate form of if-then-else
     indirect function calls: var="user_function"; @var(args)

     FPAT regexp pattern as alternative to FS field splitting
     patsplit() function, FPAT analog to split()
     PROCINFO["sorted_in"] can be used to control traversal order for
       'for (index in array)' statement
     asort(), asorti() functions, to sort arrays
     sub-arrays: array element values can be arrays
     isarray() function, to test whether a value is an array

     PROCINFO["strftime"] can be used to supply default format for
       date/time formatting by strftime() function
     mktime() function, to convert list of separate date and time fields
       into single numeric date/time value
     and(), or(), xor(), compl(), lshift(), rshift() functions, to
       perform bit-wise logic operations on numeric values
     strtonum() function, to convert string of digits into number, with
       support for radix prefix '0' (octal) and '0x' (hexadecimal)

   VMS-specific
     The VMS exit codes now correctly encode the gawk exit status and
     the VMS severity bits are set.
     Large file support is enabled on the platforms that support it.
     Extended filename support is enabled on the platforms that support it.
     New command qualifiers: /EXTRA_COMMANDS, /PROFILE, /DUMP_VARIABLES,
       /OPTIMIZE, /TRADITIONAL, /SANDBOX, /NON_DECIMAL_DATA
     Revised qualifier: /LINT, takes optional argument list
     Deprecated qualifier: /STRICT, superseded by /TRADITIONAL
3 prior_changes
 Changes between version 3.1.8 and [...] and 3.0.6

   [Someday someone ought to dig up and document this information....]

 Changes between version 3.0.6 and 2.15.6

   General
     RS can contain multiple characters or be a regexp
     Regular expression interval support added
     gensub() and fflush() functions added
     memory leak(s) introduced in 3.0.2 or 3.0.1 fixed
     the user manual has been substantially revised

   VMS-specific
     Switched to build with DEC C by default

 Changes between version 2.15.6 and 2.14

   General
     Many obscure bugs fixed
     `delete' may operate on an entire array
     ARGIND and ERRNO builtin variables added

   VMS-specific
     `>+ file' binary-mode output redirection added
     /variable=(foo=42) fixed
     Floating point number formatting improved

 Changes between version 2.14 and 2.13.2:

   General
     'next file' construct added
     'continue' outside of any loop is treated as 'next'
     Assorted bug fixes and efficiency improvements
     _The_GAWK_Manual_ updated
     Test suite expanded

   VMS-specific
     VMS POSIX support added
     Disk I/O throughput enhanced
     Pipe emulation improved and incorrect interaction with user-mode
         redefinition of SYS$OUTPUT eliminated

 Changes between version 2.13 and 2.11.1:  (2.12 was not released)

   General
     CONVFMT and FIELDWIDTHS builtin control variables added
     systime() and strftime() date/time functions added
     'lint' and 'posix' run-time options added
     '-W' command line option syntax supersedes '-c', '-C', and '-V'
     '-a' and '-e' regular expression options made obsolete
     Various bug fixes and efficiency improvements
     More platforms supported ('officially' including VMS)

   VMS-specific
     %g printf format fixed
     Handling of '\' on command line modified; no longer necessary to
         double it up
     Problem redirecting stderr (>&efile) at same time as stdin (<ifile)
         or stdout (>ofile) has been fixed
     ``2>&1'' and ``1>&2'' redirection constructs added
     Interaction between command line I/O redirection and gawk pipes
         fixed; also, name used for pseudo-pipe temporary file expanded
3 license
 GAWK is covered by the "GNU General Public License", the gist of which
 is that if you supply this software to a third party, you are expressly
 forbidden to prevent them from supplying it to a fourth party, and if
 you supply binaries you must make the source code available to them
 at no additional cost.  Any revisions or modified versions are also
 covered by the same license.  There is no warranty, express or implied,
 for this software.  It is provided "as is."

 [Disclaimer:  This is just an informal summary with no legal basis;
 refer to the actual GNU General Public License for specific details.]
!2 examples
!