@@ -1463,7 +1464,7 @@
 from @command{awk}, and with a little help from me, set about adding
 features to do this for @command{gawk}. At that time, he also
 wrote the bulk of
-@cite{TCP/IP Internetworking with @command{gawk}}
+@cite{@value{GAWKINETTITLE}}
 (a separate document, available as part of the @command{gawk} distribution).
 His code finally became part of the main @command{gawk} distribution
 with @command{gawk} @value{PVERSION} 3.1.
@@ -25834,5 +25835,5 @@
 This @value{CHAPTER} discusses advanced features in @command{gawk}.
 It's a bit of a ``grab bag'' of items that are otherwise unrelated
 to each other.
-First, a command-line option allows @command{gawk} to recognize
+First, we look at a command-line option that allows @command{gawk} to recognize
 nondecimal numbers in input data, not just in @command{awk}
@@ -25840,6 +25841,6 @@
 Then, @command{gawk}'s special features for sorting arrays are presented.
 Next, two-way I/O, discussed briefly in earlier parts of this
 @value{DOCUMENT}, is described in full detail, along with the basics
-of TCP/IP networking. Finally, @command{gawk}
+of TCP/IP networking. Finally, we see how @command{gawk}
 can @dfn{profile} an @command{awk} program, making it possible to tune
 it for performance.
@@ -25847,2 +25848,2 @@
 @c FULLXREF ON
-A number of advanced features require separate @value{CHAPTER}s of their
+Additional advanced features are discussed in separate @value{CHAPTER}s of their
@@ -25851,1 +25852,1 @@
 @itemize @value{BULLET}
@@ -25939,2 +25940,2 @@
 @node Array Sorting
 @section Controlling Array Traversal and Array Sorting
@@ -25942,2 +25943,3 @@
-@command{gawk} lets you control the order in which a @samp{for (i in array)}
+@command{gawk} lets you control the order in which a
+@samp{for (@var{indx} in @var{array})}
 loop traverses an array.
@@ -25945,1 +25947,1 @@
 In addition, two built-in functions, @code{asort()} and @code{asorti()},
@@ -25955,2 +25957,2 @@
 @node Controlling Array Traversal
 @subsection Controlling Array Traversal
@@ -25958,3 +25960,3 @@
-By default, the order in which a @samp{for (i in array)} loop
+By default, the order in which a @samp{for (@var{indx} in @var{array})} loop
 scans an array is not defined; it is generally based upon
 the internal implementation of arrays inside @command{awk}.
@@ -25987,6 +25989,6 @@
-Here, @var{i1} and @var{i2} are the indices, and @var{v1} and @var{v2}
+Here, @code{i1} and @code{i2} are the indices, and @code{v1} and @code{v2}
 are the corresponding values of the two elements being compared.
-Either @var{v1} or @var{v2}, or both, can be arrays if the array being
+Either @code{v1} or @code{v2}, or both, can be arrays if the array being
 traversed contains subarrays as values.
 (@DBXREF{Arrays of Arrays} for more information about subarrays.)
 The three possible return values are interpreted as follows:
@@ -25995,2 +25997,2 @@
 @item comp_func(i1, v1, i2, v2) < 0
-Index @var{i1} comes before index @var{i2} during loop traversal.
+Index @code{i1} comes before index @code{i2} during loop traversal.
@@ -25998,3 +26000,3 @@
 @item comp_func(i1, v1, i2, v2) == 0
-Indices @var{i1} and @var{i2}
-come together but the relative order with respect to each other is undefined.
+Indices @code{i1} and @code{i2}
+come together, but the relative order with respect to each other is undefined.
@@ -26002,2 +26004,2 @@
 @item comp_func(i1, v1, i2, v2) > 0
-Index @var{i1} comes after index @var{i2} during loop traversal.
+Index @code{i1} comes after index @code{i2} during loop traversal.
@@ -26006,1 +26008,1 @@
 Our first comparison function can be used to scan an array in
@@ -26161,7 +26163,7 @@
 elements compare equal. This is usually not a problem, but letting
 the tied elements come out in arbitrary order can be an issue, especially
 when comparing item values. The partial ordering of the equal elements
-may change the next time the array is traversed, if other elements are added or
+may change the next time the array is traversed, if other elements are added to or
 removed from the array. One way to resolve ties when comparing elements
 with otherwise equal values is to include the indices in the comparison
 rules. Note that doing this may make the loop traversal less efficient,
@@ -26204,5 +26206,5 @@
 Another point to keep in mind is that in the case of subarrays,
 the element values can themselves be arrays; a production comparison
 function should use the @code{isarray()} function
-(@pxref{Type Functions}),
+(@pxref{Type Functions})
 to check for this, and choose a defined sorting order for subarrays.
@@ -26210,1 +26212,1 @@
 All sorting based on @code{PROCINFO["sorted_in"]}
@@ -26212,1 +26214,1 @@
 because the @code{PROCINFO} array is not special in that case.
@@ -26214,4 +26216,4 @@
 As a side note, sorting the array indices before traversing
-the array has been reported to add 15% to 20% overhead to the
+the array has been reported to add a 15% to 20% overhead to the
 execution time of @command{awk} programs. For this reason,
 sorted array traversal is not the default.
@@ -26271,5 +26273,5 @@
 Often, what's needed is to sort on the values of the @emph{indices}
 instead of the values of the elements. To do that, use the
 @code{asorti()} function. The interface and behavior are identical to
-that of @code{asort()}, except that the index values are used for sorting,
+that of @code{asort()}, except that the index values are used for sorting
 and become the values of the result array:
@@ -26306,1 +26308,1 @@
 or both. This is extremely powerful.
@@ -26308,4 +26310,4 @@
 Once the array is sorted, @code{asort()} takes the @emph{values} in
-their final order, and uses them to fill in the result array, whereas
-@code{asorti()} takes the @emph{indices} in their final order, and uses
+their final order and uses them to fill in the result array, whereas
+@code{asorti()} takes the @emph{indices} in their final order and uses
 them to fill in the result array.
@@ -26313,1 +26315,1 @@
 @cindex reference counting, sorting arrays
@@ -26604,7 +26606,7 @@
 @cindex @command{gawk}, @code{ERRNO} variable in
 @cindex @code{ERRNO} variable
 @quotation NOTE
-Failure in opening a two-way socket will result in a non-fatal error
+Failure in opening a two-way socket will result in a nonfatal error
 being returned to the calling code. The value of @code{ERRNO} indicates
 the error (@pxref{Auto-set}).
 @end quotation
@@ -26623,3 +26625,3 @@
 This program reads the current date and time from the local system's
-TCP @samp{daytime} server.
+TCP @code{daytime} server.
 It then prints the results and closes the connection.
@@ -26627,2 +26629,2 @@
 Because this topic is extensive, the use of @command{gawk} for
 TCP/IP programming is documented separately.
@@ -26631,1 +26633,1 @@
-@inforef{Top, , General Introduction, gawkinet, TCP/IP Internetworking with @command{gawk}},
+@inforef{Top, , General Introduction, gawkinet, @value{GAWKINETTITLE}},
@@ -26635,5 +26637,5 @@
 @uref{http://www.gnu.org/software/gawk/manual/gawkinet/,
-@cite{TCP/IP Internetworking with @command{gawk}}},
+@cite{@value{GAWKINETTITLE}}},
 which comes as part of the @command{gawk} distribution,
 @end ifnotinfo
 for a much more complete introduction and discussion, as well as
@@ -26711,4 +26713,4 @@
 Here is the @file{awkprof.out} that results from running the
-@command{gawk} profiler on this program and data. (This example also
+@command{gawk} profiler on this program and data (this example also
 illustrates that @command{awk} programmers sometimes get up very early
-in the morning to work.)
+in the morning to work):
@@ -26716,2 +26718,2 @@
 @cindex @code{BEGIN} pattern, and profiling
 @cindex @code{END} pattern, and profiling
@@ -26772,7 +26774,7 @@
 The program is printed in the order @code{BEGIN} rules,
 @code{BEGINFILE} rules,
-pattern/action rules,
-@code{ENDFILE} rules, @code{END} rules and functions, listed
+pattern--action rules,
+@code{ENDFILE} rules, @code{END} rules, and functions, listed
 alphabetically.
 Multiple @code{BEGIN} and @code{END} rules retain their
 separate identities, as do
@@ -26781,1 +26783,1 @@
 @cindex patterns, counts, in a profile
@@ -26783,4 +26785,4 @@
-Pattern-action rules have two counts.
+Pattern--action rules have two counts.
 The first count, to the left of the rule, shows how many times
 the rule's pattern was @emph{tested}.
 The second count, to the right of the rule's opening left brace
@@ -26847,4 +26849,4 @@
 @command{gawk} supplies leading comments in
 front of the @code{BEGIN} and @code{END} rules,
 the @code{BEGINFILE} and @code{ENDFILE} rules,
-the pattern/action rules, and the functions.
+the pattern--action rules, and the functions.
@@ -26854,6 +26856,6 @@
 The profiled version of your program may not look exactly like what you
 typed when you wrote it. This is because @command{gawk} creates the
-profiled version by ``pretty printing'' its internal representation of
+profiled version by ``pretty-printing'' its internal representation of
 the program. The advantage to this is that @command{gawk} can produce
 a standard representation.
 Also, things such as:
@@ -26936,10 +26938,10 @@
 @cindex @code{SIGQUIT} signal (MS-Windows)
 @cindex signals, @code{QUIT}/@code{SIGQUIT} (MS-Windows)
 When @command{gawk} runs on MS-Windows systems, it uses the
-@code{INT} and @code{QUIT} signals for producing the profile and, in
+@code{INT} and @code{QUIT} signals for producing the profile, and in
 the case of the @code{INT} signal, @command{gawk} exits. This is
 because these systems don't support the @command{kill} command, so the
 only signals you can deliver to a program are those generated by the
 keyboard. The @code{INT} signal is generated by the
-@kbd{Ctrl-@key{C}} or @kbd{Ctrl-@key{BREAK}} key, while the
-@code{QUIT} signal is generated by the @kbd{Ctrl-@key{\}} key.
+@kbd{Ctrl-c} or @kbd{Ctrl-BREAK} key, while the
+@code{QUIT} signal is generated by the @kbd{Ctrl-\} key.
@@ -26947,3 +26949,3 @@
 Finally, @command{gawk} also accepts another option, @option{--pretty-print}.
-When called this way, @command{gawk} ``pretty prints'' the program into
+When called this way, @command{gawk} ``pretty-prints'' the program into
 @file{awkprof.out}, without any execution counts.
@@ -26951,1 +26953,1 @@
 @quotation NOTE
@@ -27001,3 +27003,3 @@
 By using special @value{FN}s with the @samp{|&} operator, you can open a
-TCP/IP (or UDP/IP) connection to remote hosts in the Internet. @command{gawk}
+TCP/IP (or UDP/IP) connection to remote hosts on the Internet. @command{gawk}
 supports both IPv4 and IPv6.
@@ -27009,1 +27011,1 @@
 @command{gawk} to dump the profile and keep going, including a function call stack.
@@ -27012,2 +27014,2 @@
-You can also just ``pretty print'' the program. This currently also runs
+You can also just ``pretty-print'' the program. This currently also runs
 the program, but that will change in the next major release.
@@ -36270,2 +36272,2 @@
 @item doc/gawkinet.texi
 The Texinfo source file for
@@ -36273,1 +36275,1 @@
-@inforef{Top, , General Introduction, gawkinet, TCP/IP Internetworking with @command{gawk}}.
+@inforef{Top, , General Introduction, gawkinet, @value{GAWKINETTITLE}}.
@@ -36276,4 +36278,4 @@
-@cite{TCP/IP Internetworking with @command{gawk}}.
+@cite{@value{GAWKINETTITLE}}.
 @end ifnotinfo
 It should be processed with @TeX{}
 (via @command{texi2dvi} or @command{texi2pdf})