27204
28817
Look forward to a future release when these and other missing features may
27205
28818
be added, and of course feel free to try to add them yourself!
28820
@node Debugging Summary
28823
@itemize @value{BULLET}
28825
Programs rarely work correctly the first time. Finding bugs
28826
is @dfn{debugging} and a program that helps you find bugs is a
28827
@dfn{debugger}. @command{gawk} has a built-in debugger that works very
28828
similarly to the GNU Debugger, GDB.
28831
Debuggers let you step through your program one statement at a time,
28832
examine and change variable and array values, and do a number of other
28833
things that let you understand what your program is actually doing (as
28834
opposed to what it is supposed to do).
28837
Like most debuggers, the @command{gawk} debugger works in terms of stack
28838
frames, and lets you set both breakpoints (stop at a point in the code)
28839
and watchpoints (stop when a data value changes).
28842
The debugger command set is fairly complete, providing control over
28843
breakpoints, execution, viewing and changing data, working with the stack,
28844
getting information, and other tasks.
28847
If the @code{readline} library is available when @command{gawk} is
28848
compiled, it is used by the debugger to provide command-line history
27207
28853
@node Arbitrary Precision Arithmetic
27208
28854
@chapter Arithmetic and Arbitrary Precision Arithmetic with @command{gawk}
27209
28855
@cindex arbitrary precision
27210
28856
@cindex multiple precision
27211
28857
@cindex infinite precision
27212
@cindex floating-point numbers, arbitrary precision
27216
@cindex Knuth, Donald
27218
@i{There's a credibility gap: We don't know how much of the computer's answers
27219
to believe. Novice computer users solve this problem by implicitly trusting
27220
in the computer as an infallible authority; they tend to believe that all
27221
digits of a printed answer are significant. Disillusioned computer users have
27222
just the opposite approach; they are constantly afraid that their answers
27223
are almost meaningless.}@footnote{Donald E.@: Knuth.
27224
@cite{The Art of Computer Programming}. Volume 2,
27225
@cite{Seminumerical Algorithms}, third edition,
27226
1998, ISBN 0-201-89683-4, p.@: 229.}
27227
@author Donald Knuth
27230
This @value{CHAPTER} discusses issues that you may encounter
27231
when performing arithmetic. It begins by discussing some of
27232
the general attributes of computer arithmetic, along with how
27233
this can influence what you see when running @command{awk} programs.
27234
This discussion applies to all versions of @command{awk}.
27236
The @value{CHAPTER} then moves on to describe @dfn{arbitrary precision
27237
arithmetic}, a feature which is specific to @command{gawk}.
28858
@cindex floating-point, numbers@comma{} arbitrary precision
28860
This @value{CHAPTER} introduces some basic concepts relating to
28861
how computers do arithmetic and briefly lists the features in
28862
@command{gawk} for performing arbitrary precision floating point
28863
computations. It then proceeds to describe floating-point arithmetic,
28864
which is what @command{awk} uses for all its computations, including a
28865
discussion of arbitrary precision floating point arithmetic, which is
28866
a feature available only in @command{gawk}. It continues on to present
28867
arbitrary precision integers, and concludes with a description of some
28868
points where @command{gawk} and the POSIX standard are not quite in
27240
* General Arithmetic:: An introduction to computer arithmetic.
27241
* Floating-point Programming:: Effective Floating-point Programming.
27242
* Gawk and MPFR:: How @command{gawk} provides
27243
arbitrary-precision arithmetic.
27244
* Arbitrary Precision Floats:: Arbitrary Precision Floating-point Arithmetic
27245
with @command{gawk}.
27246
* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with
28872
* Computer Arithmetic:: A quick intro to computer math.
28873
* Math Definitions:: Defining terms used.
28874
* MPFR features:: The MPFR features in @command{gawk}.
28875
* FP Math Caution:: Things to know.
28876
* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with
28878
* POSIX Floating Point Problems:: Standards Versus Existing Practice.
28879
* Floating point summary:: Summary of floating point discussion.
27250
@node General Arithmetic
28882
@node Computer Arithmetic
27251
28883
@section A General Description of Computer Arithmetic
27254
@cindex floating-point, numbers
27255
@cindex numbers, floating-point
27256
Within computers, there are two kinds of numeric values: @dfn{integers}
27257
and @dfn{floating-point}.
27258
In school, integer values were referred to as ``whole'' numbers---that is,
27259
numbers without any fractional part, such as 1, 42, or @minus{}17.
28885
Until now, we have worked with data as either numbers or
28886
strings. Ultimately, however, computers represent everything in terms
28887
of @dfn{binary digits}, or @dfn{bits}. A decimal digit can take on any
28888
of 10 values: zero through nine. A binary digit can take on any of two
28889
values, zero or one. Using binary, computers (and computer software)
28890
can represent and manipulate numerical and character data. In general,
28891
the more bits you can use to represent a particular thing, the greater
28892
the range of possible values it can take on.
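To make the relationship between bit count and range concrete, here is a small sketch that prints how many distinct values fit in 8, 16, and 32 bits. The shell variable @code{counts} is just a name chosen for this illustration, and the sketch assumes that your @command{awk} stores numbers as IEEE 64-bit doubles.

```shell
# Each additional bit doubles the number of values that can be represented.
counts=$(awk 'BEGIN {
    for (n = 8; n <= 32; n *= 2)
        printf "%d bits: %.0f values\n", n, 2 ^ n
}')
echo "$counts"
```

Doubling the number of bits squares the number of representable values, which is why 64-bit integers cover so much more ground than 32-bit ones.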
28894
Modern computers support at least two, and often more, ways to do
28895
arithmetic. Each kind of arithmetic uses a different representation
28896
(organization of the bits) for the numbers. The kinds of arithmetic
28897
that interest us are:
28900
@item Decimal arithmetic
28901
This is the kind of arithmetic you learned in elementary school, using
28902
paper and pencil (and/or a calculator). In theory, numbers can have an
28903
arbitrary number of digits on either side (or both sides) of the decimal
28904
point, and the results of a computation are always exact.
28906
Some modern systems can do decimal arithmetic in hardware, but usually you
28907
need a special software library to provide access to these instructions.
28908
There are also libraries that do decimal arithmetic entirely in software.
28910
Despite the fact that some users expect @command{gawk} to be performing
28911
decimal arithmetic,@footnote{We don't know why they expect this, but
28912
they do.} it does not do so.
28914
@item Integer arithmetic
28915
In school, integer values were referred to as ``whole'' numbers---that
28916
is, numbers without any fractional part, such as 1, 42, or @minus{}17.
27260
28917
The advantage to integer numbers is that they represent values exactly.
27261
The disadvantage is that their range is limited. On most systems,
27262
this range is @minus{}2,147,483,648 to 2,147,483,647.
27263
However, many systems now support a range from
27264
@minus{}9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
28918
The disadvantage is that their range is limited.
27266
28920
@cindex unsigned integers
27267
28921
@cindex integers, unsigned
27268
Integer values come in two flavors: @dfn{signed} and @dfn{unsigned}.
27269
Signed values may be negative or positive, with the range of values just
27271
Unsigned values are always positive. On most systems,
27272
the range is from 0 to 4,294,967,295.
27273
However, many systems now support a range from
27274
0 to 18,446,744,073,709,551,615.
27276
@cindex double precision floating-point
27277
@cindex single precision floating-point
27278
Floating-point numbers represent what are called ``real'' numbers; i.e.,
27279
those that do have a fractional part, such as 3.1415927.
27280
The advantage to floating-point numbers is that they
27281
can represent a much larger range of values.
27282
The disadvantage is that there are numbers that they cannot represent
27284
@command{awk} uses @dfn{double precision} floating-point numbers, which
27285
can hold more digits than @dfn{single precision}
27286
floating-point numbers.
27287
@c Floating-point issues are discussed more fully in
27288
@c @ref{Floating Point Issues}.
27290
There are several important issues to be aware of, described next.
27293
* Floating Point Issues:: Stuff to know about floating-point numbers.
27294
* Integer Programming:: Effective integer programming.
27297
@node Floating Point Issues
27298
@subsection Floating-Point Number Caveats
27300
This @value{SECTION} describes some of the issues
27301
involved in using floating-point numbers.
27303
There is a very nice
27304
@uref{http://www.validlab.com/goldberg/paper.pdf, paper on floating-point arithmetic}
27306
``What Every Computer Scientist Should Know About Floating-point Arithmetic,''
27307
@cite{ACM Computing Surveys} @strong{23}, 1 (1991-03), 5-48.
27308
This is worth reading if you are interested in the details,
27309
but it does require a background in computer science.
27312
* String Conversion Precision:: The String Value Can Lie.
27313
* Unexpected Results:: Floating Point Numbers Are Not Abstract
27315
* POSIX Floating Point Problems:: Standards Versus Existing Practice.
27318
@node String Conversion Precision
27319
@subsubsection The String Value Can Lie
27321
Internally, @command{awk} keeps both the numeric value
27322
(double precision floating-point) and the string value for a variable.
27323
Separately, @command{awk} keeps
27324
track of what type the variable has
27325
(@pxref{Typing and Comparison}),
27326
which plays a role in how variables are used in comparisons.
27328
It is important to note that the string value for a number may not
27329
reflect the full value (all the digits) that the numeric value
27331
The following program, @file{values.awk}, illustrates this:
27336
# see it for what it is
27337
printf("sum = %.12g\n", sum)
27347
This program shows the full value of the sum of @code{$1} and @code{$2}
27348
using @code{printf}, and then prints the string values obtained
27349
from both automatic conversion (via @code{CONVFMT}) and
27350
from printing (via @code{OFMT}).
27352
Here is what happens when the program is run:
27355
$ @kbd{echo 3.654321 1.2345678 | awk -f values.awk}
27356
@print{} sum = 4.8888888
27357
@print{} a = <4.88889>
27358
@print{} sum = 4.88889
27361
This makes it clear that the full numeric value is different from
27362
what the default string representations show.
27364
@code{CONVFMT}'s default value is @code{"%.6g"}, which yields a value with
27365
at most six significant digits. For some applications, you might want to
27366
change it to specify more precision.
27367
On most modern machines, most of the time,
27368
17 digits is enough to capture a floating-point number's
27369
value exactly.@footnote{Pathological cases can require up to
27370
752 digits (!), but we doubt that you need to worry about this.}
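The effect of @code{CONVFMT} on number-to-string conversion can be seen with a short sketch. The shell variable names @code{short} and @code{long} are chosen only for this illustration, and the exact trailing digits of the 17-digit result may differ from one system to another.

```shell
# Default CONVFMT ("%.6g"): the string value keeps six significant digits.
short=$(awk 'BEGIN { x = 3.654321 + 1.2345678; s = x ""; print s }')

# A wider CONVFMT: the string value carries up to 17 significant digits.
long=$(awk 'BEGIN { CONVFMT = "%.17g"; x = 3.654321 + 1.2345678; s = x ""; print s }')

echo "default CONVFMT: $short"
echo "CONVFMT=%.17g:   $long"
```

The numeric value is the same in both runs; only the string produced by concatenating with @code{""} changes.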
27372
@node Unexpected Results
27373
@subsubsection Floating Point Numbers Are Not Abstract Numbers
27375
@cindex floating-point, numbers
27376
Unlike numbers in the abstract sense (such as what you studied in high school
27377
or college arithmetic), numbers stored in computers are limited in certain ways.
27378
They cannot represent an infinite number of digits, nor can they always
27379
represent things exactly.
27381
Floating-point numbers cannot
27382
always represent values exactly. Here is an example:
27385
$ @kbd{awk '@{ printf("%010d\n", $1 * 100) @}'}
27387
@print{} 0000051579
27389
@print{} 0000051579
27391
@print{} 0000051580
27393
@print{} 0000051582
27398
This shows that some values can be represented exactly,
27399
whereas others are only approximated. This is not a ``bug''
27400
in @command{awk}, but simply an artifact of how computers
27404
It cannot be emphasized enough that the behavior just
27405
described is fundamental to modern computers. You will
27406
see this kind of thing happen in @emph{any} programming
27407
language using hardware floating-point numbers. It is @emph{not}
27408
a bug in @command{gawk}, nor is it something that can be ``just
27412
@cindex negative zero
27413
@cindex positive zero
27414
@cindex zero@comma{} negative vs.@: positive
27415
Another peculiarity of floating-point numbers on modern systems
27416
is that they often have more than one representation for the number zero!
27417
In particular, it is possible to represent ``minus zero'' as well as
27418
regular, or ``positive'' zero.
27420
This example shows that negative and positive zero are distinct values
27421
when stored internally, but that they are in fact equal to each other,
27422
as well as to ``regular'' zero:
27425
$ @kbd{gawk 'BEGIN @{ mz = -0 ; pz = 0}
27426
> @kbd{printf "-0 = %g, +0 = %g, (-0 == +0) -> %d\n", mz, pz, mz == pz}
27427
> @kbd{printf "mz == 0 -> %d, pz == 0 -> %d\n", mz == 0, pz == 0}
27429
@print{} -0 = -0, +0 = 0, (-0 == +0) -> 1
27430
@print{} mz == 0 -> 1, pz == 0 -> 1
27433
It helps to keep this in mind should you process numeric data
27434
that contains negative zero values; the fact that the zero is negative
27435
is noted and can affect comparisons.
27437
@node POSIX Floating Point Problems
27438
@subsubsection Standards Versus Existing Practice
27440
Historically, @command{awk} has converted any non-numeric looking string
27441
to the numeric value zero, when required. Furthermore, the original
27442
definition of the language and the original POSIX standards specified that
27443
@command{awk} only understands decimal numbers (base 10), and not octal
27444
(base 8) or hexadecimal numbers (base 16).
27446
Changes in the language of the
27447
2001 and 2004 POSIX standards can be interpreted to imply that @command{awk}
27448
should support additional features. These features are:
27452
Interpretation of floating point data values specified in hexadecimal
27453
notation (@samp{0xDEADBEEF}). (Note: data values, @emph{not}
27454
source code constants.)
27457
Support for the special IEEE 754 floating point values ``Not A Number''
27458
(NaN), positive Infinity (``inf'') and negative Infinity (``@minus{}inf'').
27459
In particular, the format for these values is as specified by the ISO 1999
27460
C standard, which ignores case and can allow machine-dependent additional
27461
characters after the @samp{nan} and allow either @samp{inf} or @samp{infinity}.
27464
The first problem is that both of these are clear changes to historical
27469
The @command{gawk} maintainer feels that supporting hexadecimal floating
27470
point values, in particular, is ugly, and was never intended by the
27471
original designers to be part of the language.
27474
Allowing completely alphabetic strings to have valid numeric
27475
values is also a very severe departure from historical practice.
27478
The second problem is that the @command{gawk} maintainer feels that this
27479
interpretation of the standard, which requires a certain amount of
27480
``language lawyering'' to arrive at in the first place, was not even
27481
intended by the standard developers. In other words, ``we see how you
27482
got where you are, but we don't think that that's where you want to be.''
27484
Recognizing the above issues, but attempting to provide compatibility
27485
with the earlier versions of the standard,
27486
the 2008 POSIX standard added explicit wording to allow, but not require,
27487
that @command{awk} support hexadecimal floating point values and
27488
special values for ``Not A Number'' and infinity.
27490
Although the @command{gawk} maintainer continues to feel that
27491
providing those features is inadvisable,
27492
nevertheless, on systems that support IEEE floating point, it seems
27493
reasonable to provide @emph{some} way to support NaN and Infinity values.
27494
The solution implemented in @command{gawk} is as follows:
27498
With the @option{--posix} command-line option, @command{gawk} becomes
27499
``hands off.'' String values are passed directly to the system library's
27500
@code{strtod()} function, and if it successfully returns a numeric value,
27501
that is what's used.@footnote{You asked for it, you got it.}
27502
By definition, the results are not portable across
27503
different systems. They are also a little surprising:
27506
$ @kbd{echo nanny | gawk --posix '@{ print $1 + 0 @}'}
27508
$ @kbd{echo 0xDeadBeef | gawk --posix '@{ print $1 + 0 @}'}
27509
@print{} 3735928559
27513
Without @option{--posix}, @command{gawk} interprets the four strings
27519
specially, producing the corresponding special numeric values.
27520
The leading sign acts as a signal to @command{gawk} (and the user)
27521
that the value is really numeric. Hexadecimal floating point is
27522
not supported (unless you also use @option{--non-decimal-data},
27523
which is @emph{not} recommended). For example:
27526
$ @kbd{echo nanny | gawk '@{ print $1 + 0 @}'}
27528
$ @kbd{echo +nan | gawk '@{ print $1 + 0 @}'}
27530
$ @kbd{echo 0xDeadBeef | gawk '@{ print $1 + 0 @}'}
27534
@command{gawk} does ignore case in the four special values.
27535
Thus @samp{+nan} and @samp{+NaN} are the same.
27538
@node Integer Programming
27539
@subsection Mixing Integers And Floating-point
27541
As has been mentioned already, @command{awk} uses hardware double
27542
precision with 64-bit IEEE binary floating-point representation
27543
for numbers on most systems. A large integer like 9,007,199,254,740,997
27544
has a binary representation that, although finite, is more than 53 bits long;
27545
it must also be rounded to 53 bits.
27546
The biggest integer that can be stored in a C @code{double} is usually the same
27547
as the largest possible value of a @code{double}. If your system @code{double}
27548
is an IEEE 64-bit @code{double}, this largest possible value is an integer and
27549
can be represented precisely. What more should one know about integers?
27551
If you want to know what is the largest integer, such that it and
27552
all smaller integers can be stored in 64-bit doubles without losing precision,
27560
The next representable number is the even number
27567
meaning it is unlikely that you will be able to make
27568
@command{gawk} print
27576
The range of integers exactly representable by a 64-bit double
27579
@math{[-2^{53}, 2^{53}]}.
27582
[@minus{}2^53, 2^53].
27584
If you ever see an integer outside this range in @command{awk}
27585
using 64-bit doubles, you have reason to be very suspicious about
27586
the accuracy of the output. Here is a simple program with erroneous output:
27589
$ @kbd{gawk 'BEGIN @{ i = 2^53 - 1; for (j = 0; j < 4; j++) print i + j @}'}
27590
@print{} 9007199254740991
27591
@print{} 9007199254740992
27592
@print{} 9007199254740992
27593
@print{} 9007199254740994
27596
The lesson is to not assume that any large integer printed by @command{awk}
27597
represents an exact result from your computation, especially if it wraps
27598
around on your screen.
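One way to convince yourself of the limit is to test whether adding one changes the value. The following is a minimal sketch, assuming an @command{awk} that uses 64-bit IEEE doubles and is not running in an arbitrary-precision mode; the names @code{limit} and @code{check} are chosen for illustration only.

```shell
# Adding 1 to 2^53 cannot be represented in a 64-bit double,
# so the sum rounds back to 2^53 and the comparison succeeds.
check=$(awk 'BEGIN {
    limit = 2 ^ 53
    printf "%.0f\n", limit
    print ((limit + 1 == limit) ? "2^53 + 1 is not representable" : "2^53 + 1 is exact")
}')
echo "$check"
```

A similar test can be applied to any large integer result you do not trust: if adding one leaves the value unchanged, the low-order digits are not meaningful.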
27600
@node Floating-point Programming
27601
@section Understanding Floating-point Programming
27603
Numerical programming is an extensive area; if you need to develop
27604
sophisticated numerical algorithms then @command{gawk} may not be
27605
the ideal tool, and this documentation may not be sufficient.
27606
It might require digesting a book or two@footnote{One recommended title is
27607
@cite{Numerical Computing with IEEE Floating Point Arithmetic}, Michael L.@:
27608
Overton, Society for Industrial and Applied Mathematics, 2004.
27609
ISBN: 0-89871-482-6, ISBN-13: 978-0-89871-482-1. See
27610
@uref{http://www.cs.nyu.edu/cs/faculty/overton/book}.}
27611
to really internalize how to compute
27612
with ideal accuracy and precision,
27613
and the result often depends on the particular application.
27616
A floating-point calculation's @dfn{accuracy} is how close it comes
27617
to the real value. This is as opposed to the @dfn{precision}, which
27618
usually refers to the number of bits used to represent the number
27619
(see @uref{http://en.wikipedia.org/wiki/Accuracy_and_precision,
27620
the Wikipedia article} for more information).
27623
There are two options for doing floating-point calculations:
27624
hardware floating-point (as used by standard @command{awk} and
27625
the default for @command{gawk}), and @dfn{arbitrary-precision}
27626
floating-point, which is software based.
27627
From this point forward, this @value{CHAPTER}
27628
aims to provide enough information to understand both, and then
27629
will focus on @command{gawk}'s facilities for the latter.@footnote{If you
27630
are interested in other tools that perform arbitrary precision arithmetic,
27631
you may want to investigate the POSIX @command{bc} tool. See
27632
@uref{http://pubs.opengroup.org/onlinepubs/009695399/utilities/bc.html,
27633
the POSIX specification for it}, for more information.}
28922
In computers, integer values come in two flavors: @dfn{signed} and
28923
@dfn{unsigned}. Signed values may be negative or positive, whereas
28924
unsigned values are always positive (that is, greater than or equal
28927
In computer systems, integer arithmetic is exact, but the possible
28928
range of values is limited. Integer arithmetic is generally faster than
28929
floating point arithmetic.
28931
@item Floating point arithmetic
28932
Floating-point numbers represent what were called in school ``real''
28933
numbers; i.e., those that have a fractional part, such as 3.1415927.
28934
The advantage to floating-point numbers is that they can represent a
28935
much larger range of values than can integers. The disadvantage is that
28936
there are numbers that they cannot represent exactly.
28938
Modern systems support floating point arithmetic in hardware, with a
28939
limited range of values. There are software libraries that allow
28940
the use of arbitrary precision floating point calculations.
28942
POSIX @command{awk} uses @dfn{double precision} floating-point numbers, which
28943
can hold more digits than @dfn{single precision} floating-point numbers.
28944
@command{gawk} has facilities for performing arbitrary precision floating
28945
point arithmetic, which we describe in more detail shortly.
28948
Computers work with integer and floating point values of different
28949
ranges. Integer values are usually either 32 or 64 bits in size. Single
28950
precision floating point values occupy 32 bits, whereas double precision
28951
floating point values occupy 64 bits. Floating point values are always
28952
signed. The possible ranges of values are shown in the following table.
28954
@multitable @columnfractions .34 .33 .33
28955
@headitem Numeric representation @tab Minimum value @tab Maximum value
28956
@item 32-bit signed integer @tab @minus{}2,147,483,648 @tab 2,147,483,647
28957
@item 32-bit unsigned integer @tab 0 @tab 4,294,967,295
28958
@item 64-bit signed integer @tab @minus{}9,223,372,036,854,775,808 @tab 9,223,372,036,854,775,807
28959
@item 64-bit unsigned integer @tab 0 @tab 18,446,744,073,709,551,615
28960
@item Single precision floating point (approximate) @tab @code{1.175494e-38} @tab @code{3.402823e+38}
28961
@item Double precision floating point (approximate) @tab @code{2.225074e-308} @tab @code{1.797693e+308}
28964
@node Math Definitions
28965
@section Other Stuff To Know
28967
The rest of this @value{CHAPTER} uses a number of terms. Here are some
28968
informal definitions that should help you work your way through the material
28973
A floating-point calculation's accuracy is how close it comes
28974
to the real (paper and pencil) value.
28977
The difference between what the result of a computation ``should be''
28978
and what it actually is. It is best to minimize error as much
28982
The order of magnitude of a value;
28983
some number of bits in a floating-point value store the exponent.
28986
A special value representing infinity. Operations involving another
28987
number and infinity produce infinity.
28990
``Not A Number.'' A special value indicating a result that can't
28991
happen in real math, but that can happen in floating-point computations.
28994
How the significand (see later in this list) is usually stored. The
28995
value is adjusted so that the first bit is one, and then that leading
28996
one is assumed instead of physically stored. This provides one
28997
extra bit of precision.
29000
The number of bits used to represent a floating-point number.
29001
The more bits, the more digits you can represent.
29002
Binary and decimal precisions are related approximately, according to the
29007
@math{prec = 3.322 @cdot dps}
29011
@var{prec} = 3.322 * @var{dps}
29015
<emphasis>prec</emphasis> = 3.322 ⋅ <emphasis>dps</emphasis> @c
29020
Here, @var{prec} denotes the binary precision
29021
(measured in bits) and @var{dps} (short for decimal places)
29022
is the number of decimal digits.
29024
@item Rounding mode
29025
How numbers are rounded up or down when necessary.
29026
More details are provided later.
29029
A floating point value consists of the significand multiplied by 10
29030
to the power of the exponent. For example, in @code{1.2345e67},
29031
the significand is @code{1.2345}.
29034
From @uref{http://en.wikipedia.org/wiki/Numerical_stability,
29035
the Wikipedia article on numerical stability}:
29036
``Calculations that can be proven not to magnify approximation errors
29037
are called @dfn{numerically stable}.''
29040
See @uref{http://en.wikipedia.org/wiki/Accuracy_and_precision,
29041
the Wikipedia article on accuracy and precision} for more information
29042
on some of those terms.
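The approximate relation between binary precision and decimal digits given in the definition of precision can be turned around to estimate how many decimal digits a given number of bits provides. The sketch below simply divides by the same factor of 3.322; the bit counts 24, 53, and 113 are used here only as sample inputs, and @code{digits} is a name chosen for the illustration.

```shell
# Estimate decimal digits from binary precision: dps = prec / 3.322
digits=$(awk 'BEGIN {
    n = split("24 53 113", prec, " ")
    for (i = 1; i <= n; i++)
        printf "%d bits ~ %.1f decimal digits\n", prec[i], prec[i] / 3.322
}')
echo "$digits"
```

These are estimates only; the relation is approximate, as noted above.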
29044
On modern systems, floating-point hardware uses the representation and
29045
operations defined by the IEEE 754 standard.
29046
Three of the standard IEEE 754 types are 32-bit single precision,
29047
64-bit double precision and 128-bit quadruple precision.
29048
The standard also specifies extended precision formats
29049
to allow greater precisions and larger exponent ranges.
29050
(@command{awk} uses only the 64-bit double precision format.)
29052
@ref{table-ieee-formats} lists the precision and exponent
29053
field values for the basic IEEE 754 binary formats:
29055
@float Table,table-ieee-formats
29056
@caption{Basic IEEE Format Context Values}
29057
@multitable @columnfractions .20 .20 .20 .20 .20
29058
@headitem Name @tab Total bits @tab Precision @tab emin @tab emax
29059
@item Single @tab 32 @tab 24 @tab @minus{}126 @tab +127
29060
@item Double @tab 64 @tab 53 @tab @minus{}1022 @tab +1023
29061
@item Quadruple @tab 128 @tab 113 @tab @minus{}16382 @tab +16383
29066
The precision numbers include the implied leading one that gives them
29067
one extra bit of significand.
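You can measure the precision of the doubles your @command{awk} uses by halving a value until adding it to one no longer makes a difference. This is only a sketch: it assumes IEEE 64-bit doubles with round-to-nearest, and on hardware that keeps intermediate results in extended precision the count could come out differently. The variable names are chosen for illustration.

```shell
# Count the halvings needed before 1 + eps becomes indistinguishable from 1.
bits=$(awk 'BEGIN {
    eps = 1
    while (1 + eps != 1) {
        eps /= 2
        bits++
    }
    print bits
}')
echo "significand precision: $bits bits"
```

On a typical system the count matches the precision listed for the double format in the table, including the implied leading one.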
29070
@node MPFR features
29071
@section Arbitrary Precision Arithmetic Features In @command{gawk}
29073
By default, @command{gawk} uses the double precision floating point values
29074
supplied by the hardware of the system it runs on. However, if it was
29075
compiled to do so, @command{gawk} uses the @uref{http://www.mpfr.org, GNU
29076
MPFR} and @uref{http://gmplib.org, GNU MP} (GMP) libraries for arbitrary
29077
precision arithmetic on numbers. You can see if MPFR support is available
29081
$ @kbd{gawk --version}
29082
@print{} GNU Awk 4.1.1, API: 1.1 (GNU MPFR 3.1.0-p3, GNU MP 5.0.2)
29083
@print{} Copyright (C) 1989, 1991-2014 Free Software Foundation.
29088
(You may see different version numbers than what's shown here. That's OK;
29089
what's important is to see that GNU MPFR and GNU MP are listed in
29092
Additionally, there are a few elements available in the @code{PROCINFO}
29093
array to provide information about the MPFR and GMP libraries
29094
(@pxref{Auto-set}).
29096
The MPFR library provides precise control over precisions and rounding
29097
modes, and gives correctly rounded, reproducible, platform-independent
29098
results. With either of the command-line options @option{--bignum} or
29099
@option{-M}, all floating-point arithmetic operators and numeric functions
29100
can yield results to any desired precision level supported by MPFR.
29102
Two built-in variables, @code{PREC} and @code{ROUNDMODE},
29103
provide control over the working precision and the rounding mode.
29104
The precision and the rounding mode are set globally for every operation
29106
@xref{Auto-set}, for more information.
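As a brief illustration of @option{-M} and @code{PREC} working together, the following sketch asks for 100 bits of working precision and prints one third to 25 decimal places. It first checks that a @command{gawk} with MPFR support is installed and skips the computation otherwise; the shell variable @code{third} and the fallback message are inventions of this example.

```shell
# Only run the arbitrary-precision computation if gawk reports MPFR support.
if command -v gawk >/dev/null 2>&1 &&
   gawk --version 2>/dev/null | grep 'GNU MPFR' >/dev/null; then
    # 100 bits is roughly 30 decimal digits, enough for 25 correct places.
    third=$(gawk -M -v PREC=100 'BEGIN { printf "%.25f\n", 1 / 3 }')
else
    third="skipped"
fi
echo "$third"
```

With the default hardware doubles, the same @code{printf} would show incorrect digits after about the sixteenth decimal place.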
29108
@node FP Math Caution
29109
@section Floating Point Arithmetic: Caveat Emptor!
29112
Math class is tough!
29113
@author Late 1980's Barbie
29116
This @value{SECTION} provides a high level overview of the issues
29117
involved when doing lots of floating-point arithmetic.@footnote{There
29118
is a very nice @uref{http://www.validlab.com/goldberg/paper.pdf,
29119
paper on floating-point arithmetic} by David Goldberg, ``What Every
29120
Computer Scientist Should Know About Floating-point Arithmetic,''
29121
@cite{ACM Computing Surveys} @strong{23}, 1 (1991-03), 5-48. This is
29122
worth reading if you are interested in the details, but it does require
29123
a background in computer science.}
29124
The discussion applies to both hardware and arbitrary-precision
29125
floating-point arithmetic.
29128
The material here is purposely general. If you need to do serious
29129
computer arithmetic, you should do some research first, and not
29130
rely just on what we tell you.
29134
* Inexactness of computations:: Floating point math is not exact.
29135
* Getting Accuracy:: Getting more accuracy takes some work.
29136
* Try To Round:: Add digits and round.
29137
* Setting precision:: How to set the precision.
29138
* Setting the rounding mode:: How to set the rounding mode.
29141
@node Inexactness of computations
29142
@subsection Floating Point Arithmetic Is Not Exact
27635
29144
Binary floating-point representations and arithmetic are inexact.
27636
29145
Simple values like 0.1 cannot be precisely represented using
27759
29363
@print{} 3.141592653589797
27762
There is no need to be unduly suspicious about the results from
27763
floating-point arithmetic. The lesson to remember is that
27764
floating-point arithmetic is always more complex than arithmetic using
27765
pencil and paper. In order to take advantage of the power
27766
of computer floating-point, you need to know its limitations
27767
and work within them. For most casual use of floating-point arithmetic,
27768
you will often get the expected result in the end if you simply round
27769
the display of your final results to the correct number of significant
27772
As general advice, avoid presenting numerical data in a manner that
27773
implies better precision than is actually the case.
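The following sketch illustrates this advice, assuming IEEE 64-bit doubles. It shows the stored value of a simple sum at full precision, a direct equality test that fails, and the same value rounded for display. The names @code{x} and @code{report} are chosen for this illustration.

```shell
# 0.1 and 0.2 have no exact binary representation, so their sum
# differs slightly from the double closest to 0.3.
report=$(awk 'BEGIN {
    x = 0.1 + 0.2
    printf "%.17g\n", x                       # full stored value
    print ((x == 0.3) ? "equal" : "not equal")
    printf "%.2f\n", x                        # rounded for display
}')
echo "$report"
```

Rounding at the point of display hides the representation error without pretending that the extra digits are significant.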
27776
* Floating-point Representation:: Binary floating-point representation.
27777
* Floating-point Context:: Floating-point context.
27778
* Rounding Mode:: Floating-point rounding mode.
27781
@node Floating-point Representation
27782
@subsection Binary Floating-point Representation
27783
@cindex IEEE-754 format
27785
Although floating-point representations vary from machine to machine,
27786
the most commonly encountered representation is that defined by the
27787
IEEE 754 Standard. An IEEE-754 format value has three components:
27791
A sign bit telling whether the number is positive or negative.
27794
An @dfn{exponent}, @var{e}, giving its order of magnitude.
27797
A @dfn{significand}, @var{s},
27798
specifying the actual digits of the number.
27804
@math{s @cdot 2^e}.
27809
The first bit of a non-zero binary significand
27810
is always one, so the significand in an IEEE-754 format only includes the
27811
fractional part, leaving the leading one implicit.
27812
The significand is stored in @dfn{normalized} format,
27813
which means that the first bit is always a one.
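To see the significand and exponent of a particular value, you can recover them approximately with logarithms. This is a rough sketch for positive values only: @command{awk} has no built-in function for splitting a number into its parts, so the exponent is estimated with @code{log()}, which could be off by one for values very close to a power of two. The variable names are chosen for illustration.

```shell
# Decompose x into s * 2^e with 1 <= s < 2 (normalized significand).
parts=$(awk 'BEGIN {
    x = 10
    e = int(log(x) / log(2))   # exponent: order of magnitude in base 2
    s = x / 2 ^ e              # significand: what remains after scaling
    printf "%g = %g * 2^%d\n", x, s, e
}')
echo "$parts"
```

The significand 1.25 is binary 1.01; only the fractional bits after the leading one would be stored.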
27815
Three of the standard IEEE-754 types are 32-bit single precision,
27816
64-bit double precision and 128-bit quadruple precision.
27817
The standard also specifies extended precision formats
27818
to allow greater precisions and larger exponent ranges.
27820
@node Floating-point Context
27821
@subsection Floating-point Context
27822
@cindex context, floating-point
27824
A floating-point @dfn{context} defines the environment for arithmetic operations.
27825
It governs precision, sets rules for rounding, and limits the range for exponents.
27826
The context has the following primary components:
27830
Precision of the floating-point format in bits.
27833
Maximum exponent allowed for the format.
27836
Minimum exponent allowed for the format.
27838
@item Underflow behavior
27839
The format may or may not support gradual underflow.
27842
The rounding mode of the context.
27845
@ref{table-ieee-formats} lists the precision and exponent
27846
field values for the basic IEEE-754 binary formats:
27848
@float Table,table-ieee-formats
27849
@caption{Basic IEEE Format Context Values}
27850
@multitable @columnfractions .20 .20 .20 .20 .20
27851
@headitem Name @tab Total bits @tab Precision @tab emin @tab emax
27852
@item Single @tab 32 @tab 24 @tab @minus{}126 @tab +127
27853
@item Double @tab 64 @tab 53 @tab @minus{}1022 @tab +1023
27854
@item Quadruple @tab 128 @tab 113 @tab @minus{}16382 @tab +16383
27859
The precision numbers include the implied leading one that gives them
27860
one extra bit of significand.
27863
A floating-point context can also determine which signals are treated
27864
as exceptions, and can set rules for arithmetic with special values.
27865
Please consult the IEEE-754 standard or other resources for details.
27867
@command{gawk} ordinarily uses the hardware double precision
27868
representation for numbers. On most systems, this is IEEE-754
27869
floating-point format, corresponding to 64-bit binary with 53 bits
27873
In case an underflow occurs, the standard allows, but does not require,
27874
the result from an arithmetic operation to be a number smaller than
27875
the smallest nonzero normalized number. Such numbers do
27876
not have as many significant digits as normal numbers, and are called
27877
@dfn{denormals} or @dfn{subnormals}. The alternative, simply returning a zero,
27878
is called @dfn{flush to zero}. The basic IEEE-754 binary formats
27879
support subnormal numbers.
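
One way to observe gradual underflow is to halve a number repeatedly until it becomes zero and count the steps. This is a minimal sketch, assuming IEEE-754 double precision with subnormal support and the default rounding mode; the function name @code{count_halvings} is hypothetical.

```shell
# count_halvings is a hypothetical helper for illustration only.
# It halves 1.0 until the result underflows to zero and prints the
# number of halvings that were needed.
count_halvings() {
    awk 'BEGIN {
        x = 1; n = 0
        while (x > 0) {   # stop once the value has underflowed to zero
            x /= 2
            n++
        }
        print n
    }'
}

count_halvings
```

With double precision and subnormals, the smallest positive value is 2^-1074, so the count should be 1075. On a system that flushes to zero below the smallest normalized number (2^-1022), the loop would stop after 1023 halvings instead.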
27882
@node Rounding Mode
27883
@subsection Floating-point Rounding Mode
27884
@cindex rounding mode, floating-point
27886
The @dfn{rounding mode} specifies the behavior for the results of numerical
27887
operations when discarding extra precision. Each rounding mode indicates
27888
how the least significant returned digit of a rounded result is to
27890
@ref{table-rounding-modes} lists the IEEE-754 defined
27893
@float Table,table-rounding-modes
27894
@caption{IEEE 754 Rounding Modes}
27895
@multitable @columnfractions .45 .55
27896
@headitem Rounding Mode @tab IEEE Name
27897
@item Round to nearest, ties to even @tab @code{roundTiesToEven}
27898
@item Round toward plus Infinity @tab @code{roundTowardPositive}
27899
@item Round toward negative Infinity @tab @code{roundTowardNegative}
27900
@item Round toward zero @tab @code{roundTowardZero}
27901
@item Round to nearest, ties away from zero @tab @code{roundTiesToAway}
27905
The default mode @code{roundTiesToEven} is the most preferred,
27906
but the least intuitive. This method does the obvious thing for most values,
27907
by rounding them up or down to the nearest digit.
27908
For example, rounding 1.132 to two digits yields 1.13,
27909
and rounding 1.157 yields 1.16.
27911
However, when it comes to rounding a value that is exactly halfway between,
27912
things do not work the way you probably learned in school.
27913
In this case, the number is rounded to the nearest even digit.
27914
So rounding 0.125 to two digits rounds down to 0.12,
27915
but rounding 0.6875 to three digits rounds up to 0.688.
27916
You probably have already encountered this rounding mode when
27917
using @code{printf} to format floating-point numbers.
27923
for (i = 1; i < 10; i++) @{
27925
printf("%4.1f => %2.0f\n", x, x)
27931
produces the following output when run on the author's system:@footnote{It
27932
is possible for the output to be completely different if the
27933
C library in your system does not use the IEEE-754 even-rounding
27934
rule to round halfway cases for @code{printf}.}
27948
The theory behind the rounding mode @code{roundTiesToEven} is that
27949
it more or less evenly distributes upward and downward rounds
27950
of exact halves, which might cause any round-off error
27951
to cancel itself out. This is the default rounding mode used
27952
in IEEE-754 computing functions and operators.
27954
The other rounding modes are rarely used.
27955
Round toward positive infinity (@code{roundTowardPositive})
27956
and round toward negative infinity (@code{roundTowardNegative})
27957
are often used to implement interval arithmetic,
27958
where you adjust the rounding mode to calculate upper and lower bounds
27959
for the range of output. The @code{roundTowardZero}
27960
mode can be used for converting floating-point numbers to integers.
27961
The rounding mode @code{roundTiesToAway} rounds the result to the
27962
nearest number and selects the number with the larger magnitude
27965
Some numerical analysts will tell you that your choice of rounding style
27966
has tremendous impact on the final outcome, and advise you to wait until
27967
final output for any rounding. Instead, you can often avoid round-off error problems by
27968
setting the precision initially to some value sufficiently larger than
27969
the final desired precision, so that the accumulation of round-off error
27970
does not influence the outcome.
27971
If you suspect that results from your computation are
27972
sensitive to accumulation of round-off error,
27973
one way to be sure is to look for a significant difference in output
27974
when you change the rounding mode.
27976
@node Gawk and MPFR
27977
@section @command{gawk} + MPFR = Powerful Arithmetic
27979
The rest of this @value{CHAPTER} describes how to use the arbitrary precision
27980
(also known as @dfn{multiple precision} or @dfn{infinite precision}) numeric
27981
capabilities in @command{gawk} to produce maximally accurate results
27984
But first you should check if your version of
27985
@command{gawk} supports arbitrary precision arithmetic.
27986
The easiest way to find out is to look at the output of
27987
the following command:
27990
$ @kbd{gawk --version}
27991
@print{} GNU Awk 4.1.0, API: 1.0 (GNU MPFR 3.1.0-p3, GNU MP 5.0.2)
27992
@print{} Copyright (C) 1989, 1991-2013 Free Software Foundation.
27996
@command{gawk} uses the
27997
@uref{http://www.mpfr.org, GNU MPFR}
27999
@uref{http://gmplib.org, GNU MP} (GMP)
28000
libraries for arbitrary precision
28001
arithmetic on numbers. So if you do not see the names of these libraries
28002
in the output, then your version of @command{gawk} does not support
28003
arbitrary precision arithmetic.
28006
there are a few elements available in the @code{PROCINFO} array
28007
to provide information about the MPFR and GMP libraries.
28008
@xref{Auto-set}, for more information.
28011
Even if you aren't interested in arbitrary precision arithmetic, you
28012
may still benefit from knowing about how @command{gawk} handles numbers
28013
in general, and the limitations of doing arithmetic with ordinary
28014
@command{gawk} numbers.
28018
@node Arbitrary Precision Floats
28019
@section Arbitrary Precision Floating-point Arithmetic with @command{gawk}
28021
@command{gawk} uses the GNU MPFR library
28022
for arbitrary precision floating-point arithmetic. The MPFR library
28023
provides precise control over precisions and rounding modes, and gives
28024
correctly rounded, reproducible, platform-independent results. With one
28025
of the command-line options @option{--bignum} or @option{-M},
28026
all floating-point arithmetic operators and numeric functions can yield
28027
results to any desired precision level supported by MPFR.
28028
Two built-in variables, @code{PREC} and @code{ROUNDMODE},
28029
provide control over the working precision and the rounding mode
28030
(@pxref{Setting Precision}, and
28031
@pxref{Setting Rounding Mode}).
28032
The precision and the rounding mode are set globally for every operation
28035
The default working precision for arbitrary precision floating-point values is
28036
53 bits, and the default value for @code{ROUNDMODE} is @code{"N"},
28037
which selects the IEEE-754 @code{roundTiesToEven} rounding mode
28038
(@pxref{Rounding Mode}).@footnote{The
28039
default precision is 53 bits, since according to the MPFR documentation,
28040
the library should be able to exactly reproduce all computations with
28041
double-precision machine floating-point numbers (@code{double} type
28042
in C), except the default exponent range is much wider and subnormal
28043
numbers are not implemented.}
28044
@command{gawk} uses the default exponent range in MPFR
28046
(@math{emax = 2^{30} - 1, emin = -emax})
28049
(@var{emax} = 2^30 @minus{} 1, @var{emin} = @minus{}@var{emax})
28051
for all floating-point contexts.
28052
There is no explicit mechanism to adjust the exponent range.
28053
MPFR does not implement subnormal numbers by default,
28054
and this behavior cannot be changed in @command{gawk}.
28057
When emulating an IEEE-754 format (@pxref{Setting Precision}),
28058
@command{gawk} internally adjusts the exponent range
28059
to the value defined for the format and also performs computations needed for
28060
gradual underflow (subnormal numbers).
28064
MPFR numbers are variable-size entities, consuming only as much space as
28065
needed to store the significant digits. Since the performance using MPFR
28066
numbers pales in comparison to doing arithmetic using the underlying machine
28067
types, you should consider using only as much precision as needed by
28072
* Setting Precision:: Setting the working precision.
28073
* Setting Rounding Mode:: Setting the rounding mode.
28074
* Floating-point Constants:: Representing floating-point constants.
28075
* Changing Precision:: Changing the precision of a number.
28076
* Exact Arithmetic:: Exact arithmetic with floating-point numbers.
28079
@node Setting Precision
28080
@subsection Setting the Working Precision
28081
@cindex @code{PREC} variable
29366
@node Setting precision
29367
@subsection Setting The Precision
28083
29369
@command{gawk} uses a global working precision; it does not keep track of
28084
29370
the precision or accuracy of individual numbers. Performing an arithmetic
28085
29371
operation or calling a built-in function rounds the result to the current
28086
working precision. The default working precision is 53 bits, which can be
28087
modified using the built-in variable @code{PREC}. You can also set the
28088
value to one of the pre-defined case-insensitive strings
29372
working precision. The default working precision is 53 bits, which you can
29373
modify using the built-in variable @code{PREC}. You can also set the
29374
value to one of the predefined case-insensitive strings
28089
29375
shown in @ref{table-predefined-precision-strings},
28090
to emulate an IEEE-754 binary format.
29376
to emulate an IEEE 754 binary format.
28092
29378
@float Table,table-predefined-precision-strings
28093
@caption{Predefined precision strings for @code{PREC}}
29379
@caption{Predefined Precision Strings For @code{PREC}}
28094
29380
@multitable {@code{"double"}} {12345678901234567890123456789012345}
28095
@headitem @code{PREC} @tab IEEE-754 Binary Format
29381
@headitem @code{PREC} @tab IEEE 754 Binary Format
28096
29382
@item @code{"half"} @tab 16-bit half-precision.
28097
29383
@item @code{"single"} @tab Basic 32-bit single precision.
28098
29384
@item @code{"double"} @tab Basic 64-bit double precision.
28172
29443
@end multitable
28175
@code{ROUNDMODE} has the default value @code{"N"},
28176
which selects the IEEE-754 rounding mode @code{roundTiesToEven}.
28177
In @ref{table-gawk-rounding-modes}, @code{"A"} is listed to select the IEEE-754 mode
28178
@code{roundTiesToAway}. This is only available
28179
if your version of the MPFR library supports it; otherwise setting
28180
@code{ROUNDMODE} to this value has no effect. @xref{Rounding Mode},
28181
for the meanings of the various rounding modes.
28183
Here is an example of how to change the default rounding behavior of
28184
@code{printf}'s output:
28187
$ @kbd{gawk -M -v ROUNDMODE="Z" 'BEGIN @{ printf("%.2f\n", 1.378) @}'}
28191
@node Floating-point Constants
28192
@subsection Representing Floating-point Constants
28193
@cindex constants, floating-point
28195
Be wary of floating-point constants! When reading a floating-point constant
28196
from program source code, @command{gawk} uses the default precision,
28198
by an assignment to the special variable @code{PREC} on the command
28199
line, to store it internally as an MPFR number.
28200
Changing the precision using @code{PREC} in the program text does
28201
@emph{not} change the precision of a constant. If you need to
28202
represent a floating-point constant at a higher precision than the
28203
default and cannot use a command-line assignment to @code{PREC},
28204
you should either specify the constant as a string, or
28205
as a rational number, whenever possible. The following example
28206
illustrates the differences among various ways to
28207
print a floating-point constant:
28210
$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 0.1) @}'}
28211
@print{} 0.1000000000000000055511151
28212
$ @kbd{gawk -M -v PREC=113 'BEGIN @{ printf("%0.25f\n", 0.1) @}'}
28213
@print{} 0.1000000000000000000000000
28214
$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", "0.1") @}'}
28215
@print{} 0.1000000000000000000000000
28216
$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 1/10) @}'}
28217
@print{} 0.1000000000000000000000000
28220
In the first case, the number is stored with the default precision of 53 bits.
28222
@node Changing Precision
28223
@subsection Changing the Precision of a Number
28225
@cindex Laurie, Dirk
28227
@i{The point is that in any variable-precision package,
28228
a decision is made on how to treat numbers given as data,
28229
or arising in intermediate results, which are represented in
28230
floating-point format to a precision lower than working precision.
28231
Do we promote them to full membership of the high-precision club,
28232
or do we treat them and all their associates as second-class citizens?
28233
Sometimes the first course is proper, sometimes the second, and it takes
28234
careful analysis to tell which.}@footnote{Dirk Laurie.
28235
@cite{Variable-precision Arithmetic Considered Perilous --- A Detective Story}.
28236
Electronic Transactions on Numerical Analysis. Volume 28, pp. 168-173, 2008.}
28237
@author Dirk Laurie
28240
@command{gawk} does not implicitly modify the precision of any previously
28241
computed results when the working precision is changed with an assignment
28242
to @code{PREC}. The precision of a number is always the one that was
28243
used at the time of its creation, and there is no way for the user
28244
to explicitly change it afterwards. However, since the result of a
28245
floating-point arithmetic operation is always an arbitrary precision
28246
floating-point value---with a precision set by the value of @code{PREC}---one of the
28247
following workarounds effectively accomplishes the desired behavior:
28260
@node Exact Arithmetic
28261
@subsection Exact Arithmetic with Floating-point Numbers
28264
Never depend on the exactness of floating-point arithmetic,
28265
even for apparently simple expressions!
28268
Can arbitrary precision arithmetic give exact results? There are
28269
no easy answers. The standard rules of algebra often do not apply
28270
when using floating-point arithmetic.
28271
Among other things, the distributive and associative laws
28272
do not hold completely, and order of operation may be important
28273
for your computation. Rounding error, cumulative precision loss
28274
and underflow are often troublesome.
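
The failure of the associative law is easy to reproduce. The following sketch adds the same three constants in two different groupings and compares the results. It assumes ordinary IEEE-754 double-precision arithmetic without @option{-M}; the function name @code{assoc_check} is hypothetical.

```shell
# assoc_check is a hypothetical helper for illustration only.
# It prints both sums with 17 significant digits, followed by the
# result of comparing them for equality (1 if equal, 0 if not).
assoc_check() {
    awk 'BEGIN {
        left  = (0.1 + 0.2) + 0.3    # rounds after 0.1 + 0.2
        right = 0.1 + (0.2 + 0.3)    # rounds after 0.2 + 0.3
        printf "%.17g %.17g %d\n", left, right, (left == right)
    }'
}

assoc_check
```

On a system using IEEE-754 doubles, the final field is 0: the intermediate roundings differ, so the two sums are not the same stored value even though they are algebraically identical.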
28276
When @command{gawk} tests the expressions @samp{0.1 + 12.2} and @samp{12.3}
28278
using the machine double precision arithmetic, it decides that they
28280
(@xref{Floating-point Programming}.)
28281
You can get the result you want by increasing the precision;
28282
56 bits in this case will get the job done:
28285
$ @kbd{gawk -M -v PREC=56 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'}
28289
If adding more bits is good, perhaps adding even more bits of
28290
precision is better?
28291
Here is what happens if we use an even larger value of @code{PREC}:
28294
$ @kbd{gawk -M -v PREC=201 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'}
28298
This is not a bug in @command{gawk} or in the MPFR library.
28299
It is easy to forget that the finite number of bits used to store the value
28300
is often just an approximation after proper rounding.
28301
The test for equality succeeds if and only if @emph{all} bits in the two operands
28302
are exactly the same. Since this is not necessarily true after floating-point
28303
computations with a particular precision and effective rounding rule,
28304
a straight test for equality may not work.
28306
So, don't assume that floating-point values can be compared for equality.
28307
You should also exercise caution when using other forms of comparisons.
28308
The standard way to compare floating-point numbers is to determine
28309
how much error (or @dfn{tolerance}) you will allow in a comparison and
28310
check to see if one value is within this error range of the other.
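
A tolerance-based comparison can be sketched as follows. The function names @code{abs} and @code{approx_equal}, the shell wrapper @code{compare_demo}, and the tolerance of 1e-9 are all assumptions chosen for illustration; a suitable tolerance depends on the magnitude of your data and on how much error your computation accumulates.

```shell
# compare_demo is a hypothetical wrapper for illustration only.
# It prints the result of an exact comparison followed by the result
# of a comparison that allows an absolute error of 1e-9.
compare_demo() {
    awk '
    function abs(v) { return v < 0 ? -v : v }

    # True if a and b differ by no more than tol (absolute tolerance).
    function approx_equal(a, b, tol) { return abs(a - b) <= tol }

    BEGIN {
        x = 0.1 + 12.2
        exact = (x == 12.3)
        near  = approx_equal(x, 12.3, 1e-9)
        print exact, near
    }'
}

compare_demo
```

With machine double-precision arithmetic this prints @samp{0 1}: the direct test fails, while the test with a tolerance succeeds. For values that vary widely in magnitude, a relative tolerance (scaling @code{tol} by the larger operand) is often a better choice than the absolute one used here.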
28312
In applications where 15 or fewer decimal places suffice,
28313
hardware double precision arithmetic can be adequate, and is usually much faster.
28314
But you do need to keep in mind that every floating-point operation
28315
can suffer a new rounding error with catastrophic consequences, as illustrated
28316
by our earlier attempt to compute the value of the constant @value{PI}
28317
(@pxref{Floating-point Programming}).
28318
Extra precision can greatly enhance the stability and the accuracy
28319
of your computation in such cases.
28321
Repeated addition is not necessarily equivalent to multiplication
28322
in floating-point arithmetic. In the example in
28323
@ref{Floating-point Programming}:
28326
$ @kbd{gawk 'BEGIN @{}
28327
> @kbd{for (d = 1.1; d <= 1.5; d += 0.1) # loop five times (?)}
28335
you may or may not succeed in getting the correct result by choosing
28336
an arbitrarily large value for @code{PREC}. Reformulation of
28337
the problem at hand is often the correct approach in such situations.
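
One possible reformulation of the loop above is to count with integers, which are represented exactly, and to derive the fractional value only where it is needed. This is a sketch rather than the only approach; the function name @code{count_iterations} is hypothetical.

```shell
# count_iterations is a hypothetical helper for illustration only.
# It walks d over 1.1, 1.2, ..., 1.5 using an integer loop counter
# and prints how many times the loop body ran.
count_iterations() {
    awk 'BEGIN {
        n = 0
        for (i = 11; i <= 15; i++) {   # integer counter: no accumulated error
            d = i / 10                 # derive the fractional value on demand
            n++
        }
        print n
    }'
}

count_iterations   # prints 5
```

Because the loop condition tests only small integers, the number of iterations does not depend on the working precision or on the rounding mode. Each value of @code{d} carries at most one rounding error from the division, rather than an error that grows with every addition.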
29446
@code{ROUNDMODE} has the default value @code{"N"}, which
29447
selects the IEEE 754 rounding mode @code{roundTiesToEven}.
29448
In @ref{table-gawk-rounding-modes}, the value @code{"A"} selects
29449
@code{roundTiesToAway}. This is only available if your version of the
29450
MPFR library supports it; otherwise setting @code{ROUNDMODE} to @code{"A"}
29453
The default mode @code{roundTiesToEven} is the most preferred,
29454
but the least intuitive. This method does the obvious thing for most values,
29455
by rounding them up or down to the nearest digit.
29456
For example, rounding 1.132 to two digits yields 1.13,
29457
and rounding 1.157 yields 1.16.
29459
However, when it comes to rounding a value that is exactly halfway between,
29460
things do not work the way you probably learned in school.
29461
In this case, the number is rounded to the nearest even digit.
29462
So rounding 0.125 to two digits rounds down to 0.12,
29463
but rounding 0.6875 to three digits rounds up to 0.688.
29464
You probably have already encountered this rounding mode when
29465
using @code{printf} to format floating-point numbers.
29471
for (i = 1; i < 10; i++) @{
29473
printf("%4.1f => %2.0f\n", x, x)
29479
produces the following output when run on the author's system:@footnote{It
29480
is possible for the output to be completely different if the
29481
C library in your system does not use the IEEE 754 even-rounding
29482
rule to round halfway cases for @code{printf}.}
29496
The theory behind @code{roundTiesToEven} is that it more or less evenly
29497
distributes upward and downward rounds of exact halves, which might
29498
cause any accumulating round-off error to cancel itself out. This is the
29499
default rounding mode for IEEE 754 computing functions and operators.
29501
The other rounding modes are rarely used. Round toward positive infinity
29502
(@code{roundTowardPositive}) and round toward negative infinity
29503
(@code{roundTowardNegative}) are often used to implement interval
29504
arithmetic, where you adjust the rounding mode to calculate upper and
29505
lower bounds for the range of output. The @code{roundTowardZero} mode can
29506
be used for converting floating-point numbers to integers. The rounding
29507
mode @code{roundTiesToAway} rounds the result to the nearest number and
29508
selects the number with the larger magnitude if a tie occurs.
29510
Some numerical analysts will tell you that your choice of rounding
29511
style has tremendous impact on the final outcome, and advise you to
29512
wait until final output for any rounding. Instead, you can often avoid
29513
round-off error problems by setting the precision initially to some
29514
value sufficiently larger than the final desired precision, so that
29515
the accumulation of round-off error does not influence the outcome.
29516
If you suspect that results from your computation are sensitive to
29517
accumulation of round-off error, look for a significant difference in
29518
output when you change the rounding mode to be sure.
28339
29520
@node Arbitrary Precision Integers
28340
29521
@section Arbitrary Precision Integer Arithmetic with @command{gawk}
28341
@cindex integer, arbitrary precision
29522
@cindex integers, arbitrary precision
29523
@cindex arbitrary precision integers
28343
If one of the options @option{--bignum} or @option{-M} is specified,
28344
@command{gawk} performs all
28345
integer arithmetic using GMP arbitrary precision integers.
28346
Any number that looks like an integer in a program source or data file
28347
is stored as an arbitrary precision integer.
28348
The size of the integer is limited only by your computer's memory.
28349
The current floating-point context has no effect on operations involving integers.
28350
For example, the following computes
29525
When given one of the options @option{--bignum} or @option{-M},
29526
@command{gawk} performs all integer arithmetic using GMP arbitrary
29527
precision integers. Any number that looks like an integer in a source
29528
or @value{DF} is stored as an arbitrary precision integer. The size
29529
of the integer is limited only by the available memory. For example,
29530
the following computes
28352
29532
@math{5^{4^{3^{2}}}},
28356
29538
@end ifnottex
29540
5<superscript>4<superscript>3<superscript>2</superscript></superscript></superscript>, @c
28357
29542
the result of which is beyond the
28358
limits of ordinary @command{gawk} numbers:
29543
limits of ordinary hardware double-precision floating point values:
28361
29546
$ @kbd{gawk -M 'BEGIN @{}
28426
29618
gawk -M 'BEGIN @{ n = 13.0; print n % 2.0 @}'
28429
Note that for the particular example above, there is likely best
29621
Note that for the particular example above, it is likely best
28430
29622
to just use the following:
28433
29625
gawk -M 'BEGIN @{ n = 13; print n % 2 @}'
29628
@node POSIX Floating Point Problems
29629
@section Standards Versus Existing Practice
29631
Historically, @command{awk} has converted any non-numeric-looking string
29632
to the numeric value zero, when required. Furthermore, the original
29633
definition of the language and the original POSIX standards specified that
29634
@command{awk} only understands decimal numbers (base 10), and not octal
29635
(base 8) or hexadecimal numbers (base 16).
29637
Changes in the language of the
29638
2001 and 2004 POSIX standards can be interpreted to imply that @command{awk}
29639
should support additional features. These features are:
29641
@itemize @value{BULLET}
29643
Interpretation of floating point data values specified in hexadecimal
29644
notation (e.g., @code{0xDEADBEEF}). (Note: data values, @emph{not}
29645
source code constants.)
29648
Support for the special IEEE 754 floating point values ``Not A Number''
29649
(NaN), positive Infinity (``inf'') and negative Infinity (``@minus{}inf'').
29650
In particular, the format for these values is as specified by the ISO 1999
29651
C standard, which ignores case and can allow implementation-dependent additional
29652
characters after the @samp{nan} and allow either @samp{inf} or @samp{infinity}.
29655
The first problem is that both of these are clear changes to historical
29658
@itemize @value{BULLET}
29660
The @command{gawk} maintainer feels that supporting hexadecimal floating
29661
point values, in particular, is ugly, and was never intended by the
29662
original designers to be part of the language.
29665
Allowing completely alphabetic strings to have valid numeric
29666
values is also a very severe departure from historical practice.
29669
The second problem is that the @command{gawk} maintainer feels that this
29670
interpretation of the standard, which requires a certain amount of
29671
``language lawyering'' to arrive at in the first place, was not even
29672
intended by the standard developers. In other words, ``we see how you
29673
got where you are, but we don't think that that's where you want to be.''
29675
Recognizing the above issues, but attempting to provide compatibility
29676
with the earlier versions of the standard,
29677
the 2008 POSIX standard added explicit wording to allow, but not require,
29678
that @command{awk} support hexadecimal floating point values and
29679
special values for ``Not A Number'' and infinity.
29681
Although the @command{gawk} maintainer continues to feel that
29682
providing those features is inadvisable,
29683
nevertheless, on systems that support IEEE floating point, it seems
29684
reasonable to provide @emph{some} way to support NaN and Infinity values.
29685
The solution implemented in @command{gawk} is as follows:
29687
@itemize @value{BULLET}
29689
With the @option{--posix} command-line option, @command{gawk} becomes
29690
``hands off.'' String values are passed directly to the system library's
29691
@code{strtod()} function, and if it successfully returns a numeric value,
29692
that is what's used.@footnote{You asked for it, you got it.}
29693
By definition, the results are not portable across
29694
different systems. They are also a little surprising:
29697
$ @kbd{echo nanny | gawk --posix '@{ print $1 + 0 @}'}
29699
$ @kbd{echo 0xDeadBeef | gawk --posix '@{ print $1 + 0 @}'}
29700
@print{} 3735928559
29704
Without @option{--posix}, @command{gawk} interprets the four strings
29710
specially, producing the corresponding special numeric values.
29711
The leading sign acts as a signal to @command{gawk} (and the user)
29712
that the value is really numeric. Hexadecimal floating point is
29713
not supported (unless you also use @option{--non-decimal-data},
29714
which is @emph{not} recommended). For example:
29717
$ @kbd{echo nanny | gawk '@{ print $1 + 0 @}'}
29719
$ @kbd{echo +nan | gawk '@{ print $1 + 0 @}'}
29721
$ @kbd{echo 0xDeadBeef | gawk '@{ print $1 + 0 @}'}
29725
@command{gawk} ignores case in the four special values.
29726
Thus @samp{+nan} and @samp{+NaN} are the same.
29729
@node Floating point summary
29732
@itemize @value{BULLET}
29734
Most computer arithmetic is done using either integers or floating-point
29735
values. The default for @command{awk} is to use double-precision
29736
floating-point values.
29739
In the 1980's, Barbie mistakenly said ``Math class is tough!''
29740
While math isn't tough, floating-point arithmetic isn't the same
29741
as pencil and paper math, and care must be taken:
29744
@itemize @value{MINUS}
29746
Not all numbers can be represented exactly.
29749
Comparing values should use a delta, instead of being done directly
29750
with @samp{==} and @samp{!=}.
29756
Operations are not always truly associative or distributive.
29760
Increasing the accuracy can help, but it is not a panacea.
29763
Often, increasing the accuracy and then rounding to the desired
29764
number of digits produces reasonable results.
29767
Use either @option{-M} or @option{--bignum} to enable MPFR
29768
arithmetic. Use @code{PREC} to set the precision in bits, and
29769
@code{ROUNDMODE} to set the IEEE 754 rounding mode.
29772
With @option{-M} or @option{--bignum}, @command{gawk} performs
29773
arbitrary precision integer arithmetic using the GMP library.
29774
This is faster and more space efficient than using MPFR for
29775
the same calculations.
29778
There are several ``dark corners'' with respect to floating-point
29779
numbers where @command{gawk} disagrees with the POSIX standard.
29780
It pays to be aware of them.
29783
Overall, there is no need to be unduly suspicious about the results from
29784
floating-point arithmetic. The lesson to remember is that floating-point
29785
arithmetic is always more complex than arithmetic using pencil and
29786
paper. In order to take advantage of the power of computer floating-point,
29787
you need to know its limitations and work within them. For most casual
29788
use of floating-point arithmetic, you will often get the expected result
29789
if you simply round the display of your final results to the correct number
29790
of significant decimal digits.
29793
As general advice, avoid presenting numerical data in a manner that
29794
implies better precision than is actually the case.
28436
29798
@node Dynamic Extensions
28437
29799
@chapter Writing Extensions for @command{gawk}
29800
@cindex dynamically loaded extensions
28439
29802
It is possible to add new functions written in C or C++ to @command{gawk} using
28440
29803
dynamically loaded libraries. This facility is available on systems
32731
34442
@c ENDOFRANGE exgnot
32732
34443
@c ENDOFRANGE posnot
34445
@c This does not need to be in the formal book.
34447
@node Feature History
34448
@appendixsec History of @command{gawk} Features
34452
https://groups.google.com/forum/#!topic/comp.lang.awk/SAUiRuff30c
34453
This motivated me to add this section.
34457
I've tried to follow this general order, esp.@: for the 3.0 and 3.1 sections:
34460
language changes (e.g., hex constants)
34461
differences in standard awk functions
34464
new command-line options
34467
Within each category, be alphabetical.
34470
This @value{SECTION} describes the features in @command{gawk}
34471
over and above those in POSIX @command{awk},
34472
in the order they were added to @command{gawk}.
34474
Version 2.10 of @command{gawk} introduced the following features:
34476
@itemize @value{BULLET}
34478
The @env{AWKPATH} environment variable for specifying a path search for
34479
the @option{-f} command-line option
34483
The @code{IGNORECASE} variable and its effects
34484
(@pxref{Case-sensitivity}).
34487
The @file{/dev/stdin}, @file{/dev/stdout}, @file{/dev/stderr} and
34488
@file{/dev/fd/@var{N}} special @value{FN}s
34489
(@pxref{Special Files}).
34492
Version 2.13 of @command{gawk} introduced the following features:
34494
@itemize @value{BULLET}
34496
The @code{FIELDWIDTHS} variable and its effects
34497
(@pxref{Constant Size}).
34500
The @code{systime()} and @code{strftime()} built-in functions for obtaining
34501
and printing timestamps
34502
(@pxref{Time Functions}).
34505
Additional command-line options
34508
@itemize @value{MINUS}
34510
The @option{-W lint} option to provide error and portability checking
34511
for both the source code and at runtime.
34514
The @option{-W compat} option to turn off the GNU extensions.
34517
The @option{-W posix} option for full POSIX compliance.
34521
Version 2.14 of @command{gawk} introduced the following feature:
34523
@itemize @value{BULLET}
34525
The @code{next file} statement for skipping to the next @value{DF}
34526
(@pxref{Nextfile Statement}).

Version 2.15 of @command{gawk} introduced the following features:

@itemize @value{BULLET}
@item
New variables (@pxref{Built-in Variables}):

@itemize @value{MINUS}
@item
@code{ARGIND}, which tracks the movement of @code{FILENAME}
through @code{ARGV}.

@item
@code{ERRNO}, which contains the system error message when
@code{getline} returns @minus{}1 or @code{close()} fails.
@end itemize

@item
The @file{/dev/pid}, @file{/dev/ppid}, @file{/dev/pgrpid}, and
@file{/dev/user} special @value{FN}s. These have since been removed.

@item
The ability to delete all of an array at once with @samp{delete @var{array}}.

@item
Command-line option changes:

@itemize @value{MINUS}
@item
The ability to use GNU-style long-named options that start with @option{--}.

@item
The @option{--source} option for mixing command-line and library-file
source code.
@end itemize
@end itemize

Version 3.0 of @command{gawk} introduced the following features:

@itemize @value{BULLET}
@item
New or changed variables:

@itemize @value{MINUS}
@item
@code{IGNORECASE} changed, now applying to string comparison as well
as regexp operations
(@pxref{Case-sensitivity}).

@item
@code{RT}, which contains the input text that matched @code{RS}.
@end itemize

@item
Full support for both POSIX and GNU regexps.

@item
The @code{gensub()} function for more powerful text manipulation
(@pxref{String Functions}).

@item
The @code{strftime()} function acquired a default time format,
allowing it to be called with no arguments
(@pxref{Time Functions}).

@item
The ability for @code{FS} and for the third
argument to @code{split()} to be null strings
(@pxref{Single Character Fields}).

@item
The ability for @code{RS} to be a regexp.

@item
The @code{next file} statement became @code{nextfile}
(@pxref{Nextfile Statement}).

@item
The @code{fflush()} function from
Brian Kernighan's @command{awk}
(then at Bell Laboratories;
@pxref{I/O Functions}).

@item
New command-line options:

@itemize @value{MINUS}
@item
The @option{--lint-old} option to
warn about constructs that are not available in
the original Version 7 Unix version of @command{awk}
(@pxref{V7/SVR3.1}).

@item
The @option{-m} option from Brian Kernighan's @command{awk}. (He was
still at Bell Laboratories at the time.) This was later removed from
both his @command{awk} and from @command{gawk}.

@item
The @option{--re-interval} option to provide interval expressions in regexps
(@pxref{Regexp Operators}).

@item
The @option{--traditional} option was added as a better name for
@option{--compat} (@pxref{Options}).
@end itemize

@item
The use of GNU Autoconf to control the configuration process
(@pxref{Quick Installation}).

@item
Amiga support.
This has since been removed.
@end itemize
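
Two of the version 3.0 additions can be sketched briefly. The first command
uses @code{gensub()} with backreferences to swap two words; the second sets
@code{FS} to the null string so that each character becomes its own field.
The input strings are arbitrary and chosen only for illustration:

@example
$ @kbd{gawk 'BEGIN @{ print gensub(/(.+) (.+)/, "\\2 \\1", "g", "abc def") @}'}
@print{} def abc
$ @kbd{echo abc | gawk 'BEGIN @{ FS = "" @} @{ print NF, $2 @}'}
@print{} 3 b
@end example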

Version 3.1 of @command{gawk} introduced the following features:

@itemize @value{BULLET}
@item
New variables
(@pxref{Built-in Variables}):

@itemize @value{MINUS}
@item
@code{BINMODE}, for non-POSIX systems,
which allows binary I/O for input and/or output files
(@pxref{PC Using}).

@item
@code{LINT}, which dynamically controls lint warnings.

@item
@code{PROCINFO}, an array for providing process-related information.

@item
@code{TEXTDOMAIN}, for setting an application's internationalization text domain
(@pxref{Internationalization}).
@end itemize

@item
The ability to use octal and hexadecimal constants in @command{awk}
program source code
(@pxref{Nondecimal-numbers}).

@item
The @samp{|&} operator for two-way I/O to a coprocess
(@pxref{Two-way I/O}).

@item
The @file{/inet} special files for TCP/IP networking using @samp{|&}
(@pxref{TCP/IP Networking}).

@item
The optional second argument to @code{close()} that allows closing one end
of a two-way pipe to a coprocess
(@pxref{Two-way I/O}).

@item
The optional third argument to the @code{match()} function
for capturing text-matching subexpressions within a regexp
(@pxref{String Functions}).

@item
Positional specifiers in @code{printf} formats for
making translations easier
(@pxref{Printf Ordering}).

@item
A number of new built-in functions:

@itemize @value{MINUS}
@item
The @code{asort()} and @code{asorti()} functions for sorting arrays
(@pxref{Array Sorting}).

@item
The @code{bindtextdomain()}, @code{dcgettext()} and @code{dcngettext()} functions
for internationalization
(@pxref{Programmer i18n}).

@item
The @code{extension()} function and the ability to add
new built-in functions dynamically
(@pxref{Dynamic Extensions}).

@item
The @code{mktime()} function for creating timestamps
(@pxref{Time Functions}).

@item
The @code{and()}, @code{or()}, @code{xor()}, @code{compl()},
@code{lshift()}, @code{rshift()}, and @code{strtonum()} functions
(@pxref{Bitwise Functions}).
@end itemize

@item
@cindex @code{next file} statement
The support for @samp{next file} as two words was removed completely
(@pxref{Nextfile Statement}).

@item
Additional command-line options:

@itemize @value{MINUS}
@item
The @option{--dump-variables} option to print a list of all global variables.

@item
The @option{--exec} option, for use in CGI scripts.

@item
The @option{--gen-po} command-line option and the use of a leading
underscore to mark strings that should be translated
(@pxref{String Extraction}).

@item
The @option{--non-decimal-data} option to allow non-decimal
input data
(@pxref{Nondecimal Data}).

@item
The @option{--profile} option and @command{pgawk}, the
profiling version of @command{gawk}, for producing execution
profiles of @command{awk} programs
(@pxref{Profiling}).

@item
The @option{--use-lc-numeric} option to force @command{gawk}
to use the locale's decimal point for parsing input data
(@pxref{Conversion}).
@end itemize

@item
The use of GNU Automake to help in standardizing the configuration process
(@pxref{Quick Installation}).

@item
The use of GNU @command{gettext} for @command{gawk}'s own message output
(@pxref{Gawk I18N}).

@item
BeOS support. This was later removed.

@item
Tandem support. This was later removed.

@item
The Atari port became officially unsupported and was
later removed entirely.

@item
The source code changed to use ISO C standard-style function definitions.

@item
POSIX compliance for @code{sub()} and @code{gsub()}
(@pxref{Gory Details}).

@item
The @code{length()} function was extended to accept an array argument
and return the number of elements in the array
(@pxref{String Functions}).

@item
The @code{strftime()} function acquired a third argument to
enable printing times as UTC
(@pxref{Time Functions}).
@end itemize
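
The following sketch exercises a few of the version 3.1 additions: the
third argument to @code{match()}, which receives the parenthesized
subexpressions, and the @code{strtonum()} and @code{and()} functions.
The string @samp{foo=42} and the array name @code{m} are arbitrary choices
made for this example:

@example
$ @kbd{gawk 'BEGIN @{}
> @kbd{    if (match("foo=42", /([a-z]+)=([0-9]+)/, m))}
> @kbd{        print m[1], m[2]}
> @kbd{    print strtonum("0x1F"), and(12, 10)}
> @kbd{@}'}
@print{} foo 42
@print{} 31 8
@end example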

Version 4.0 of @command{gawk} introduced the following features:

@itemize @value{BULLET}
@item
Variable additions:

@itemize @value{MINUS}
@item
@code{FPAT}, which allows you to specify a regexp that matches
the fields, instead of matching the field separator
(@pxref{Splitting By Content}).

@item
If @code{PROCINFO["sorted_in"]} exists, @samp{for(iggy in foo)} loops sort the
indices before looping over them. The value of this element
provides control over how the indices are sorted before the loop
traversal starts
(@pxref{Controlling Scanning}).

@item
@code{PROCINFO["strftime"]}, which holds
the default format for @code{strftime()}
(@pxref{Time Functions}).
@end itemize

@item
The special files @file{/dev/pid}, @file{/dev/ppid}, @file{/dev/pgrpid}
and @file{/dev/user} were removed.

@item
Support for IPv6 was added via the @file{/inet6} special file.
@file{/inet4} forces IPv4 and @file{/inet} chooses the system
default, which is probably IPv4
(@pxref{TCP/IP Networking}).

@item
The use of @samp{\s} and @samp{\S} escape sequences in regular expressions
(@pxref{GNU Regexp Operators}).

@item
Interval expressions became part of default regular expressions
(@pxref{Regexp Operators}).

@item
POSIX character classes work even with @option{--traditional}
(@pxref{Regexp Operators}).

@item
@code{break} and @code{continue} became invalid outside a loop,
even with @option{--traditional}
(@pxref{Break Statement}, and also see
@ref{Continue Statement}).

@item
@code{fflush()}, @code{nextfile}, and @samp{delete @var{array}}
are allowed even with @option{--posix} or @option{--traditional}, since they
are all now part of POSIX.

@item
An optional third argument to
@code{asort()} and @code{asorti()}, specifying how to sort
(@pxref{String Functions}).

@item
The behavior of @code{fflush()} changed to match Brian Kernighan's @command{awk}
and POSIX; both @samp{fflush()} and @samp{fflush("")} now
flush all open output redirections
(@pxref{I/O Functions}).

@item
The @code{isarray()}
function, which determines whether an item is an array,
making it possible to traverse arrays of arrays
(@pxref{Type Functions}).

@item
The @code{patsplit()}
function, which provides the same capability as @code{FPAT} for splitting
strings
(@pxref{String Functions}).

@item
An optional fourth argument to the @code{split()} function,
which is an array to hold the values of the separators
(@pxref{String Functions}).

@item
Arrays of arrays
(@pxref{Arrays of Arrays}).

@item
The @code{BEGINFILE} and @code{ENDFILE} special patterns
(@pxref{BEGINFILE/ENDFILE}).

@item
Indirect function calls
(@pxref{Indirect Calls}).

@item
@code{switch} / @code{case} are enabled by default
(@pxref{Switch Statement}).

@item
Command-line option changes:

@itemize @value{MINUS}
@item
The @option{-b} and @option{--characters-as-bytes} options,
which prevent @command{gawk} from treating input as a multibyte string.

@item
The redundant @option{--compat}, @option{--copyleft}, and @option{--usage}
long options were removed.

@item
The @option{--gen-po} option was finally renamed to the correct @option{--gen-pot}.

@item
The @option{--sandbox} option, which disables certain features.

@item
All long options acquired corresponding short options, for use in @samp{#!} scripts.
@end itemize

@item
Directories named on the command line now produce a warning, not a fatal
error, unless @option{--posix} or @option{--traditional} is used
(@pxref{Command line directories}).

@item
The @command{gawk} internals were rewritten, bringing the @command{dgawk}
debugger and possibly improved performance
(@pxref{Debugger}).

@item
Per the GNU Coding Standards, dynamic extensions must now define
a global symbol indicating that they are GPL-compatible
(@pxref{Plugin License}).

@item
In POSIX mode, string comparisons use @code{strcoll()} / @code{wcscoll()}
(@pxref{POSIX String Comparison}).

@item
The option for raw sockets was removed, since it was never implemented
(@pxref{TCP/IP Networking}).

@item
Ranges of the form @samp{[d-h]} are treated as if they were in the
C locale, no matter what kind of regexp is being used, and even if
@option{--posix} is given
(@pxref{Ranges and Locales}).

@item
Support was removed for the following systems:

@itemize @value{MINUS}
@item
MS-DOS with Microsoft Compiler

@item
MS-Windows with Microsoft Compiler

@item
SunOS 3.x, Sun 386 (Road Runner)

@item
Prestandard VAX C compiler for VAX/VMS
@end itemize
@end itemize
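
As a brief sketch of two version 4.0 features, the first command below
uses @code{FPAT} to describe what a field looks like (here, either a run of
non-comma characters or a double-quoted string), so that a quoted comma does
not split a field. The second sets @code{PROCINFO["sorted_in"]} so that a
@code{for} loop visits indices in ascending string order. The sample input
and the array name @code{a} are invented for illustration:

@example
$ @kbd{echo 'a,"b,c",d' |}
> @kbd{gawk 'BEGIN @{ FPAT = "([^,]+)|(\"[^\"]+\")" @} @{ print NF, $2 @}'}
@print{} 3 "b,c"
$ @kbd{gawk 'BEGIN @{}
> @kbd{    a["b"]; a["c"]; a["a"]}
> @kbd{    PROCINFO["sorted_in"] = "@@ind_str_asc"}
> @kbd{    for (k in a) print k}
> @kbd{@}'}
@print{} a
@print{} b
@print{} c
@end example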

Version 4.1 of @command{gawk} introduced the following features:

@itemize @value{BULLET}
@item
The @code{SYMTAB}, @code{FUNCTAB}, and @code{PROCINFO["identifiers"]} arrays
(@pxref{Auto-set}).

@item
The three executables @command{gawk}, @command{pgawk}, and @command{dgawk} were merged into
one, named just @command{gawk}. As a result, the command-line options changed.

@item
Command-line option changes:

@itemize @value{MINUS}
@item
The @option{-D} option invokes the debugger.

@item
The @option{-i} and @option{--include} options
load @command{awk} library files.

@item
The @option{-l} and @option{--load} options load compiled dynamic extensions.

@item
The @option{-M} and @option{--bignum} options enable MPFR.

@item
The @option{-o} option now only does pretty-printing.

@item
The @option{-p} option is used for profiling.

@item
The @option{-R} option was removed.
@end itemize

@item
Support for high precision arithmetic with MPFR
(@pxref{Arbitrary Precision Arithmetic}).

@item
The @code{and()}, @code{or()} and @code{xor()} functions
changed to allow any number of arguments,
with a minimum of two
(@pxref{Bitwise Functions}).

@item
The dynamic extension interface was completely redone
(@pxref{Dynamic Extensions}).
@end itemize
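
Two of the version 4.1 changes are easy to try from the command line.
In this sketch, @code{and()} is called with three arguments, and a global
variable is assigned indirectly through @code{SYMTAB}. The variable name
@code{x} and the numeric values are arbitrary:

@example
$ @kbd{gawk 'BEGIN @{ print and(7, 6, 12) @}'}
@print{} 4
$ @kbd{gawk 'BEGIN @{ x = 5; SYMTAB["x"] = 6; print x @}'}
@print{} 6
@end example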

@c XXX ADD MORE STUFF HERE

@node Common Extensions
@appendixsec Common Extensions Summary