~vcs-imports/gawk/master : revision 502

* awk: (gawk)Invoking gawk. Text scanning and processing.
@end direntry

@ifset FOR_PRINT
@tex
\gdef\xrefprintnodename#1{``#1''}
@end tex
@end ifset

@ifclear FOR_PRINT
@c With early 2014 texinfo.tex, restore PDF links and colors
@tex
\gdef\linkcolor{0.5 0.09 0.12} % Dark Red
\gdef\urlcolor{0.5 0.09 0.12} % Also
\global\urefurlonlylinktrue
@end tex
@end ifclear

@ifnotdocbook
@set BULLET @bullet{}
@set MINUS @minus{}
@end ifnotdocbook

@ifdocbook
@set BULLET
@set MINUS
@end ifdocbook

@set xref-automatic-section-title

@c The following information should be updated here only!
@c applies to and all the info about who's publishing this edition

@c These apply across the board.
@set UPDATE-MONTH June, 2014
@set VERSION 4.1
@set PATCHLEVEL 1

@set TITLE GAWK: Effective AWK Programming
@set SUBTITLE A User's Guide for GNU Awk
@set SUBSECTION subsection
@set DARKCORNER @inmargin{@image{lflashlight,1cm}, @image{rflashlight,1cm}}
@set COMMONEXT (c.e.)
@set PAGE page
@end iftex
@ifinfo
@set DOCUMENT Info file
@set SUBSECTION node
@set DARKCORNER (d.c.)
@set COMMONEXT (c.e.)
@set PAGE screen
@end ifinfo
@ifhtml
@set DOCUMENT Web page
@set SUBSECTION subsection
@set DARKCORNER (d.c.)
@set COMMONEXT (c.e.)
@set PAGE screen
@end ifhtml
@ifdocbook
@set DOCUMENT book
@set SUBSECTION subsection
@set DARKCORNER (d.c.)
@set COMMONEXT (c.e.)
@set PAGE page
@end ifdocbook
@ifxml
@set DOCUMENT book
@set SUBSECTION subsection
@set DARKCORNER (d.c.)
@set COMMONEXT (c.e.)
@set PAGE page
@end ifxml
@ifplaintext
@set DOCUMENT book
@set SUBSECTION subsection
@set DARKCORNER (d.c.)
@set COMMONEXT (c.e.)
@set PAGE page
@end ifplaintext

@ifdocbook
@c empty on purpose
@set PART1
@set PART2
@set PART3
@set PART4
@end ifdocbook

@ifnotdocbook
@set PART1 Part I:@*
@set PART2 Part II:@*
@set PART3 Part III:@*
@set PART4 Part IV:@*
@end ifnotdocbook

@c some special symbols
@iftex
@set LEQ @math{@leq}
@set PI @math{@pi}
@end iftex
@ifdocbook
@set LEQ @inlineraw{docbook, ≤}
@set PI @inlineraw{docbook, &pgr;}
@end ifdocbook
@ifnottex
@ifnotdocbook
@set LEQ <=
@set PI @i{pi}
@end ifnotdocbook
@end ifnottex

@ifnottex
@ifnotdocbook
@macro ii{text}
@i{\text\}
@end macro
@end ifnotdocbook
@end ifnottex

@ifdocbook
@macro ii{text}
@inlineraw{docbook,<lineannotation>\text\</lineannotation>}
@end macro
@end ifdocbook

@ifclear FOR_PRINT
@set FN file name
@set FFN File Name
@set DF data file
@set DDF Data File
@set PVERSION version
@end ifclear
@ifset FOR_PRINT
@set FN filename
@set FFN Filename
@set DF datafile
@set DDF Datafile
@set PVERSION Version
@end ifset

@c For HTML, spell out email addresses, to avoid problems with
@c address harvesters for spammers.
@ifhtml
@end macro
@end ifnothtml

@c Indexing macros
@ifinfo

@macro cindexawkfunc{name}
@cindex @code{\name\}
@end macro

@macro cindexgawkfunc{name}
@cindex @code{\name\}
@end macro

@end ifinfo

@ifnotinfo

@macro cindexawkfunc{name}
@cindex @code{\name\()} function
@end macro

@macro cindexgawkfunc{name}
@cindex @code{\name\()} function (@command{gawk})
@end macro
@end ifnotinfo

@ignore
Some comments on the layout for TeX.
1. Use at least texinfo.tex 2014-01-30.15
2. When using @docbook, if the last line is part of a paragraph, end
it with a space and @c so that the lines won't run together. This is a
quirk of the language / makeinfo, and isn't going to change.
@end ignore

@c merge the function and variable indexes into the concept index
@syncodeindex fn cp
@syncodeindex vr cp
@end ifxml
@ifdocbook
@synindex fn cp
@synindex vr cp
@end ifdocbook

@c If "finalout" is commented out, the printed output will show
@c black boxes that mark lines that are too long. Thus, it is
@end iftex

@copying
@docbook
<para>
“To boldly go where no man has gone before” is a
Registered Trademark of Paramount Pictures Corporation.</para>

<para>Published by:</para>

<literallayout class="normal">Free Software Foundation
51 Franklin Street, Fifth Floor
Boston, MA 02110-1301 USA
Phone: +1-617-542-5942
Fax: +1-617-542-2652
Email: <email>gnu@@gnu.org</email>
URL: <ulink url="http://www.gnu.org">http://www.gnu.org/</ulink></literallayout>

Free Software Foundation, Inc.
@end docbook

@ifnotdocbook
Copyright @copyright{} 1989, 1991, 1992, 1993, 1996--2005, 2007, 2009--2014 @*
Free Software Foundation, Inc.
@end ifnotdocbook
@sp 2

This is Edition @value{EDITION} of @cite{@value{TITLE}: @value{SUBTITLE}},
@subtitle @value{UPDATE-MONTH}
@author Arnold D. Robbins

@ifnotdocbook
@c Include the Distribution inside the titlepage environment so
@c that headings are turned off. Headings on and off do not work.

ISBN 1-882114-28-0 @*
@sp 2
@insertcopying
@end ifnotdocbook
@end titlepage

@c Thanks to Bob Chassell for directions on doing dedications.
@page
@w{ }
@sp 9
@center @i{To my parents, for their love, and for the wonderful example they set for me.}
@sp 1
@center @i{To my wife Miriam, for making me complete.
Thank you for building your life together with me.}
@sp 1
@center @i{To our children Chana, Rivka, Nachum and Malka, for enrichening our lives in innumerable ways.}
@sp 1
@w{ }
@page
@w{ }
@headings on
@end iftex

@docbook
<dedication>
<para>To my parents, for their love, and for the wonderful
example they set for me.</para>
<para>To my wife Miriam, for making me complete.
Thank you for building your life together with me.</para>
<para>To our children Chana, Rivka, Nachum and Malka,
for enrichening our lives in innumerable ways.</para>
</dedication>
@end docbook

@iftex
@headings off
@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @|

@ifnottex
@ifnotxml
@ifnotdocbook
@node Top
@top General Introduction
@c Preface node should come right after the Top

@insertcopying

@end ifnotdocbook
@end ifnotxml
@end ifnottex

includes command-line syntax.
* One-shot:: Running a short throwaway
@command{awk} program.
* Read Terminal:: Using no input files (input from the
keyboard instead).
* Long:: Putting permanent @command{awk}
programs in files.
* Executable Scripts:: Making self-contained @command{awk}
* Other Features:: Other Features of @command{awk}.
* When:: When to use @command{gawk} and when to
use other things.
* Intro Summary:: Summary of the introduction.
* Command Line:: How to run @command{awk}.
* Options:: Command-line options and their
meanings.
program.
* Obsolete:: Obsolete Options and/or features.
* Undocumented:: Undocumented Options and Features.
* Invoking Summary:: Invocation summary.
* Regexp Usage:: How to Use Regular Expressions.
* Escape Sequences:: How to write nonprinting characters.
* Regexp Operators:: Regular Expression Operators.
* Case-sensitivity:: How to do case-insensitive matching.
* Leftmost Longest:: How much text matches.
* Computed Regexps:: Using Dynamic Regexps.
* Regexp Summary:: Regular expressions summary.
* Records:: Controlling how data is split into
records.
* awk split records:: How standard @command{awk} splits
records.
* gawk split records:: How @command{gawk} splits records.
* Fields:: An introduction to fields.
* Nonconstant Fields:: Nonconstant Field Numbers.
* Changing Fields:: Changing the Contents of a Field.
field.
* Command Line Field Separator:: Setting @code{FS} from the
command-line.
* Full Line Fields:: Making the full line be a single
field.
* Field Splitting Summary:: Some final points and a summary table.
* Constant Size:: Reading constant width data.
* Splitting By Content:: Defining Fields By Content
* Read Timeout:: Reading input with a timeout.
* Command line directories:: What happens if you put a directory on
the command line.
* Input Summary:: Input summary.
* Input Exercises:: Exercises.
* Print:: The @code{print} statement.
* Print Examples:: Simple examples of @code{print}
statements.
* Special Caveats:: Things to watch out for.
* Close Files And Pipes:: Closing Input and Output Files and
Pipes.
* Output Summary:: Output summary.
* Output exercises:: Exercises.
* Values:: Constants, Variables, and Regular
Expressions.
* Constants:: String, numeric and regexp constants.
This is an advanced method of input.
* Conversion:: The conversion of strings to numbers
and vice versa.
* Strings And Numbers:: How @command{awk} Converts Between
Strings And Numbers.
* Locale influences conversions:: How the locale may affect conversions.
* All Operators:: @command{gawk}'s operators.
* Arithmetic Ops:: Arithmetic operations (@samp{+},
@samp{-}, etc.)
* Function Calls:: A function call is an expression.
* Precedence:: How various operators nest.
* Locales:: How the locale affects things.
* Expressions Summary:: Expressions summary.
* Pattern Overview:: What goes into a pattern.
* Regexp Patterns:: Using regexps as patterns.
* Expression Patterns:: Any expression can be used as a
gives you information.
* ARGC and ARGV:: Ways to use @code{ARGC} and
@code{ARGV}.
* Pattern Action Summary:: Patterns and Actions summary.
* Array Basics:: The basics of arrays.
* Array Intro:: Introduction to Arrays
* Reference to Elements:: How to examine one element of an
@command{awk}.
* Multiscanning:: Scanning multidimensional arrays.
* Arrays of Arrays:: True multidimensional arrays.
* Arrays Summary:: Summary of arrays.
* Built-in:: Summarizes the built-in functions.
* Calling Built-in:: How to call built-in functions.
* Numeric Functions:: Functions that work with numbers,
runtime.
* Indirect Calls:: Choosing the function to call at
runtime.
* Functions Summary:: Summary of functions.
* Library Names:: How to best name private global
variables in library functions.
* General Functions:: Functions that are of general use.
* Group Functions:: Functions for getting group
information.
* Walking Arrays:: A function to walk arrays of arrays.
* Library Functions Summary:: Summary of library functions.
* Library exercises:: Exercises.
* Running Examples:: How to run these examples.
* Clones:: Clones of common utilities.
* Cut Program:: The @command{cut} utility.
* Anagram Program:: Finding anagrams from a dictionary.
* Signature Program:: People do amazing things with too much
time on their hands.
* Programs Summary:: Summary of programs.
* Programs Exercises:: Exercises.
* Nondecimal Data:: Allowing nondecimal input data.
* Array Sorting:: Facilities for controlling array
traversal and sorting arrays.
* TCP/IP Networking:: Using @command{gawk} for network
programming.
* Profiling:: Profiling your @command{awk} programs.
* Advanced Features Summary:: Summary of advanced features.
* I18N and L10N:: Internationalization and Localization.
* Explaining gettext:: How GNU @command{gettext} works.
* Programmer i18n:: Features for the programmer.
* Translator i18n:: Features for the translator.
* String Extraction:: Extracting marked strings.
* I18N Example:: A simple i18n example.
* Gawk I18N:: @command{gawk} is also
internationalized.
* I18N Summary:: Summary of I18N stuff.
* Debugging:: Introduction to @command{gawk}
debugger.
* Debugging Concepts:: Debugging in General.
* Miscellaneous Debugger Commands:: Miscellaneous Commands.
* Readline Support:: Readline support.
* Limitations:: Limitations and future plans.
* Debugging Summary:: Debugging summary.
* Computer Arithmetic:: A quick intro to computer math.
* Math Definitions:: Defining terms used.
* MPFR features:: The MPFR features in @command{gawk}.
* FP Math Caution:: Things to know.
* Inexactness of computations:: Floating point math is not exact.
* Inexact representation:: Numbers are not exactly represented.
* Comparing FP Values:: How to compare floating point values.
* Errors accumulate:: Errors get bigger as they go.
* Getting Accuracy:: Getting more accuracy takes some work.
* Try To Round:: Add digits and round.
* Setting precision:: How to set the precision.
* Setting the rounding mode:: How to set the rounding mode.
* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic
with @command{gawk}.
* POSIX Floating Point Problems:: Standards Versus Existing Practice.
* Floating point summary:: Summary of floating point discussion.
* Extension Intro:: What is an extension.
* Plugin License:: A note about licensing.
* Extension Mechanism Outline:: An outline of how it works.
* Extension API Functions Introduction:: Introduction to the API functions.
* General Data Types:: The data types.
* Requesting Values:: How to get a value.
* Memory Allocation Functions:: Functions for allocating memory.
* Constructor Functions:: Functions for creating values.
* Registration Functions:: Functions to register things with
@command{gawk}.
* Extension Sample Time:: An interface to @code{gettimeofday()}
and @code{sleep()}.
* gawkextlib:: The @code{gawkextlib} project.
* Extension summary:: Extension summary.
* Extension Exercises:: Exercises.
* V7/SVR3.1:: The major changes between V7 and
System V Release 3.1.
* SVR4:: Minor changes between System V
version of @command{awk}.
* POSIX/GNU:: The extensions in @command{gawk} not
in POSIX @command{awk}.
* Feature History:: The history of the features in
@command{gawk}.
* Common Extensions:: Common Extensions Summary.
* Ranges and Locales:: How locales used to affect regexp
ranges.
* Contributors:: The major contributors to
@command{gawk}.
* History summary:: History summary.
* Gawk Distribution:: What is in the @command{gawk}
distribution.
* Getting:: How to get the distribution.
* VMS Installation:: Installing @command{gawk} on VMS.
* VMS Compilation:: How to compile @command{gawk} under
VMS.
* VMS Dynamic Extensions:: Compiling @command{gawk} dynamic
extensions on VMS.
* VMS Installation Details:: How to install @command{gawk} under
VMS.
* VMS Running:: How to run @command{gawk} under VMS.
* VMS GNV:: The VMS GNV Project.
* VMS Old Gawk:: An old version comes with some VMS
systems.
* Bugs:: Reporting Problems and Bugs.
* Other Versions:: Other freely available @command{awk}
implementations.
* Installation summary:: Summary of installation.
* Compatibility Mode:: How to disable certain @command{gawk}
extensions.
* Additions:: Making Additions To @command{gawk}.
@command{gawk}.
* New Ports:: Porting @command{gawk} to a new
operating system.
* Derived Files:: Why derived files are kept in the Git
repository.
* Future Extensions:: New features that may be implemented
one day.
* Implementation Limitations:: Some limitations of the
* Extension Other Design Decisions:: Some other design decisions.
* Extension Future Growth:: Some room for future growth.
* Old Extension Mechanism:: Some compatibility for old extensions.
* Notes summary:: Summary of implementation notes.
* Basic High Level:: The high level view.
* Basic Data Typing:: A very quick intro to data types.
@end detailmenu

@c dedication for Info file
@ifinfo
To my parents, for their love, and for the wonderful
example they set for me.
@sp 1
To my wife Miriam, for making me complete.
Thank you for building your life together with me.
@sp 1
To our children Chana, Rivka, Nachum and Malka,
for enrichening our lives in innumerable ways.
@end ifinfo

@summarycontents
@node Foreword
@unnumbered Foreword

@c This bit is post-processed by a script which turns the chapter
@c tag into a preface tag, and moves this stuff to before the title.
@c Bleah.
@docbook
<prefaceinfo>
<author>
<firstname>Michael</firstname>
<surname>Brennan</surname>
<affiliation><jobtitle>Author of mawk</jobtitle></affiliation>
</author>
<date>March, 2001</date>
</prefaceinfo>
@end docbook

Arnold Robbins and I are good friends. We were introduced
@c 11 years ago
in 1990
The new @command{pgawk} (profiling @command{gawk}), produces
program execution counts.
I recently experimented with an algorithm that for
@ifnotdocbook
@math{n}
@end ifnotdocbook
@ifdocbook
@i{n}
@end ifdocbook
lines of input, exhibited
@tex
$\sim\! Cn^2$
@end tex
@ifnottex
@ifnotdocbook
~ C n^2
@end ifnotdocbook
@end ifnottex
@docbook
<emphasis>&sim; Cn<superscript>2</superscript></emphasis> @c
@end docbook
performance, while
theory predicted
@tex
$\sim\! Cn\log n$
@end tex
@ifnottex
@ifnotdocbook
~ C n log n
@end ifnotdocbook
@end ifnottex
@docbook
<emphasis>&sim; Cn log n</emphasis> @c
@end docbook
behavior. A few minutes poring
over the @file{awkprof.out} profile pinpointed the problem to
a single line of code. @command{pgawk} is a welcome addition to
using AWK programs, and developing @command{gawk}, into this book. If you use
AWK or want to learn how, then read this book.

@ifnotdocbook
@cindex Brennan, Michael
@display
Michael Brennan
Author of @command{mawk}
March, 2001
@end display
@end ifnotdocbook

@node Preface
@unnumbered Preface
@c
@c 12/2000: Chuck wants the preface & intro combined.

@c This bit is post-processed by a script which turns the chapter
@c tag into a preface tag, and moves this stuff to before the title.
@c Bleah.
@docbook
<prefaceinfo>
<author>
<firstname>Arnold</firstname>
<surname>Robbins</surname>
<affiliation><jobtitle>Nof Ayalon</jobtitle></affiliation>
<affiliation><jobtitle>ISRAEL</jobtitle></affiliation>
</author>
</prefaceinfo>
@end docbook

Several kinds of tasks occur repeatedly
when working with text files.
You might want to extract certain lines and discard the rest.
The @command{awk} utility interprets a special-purpose programming language
that makes it easy to handle simple data-reformatting jobs.

@cindex Brian Kernighan's @command{awk}
The GNU implementation of @command{awk} is called @command{gawk}; if you
invoke it with the proper options or environment variables
(@pxref{Options}), it is fully
compatible with
the POSIX@footnote{The 2008 POSIX standard is accessible online at
@w{@url{http://www.opengroup.org/onlinepubs/9699919799/}.}}
specification of the @command{awk} language
and with the Unix version of @command{awk} maintained
by Brian Kernighan.
@cindex @command{awk}, uses for
Using @command{awk} allows you to:

@itemize @value{BULLET}
@item
Manage small, personal databases

@command{gawk}
provides facilities that make it easy to:

@itemize @value{BULLET}
@item
Extract bits and pieces of data for processing

@item
Perform simple network communications

@item
Profile and debug @command{awk} programs.

@item
Extend the language with functions written in C or C++.
@end itemize

This @value{DOCUMENT} teaches you about the @command{awk} language and
different computing environments. This @value{DOCUMENT}, while describing
the @command{awk} language in general, also describes the particular
implementation of @command{awk} called @command{gawk} (which stands for
``GNU @command{awk}''). @command{gawk} runs on a broad range of Unix systems,
ranging from Intel@registeredsymbol{}-architecture PC-based computers
up through large-scale systems.
@command{gawk} has also been ported to Mac OS X,
Microsoft Windows
@ifset FOR_PRINT
(all versions),
@end ifset
@ifclear FOR_PRINT
(all versions) and OS/2 PCs,
@end ifclear
and OpenVMS.
(Some other, obsolete systems to which @command{gawk} was once ported
are no longer supported and the code for those systems
has been removed.)
@cite{TCP/IP Internetworking with @command{gawk}}
(a separate document, available as part of the @command{gawk} distribution).
His code finally became part of the main @command{gawk} distribution
with @command{gawk} @value{PVERSION} 3.1.

John Haque rewrote the @command{gawk} internals, in the process providing
an @command{awk}-level debugger. This version became available as
@command{gawk} @value{PVERSION} 4.0, in 2011.

@xref{Contributors},
for a complete list of those who made important contributions to @command{gawk}.
is often referred to as ``new @command{awk}'' (@command{nawk}).

@cindex @command{awk}, versions of
@cindex @command{nawk} utility
@cindex @command{oawk} utility
For some time after new @command{awk} was introduced, there were
systems with multiple versions of @command{awk}. Some systems had
an @command{awk} utility that implemented the original version of the
@command{awk} language and a @command{nawk} utility for the new version.
Others had an @command{oawk} version for the ``old @command{awk}''
language and plain @command{awk} for the new one. Still others had only
one version, which was usually the new one.

Today, only Solaris systems still use an old @command{awk} for the
default @command{awk} utility. (A more modern @command{awk} lives in
@file{/usr/xpg6/bin} on these systems.) All other modern systems use
some version of new @command{awk}.@footnote{Many of these systems use
@command{gawk} for their @command{awk} implementation!}

It is likely that you already have some version of new @command{awk} on
your system, which is what you should use when running your programs.
(Of course, if you're reading this @value{DOCUMENT}, chances are good
that you have @command{gawk}!)

Throughout this @value{DOCUMENT}, whenever we refer to a language feature
that should be available in any complete implementation of POSIX @command{awk},
This @value{DOCUMENT} explains
both how to write programs in the @command{awk} language and how to
run the @command{awk} utility.
The term ``@command{awk} program'' refers to a program written by you in
the @command{awk} programming language.

@cindex @command{gawk}, @command{awk} and
as defined in the POSIX standard. It does so in the context of the
@command{gawk} implementation. While doing so, it also
attempts to describe important differences between @command{gawk}
1444

and other @command{awk}

1445

@ifclear FOR_PRINT

1446

implementations.@footnote{All such differences

1221

1447

appear in the index under the

1222

1448

entry ``differences in @command{awk} and @command{gawk}.''}

1449

@end ifclear

1450

@ifset FOR_PRINT

1451

implementations.

1452

@end ifset

1223

1453

Finally, any @command{gawk} features that are not in

1224

1454

the POSIX standard for @command{awk} are noted.

This @value{DOCUMENT} has the difficult task of being both a tutorial and a reference.
If you are a novice, feel free to skip over details that seem too complex.
You should also ignore the many cross-references; they are for the
expert user and for the online Info and HTML versions of the @value{DOCUMENT}.
@end ifnotinfo

There are sidebars

This @value{DOCUMENT} is split into several parts, as follows:

@c FULLXREF ON

Part I describes the @command{awk} language and the @command{gawk} program in detail.
It starts with the basics and continues through all of the features of @command{awk}.
It contains the following chapters:
@ref{Dynamic Extensions}, describes how to add new variables and
functions to @command{gawk} by writing extensions in C or C++.

@ifclear FOR_PRINT
Part IV provides the appendices, the Glossary, and two licenses that cover
the @command{gawk} source code and this @value{DOCUMENT}, respectively.
It contains the following appendices:
@end ifclear
@ifset FOR_PRINT
Part IV provides the following appendices:
@end ifset

@ref{Language History},
describes how the @command{awk} language has evolved since
in @command{gawk} and where to get other freely
available @command{awk} implementations.

@ifset FOR_PRINT
The version of this @value{DOCUMENT} distributed with @command{gawk}
contains additional appendices and other end material.
To save space, we have omitted them from the
printed edition. You may find them online, as follows:

@uref{http://www.gnu.org/software/gawk/manual/html_node/Notes.html,
The appendix on implementation notes}
describes how to disable @command{gawk}'s extensions, as
well as how to contribute new code to @command{gawk},
and some possible future directions for @command{gawk} development.

@uref{http://www.gnu.org/software/gawk/manual/html_node/Basic-Concepts.html,
The appendix on basic concepts}
provides some very cursory background material for those who
are completely unfamiliar with computer programming.

@uref{http://www.gnu.org/software/gawk/manual/html_node/Glossary.html,
The Glossary}
defines most, if not all, of the significant terms used
throughout the @value{DOCUMENT}. If you find terms that you aren't familiar with,
try looking them up here.

@uref{http://www.gnu.org/software/gawk/manual/html_node/Copying.html, The GNU GPL} and
@uref{http://www.gnu.org/software/gawk/manual/html_node/GNU-Free-Documentation-License.html, the GNU FDL}
are the licenses that cover the @command{gawk} source code
and this @value{DOCUMENT}, respectively.
@end ifset

@ifclear FOR_PRINT
@ref{Notes},
describes how to disable @command{gawk}'s extensions, as
well as how to contribute new code to @command{gawk},
are completely unfamiliar with computer programming.

The @ref{Glossary}, defines most, if not all, of the significant terms used
throughout the @value{DOCUMENT}. If you find terms that you aren't familiar with,
try looking them up here.

@ref{Copying}, and
@ref{GNU Free Documentation License},
present the licenses that cover the @command{gawk} source code
and this @value{DOCUMENT}, respectively.
@end ifclear

@c FULLXREF OFF

@node Conventions
@unnumberedsec Typographical Conventions
strongly, it is done @strong{like this}. The first occurrence of
a new term is usually its @dfn{definition} and appears in the same
font as the previous occurrence of ``definition'' in this sentence.
Finally, @value{FN}s are indicated like this: @file{/path/to/ourfile}.
@end ifnotinfo

Characters that you type at the keyboard look @kbd{like this}. In particular,
@ifnottex
``(d.c.)''.
@end ifnottex
@ifclear FOR_PRINT
They also appear in the index under the heading ``dark corner.''
@end ifclear

As noted by the opening quote, though, any coverage of dark corners is,
by definition, incomplete.

Extensions to the standard @command{awk} language that are supported by
more than one @command{awk} implementation are marked
@ifclear FOR_PRINT
``@value{COMMONEXT},'' and listed in the index under ``common extensions''
and ``extensions, common.''
@end ifclear
@ifset FOR_PRINT
``@value{COMMONEXT}.''
@end ifset

@node Manual History
@unnumberedsec The GNU Project and This Book
computing environment.
The FSF uses the ``GNU General Public License'' (GPL) to ensure that
its software's
source code is always available to the end user.
@ifclear FOR_PRINT
A copy of the GPL is included
@ifnotinfo
in this @value{DOCUMENT}
@end ifnotinfo
for your reference
(@pxref{Copying}).
@end ifclear
The GPL applies to the C language source code for @command{gawk}.
To find out more about the FSF and the GNU Project online,
see @uref{http://www.gnu.org, the GNU Project's home page}.
system for Intel@registeredsymbol{},
Power Architecture,
Sun SPARC, IBM S/390, and other
@ifclear FOR_PRINT
systems.@footnote{The terminology ``GNU/Linux'' is explained
in the @ref{Glossary}.}
@end ifclear
@ifset FOR_PRINT
systems.
@end ifset
Many GNU/Linux distributions are
available for download from the Internet.

information in it is free to anyone. The machine-readable
source code for the @value{DOCUMENT} comes with @command{gawk}; anyone
may take this @value{DOCUMENT} to a copying machine and make as many
copies as they like.
@ifclear FOR_PRINT
(Take a moment to check the Free Documentation
License in @ref{GNU Free Documentation License}.)
@end ifclear
@end ifnotinfo

@ignore
@cindex Close, Diane
The @value{DOCUMENT} itself has gone through several previous,
preliminary editions.
Paul Rubin wrote the very first draft of @cite{The GAWK Manual};
it was around 40 pages in size.
Diane Close and Richard Stallman improved it, yielding the
version which I started working with in the fall of 1988.
It was around 90 pages long and barely described the original, ``old''
version of @command{awk}. After substantial revision, the first version of
@cite{The GAWK Manual} to be released was Edition 0.11 Beta in
October of 1989. The manual then underwent more substantial revision
for Edition 0.13 of December 1991.
David Trueman, Pat Rankin and Michal Jaegermann contributed sections
of the manual for Edition 0.13.
That edition was published by the
FSF as a bound book early in 1992. Since then there were several
minor revisions, notably Edition 0.14 of November 1992 that was published
by the FSF in January of 1993 and Edition 0.16 of August 1993.

Edition 1.0 of @cite{GAWK: The GNU Awk User's Guide} represented a significant re-working
of @cite{The GAWK Manual}, with much additional material.
The FSF and I agreed that I was now the primary author.
@c I also felt that the manual needed a more descriptive title.

In January 1996, SSC published Edition 1.0 under the title @cite{Effective AWK Programming}.
In February 1997, they published Edition 1.0.3 which had minor changes
as a ``second edition.''
In 1999, the FSF published this same version as Edition 2
of @cite{GAWK: The GNU Awk User's Guide}.

Edition @value{EDITION} maintains the basic structure of Edition 1.0,
but with significant additional material, reflecting the host of new features
in @command{gawk} version @value{VERSION}.
Of particular note is
@ref{Array Sorting},
@ref{Bitwise Functions},
@ref{Internationalization},
@ref{Advanced Features},
and
@ref{Dynamic Extensions}.
@end ignore

@cindex Close, Diane
The @value{DOCUMENT} itself has gone through a number of previous editions.
Paul Rubin wrote the very first draft of @cite{The GAWK Manual};
In 1996, Edition 1.0 was released with @command{gawk} 3.0.0.
The FSF published the first two editions under
the title @cite{The GNU Awk User's Guide}.
@ifset FOR_PRINT
SSC published two editions of the @value{DOCUMENT} under the
title @cite{Effective awk Programming}, and O'Reilly published
the third edition in 2001.
@end ifset

This edition maintains the basic structure of the previous editions.
For FSF edition 4.0, the content has been thoroughly reviewed
and updated. All references to @command{gawk} versions prior to 4.0 have been
removed.
Of significant note for that edition was the addition of @ref{Debugger}.

For FSF edition
@ifclear FOR_PRINT
@value{EDITION},
@end ifclear
@ifset FOR_PRINT
@value{EDITION}
(the fourth edition as published by O'Reilly),
@end ifset
the content has been reorganized into parts,
and the major new additions are @ref{Arbitrary Precision Arithmetic},
and @ref{Dynamic Extensions}.

This @value{DOCUMENT} will undoubtedly continue to evolve. An electronic
version comes with the @command{gawk} distribution from the FSF. If you
find an error in this @value{DOCUMENT}, please report it! @xref{Bugs},
for information on submitting problem reports electronically.

@ifset FOR_PRINT
@c fakenode --- for prepinfo
@unnumberedsec How to Stay Current

It may be that you have a version of @command{gawk} that is newer than the
one described in this @value{DOCUMENT}. To find out what has changed,
you should first look at the @file{NEWS} file in the @command{gawk}
distribution, which provides a high-level summary of what changed in
each release.

You can then look at the @uref{http://www.gnu.org/software/gawk/manual/,
online version} of this @value{DOCUMENT} to read about any new features.
@end ifset

@ifclear FOR_PRINT
@node How To Contribute
@unnumberedsec How to Contribute

contributed code: the archive did not grow and the domain went unused
for several years.

Late in 2008, a volunteer took on the task of setting up
an @command{awk}-related web site---@uref{http://awk.info}---and did a very
nice job.

of the world, please see @uref{http://awk.info/?contribute} for how to
contribute it to the web site.

As of this writing, this website is in search of a maintainer; please
contact me if you are interested.

@ignore