The AWK Programming Language, Second Edition

Updated Mon Feb 5 10:22:02 EST 2024


Available in paperback and e-book formats. Order at Amazon and other fine booksellers.

Introduction

This page holds material related to the second edition of The AWK Programming Language. The first edition was written by Al Aho, Brian Kernighan and Peter Weinberger in 1988. Awk has evolved since then, there are multiple implementations, and of course the computing world has changed enormously. The new edition of the Awk book reflects some of those changes.

The book is now available on paper and electronically. We are continuing to add material that we hope will be of interest -- historical documents, bits of code, and occasional essays on Awk and related topics.

The table of contents and preface of the new edition are here.

Programs and data files are now available, though not in a very orderly form. Download programs.tar (33 MB).

Contact us at info@awk.dev.

Errata

(These are listed in page number order.)

Sep 20, 2023, page 3, line -5:

The input line Kathy 15.50 10 should be in italics. Thanks to Galen Menzel for spotting this error.

Oct 6, 2023, page 9, line -2:

It should say "$2 is less than 20 ..." to match the code. Thanks to Kevin Lo for spotting this error.

Oct 10, 2023, page 24:

The test at the top of the page should be

NR == 10 { exit }
Otherwise the code prints 11 lines, not 10. Thanks to Mark Konezny.
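Where the exit test sits relative to the printing is what matters here. The following is a minimal sketch. It assumes the program prints each line with a `{ print }` rule that comes before the test, and the wrapper name head10 is hypothetical. Because `exit` is reached only after line 10 has been printed, exactly 10 lines come out.

```shell
# head10: hypothetical wrapper that copies the first 10 lines of its input.
# Assumes the exit test follows a { print } rule, as in a head-like program.
head10() {
  awk '
  { print }            # print every line that is read
  NR == 10 { exit }    # stop right after the 10th line has been printed
  '
}

awk 'BEGIN { for (i = 1; i <= 20; i++) print i }' | head10   # prints 1 through 10
```

With a test such as `NR > 10` in the same position, line 11 would already have been printed before the test fired. That is one way to end up with 11 lines.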

Sep 16, 2023, pages 26 and 27:

The streaming version of mc on page 26 prints output in 7 columns, not 5, since the loop starts with n set to zero and ends with n = 6. Here's a better version:

{ out = sprintf("%s%-10.10s  ", out, $0)
  if (++n >= 5) {
    print substr(out, 1, length(out)-2)
    out = ""
    n = 0
  }
}
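The replacement rule prints a row only after five items have accumulated. A final partial row therefore has to be flushed when input ends. Below is a minimal self-contained sketch. The END block is an assumption, since the book's version may already have one, and the wrapper name mc_stream is hypothetical.

```shell
# mc_stream: hypothetical wrapper around the corrected streaming mc.
# Each item is padded or truncated to 10 characters; columns are separated by 2 spaces.
mc_stream() {
  awk '
  { out = sprintf("%s%-10.10s  ", out, $0)
    if (++n >= 5) {                          # a full row of 5 columns
      print substr(out, 1, length(out)-2)    # drop the trailing gutter
      out = ""
      n = 0
    }
  }
  END {                                      # assumed: flush a final partial row
    if (n > 0)
      print substr(out, 1, length(out)-2)
  }'
}

printf '%s\n' 1 2 3 4 5 6 7 | mc_stream   # two rows: 1..5, then 6 and 7
```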

The second version of mc on page 27 has a different problem: it doesn't include the two spaces between columns when computing the number of columns, so the result can be too high. It is probably most easily fixed like this:

  ncol = int(60 / (max+2) + 0.5)  # int(x) returns integer value of x
Many thanks to 郭济琳 (Jilin Guo) for spotting these errors.
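A small worked comparison, assuming the original line computed `int(60 / max + 0.5)`. The helper name ncols is hypothetical. With max = 10, the old formula gives 6 columns, which with two-space gutters needs 6 × 12 − 2 = 70 characters, more than the 60 available. The corrected formula gives 5 columns, or 58 characters.

```shell
# ncols: hypothetical helper comparing the old and corrected column counts
# for a 60-character line, given "max", the length of the longest item.
ncols() {
  awk -v max="$1" 'BEGIN {
    old = int(60 / max + 0.5)        # assumed original: ignores the two-space gutter
    new = int(60 / (max+2) + 0.5)    # corrected: each column takes max+2 characters
    print old, new
  }'
}

for max in 8 10 12; do
  echo "max=$max: $(ncols "$max")"
done
```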

Oct 8, 2023, page 32, line 9:

The expression s = s $n++ " " in the function rest works in Awk but not in Gawk. The problem is an ambiguity in resolving the precedence of the prefix $ and the postfix ++. It is fixed by adding parentheses:

s = s $(n++) " "
The same construction appears near the middle of the page and is fixed in the same way. Thanks to 郭济琳 (Jilin Guo).
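Here is a minimal sketch of the parenthesized form in context. The body of rest is a hypothetical reconstruction, and the book's version may differ in detail: it concatenates fields n through NF, each followed by a space. With $(n++), $ applies to the current value of n, and n is then incremented. Gawk instead reads $n++ as ($n)++, which increments the field rather than n.

```shell
# rest_fields: hypothetical wrapper around a rest(n) function that
# returns fields n..NF, each followed by a single space.
rest_fields() {
  awk '
  function rest(n,   s) {   # s is a local variable
    s = ""
    while (n <= NF)
      s = s $(n++) " "      # parentheses: take field n, then increment n
    return s
  }
  { print rest(2) }
  '
}

echo "alpha beta gamma delta" | rest_fields   # prints "beta gamma delta " (trailing space)
```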

Oct 8, 2023, page 33, line -10:

The split function is better written as

split(date, d)
to properly handle single-digit days that might be preceded by two spaces instead of one. Alternatively, the third argument could be / +/. Thanks to 郭济琳 (Jilin Guo).
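A date whose day has one digit shows the difference. This sketch assumes date holds a string in the default output format of the Unix date command, which pads such a day with an extra space. The wrapper name split_demo is hypothetical. A one-space regular expression produces an empty element between the two spaces, while default splitting and / +/ do not.

```shell
# split_demo: hypothetical wrapper; the input mimics date(1) output,
# where a single-digit day is padded with an extra space ("Feb  5").
split_demo() {
  echo 'Mon Feb  5 10:22:02 EST 2024' | awk '{
    n1 = split($0, a, / /)    # one-space regex: the double space yields an empty element
    n2 = split($0, b)         # default splitting: runs of blanks act as one separator
    n3 = split($0, c, / +/)   # one or more spaces
    print n1, n2, n3, "[" a[3] "]", b[3], c[3]
  }'
}

split_demo   # prints: 7 6 6 [] 5 5
```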

Nov 4, 2023, page 42, line -10:

The test should be $5 < 0.5 to match the text, which says "What about beer with less than, say, 0.5%?" Thanks to 郭济琳 (Jilin Guo).

Nov 21, 2023, page 88:

The derivation at the bottom of the page is missing a couple of intermediate states. It should read

Sentence -> Nounphrase Verbphrase
         -> the girl Verbphrase
         -> the girl Verb Modlist Adverb
         -> the girl runs Modlist Adverb
         -> the girl runs very Modlist Adverb
         -> the girl runs very very Modlist Adverb
         -> the girl runs very very Adverb
         -> the girl runs very very quickly
Thanks to Eran Yarkon.

Nov 21, 2023, page 94, line -10:

In

pfx = tolower($0)
gsub(/[^A-Za-z]/, "", pfx)
the RE doesn't need A-Z, since tolower has already converted any upper-case letters in pfx to lower case. Thanks to Eran Yarkon.
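This quick check shows that [^a-z] is enough once tolower has run. The wrapper name letters_only is hypothetical.

```shell
# letters_only: hypothetical wrapper showing that after tolower,
# [^a-z] is enough to strip everything but letters.
letters_only() {
  awk '{
    pfx = tolower($0)          # no upper-case letters remain after this
    gsub(/[^a-z]/, "", pfx)    # delete every character that is not a-z
    print pfx
  }'
}

echo "Hello, World 42" | letters_only   # prints "helloworld"
```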

Nov 27, 2023, page 111, line 8 of section 7.2:

The display should read title caption, not label.

On page 114, the line { ok = 1 } about 12 lines up from the bottom of the page doesn't do anything since ok is set to zero by the next pattern.

On page 115, inside the for loop in the END block, there's no need to test flag again; it's never empty at this point. Thanks to Eran Yarkon for these.

Awk for Exploratory Data Analysis (Sep 21, 2023)

Awk has always been a good tool for taking a quick look at some dataset. How many items of what kind are there? What is the range of numeric values in some field? Are there anomalies in the data, like rows with too many or too few fields?

A new chapter in the book talks about using Awk for this kind of analysis, using a couple of datasets. But there are plenty of other examples as well. BWK co-taught a course in the Humanities sequence at Princeton where Awk was taught to some very non-technical students as a tool for looking at some neat data about English poetry.

This essay describes some of what went on there; it might give you some ideas about how Awk can be used in a different domain.

Interesting Threads

Ben Hoyt, author of GoAwk and one of the expert technical reviewers of the second edition, has an interesting blog post on an implementation of the Unix make command in 60 lines of Awk, along with a Python version for comparison. One wouldn't make make in Awk, as Ben notes, but it's a good vehicle for learning how something works. (Sep 21, 2023)

There's a Hacker News thread on Ben's original post here, with some interesting comments.

Awk Source and Documentation

Awk source is maintained at https://github.com/onetrueawk/awk.

Gawk releases are at https://ftp.gnu.org/gnu/gawk; the Gawk manual is here.

Arnold Robbins has compiled a list of other implementations of Awk.

Historical Documents

The citations in the original Awk book have by now become quite dusty, but some of the material is still interesting and potentially useful. Here are references to some of the documents, perhaps updated.

Autre temps, autres Awks

  • There is a room named "Awk" in the Ole-Johan Dahl informatics building at the University of Oslo. Dahl and Kristen Nygaard received the ACM Turing Award in 2001 for the creation of Simula 67 and the development of object-oriented programming. Many of the rooms in the building are named after programming languages; Awk is in good company. (from Hacker News, July 3, 2023)

  • A treasured memory from our late friend and colleague, Dennis Ritchie, posted on local Unix systems in 1986:
    awk.news (dmr) Tue Jul 15 23:47:57 1986
    
       Rosa Miller, a Tlinget [sic] Indian who lives in Juneau, ... is a member
       of the Dipper House of the Dog Salmon Clan of the Raven Moiety of the
       Awk Tribe of the Tlingit (pronounced KLINK-it) Nation....
    
       Mrs. Miller contends that the Awk Tribal Council in Juneau was set up by
       people who were not Awks but, as she calls them, "Johnny come latelies"
       to the area....
    	   New York Times, 7/14, p. A8
    
       CORRECTION
    
       Because of a transmission error, the Alaska Journal yesterday,
       from Anchorage, misidentified an Indian tribe.  It is the Auk, not Awk.
    	   New York Times, 7/15, p. B1
    

  • From the Oxford English Dictionary (with thanks to Nelson Beebe):
    Awk (adj, obs;  also awke, auk, awck) [from ON afug, turned the wrong way, back foremost, perverse]
    
    1. Directed the other way or in the wrong direction, back-handed, from the left hand.
    1634:  "With an awke stroke gaue hym a grete wounde."
    
    2. Untoward, froward, perverse, in nature or disposition.
    1642:  "Our natures more crooked, inconstante, awk, and perverse."
    
    3. Out of the way, odd, strange (rare) [fortunately]
    
    4. Untoward to deal with, awkward to use, clumsy.
    
    There are also awkly, awkness, awkward, awkwardish, awkwardly, awkwardness, and awky.
    

  • Nelson also adds:
    In Scotland, upon April Day, they have a custom of ``hunting the gowk
    ...', properly, a cuckoo, and is used here, metaphorically in vulgar
    language, for a fool.  This is done by sending silly people upon
    fools' errands from place to place, by means of a letter in which it
    is written: ``On the first day of April, Hunt the gowk another mile.''
    
    		John Brand's ``Observations on Popular Antiquities, 1813''
    		(c) Jeffrey Kacirk
    

  • This triumph of the advertiser's art didn't last very long, unfortunately. We are grateful to A&W Restaurants for their support.

  • A cartoon from Russell Myers' Broom Hilda in June 2018:


  • Ozan Yigit spotted this artwork on a Danish TV show and identified it with Google image search: A lithographic art print by Gérard Gasiorowski from Galerie Maeght in Paris for an exhibition in 1982.

  • An old but good cartoon from User Friendly, by J. D. "Illiad" Frazer:

    Do you have one to add? Send it along! Mail to info@awk.dev.