Name: Sed & Awk, Programmation avancée
Rating: 3.86 (30 reviews)

Sed & Awk, Programmation avancée

Dale Dougherty

At a time when UNIX-like systems are strengthening their presence in every area of computing, administrators and programmers, as well as occasional users, have to process a growing number of heterogeneous files.
Whether the job is formatting log reports or automatically generating statistical tables, preparing files for later processing is often a repetitive task that is far more economical to hand over to a program or a simple script.
sed and awk are meant for exactly that. With them, a script of a few lines can save you hours of work.
In this book you will find not only complete references for both languages, but also a multitude of concrete examples and case studies. In addition to basic concepts such as Unix "regular expressions", you will learn techniques that few programmers fully master:
With sed:
- Modifying several files at once
- Making sensible use of substitution commands
- Multi-line buffers
- Flow-control commands
With awk:
- Passing parameters to a script
- Multidimensional arrays
- Writing user-defined functions
- The getline function
Chapters 11 (Full-Scale Applications) and 12 (An Anthology of Scripts) show how to use all of these subtleties to create genuinely efficient and useful tools. You can draw on these detailed scripts to write your own tools, adapted to your needs.

Genres: Programming, Reference, Computer Science, Technical, Computers, Technology, Nonfiction

405 pages, Paperback

First published November 8, 1990

97 people are currently reading

650 people want to read

About the author

Dale Dougherty

34 books, 6 followers

Community Reviews

5 stars: 169 (26%)
4 stars: 256 (40%)
3 stars: 156 (24%)
2 stars: 39 (6%)
1 star: 8 (1%)

Christopher

1,390 reviews, 207 followers

October 15, 2020

This guide to two of the most commonly used and powerful *nix command-line tools was written back in the 1990s, but it has remained in print (at least as an ebook) simply because these tools have remained broadly the same for decades now. However, resources for learning sed and awk have evolved over the years and, faced with that competition, this book is no longer very important.

Thus, while this book is a convenient introduction to sed, there isn’t much here that won’t be readily covered in free introductions on the web. With regard to awk, Arnold Robbins’s GAWK: Effective AWK Programming (a revised and expanded version of this old book) is freely provided along with GNU Awk as a PDF and covers some significant new features that have appeared in GNU Awk in recent years. So, don’t go out of your way to find this old O’Reilly manual.

Darin

120 reviews, 18 followers

February 11, 2021

I really don't care for this book. It's probably OK if I wanted to read the entire thing to learn to use sed and awk, but it makes a poor reference. Virtually every time I have turned to this book to learn how to do something in sed or awk, I was left disappointed. The information may be there, but I couldn't find it by browsing the book.

computer-science own unix-linux

Jerry

Author of 10 books, 27 followers

November 11, 2021

In a footnote to I wrote, “I have the same problem with awk. For me it’s usually easier in the long run to write a simple Perl script to handle the cases awk is great at.” The “same problem” was that the file command “is an arcane command which is not amenable to obviousness. It produces long command lines with obscure switches that require, for me at least, reading through the man page every time I need to alter it.”

Two days later, I saw this book at the New Braunfels Library book sale. It seemed too serendipitous to pass up a tutorial on a tool I’d just complained about not understanding.

While the tutorial in general was worthwhile, it was maddeningly untested. And it was not a good advertisement for the maintainability of these tools. Throughout the book the author talks about using sed and awk for publishing, including a full tool for creating and maintaining an index. Just now, I wanted to verify what I’d written about branching; the index says branching is on page 23. There is nothing there about branching (and page 23 is far too early to discuss one of the more advanced features of sed). Thinking, okay, one typo is not a big deal, the other end of branching is the label the branch goes to, so I’ll look up “label”. That also is indexed as page 23, and of course is not there. (It’s on page 132.)

The other feature I wanted to look up for this thought was the “hold” command. It’s indexed as on page 14. Unsurprisingly, page 14 is also far too early for an advanced topic. (It’s on page 123.)

The examples have a tendency to be contrived snippets. It was usually easier to write my own test scripts using the concepts just introduced than to try and decipher the examples. Partly this is just the nature of sed, but it’s also the nature of contrived snippets: without code that’s meant to be run, mistakes are easily missed. For example, the author uses grep ' book.* ' bookwords as an example, and it matches a line such as booky. This means that either that line had a space at the end of it that was not specified as part of the line, or the grep command run was not the one specified.

Then there’s:

grep '["[{(]*book[]})\"?:'s]* ' bookwords

Which probably was not run, because had it been the author would have recognized that some of the characters need to be escaped. In fact, I was unable to get it to work at all, because sh doesn’t appear to allow escaping the single quote character.
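The quoting problem described here comes from how sh treats single quotes: nothing inside them can be escaped, a backslash included. A common workaround, shown as a minimal sketch with made-up sample words rather than the book's bookwords file, is to close the quoted string, add an escaped quote, and reopen it:

```shell
# Hypothetical sample input; the book's bookwords file is not reproduced here.
words="book's
books
notebook"

# 'book'\''s' is three pieces glued together: 'book', an escaped quote, and 's'.
# grep therefore receives the pattern: book's
matched=$(printf '%s\n' "$words" | grep 'book'\''s')

printf '%s\n' "$matched"
```

Only the line containing the apostrophe is selected. The same trick should carry over to a longer bracket expression, though that has not been tried against the book's exact pattern.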

Other text implies that he’s using sh for the examples, though he never says explicitly that this is the case. This might be because sh was the only real option at the time. The book was published in 1991. File names don’t use extensions to highlight the type of file. Since all files are text files, the extension is used to designate some subtype of what the file is for, such as “region.north”.

All of the database-like examples involve text-based databases, my favorite kind. At the time this was written, there was no free or even inexpensive SQL-based database—mSQL wouldn’t come until 1994, and MySQL until 1995.

Another fascinating relic of the elder times is the lack of memory in some systems for relatively simple scripts. In one footnote, he writes:

I have found a couple of circumstances where larger scripts failed, reporting syntax errors that I could not track down. Removing all the comments made the program run fine. I concur [sic] from this that the size of the program was the problem, which seemed to be a machine-specific limitation of awk. The program, comments included, ran fine on two other systems.

As a tutorial, this was useful, but it has a very specific audience that already knows much about what he’s talking about. In some ways I’m part of that audience—I do a lot of programming already, especially shell programming. But temporally, I’m not. A lot has changed since 1991, so that when he introduces the idea of using sed as a shell script, but doesn’t mention the shebang, the execution bit, or even prefacing it with sh to run it, I don’t know if that’s because some of these weren’t necessary.

He’s also sometimes lazy with his language, such as using the term “global” to mean both affecting every line, and affecting every instance of a regex target within a line. To illustrate the latter meaning, he wrote that s/CA/California “will change every “CA” into “California”. But it won’t. It will change the first instance of “CA” on every line to “California”. If there are multiple instances, only the first will be changed. To make it more confusing, one paragraph down he switched to the global (/g) form without mentioning the switch.
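As a rough illustration of the difference between the two forms (the sample line is made up for this sketch, not taken from the book):

```shell
# Hypothetical sample line with two occurrences of "CA".
line='CA and CA'

# Without the g flag, only the first match on each line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/CA/California/')

# With the g flag, every match on the line is replaced.
every_match=$(printf '%s\n' "$line" | sed 's/CA/California/g')

printf '%s\n%s\n' "$first_only" "$every_match"
```

The first command leaves the second "CA" alone; the second rewrites both.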

Similarly, the example of running the line “4T” through /[0-9]+$/ would not, as the book says, tell us that’s an integer, because the dollar sign at the end means that while the line could start with text and still be an integer, it can’t end with text and still be an integer.
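A quick way to check the anchor behavior, sketched with grep -E rather than awk (the helper name `matches` and the sample strings are illustrative, not from the book):

```shell
# Succeeds if the extended regular expression in $1 matches the string in $2.
matches() {
    printf '%s\n' "$2" | grep -Eq "$1"
}

for s in 4T T4 42; do
    # Anchored only at the end: digits must be the last thing on the line.
    if matches '[0-9]+$' "$s"; then
        echo "$s: matches [0-9]+\$"
    else
        echo "$s: does not match [0-9]+\$"
    fi
done
```

With the trailing anchor alone, "T4" and "42" match and "4T" does not. Anchoring both ends, as in `^[0-9]+$`, would be one way to accept only lines made up entirely of digits.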

These sorts of things are why, in , I wrote a script to verify that scripts in the text of the book matched the script I actually use on the command line. Any deviation required that I both fix the script in the book and recreate any sample output.

His examples tend to be of very narrow interest, even for when he wrote the book; they are only moreso now. A new edition was published six years later in 1997, which is far closer to the first edition’s publication year than it is to 2021. I often got a distinct BASIC vibe while reading and using the code examples. Some things about it are very rooted in the past, such as the heavily abbreviated and even single-character variable names, filenames without extensions, and the complete lack of consistency about when and how to indent lines. The latter is made worse by the necessity of often writing the entire awk or sed script as a string inside of a shell script.

Sed (as it appeared in 1991) isn’t really designed as a programming tool; it’s more of a filter with some programming features added on. It appears to work best as a series of regular expression-based edits. Programming it is a lot like programming in assembly language, with hold spaces and work spaces and exchanging between the two as if they’re registers. Flow control is handled with GOTO-like branches to labels.

Altering the flow of control makes a script much more difficult to read and understand. In fact, the scripts may be easier to write than they are to read. When you are writing a difficult script, you have the benefit of testing it to see how and why commands work.

This book convinced me that I probably never want to use sed except for very simple sequential edits. Anything else will be more maintainable in Perl. For that matter, even simple sequential edits will probably be more maintainable in Perl if it’s a script; sed’s benefit is the ability to do them on the command line.

One of the weirdest things reading this is that despite coming of age in an era when paragraphs were broken into multiple lines, sed doesn’t have any built-in way of automatically handling paragraphs. This is probably because there was no markdown equivalent at the time making paragraphs a standard format, but it makes for very convoluted scripts in his examples.

This oddly makes sed more useful nowadays when paragraphs and lines can be the same thing, either by having them be the same in the source text or by running them through a markdown unwrapper before running them through sed.

Placing multiple commands on the same line is highly discouraged because sed scripts are difficult enough to read even when each command is written on its own line.

Another oddity of sed is that the hash tag was originally used not for comments but for an option: if the first two characters of a script are “#n” it turns off output. Originally, sed only allowed hash-tag lines on the first line; I suspect that because sed looked for “#n” and then ignored everything else on the line, people naturally used everything else for comments. Fortunately, modern sed supports comments throughout the file, but you still have to make sure that the first two characters are not “#n” unless you want to disable output.
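The "#n" behavior can be seen with a two-line script file. This is a minimal sketch; the temporary file, the sample input, and the use of mktemp are assumptions made for the illustration:

```shell
# Write a sed script whose first line is exactly "#n".
script=$(mktemp)
printf '%s\n' '#n' '/b/p' > "$script"

# Default output is suppressed, so only the explicitly printed line appears.
printed=$(printf '%s\n' a b c | sed -f "$script")

rm -f "$script"
printf '%s\n' "$printed"
```

Without the first line, the same script would print every input line and print the matching line twice.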

Awk is much more useful as a programming language, although it’s still hobbled (at least as it appeared in 1991) by the necessity of often having to create it as a string in a shell script. This makes code formatting definitions in modern text editors very difficult, if not useless.

At the time it had some very advanced features. Awk apparently although by 1991 when this book was published the author was out of date when he said that “Associative arrays are a distinctive feature of awk”. Both Perl (1987) and Python (1990) had hash arrays and dicts by then. Given that this was before the Internet was commonly used to update installs, however, there likely were a lot of older versions of those languages on production servers. What really made awk distinctive was that it didn’t have indexed arrays: all arrays in awk were associative. PHP currently has similar behavior, but PHP wasn’t even in development: it started as a CGI tool for web servers, which didn’t exist in 1991.

The author discusses awk, nawk, and gawk; nowadays, it looks as though the features that were unique to gawk and nawk in 1991 have been merged into nawk (which is called “awk” on my iMac).

At the time he wrote this, the length function apparently only provided the length of strings (variables in awk are both string and numeric, somewhat like PHP depending on the context in which they’re used); it appears now to return the number of elements in an array as well.

Perhaps the most debilitating relic of the times that afflicts awk is its lack of local variables in functions. Non-global variables can be hacked in functions, but the hack is to add every variable to the parameter list of the function they’re used in. I suppose this encourages programmers to keep the number of variables in any function low! But I have a tendency, in these times of overabundant memory, toward proliferating variables so that the name reflects the use of the variable at that point. This is obviously not optimal when all variables in a function must be declared in the parameter list.
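The parameter-list hack looks roughly like this. It is a minimal sketch: the function name `sum_to` and its variables are invented for the illustration, and the extra spaces before the local names are only a convention, not syntax:

```shell
result=$(awk '
# Names after the extra spaces (total, i) are never passed by callers,
# so they act as variables local to sum_to().
function sum_to(n,    total, i) {
    for (i = 1; i <= n; i++)
        total += i
    return total
}
BEGIN {
    i = 99                  # global i, untouched by the function
    print sum_to(4), i
}')

printf '%s\n' "$result"
```

The global `i` keeps its value after the call because the function works on its own copy.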

For all its problems, the book does a good job of demystifying sed and awk, and may be worth it for that. While I’m unlikely to use sed as a scripting language (as opposed to as a command-line filter in pipes), awk is almost useful. Its biggest problem is the regular necessity of embedding it in a “real” shell scripting language such as sh. If I’m going to do that, I might as well embed it in Perl, and if I’m going to embed it in Perl, I might as well write it in Perl.

In some ways, writing a script is like devising a hypothesis, given a certain set of facts. You try to prove the validity of the hypothesis by increasing the amount of data that you test it against.

Sean

3 reviews

March 3, 2019

This book helped me gain a better understanding of sed, awk, grep, printf, and Regular Expressions.

Although this book is quite comprehensive, it reminded me of the Manning "* In Action" books. All of the material in the book is covered through a series of examples. I was able to get a good grasp on the concepts and features by following along in the terminal.

One thing that didn't work for me was that many of the examples, especially in the sed section, involved "troff macros." I have never encountered these 'in-the-wild', so those examples weren't very effective.

Overall, I would recommend this book to anyone who finds themselves repeatedly Googling things like, "bash remove xyz after match", or "bash print only 4th field after pattern", etc. After reading this book there are still times when I need to search for those types of things, but my ability to interpret and understand the Stack Overflow answers is dramatically improved.

Even if I will never write anything I am comfortable calling a "program" in AWK, at least I no longer need to keep scrolling anytime I see a Stack Overflow answer that spans more than one line.

career

Hugh Rawlinson

26 reviews, 6 followers

August 23, 2020

This is an interesting historical summary of two tools that I hope you get to mostly avoid in your career. Using their base functionality (or whatever portion of them that you care to memorise) as an individual in your own terminal could be a good way to speed up your own tasks. In my experience as a senior software developer in 2020, these are not appropriate tools to use in a team setting, and this book helped me understand why.

Of course, the tools the book describes are unrelated to my rating! I found the book to be very informative, but quite dry reading. Additionally, it felt as if the authors were presenting the tools as if they were perfectly usable and ergonomic, and made no commentary on the idea that at one point they served a common need well but changes to computing over the decades have rendered them less than optimal from an ease of use perspective. Indeed, the authors seemed quite content with some of the more esoteric incantations.

Good book, four stars.

read-about-working-as-a-programmer

Marshall

421 reviews, 83 followers

March 25, 2017

This is a pretty good overview of the sed and awk UNIX tools. I say tools because that's what people call them, but they're actually programming languages, especially awk. I often underestimate awk, but for line-by-line file I/O, it's practically as capable as Perl, but it's faster, so much simpler, and standardized. I don't even use Perl anymore. People often use it when they should really be using a general purpose language like C or a scripting language like Bash.

This book is a bit dated. Like 20 years old dated. It's still quite good. awk and sed haven't changed, so neither has the book. But the examples are definitely old. A lot of them are for parsing troff, almost to the point that you'd think that's all awk and sed are good for. The problem is, nobody uses troff anymore. And troff is so cryptic that it makes the examples unnecessarily hard to understand.

But I still learned some new things from it, and it helped me fix some of my biggest sticking points with awk.

non-fiction technology

Bernie4444

2,462 reviews, 12 followers

December 28, 2022

This book has saved my bacon

I am here to tell you that on more than one occasion this book has saved my bacon. Several times in different environments I needed to use sed to correct data in flat files. Once I used sed to change the format and numbers when we wanted to match accounting numbers to a different system.

I haven't used that much awk. However, there has been an occasion to transfer awk programs from one UNIX to a different UNIX flavor. I found that the regular awk in this book was newer than the nawk on the other system. I still keep the book handy in case I get squeezed for time and have to manipulate files. However, I am learning more and more to appreciate PERL on those occasions.

Nathaniel Inman

42 reviews, 2 followers

August 3, 2022

Dale and Arnold show how sed and awk aren't just simple tools but powerful languages in and of themselves. They describe the historical context for sed and awk's creation, and provide simple to thorough examples of their use and inherent power. Unfortunately the landscape of software and use-cases sed and awk provide has changed thoroughly from when the book was released 20-30 years ago. More applicable uses to fit within a modern stack would have leveled the book up and provided more longevity. My page tabs are on 11, 60, 80, 87, 90, 144, 151, 159, 203, 209, 213, 258 & 260.

Andrew

43 reviews

September 3, 2018

A great book for getting familiar with a pair of UNIX mainstays.

A few parts are a bit outdated (especially references to different AWK implementations), but it is definitely worth a read to any regular Linux/UNIX user.

humblelibrary

Kerszi

227 reviews, 1 follower

November 16, 2019

A fairly old edition of the book, from as far back as 2002. A reissue would be welcome. Still, it is very much up to date. Unfortunately, the Polish version is very hard to find in Poland. I barely managed to find it in a library. It would be nice if Helion reissued it. Even just as an ebook.

Alan Gauld

12 reviews

April 6, 2021

Very dated now and relies too much on troff formatting codes in the examples - not many people use those nowadays! A new edition should update to JSON or HTML or somesuch for the examples.

But the core material is good, especially on sed which has few supporting texts.

Mehar Svln

11 reviews, 1 follower

May 13, 2018

Always knew sed & awk are powerful, but there isn't a good starter guide.

Finally bumped into this one, now I can kick-start my sed & awk voyage.

2018 programming

KerbenII

14 reviews

Want to read

October 12, 2019

mentioned in UNIX and Linux System Administration Handbook (4th edition)

Jeffrey Sung

95 reviews

February 23, 2017

Solid and gets the job done without being flashy, and without being the easiest reference.

Kinda like sed and awk themselves I suppose.

Bernie4444

2,462 reviews, 12 followers

October 19, 2023

This book has saved my bacon.

I am here to tell you that on more than one occasion this book has saved my bacon. Several times in different environments I needed to use sed to correct data in flat files. Once I used sed to change the format and numbers when we wanted to match accounting numbers to a different system.

I have not used that much awk. However, there has been an occasion to transfer awk programs from one UNIX to a different UNIX flavor. I found that the regular awk in this book was newer than the nawk on the other system. I still keep the book handy in case I get squeezed for time and have to manipulate files. However, I am learning more and more to appreciate PERL on those occasions.

Michael

19 reviews

January 21, 2015

I've heard it all, "I could do that using -e in PERL", "Why not formalize the script into compiled code" You know Python Blah blah Ruby blah blah.....The only answer I got is I LIKE AWK! So sue me, I LIKE AWK. It works, it's fun, it's way powerful and it's old school cool. This book is handy BTW. That said I think for the big examples used in this book, I would use C++ or Python. Still, some of my favorite computer moments have been long awk commands that did everything except wash my car. Oh yea, back to the book, it's really not necessary but good to have around and Dougherty really did a decent job. 3.5 stars that I'm rounding up just because I LIKE AWK.

computers non-fiction

Nick Black

Author of 2 books, 871 followers

December 3, 2007

Rarely have I encountered a book so glaring in omissions, so infuriating in its jejune, apish self-congratulation, so replete with galling error. I should have never stolen this from Jonathan Wenger, but should you ever read this, buddy, I'll accept your late thanks. We should've burned this cowpie in the TA lab and played more Online College Jeopardy! GOMBIZ, Dale Dougherty.

It would seem the 2nd Edition has a wholly new author. While the elimination of Dale's cretinous oversight couldn't make things worse, I don't trust anyone who would take this book over.

Chris Maguire

147 reviews, 6 followers

October 17, 2012

sed & awk is an interesting history lesson; it is clear, well-written and interesting. I personally have no current use for either sed or awk, but I now have a very good idea of what they are capable of so I can consider them tools in my programmer tool belt.

If I needed to do some quick and dirty edits then I might use sed or awk, but for anything involved I would use a high-level programming language that I already know such as Clojure or Java.

Rustam

178 reviews

March 16, 2007

It's possible that knowing sed and awk is the most unglamorous skill one can possibly have as a software engineer. Nevertheless, I bought this book and have used sed quite frequently throughout my career, because I do a lot of crude text parsing. It's a good book, and the colophon is even cooler than the O'Reilly vi book. Long live O'Reilly!

technical

Carl Hamlin

28 reviews

July 6, 2008

The book does what it purports to do, that being to lead the reader through the use of sed & awk. It's a very dry read, however, and doesn't really do much to stimulate interest in the tools it's ostensibly promoting.

linux

carl theaker

931 reviews, 52 followers

February 16, 2011

Handy reference used with 'The AWK Programming Language'
as it also contains sed.
Used awk for a lot of statistical analysis on large text files.
Handy once you get used to it.

And I need a reference as 16 minutes after you write this
stuff, you can't tell what it is doing.

non-fiction

Nathan

11 reviews, 2 followers

July 13, 2013

I've been working with UNIX for about 19 years but I've never taken the time to learn sed and I've only worked a bit with awk. This book has helped fill in quite a large gap in my knowledge and skills.

Jerry Cheung

15 reviews, 2 followers

May 31, 2016

This book was a good introduction to what problems are well suited for sed and awk. Dougherty covered the concepts behind line editing, and builds on examples over the chapters. It's tricky to remember the less frequently used features, but the book also serves as a good reference.

Ivan Idris

Author of 15 books, 26 followers

December 26, 2011

Sed and awk are Unix power tools. Actually Awk is more of a programming language. This book is a good tutorial on both tools. Apparently it is one of the most popular books on the subject.