Nixers Book Club - Book #1: The UNIX Programming Environment - Community & Forums Related Discussions
|
|||
So, venam gave the idea on IRC and some users agreed that it would be nice to start a book club. Every week we read a couple chapters from a book and then debrief and discuss them on the thread until it is done, then we move to another book.
I suggest we begin with The UNIX Programming Environment. It's a classic, it's freely available on the Internet Archive, and it was written by Kernighan himself (together with Rob Pike). The book has 9 chapters, an epilog and some appendices (one of them teaching how to use ed, the standard editor). It covers everything from shell scripting to C programming with UNIX system calls. It begins light, but reaches its apex in chapter 8, where we are supposed to use UNIX tools such as yacc(1) to write a language and a calculator (which has not survived as a UNIX tool, but exists on Plan 9 under the name hoc(1)).

The main challenge of the book is to adapt the old UNIX examples and K&R C code to the modern systems we have nowadays. We can discuss how this can be done, what changed, how it changed, and what remains the same. I think it is a good book that, despite being old, is worth reading.

We can do the first two chapters in a week or two (they are short), and then one chapter every two weeks. We can finish the book in two to three months. So, who is in? Let's start the nixers' book club! See you in a week to discuss the first chapters! (I'll bump this thread then.)

Deadlines:
|
|||
|
|||
Awesome idea. So start reading chapters 1 & 2 now and discuss in a week?
|
|||
|
|||
|
|||
This is a great idea. I hope it will force me to read a bit more on the backgrounds of unix. If we don't go too fast, I'm in.
|
|||
|
|||
I'm definitely in! It's a fantastic idea, and I'm especially interested in the opinion of the members of the forums about all the topics brought in the book. See you in a week, I assume next Saturday.
|
|||
|
|||
I'm in. d/led the book and looking forward to reading it.
|
|||
|
|||
It may be a good idea (at least for those on archlinux like me) to install the package group
Code: posix Code: ed |
|||
|
|||
Great book to add to my digital collection! Hopefully I can find the time to follow along.
|
|||
|
|||
It's time to bump this thread.
How was your week? Did you have a good read? Well, here is my review.

The first two chapters introduce most of what we nixers already know. But they introduce an '80s UNIX system, and modern systems are somewhat different from the ones of that time. The book cites a lot of commands that are not commonly used nowadays, or that have no current relative, or that I simply didn't know existed, like pr(1), news(1), talk(1), calendar(1), and units(1), a unit conversion program. I was introduced to most of them while reading. I think the book introduces its topics in a good order.

The book explains why octal rather than hex was chosen as the default notation for dumping binary files. That's why the default binary dump utility, od(1), dumps in octal. But modern systems have hexdump(1), which I think is better than od(1).
Quote:[...] the emphasis on octal numbers is a holdover from the PDP-11, for which octal was the preferred notation. Hexadecimal is better suited for other machines; the -x option tells od(1) to print in hex.

Another thing I learned from the book is that ^D (ctrl-d) does not send or signal EOF; it's the lack of input that "signals" an end of file. So, for example, programs that read from /dev/null get EOF immediately, because reads from /dev/null always return zero bytes. Instead, ^D immediately sends the characters typed so far to the program that is reading from stdin. If I typed nothing, the program reads zero bytes and takes that as the end of the input. So I can type ^D in the middle of an input line for cat(1), and cat(1) will immediately print the line:
Code: $ cat -u

One of the differences between old UNIX and modern UNIX is that you can no longer open(2) or read(2) a directory. For example, the book opens a directory with od(1) to see its contents. This is no longer possible on modern systems.

One thing I didn't know is that anyone can write on your terminal (check out the permissions for /dev/pts/0), but only the owner can read from it. Thus, others can send you messages with talk(1) and write(1). To prevent this, you could chmod the device, or use mesg(1).

Another thing the book explains is the choice of not having file formats as a system feature. The format of a file depends only on how programs interpret it.
Quote:No structure is imposed on a file by the system, and no meaning is attached to its contents (the meaning of the bytes depends only on the programs that interprets the file).

At the end of each chapter there's a list of bibliographical references, which I still have to check.
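To make the /dev/null and od(1) remarks above concrete, here is a small sketch. It assumes a POSIX-like shell and an od(1) that understands the POSIX `-t x1` and `-An` options; the variable names are only for illustration.

```shell
# Reading from /dev/null hits end-of-file at once: the read returns zero bytes.
bytes_read=$(wc -c </dev/null | tr -d ' ')
echo "bytes read from /dev/null: $bytes_read"

# cat has nothing to copy, so its output is empty.
null_output=$(cat </dev/null)
echo "length of cat output: ${#null_output}"

# od -b dumps bytes in octal; "-t x1" asks for hexadecimal bytes instead.
printf 'AB\n' | od -b
printf 'AB\n' | od -t x1

# -An drops the offset column; tr strips the padding so only the digits remain.
hex_dump=$(printf 'AB\n' | od -An -t x1 | tr -d ' \n')
echo "hex bytes: $hex_dump"
```

No special end-of-file byte shows up anywhere in the dump: the three bytes are just "A", "B" and the newline.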
|||
|
|||
I'll give my personal overall perception of the first 2 chapters.
It has a very light tone; the book is really focused on education and practicality. The writing is neither overwhelming nor condescending, and it takes no assumptions.

Chapter 1 is sprinkled with a lot of trivia about how to set up the terminal: from echo and non-echo, to control characters (see man 7 ascii) and what they do, how to configure the terminal properly (like for the TAB, the DELETE/RUBOUT, how to change @ at-sign and # sharp sign for kill and erase), syncing the baud rate, how to pause and continue the stream (ctrl-s ctrl-q), the break key, and much more.

After reading chapter 1 you are left with a deep sense of how much the system was truly multi-user, and that this is where Unix shines. This feeling stays true throughout chapter 2. You can really picture yourself on that system, mainly editing some book or paper, with everyone sharing the machine, having fun "outside of normal working hours" in /usr/games, talking from time to time using the walkie-talkie system of over-and-out with write(1), trying to keep your "privacy" at the same time, and customizing your personalized environment: your shell. If you were new you could use the learn command for computer-aided instruction.

Then it dives into some command line tools, especially common text editing ones.

The author comments about ls uniformity:
Quote:The situation is improving
Nope... it hasn't improved...

Also about directories and `pwd`:
Quote:great directories have lesser directories...

Also, for cat-v people, the authors clearly tutor users to use cat to print the content of a file. Unlike today:

File names were still limited to 14 characters. `/usr` was the directory that contained the normal users of the system, unlike today. I like the way pipes are described as removing intermediary files.

Chapter 2 of the book has more of a comparative tone to it. It focuses on the filesystem and how it differs from other systems of the time. It obviously has to start with the "everything is a file" explanation: how these files contain bytes, how there's a single file type, and how filenames have no particular meaning except to the program reading them, though the meaning of a file can be deduced from its magic number (file(1)).

It also offers an explanation for the octal representation of some values: octal was the preferred notation of the PDP-11, which might have justified a few historical choices. Additionally, it has to explain the newline system, referring to how CR and LF would act on a physical terminal.

Unix has no records, no count, no special code for end-of-file.

cat today doesn't have the `-u` unbuffered flag anymore.

There are a lot of references to disk blocks as a counting mechanism for how much space is left on the system. Today that's still more or less valid, but with SSD and NVMe not so much. The du and df commands, for example, would list the number of disk blocks, each of them being 512 or 1024 bytes.

Interestingly (or not so much), directories were also special files that could be read, but not written to. They had a specific binary format.

After that there's a section about permissions and security: how they affect files and directories differently, the super-user (root), how to encrypt files, the specific file containing accounts and passwords (/etc/passwd), and the set-uid bit (patented by Dennis Ritchie, apparently). And, obviously, removing the "search" permission on /usr/games to turn it off during busy hours.

With that, it moves on to explain inodes (information nodes), with all the different date metadata associated with a file, and how the system internally knows how to access a file from its inode. It also explains the linking mechanism: once no directory entry links to an inode any more, the file is removed.

The file hierarchy of the system was much simpler than today. One thing that caught my attention was how the mount program was under /etc and not /bin.
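The linking mechanism described above can be tried out with a short sketch. It assumes mktemp(1) with `-d` is available (common, but not POSIX), and the file names are made up for the illustration.

```shell
# Hypothetical scratch directory; the names below are for illustration only.
workdir=$(mktemp -d)
echo "some text" >"$workdir/original"

# A hard link is a second directory entry that refers to the same inode.
ln "$workdir/original" "$workdir/alias"
ls -li "$workdir"

# ls -i prints the inode number in front of the name.
inode_original=$(ls -i "$workdir/original" | awk '{print $1}')
inode_alias=$(ls -i "$workdir/alias" | awk '{print $1}')
echo "inodes: $inode_original $inode_alias"

# Removing one name leaves the data reachable through the remaining link.
rm "$workdir/original"
surviving_text=$(cat "$workdir/alias")
echo "still readable through the other name: $surviving_text"

rm -r "$workdir"
```

The link count in the second column of `ls -li` should read 2 while both names exist; the data only goes away when the last name is removed.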
|||
|
|||
Alright, phillbush and venam covered pretty much all the important parts, but here are some quick notes from me:
Funny how some commands such as ls(1) and cat(1) are just as popular these days, while others are rarely seen, e.g. mail(1), news(1) and write(1).

Having to use "o" and "oo" while communicating with write(1) is hilarious. What if we still had to do that on irc?

ls(1) used to output just a single column of text, why was that changed?

I had no idea that nohup(1) automatically called nice(1), but that's really smart.

Funny quotes:
Quote:Try ls /usr/games and do whatever comes naturally. Things might be more fun outside of normal working hours.
Quote:The word "linefeed" is a synonym for newline, so this sequence is often called "CRLF," which is nearly pronounceable.
|||
|
|||
I have a couple of notes as well. I tried to focus on the little details (kind of -- maybe not really).
Chapter 1
---------

One of the more interesting aspects of this chapter is how they describe the computing world of that era. Venam summed it up pretty nicely. This made much more impact on me when I first read it, but that was many years ago and I don’t remember anymore … :(

Quote:If you don’t know what a file is …
Funny, we’ll soon be making this same remark, now that files are no longer really visible on mobile devices. Imagine future kids, who grew up with nothing but a tablet, rediscovering the concept of a file.

Quote:There is, however, an older editor called ed
Cute, even when this book was written, ed was already considered to be “old(er)”. :)

Quote:/usr/mary … /usr/bin
Home directories and special directories thrown together? How odd.

Chapter 2
---------

Quote:Everything in the UNIX system is a file.
Ha, there it is! Take that, Benno Rice! (No, just kidding, he has some valid points in that talk.)

To be honest, I’m not 100% sure what this sentence means in the context of this book. I suspect that their point is more about “everything is a file and a file is just a sequence of bytes”. A few paragraphs later, they tell you that other systems had more abstract concepts like “records”. Meaning, you probably were able to tell the system: “Give me record #5” and the kernel would do that. The kernel, not the program running in userspace! Stuff like that doesn’t exist in UNIX. When you want to get the fifth line, you have to figure it out yourself.

This applies to pipes as well. I’m really not an expert on Windows, but it appears PowerShell works differently these days:
Code: Get-Process notepad,mspaint | Stop-Process
This is supposed to kill all running instances of notepad and paint. The way I understand it, this actually passes an object to the second command “Stop-Process”. Totally different from UNIX, where you explicitly have to do something like this:
Code: pgrep xiate | while read; do kill "$REPLY"; done
This approach certainly makes for a simpler kernel, but it puts the entire burden on userspace programs.

Later in the chapter, they imply that the concept of “everything is a file” is extended to hardware as well. However: All the examples they give are simple character devices or block devices. Hardware like sound cards, where you have to negotiate a sample rate and sample format before you can just dump bytes into the device, is more complicated. This is where the “everything is a file” approach begins to become impractical. Let alone the USB stuff that Benno Rice talks about.

Quote:ctrl-d
A nice explanation of what Ctrl-d does. No, it does not “send an EOF character”, as many people believe. It flushes a buffer. Like phillbush, I originally learned this from this book and have since quoted it a lot. :)

Quote:The x field in the permissions of a directory does not mean execution; it means “search”. Execute permission on a directory determines whether the directory may be searched for a file.
That’s remarkably confusing. When I first read that, I thought --x means you can only list directory contents (so you can search for a given file), but not the file contents. The opposite is true and the following sentences clarify it.

(21-11-2020, 11:00 AM)phillbush Wrote: One of the differences from old UNIX and modern UNIX is that you can no longer open(2) or read(2) a directory.
This is relatively “new”, IIRC. Something like 10 years ago, you could still run cat on a directory on FreeBSD.

(21-11-2020, 11:33 AM)venam Wrote: cat today doesn't have the `-u` unbuffered flag anymore.
It’s still there on OpenBSD: https://github.com/openbsd/src/blob/mast.../cat.c#L86
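On the point that there are no records and you have to find "the fifth line" yourself, a tiny sketch (the sample text and the process IDs are invented for the illustration; nothing is actually signalled):

```shell
# To the system this is only a run of bytes; newlines are ordinary bytes too.
sample_text=$(printf 'one\ntwo\nthree\nfour\nfive\nsix\n')
byte_count=$(printf '%s\n' "$sample_text" | wc -c | tr -d ' ')
echo "bytes: $byte_count"

# A userspace filter counts newlines to find "line 5"; the kernel has no such notion.
fifth_line=$(printf '%s\n' "$sample_text" | sed -n '5p')
echo "fifth line: $fifth_line"

# A pipe carries plain text, so the reader has to parse it line by line.
# The numbers are made-up stand-ins for process IDs.
printf '101\n102\n' | while read -r pid
do
    echo "would signal process $pid"
done
```

This is the same shape as the pgrep loop above, just with the kill replaced by an echo so that it is safe to run.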
|||
|
|||
(21-11-2020, 11:40 AM)s0kx Wrote: Having to use "o" and "oo" while communicating with write(1) is hilarious. What if we still had to do that on irc? Oh thank god we don’t. %) We sometimes use write/wall at work, by the way. It’s rare, but it happens. Especially useful when everybody works from home, because if I send my colleague a note in a chat, they might not instantly read it. But they’ll sure notice a “echo hey hold on, we need to talk about that iptables rule first | wall”. :) Normally, I’d shout across the room. |
|||
|
|||
Nice book, and very approachable. The little details covered in the first two chapters are still pretty relevant (and fun to play with). The hints at principles underlying the system design choices are what I found most interesting. They talk about composability (pipes, fd redirection) and how UNIX builds on the notion of breaking down distinctions that separate data. By choosing not to put up safety rails and limits on access you leave the door open to creative use and new ideas.
I was disappointed to see on my machine (Debian, Linux 5.9.0-1-amd64):
Code: od -cb . #won't read a directory! :(

Quote:universal rule in UNIX ... anywhere you can use an ordinary filename, you can use a pathname

They mention a 14 character limit on filenames, but don't go into why exactly ... These days ext2, ext3, ext4 filesystems all allow 255 byte names.

They talk about nohup and how it automatically calls nice ... Seems this isn't true anymore, at least in gnu coreutils: "This command does not change the niceness (i.e. the priority) of a command."

Trivia:
Quote:^ is an obsolete synonym for the pipe
|||
|
|||
(21-11-2020, 11:33 AM)venam Wrote: If you were new you could use the learn command for computer-aided instruction.
That is something I found amazing, and that unfortunately we no longer have on UNIX systems. But since modern systems are very different from one another, there wouldn't be a single standard learn(1) utility. An interactive learn(1)-like utility would be great to have.

(21-11-2020, 11:33 AM)venam Wrote: It also offers an explanation for the octal representation of some values due to it being the preferred notation of the PDP-11, which might have justified a few historical choices.
I couldn't find out why octal was preferred on the PDP-11. I know that octal is useful for dealing with characters (especially with UTF-8), but I don't know if that was the case for the PDP-11. The POSIX manual for od(1) explains the differences between od(1) implementations and the proposals for changing it. Unfortunately, hexdump(1) (which is way better than od(1), especially with the -C flag) isn't POSIX. It is a BSD extension (but also available on Linux, I think).

(21-11-2020, 11:33 AM)venam Wrote: cat today doesn't have the `-u` unbuffered flag anymore.
OpenBSD's has. But -u is silently ignored in other coreutils implementations.

(21-11-2020, 11:33 AM)venam Wrote: The du and df commands, for example would list the number of disk blocks, each of them being 512 or 1024.
That is still valid on OpenBSD. It uses the environment variable BLOCKSIZE to specify the size of a block.

(21-11-2020, 11:33 AM)venam Wrote: Interestingly, (or not so much), directories were also special files that could be read, but not written to. They had a specific binary format.

(21-11-2020, 11:40 AM)s0kx Wrote: Funny how some commands such as ls(1) and cat(1) are just as popular these days, while others are rarely seen eg. mail(1), news(1) and write(1).
I'm thinking of replacing mutt(1) with mail(1)/mailx(1), but I don't really like the ed(1) interface of those utilities. I have news(1) from plan9port; apparently those utilities have survived into Plan 9. I haven't used it though.

(21-11-2020, 11:40 AM)s0kx Wrote: ls(1) used to output just a single column of text, why was that changed?
And it still outputs just a single column when the output isn't a terminal. Compare
Code: $ ls
Code: $ ls | cat
For comparison, Plan 9 has two different file-listing utilities: lc(1), which lists in columns, and ls(1), which lists in a single column. I adopted the Plan 9 way and aliased ls to 'ls -1' and lc to 'ls -C'.

(21-11-2020, 11:42 AM)vain Wrote: Funny, we’ll soon be making this same remark, now that files are no longer really visible on mobile devices. Imagine future kids, who grew up with nothing but a tablet, rediscovering the concept of a file.
Yeah, folders, hierarchical file systems and files will be something strange for future kids. The first time I read the book I found it funny that the authors had to explain what a file is.

(21-11-2020, 11:42 AM)vain Wrote: Home directories and special directories thrown together? How odd.
Yeah, I also find this odd. And this still occurs: my system also uses /home/ to place non-user stuff (the directory where the system upgrade is downloaded to).

(21-11-2020, 11:42 AM)vain Wrote: That’s remarkably confusing. When I first read that, I thought --x means you can only list directory contents (so you can search for a given file), but not the file contents. The opposite is true and the following sentences clarify it.
Until just recently I thought that you could access a file in a directory on which you only have --x permission, given that you have the full path to it.
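The single-column behaviour of ls(1) can be checked without a terminal. This is a sketch that assumes mktemp(1) with `-d` and an ls that supports `-1` and `-C`; the file names are placeholders.

```shell
listdir=$(mktemp -d)
touch "$listdir/a" "$listdir/b" "$listdir/c"

# Inside $(...) the output is a pipe, not a terminal, so ls writes one name per line.
piped_lines=$(ls "$listdir" | wc -l | tr -d ' ')
echo "lines when piped: $piped_lines"

# -1 and -C force one layout or the other, wherever the output goes.
single_lines=$(ls -1 "$listdir" | wc -l | tr -d ' ')
column_lines=$(ls -C "$listdir" | wc -l | tr -d ' ')
echo "lines with -1: $single_lines, lines with -C: $column_lines"

rm -r "$listdir"
```

With three short names, `-C` should need fewer lines than `-1`, since the names fit side by side.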
|||
|
|||
(21-11-2020, 12:55 PM)phillbush Wrote: I'm thinking of replacing mutt(1) with mail(1)/mailx(1), but i don't really like the ed(1) interface of those utilities.
I actually set up classic mail(1) with fdm and msmtp on my system a few days ago, partly because of this book. Maybe we could try a little group experiment or something? ;)
|||
|
|||
(21-11-2020, 12:55 PM)phillbush Wrote: That is still valid on OpenBSD. It uses the environment variable BLOCKSIZE to specify the size of a block. I know it's still valid today, I was trying to reason that at the time it was more commonplace to count space left as blocks left and not "bytes" left. The filesystem normally eats up a block at a time, even if your file is smaller than a block, so this actually makes sense. |
|||
|
|||
(21-11-2020, 12:55 PM)phillbush Wrote: Until just recently I thought that you could access a file from a directory you have --x permission given that you have the full path of it. Wait, but that’s correct, isn’t it? Code: $ uname -rs With only “r--”, both OpenBSD and Linux give somewhat strange results: Code: $ uname -rs Code: $ uname -rs Note that OpenBSD’s output of “ls -l foo” is empty, but it works as “ls foo”. Probably does something like stat(2) on the files it finds and, if that fails, silently skips the directory entry. |
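For anyone who wants to repeat the experiment, here is a rough sketch of the two cases (--x only, then r-- only). The directory and file names are invented, and it assumes mktemp(1) with `-d`. Run it as a regular user: root bypasses these checks, so every command would succeed there. The exact error messages differ between systems, so none are shown.

```shell
permdir=$(mktemp -d)
mkdir "$permdir/foo"
echo "hello" >"$permdir/foo/bar"

# --x only: names inside can be looked up ("searched"), but the directory cannot be listed.
chmod 111 "$permdir/foo"
mode_search=$(ls -ld "$permdir/foo" | cut -c1-10)
echo "mode: $mode_search"
cat "$permdir/foo/bar" || echo "cat failed"
ls "$permdir/foo" || echo "ls failed (expected for a non-root user)"

# r-- only: the names can be read, but nothing inside can be reached through them.
chmod 444 "$permdir/foo"
mode_read=$(ls -ld "$permdir/foo" | cut -c1-10)
echo "mode: $mode_read"
ls "$permdir/foo" || echo "ls reported errors"
cat "$permdir/foo/bar" || echo "cat failed (expected for a non-root user)"

# Restore the permissions so the scratch directory can be removed again.
chmod 755 "$permdir/foo"
rm -r "$permdir"
```

The first half matches the point under discussion: with only the search bit, a file is reachable as long as you already know its full path.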
|||
|
|||
(21-11-2020, 03:11 PM)vain Wrote: Wait, but that’s correct, isn’t it?
You're right, I was confusing stuff.

For the next week, we're going to read chapter 3 and chapter 4 up to page 114 (up to, but not including, the section on awk). Then we'll read the remainder of chapter 4 and chapter 5 the following week.

The next three chapters (3, 4, 5) deal with shell scripting. Chapter 3 explains the basics of the shell. Chapter 4 introduces us to command-line filters, like grep, sed, sort and awk. Chapter 5 deals with advanced topics in shell programming and contains a lot of examples and exercises. We can then discuss the programs we create.

'til next week!
|||
|
|||
(21-11-2020, 11:33 AM)venam Wrote: how to change @ at-sign and # sharp sign for kill and erase), syncing the baud rate, how to pause and continue the stream (ctrl-s ctrl-q), the break key, and much more. I found this great, trying to imagine what working on these terminals must have been like. An elegant OS... for a more civilised age. |
|||
|
|||
(21-11-2020, 11:42 AM)vain Wrote: A nice explanation of what Ctrl-d does. No, it does not “send an EOF character”, as many people believe. It flushes a buffer.
I think it makes much more sense, and is more memorable, when you think about it from the terminal-processing perspective. The EOT (end of transmission) control character is meant for your terminal; it's there to control your terminal and not to control the processing of a file. It tells the terminal that you are done typing/editing and that it should send/flush whatever you've got to the Unix server for processing. Once you make that distinction, it's much clearer.
|||
|
|||
Most has been said but some of my observations (even if redundant):
I remember some file system hierarchy 'wars' but in this case, I must confess that older is really not better.

On my systems, only users within the same group can ever message to my terminal; the last triplet of ls -l /dev/myterm is always ---

cat silently ignores the -u flag because it is already unbuffered. I can not reproduce the buffered behaviour. Nor can I think of a sensible use-case, so it's no great loss, really.

I think this was the first time I ever made files with ed (the poems, in particular) and then went back to correct typos with sed. Great fun for bragging rights but I can see why people wrote newer editors.

Apparently only hard links existed at the time. I have never really been able to understand the reason for having two types of links but at least the text gave me one clue: hard links can't be made across devices/mount points (soft links can). Is this really the only reason?

We 'learn' to type ctrl-d when we're on a new line and the buffer is still empty. Used that way, it behaves like an EOF (or whatever) char would do and we get confused. I had never realised you can type ctrl-d halfway through a line and then continue typing the same line (in the cat example). I can see this being useful in google meet etc., or even on irc (push half a line with ctrl-d to signal you're typing, and when you finally press enter the whole line is output).
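On the question of why two kinds of links exist, a small sketch that contrasts them. It assumes mktemp(1) with `-d`, and the names are placeholders. Besides crossing mount points, a symbolic link stores a path rather than an inode reference, so it can name a directory (which hard links generally may not) and it can be left dangling.

```shell
linkdir=$(mktemp -d)
echo "poem" >"$linkdir/target"

ln "$linkdir/target" "$linkdir/hard"       # second name for the same inode
ln -s "$linkdir/target" "$linkdir/soft"    # small file that stores the path of the target

# Remove the original name and see what each kind of link still gives us.
rm "$linkdir/target"
hard_text=$(cat "$linkdir/hard")
echo "hard link still reads: $hard_text"

# -e follows the link (the target is gone); -L looks at the link itself (still there).
if [ -e "$linkdir/soft" ]; then soft_state=resolves; else soft_state=dangling; fi
if [ -L "$linkdir/soft" ]; then soft_exists=yes; else soft_exists=no; fi
echo "soft link: $soft_state (link itself present: $soft_exists)"

rm -r "$linkdir"
```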
|||
|
|||
(21-11-2020, 01:15 PM)s0kx Wrote: phillbush Wrote: I'm thinking of replacing mutt(1) with mail(1)/mailx(1), but i don't really like the ed(1) interface of those utilities.
I actually set up classic mail(1) with fdm and msmtp on my system few days ago, partly because of this book. Maybe we could try a little group experiment or something? ;)

I was also planning this, having just made the switch in aerc from imap to isync with a maildir.
|||
|
|||
It's time to bump this thread.
How was your week? Did you have a good read?

The first exercise of the chapter presents us with three different ways of passing a text to a utility, and gives us the following quote about “cat abuse”:
Quote:Over the years the redirection operator < has lost some ground to pipes; people seem to find “cat file |” more natural than “<file”.

One thing I hadn't noticed is that when no files match a pattern with metacharacters, the shell passes the string on as though it had been quoted.

A digression on echo. I once posted a thread about this very text and how most implementations that use getopt(3) on other utilities should not use it on echo(1): https://nixers.net/Thread-A-digression-on-echo--2311

The chapter also presents pick(1), a program to interactively select entries from stdin. pick(1) will be implemented in later chapters. Modern implementations of pick(1) exist, and I recommend this one: https://github.com/mptre/pick

At the time of the book, there was no $() for using the output of a command, only ``. I have no idea how they would nest command substitutions with backticks.

Redirections can occur anywhere on the command.
Code: $ echo Hello World >/tmp/junk
Code: $ >/tmp/junk echo Hello World
This was also the case for environment variable assignment. But since it interfered with dd(1), variable assignments are only accepted at the beginning. (dd(1) (and find(1)) is such an alien command with its different syntax.)

The chapter doesn't explain the <> redirection. I don't know if this redirection existed back in the day, but it is very handy. I use it to make xnotify(1) read and write on the standard input, which is a pipe, so it doesn't get EOF when a program closes this pipe.

In the chapter on filters, I was introduced to comm(1), which I didn't know about. The three grep(1)s are also introduced. IIRC, Plan 9 only has one, with the structural regexps.
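A few of the points above (redirections anywhere, unmatched patterns, nesting backticks) fit into one sketch. It assumes a Bourne-style shell with default options (zsh, for instance, treats an unmatched pattern as an error by default) and mktemp(1) with `-d`; the file names are placeholders.

```shell
scratch=$(mktemp -d)

# A redirection may come before or after the command words.
echo Hello World >"$scratch/junk1"
>"$scratch/junk2" echo Hello World
same_output=no
cmp "$scratch/junk1" "$scratch/junk2" && same_output=yes
echo "identical files: $same_output"

# A pattern that matches nothing is passed on unchanged, as though it had been quoted.
unmatched=$(echo "$scratch"/no-such-file-*)
echo "unmatched pattern became: $unmatched"

# Nesting works with backticks too, but the inner pair has to be escaped.
nested_backticks=`echo \`echo inner\``
nested_dollar=$(echo $(echo inner))
echo "backticks: $nested_backticks, dollar-paren: $nested_dollar"

rm -r "$scratch"
```

Each further level of backtick nesting needs another round of backslashes, which is presumably a large part of why $() caught on.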
|||
|
|||
I have a lot of similar points of phillbush.
Chapter 3.

This chapter, from my perspective, gives a nice overview of the shell as the program that stands on the other end of the terminal. Through the language you get the clear idea that you are sending "requests", as they clearly say, to a Unix server and that these will be interpreted and come back. There's a sentiment of distance which we don't have today.

The commands have a specific format, like functions you want to be executed. They are separated by semicolons if on the same line. Then once you are done you press RETURN to send the request to the shell. You can group these commands using parentheses, and you can combine them using pipes, which take precedence over the semicolon. You can even add a program in the middle to take a snapshot of the stream (like tee, for example).

You can use the & terminator to tell the shell you don't want to wait for the program to end before getting back control of the shell, and to let it execute in the background instead. This & can be used anywhere on the command line, even in the middle. They give a nice example: getting notified when the tea finishes brewing after 5 min, with a bell that sounds at their terminal (\a control character). It's interesting because I do the same but with the `at` command instead.

Then the chapter dives into the composability of the shell, especially the IO operations. It goes into detail about each of the special characters interpreted by the shell: < > | ; &

One interesting thing is how it allows non-obvious use of redirection, like:
Code: $ > junk echo Hello
Code: $ echo Hello > junk
And talking about less obvious, they say that people seem to find "cat file |" more natural than `<file`. (Also mentioned by phillbush)

Another special character is the `*` metacharacter (reminiscent of the Kleene star), along with the other pattern-matching characters (table 3.1).
Code: echo *
And they explain that names that start with dots are ignored to avoid problems with "." and "..", which gives rise to dot files. These metacharacters can be escaped; there are a lot of ways to do that and they go in depth.

One of the exercises points out the issue with filenames starting with '-', which could be confused with command line arguments. It also points out that filenames cannot contain /, though trickily you could use a utf-8 homoglyph.

One thing I wasn't aware of is the {..} expression to run commands in the current shell, which isn't really practical but good to know. Another weird behavior I wasn't aware of is that a pattern with metacharacters is passed on literally if no file matches, as though it had been quoted. (Seems like phillbush was also surprised by this)

Comments are inserted using `#`, but only if a word begins with #.

There's a whole section about echo and the argument over whether it should print a newline or not, and what nothing means; whether it should add \c or \b instead. In all cases we can use echo -n to suppress the newline (not portable), or \c on systems whose echo follows the System V/XSI convention.

Additionally, there are a lot of small hints at how easy it is to customize your environment, and you're encouraged to do so. $PS2 is mentioned as the multiline input prompt, which can be modified to taste.

The shell being a program like any other, it can be customized, and it can take arguments and inputs which are shell scripts. This starts a whole new section about how to write them. These scripts need to be in the $PATH, which introduces the concept of the .profile as the "config" for your shell. Positional parameters/arguments $1 up to $9, and $0 for the name of the program, along with `$*` for all arguments, are also talked about.

Writing personal programs from scratch is hard; what's encouraged is to combine pre-existing ones to create others. So the book then goes into multiple utilities and how to integrate them together in scripts with backticks/backquotes (not $(), as phillbush mentioned) and IO redirections.

NB: the grep -y argument is now obsolete and a synonym for -i. (Case independent matching)

The command `pick` is used a lot, to present arguments one at a time interactively.

To facilitate writing scripts you can use variables, like in any other programming language. Here they distinguish between the temporary ones and the ones exported globally. You can print them with `set`. Shell scripts also have the ability to load other scripts (like an include), using the . (dot).

More shell script IO is introduced. There are 3 default files that are always open: the file descriptors for standard input, standard output, and standard error. You can redirect one onto the other if needed, 2>&1 for example. The heredoc can be used to say that the input is here instead of in a file.

In continuation with the shell as a programming language, we see how loops can be written in both long form and compressed form (single line).
Code: for var in list of words

There's then a final bundle example script, which I personally find truly horrible and bad at explaining the concepts reviewed.

Still in sync with the idea that the shell is a programming language, and that the environment is very personal, there's a mention of other shells that can be used instead, like the C shell.

Chapter 4 first part.

The chapter focuses on text editing and filtering, what it calls data transformers and programmable filters. There's a small introduction to regex, which is probably better learned from more modern books. It then shows how this is applied within grep. At the time grep was also distributed in variant programs, egrep and fgrep, which today are the same as grep -E and grep -F, respectively. These variants are deprecated, but are provided for backward compatibility.

Other common editing tools are talked about:

And a mention of the dreadful dd, which dates back to tape processing and doesn't follow the conventions mentioned.

It continues with this example:
Code: cat $* |
Which I think is also a horrible example because it uses the command 5, which is a custom command introduced earlier in the book. It's probably a bad idea to name your command as a number.

Finally, sed appears with its usual syntax of matching and acting on a match.
Code: sed '/pattern/ action'
It's described as an ed on the command line.

Trivia: They emphasize that using "sed 3q" to print the first few lines of a file is easier than having a separate program called "head" because it's more instinctive and easier to memorize. Well, maybe in those days when ed was so popular, but definitely not today.

There are a lot of good sed examples, and table 4.2 has a summary of sed commands.
Code: sed '/pattern/q' prints up to the first line that matches, then quits
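The sed one-liners and the two loop forms mentioned above can be tried on some throwaway input. This is only a sketch; the sample words are invented.

```shell
sample=$(printf 'alpha\nbeta\ngamma\ndelta\nepsilon\n')

# "sed 3q" prints three lines and quits, which is what head -n 3 does.
first_three_sed=$(printf '%s\n' "$sample" | sed 3q)
first_three_head=$(printf '%s\n' "$sample" | head -n 3)
printf '%s\n' "$first_three_sed"

# '/pattern/q' prints up to and including the first matching line, then quits.
up_to_match=$(printf '%s\n' "$sample" | sed '/beta/q')
printf '%s\n' "$up_to_match"

# The for loop in its long form...
for word in one two three
do
    echo "long form: $word"
done

# ...and compressed onto a single line.
for word in one two three; do echo "short form: $word"; done
```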
|||
|
|||
You guys already made a lot of interesting points. Here’s a couple of notes from me.
Quote:What happens if no files match the pattern? The shell, rather than complaining (as it did in early versions), passes the string on as though it had been quoted. I’d love to know the reason for this decision. If there are metacharacters in a string and it doesn’t match anything, why not expand it to “nothing”? This is what “set -o nullglob” does these days (that is the zsh spelling; in bash it’s “shopt -s nullglob”). That way, I can be sure that I’m only dealing with existing files. Quote:The UNIX and the Echo Lovely little story. When the book was written, the shell didn’t have a “printf” builtin yet, it appears. According to Wikipedia and gunkies.org, it was only introduced in 1990. GNU coreutils defines “echo” as “display a line of text” today. That’s still slightly ambiguous (depends on “line” being “something followed by \n”, which some people still don’t agree with). OpenBSD’s manpage explicitly states that a newline follows, so that’s much clearer. Quote:chmod +x $1 I have had many heated debates about this topic. Personally, I strongly dislike that they don’t use the quoted version "$1". They omit quotes throughout the entire chapter. It probably didn’t matter back in the day, but IMHO it’s just wrong: $1 is not what has been passed as first argument -- it’s the split version of it. Is this really what the user wants most of the time? Or is it "$1", so it’s exactly what has been passed to the script? The counter argument in this case is: If you use "$1", an empty string will be passed to chmod -- instead of no argument at all. This breaks some of their examples later. Quote:venam on IRC: Ohhhhh, yeah. I was thinking the same thing. (It didn’t strike me as much when I first read the book a couple of years ago. Hmm.) |
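A minimal sketch of the $1 versus "$1" difference discussed above, assuming a POSIX sh with the default IFS. `count_args` and `maybe_first` are hypothetical helpers, not something from the book.

```shell
# Hypothetical helper: report how many arguments it received.
count_args() { echo "$#"; }

arg='my file.txt'                     # one argument containing a space
unquoted=$(count_args $arg)           # field splitting turns it into two words
quoted=$(count_args "$arg")           # stays exactly one word

empty=''
unquoted_empty=$(count_args $empty)   # expands to nothing: zero arguments
quoted_empty=$(count_args "$empty")   # one (empty) argument

# Middle ground: pass "$1" only if it is set at all.
maybe_first() { count_args ${1+"$1"}; }

echo "$unquoted $quoted $unquoted_empty $quoted_empty"
```

The `${1+"$1"}` form is one way to get both behaviours: nothing is passed when no argument was given, and the argument is passed unsplit when there is one.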
|||
|
|||
It's time to bump this thread.
How was your week? Did you have a good read? Second half of chapter 4 introduces us to awk, a great programming language. That very section of the book motivated me to read another book from Kernighan: The AWK Programming Language, which motivated me to solve the Advent of Code puzzles in AWK. It's very easy to parse input and do simple algorithms in awk, and I recommend everyone learn it and try to implement their algorithms in awk before writing the programs in C. The final program of this chapter is an implementation of calendar(1), an application that is not available on Linux but exists on BSDs; the implementation is very straightforward. Chapter 5 teaches how to do actual programming in shell script by implementing several examples. cal(1). The first program is a wrapper around cal(1). This example teaches about the case and set built-ins, and introduces us to the basics of shell programming. It is a very straightforward example, but one that doesn't apply very well to the UNIXes of today. In most implementations, you can "cal october", but "cal 10" still prints the calendar for year 10. which(1). The second program is an implementation of which(1), a command that reports which version of a program will be executed when calling it. This example introduces us to test(1), if, and exit status. One implementation is to loop over the directories named in PATH, searching for an executable file of the given name. The loop should use sed(1) for generating the list of directories, and test(1) for testing whether there is a file and it is executable. Quote:The test command is actually one of the clumsier UNIX programs. The author is probably referring to the different implementations of test and how they differ from one another. There is also a test builtin in most shells, with a different syntax from the test(1) program. test(1) is unusual because its sole purpose is to return an exit status. It produces no output and changes no file. 
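The loop-over-PATH approach described above can be sketched roughly like this. It is not the book's exact script: `my_which` is a made-up name (to avoid shadowing the real which(1)), and the sketch assumes no directory in PATH contains spaces or glob characters.

```shell
# Hypothetical name, so the real which(1) is not shadowed.
my_which() {
    # Turn PATH into a space-separated list; an empty entry means ".".
    dirs=$(echo "$PATH" | sed 's/^:/.:/; s/::/:.:/g; s/:$/:./; s/:/ /g')
    for dir in $dirs; do
        # test prints nothing; only its exit status is used here.
        if test -f "$dir/$1" && test -x "$dir/$1"; then
            echo "$dir/$1"
            return 0
        fi
    done
    return 1    # not found in any PATH directory
}
```

Usage would be something like `my_which ls`, which prints the first match and returns 0, or returns 1 silently when nothing matches.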
A few commands, such as cmp(1) and grep(1), have an option -s that causes them to exit with an appropriate status but suppress all output. watchfor(1) and checkmail(1). Those programs introduce us to loops. There is probably a more elegant solution than looping and sleeping some seconds, like using inotify on Linux. nohup(1). That is a very interesting example that implements nohup(1). It teaches about signals, the trap built-in, test -t (to test whether the standard output is a terminal), nice(1), and the exec builtin. Phew, that is a lot of stuff to be introduced in a single program. Quote:The command sequence that forms the first argument to trap is like a subroutine call that occurs immediately when the signal happens. When it finishes, the program that was running will resume where it was unless the signal killed it. Therefore, the trap command sequence must explicitly invoke exit, or the shell program will continue to execute after the interrupt. overwrite(1). That is another interesting example, and one that should be required by POSIX. It is a very useful program, but because of how the shell works, its input is somewhat clumsy. zap(1). Zap is a simple implementation of pkill(1) or killall(1). It uses pick(1), which will be implemented later. It introduces us to the IFS variable and teaches a lot more about signals and how to send them (kill). pick(1). One of the nicest examples in this chapter. Pick lets you select some items from the input or arguments and output those that were selected. There is a newer implementation of pick(1) that uses fuzzy finding and that I recommend to anyone: https://github.com/mptre/pick This program introduces us to using /dev/tty for reading and writing to the terminal rather than to stdin or stdout. news(1). This is a simple program to get the newest files from a directory. There is a news(1) in plan 9. get(1) and put(1). This is a simple implementation of the SCCS version control system. 
Personally, I did not like those examples, as those programs are clumsy compared to today's version control systems. But maybe they made sense back in the day. Interestingly, GNU has a program called CSSC ("Compatibly Stupid Source Control") to convert the "compatibly stupid" SCCS into CVS. |
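On the trap quote in this post, here is a minimal sketch of the "trap must explicitly exit" point, assuming a POSIX sh with mktemp(1) available. `run_with_cleanup` is a made-up function and not one of the book's programs.

```shell
# The function body is a subshell, so the traps and any exit stay inside it.
run_with_cleanup() (
    tmp=$(mktemp) || exit 1
    # 0 is "shell exit": remove the temporary file whenever the subshell ends.
    trap 'rm -f "$tmp"' 0
    # 1, 2, 15 are hangup, interrupt, terminate. Without the explicit exit,
    # the script would simply carry on after the signal.
    trap 'exit 1' 1 2 15
    echo "working" > "$tmp"
    echo "$tmp"     # report the name; the file is gone once we return
)
```

Calling `f=$(run_with_cleanup)` should leave no file behind at "$f", since the exit trap fires when the subshell finishes.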
|||
|
|||
Rest of Chapter 4
=================

Alright, a basic introduction to awk. Probably not a lot of new stuff to learn for most people around here. A powerful tool. My usage has declined over time, interestingly. I often need more than just “process text”.

Chapter 5
=========

cal
---

The script uses “set `date`” on two occasions. IMHO, this is a very different style of programming than what we (I?) have today.

Code: 0) set `date`; m=$2; y=$6 ;;

It makes for really short and effective code. If you know what the magical “set” does, it’s pretty good. Most modern scripts would probably use “sed” or parameter expansion to extract the current month and year:

Code: now=$(date)

Much more clunky. But then again, it’s more explicit (explicit is good, right, because it makes it easier for your successors to read your code) and it doesn’t clobber the original argument vector.

It’s probably hard to get this point across: To me, this is “typical” programming style of “those UNIX people”. Concise, “clever”. Maybe this will become more clear when we get to the C examples later on. (I also noticed it a lot when reading “The C Programming Language”.)

5.2: if
-------

“if” runs a command and checks its exit status. Such an important point to learn when writing shell scripts. Also highlights how shell scripts really only are the glue between other programs.

5.3: ls in a script
-------------------

Argh, they did it. It works in this case, but there are pitfalls.

5.5: overwrite
--------------

An academic example to teach more about the shell, isn’t it? Is it really worth it to have this as a command? Just redirect to a temporary file and do the “mv” manually, I’d say.

5.7: pick
---------

What always bothered me about pick is that there’s no undo. No way to go back. In 2011, I made a pick on top of a “vipe”: https://www.uninformativ.de/blog/posting...NG-de.html (German article, but you’ll be able to follow the code regardless. Be sure to open the screenshot.) I never really used it, though. Doesn’t fit my workflow …

5.8: news
---------

I don’t know. This … this is just horrible. Using ~/.news_time as a terminator in a list? And then even ' not found'? Using “set” to process the output of “ls” …

Code: #!/bin/sh

A couple of drawbacks:

But yeah, if you insist on including the cat, suddenly the original is clever again, because “ls -t” neatly sorts the files by time. Still, I think it’s an example of a terrible shell script. I’d rather make the script longer instead of relying on hacks like “set X`foo`”.

5.9: get and put
----------------

Did they just implement a basic version of RCS? At least they mentioned the existence of SCCS at the very end of the chapter. :) |
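To put the two styles from the cal section side by side, here is a rough sketch. It uses a fixed, made-up timestamp instead of live `date` output so the result is reproducible, and it assumes the classic field order (month in field 2, year in field 6), which real `date` output does not guarantee in every locale. Both function names are hypothetical.

```shell
# Made-up sample in the classic date(1) layout (note the padded day).
stamp='Sat Dec  5 11:12:00 UTC 2020'

# Book style: let the shell split the string into positional parameters.
book_style() {
    set -- $stamp       # replaces this function's "$@" only, not the caller's
    echo "$2 $6"
}

# Explicit style: parameter expansion, positional parameters left alone.
explicit_style() {
    rest=${stamp#* }    # drop the weekday and the first space
    month=${rest%% *}   # keep everything before the next space
    year=${stamp##* }   # keep everything after the last space
    echo "$month $year"
}
```

Wrapping the `set` in a function is one way to keep the concise version without clobbering the script's own argument vector.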
|||
|
|||
(05-12-2020, 11:12 AM)phillbush Wrote: The author is probably referring to the different implementations of test and how they differ from one another. Is this still the case? I was always under the impression that “test” was quite portable. I must admit, though, that I never compared the various programs in detail … |
|||
|
|||
Chapter 4 end (awk section)
It gives a good overview of AWK as a "pattern scanning and processing language". They introduce it like a sed with a more C-like syntax. However, in my opinion, it brings back a sort of record-like view of the file instead of a byte-like view. This is the impression I got, and from the way it was explained it seemed to make this association with record systems. It makes a lot of sense when you think about it like that, especially at the conclusion of the chapter when they talk about FFG, flat file system database. Here's the paper about it and a sneak peek: Quote:It consists of a single, unformatted text file in which each line corresponds to a record. K-1 occurrences of a separator character divide each record into k variable length fields; The chapter is filled with examples that slowly introduce the AWK inner workings. The first example sells it and shows its main behavior: you can make an egrep clone: Code: awk '/regex/ { print }' filenames The big difference is in how it splits the file into records and those records into fields, so this is then introduced. You can specify the separator character (-F), refer to the fields themselves, and use special variables related to this perception of the file, and there are pre- and post-processing hooks (BEGIN and END). Quote:$1, $2, ..., $NF (Only field variables begin with $ and need to be escaped if wanted explicitly with another $) Another advantage of AWK is that it has C-like functions, such as printf, substr, etc. (table 4.5), variables, conditional statements, loops, and associative arrays. Thus, it's a language on its own. Code: awk '{ printf "%4d %s\n", NR, $0 }' Here I wasn't aware of the fold(1) command to wrap lines; normally I use fmt(1). So that's a new finding. The chapter culminates with a last calendar example, which isn't particularly clean in my opinion, and as phillbush said, today we don't need the solution they gave. Chapter 5 This chapter is about shell programming, with a learn-by-example approach. 
It emphasizes that the shell should be used for quickly writing solutions (being productive), mostly for personal use and customizing your environment, by making programs cooperate instead of rewriting things from scratch. It starts by making the point that the shell is a programming language like any other, and not only an interactive prompt. However, it's not in denial that the design of this language is clunky, mostly shaped by history. The selling point of the shell is that it gives direct feedback, and so it allows interactive experimentation. Probably something that was uncommon during these times. The chapter then dives into multiple examples that show the syntax of the shell, including some weird features like the shell built-in variables in table 5.1. Code: $#, $*, $@, $-, $?, $$, $! It warns that the pattern matching in the shell isn't the same as in sed and awk, so beware when using case match (table 5.2). Code: case word in Interestingly, there's a lot of discussion about the efficiency of different ways to do something, especially in conditions. They advise using the ":" built-in instead of calling true as an external command, or relying on case match instead of external calls, especially within loops. Calling a program was something you had to think about. The test command is introduced. Code: test -w # check if file exists and is writable Exit statuses are introduced too. There's a whole section about how to install scripts in your user PATH or globally, and how to know "which" version of the command you are using. Another section is about variable syntax and the extra possibilities you get by using special characters inside it (table 5.3). Code: ${var?error!} The next section is about trapping signals and handling them to cleanly exit, especially in long running programs. 
It mentions these popular signals: Code: 0 shell exit An equivalent of nohup would be: Code: (trap '' 1; command) & It shows a fun example by creating a "zap" command that uses pick(1) to interactively kill a process based on its name. One thing that caught my attention that I didn't know about was that you don't have to give for loops a list; by default they loop over the positional parameters (`$*` in the book, "$@" in a POSIX shell). There's a section talking about $IFS in the shell, the field separator, and how it can be overridden to allow reading files in different ways. Which reminds us of AWK's view of files as records. One point about the read shell command: apparently at the time it only worked on standard input. So, the following wouldn't work, but it does today: Code: read greeting </etc/passwd One of the nicest examples is the pick command: Code: for i In the news command example you have the well-known clunky way of adding a char on the left to avoid an empty comparison. The final example is quite interesting, a version control system with "get" and "put" as commands and using diff. It takes us back to how annoying keeping track of changes must have been at the time. I get the same impression as phillbush on this. |
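Two of the points from this post, the for loop without a list and overriding IFS for read, can be sketched like this, assuming a POSIX sh. The function names and the passwd-style record are made up for illustration.

```shell
# A for loop without "in" iterates over the positional parameters ("$@").
join_args() {
    out=
    for i
    do
        out="$out<$i>"
    done
    echo "$out"
}

# Override IFS for a single read to split a colon-separated record,
# much like awk -F: would. The record below is a made-up sample.
user_and_shell() {
    echo 'alice:x:1000:1000:Alice:/home/alice:/bin/sh' | {
        IFS=: read -r user pw uid gid gecos home shell
        echo "$user $shell"
    }
}
```

Because the IFS assignment is a prefix to read, it only applies to that one command and the rest of the script keeps the default separator.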
|||