Nixers Book Club - Book #1: The UNIX Programming Environment - Community & Forums Related Discussions

venam
Administrators
Chapters 6 and 7 introduce C programming with the same philosophy as the rest of the book: combine existing programs and build on the hard work already done by someone else. Here, however, the focus moves beyond text to things such as monitoring files and inspecting metadata.

Overall, these two chapters took longer to get through because of the number of examples. Many of them have to be modified to work in a current environment, mostly by adding headers (especially <unistd.h> and <stdlib.h>) that could be left out at the time. The C style is also very different: the parameter types are declared after the function head, and the return type is omitted when it is an int. The lack of curly braces around single-statement loops and conditions bugs me, as do for(;;) instead of while and the use of exit instead of return. The book clearly mentions that return can be used in main and becomes the exit value, so I guess exit is there to be more explicit. The programming style feels a bit "hacky" compared to today's stricter standards. It's ironic considering these are the main authors; it shows how people reinterpret things with time. Maybe for the best, maybe for the worst.

C is introduced as the standard language of UNIX systems because the kernel and user programs are written in C. I'm not sure it's a valid reason, but that makes the system coherent.


Chapter 6

The first example, vis, a cat that shows non-printing characters in octal, is justified as useful because sed wasn't able to handle long input. That's interesting; today it certainly can.
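
To make the idea concrete, here is a minimal sketch of what vis does, written in today's C rather than copied from the book. The helper name vis_char and the exact set of characters passed through unchanged (printable ASCII, newline, tab) are my assumptions.

```c
#include <ctype.h>
#include <stdio.h>

/* Format one byte the way a vis-like filter would show it: printable
 * ASCII, newline and tab pass through, anything else becomes a backslash
 * followed by three octal digits. vis_char is a hypothetical helper name.
 * Returns the number of characters written to buf (as snprintf does). */
int vis_char(int c, char *buf, size_t bufsz)
{
    unsigned char uc = (unsigned char)c;

    if (uc < 128 && (isprint(uc) || uc == '\n' || uc == '\t'))
        return snprintf(buf, bufsz, "%c", uc);
    return snprintf(buf, bufsz, "\\%03o", uc);
}

/* Copy a whole stream through vis_char. */
void vis(FILE *in, FILE *out)
{
    int c;
    char buf[8];

    while ((c = getc(in)) != EOF) {
        vis_char(c, buf, sizeof buf);
        fputs(buf, out);
    }
}
```

Calling vis(stdin, stdout) from main gives the filter behaviour; a BEL character (7) in the input would come out as \007.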

It's fun to see the authors emphasize why using the existing macros and functions is important, even though you could write them yourself: they've been debugged already and are faster. Maybe people at the time argued for writing these functions themselves; I'm not sure.

In the first example, vis, the string.h header is missing, so it fails to compile against a newer C library. This is recurrent throughout the exercises; it seems a lot of the headers were included by default at the time.

argc and argv are introduced for the command-line arguments, along with file I/O and the idea of the default streams: stdin, stdout, stderr. There used to be a limit of 20 files open at the same time, which is extremely small by today's standards.

They recommend using getopt(3) but don't in their example and go on to parse the arguments manually.
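
For comparison, this is roughly what the getopt(3) route looks like. It is only a sketch: the option letters (-s as a flag, -n taking a value) and the function name parse_args are made up for illustration and are not the book's options.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Parse a hypothetical option set: -s (flag) and -n NUM (takes a value).
 * Returns the index of the first non-option argument, or -1 when an
 * unknown option or a missing value is found. */
int parse_args(int argc, char *argv[], int *strip, const char **num)
{
    int c;

    optind = 1;                     /* restart scanning from argv[1] */
    while ((c = getopt(argc, argv, "sn:")) != -1) {
        switch (c) {
        case 's':
            *strip = 1;
            break;
        case 'n':
            *num = optarg;          /* points into argv, not a copy */
            break;
        default:                    /* getopt already printed a message */
            return -1;
        }
    }
    return optind;
}
```

The remaining arguments, argv[optind] onward, are then the file names to process.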

In one example we implement a screen-at-a-time printer because, according to the authors, there was no standard program to print per screen, UNIX having initially been paper based. Wasn't there less/more at the time? Also, 24-line terminals :D .

There's a discussion about which features to include in a program and which to leave out. There's no definitive answer, but the main principle is that a program shouldn't be hard to debug, shouldn't do too many things, and every feature should have a reason to be there so it doesn't lie unused.

We go on to rewrite the pick command from the previous chapter, but in C this time. Note the use of stderr to ask the user a question, and then select the output on stdout.

I've also noted a weird way of declaring external functions right in the middle of the functions that use them, when we know we'll define them later:

Code:
FILE *efopen();
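
In today's C the same declaration would normally sit at file scope with full parameter types. Here is a sketch of what an efopen-style wrapper could look like; the error message wording and the use of strerror are my own choices, not the book's listing.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Modern prototype: declared once at file scope, parameters spelled out. */
FILE *efopen(const char *file, const char *mode);

/* fopen that reports the failure and exits instead of returning NULL,
 * so callers don't have to check the result every time. */
FILE *efopen(const char *file, const char *mode)
{
    FILE *fp = fopen(file, mode);

    if (fp == NULL) {
        fprintf(stderr, "can't open %s mode %s: %s\n",
                file, mode, strerror(errno));
        exit(1);
    }
    return fp;
}
```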

There's a section about linting using lint(1), and debugging core dumps using adb and sdb, which they call arcane but indispensable. To me both look similar to gdb, which isn't any less arcane.

zap gets rewritten in C because of the problem with spawning too many processes. It uses popen(3) and calls ps to parse its output.
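
The popen(3) part can be sketched like this. It is not zap itself: the function name count_matching_lines is hypothetical, and it only counts the lines of a command's output that contain a given string, which is the core of picking processes out of ps.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Run cmd through the shell and count the output lines containing needle.
 * Returns -1 if the command could not be started or closed.
 * Assumes lines shorter than BUFSIZ; longer ones would be seen in pieces. */
int count_matching_lines(const char *cmd, const char *needle)
{
    char line[BUFSIZ];
    int n = 0;
    FILE *fp = popen(cmd, "r");     /* read the command's stdout */

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL)
        if (strstr(line, needle) != NULL)
            n++;
    if (pclose(fp) == -1)
        return -1;
    return n;
}
```

Something like count_matching_lines("ps -e", "sleep") would then tell how many process lines mention sleep, at the cost of one extra process instead of a whole shell pipeline.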

idiff introduces mktemp(3) and unlink(2).
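
A sketch of the temporary-file pattern follows. Note that I swapped in mkstemp(3) for mktemp(3), since it creates and opens the file in one step, which is what current man pages recommend. The template path and the function name scratch_demo are placeholders.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Create a scratch file, write to it, then remove it.
 * Returns 0 on success, -1 on any failure. */
int scratch_demo(void)
{
    char path[] = "/tmp/idiffXXXXXX";   /* hypothetical template; the Xs get replaced */
    int fd = mkstemp(path);             /* creates and opens the file */

    if (fd == -1)
        return -1;
    if (write(fd, "scratch\n", 8) != 8) {
        close(fd);
        unlink(path);
        return -1;
    }
    close(fd);
    return unlink(path);                /* remove the name from the directory */
}
```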

Reading environment variables through getenv(3) is taught as a way to avoid typing the same arguments all the time.
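
As a small illustration of the getenv(3) pattern, here is a sketch that reads a page length from the environment and falls back to a default. The variable name PAGESIZE, the fallback of 22 lines and the function names are assumptions made for the example.

```c
#include <stdlib.h>

/* Turn the text of an environment variable into a page length.
 * A missing or non-positive value falls back to 22 (an assumed default). */
int parse_page_size(const char *s)
{
    int n = (s != NULL) ? atoi(s) : 0;

    return n > 0 ? n : 22;
}

/* Look up PAGESIZE (assumed variable name) in the environment. */
int page_size(void)
{
    return parse_page_size(getenv("PAGESIZE"));
}
```

With this, running PAGESIZE=40 before the program changes the behaviour without any command-line argument.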

Chapter 7: Unix System Calls

System calls are the lowest level of interaction with a UNIX OS, stuff like IO, inodes, processes, signals and interrupts.

The concept of a file is refined a bit here: we define it through the interaction with a file descriptor. By default descriptors 0, 1 and 2 are open, and new file descriptors are handed out from there, each one taking the lowest unused number. The shell is allowed to change the meaning of the descriptors, which is what allows us to redirect outputs from one to another.
Anything is a file descriptor, and you can read from and write to it. Reading or writing one byte at a time is effectively unbuffered; otherwise it's advised to use a size equal to the size of a disk block, 512 or 1024 bytes (the BUFSIZ constant).
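
The read/write loop with a BUFSIZ-sized buffer can be sketched as follows. The function name copy_fd is mine, and the inner loop handling partial writes is an addition that the simplest versions of this loop leave out.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>      /* BUFSIZ */
#include <unistd.h>

/* Copy everything readable from descriptor in to descriptor out.
 * Returns the number of bytes copied, or -1 on a read or write error. */
long copy_fd(int in, int out)
{
    char buf[BUFSIZ];
    ssize_t n;
    long total = 0;

    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;

        while (off < n) {               /* write may accept fewer bytes */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w == -1)
                return -1;
            off += w;
        }
        total += n;
    }
    return n == -1 ? -1 : total;        /* n == 0 means end of file */
}
```

copy_fd(0, 1) is then a bare-bones cat from stdin to stdout.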

We can see the effect of a wrong buffer size in the table shown: reading a 54KB file was extremely slow when the buffer did not match the block size.

Then there's the question of what happens when many processes read or write to the same file. I guess this needs to be mentioned in a multi-user system.

We write a program called readslow that keeps reading even when read returns 0 (no bytes left, i.e. EOF), which is kind of like tail -f.

The file lifecycle is covered through open, creat, close and unlink, plus the 9 permission bits: rwx for owner, group, and everyone else. A newly created file is said to be opened directly for writing, but I think it depends on the flags given during creation.

In most examples, I find myself adding missing headers. The next one, which introduces error handling, gave me trouble with sys_nerr and sys_errlist, which are not exposed by glibc, so I used the err.h BSD extension.

Code:
#include <err.h> // BSD extension
#include <errno.h>

errno is not reset to 0 when things go well; you must reset it manually. Signals, on the other hand, are reset automatically. Yep, things are a bit hacky.
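
A portable alternative to sys_errlist is strerror(3) from string.h. The sketch below shows the pattern of clearing errno, making the call and saving errno before anything else can overwrite it; the function name open_errno is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Try to open path read-only. Returns 0 on success, otherwise the errno
 * value that open(2) reported, after printing a message to stderr. */
int open_errno(const char *path)
{
    int fd, saved;

    errno = 0;                      /* errno is never cleared for us */
    fd = open(path, O_RDONLY);
    if (fd == -1) {
        saved = errno;              /* save it: later calls may change errno */
        fprintf(stderr, "can't open %s: %s\n", path, strerror(saved));
        return saved;
    }
    close(fd);
    return 0;
}
```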

Another example teaches us to seek in a file, and the next one to walk a directory.
At that time a directory was readable directly as a normal file; these days it isn't, and we have to rely on a system call.
The modern way to do this is getdirentries, present on the BSDs and a few other systems. Or, if we want a structure similar to the one in the book, struct dirent, we can use readdir(3), which is declared in dirent.h (the older sys/dir.h also provides it on some systems).
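
For reference, walking a directory with opendir/readdir/closedir looks roughly like this. The function name count_entries is made up, and skipping "." and ".." is a choice of this sketch.

```c
#include <dirent.h>
#include <string.h>

/* Count the entries of a directory, not counting "." and "..".
 * Returns -1 if the directory cannot be opened. */
int count_entries(const char *path)
{
    DIR *dp = opendir(path);
    struct dirent *de;
    int n = 0;

    if (dp == NULL)
        return -1;
    while ((de = readdir(dp)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        n++;                        /* de->d_name holds the entry's name */
    }
    closedir(dp);
    return n;
}
```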

Now that we have an understanding of some low level structure, the book goes into inodes and the stat system call to inspect them.
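
A short sketch of what stat(2) gives us: the file type and the size stored in the inode. Both function names are hypothetical.

```c
#include <sys/stat.h>

/* Returns 1 if path is a directory, 0 if it is something else,
 * and -1 if stat fails (for example because the path doesn't exist). */
int is_directory(const char *path)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;
    return S_ISDIR(st.st_mode) ? 1 : 0;
}

/* Returns the size in bytes recorded in the inode, or -1 if stat fails. */
long long file_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;
    return (long long)st.st_size;
}
```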

Then it goes into spawning processes and signal handling.

We can either use system(3) or the exec family of system calls. The latter overlay the existing program with the new one. This is part of the lifecycle of processes.
To regain control after running a program, we need the fork() system call, which splits the process into two copies; that is the technique that lets us spawn completely new processes. The call returns the ID of the child to the parent, and 0 to the child.
The parent can wait for the child, and will receive its exit status.
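
Put together, the fork, exec and wait sequence can be sketched as below. The function name run_and_wait is mine, and I use execvp and waitpid, which may differ from the calls chosen in the book's examples.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program named by argv[0] with arguments argv and wait for it.
 * Returns its exit status, or -1 if it could not be run or was killed
 * by a signal. */
int run_and_wait(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {                 /* child: fork returned 0 */
        execvp(argv[0], argv);      /* overlays the child with the program */
        _exit(127);                 /* only reached if exec failed */
    }
    /* parent: fork returned the child's process ID */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```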

fork and exec let the child inherit the open file descriptors, including the 3 standard ones, so parent and child have the same files open. And so we think again of the issue of multiple processes reading from or writing to the same file, and of flushing to avoid problems.

That's when we are shown how we can save these file descriptors into variables and disconnect and reconnect them as we see fit. The following idiom is interesting: closing a descriptor, then duplicating another one into the lowest unallocated file descriptor:

Code:
close(0); dup(tty);
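
The close-then-dup idiom relies on the closed descriptor being the lowest free one. A sketch of the same reconnection using dup2(2), which names the target explicitly, is shown here; dup2 is my substitution, and redirect_fd and restore_fd are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Make descriptor target refer to whatever src refers to.
 * Returns a saved copy of the old target (to restore later), or -1. */
int redirect_fd(int target, int src)
{
    int saved = dup(target);        /* keep the old connection around */

    if (saved == -1)
        return -1;
    if (dup2(src, target) == -1) {  /* closes target, then duplicates src */
        close(saved);
        return -1;
    }
    return saved;
}

/* Undo redirect_fd: reconnect target to the saved descriptor. */
int restore_fd(int target, int saved)
{
    int rc = dup2(saved, target);

    close(saved);
    return rc == -1 ? -1 : 0;
}
```

redirect_fd(0, tty) would have the same effect as the two calls above, without depending on which descriptors happen to be free.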

Signal handling was messy. The book still uses signal(2) instead of sigaction(2), so the handler was reset every time it was called, and the first line of the handler always had to install it again.
That reminds me of something I wrote.
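
For contrast, here is a sketch of the sigaction(2) version, where the handler stays installed after it runs. The handler only bumps a counter; the names on_signal, install_handler and signal_hits are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t hits = 0;

/* The handler does the bare minimum: count the delivery. */
static void on_signal(int sig)
{
    (void)sig;
    hits++;
}

/* Install on_signal for sig. With sigaction the handler is not reset
 * after a delivery, so it doesn't need to reinstall itself. */
int install_handler(int sig)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;       /* restart interrupted system calls */
    return sigaction(sig, &sa, NULL);
}

int signal_hits(void)
{
    return (int)hits;
}
```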

One thing I wasn't aware of is this pair of functions for non-local jumps:

Code:
jmp_buf sjbuf;
setjmp(sjbuf);
longjmp(sjbuf, 0); // non local goto
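
A complete version of that fragment could look like the sketch below. The function names are hypothetical. One detail worth knowing: passing 0 to longjmp still makes setjmp return 1, so the two returns can always be told apart.

```c
#include <setjmp.h>

static jmp_buf sjbuf;

/* Stands in for code deep in a call chain that wants to bail out. */
static void deep_failure(void)
{
    longjmp(sjbuf, 0);              /* a value of 0 is delivered as 1 */
}

/* Returns 0 if we got back the normal way, 1 if we came back through
 * longjmp. */
int jump_demo(void)
{
    if (setjmp(sjbuf) == 0) {       /* direct call: setjmp returns 0 */
        deep_failure();
        return 0;                   /* not reached */
    }
    return 1;                       /* second return, caused by longjmp */
}
```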

These were 2 fun chapters, giving insight into how C was written at the time and for what reasons.


Messages In This Thread
RE: Nixers Book Club - Book #1: The UNIX Programming Environment - by venam - 12-12-2020, 11:39 AM