Making the best CLI programs - Programming On Unix
venam
(This is part of the podcast discussion extension)

Making the best CLI programs

And here are the episode links

Link of the recording [ http://podcast.nixers.net/feed/download....11-251.mp3 ]

Unix is known for its set of small command line utilities. What are your ideas and features for the best CLI programs? What makes a complete utility?

Show notes
---
https://en.wikipedia.org/wiki/Command-line_interface
http://eng.localytics.com/exploring-cli-best-practices/
http://harmful.cat-v.org/cat-v/unix_prog_design.pdf
http://programmers.stackexchange.com/que...-arguments
http://www.cs.pomona.edu/classes/cs181f/supp/cli.html
http://julio.meroh.net/2013/09/cli-desig...sages.html
http://julio.meroh.net/2013/09/cli-desig...ap-up.html
http://www.catb.org/~esr/writings/taoup/...11s07.html
http://www.catb.org/~esr/writings/taoup/...11s06.html
http://www.catb.org/~esr/writings/taoup/...10s01.html
http://www.catb.org/~esr/writings/taoup/...10s05.html
http://www.catb.org/~esr/writings/taoup/...11s04.html
Writing manpages
Perl6 Parsing command line args
Python Unix CLI template
Docopt
Getopt history

Music: https://dexterbritain.bandcamp.com/album...s-volume-1
jkl
A complete utility is one that does exactly what it says with no extra bells and whistles. (Unlike e.g. most GNU tools.)
ninjacharlie
I kind of wish coreutils had more standardized flags (i.e. -v will always be version, -V would always be verbose, -s would be silent or quiet so there would be no -q flag). Apart from that, the important thing is that the output is easy to parse with grep or cut, there is no extraneous debug information, errors go to stderr, and overall it just works nicely in a pipeline.
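To make the stdout/stderr point concrete, here is a small sketch. count_lines is a made-up helper for illustration, not an existing tool: results go to stdout as one tab-separated line per file, complaints go to stderr, and the exit status says whether anything failed.

```shell
# count_lines FILE...: hypothetical helper, prints "<count><TAB><file>" per file.
count_lines() {
    status=0
    for f in "$@"; do
        if [ -f "$f" ] && [ -r "$f" ]; then
            # Data goes to stdout, one easily cut-able line per file.
            printf '%s\t%s\n' "$(wc -l < "$f" | tr -d ' ')" "$f"
        else
            # Diagnostics go to stderr so they never pollute the pipeline.
            printf 'count_lines: cannot read %s\n' "$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

Something like `count_lines *.txt | sort -n | cut -f2` then keeps working even when one of the files is unreadable.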

Also, I agree with jkl's point that the GNU tools have way too much extra garbage built in. `ls` (for instance), does not need a sort flag. That's what `sort` is for.
z3bra
BEWARE: big post ahead.

TL;DR: act as a filter, easy to parse > easy to read

Command line interfaces (for now) only have two ways of interfacing with the user: stdin and stdout. Both are immutable, as in, what's written to them can't be erased.
To me, a good CLI program should act as a data filter: taking a continuous stream of data on its input, and outputting the same data, "filtered". All kinds of metadata and "behavior tweaking" methods (such as flags) should be given as arguments to modify the default behavior of the tool.

When you write a tool, you are supposedly aware of what the filter should be, so the hardest part is to figure out what the data should be. Here are a few hints I've gathered, either by taking advice from other people or failing miserably (sometimes both!):

input:
0. Do not mix up "data" and "metadata"

Let's take "grep" as an example. Its default behavior is to "filter out" all lines where a PATTERN doesn't appear within a data STREAM.
Here are a few "misimplementations" of how "grep" could have been written:
Code:
$ echo PATTERN | grep FILE0 FILE1
$ echo FILE0 FILE1 | grep PATTERN
$ printf "%s\n%s\n" "PATTERN" "FILE0" | grep

The last one is exaggerated, but the first two could have been valid implementations. They both suffer from the same issue though: they don't make the tool act like a filter (or to reformulate it: they can't operate on STREAMs).
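For comparison, here is a rough sketch of the "right" shape: the metadata (the pattern) is an argument, the data arrives as a stream. keep is a made-up name, and it only does fixed-string matching, so it is nowhere near a real grep.

```shell
# keep PATTERN [FILE...]: hypothetical grep-like filter (fixed strings only).
keep() {
    pattern=$1
    shift
    # With no FILE operands, cat simply passes stdin through,
    # so the same loop handles both files and streams.
    cat -- "$@" | while IFS= read -r line; do
        case $line in
            *"$pattern"*) printf '%s\n' "$line" ;;
        esac
    done
}
```

Both `keep PATTERN FILE0 FILE1` and `some_command | keep PATTERN` work, and the second form can sit anywhere in a pipeline.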

1. Limit the number of flags

I often hear people complaining about one-letter flags limiting you to ONLY 26 flags. But I personally think that 26 flags is already too many!
If your tool can handle this many behavior changes, it is likely too complex, and should certainly be split into multiple different tools.
Take "ls" as an example. POSIX specifies 23 flags for it. GNU ls has 40 flags! (not even counting --long-opt only flags):
Code:
$ ls --help | grep -o "\s-[a-zA-Z0-9]" | sort | uniq | wc -l
40

That is way too much. For example, the flag -r could be dropped:
Code:
$ ls -1 -r
$ ls -1 | tac

The flags -g and -o don't make much sense either, as they filter the output internally (removing either the OWNER or GROUP column).

These flags were indeed added to help the user, but IMO they get in the way as they make the tool more complex to use (more docs and more code can't be the way to go for "simpler" tools).

2. Assume data will be messed up

As much as possible, avoid rigid protocols and formats for your input data, and try to simplify your input as much as possible.
If you write, for example, an MPD client that would read commands on stdin, prefer simple inputs:
Code:
$ echo "volume 50" | ./pgm
$ ./pgm <<COMMAND
{
    "command": {
        "type":  "set",
        "key":   "volume",
        "value": "50"
    }
}
COMMAND
One might argue that JSON is easy to parse, well defined and such. That doesn't make it a nice input format, really.
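As a rough sketch of how little code the simple format needs (handle_commands is a made-up name, and the printf is only a stand-in for whatever the client would really send to MPD):

```shell
# handle_commands: hypothetical reader for "<command> <argument>" lines on stdin.
handle_commands() {
    # read strips leading/trailing blanks and splits on runs of whitespace,
    # so sloppy input like "  volume   50 " is still understood.
    while read -r cmd arg; do
        case $cmd in
            '')     continue ;;                                    # ignore blank lines
            volume) printf 'set volume to %s\n' "$arg" ;;          # stand-in for the real action
            *)      printf 'unknown command: %s\n' "$cmd" >&2 ;;   # complain, but keep going
        esac
    done
}
```

With this, `echo "volume 50" | handle_commands` and a hand-typed line with stray spaces behave the same, and a bad line doesn't kill the rest of the stream.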

output:

0. Prefer "parseable" to "readable"

Make your output easy to parse, rather than easy to read. Most unix tools work with lines, and play with separators (the famous $IFS variable). Try to format your output as such (tabs make a great separator!).
Prefer good documentation to verbosity.
Consider a program that would print out the weather in different cities. It could have either of the two following outputs (reading city names on stdin):

Code:
$ printf '%s\n%s\n' "Mexico" "Oslo" | weather
City: Mexico
Wind speed: 10mph
Humidity: 17%
Raining probability: 2%
Temperature: 35°C

City: Oslo
Wind speed: 22mph
Humidity: 44%
Raining probability: 37%
Temperature: 8°C

$ printf '%s\n%s\n' "Mexico" "Oslo" | weather
Mexico    10    17    2    35
Oslo    22    44    37    8

The second output is pretty "hard" to read from a user POV. Now try to write a quick tool that would display only the city name and raining probability...
As I'm a cool guy, here is the one for the second output:

Code:
$ printf '%s\n%s\n' "Mexico" "Oslo" | weather | cut -f1,4
Mexico 2
Oslo 37

It's easier to format RAW data than to strip down formatted data.
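And going the other way is cheap. Here is a sketch, assuming the fields of the second output really are tab-separated and in the order shown above; pretty is a made-up name for the example:

```shell
# pretty: hypothetical formatter turning the tab-separated weather lines
# back into the verbose, labelled form.
pretty() {
    awk -F '\t' '{
        printf "City: %s\nWind speed: %smph\nHumidity: %s%%\nRaining probability: %s%%\nTemperature: %s°C\n\n", $1, $2, $3, $4, $5
    }'
}
```

So `weather | pretty` would give the human-friendly view, while everything else in the pipeline keeps the raw one.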

1. Your output is an API

Think well about how you format your output data, because some people will rely on it.
If you take the example above, some people will have a hard time if you suddenly decide to swap the temperature with the humidity value.
Be consistent, document your tools and remain backward compatible as much as possible.

2. Be stupid

Don't try to look cool by checking whether the user is writing to a terminal or not, and changing the output accordingly.
This will only confuse users, who will not understand why their parsing tools can't parse what they usually get.

The simpler, the better.
jkl
(23-05-2016, 10:08 AM)ninjacharlie Wrote: Also, I agree with jkl's point that the GNU tools have way too much extra garbage built in. `ls` (for instance), does not need a sort flag. That's what `sort` is for.

My favorite example is GNU's "true" which can - if compiled correctly - return "false". Including --help and, most important, --version. Because, you know, a different version of "true" can lead to different results.
pranomostro
@ninjacharlie:

You are kind of wrong.

If your programs have similar flags, you should consider turning that feature into its own utility.

One example of this is the option -h: it is used in du, ls and df, but can be abstracted into one utility (see git.z3bra.org/human).

Of course, this is not always possible. Your examples with -v (for version), -V (for verbose), -h (help) and -s (silent) make sense.
But I think that duplication of flags always carries the danger of replicating features.
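To illustrate the -h case, here is a rough sketch of such a standalone filter. It is not how the human utility linked above works; humanize and its output format are made up for this example. It reads one byte count per line and prints it with a unit suffix.

```shell
# humanize: hypothetical filter, reads one byte count per line on stdin
# and prints it scaled to B/K/M/G/T with one decimal.
humanize() {
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk '
        BEGIN { split("B K M G T", unit, " ") }
        {
            n = $1; i = 1
            while (n >= 1024 && i < 5) { n /= 1024; i++ }
            printf "%.1f%s\n", n, unit[i]
        }'
}
```

Any tool that prints plain byte counts can then be piped through it, e.g. `wc -c < file | humanize`, instead of each tool growing its own -h.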

I heard an interesting idea a while ago, namely that 3 standard files are not enough.
stdin, stdout and stderr all have distinct purposes.

But very often stderr is misused for status information which doesn't have anything to do with errors.

The author proposed two other standard file descriptors: usrin and usrout, which can be used for user input and output while the program is running. If somebody here knows pick(1), they know the truth (pick lets you select the lines that get sent down the pipeline):

Some ugly hacking with the tty has to be done.
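For what it's worth, that hack usually amounts to talking to /dev/tty directly while stdin and stdout stay connected to the pipeline. Below is a rough sketch of the idea, not how pick(1) is actually implemented; pick_lines and its two optional arguments (standing in for the proposed usrin and usrout) are names I made up.

```shell
# pick_lines [USRIN [USROUT]]: hypothetical sketch of a line picker.
# Data flows stdin -> stdout; questions and answers use the user channels,
# which both default to the controlling terminal.
pick_lines() {
    usrin=${1:-/dev/tty}
    usrout=${2:-/dev/tty}
    while IFS= read -r line; do
        # Ask on the user output channel, never on stdout.
        printf 'keep "%s"? [y/N] ' "$line" >>"$usrout"
        # Read the answer from fd 3, which is the user input channel.
        IFS= read -r answer <&3 || answer=
        case $answer in
            y|Y) printf '%s\n' "$line" ;;
        esac
    done 3<"$usrin"
}
```

Used as `ls | pick_lines | wc -l`, the prompts show up on the terminal and only the chosen lines go down the pipe. Having real usrin/usrout descriptors would make the /dev/tty default unnecessary.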

Here is the medium rant (only read the first half, the second one is about images in the terminal): https://medium.com/@zirakertan/rant-the-...45bb29dac8

I liked the first half. The second is totally infected by "Integrate everything."

Edit: also, z3bra is right with everything. As always (sigh).
z3bra
(28-05-2016, 10:50 AM)pranomostro Wrote: Edit: also, z3bra is right with everything. As always (sigh).

Couldn't agree more with this line!
josuah
(23-05-2016, 10:20 AM)z3bra Wrote:
Code:
$ printf '%s\n%s\n' "Mexico" "Oslo" | weather
Mexico    10    17    2    35
Oslo    22    44    37    8

This would be just as easy (or even easier) by adding a header and with alignment, like for `ps`:

Code:
City      Wind  Humidity Rain Temperature
Mexico    10    17       2    35
Oslo      22    44       37   8

I liked these long posts!
z3bra
(01-06-2016, 10:40 AM)sshbio Wrote: This would be just as easy (or even easier) by adding a header and with alignment, like for ps:

I'm not quite sure. To be honest, I feel like headers/alignment make it harder to parse the output. For example, in order to get all the temperatures, you'll need:
Code:
$ weather | sed 1d | tr -s ' ' | cut -d\  -f5
35
8

You first need to filter out the header, then get rid of the alignment, and finally output the 5th field, using a custom separator.
With my solution, all you need is

Code:
$ weather | cut -f5
35
8

Way easier to parse actually, and thus, it should be easy to make a "readable output" right?
Code:
$ weather | sed '1iCity\tWind\tHumidity\tRain\tTemperature' | column -t
henriqueleng
Quote:I'm not quite sure. To be honest, I feel like headers/alignment makes it harder to parse the output. For example, in order to get all the temperature, you'll need:
Code:
$ weather | sed 1d | tr -s ' ' | cut -d\  -f5
35
8

But if you're intending to use awk, the aligned output with a header shouldn't be a problem. It could be like:

Code:
$ weather | sed 1d | awk '{print $5}'

awk easily formats columns.



obs.: There must be some bug in the javascript because I can't quote z3bra's last post automatically with the quote button. Trying to quote other posts works fine.



