New (?) idea for shell scripting - Programming On Unix

pranomostro
Long time nixers
Hey nixers,
this is going to be a long post, just to prepare you.

I recently had an idea for a new way to use the shell when scripting. The idea itself is pretty basic: a pipeline reads from a file and appends its output to that same file.

The most basic example is a file named 'a' that starts out containing a single line. To make this file grow without bound, just write
Code:
tail -f a >>a

Ok, this isn't very exciting, so here is another way to illustrate the idea. Suppose you want to find out how vulnerable a checksum is to collisions. You write the checksum of anything into a file, for example
Code:
md5sum | sed 's/  -$//' >sums
and then you start a little shell script with this content:

Code:
#!/usr/bin/env rc
tail -f -n1 sums | ./unbuf 'md5sum | sed -u ''s/  -$//''' >>sums

The content of unbuf is:

Code:
#!/usr/bin/env rc
while(read | eval $"*) {}

Even if you don't know rc, this is fairly easy to understand: tail follows sums, the next checksum is generated from the most recent line, and the result is appended to the file. unbuf makes sure that each checksum is generated from exactly one input line. Of course, you still have to check for duplicates, with
Code:
sort sums | uniq -d
but this is a nice way of finding recurring checksums (I have only tried md5sum so far, and after 10,000,000 lines there was no duplicate, so this is fairly good).
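
For a bounded run that does not depend on tail -f, the same chain can be sketched as a plain loop. This is only an illustration, not the ring itself: the function name chain and its two arguments (a seed line and the number of sums to produce) are made up here, and md5sum is assumed to be the coreutils one used above.

```shell
#!/bin/sh
# chain SEED N: print N checksums, each one the md5sum of the previous line.
# Hypothetical helper for illustration; the ring above passes the previous
# line through the file sums instead of a shell variable.
chain() {
    line=$1
    i=0
    while [ "$i" -lt "$2" ]; do
        # Like the ring, hash the line together with its trailing newline.
        line=$(printf '%s\n' "$line" | md5sum | sed 's/  -$//')
        printf '%s\n' "$line"
        i=$((i + 1))
    done
}
```

Something like chain "$(tail -n1 sums)" 1000 | sort | uniq -d should then print nothing as long as no checksum repeats.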

My next example is the reason why I thought of this in the beginning.
In his book »Gödel, Escher, Bach«, Douglas R. Hofstadter describes a logical system called MU (MIU in the book).
In this system, you start with one or more sentences containing the letters m, i and u, for example mui or uuimmuiui. There are four rules for modifying sentences:
1: You can append u to a sentence that ends with i (mui->muiu)
2: If a string begins with m, you can duplicate everything after the m (muu->muuuu)
3: You can substitute iii by u (miiiu->muu)
4: You can leave out uu (muuiuu->muui)

So I wrote a little script that generates mu expressions by reading from an input file and appending to it:

Code:
#!/usr/bin/env rc
tail -f mu | ./apr | grep --line-buffered -E '^.{,80}$' | uu >>mu

uu is a Lua script that prints a line only if it has not appeared earlier in the input (unsorted uniq):
Code:
#!/usr/bin/env lua

-- Print each input line the first time it is seen (an unsorted uniq).
local seen={}
for line in io.lines() do
        if not seen[line] then
                print(line)
                seen[line]=true
        end
end
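
The same first-occurrence filter is often written as an awk one-liner. A minimal sketch, assuming an awk that supports fflush() (gawk, mawk and BusyBox awk do); the function name uu_awk is made up for this example.

```shell
#!/bin/sh
# uu_awk: print each input line only the first time it appears.
# seen[$0]++ evaluates to 0 on the first occurrence of a line, so
# !seen[$0]++ is true exactly once per distinct line.
# fflush() pushes each line out immediately instead of waiting for a
# full output buffer.
uu_awk() {
    awk '!seen[$0]++ { print; fflush() }'
}
```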

apr is another Lua script which, for each input line, applies each MU rule zero or one times (apr = apply rules):
Code:
#!/usr/bin/env lua

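-- Each rule returns the rewritten string and the number of substitutions
-- gsub made; a count of 0 means the rule did not apply.

-- Rule 1: append u to a string that ends with i.
local function rule1(str)
        return str:gsub("^(%a+i)$", "%1u")
end

-- Rule 2: duplicate everything after a leading m.
local function rule2(str)
        return str:gsub("^m(%a+)$", "m%1%1")
end

-- Rule 3: replace iii by u. The pattern is anchored, so only one
-- occurrence is rewritten, and it needs a letter on each side of iii.
local function rule3(str)
        return str:gsub("^(%a+)iii(%a+)$", "%1u%2")
end

-- Rule 4: drop uu. Same restrictions as rule 3.
local function rule4(str)
        return str:gsub("^(%a+)uu(%a+)$", "%1%2")
end

local rules={rule1, rule2, rule3, rule4}

for line in io.lines() do
        for _, rule in ipairs(rules) do
                local result, count=rule(line)
                if count~=0 then print(result) end
        end
end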
(My Lua-fu is not very strong, so this script has a fundamental flaw.)
Given a starting MU expression in mu, it generates all (okay, not _all_) MU expressions derivable from it that are at most 80 characters long.
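
One limitation that can be read off the patterns in apr is that rules 3 and 4 are tried at only one position per line, and never at the very end of it. Whether that is the flaw meant above is a guess. A minimal sketch of a step function that tries those two rules at every position, written with awk and assuming an awk that supports fflush(); the name mu_step is made up here.

```shell
#!/bin/sh
# mu_step: for every input line, print each string reachable by applying
# one rule once. Rules 3 and 4 are tried at every position in the line.
mu_step() {
    awk '{
        s = $0
        n = length(s)
        # Rule 1: a string ending in i may get a u appended.
        if (s ~ /i$/) { print s "u" }
        # Rule 2: m followed by x becomes m followed by xx.
        if (s ~ /^m./) { print s substr(s, 2) }
        for (k = 1; k <= n; k++) {
            # Rule 3: replace the iii starting at position k by u.
            if (substr(s, k, 3) == "iii") {
                print substr(s, 1, k - 1) "u" substr(s, k + 3)
            }
            # Rule 4: drop the uu starting at position k.
            if (substr(s, k, 2) == "uu") {
                print substr(s, 1, k - 1) substr(s, k + 2)
            }
        }
        fflush()
    }'
}
```

It could stand in for ./apr in the pipeline above. The same string can now be printed more than once per line, which uu filters out anyway.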

Of course, this idea has a few fundamental flaws. The script does not terminate when there are no input lines left; it just sits there waiting. One solution could be a program that quits (and closes the pipe) when there have been no input lines for a certain time. Another flaw is that input/output buffering is really bad in this case: when the programs do not print their results immediately, the pipeline does not get going. Additionally, not many Unix tools support explicit line buffering (grep is one exception); for everything else you have to use unbuf.
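
A different way around the termination problem is to drop tail -f and walk the file by line number, stopping once every line has been consumed. This is a minimal sketch of that alternative, not the timeout program described above. The function name ring and its arguments are made up for this example. It starts the command afresh for every line, so it only suits stateless per-line filters (not uu, which has to remember earlier lines).

```shell
#!/bin/sh
# ring FILE MAX CMD [ARG...]: run CMD once per line of FILE, appending its
# output to FILE, until every line has been consumed or MAX lines were read.
# Hypothetical helper; it re-reads FILE for each line, so it is slow on
# large files.
ring() {
    file=$1
    max=$2
    shift 2
    n=1
    while [ "$n" -le "$max" ] && [ "$n" -le "$(( $(wc -l <"$file") ))" ]; do
        # Feed line n to CMD; whatever CMD prints becomes new input later.
        sed -n "${n}p" "$file" | "$@" >>"$file"
        n=$((n + 1))
    done
}
```

For example, with a file containing the single line 3, ring file 100 awk '$1 > 0 { print $1 - 1 }' appends 2, 1 and 0 and then returns, because the line 0 produces no further output.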

Nevertheless, in my opinion this is a trick that can be very elegant in some ways (especially the mu example) and also quite helpful.
What do you think? Is this useless tinkering with shell syntax in your opinion or could this be used in "real software"? Tell me your opinion!

P.S: I would call it a ring, because it does not have an end or beginning, the data just flows and accumulates in a circular way.
venam
Administrators
Sounds like recursive programming with the shell, very interesting.
This might not be very efficient due to all the IO but it's a nice "hack" I haven't seen before.
It could turn out helpful in situations where you only have access to a shell and not any other programming language interpreter/compiler.

Well done!

NB: You might have been able to use xargs instead of the "unbuf" script you wrote.
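
A minimal sketch of what the xargs variant could look like for the checksum example. The function name sum_each_line is made up here, and how promptly xargs passes lines on when fed by tail -f is an assumption that has not been checked.

```shell
#!/bin/sh
# sum_each_line: read lines on stdin and print one md5 checksum per line.
# xargs -n1 starts a new sh for every input line, which plays the role of
# unbuf. Caveat: xargs splits on blanks and interprets quotes and
# backslashes, so this only suits simple one-word lines such as checksums.
sum_each_line() {
    xargs -n1 sh -c 'printf "%s\n" "$1" | md5sum | sed "s/  -\$//"' sh
}
```

The ring would then read tail -f -n1 sums | sum_each_line >>sums.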
pranomostro
Long time nixers
@venam: it's not exactly recursive, since no program calls itself. And yes, it isn't that performant, but it is performant enough: for example, muring is quite fast, and this implementation is a lot simpler than anything I could imagine in C.
Ah, yeah, I'll try xargs instead of unbuf. Sounds far easier.
I just wanted to share this and hear your opinion about it.
z3bra
Grey Hair Nixers
Buffered input is a huge pain to deal with. I had a discussion with some good programmers a while back, and we came up with an "unbuf" binary which would force unbuffered input. It wouldn't work with every program though, because some explicitly reset the buffering, but it was a nice program to have under the hood.

I'll try to find it again.
pranomostro
Long time nixers
@z3bra: It would be great if you could find that one!
z3bra
Grey Hair Nixers
God bless IRC logs: http://sprunge.us/iFPa
pranomostro
Long time nixers
Thanks, I'll use that :)
pranomostro
Long time nixers
I found out that there is a coreutils utility, stdbuf.
It lets you set the buffering of a program's standard streams, for example:
Code:
stdbuf -i0 -o512 awk 1
asks for unbuffered input and a 512-byte output buffer for awk.
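
For the pipelines in this thread, the more relevant setting is probably line-buffered output, which stdbuf spells -oL. A minimal sketch, assuming GNU coreutils stdbuf is installed; the function name line_buffered is made up here. stdbuf only helps with programs that rely on the default C stdio buffering and do not reset it themselves.

```shell
#!/bin/sh
# line_buffered CMD [ARG...]: run CMD with its stdout flushed after every
# line instead of whenever the output buffer fills up.
line_buffered() {
    stdbuf -oL "$@"
}
```

Used as in tail -f file | line_buffered tr a-z A-Z | ..., each line should reach the next stage as soon as it is produced.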