On Stdio... - GNU/Linux

NetherOrb
Long time nixers
After reading some posts and the man pages on stdio, I got confused about a few things.

What do files do when interacting with each other? I'm assuming this depends on the file type; if so, let's assume they are bash scripts. Do they use input streams even though no input is coming from the keyboard?

Are the streams independent from keyboard/monitor and are these input and output devices merely a way to 'tap' into these streams?
movq
Long time nixers
Are you referring to pipes? Something like this?

Code:
$ ./first_script.sh | grep something | ./second_script.pl

Here's an extremely short rundown.

When a process starts (be it a script or not), it is usually preconnected with three file descriptors: STDIN, STDOUT, STDERR. The process can use these FDs as if they were regular files. Unless your program is an interactive one (like an editor), it doesn't matter if these file descriptors point to actual files or "devices" like a terminal. When you build a pipeline like `foo | bar`, STDOUT of `foo` is fed into STDIN of `bar` by means of a pipe. `bar` then just reads from its STDIN and it just works.
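
To make that concrete, here is a minimal sketch. The function name `shout` is made up for illustration; it is an ordinary filter that reads its STDIN and writes to its STDOUT, and it has no idea what is connected to either end.

```shell
# shout is a hypothetical filter: it reads STDIN and writes upper case to STDOUT.
shout() {
    tr '[:lower:]' '[:upper:]'
}

# STDIN is a pipe here.
printf 'hello\n' | shout

# STDIN is a regular file here; the function body is unchanged.
tmp=$(mktemp)
printf 'hello\n' > "$tmp"
shout < "$tmp"
rm -f "$tmp"
```

Both calls print the same upper-case line. Running `shout` with no pipe or redirection would make it read from the terminal instead, again without any change to the function.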

Now, I vaguely remember that somebody already posted a detailed article about files, file descriptors, pipes, and similar concepts. I just can't find it anymore. :-)

I/we can elaborate more on that topic if you like. In the meantime, the following articles on Wikipedia might help you as well:

https://en.wikipedia.org/wiki/Standard_streams
https://en.wikipedia.org/wiki/Pipeline_(Unix)
pranomostro
Long time nixers
Okay, expanding what vain already said:

every file is basically a stream of bytes. You can read bytes from a stream, and you can write bytes
to a stream. When you read a file with the content 'abc', the first byte in the stream is 'a', the second
byte in the stream is 'b' and the third byte in the stream is 'c'.

This holds true for every file, for stdin and stdout and stderr, but also for regular disk files, like
music and image files, text files, source code, databases and so on, as well as pipes and redirections.

A redirection is a way of saving such a byte stream or retrieving it again. You create a byte stream
from a file a to the command cat by typing `cat <a`. The same is true when you redirect into a disk
file: you can say `cat <a >b`. This does not depend on the file type, since file types (in the sense
of content formats) do not really exist on Unix. You can treat a file in a certain way when it has a
certain layout, but the type is not an intrinsic attribute of the file. To the system, a C source
code file is the same as an image.
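
A small sketch of the redirection case, assuming a throwaway directory from `mktemp -d`. The helper name `copy_via_redirect` and the file names are made up for illustration.

```shell
# copy_via_redirect is a hypothetical helper. The shell opens both files
# before cat starts, so cat only ever sees its STDIN and STDOUT.
copy_via_redirect() {
    cat < "$1" > "$2"
}

dir=$(mktemp -d)
printf 'abc' > "$dir/a"                  # three bytes, no trailing newline
copy_via_redirect "$dir/a" "$dir/b"

# cmp exits 0 (and prints nothing) when both files hold the same bytes.
cmp "$dir/a" "$dir/b" && echo "a and b hold the same bytes"

# A name may suggest a format, but nothing enforces it.
copy_via_redirect "$dir/a" "$dir/picture.png"
cmp "$dir/a" "$dir/picture.png" && echo "still the same three bytes"

rm -r "$dir"
```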

Pipes are a mechanism that lets you use a byte stream directly, without saving it on disk first.
`du -ab | sort -n` is an example of this: `du` produces a stream of bytes that is then
fed directly into another command, namely `sort`.
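
As a rough sketch of the difference, the two variants below hand `sort` the same bytes, once through a temporary file and once through a pipe. The sample numbers are made up.

```shell
# Variant 1: save the byte stream on disk, then read it back.
tmp=$(mktemp)
printf '30\n4\n200\n' > "$tmp"
via_file=$(sort -n < "$tmp")
rm -f "$tmp"

# Variant 2: hand the same bytes to sort through a pipe, no file involved.
via_pipe=$(printf '30\n4\n200\n' | sort -n)

# sort received identical input both times, so the results match.
[ "$via_file" = "$via_pipe" ] && echo "same result either way"
```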

Okay, what is stdio? stdio is a part of the C standard library for dealing with such streams. It has support
for reading and writing streams with different functions such as fgets, printf and fwrite.
movq
Long time nixers
Another thing to note:

(06-11-2016, 02:48 PM)pranomostro Wrote: every file is basically a stream of bytes.

I don't know of any *modern* operating system or file system where this statement is *not* true (as long as you consider ordinary files on disk -- I have no idea about "special files" on Windows or stuff like that). Historically, however, there were other approaches. For example, there were record-oriented file systems. The point is, the *operating system* would dictate a certain file format by enforcing a record-based structure. This isn't the case anymore today. Sure, your Linux application in user space can always do the same thing (organize a file in a certain way), but the kernel does not care at all. To the kernel, it's just a stream of bytes, and it's your job to bring order to chaos.

Quoting from "The UNIX Programming Environment" by Kernighan and Pike from 1984 (p. 44):

Quote:The Unix system is unusual in its approach to representing control information, particularly its use of newlines to terminate lines. Many systems instead provide "records," one per line, each of which contains not only your data but also a count of the number of characters in the line (and no newline).
[...]
The Unix system does neither -- there are no records, no record counts, and no bytes in any file that you or your programs did not put there.
[...]
For most purposes, this simple scheme is exactly what is wanted. When a more complicated structure is needed, it can easily be built on top of this; the converse, creating simplicity from complexity, is harder to achieve.
pranomostro
Long time nixers
@vain: Yeah, I was talking about unix only. Correct would be the sentence 'On Unix, every file is basically a stream of bytes, on other systems this might be different'.
NetherOrb
Long time nixers
Thanks to you both.

So wow. Quite literally everything is a file that contains streams of bytes. The file is a notation, or container for bytes, for us to make sense out of the streams right?

Ah and that link that you sent, of wiki talking about streams, is what confused me. The picture they use gave me the impression that stdin is merely for taking in information from the terminal. As if it were special somehow. But as has been said everything is a file. This includes the terminal itself I presume.
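
If I understand it right, something like the following sketch would show what stdin is actually connected to. The function name `stdin_kind` is made up, and the `/dev/stdin` checks assume a Linux-style system where that path points at file descriptor 0.

```shell
# stdin_kind is a hypothetical helper that names what FD 0 points to.
# Assumption: /dev/stdin resolves to the process's own FD 0 (true on Linux).
stdin_kind() {
    if [ -t 0 ]; then
        echo "terminal"
    elif [ -p /dev/stdin ]; then
        echo "pipe"
    elif [ -f /dev/stdin ]; then
        echo "regular file"
    else
        echo "something else"
    fi
}

printf 'x\n' | stdin_kind      # fed by a pipe

tmp=$(mktemp)
stdin_kind < "$tmp"            # fed by a regular file
rm -f "$tmp"

stdin_kind < /dev/null         # a device file, neither pipe nor regular file
```

Calling `stdin_kind` straight from an interactive shell should take the first branch, since the terminal is then just one more thing FD 0 can point to.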
apk
Long time nixers
(06-11-2016, 02:48 PM)pranomostro Wrote: Okay, expanding what vain already said: [...]

hey how do u get the orange text thats pretty cool
NetherOrb
Long time nixers
I don't think it was intentional.
venam
Administrators
(07-11-2016, 12:13 AM)dsplayer14 Wrote: hey how do u get the orange text thats pretty cool
It's the anti-xss system of the forums that messes with "<" and ">" characters.
jkl
Long time nixers
(06-11-2016, 02:48 PM)pranomostro Wrote: When you read a file with the content 'abc', the first byte in the stream is 'a', the second
byte in the stream is 'b' and the third byte in the stream is 'c'.

Depending on the character encoding, that is.

--
<mort> choosing a terrible license just to be spiteful towards others is possibly the most tux0r thing I've ever seen
pranomostro
Long time nixers
jkl: Kinda. It still just reads bytes, not characters. I should have written 'It first reads the byte 0x61, then the byte 0x62, and then the byte 0x63 (which are 'a', 'b' and 'c' in ASCII).'
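
A quick way to look at the raw bytes, as a sketch. The helper name `hex_bytes` is made up; `od` itself is a standard tool.

```shell
# hex_bytes is a hypothetical helper: it prints every byte arriving on STDIN
# as two hex digits, separated by single spaces.
hex_bytes() {
    od -An -v -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Three bytes go in; od shows them without interpreting them as characters.
printf 'abc' | hex_bytes
echo
```

Whether 0x61 is displayed as 'a' is up to whatever program renders the bytes afterwards; the stream itself carries only the numbers.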
z3bra
Grey Hair Nixers
Is there any character encoding that would treat 0x61 as something other than 'a'?
jkl
Long time nixers
In EBCDIC, it's '/'.


(edited for correctness)

pranomostro
Long time nixers
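In UTF-16, 'a' is the two bytes 0x00 0x61 (or 0x61 0x00 in little-endian order), and files usually start with the byte order mark U+FEFF on top of that.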

Poor Windows people.
jkl
Long time nixers
In UTF-8, most characters take more than one byte.

Poor Unicode people.

pranomostro
Long time nixers
But not all of them, and UTF-16 has no ASCII backwards compatibility.
jkl
Long time nixers
Emojis exist in UTF-16 as well (as surrogate pairs). I fail to see how this is related to Windows.

venam
Administrators
This argument is going on forever...
It now deserves its name in history: The Unicode war.
OP has got the basic idea, I don't think there's a need to argue more about this.
It may deserve its own thread if it piques anyone's interest.
pranomostro
Long time nixers
I think we are just arguing about two different things. I wanted to say that

1. Windows uses UTF-16 almost exclusively
2. UTF-16 has 2 bytes per code point (4 for code points outside the Basic Multilingual Plane)
3. And no ASCII backwards compatibility.
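
For anyone who wants to check the byte counts themselves, here is a small sketch. The helper name `byte_count` is made up, and the UTF-16 part assumes an `iconv` that knows the name `UTF-16BE` (it is skipped if `iconv` is missing).

```shell
# byte_count is a hypothetical helper: number of bytes arriving on STDIN.
byte_count() {
    wc -c | tr -d ' '
}

# Octal escapes spell out the UTF-8 encoding byte by byte, so the result
# does not depend on how this script file itself is encoded.
printf 'a' | byte_count                   # U+0061, 1 byte in UTF-8
printf '\303\251' | byte_count            # U+00E9, 2 bytes in UTF-8
printf '\342\202\254' | byte_count        # U+20AC, 3 bytes in UTF-8
printf '\360\237\230\200' | byte_count    # U+1F600, 4 bytes in UTF-8

# Same characters in UTF-16 (big-endian, no byte order mark), if iconv exists.
if command -v iconv >/dev/null 2>&1; then
    printf 'a' | iconv -f UTF-8 -t UTF-16BE | byte_count                 # 2 bytes
    printf '\360\237\230\200' | iconv -f UTF-8 -t UTF-16BE | byte_count  # 4 bytes
fi
```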

jkl was making fun of me (understandably) for being snarky.

Well. OP, congrats! You initiated the first tiny flamewar on these forums. Now we can stop.
jkl
Long time nixers
Just when it starts to be interesting... ;o)
