System Calls - Programming On Unix
System calls on Unix
> The most common Unix system calls.
> A quick overview.
System calls are one subject that scares many people. Actually, most
of the low-level stuff happening in the operating system scares a lot
of people. I admit, I was a bit afraid of dealing with this subject.
Not because it's hard or anything but because it's something that we're
not used to dealing with everyday, it's like a hidden magic spell.
I was also afraid of dealing with this subject because I thought I
could make mistakes while explaining it and giving other people false
assumptions about the mechanism of their Unix operating systems.
But that's ok...
We'll explain everything slowly.
In this podcast we're discussing system calls on Unix operating systems,
it's gonna be a quick overview of what's happening there.
If you're someone who hardly knows anything about them, then this is the
episode you need to listen to.
Here we go.
I'm venam and you're listening to the nixers podcast!
#What's a system call
##What it is
Let's go over the definition of what a system call is.
A system call is a way for a program in userland to request a service
from the kernel of the OS. It's the interface that sits between the
processes running on the machine and the operating system.
The services offered by the kernel vary from one OS to another, but as
we'll see there's something that sticks and stays coherent between all
the Unix-like operating systems.
Most importantly, system calls are an abstraction layer between the
hardware and user space. The same system calls exist across different
hardware architectures, which means you don't have to change your
user-space program's source for it to be portable from one CPU brand to
another. They also generalize functions across programming languages:
any programming language can access the system calls.
For the programmer it's just another function to call.
In this definition, we've mentioned a bunch of reasons why system
calls are useful, but what else can we say about why we have them?
Couldn't this have been done another way?
So why do we have an interface between the OS, the processes, and the
hardware?
Let's review our definition of what an operating system kernel is:
> A kernel is the operating system software running in protected mode
> and having access to the hardware's privileged registers. The kernel
> is not a separate process running on the system. It is the guts of
> the operating system, which controls the scheduling of processes to
> achieve multitasking, and provides a set of routines, constantly in
> memory, to which every user-space process has access.
So the kernel is always "in memory", it schedules processes for
multitasking, and it provides functions. But what's that "protected
mode"?
Most CPUs have a built-in security model, also called CPU modes. The
common one is the rings model, which specifies multiple privilege
levels at which software can be executed. The kernel is executed in
unrestricted access mode: it can do anything allowed by the CPU, read
any part of memory, etc. All other programs run in a layer above. They
are limited to their own address space and can't mess with the hardware
devices; their access to resources is restricted.
This restriction is enforced at the hardware level.
That is what is meant by protected mode and this is a must in any
multi-tasking operating system.
The concept of rings protection was introduced in Multics, an OS which
highly influenced the development of Unix.
This concept is core to Unix with its preemptive multitasking, where a
clock interrupt fires rapidly and routinely, letting the kernel switch
the CPU from one process to another.
So what does this have to do with system calls, you might ask.
Programs need to access devices and components, otherwise nothing would work.
That's where system calls enter the scene.
They provide well-defined and safe implementations for those operations.
The OS holds the highest level of privilege and allows applications
to request access to those operations via system calls.
A system call initiates an interrupt, also called a "trap", which puts
the CPU into elevated privilege and passes control to the kernel. The
kernel handles the arguments and determines whether the call should be
done or not. It then does its thing, returns to the normal privilege
level, and passes control back to the calling program.
This is similar to multitasking where the cpu switches control using
interrupts, as we've said.
Couldn't this have been done another way? Instead of having a central
unit that controls access to the important parts of the system and
the hardware, a part that is there so we don't mess up our machine?
Well, no mainstream operating system does it differently.
There is however a concept called an exokernel where the operating system
doesn't offer a general abstraction of the resources but forces the
application developers to make decisions about those hardware abstractions
instead of the kernel.
This contrasts with both the microkernel and the monolithic architectures.
The monolithic architecture is the most common among Unix-like operating
systems.
So system calls are there as a way to protect you from yourself, from
destroying your machine.
That was the why.
Now let's take a look at those system calls, from an outer point of view
and inner point of view.
##What it looks like to the programmer - Library as middle man
As we said, system calls are like a library or API that sits between
normal programs and the kernel.
On Unix-like systems, that API is usually part of an implementation of
the C library (libc), such as glibc, that provides wrapper functions for
the system calls, often named the same as the system calls they invoke.
Those wrappers exist across programming languages (partly because other
programming languages have a lower C layer, but system calls can also be
made directly from the language if it has assembly facilities), and to
the programmer they look just like another function. In reality, though,
the code doing the work is contained within the kernel.
That C library wrapper, other than exposing an ordinary function to the
outside world, is made in a way that is modular and portable.
It's not actually the C code but a small piece of assembly inside the
library that talks to the kernel.
It works this way:
The wrapper places the arguments to be passed to the system call in the
appropriate registers (or on the stack, depending on the platform) and
also sets a unique system call number that tells the kernel which
service is requested.
This way the API stays portable: programs call the wrapper by name, and
the wrapper knows which system call number to use on its platform.
So the call to the library function itself does not cause a switch to
kernel mode directly. The switch happens when that piece of code hands
the system call number and the arguments to the kernel.
That part is highly implementation- and platform-dependent: the numbers
and the way arguments are passed differ between operating systems and
CPU architectures, unlike the names at the level above.
This level is called the application binary interface (ABI). It isn't
the same everywhere, and on some systems it even changes over time.
However, the names of the system calls don't change. Like we said, they
are an abstraction.
##Super little Details
At the low level, there are differences between the Unix-like OSs in
the way system calls are made and received by the kernel.
On both Linux and the BSDs, the actual entry into the kernel has to be
written in assembly.
FreeBSD supports both the BSD style of system calls and the Linux style.
On the BSDs, system calls use the C calling convention, also known as
cdecl, which stands for "C declaration", a convention that originates
from the C programming language.
That means that any program written in any language can access the
kernel, as long as it can call C functions.
On 32-bit x86, the kernel is accessed using `int 80h`
(`interrupt vector 0x80`) on both Linux and the BSDs.
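Specifically, the convention for [Free|Open|Net|DragonFly]BSD UNIX
system calls is to pass the parameters by pushing them onto the stack,
put the system call number in EAX, and then execute the `int 80h`
instruction:
```nasm
kernel:
    int 80h             ; Call kernel
    ret

open:
    push dword mode     ; 3rd argument, pushed first (right to left)
    push dword flags    ; 2nd argument
    push dword path     ; 1st argument
    mov eax, 5          ; system call number 5 = open
    call kernel         ; the call pushes a return address the BSD kernel expects
    add esp, byte 12    ; caller cleans up the three 4-byte arguments
    ret
```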
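On Linux, the difference is in the way the parameters are passed to the
kernel. They are not passed on the stack but in registers: EBX, ECX,
EDX, ESI, EDI and EBP are used for passing up to 6 parameters to system
calls. The return value is in EAX. All other registers (including
EFLAGS) are preserved across the `int 0x80`.
So it's the A, B, C, D registers: EAX holds the system call number, and
EBX, ECX and EDX hold the first three arguments, in that order.
```nasm
mov eax, 5          ; system call number 5 = open on 32-bit x86 Linux
mov ebx, path       ; 1st argument
mov ecx, flags      ; 2nd argument
mov edx, mode       ; 3rd argument
int 80h             ; call kernel; the result comes back in eax
```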
For both Linux and the BSDs, the system call number is passed by filling
the EAX register.
More generally, the arguments are set up around that same moment, but
in different ways: on the stack for the BSDs, in registers for Linux.
Let's note that FreeBSD lets you use the Linux way of doing system
calls, but only if the kernel has Linux emulation installed.
Moreover, you need to mark ("brand") the program as a Linux binary.
You do that using the brandelf tool:
```sh
% brandelf -t Linux filename
```
As a side note, if you want to write any program in assembly, it'll always
come down to this: you want to interact with your system, so apart from
the usual jumps and loops, it's all about filling registers to do the
system calls, and managing memory.
Let's now discuss more about the CPUs.
#CPU - The underlying principles
As we've said, system calls in most Unix-like systems are processed in
kernel mode which is done by changing the processor execution mode to
a more privileged one.
This, however, does not mean that there's going to be a process switch.
It's just a temporary delegation to the kernel while the calling
process waits for the response.
We've mentioned that earlier.
But what happens when the calling program is a multithreaded application?
As we know, threads in the Unix world are like small processes, each
with its own stack and registers, sharing the address space of the
process they belong to.
There are many ways to handle this situation.
Most Unix-like OSs use the one-to-one model, which means that every user
thread is attached to its own distinct kernel-level thread, so when it
makes a system call, only that thread waits.
This solves the issue of blocking system calls.
Let's mention that there are other ways to do that, such as:
* Many-to-one model: all system calls from any user thread in a process
  are handled by a single kernel-level thread, which means every thread
  has to wait for the others.
* Many-to-many model: a pool of user threads is mapped to a pool of
  kernel threads. All system calls from a user thread pool are handled
  by the threads in their corresponding kernel thread pool.
* Hybrid model: this model implements both the many-to-many and the
  one-to-one models, depending upon the choice made by the kernel. This
  is found in old versions of IRIX, HP-UX and Solaris.
Let's go back to the CPU.
Different architectures offer different facilities.
For example, the x86 instruction set contains the SYSCALL/SYSRET
instructions (introduced by AMD) and SYSENTER/SYSEXIT (introduced by
Intel). These are fast control transfer instructions designed to
quickly transfer control to the kernel for a system call without the
overhead of an interrupt.
Another of those nifty mechanisms is the old x86 call gate, which allows
programs to directly call a kernel function using a safe control transfer
mechanism.
Let's talk about real examples of system calls and which ones are
available on most Unix systems.
Let's read a little excerpt:
> On Unix, Unix-like and other POSIX-compliant operating systems, popular
> system calls are open, read, write, close, wait, exec, fork, exit, and
> kill. Many modern operating systems have hundreds of system calls. For
> example, Linux and OpenBSD each have over 300 different calls, NetBSD
> has close to 500, FreeBSD has over 500, while Plan 9 has 51.
POSIX... Wait, we haven't mentioned POSIX yet.
What's that thing?
What's that POSIX thing?
POSIX stands for Portable Operating System Interface, and it's a family
of standards specified by the IEEE Computer Society for maintaining
compatibility between operating systems.
POSIX standards are about the API of an OS and the command line shells
and utility interfaces of Unix-like OS.
So it's a standard that is there for compatibility between Unix-like
OSs, but as with all standards, the list of points to follow is huge,
and most systems only partly fulfill it. That's not really an issue as
long as they take the most important bits and pieces.
Most of the modern, popular Unix-like OSs only partly adhere to it.
Only OSX amongst the "new team" is fully compliant.
Other than that, you have AIX, HP-UX, IRIX, Solaris, Tru64, UnixWare,
and others.
Weirdly enough those are all mostly closed source operating systems
which is suspiciously annoying.
But not being fully compliant doesn't mean that the system isn't Unix-like.
There's another standard called the Single UNIX specification.
It determines whether a system qualifies to use the "UNIX" trademark.
Again a very commercial way of seeing what Unix really is.
And very few BSD and Linux-based operating systems are submitted for
compliance with the Single UNIX Specification. Also, again, only closed
source Unix-like OSs adhere to it.
Unix is more about the philosophy and the way of working with this
multitasking OS, taking the spirit back from the Bell Labs.
Well, what does that all have to do with system calls?
It turns out that those standards have a set of functions that are
sometimes implemented as system calls.
##Categories of syscalls
POSIX calls can be implemented in the standard library or as system calls.
POSIX is a specification and does not "know" about syscalls, which,
in the POSIX view, are an implementation detail.
Nothing mandates the way they are implemented.
They can even be implemented in non-Unix-like OSs.
To know which ones are system calls, you need to see which ones overlap
with the system calls your OS actually provides.
Let's talk about POSIX.
POSIX is divided into two parts:
the system interfaces, and the commands and utilities.
We're not interested in the commands and utilities and only interested
in the system interface, and only if those are also system calls.
There are 5 main categories of system calls:
* Process Control
* File Management
* Device Management
* Information Maintenance
* Communication
They overlap with some of the features in POSIX.
For example, process creation and control in POSIX overlaps with process
control, and clocks and timers in POSIX overlap with information
maintenance.
What you need to know here is that there are way more POSIX
specifications than would be needed for system calls.
So a given POSIX interface is more likely not to be a system call than
to be one.
The list of POSIX specifications is quite extensive,
ranging from thread creation, managing shared memory, pipes, timers,
bus errors, signals, etc.
You can read more about those from the links in the show notes.
* POSIX.1: Core Services (incorporates Standard ANSI C) (IEEE Std 1003.1-1988)
  * Process Creation and Control
  * Floating Point Exceptions
  * Segmentation / Memory Violations
  * File and Directory Operations
  * C Library (Standard C)
  * I/O Port Interface and Control
* POSIX.1b: Real-time extensions (IEEE Std 1003.1b-1993, later appearing as librt, the Realtime Extensions library)
  * Clocks and Timers
  * Asynchronous and Synchronous I/O
  * Memory Locking Interface
* POSIX.1c: Threads extensions (IEEE Std 1003.1c-1995)
  * Thread Creation, Control, and Cleanup
#The most common ones
To find the most common ones, let's do something crazy.
Let's check the sources of OpenBSD, NetBSD, Linux and FreeBSD, list the
system calls they have in common, and check whether they are in POSIX.
That will tell us whether the common system calls are all POSIX or
whether some of them aren't.
There are 136 common system calls between OpenBSD, NetBSD, Linux and FreeBSD.
They are the following:
Only 5 aren't POSIX:
But overall they're mostly POSIX: about 96% of the system calls common to
all four OSs (131 out of 136) are in POSIX.
Categories a system call can be part of:
* Process control
  * create process (for example, fork on Unix-like systems, or NtCreateProcess in the Windows NT Native API)
  * get/set process attributes
  * wait for time, wait event, signal event
  * allocate, free memory
* File management
  * create file, delete file
  * read, write, reposition
  * get/set file attributes
* Device management
  * request device, release device
  * read, write, reposition
  * get/set device attributes
  * logically attach or detach devices
* Information maintenance
  * get/set time or date
  * get/set system data
  * get/set process, file, or device attributes
* Communication
  * create, delete communication connection
  * send, receive messages
  * transfer status information
  * attach or detach remote devices
# Tips and tools
Check the source
Tools such as ktrace (BSD), strace (Linux), DTrace (Solaris), and
truss allow a process to execute from start and report all system calls
the process invokes, or can attach to an already running process and
intercept any system call made by said process if the operation does
not violate the permissions of the user. This special ability of the
program is usually also implemented with a system call, e.g. strace is
implemented with ptrace or system calls on files in procfs.