Benchmarking Pthread - Programming On Unix

venam
Administrators
Hello nixers,
I've recently been wondering about the effect of different pthread implementations, especially how they schedule CPU-bound versus non-CPU-bound threads, how the scheduling algorithms differ, and most importantly whether using process contention scope or system contention scope affects everyday "snappiness", and by how much.

I've been trying to find benchmarks à la TPC or SPEC, but I've only found this 20+ year old paper, Benchmarking Pthreads Performance.
According to it, process-scope threads are faster to create than system-scope ones on all 4 machines tested. However, it gives no indication of the effect this has on everyday computing. You can only guess what it would do.
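
For anyone who wants to repeat that kind of measurement on a current machine, here is a rough sketch of how it could be done. It is not the paper's method: the function name avg_create_join_usec is made up for illustration, and it times creation plus join of an empty thread rather than creation alone.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Thread body that does nothing, so only creation and teardown are timed. */
static void *noop(void *arg) { return arg; }

/*
 * Hypothetical helper: average cost, in microseconds, of creating and
 * joining one thread with the requested contention scope.
 * Returns -1.0 if the scope is unsupported or a thread cannot be created.
 */
double avg_create_join_usec(int scope, int iterations)
{
    pthread_attr_t attr;
    struct timespec start, end;
    double elapsed_usec;
    int i, failed = 0;

    if (iterations <= 0 || pthread_attr_init(&attr) != 0)
        return -1.0;

    /* POSIX ignores the scope stored in attr unless scheduling attributes
     * are taken explicitly from attr instead of being inherited. */
    if (pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED) != 0 ||
        pthread_attr_setscope(&attr, scope) != 0) {
        pthread_attr_destroy(&attr);
        return -1.0;
    }

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < iterations && !failed; i++) {
        pthread_t tid;
        if (pthread_create(&tid, &attr, noop, NULL) != 0)
            failed = 1;
        else
            pthread_join(tid, NULL);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    pthread_attr_destroy(&attr);

    if (failed)
        return -1.0;

    elapsed_usec = (end.tv_sec - start.tv_sec) * 1e6 +
                   (end.tv_nsec - start.tv_nsec) / 1e3;
    return elapsed_usec / iterations;
}
```

Calling it once with PTHREAD_SCOPE_SYSTEM and once with PTHREAD_SCOPE_PROCESS, on a system that supports both, would give two numbers to compare. It still says nothing about snappiness, only about creation cost.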

I find this especially important in a world where single processes now have hundreds of threads to themselves and, if run in system contention scope, could starve other processes.
Especially since Linux only supports PTHREAD_SCOPE_SYSTEM, see man 3 pthread_attr_setscope.
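
A quick way to check what a given system accepts is to try setting each scope on an attribute object and look at the return code. This is a minimal sketch; try_scope and scope_status are names I made up, only the pthread_attr_* calls are real API.

```c
#include <errno.h>
#include <pthread.h>
#include <string.h>

/*
 * Hypothetical helper: returns 0 if the contention scope can be set on a
 * thread attribute object, otherwise the error number that was reported
 * (ENOTSUP for PTHREAD_SCOPE_PROCESS on Linux, per the man page).
 */
int try_scope(int scope)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);

    if (rc != 0)
        return rc;

    rc = pthread_attr_setscope(&attr, scope);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Human-readable result, for printing from a small test program. */
const char *scope_status(int scope)
{
    int rc = try_scope(scope);
    return rc == 0 ? "supported" : strerror(rc);
}
```

Printing scope_status(PTHREAD_SCOPE_SYSTEM) and scope_status(PTHREAD_SCOPE_PROCESS) from main on each OS you care about should show which scopes are available there.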

I'm wondering if anyone has more info on this or interesting benchmarks to share, especially from companies switching from Linux to FreeBSD or to other systems with a different default pthread behavior. Or from projects that have modified their software to take advantage of certain facilities.



Brief background info:
There are two types of threads: kernel-level threads and user-level threads.
Kernel-level threads are managed by the kernel; they are responsible for kernel work and for handling system calls.
User-level threads are the threads that users create.

User-level threads need to be mapped to kernel-level threads to be able to do anything useful. This can be a one-to-one, many-to-one, or many-to-many relation. On most Unix-like systems it's a one-to-one relation: for every user thread there's a kernel thread (older Solaris releases were the notable exception, using a many-to-many model).
See https://en.wikipedia.org/wiki/Light-weight_process

The kernel schedules the kernel-level threads according to whatever scheme it has in place.
What schedules the user-level threads is the thread library, in our case pthread.

There are two main ways to schedule threads. With system contention scope, all threads from all processes compete as equals. With process contention scope, the threads of a single process are grouped together and treated as one schedulable block: processes are scheduled against each other, and each process then divides its share among its own threads.

So if you have 100 units of processing time and 3 processes with 5 threads each, that would be:
System contention scope: 6.67 units per thread, regardless of the process.
Process contention scope: 33.3 units per process; internally, the process schedules those 33.3 units between its 5 threads.
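
With 5 threads in every process both scopes end up giving each thread the same share, so the difference only shows when thread counts are uneven. Below is a small sketch of the arithmetic under an idealized fair-share model (no priorities, no blocking, every thread always runnable). The function names are made up for illustration and no real scheduler is this simple.

```c
#include <stddef.h>

/*
 * Idealized system contention scope: every thread in the system competes
 * equally, so each one gets total / (sum of all thread counts).
 */
double system_scope_share(double total, const int *threads, size_t nproc)
{
    int all = 0;
    size_t i;

    for (i = 0; i < nproc; i++)
        all += threads[i];
    return all > 0 ? total / all : 0.0;
}

/*
 * Idealized process contention scope: each process gets total / nproc,
 * then splits that evenly among its own threads.
 */
double process_scope_share(double total, const int *threads,
                           size_t nproc, size_t proc)
{
    if (nproc == 0 || proc >= nproc || threads[proc] <= 0)
        return 0.0;
    return total / (double)nproc / threads[proc];
}
```

Take 100 units and three processes with 1, 5 and 9 threads. In system scope every thread gets 100/15, about 6.67 units, so the 9-thread process collects 60 units in total while the single-threaded one gets 6.67. In process scope every process gets 33.3 units, which works out to 33.3, 6.67 and 3.70 units per thread respectively.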

Obviously, whether the threads are CPU bound also plays a role, along with the pthread scheduling policy in use.

Additional link: https://www.icir.org/gregor/tools/pthrea...uling.html