The role of distributions &/or Unix flavors, where does pkg management stands

	venam | 06-03-2018, 04:53 AM | #1

Hello fellow nixers,
This thread is about launching one of those discussion podcasts.

The topic this time is: What's the role of distributions &/or Unix flavors, and where does package management stand in all that?

EDIT: It has finally been posted here.

Link of the recording [ https://github.com/nixers-projects/podca...3?raw=true http://podcast.nixers.net/feed/download....07-221.mp3 ]

We'll try to schedule it for next week, so hop on the scheduler interface and put all the hours you might be available so that we can choose the best common denominator: https://podcast.nixers.net

If you don't have a key you can PM me anywhere for one.

Relevant threads and articles for your personal research might be found in:
Issue #63 of the newsletter ( https://newsletter.nixers.net/entries.php#63 ) and this week's issue #65
the thread "pkg management, what do you expect, what you wished you had" ( https://nixers.net/showthread.php?tid=1883 )
"GoboLinux and Package Management" https://nixers.net/showthread.php?tid=2049
And, obviously, much more, like all https://nixos.org/ or other type of package management or distro management.
More questions to think about:
What is expected from a distro or Unix-flavor
The "from scratch" approach, advantages?
What's the role of package manager?
What's the role of maintainers?
What is the current issue, containers, mini-language-specific-modules, etc..?

Keep your ideas to yourself before the discussion actually takes place in the podcast.

	venam | 15-01-2020, 11:27 AM | #2

This topic wasn't actually discussed so far and it keeps bugging me as the worlds of package management and distributions drift further apart.
When I get a bit more time I'll come back to it. Meanwhile, let's mention for those going to FOSDEM that there's a "Dependency Management" devroom that's going to discuss such topics.

	jkl | 18-01-2020, 07:34 PM | #3

I wish there was one way to manage dependencies for all platforms.

--
<mort> choosing a terrible license just to be spiteful towards others is possibly the most tux0r thing I've ever seen

	z3bra | 19-01-2020, 02:23 PM | #4

(18-01-2020, 07:34 PM)jkl Wrote: I wish there was one way to manage dependencies for all platforms.

pkgsrc is a step toward that. It is the package manager from NetBSD, but it can be used on many (POSIX) platforms, including MINIX.

Package management is a complex topic as different people have different needs.
The only assumption you can make is that everyone will want the manager to not get in their way. This is why Debian has the "apt" frontend for dpkg, and why yum/dnf exist for RPMs. This is why most of the time "$TOOL install/remove whatever" will work as expected. They follow the principle of least surprise. Unfortunately, such simplicity in the interface/usage comes at the cost of complexity on the packaging side. The packages are cluttered with metadata, pre/post install scripts and so on and so forth.
This is where the user steps in, as some "power users" will want better control over what they install and the ability to easily review packages and their dependencies.

I personally care more about having a simple packaging format than about good dependency handling, mostly because I prefer software that has the fewest dependencies. I made my own package manager for this purpose, because it lets me review the software I fetch, package it the way I want and install it where I want, with the privileges I set (most of my tools are installed under my UID, in $HOME/.local).
This obviously comes at the cost of having to fetch updates manually, which (for now) is a bit of a burden. But the simplicity of packaging outweighs this for me.

	venam | 21-01-2020, 05:33 AM | #5

(19-01-2020, 02:23 PM)z3bra Wrote: I wish there was one way to manage dependencies for all platforms.

For Linux specifically, there's Flatpak, AppImage, and Snap. So far, in my opinion, Snap is taking the lead. It's easy to use and works on almost all distros.

	jkl | 21-01-2020, 05:43 AM | #6

Linux is one platform though.


	z3bra | 22-01-2020, 09:28 AM | #7

Snap is easy to use, but a pain to manage. This is also a huge step backward IMO, as you put the packaging in the hands of the developer. This is the same nonsense as letting devs push to production directly.

	eadwardus | 22-01-2020, 09:24 PM | #8

Most of the packaging drama comes from the use of dynamic linking; otherwise converting the packages would be more than enough (or patching pkgsrc to build packages). The "new" "solutions" Flatpak, AppImage, and Snap are basically the return to an inferior static linking (sadly, such a model is encouraged by license uncertainties). For me, the natural progression of the current insanity is solutions like Nix and Guix.

	jkl | 23-01-2020, 03:30 AM | #9

Static linking avoids dependency hell. I actually prefer that.


	z3bra | 31-01-2020, 09:41 AM | #10

Unfortunately, static linking is definitely not manageable nowadays. I gave it a try a few years back, and had a really bad time getting the compiler to behave as I wanted (gcc is a bitch here, really).

I do agree though that all the new "packaging methods" are really badly done, as you end up with devs shipping their own library versions along with the packages, and you are then dependent on them to push new snaps/flatpaks/titi/kaka/whatever when a new patch is needed.
I guess they finally managed to port DLL hell to Linux. This is our future now.

	venam | 29-03-2020, 02:23 PM | #11

I finally got the time to write something about this. I thought of recording a podcast but I found it easier to simply post the content here in text/blog form. So here we go, I hope you enjoy the research.

What is a distribution

What are software distributions? You may think you know everything there
is to know about the term software distribution, but take a moment to
think about it, take a step back and try to see the big picture.

We often have the thousands of Linux distributions in mind when we hear
the term. However, it is far from limited to Linux: BSD, the Berkeley
Software Distribution, has software distribution right in the name.
Android and iOS are software distributions too.

Actually, it's so prevalent that we may have stopped paying attention to
the concept, and we find it hard to put a definition together.
There's definitely the part about distributing software in it, software
that may be commercial or not, open source or not.
To understand it better, investigating which problems software
distributions address may clear things up.

Let's imagine a world before software distributions. Does that world
exist? It would be a world where software stays within boundaries, not
shared with anyone outside of them.
Once we break these boundaries and want to share software, we find that
we have to package all of it together in a meaningful way, configure
the pieces so that they work well together, add some glue in between
when necessary, find the appropriate medium to distribute the bundle,
get it all from one end to the other safely, make sure it installs
properly, and follow up on it.

Thus, software distribution is about the mechanism and the community
that take on the burden and the decisions of building a coherent
assemblage of software that can be shipped.

The operating system, or kernel if you like, could be, and often is,
part of the collage offered, a piece of software just like the others.

The people behind it are called distribution maintainers, or package
maintainers. Their role varies widely: they could write the software
that stores all the packages, called the repository, maintain a package
manager with its format, maintain a full operating system installer,
package and upload software they or someone else built on a specific
time frame/life cycle, make sure there isn't any malicious code uploaded
to the repository, follow up on the latest security issues and bug
reports, fix third party software to fit the distribution's
philosophical choices and configurations, and most importantly test,
plan, and make sure everything holds up together.
These maintainers are the source of trust of the distribution; they
take responsibility for it. In fact, I think it's more accurate to call
them distributors.

Different ways to approach it

There are so many distributions it can make your head spin. The software
world is booming, especially the open source one. For instance, we can
find bifurcations of distributions that get copied by new maintainers
and diverge. This creates a tree-like aspect, a genealogy of common
ancestors and/or influences in technical and philosophical choices.
Overall, we now have a vibrant ecosystem where a thing learned on one
branch can help a completely unrelated leaf on another tree. There's
something for everyone.

Target and speciality

So what could be so different between all those software distributions?
Why not have a single platform that everyone can build on?

One thing is specialization and differentiation. Each distro caters to
a different audience and is built by a community with its own philosophy.

Let's go over some of them:

A distribution can support specific sets and combinations of hardware, from CPU ISA to peripheral drivers.
A distribution may be specifically optimized for a type of environment, be it desktop, portable mobile devices, servers, warehouse-size computers, embedded devices, virtualised environments, etc.
A distribution can be commercially backed or not.
A distribution can be designed for different levels of knowledge in a domain, professional or not: for instance, security research, scientific computing, music production, multimedia boxes, HUDs in cars, mobile device interfaces, etc.
A distribution might have been certified to follow certain standards that need to be adhered to in professional settings, for example security standards and hardening.
A distribution may have a single purpose on a commodity machine, providing a specific function such as a firewall, a computer cluster, a router, etc.

That all comes down to the raison d'être, the philosophy of the
distribution, which guides every decision the maintainers have to make:
how they configure every piece of software, and how they think about
security, portability, and comprehensiveness.

For example, if a distribution cares about free software, it's going to
be strict about what software it includes and what licenses it allows
in its repository, with tooling in the core to check the consistency of
licenses.
Another example: if the goal is to target a desktop audience, then
internationalization, ease of use, user friendliness, and a large
number of packages are going to be prioritized. If instead the target
is a real time embedded device, the kernel is going to be small,
configured and optimized for this purpose, and the packages limited to
those appropriate for this environment. Or if it's targeted at advanced
users that love having control of their machine, the maintainers will
choose to let the users make most of the decisions, providing as many
packages as possible at the latest version possible, with a loose way
to install the distribution and a lot of libraries and software
development tools.

What this means is that a distribution does anything it can to provide
sane defaults that fit its mindset. It composes and configures a layer
of components, a stack of software.

The layering

Distribution maintainers often have different blocks at their disposal
and the ability to choose among them, stacking them to create a unit we
call a software distribution. There's a range of approaches to this:
they could choose to include more, or less, in what they consider the
core of the distribution as opposed to what is external and less
important to it.
Moreover, sometimes they might even leave the core very small and loose,
instead providing the glue software that makes it easy for the users
to choose and swap the blocks at specific stages in time: installation,
run time, maintenance mode, etc.

So what are those blocks of interdependent components?

The first part is the method of installation, this is what everything
hinges on, the starting point.

The second part is the kernel, the real core of all operating systems
today. But that doesn't mean that the distribution has to enforce
it. Some distributions may go as far as to provide multiple kernels
specialised in different things or none at all.

The third part is the filesystem and file hierarchy, the component that
manages where and how files are spread out on the physical or virtual
hardware. This could be a mix and match where sections of the file system
tree are stored on separate filesystems.

The fourth part is the init system, PID 1, the mother process of all
other processes on the system. This choice has generated a lot of
contention these days: what role it has and what functionalities it
should include is a subject of debate.

The fifth part is composed of the shell utilities, what we sometimes
refer to as the userland or user space, as it's the first layer the user
can directly interface with to have control of the operating system, the
place where processes run. The userland implementations on Unix-based
systems usually try to follow the POSIX standard. There are many such
implementations, also a subject of contention.

The sixth part is made up of services and their management: the daemons,
long running processes that keep the system in order. Many argue over
whether the management functionality should be part of the init system
or not.

The seventh part is documentation. Often it is forgotten but it is still
very important.

The last part is about everything else, all the user interfaces and
utilities a user can have and ways to manage them on the system.

Stable releases vs Rolling

There exists a spectrum on which distributions place themselves when
it comes to keeping up to date with the versions of the software they
provide. This most often applies to external third party open source
software.
The spectrum is the following. Do we allow the users to always have the
latest version of every piece of software while running the risk of
accidentally breaking their system, what we call a bleeding edge or
rolling distro? Or do we take a more conservative approach and take the
time to test every piece of software properly before allowing it in the
repository, while not having all the latest updates, features, and
optimizations, what we call a release based distro?

The extreme of the first scenario would be to let users directly download
from the software vendor/creator source code repository, or the opposite,
let the software vendor/creator push directly to the distribution
repository. Either could easily break or conflict with the user's system
or lead to security vulnerabilities. We'll come back to this later, as
this could be avoided if the software runs in a containerized environment.

Release distributions usually have a long term support stable version
that keeps receiving the necessary security updates and bug fixes in the
long run, while another version runs a bit ahead, testing the future
changes. At specific points in time, users can jump to the latest
release of the distribution, which may involve a lot of changes in both
configuration and software.
Some distributions decide they may want to break the ABI or API of the
kernel upon major releases, which means that everything in the system
needs to be rebuilt and reinstalled.

The release cycle, and the rate of updates is really a spectrum.

When it comes to updates, in both cases, the distribution maintainers
have to decide how to communicate and handle them: how to let the users
know what changed, and whether a user configuration was swapped for a
new one, merged with the new one, or copied aside.
Communication is essential, be it through official channels, logging,
mails, etc. It needs to be bi-directional: users report bugs and
maintainers post what their decisions are and whether users need to be
involved in them. This creates the community around the distribution.
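
To make the swap/merge/copy-aside decision a bit more concrete, here is a
minimal sketch of how an upgrade could treat a single configuration file.
It assumes the package manager recorded a checksum of the default file it
shipped previously; the function name and the three outcome labels are
made up for this example and are not taken from any particular tool.

```python
def plan_config_update(old_default, new_default, on_disk):
    """Decide what to do with one configuration file during an upgrade.

    Each argument is a checksum (any string) of a file's content:
    the default shipped by the old package, the default shipped by the
    new package, and the file currently on the user's disk.
    Returns "keep", "replace", or "install-aside" (hypothetical labels).
    """
    if new_default == old_default:
        return "keep"           # the package did not change its default
    if on_disk == old_default:
        return "replace"        # the user never touched the file
    if on_disk == new_default:
        return "keep"           # the user already has the new content
    return "install-aside"      # both changed: keep the user's file, ship the new one next to it


print(plan_config_update("aaa", "bbb", "ccc"))
```

Whatever the policy, the important part is that the outcome is reported to
the user, which is the communication aspect discussed above.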

Rolling releases require intensive efforts from package maintainers as
they constantly have to keep up with software developers, especially
when it comes to the thousands of new libraries that are part of
recent programming languages and whose number keeps on increasing.

Various users will want precise things out of a system. Enterprise
environments and mission critical tasks will prefer stable releases,
and software developers or normal end users may prefer to have the
ability to use the latest current software.

Interdistribution standard

With all this, can't there be an interdistribution standard that creates
order, and would we want such a standard?

At the user level, the differences are not always noticeable, most
of the time everything seems to work as Unix systems are expected to
work.
There's no real standard between distributions other than that they are
more or less following the POSIX standards.

Within the Linux ecosystem, the Linux Standard Base, started by the Free
Standards Group and now under the Linux Foundation, tries to improve
interoperability of software by fixing a common Linux ABI, file system
hierarchy, naming conventions, and more. But that's just the tip of the
iceberg when it comes to having something that works across
distributions.

Furthermore, each part of the layering we've seen before could be said
to have its own standards: There are desktop interoperability standards,
filesystem standards, networking standards, security standards, etc..

The biggest player right now when it comes to this is systemd in
association with the freedesktop.org group; it tries to create (force)
an interdistribution standard for Linux distributions.

But again, the big question: do we actually want such inter-distribution
standards? Can't we be happy with the mix and match we currently
have? Would we profit from such a thing?

The package manager and packaging

Let's now pay attention to the packages themselves: how we store them, how
we give secure access to them, how we are able to search amongst them,
download them, install them, remove them, and anything related to their
local management, versioning, and configuration.

Method of distribution

How do we distribute software and share it, and what's the front-end to
this process?

First of all, where do we store this software?

Historically and still today, software can be shared via physical
media such as CD-ROMs, DVDs, USB drives, etc. It is common for
proprietary vendors to have the distribution come with a piece of
hardware they are selling; it's also common for the procurement of the
initial installation image.
However, with today's hectic software growth, using a physical medium
isn't flexible. Sharing over the internet is more convenient, be it via
FTP, HTTP, HTTPS, a publicly available svn or git repo, central
website hubs such as GitHub, or application stores such as the ones
Apple and Google provide.

A requirement is that the storage and the communication to it should be
secure, resilient against failures, and accessible from anywhere. Thus,
replication is often done, to avoid failures but also to balance load
and serve users from a nearby mirror, speeding up access across the
world. Replication could be done in multiple ways; it could be a P2P
distributed system for instance.

How we store it and in what format is up to the repository
maintainers. Usually, this is a file system with a software API users
can interact with over the wire. Two main format strategies exist:
source based repositories and binary repositories.

Second of all, who can upload and manage the host of packages? Who has
the right to replicate the repository?

As a source of truth for the users, it is important to make sure the
packages have been verified and secured before being accepted on the
repository.

Many distributions have the maintainers be the only ones that are able
to do this, giving them cryptographic keys to sign packages and validate
them.

Others have their own users build the packages and send them to a
central hub for automatic or manual verification before they are
uploaded to the repository, each user having their own cryptographic key
for signature verification.

This comes down to an issue of trust and stability. Having the users
upload packages isn't always feasible when using binary packages if the
individual packages are not containerized properly.

There's a third option, the road in between, having the two types, the
core managed by the official distribution maintainers and the rest by
its user community.

Finally, the packages reach the user.

How the user interacts with the repository locally and remotely depends on
the package management choices. Do users cache a version of the remote
repository, as is common with the BSD ports tree system?
How flexible can it be to track updates, lock versions of software,
and allow downgrades? Can users download from different sources? Can
users have multiple versions of the same software on their machine?

Format

As we've said, there are two main philosophies of software sharing format:
source code port-style and pre-built binary packages.

The software that manages those on the user side is called the package
manager; it's the link with the repository. With source based repos I'm
not sure we can call it that, but regardless I'll still refer to it as
such.
Many distributions create their own or reuse a popular one. It handles
the search, download, installation, update, and removal of local
software. It's not a small task.

The rule is that if something isn't installed by the package manager,
then the package manager won't be aware of its existence. Note that
distributions don't have to be limited to a single package manager;
there could be many.

Each package manager relies on a specific format and metadata to be able
to manage software, be it source or binary formatted. This format can
be composed of a group of files or a single binary file with specific
information segments that together create recipes that help throughout
its lifecycle. Some are easier to put together than others, incidentally
allowing more user contributions.

Here's a list of common information that the package manager needs:

The package name
The version
The description
The dependencies on other packages, along with their versions
The directory layout that needs to be created for the package
The configuration files that it needs and whether they should be overwritten or not
An integrity check on all files, such as a SHA256 checksum
Authenticity, to know that it comes from the trusted source, such as cryptographic signatures checked against a trusted store on the user's machine
Whether this is a group of packages (a meta package) or a direct one
The actions to take on certain events: pre-installation, post-installation, pre-removal, and post-removal
Any specific configuration flags or parameters to pass to the package manager upon installation
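
As a rough illustration, the information listed above could be modelled as
a single record, together with the integrity check. This is only a sketch:
the `PackageMeta` class, its field names, and `verify_integrity` are
invented for this example and do not match any real package format.
Signature checking is left out because it needs more than the standard
library.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PackageMeta:
    """Hypothetical metadata record; field names are illustrative only."""
    name: str
    version: str
    description: str = ""
    # dependency name -> version constraint, e.g. {"libfoo": ">=1.2"}
    depends: dict = field(default_factory=dict)
    # configuration file path -> whether an update may overwrite it
    config_files: dict = field(default_factory=dict)
    # file path -> expected SHA-256 hex digest
    checksums: dict = field(default_factory=dict)
    is_meta_package: bool = False
    # event name ("pre-install", "post-remove", ...) -> script to run
    hooks: dict = field(default_factory=dict)


def verify_integrity(meta, contents):
    """Return the paths whose content is missing or does not match its digest.

    `contents` maps file path -> bytes as unpacked from the package.
    """
    bad = []
    for path, expected in meta.checksums.items():
        data = contents.get(path)
        if data is None or hashlib.sha256(data).hexdigest() != expected:
            bad.append(path)
    return bad


payload = {"bin/hello": b"hello\n"}
meta = PackageMeta(
    name="hello",
    version="1.0",
    checksums={"bin/hello": hashlib.sha256(payload["bin/hello"]).hexdigest()},
)
print(verify_integrity(meta, payload))  # an empty list means every file matched
```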

So what's the advantage of having pre-compiled binary packages instead
of cloning the source code and compiling it ourselves? Wouldn't that
remove a burden from package maintainers?

One advantage is that pre-compiled packages are convenient: it's easier to
download them and run them instantly. These days it's also hard, if not
impossible, and energy intensive to compile huge software such as
web browsers.
Another point is that proprietary software is often already distributed
as binary packages, which would create a mix of source and binary
packages.

Binary formats are also space efficient as the code is stored in a
compressed archive format. For example: APK, Deb, Nix, ORB, PKG, RPM,
Snap, pkg.tar.gz/xz, etc.
Some package managers may also choose to leave the choice of compression
up to the user and dynamically discern from its configuration file how
to decompress packages.

Let's add that there exist tools, such as "Alien", that facilitate the
job of package maintainers by converting from one binary package format
to another.

Conflict resolution & Dependencies management

Resolving dependencies

One of the hardest jobs of the package manager is to resolve dependencies.

A package manager has to keep a list of all the packages and their
versions that are currently installed on the system and their
dependencies.
When the user wants to install a package, it has to take as input the
list of dependencies of that package, compare it against the one it
already has and output a list of what needs to be installed in an order
that satisfies all dependencies.

This is a problem that is commonly encountered in the software development
world with build automation utilities such as make. The tool builds a
directed acyclic graph (DAG) of dependencies and, relying on graph theory
and the acyclic dependencies principle (ADP), sorts it topologically so
that every package comes after the packages it depends on. If no
solution is found, or if there are conflicts or cycles in the graph,
the action should be aborted.
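
The ordering step described above can be sketched with Python's standard
`graphlib` module (Python 3.9+). This is a simplified illustration, not
how any specific package manager works: the `install_order` function and
the toy repository are made up, and version constraints are ignored
entirely.

```python
from graphlib import CycleError, TopologicalSorter


def install_order(wanted, depends, installed):
    """Return the packages to install, dependencies first.

    `depends` maps a package name to the names it depends on.
    Raises CycleError if the dependency graph contains a cycle and
    KeyError if a package is unknown to the (toy) repository.
    """
    graph = {}
    stack = [wanted]
    while stack:
        pkg = stack.pop()
        if pkg in graph or pkg in installed:
            continue
        # Only dependencies that are not installed yet need an edge.
        graph[pkg] = [dep for dep in depends[pkg] if dep not in installed]
        stack.extend(graph[pkg])
    return list(TopologicalSorter(graph).static_order())


repo = {"app": ["libssl", "libc"], "libssl": ["libc"], "libc": []}
print(install_order("app", repo, installed=set()))

try:
    install_order("a", {"a": ["b"], "b": ["a"]}, installed=set())
except CycleError:
    print("cycle detected, aborting")
```

Aborting on a cycle, rather than guessing an order, matches the behaviour
described in the paragraph above.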

The same applies in reverse, upon removal of a package. We have to make
a decision: do we remove all the other packages that were installed as
dependencies of that single one? What if newer packages depend on those
dependencies? Should we only allow the removal of the unused dependencies?
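
One possible answer to these questions, sketched below, is to refuse the
removal when something still needs the package, and otherwise to take
along only the dependencies nobody else uses. The `removable` function
and the notion of "explicit" packages (those the user asked for by name)
are assumptions made for this example.

```python
def removable(target, depends, explicit):
    """Return the set of packages that can go when `target` is removed.

    `depends` maps every installed package to its direct dependencies.
    `explicit` holds packages the user installed by name; they are kept.
    Raises ValueError if another installed package still needs `target`.
    """
    users = [pkg for pkg, deps in depends.items() if target in deps]
    if users:
        raise ValueError(f"{target} is still required by {sorted(users)}")

    removed = {target}
    queue = list(depends[target])
    while queue:
        dep = queue.pop()
        if dep in removed or dep in explicit:
            continue
        still_used = any(
            dep in deps for pkg, deps in depends.items() if pkg not in removed
        )
        if not still_used:
            removed.add(dep)
            # Its own dependencies may have become unused as well.
            queue.extend(depends[dep])
    return removed


installed = {
    "editor": ["libgui", "libc"],
    "libgui": ["libc"],
    "libc": [],
    "shell": ["libc"],
}
print(sorted(removable("editor", installed, explicit={"editor", "shell"})))
```

Here `libc` survives because `shell` still depends on it, while `libgui`
goes away together with the editor.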

This is a hard problem, indeed.

Versioning

This problem grows when we add the factor of versioning to the mix,
if we allow multiple versions of the same software to be installed on
the system.

If we don't, but allow switching from one version to another, do we also
switch all other packages that depend on it too?

Versioning applies everywhere, not only to packages but to release
versions of the distribution too. A lot of them attach certain versions
of packages to specific releases, and consequently releases may have
different repositories.

The choice of naming conventions also plays a role, it should convey to
users what they are about and if any changes happened.

Should the package maintainer follow the naming convention of the software
developer or should they use their own? What if the names of two pieces
of software conflict with one another? This makes it impossible to have
both in the repo unless some extra information is added.

Do we rely on semantic versioning (major, minor, patch), on names like
so many distribution releases do (Toy Story characters, desserts, etc.),
on the date of release, or maybe simply on an incremental number?
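
The choice of scheme matters to the tooling as well as to the user. As a
small sketch, here is why a package manager cannot compare version strings
as plain text and has to parse them first. The helper names are made up,
and only the plain MAJOR.MINOR.PATCH form is handled (no pre-release or
build suffixes).

```python
def parse_semver(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def is_upgrade(installed, candidate):
    """True when `candidate` is strictly newer than `installed`."""
    return parse_semver(candidate) > parse_semver(installed)


# Compared as text, "1.10.0" sorts before "1.9.3" because "1" < "9".
print("1.10.0" > "1.9.3")
# Compared as numbers, 1.10.0 is the newer release.
print(is_upgrade("1.9.3", "1.10.0"))
```

Release names and dates need their own ordering rules, which is one reason
distributions usually pair a code name with a number.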

All those convey meaning to the user when they search and update packages
from the repository.

Static vs dynamic linking

One thing that may not apply to source based distro, is the decision
between building packages as statically linked to libraries or dynamically
linked.

Dynamic linking is the process in which a program chooses not to include
a library it depends upon in its executable but only a reference to it,
which is then resolved at run-time by a dynamic linker that will load
the shared object in memory upon usage. In contrast, static linking
means storing the libraries right inside the compiled executable program.

Dynamic linking is useful when many programs rely on the same library,
as only a single instance of the library has to be in memory at a time.
Executable sizes are also smaller, and when the library is updated all
programs relying on it get the benefit (as long as the interfaces stay
the same).

So what does this have to do with distributions and package management?

Package managers in a dynamic linking environment have to take care of
the versions of the libraries that are installed and which packages
depend on them. This can create issues if different packages rely on
different versions.
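
A minimal sketch of that bookkeeping, assuming a simplified system where
only one version of each shared library can be installed at a time and
where packages ask for exact versions. The `find_conflicts` function and
the sample data are hypothetical.

```python
from collections import defaultdict


def find_conflicts(requirements):
    """Report libraries that packages need in different versions.

    `requirements` maps package -> {library: exact version needed}.
    Returns {library: {version: [packages]}} for conflicting libraries only.
    """
    wanted = defaultdict(lambda: defaultdict(list))
    for pkg, libs in requirements.items():
        for lib, version in libs.items():
            wanted[lib][version].append(pkg)
    return {
        lib: {version: sorted(pkgs) for version, pkgs in versions.items()}
        for lib, versions in wanted.items()
        if len(versions) > 1
    }


requirements = {
    "browser": {"libssl": "3"},
    "mailer": {"libssl": "1.1"},
    "editor": {"libz": "1"},
    "viewer": {"libz": "1"},
}
print(find_conflicts(requirements))
```

In this toy data set `libz` is fine because both packages agree on one
version, while `libssl` is reported because two packages disagree.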

For this reason, some distro communities have chosen to get rid of dynamic
linking altogether and rely on static linking, at least for things that
are not related to the core system.

Another incidental advantage of static linking is that it doesn't have
to resolve dependencies with the dynamic linker, which makes it gain a
small boost in speed.

So static builds simplify the package management process. There
doesn't need to be a complex DAG because everything is self
contained. Additionally, this allows multiple versions of the
same software to be installed alongside one another without conflicts.
Updates and rollbacks are not messy with static linking.

This gives rise to more containerised software, and continuing on this
path leads to market platforms such as Android and iOS where distribution
can be done by the individual software developers themselves, skipping the
middle-man altogether and giving the ability for increasingly impatient
users to always have the latest version that works for their current
OS. Everything is self-packaged.
However, this relies heavily on trust in the
repository/marketplace. There need to be many security mechanisms in
place to prevent rogue software from being uploaded. We'll talk more
about this when we come back to containers.

This is great for users and, from a certain perspective, software
developers too as they can directly distribute pre-built packages,
especially when there's a stable ABI for the base system.

All this breaks the classic distribution scheme we're accustomed to on
the desktop.

Is it all roses and butterflies, though?

As we've said, packages take much more space with static linking, thus
wasting resources (storage, memory, power).
Moreover, because it's a model where software developers push directly
to users, this removes the filtering that distribution maintainers have
over the distro, and encourages license uncertainties. There's no longer
an overall philosophy surrounding the distribution.
There's also the issue of library updates: the weight is on the software
developers to make sure they have no vulnerabilities or bugs in their
code. This adds a veil over which software uses what; all we see is the
end products.

From the perspective of a software developer using this type of
distribution, this adds extra steps: downloading the source code of
each library their software depends on and building each one
individually, turning the system into a source based distro.

Reproducibility

Because package management has become increasingly messy over the past
few years, a new trend has emerged to put back a sense of order in all
this: reproducibility.

It has been inspired by the world of functional programming and the world
of containers. Package managers that respect reproducibility have each
of their builds asserted to always produce the same output.
They allow for packages of different versions to be installed alongside
one another, each living in its own tree, and they allow normal users
to install packages only they can access. Thus, many users can have
different packages.
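
What "always produce the same output" means in practice can be sketched
with a toy packaging step. This example assumes that the only sources of
variation are file ordering, timestamps, and the identity of the builder,
and it normalises all three; the function names are made up, and real
reproducible builds have to control far more (compiler versions,
environment variables, locale, and so on).

```python
import hashlib
import io
import tarfile


def build_archive(files):
    """Pack `files` (path -> bytes) into a tar archive deterministically."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for path in sorted(files):           # fixed ordering
            info = tarfile.TarInfo(path)
            info.size = len(files[path])
            info.mtime = 0                   # no build timestamp
            info.uid = info.gid = 0          # no builder identity
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(files[path]))
    return buf.getvalue()


def digest(files):
    """SHA-256 of the archive, used to compare two independent builds."""
    return hashlib.sha256(build_archive(files)).hexdigest()


first = {"bin/tool": b"#!/bin/sh\n", "share/doc": b"docs"}
second = {"share/doc": b"docs", "bin/tool": b"#!/bin/sh\n"}
print(digest(first) == digest(second))
```

Two builders can then publish only the digest, and anyone can rebuild the
package and check that they obtain the same one.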

They can be used as universal package managers, installed alongside any
other package managers without conflict.

The most prominent examples are Nix and Guix, which use a purely functional
deployment model where software is installed into unique directories
generated through cryptographic hashes. The dependencies of each piece
of software are included within each hash, solving the problem of
dependency hell. This approach to package management promises to generate
more reliable, reproducible, and portable packages.
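
The idea of hashing the dependencies into the install directory can be
illustrated with a few lines. To be clear, this is not the actual scheme
used by Nix or Guix: the `store_path` function, the `/store` prefix, and
the choice of inputs are assumptions made to show the principle.

```python
import hashlib


def store_path(name, version, source_digest, dep_paths, root="/store"):
    """Derive an install directory from everything that influences the build."""
    h = hashlib.sha256()
    for item in (name, version, source_digest, *sorted(dep_paths)):
        h.update(item.encode())
        h.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    return f"{root}/{h.hexdigest()[:32]}-{name}-{version}"


libc_old = store_path("libc", "2.31", "aaa", [])
libc_new = store_path("libc", "2.32", "ccc", [])

# The same application built against two different libc versions ends up
# in two different directories, so both can be installed side by side.
print(store_path("app", "1.0", "bbb", [libc_old]))
print(store_path("app", "1.0", "bbb", [libc_new]))
```

Because a dependency's path is itself derived from its own inputs, a
change anywhere down the chain produces new paths for everything built on
top of it, while unchanged inputs always map to the same path.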

Stateless and verifiable systems

The discussion about trust, portability, and reproducibility can also
be applied to the whole system itself.

When we talked about repositories as marketplaces, where software
developers push directly to it and the users have instant access to the
latest version, we said it was mandatory to have additional measures
for security.

One of them is to containerise, to sandbox every piece of software,
having each one run in its own space without affecting the rest of the
system's resources. This removes the heavy burden of auditing and
verifying each and every piece of software. Many solutions exist to
achieve this sandboxing: Docker, chroot, jails, Firejail, SELinux,
cgroups, etc.

We could also isolate the home directories of the users, making them
self-contained, never installing or modifying the globally accessible
places.

This could let us have the core of the system verifiable, as it is not
changed and stays pristine. Making sure it's secure would then be much
easier.

The idea of having the user part of the distro as atomic, movable,
containerized, and the rest reproducible is game changing. But again,
do we want to move to a world where every distro is interchangeable?

Do distros matter with containers, virtualisation, and specific and universal package managers?

It remains to be asked if distributions still have a role today with
all the containers, virtualisation, and specific and universal package
managers.

When it comes to containers, distributions are still very important as
they are most often the base of the stack the other components build upon.

The distribution is made up of people that work together to build and
distribute the software and make sure it works fine. That isn't the role
of the person managing the container, and it is much more convenient for
them to rely on a distribution.

Another point is that containers hide vulnerabilities: they aren't
checked after they are put together. Distribution maintainers, on the
other hand, have the role of communicating and following up on security
vulnerabilities and other bugs. Community is what solves daunting problems
that everyone shares.
A system administrator building containers can't possibly have the
knowledge to manage and build hundreds of programs and libraries and
ensure they work well together.

If packages are self-contained

Do distributions matter if packages are self-contained?

To an extent they do, as in this ecosystem they could be the
providers/distributors of such universal self-contained packages. And
as we've said, it is important to keep the philosophy of the distro and
offer a tested toolbox that fits the use case.

What's more probable is that we'll move to a world with multiple package
managers, each trusted for its specific space and purpose. Each with a
different source of philosophical and technical truth.

Programming-language-specific package management

This phenomenon is already exploding in the world of programming language
package management.

The speed and granularity at which software is built today is almost
impossible to follow using the old method of packaging. The old software
release life cycle has been thrown out the window. Thus language-specific
tools were developed, not limited to installing libraries but also
software. We can now refer to the distribution offered package manager as
system-level and others as application-level or specific package managers.

Consequently, the complexity and conflicts within a system have
exploded, and distribution package managers are finding it pointless
to manage and maintain anything that can already be installed via those
tools. Vice versa, the specific tool makers are also not interested in
having what they provide included in distribution system-level package
managers.

Package managers that respect reproducibility, such as Nix, which we've
mentioned, handle such cases more cleanly as they respect the idea
of locality, everything residing within a directory tree that isn't
maintained by the system-level package manager.

Again, same conclusion here, we're stuck with multiple package managers
that have different roles.

Going distro-less

A popular topic in the container world is "distro-less".

It's about replacing everything provided in a distribution, removing
its customization, or building an image from scratch and maybe relying
on universal package managers or none.

The advantage of such containers is that they are really small and
targeted for a single purpose. This lets the sysadmin have full control
of what happens on that box.

However, remember that there's a huge cost to controlling everything,
just like we mentioned earlier. This moves the burden onto the sysadmin,
who becomes responsible for keeping up with bugs and security updates
instead of the distribution maintainers.

Conclusion

With everything we've presented about distributions, I hope we now have
a clearer picture of what they are providing and their place in our
current times.

What's your opinion on this topic? Do you like the diversity? Which stack
would you use to build a distribution? What's your take on static builds,
having users upload their own software to the repo? Do you have a solution
to the trust issue? How do you see this evolving?

	jkl Offline \| 29-03-2020, 02:27 PM \| #12

Quote:A popular topic in the container world is "distro-less".

It's about replacing everything provided in a distribution, removing
its customization, or building an image from scratch and maybe relying
on universal package managers or none.

Sounds like a standard Gentoo installation, but now with an additional hipster attitude.

--
<mort> choosing a terrible license just to be spiteful towards others is possibly the most tux0r thing I've ever seen

	venam Offline \| 29-03-2020, 02:54 PM \| #13

(29-03-2020, 02:27 PM)jkl Wrote: Sounds like a standard Gentoo installation, but now with an additional hipster attitude.

If people made money out of selling words, the hipsters would be rich. Wait..

	ckester Offline \| 29-03-2020, 08:54 PM \| #14

(31-01-2020, 09:41 AM)z3bra Wrote: Unfortunately, static linking is definitely not manageable nowadays. I gave it a try a few years back, and had a really bad time getting the compiler to behave as I would (gcc is a bitch here, really).

I'm curious to know what kind of problems you encountered. Like jkl, I prefer static linking except in (some, but not all) cases where a library is truly shared -- not just by a few programs that are out there in the wild, but actually running simultaneously on a typical machine (especially my own).

I.e., I think a lot of so-called "shared libraries" aren't shared at all. The worst examples of that are what I call vanity libraries, where somebody ships an .so that is never used by any programs but his own.

IIRC, the X libraries were the original motivation for shared libraries on Unix. Back in the days when disk space was still a constraint it made sense to share a single copy of those monsters.

	z3bra Offline \| 31-03-2020, 05:56 PM \| #15

(29-03-2020, 08:54 PM)ckester Wrote: I'm curious to know what kind of problems you encountered.

I tried to build my own Linux distro, statically linked against musl. GCC cannot be compiled statically anymore for example, just because of libstdc++, unless you're ready to lose your sanity and dedicate a full year to it.

Another thing is that the work required to patch stuff is tremendous. Static linking requires that you link libraries in dependency order for example, and nobody takes that into account anymore, so to compile it, you must patch it.
Except many programs now use autotools or cmake, which generate the makefiles for you. Handy, but they don't bother with the order. Good luck patching that. They might make use of pkg-config, except when they don't. And even then you will have to patch it because something else will be fucked.
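
To illustrate the ordering constraint: with static archives and a traditional single-pass linker, a library has to come after whatever uses it on the command line. A minimal sketch of computing such an order follows. The dependency graph is hypothetical, link_order is a made-up helper, and graphlib needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def link_order(deps):
    """Order libraries so that each one precedes the libraries it needs.

    `deps` maps a library name to the list of libraries it depends on.
    """
    # static_order() yields dependencies before their users; a single-pass
    # linker wants the opposite (users first), so the list is reversed.
    order = list(TopologicalSorter(deps).static_order())
    order.reverse()
    return order


# Hypothetical dependency graph, for illustration only.
deps = {
    "curl": ["ssl", "z"],
    "ssl": ["crypto"],
    "crypto": [],
    "z": [],
}

order = link_order(deps)
flags = " ".join(f"-l{lib}" for lib in order)
```

Generated build systems that emit the -l flags in whatever order they discovered the libraries do not guarantee this property, which is where the patching comes in.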

	ckester Offline \| 01-04-2020, 01:08 AM \| #16

Ah, I see. I agree, it's nearly impossible nowadays to take static linking to an absolute extreme. I think even the suckless gang's distros (stali, morpheus, etc.) have accepted the need for some things to be dynamically linked.

All I can say is that I statically link everything I can, and prefer programs that can be over those that can't.

Don't get me started on the abomination that is cmake!

	jkl Offline \| 01-04-2020, 03:56 AM \| #17

One of the good things in the rise of Go is that it usually produces statically linked binaries, just like Pascal did/does. As some people move on from C to Go, the future could be interesting.


	z3bra Offline \| 02-04-2020, 06:47 PM \| #18

(01-04-2020, 03:56 AM)jkl Wrote: One of the good things in the rise of Go is that it usually produces statically linked binaries, just like Pascal did/does. As some people move on from C to Go, the future could be interesting.

This is totally true, and one of the main reasons I like Go. What kills me though is that the people praising Go's « compile a single binary and run it everywhere » are sometimes the same persons arguing that static linking in C is bad and stupid because sharing libraries is better...

I wish people would reconsider static linking in C thanks to Go.

	josuah Offline \| 08-04-2020, 03:22 PM \| #19

It is surprising that the recent compiled languages go static by default, given that they are "dependency forest languages": it is really easy to go get some go-get-maintained package and use it, rather than packaging it by hand or waiting for someone else to package it for you.

So this makes dependencies use other dependencies for free, as the whole chain is automatic and works. Yet the whole thing uses static binaries.

Does the "use dynamic libraries for easier updates" argument still stand when we have dependency-heavy compiled languages that still go static?

	jkl Offline \| 08-04-2020, 03:43 PM \| #20

Define “dependency-heavy”?


	josuah Offline \| 08-04-2020, 04:06 PM \| #21

jkl: I'd say 2 options combined:

when a package.lock has more than 10 entries and the author says "minimal dependencies"
when a library pulling in dependencies on other libraries is common practice.

It is not necessarily a bad thing, as long as you do not end-up with string padding libraries. Tools are not good or bad on their own after all.

	josuah Offline \| 08-04-2020, 04:41 PM \| #22

One tip for doing static linking:

It is terribly hard to configure, so instead of configuring it, it is possible to /not/ build the shared libraries, and only expose the libsomething.a to the compiler, which will pick it up.

These are exactly the same compiler flags! The compiler just picks what's available to fulfill -lsomething among what is there. And given that ./configure and the other autotools work by launching cc and checking the outcome rather than checking for the presence of the files themselves, the libsomething.a will survive through all the piping of the autotools and the craziest of makefiles.

Yes, you have to configure it from the library's package rather than through the program's package, but it works _nicely_!

$ ldd $(which curl)
linux-vdso.so.1 (0x00007ffd9b5f0000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f5fc033b000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f5fc031a000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5fc0159000)
/lib64/ld-linux-x86-64.so.2 (0x00007f5fc0694000)

$ ldd $(which gpg2)
linux-vdso.so.1 (0x00007fff5f738000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2baafad000)
/lib64/ld-linux-x86-64.so.2 (0x00007f2bab4d8000)

Yes, these are not "true static binaries", but look at the pile of dependencies needed to compile gpg2: look-ma-no-configure-flags, and still all of the libraries are statically linked!

Next step is to rm libc.so... Hmm, I'll wait a bit if you do not mind. ;)

[edit] BTW, the ./configure flags in autotool style are --enable-shared=no --enable-static=yes [/edit]
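
A rough sketch of why the tip works: when resolving -lsomething, a typical ELF linker looks in each search directory for the shared object first and falls back to the static archive. The resolve_lib function below is a hypothetical, much simplified imitation of that lookup, not the linker's real logic.

```python
import os
import tempfile


def resolve_lib(name, search_dirs):
    """Roughly imitate how a linker resolves -l<name>.

    In each directory the shared object is preferred and the static
    archive is the fallback. Real linkers have more rules than this.
    """
    for directory in search_dirs:
        for candidate in (f"lib{name}.so", f"lib{name}.a"):
            path = os.path.join(directory, candidate)
            if os.path.exists(path):
                return path
    return None


with tempfile.TemporaryDirectory() as libdir:
    static = os.path.join(libdir, "libsomething.a")
    shared = os.path.join(libdir, "libsomething.so")
    open(static, "w").close()
    open(shared, "w").close()

    # Both files present: the shared object wins.
    picked_with_so = os.path.basename(resolve_lib("something", [libdir]))

    # Shared library never built (simulated by removing it): same flag,
    # but now the static archive is what gets picked.
    os.remove(shared)
    picked_without_so = os.path.basename(resolve_lib("something", [libdir]))
```

The -lsomething flag never changes; only what is lying around in the library directory does.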

	z3bra Offline \| 09-04-2020, 05:22 AM \| #23

(08-04-2020, 04:41 PM)josuah Wrote: BTW, the ./configure flags in autotool style are --enable-shared=no --enable-static=yes

Just because it's there does not mean it works. I used that to build around 50 programs, and over half of them ended up being dynamically linked.

	josuah Offline \| 09-04-2020, 10:29 AM \| #24

@z3bra: yes, the state is not so bright regarding static linking, and from what you say, you have the experience to say that.

But these flags are really meant for the library package, not for the binary program's package:

If no .so ever gets built, and find / -name '*.so' -o -name '*.so.*' | wc -l prints 0, then if a binary ever comes out, it will not be a dynamic one.
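
The same check, sketched as a small script that can be pointed at a build root instead of /. The find_shared_objects helper and the file names in the demo are hypothetical, and unlike find -name it only looks at file names, not directory names.

```python
import fnmatch
import os
import tempfile


def find_shared_objects(root):
    """Return the files under `root` whose names match *.so or *.so.*"""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if (fnmatch.fnmatchcase(filename, "*.so")
                    or fnmatch.fnmatchcase(filename, "*.so.*")):
                hits.append(os.path.join(dirpath, filename))
    return sorted(hits)


with tempfile.TemporaryDirectory() as root:
    libdir = os.path.join(root, "lib")
    os.mkdir(libdir)

    # A tree holding only a static archive passes the check.
    open(os.path.join(libdir, "libz.a"), "w").close()
    count_static_only = len(find_shared_objects(root))

    # As soon as a shared object sneaks in, it shows up.
    open(os.path.join(libdir, "libz.so.1"), "w").close()
    leaked = [os.path.basename(p) for p in find_shared_objects(root)]
```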

	josuah Offline \| 18-04-2020, 01:48 AM \| #25

(08-04-2020, 03:43 PM)jkl Wrote: Define “dependency-heavy”?

Another definition: when it takes more than 500 MB of memory to compile one binary:

Code:
# github.com/42wim/matterbridge/vendor/...

fatal error: runtime: out of memory

But this is normal: this is matterbridge: making the greatest effort to support as many protocols as possible, so *obviously* it is taking a lot of dependencies.

I really like out-of-memory failures on a single step: a hard limit due to the algorithm and not the load.

It "reminds" me of an era I did not know, but got a sense of through https://en.wikipedia.org/wiki/Black_Mirr...ndersnatch, where resources were limited and you had to work around it (P.S.: heh, no I don't do low-level video graphics, and I am just as dumb as when I joined this forum for the first time).

	dcli Offline \| 18-04-2020, 11:43 AM \| #26

i use slackware, which has no out-of-the-box package management. several platforms exist - slackbuilds, most notably - but there is no dependency resolution amongst the different platforms. so, i absolutely think twice about building anything big unless i feel like sitting around for a few hours and taking care of the whole dependency tree.

	ckester Offline \| 15-05-2020, 02:27 PM \| #27

(08-04-2020, 03:43 PM)jkl Wrote: Define “dependency-heavy”?

I think he meant something like "uses code from external sources (other than what is provided by the language's standard library or libraries)".

As opposed to "batteries included".

If I understand correctly, updating a statically-built program to incorporate changes in any of its dependencies still requires a rebuild. Otherwise it will still use the old code. With dynamic linking the updates can be transparent (unless the major version changes, usually because the API does), and the using program doesn't always need to be recompiled.

It does put the burden on the static program's maintainer to keep track of changes in the dependencies, but personally I prefer getting the opportunity to test their effects on my code rather than having them deployed "behind my back".

(Edit: Somehow I had missed page 3 of this thread when I wrote this reply. Oh well.)

(Edit2: actually the maintainer of a dynamically-linked program ought to also be tracking changes in the libraries it depends on and testing that they don't adversely impact his program. In my time as a FreeBSD port maintainer I learned that the ports management team only checks that an upgrade of a library doesn't break the build of the programs which depend on it; as far as I know they don't do even the most rudimentary "smoke" test of those programs, let alone more detailed functional or other tests. I expect the same is true for the various package management systems. So the difference between static and dynamic linking is really a wash as far as upgrades go, assuming conscientious maintainers in both cases. Which might be a BIG assumption.)

	venam Offline \| 23-07-2020, 02:03 AM \| #28

I've recorded this episode as a podcast, you can find it in the parent post of the thread.

	jkl Offline \| 10-11-2020, 12:08 PM \| #29

As stated on IRC, I played a bit with Conan. As it turns out, it is a really nice addition to CMake (which is what it supports best): one could ship a source package that pulls, builds and uses all the dependencies one could ever need, even on systems without a package manager of their own. Not bad, really.

I migrated the ymarks server from my own build script with a large directory of dependencies, dynamically adding cJSON and SQLite3 - and it works just fine. Awesome, really.

The only third-party library still shipped is a header-only web server which isn’t in Conan. It does not have to be: one more file is acceptable to me. :)


	stratex Offline \| 12-11-2020, 08:24 PM \| #30

My thoughts on package management, and overall system complexity:
Millions of developers are writing code, then many other developers are writing more code that relies on the code already written. With each passing day the complexity of the system increases, but our ability to manage it, or even comprehend it in its entirety, decreases drastically.
You can try to use some chroot magic, essentially making some kind of software API to your system, while you yourself sit in a 'clean' and tidy environment, but what does it really solve? The junk is still on your hard drive; it doesn't matter whether you run it directly or not.
Another option is to do it your own way, only clean code, only most sane projects, but then good luck with compatibility, or even finishing such monumental project on your own, it will just make you an elite autist while the rest of GNU/Linux will carry on without you.
The third option is to just accept your destiny: there are no ideals in this world. I personally think that sooner or later the whole software stack will 'explode' in our faces; people will die or lose money left and right because of terrible code, or some bug from 2026 in a 9-level deep library/framework system built on top of ChromiumOS. And it will be perceived as an absolutely normal situation, or people will be outraged and ask the government to step in. I would love to see how the government will regulate and review 500 million LOC. At least programmers are not going to be unemployed for a very, very long time.
