The role of distributions &/or Unix flavors, where does pkg management stands - Psychology, Philosophy, and Licenses
|
|||
Hello fellow nixers,
This thread is about launching one of those discussion podcasts. The topic this time is: What's the role of distributions &/or Unix flavors, and where does package management stand in all that?

EDIT: It has finally been posted here. Link of the recording [ https://github.com/nixers-projects/podca...3?raw=true http://podcast.nixers.net/feed/download....07-221.mp3 ]

We'll try to schedule it for next week, so hop on the scheduler interface and put in all the hours you might be available so that we can choose the best common denominator: https://podcast.nixers.net
If you don't have a key you can PM me anywhere for one.

Relevant threads and articles for your personal research can be found in:
Issue #63 of the newsletter ( https://newsletter.nixers.net/entries.php#63 ) and this week's issue #65
The thread "pkg management, what do you expect, what you wished you had" ( https://nixers.net/showthread.php?tid=1883 )
"GoboLinux and Package Management" https://nixers.net/showthread.php?tid=2049
And, obviously, much more, like everything on https://nixos.org/ or other types of package management or distro management.

More questions to think about:
What is expected from a distro or Unix flavor?
The "from scratch" approach, advantages?
What's the role of the package manager?
What's the role of maintainers?
What is the current issue: containers, mini language-specific modules, etc.?

Keep your ideas to yourself until the discussion actually takes place in the podcast. |
|||
|
|||
This topic hasn't actually been discussed so far, and it keeps bugging me as the worlds of package management and distributions drift further apart.
When I get a bit more time I'll come back to it. Meanwhile, for those going to FOSDEM, let's mention that there's a "Dependency Management" devroom that's going to discuss such topics. |
|||
|
|||
I wish there was one way to manage dependencies for all platforms.
-- <mort> choosing a terrible license just to be spiteful towards others is possibly the most tux0r thing I've ever seen |
|||
|
|||
(18-01-2020, 07:34 PM)jkl Wrote: I wish there was one way to manage dependencies for all platforms.

pkgsrc is a step toward that. It is the package manager from NetBSD, but it can be used on many (POSIX) platforms, including Minix.

Package management is a complex topic, as different people have different needs. The only assumption you can make is that everyone will want the manager to not get in their way. This is why Debian has the "apt" frontend for dpkg, and why RPM has yum/dnf. This is why most of the time "$TOOL install/remove whatever" will work as expected. They follow the principle of least surprise.

Unfortunately, such simplicity in the interface/usage comes at the cost of complexity on the packaging side. The packages are cluttered with metadata, pre/post install scripts and so on and so forth. This is where the user steps in, as some "power users" will want better control over what they install and the ability to easily review packages and their dependencies.

I personally care more about having a simple packaging format than about good dependency handling, mostly because I prefer software that has the fewest dependencies. I made my own package manager for this purpose, because it lets me review the software I fetch, package it the way I want and install it where I want, with the privileges I set (most of my tools are installed under my UID, in $HOME/.local). This obviously comes at the cost of having to fetch updates manually, which (for now) is a bit of a burden. But the simplicity of packaging outweighs this for me. |
|||
|
|||
|
|||
Linux is one platform though.
-- <mort> choosing a terrible license just to be spiteful towards others is possibly the most tux0r thing I've ever seen |
|||
|
|||
Snap is easy to use, but a pain to manage. This is also a huge step backward IMO, as you put the packaging in the hands of the developer. This is the same nonsense as letting devs push to production directly.
|
|||
|
|||
Most of the packaging drama comes from the use of dynamic linking; otherwise converting the packages would be more than enough (or patching pkgsrc to build packages). The "new" "solutions", Flatpak, AppImage, and Snap, are basically a return to an inferior form of static linking (sadly, such a model is encouraged by license uncertainties). For me, the natural progression of the current insanity is solutions like Nix and Guix.
|
|||
|
|||
Static linking avoids dependency hell. I actually prefer that.
-- <mort> choosing a terrible license just to be spiteful towards others is possibly the most tux0r thing I've ever seen |
|||
|
|||
Unfortunately, static linking is definitely not manageable nowadays. I gave it a try a few years back, and had a really bad time getting the compiler to behave as I wanted (gcc is a bitch here, really).
I do agree though that all the new "packaging methods" are really badly done, as you end up with devs shipping their own library versions along with the packages, and you are then dependent on them to push new snaps/flatpaks/titi/kaka/whatever when a new patch is needed. I guess they finally managed to port DLL hell to Linux. This is our future now. |
|||
|
|||
I finally got the time to write something about this. I thought of recording a podcast but I found it easier to simply post the content here in text/blog form. So here we go, I hope you enjoy the research.
What is a distribution

What are software distributions? You may think you know everything there is to know about the term software distribution, but take a moment to think about it, take a step back and try to see the big picture.

We often have in mind the thousands of Linux distributions when we hear it. However, the term is far from limited to Linux: BSD, the Berkeley Software Distribution, has software distribution right in the name. Android and iOS are software distributions too. Actually, the concept is so prevalent that we may have stopped paying attention to it, and we find it hard to put a definition together. There's definitely the part about distributing software in it, software that may be commercial or not, open source or not.

To understand it better, maybe investigating what problems software distributions address would clear things up. Let's imagine a world before software distributions; does that world exist? A world where software stays within boundaries, not shared with anyone outside of them. Once we break these boundaries and want to share it, we find that we have to package all the software together in a meaningful way, configure the pieces so that they work well together, add some glue in between when necessary, find the appropriate medium to distribute the bundle, get it all from one end to the other safely, make sure it installs properly, and follow up on it.

Thus, software distribution is about the mechanism and the community that take on the burden and the decisions of building an assemblage of coherent software that can be shipped. The operating system, or kernel if you like, could be, and often is, part of the collage offered, a piece of software just like the others.

The people behind it are called distribution maintainers, or package maintainers.
Their roles vary widely: they could write the software that stores all the packages, called the repository, maintain a package manager with its format, maintain a full operating system installer, package and upload software they built or that someone else built on a specific time frame/life cycle, make sure no malicious code is uploaded to the repository, follow up on the latest security issues and bug reports, fix third party software to fit the distribution's philosophical choices and configurations, and most importantly test, plan, and make sure everything holds together. These maintainers are the source of trust of the distribution; they take responsibility for it. In fact, I think it's more accurate to call them distributors.

Different ways to approach it

There are so many distributions it can make your head spin. The software world is booming, especially the open source one. For instance, we can find bifurcations, distributions that get copied by new maintainers and then diverge. This creates a tree-like aspect, a genealogy of common ancestors and/or influences in technical and philosophical choices. Overall, we now have a vibrant ecosystem where a thing learned on one branch can help a completely unrelated leaf on another tree. There's something for everyone.

Target and speciality

So what could be so different between all those software distributions? Why not have a single platform that everyone can build on? One thing is specialization and differentiation. Each distro caters to a different audience and is built by a community with its own philosophy. Let's go over some of them:
That all comes down to the raison d'être, the philosophy of the distribution; it guides every decision the maintainers have to make. It guides how they configure every piece of software, and how they think about security, portability, and comprehensiveness.

For example, if a distribution cares about free software, it's going to be strict about what software it includes and what licenses it allows in its repository, and it will have software in the core to check the consistency of licenses. Another example: if the goal is to target a desktop audience, then internationalization, ease of use, user friendliness, and having a large number of packages are going to be prioritized. If instead the target is a real time embedded device, the kernel is going to be small, configured and optimized for this purpose, and the packages will be limited to the ones that work in this environment. Or if it's targeted at advanced users who love having control of their machine, the maintainers will choose to let the users make most of the decisions, providing as many packages as possible at the latest version possible, with a loosely defined way of installing the distribution and a lot of libraries and software development tools.

What this means is that a distribution does anything it can to provide sane defaults that fit its mindset. It composes and configures a layer of components, a stack of software.

The layering

Distribution maintainers often have different blocks at their disposal and the ability to choose among them, stacking them to create a unit we call a software distribution. There's a range of approaches to this: they could choose to include more, or less, in what they consider the core of the distribution, as opposed to what is external and less important to it.
Moreover, sometimes they might even leave the core very small and loose, instead providing the glue software that makes it easy for the users to choose and swap the blocks at specific stages in time: installation, run time, maintenance mode, etc.

So what are those blocks of interdependent components?

The first part is the method of installation; this is what everything hinges on, the starting point.

The second part is the kernel, the real core of all operating systems today. But that doesn't mean that the distribution has to enforce it. Some distributions may go as far as to provide multiple kernels specialised in different things, or none at all.

The third part is the filesystem and file hierarchy, the component that manages where and how files are spread out on the physical or virtual hardware. This could be a mix and match where sections of the file system tree are stored on separate filesystems.

The fourth part is the init system, PID 1. This choice has generated a lot of contention these days. PID 1 is the mother process of all other processes on the system. What role it has and what functionality it should include is a subject of debate.

The fifth part is composed of the shell utilities, what we sometimes refer to as the userland or user space, as it's the first layer the user can directly interface with to control the operating system, the place where processes run. The userland implementations on Unix-based systems usually try to follow the POSIX standard. There are many such implementations, also a subject of contention.

The sixth part is made up of services and their management: the daemons, long running processes that keep the system in order. Many argue about whether the management functionality should be part of the init system or not.

The seventh part is documentation. It is often forgotten but it is still very important.
The last part is about everything else, all the user interfaces and utilities a user can have and the ways to manage them on the system.

Stable releases vs Rolling

There exists a spectrum on which distributions place themselves when it comes to keeping up to date with the versions of the software they provide. This most often applies to external third party open source software.

The spectrum is the following. Do we allow the users to always have the latest version of every piece of software while running the risk of accidentally breaking their system, what we call a bleeding edge or rolling distro? Or do we take a more conservative approach and take the time to test everything properly before allowing it in the repository, while not having all the latest updates, features, and optimizations, what we call a release based distro?

The extreme of the first scenario would be to let users download directly from the software vendor/creator's source code repository, or the opposite, to let the software vendor/creator push directly to the distribution repository. Either could easily break or conflict with the user's system or lead to security vulnerabilities. We'll come back to this later, as this could be avoided if the software runs in a containerized environment.

Release distributions usually have a long term support stable version that keeps receiving the necessary security updates and bug fixes in the long run, while another version runs a bit ahead, testing the future changes. At specific time frames, users can jump to the latest release of the distribution, which may involve a lot of changes in both configuration and software.

Some distributions decide they may want to break the ABI or API of the kernel upon major releases, which means that everything in the system needs to be rebuilt and reinstalled.

The release cycle and the rate of updates really are a spectrum.
When it comes to updates, in both cases, the distribution maintainers have to decide how to communicate and handle them: how to let the users know what changes, and whether a user configuration was swapped for a new one, merged with the new one, or copied aside.

Communication is essential, be it through official channels, logging, mails, etc. Communication needs to be bi-directional: users report bugs, and maintainers post what their decisions are and whether users need to be involved in them. This creates the community around the distribution.

Rolling releases require intensive efforts from package maintainers as they constantly have to keep up with software developers, especially when it comes to the thousands of new libraries that are part of recent programming languages and that keep on increasing.

Various users will want precise things out of a system. Enterprise environments and mission critical tasks will prefer stable releases, while software developers or normal end users may prefer to have the ability to use the latest software.

Interdistribution standard

With all this, can't there be an interdistribution standard that creates order, and would we want such a standard?

At the user level, the differences are not always noticeable; most of the time everything seems to work as Unix systems are expected to work. There's no real standard between distributions other than that they more or less follow the POSIX standards.

Within the Linux ecosystem, the Free Standards Group (now part of the Linux Foundation) tries to improve interoperability of software by fixing a common Linux ABI, file system hierarchy, naming conventions, and more. But that's just the tip of the iceberg when it comes to having something that works across distributions.

Furthermore, each part of the layering we've seen before could be said to have its own standards: there are desktop interoperability standards, filesystem standards, networking standards, security standards, etc.
The biggest player right now when it comes to this is systemd, in association with the freedesktop.org group; it tries to create (force) an interdistribution standard for Linux distributions.

But again, the big question: do we actually want such inter-distribution standards, or can we be happy with the mix and match we currently have? Would we profit from such a thing?

The package manager and packaging

Let's now pay attention to the packages themselves: how we store them, how we give secure access to them, how we are able to search amongst them, download them, install them, remove them, and anything related to their local management, versioning, and configuration.

Method of distribution

How do we distribute software and share it, and what's the front-end to this process?

First of all, where do we store this software?

Historically and still today, software can be shared via physical media such as CD-ROMs, DVDs, USB drives, etc. It is common for proprietary vendors to have the distribution come with a piece of hardware they are selling; it's also common for the procurement of the initial installation image.

However, with today's hectic software growth, using a physical medium isn't flexible. Sharing over the internet is more convenient, be it via FTP, HTTP, HTTPS, a publicly available svn or git repo, central website hubs such as GitHub, or application stores such as the ones Apple and Google provide.

A requirement is that the storage and the communication with it should be secure, resilient against failures, and accessible from anywhere. Thus, replication is often done, both to avoid failures and to get a sort of edge network speed-up and load balancing across the world. Replication could be done in multiple ways; it could be a P2P distributed system, for instance.

How we store it and in what format is up to the repository maintainers. Usually, this is a file system with a software API users can interact with over the wire.
Two main format strategies exist: source based repositories and binary repositories.

Second of all, who can upload and manage the host of packages? Who has the right to replicate the repository?

As the repository is a source of truth for the users, it is important to make sure the packages have been verified and secured before being accepted on it.

Many distributions have the maintainers be the only ones that are able to do this, giving them cryptographic keys to sign packages and validate them. Others have their own users build the packages and send them to a central hub for automatic or manual verification, after which they are uploaded to the repository, each user having their own cryptographic key for signature verification.

This comes down to an issue of trust and stability. Having the users upload packages isn't always feasible with binary packages if the individual packages are not containerized properly.

There's a third option, the road in between: having the two types, the core managed by the official distribution maintainers and the rest by the user community.

Finally, the packages reach the user.

How the user interacts with the repository locally and remotely depends on the package management choices. Do users cache a version of the remote repository, as is common with the BSD ports tree system? How flexible is it when it comes to tracking updates, locking versions of software, and allowing downgrades? Can users download from different sources? Can users have multiple versions of the same software on their machine?

Format

As we've said, there are two main philosophies of software sharing format: source code port-style and pre-built binary packages.

The software that manages those on the user side is called the package manager; it's the link with the repository. In a source based repo I'm not sure we can call it that, but regardless I'll still refer to it as such.

Many distributions create their own or reuse a popular one.
It does the search, download, install, update, and removal of local software. It's not a small task. The rule of the book is that if something isn't installed by the package manager, then the package manager won't be aware of its existence. Note that distributions don't have to be limited to a single package manager; there could be many.

Each package manager relies on a specific format and metadata to be able to manage software, be it source or binary formatted. This format can be composed of a group of files or a single binary file with specific information segments that together create recipes that help throughout the package's lifecycle. Some are easier to put together than others, incidentally allowing more user contributions.

Here's a list of common information that the package manager needs:
So what's the advantage of having pre-compiled binary packages instead of cloning the source code and compiling it ourselves? Wouldn't that remove a burden from package maintainers?

One advantage is that pre-compiled packages are convenient: it's easier to download them and run them instantly. These days it's also hard, if not impossible, and energy intensive to compile huge software such as web browsers. Another point is that proprietary software is often already distributed as binary packages, which would create a mix of source and binary packages.

Binary formats are also space efficient, as the code is stored in a compressed archive format. For example: APK, Deb, Nix, ORB, PKG, RPM, Snap, pkg.tar.gz/xz, etc. Some package managers may also choose to leave the choice of compression up to the user and dynamically discern from their configuration file how to decompress packages.

Let's add that there exist tools, such as "Alien", that facilitate the job of package maintainers by converting from one binary package format to another.

Conflict resolution & Dependencies management

Resolving dependencies

One of the hardest jobs of the package manager is to resolve dependencies.

A package manager has to keep a list of all the packages currently installed on the system, with their versions and their dependencies.

When the user wants to install a package, it has to take as input the list of dependencies of that package, compare it against the list it already has, and output a list of what needs to be installed, in an order that satisfies all dependencies.

This is a problem that is commonly encountered in the software development world with build automation utilities such as make. The tool creates a directed acyclic graph (DAG) and, using the power of graph theory and the acyclic dependencies principle (ADP), tries to find the right order. If no solution is found, or if there are conflicts or cycles in the graph, the action should be aborted.
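
The ordering step described above can be sketched in a few lines of Python with the standard `graphlib` module (Python 3.9+). This is only a minimal illustration under stated assumptions: the `REPO` table and the `install_order` helper are hypothetical names made up for the example, version constraints and conflicts are ignored, and no real package manager is claimed to work exactly this way.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical repository metadata: package -> set of packages it depends on.
REPO = {
    "browser": {"gui-toolkit", "tls"},
    "gui-toolkit": {"libc"},
    "tls": {"libc"},
    "libc": set(),
}


def install_order(target, repo, installed=frozenset()):
    """Return the packages to install for `target`, dependencies first."""
    # Collect the target and everything it transitively depends on.
    needed, stack = {}, [target]
    while stack:
        pkg = stack.pop()
        if pkg in needed:
            continue
        if pkg not in repo:
            raise LookupError(f"unknown package: {pkg}")
        needed[pkg] = repo[pkg]
        stack.extend(repo[pkg])
    try:
        # static_order() yields each package only after its dependencies.
        order = list(TopologicalSorter(needed).static_order())
    except CycleError as err:
        # A cycle means no valid order exists, so the action is aborted.
        raise RuntimeError(f"dependency cycle: {err.args[1]}") from err
    # Skip what is already present on the system.
    return [pkg for pkg in order if pkg not in installed]


if __name__ == "__main__":
    print(install_order("browser", REPO, installed={"libc"}))
```

With `libc` already installed, the result lists the two libraries (in either order) followed by `browser`. A repository where two packages depend on each other makes `install_order` raise instead of returning a partial plan.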
The same applies in reverse, upon removal of a package. We have to make a decision: do we remove all the other packages that were installed as dependencies of that single one? What if newer packages depend on those dependencies; should we only allow the removal of the unused dependencies?

This is a hard problem, indeed.

Versioning

This problem grows when we add versioning to the mix, if we allow multiple versions of the same software to be installed on the system. If we don't, but allow switching from one version to another, do we also switch all the other packages that depend on it?

Versioning applies everywhere, not only to packages but to release versions of the distribution too. A lot of distributions attach certain versions of packages to specific releases, and consequently releases may have different repositories.

The choice of naming conventions also plays a role; names should convey to users what a version is about and whether any changes happened. Should the package maintainer follow the naming convention of the software developer, or should they use their own? What if the names of two pieces of software conflict with one another? That makes it impossible to have both in the repo, so some extra information needs to be added.

Do we rely on semantic versioning (major, minor, patch), or do we rely on names like so many distribution releases do (Toy Story, desserts, etc.), or do we rely on the date of release, or maybe simply an incremental number?

All those convey meaning to the user when they search and update packages from the repository.

Static vs dynamic linking

One decision that may not apply to source based distros is the one between building packages as statically linked to libraries or dynamically linked.

Dynamic linking is the process in which a program chooses not to include a library it depends upon in its executable but only a reference to it, which is then resolved at run time by a dynamic linker that loads the shared object into memory upon usage.
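
To get a feel for a library being located and loaded at run time rather than baked into the program, here is a small Python sketch using the standard `ctypes` module. It is an analogy rather than the exact mechanism: it loads the C library explicitly, the way `dlopen` does, whereas the dynamic linker does this implicitly at program start. A POSIX system is assumed.

```python
import ctypes
import ctypes.util

# Ask the system where the C library lives instead of hard-coding a path.
# find_library may return None on some systems; CDLL(None) then falls back
# to the symbols already loaded into the current process (POSIX only).
libc_name = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_name)

# Declare the C signature so ctypes converts the arguments correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# The call goes into the shared object that was mapped into memory above.
length = libc.strlen(b"nixers")
print(libc_name, length)
```

The script itself contains no copy of `strlen`; whichever C library the system provides at that moment is the one that gets used, which is exactly why a package manager has to track which library versions are installed.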
In contrast, static linking means storing the libraries right inside the compiled executable program.

Dynamic linking is useful when many programs rely on the same library, as only a single instance of the library has to be in memory at a time. Executable sizes are also smaller, and when the library is updated all programs relying on it get the benefit (as long as the interfaces stay the same).

So what does this have to do with distributions and package management?

Package managers in a dynamic linking environment have to take care of the versions of the libraries that are installed and which packages depend on them. This can create issues if different packages rely on different versions.

For this reason, some distro communities have chosen to get rid of dynamic linking altogether and rely on static linking, at least for things that are not related to the core system.

Another incidental advantage of static linking is that a program doesn't have to resolve its dependencies with the dynamic linker, which gives it a small boost in startup speed.

So static builds simplify the package management process. There doesn't need to be a complex DAG because everything is self contained. Additionally, this makes it possible to have multiple versions of the same software installed alongside one another without conflicts. Updates and rollbacks are not messy with static linking.

This gives rise to more containerised software, and continuing on this path leads to market platforms such as Android and iOS, where distribution can be done by the individual software developers themselves, skipping the middle-man altogether and giving increasingly impatient users the ability to always have the latest version that works for their current OS. Everything is self-packaged.

However, this relies heavily on trusting the repository/marketplace. There need to be many security mechanisms in place to prevent rogue software from being uploaded.
We'll talk more about this when we come back to containers.

This is great for users and, from a certain perspective, for software developers too, as they can directly distribute pre-built packages, especially when there's a stable ABI for the base system.

All this breaks the classic distribution scheme we're accustomed to on the desktop.

Is it all roses and butterflies, though?

As we've said, packages take much more space with static linking, thus wasting resources (storage, memory, power).

Moreover, because it's a model where software developers push directly to users, it removes the filtering that distribution maintainers have over the distro, and encourages license uncertainties. There's no overall philosophy surrounding the distribution anymore.

There's also the issue of library updates: the weight is on the software developers to make sure they have no vulnerabilities or bugs in their code. This puts a veil over which software uses what; all we see is the end product.

From the perspective of a software developer using this type of distribution, this adds extra steps: downloading the source code of each library their software depends on and building each one individually, turning the system into a source based distro.

Reproducibility

Because package management has become increasingly messy over the past few years, a new trend has emerged to bring back a sense of order in all this: reproducibility.

It has been inspired by the world of functional programming and the world of containers. Package managers that respect reproducibility have each of their builds asserted to always produce the same output.

They allow packages of different versions to be installed alongside one another, each living in its own tree, and they allow normal users to install packages only they can access. Thus, many users can have different packages.

They can be used as universal package managers, installed alongside any other package manager without conflict.
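
One way to picture "each living in its own tree" is to derive every install directory from a hash of the package and of its exact dependencies. The sketch below is a toy illustration only: the `/tmp/store` root, the `store_path` function and the hashing recipe are assumptions made up for the example, not the actual scheme of any existing package manager.

```python
import hashlib

STORE = "/tmp/store"  # hypothetical store root, not a real tool's layout


def store_path(name, version, dep_paths=()):
    """Derive an install directory from a package and its exact dependencies."""
    h = hashlib.sha256()
    h.update(f"{name}\n{version}\n".encode())
    # Sort so the result does not depend on the order dependencies are listed.
    for dep in sorted(dep_paths):
        h.update(dep.encode() + b"\n")
    return f"{STORE}/{h.hexdigest()[:16]}-{name}-{version}"


# Two versions of the same library get two distinct directories.
libc_a = store_path("libc", "2.30")
libc_b = store_path("libc", "2.31")

# The same tool built against each of them also lands in distinct directories,
# so both builds can be installed side by side without overwriting each other.
tool_a = store_path("tool", "1.0", [libc_a])
tool_b = store_path("tool", "1.0", [libc_b])

for path in (libc_a, libc_b, tool_a, tool_b):
    print(path)
```

Because the dependency paths feed into the hash, changing anything underneath a package changes its own directory too, while identical inputs always map to the same place.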
The most prominent examples are Nix and Guix, which use a purely functional deployment model where software is installed into unique directories generated through cryptographic hashes. The dependencies of each piece of software are included within each hash, solving the problem of dependency hell. This approach to package management promises to generate more reliable, reproducible, and portable packages.

Stateless and verifiable systems

The discussion about trust, portability, and reproducibility can also be applied to the whole system itself.

When we talked about repositories as marketplaces, where software developers push directly to them and the users have instant access to the latest version, we said it was mandatory to have additional security measures.

One of them is to containerise, to sandbox, every piece of software, having each one run in its own space without affecting the rest of the system's resources. This removes the heavy burden of auditing and verifying each and every piece of software.

Many solutions exist to achieve this sandboxing: docker, chroot, jails, firejail, selinux, cgroups, etc.

We could also set the home directories of the users apart, making them self-contained and never installing into or modifying the globally accessible places.

This could let us have the core of the system verifiable, as it is not changed and stays pristine. Making sure it's secure would be really easy.

The idea of having the user part of the distro be atomic, movable, and containerized, and the rest reproducible, is game changing. But again, do we want to move to a world where every distro is interchangeable?

Do Distros matter with containers, virtualisation, and specific and universal package managers

It remains to be asked whether distributions still have a role today with all the containers, virtualisation, and specific and universal package managers.

When it comes to containers, distributions are still very important, as they are most often the base of the stack the other components build upon.
The distribution is made up of people who work together to build and distribute the software and make sure it works fine. That isn't the role of the person managing the container, and it's much more convenient for them to rely on a distribution.

Another point is that containers hide vulnerabilities: they aren't checked after they are put together, while distribution maintainers, on the other hand, have the role of communicating and following up on security vulnerabilities and other bugs. Community is what solves the daunting problems that everyone shares. A system administrator building containers can't possibly have the knowledge to manage and build hundreds of programs and libraries and ensure they work well together.

If packages are self-contained

Do distributions matter if packages are self-contained?

To an extent they do, as in this ecosystem they could be the providers/distributors of such universal self-contained packages. And as we've said, it is important to keep the philosophy of the distro and offer a tested toolbox that fits the use case.

What's more probable is that we'll move to a world with multiple package managers, each trusted for its specific space and purpose, each with a different source of philosophical and technical truth.

Programming language specific package management

This phenomenon is already exploding in the world of programming language package management.

The speed and granularity at which software is built today are almost impossible to follow using the old method of packaging. The old software release life cycle has been thrown out the window. Thus language-specific tools were developed, not limited to installing libraries but also software. We can now refer to the package manager offered by the distribution as system-level, and to the others as application-level or specific package managers.
Consequentially, the complexity and conflicts within a system has exploded, and distribution package managers are finding it pointless to manage and maintain anything that can already be installed via those tools. Vice-versa, the specific tool makers are also not interested in having what they provide included in distribution system-level package managers. Package managers that respect reproducibility, such as Nix, that we've mentioned, handle such cases more cleanly as they respect the idea of locality, everything residing withing a directory tree that isn't maintained by the system-level package manager. Again, same conclusion here, we're stuck with multiple package managers that have different roles. Going distro-less A popular topic in the container world is "distro-less". It's about replacing everything provided in a distribution, removing it's customization, or building an image from scratch and maybe relying on universal package managers or none. The advantage of such containers is that they are really small and targeted for a single purpose. This let the sysadmin have full control of what happens on that box. However, remember that there's a huge cost to controlling everything, just like we mentioned earlier. This moves the burden upon the sysadmin to manage and be responsible to keep up with bugs and security updates instead of the distribution maintainers Conclusion With everything we've presented about distributions, I hope we now have a clearer picture of what they are providing and their place in our current times. What's your opinion on this topic? Do you like the diversity? Which stack would you use to build a distribution? What's your take on static builds, having users upload their own software to the repo? Do you have a solution to the trust issue? How do you see this evolve? |
|||
|
|||
Quote: A popular topic in the container world is "distro-less".
Sounds like a standard Gentoo installation, but now with an additional hipster attitude.
-- <mort> choosing a terrible license just to be spiteful towards others is possibly the most tux0r thing I've ever seen |
|||
|
|||
|
|||
(31-01-2020, 09:41 AM)z3bra Wrote: Unfortunately, static linking is definitely not manageable nowadays. I gave it a try a few years back, and had a really bad time getting the compiler to behave as I would (gcc is a bitch here, really).
I'm curious to know what kind of problems you encountered. Like jkl, I prefer static linking except in (some, but not all) cases where a library is truly shared -- not just by a few programs that are out there in the wild, but actually running simultaneously on a typical machine (especially my own). I.e., I think a lot of so-called "shared libraries" aren't shared at all. The worst examples of that are what I call vanity libraries, where somebody ships an .so that is never used by any programs but his own.
IIRC, the X libraries were the original motivation for shared libraries on Unix. Back in the days when disk space was still a constraint it made sense to share a single copy of those monsters. |
|||
|
|||
(29-03-2020, 08:54 PM)ckester Wrote: I'm curious to know what kind of problems you encountered.
I tried to build my own Linux distro, statically linked against musl. GCC cannot be compiled statically anymore, for example, just because of libstdc++, unless you're ready to lose your sanity and dedicate a full year to it. Another thing is that the work required to patch stuff is tremendous. Static linking requires that you link libraries in dependency order, for example, and nobody takes that into account anymore, so to compile a program you must patch it. Except many programs now use autotools or cmake, which generate the makefiles for you. Handy, but they don't bother with the order. Good luck patching that. They might make use of pkg-config, except when they don't. And even then you will have to patch it because something else will be fucked. |
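To make the link-order point concrete, here is a minimal sketch, assuming a traditional left-to-right linker such as GNU ld, where a static archive has to appear after the objects and libraries that reference it. The helper name link_order and the libraries foo, bar and baz are hypothetical and only serve as an illustration; tsort is the standard topological sort utility.

```shell
# link_order: read "dependent dependency" pairs on stdin and print -l flags
# ordered so that every library comes before the libraries it depends on.
link_order() {
    tsort | sed 's/^/-l/' | paste -sd ' ' -
}

# Hypothetical libraries: foo uses bar, and bar uses baz.
printf '%s\n' 'foo bar' 'bar baz' | link_order
```

The resulting flags could then be appended after the object files on the cc command line. A build system that emits them in any other order is the kind of thing that ends up needing a patch.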
|||
|
|||
Ah, I see. I agree, it's nearly impossible nowadays to take static linking to an absolute extreme. I think even the suckless gang's distros (stali, morpheus, etc.) have accepted the need for some things to be dynamically linked.
All I can say is that I statically link everything I can, and prefer programs that can be over those that can't. Don't get me started on the abomination that is cmake! |
|||
|
|||
One of the good things in the rise of Go is that it usually produces statically linked binaries, just like Pascal did/does. As some people move on from C to Go, the future could be interesting.
|
|||
|
|||
(01-04-2020, 03:56 AM)jkl Wrote: One of the good things in the rise of Go is that it usually produces statically linked binaries, just like Pascal did/does. As some people move on from C to Go, the future could be interesting.
This is totally true, and one of the main reasons I like Go. What kills me though is that the people praising Go's « compile a single binary and run it everywhere » are sometimes the same people arguing that static linking in C is bad and stupid because sharing libraries is better... I wish people would reconsider static linking in C thanks to Go. |
|||
|
|||
It is surprising that the recent compiled languages go static by default, given that they are "dependency forest languages": it is really easy to go get some go-get-maintained package and use it, rather than packaging it by hand or waiting for someone else to package it for you.
So dependencies pull in other dependencies for free, as the whole chain is automatic and works. Yet the whole thing uses static binaries. Does the "use dynamic libraries for easier updates" argument still stand when we have dependency-heavy compiled languages that still go static? |
|||
|
|||
Define “dependency-heavy”?
|
|||
|
|||
jkl: I'd say 2 options combined:
- when a package.lock has more than 10 entries and the author says "minimal dependencies";
- when it is common practice for a library to pull in dependencies on another library.
It is not necessarily a bad thing, as long as you do not end up with string padding libraries. Tools are not good or bad on their own after all. |
|||
|
|||
One tip for doing static linking:
It is terribly hard to configure per program, so instead of configuring it, it is possible to /not/ build the shared libraries, and only expose libsomething.a to the compiler, which will pick it up. The compiler flags stay exactly the same! The linker just picks whatever is available to fulfill -lsomething. And given that ./configure and the other autotools work by launching cc and checking the outcome rather than testing for the presence of the files themselves, libsomething.a will survive all the piping of the autotools and the craziest of makefiles. Yes, you have to configure it from the library's package rather than through the program's, but it works _nicely_!

$ ldd $(which curl)
    linux-vdso.so.1 (0x00007ffd9b5f0000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f5fc033b000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f5fc031a000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5fc0159000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f5fc0694000)
$ ldd $(which gpg2)
    linux-vdso.so.1 (0x00007fff5f738000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2baafad000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f2bab4d8000)

Yes, these are not "true static binaries", but look at the pile of dependencies needed to compile gpg2: look-ma-no-configure-flags, and still all of those libraries are statically linked! Next step is to rm libc.so... Hmm, I'll wait a bit if you do not mind. ;)
[edit] BTW, the ./configure flags in autotools style are --enable-shared=no --enable-static=yes [/edit] |
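The trick above relies on the linker only finding an archive for a given -lsomething. As a rough sketch of how one could check which libraries in a directory are in that state, the hypothetical helper below lists every library that has a lib*.a but no matching lib*.so next to it. The helper name and the directory layout are assumptions made for illustration, not something taken from the posts.

```shell
# static_only_libs DIR: print the names (without "lib" prefix and ".a" suffix)
# of libraries in DIR that exist only as a static archive.
static_only_libs() {
    dir=$1
    for archive in "$dir"/lib*.a; do
        [ -e "$archive" ] || continue          # glob matched nothing
        name=$(basename "$archive" .a)         # e.g. libz
        if [ ! -e "$dir/$name.so" ]; then      # no shared counterpart
            printf '%s\n' "${name#lib}"        # e.g. z
        fi
    done
}
```

Running it against the prefix where the static-only libraries were installed should list every name that -l would resolve to an archive there.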
|||
|
|||
|
|||
@z3bra: yes, the state of static linking is not so bright, and from what you say, you have the background to say so.
But these flags are really meant for the library's package, not for the binary program's package: if no .so ever gets built and find / -name '*.so' -o -name '*.so.*' | wc -l prints 0, then if a binary ever comes out, it will not be a dynamic one. |
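The check described above can be written as a small function. This is only a sketch: the function name is hypothetical, and it takes the directory to scan as an argument instead of always walking / as in the post.

```shell
# count_shared_objects DIR: print how many shared objects (regular files or
# symlinks named *.so or *.so.*) exist under DIR.
count_shared_objects() {
    find "$1" \( -name '*.so' -o -name '*.so.*' \) \
        \( -type f -o -type l \) 2>/dev/null | wc -l | tr -d ' '
}
```

A result of 0 for the whole build root would mean that the linker has no shared library to pick, so whatever binary comes out cannot depend on one.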
|||
|
|||
(08-04-2020, 03:43 PM)jkl Wrote: Define “dependency-heavy”?
Another definition: when it takes more than 500 MB of memory to compile one binary:
Code: # github.com/42wim/matterbridge/vendor/
But this is normal: this is matterbridge, which makes the greatest effort to support as many protocols as possible, so *obviously* it pulls in a lot of dependencies. I really like out-of-memory failures on a single step, a hard limit due to the algorithm and not the load. It "reminds" me of an era I did not know, but got a sense of through https://en.wikipedia.org/wiki/Black_Mirr...ndersnatch, where resources were limited and you had to work around it (P.S.: heh, no I don't do low-level video graphics, and I am just as dumb as when I joined this forum for the first time). |
|||
|
|||
i use slackware, which has no out-of-the-box package management. several platforms exist - slackbuilds, most notably - but there is no dependency resolution amongst the different platforms. so, i absolutely think twice about building anything big unless i feel like sitting around for a few hours and taking care of the whole dependency tree.
|
|||
|
|||
(08-04-2020, 03:43 PM)jkl Wrote: Define “dependency-heavy”?
I think he meant something like "uses code from external sources (other than what is provided by the language's standard library or libraries)". As opposed to "batteries included".
If I understand correctly, updating a statically-built program to incorporate changes in any of its dependencies still requires a rebuild. Otherwise it will still use the old code. With dynamic linking the updates can be transparent (unless the major version changes, usually because the API does), and the using program doesn't always need to be recompiled. It does put the burden on the static program's maintainer to keep track of changes in the dependencies, but personally I prefer getting the opportunity to test their effects on my code rather than having them deployed "behind my back".
(Edit: Somehow I had missed page 3 of this thread when I wrote this reply. Oh well.)
(Edit2: actually the maintainer of a dynamically-linked program ought to also be tracking changes in the libraries it depends on and testing that they don't adversely impact his program. In my time as a FreeBSD port maintainer I learned that the ports management team only checks that an upgrade of a library doesn't break the build of the programs which depend on it; as far as I know they don't do even the most rudimentary "smoke" test of those programs, let alone more detailed functional or other tests. I expect the same is true for the various package management systems. So the difference between static and dynamic linking is really a wash as far as upgrades go, assuming conscientious maintainers in both cases. Which might be a BIG assumption.) |
|||
|
|||
I've recorded this episode as a podcast, you can find it in the parent post of the thread.
|
|||
|
|||
As stated on the IRC, I played a bit with Conan. As it turns out, it is a really nice addition to CMake (which is what it supports best): One could ship a source package that pulls, builds and uses all dependencies one could ever need, even on systems with no own package manager. Not bad, really.
I migrated the ymarks server from my own build script with a large directory of dependencies, dynamically adding cJSON and SQLite3 - and it works just fine. Awesome, really. The only third-party library still shipped is a header-only web server which isn’t in Conan. It does not have to be - one more file is acceptable to me. :) |
|||
|
|||
My thoughts on package management, and overall system complexity:
Millions of developers are writing code, then many other developers are writing more code which relies on that already written code. With each passing day the complexity of the system increases, while our ability to manage it, or even comprehend it in its entirety, decreases drastically. You can try to use some chroot magic, essentially making some kind of software API to your system while you yourself sit in a 'clean' and tidy environment, but what does it really solve? The junk is still on your hard drive, whether you run it directly or not. Another option is to do it your own way: only clean code, only the most sane projects. But then good luck with compatibility, or with even finishing such a monumental project on your own; it will just make you an elite autist while the rest of GNU/Linux carries on without you. The third option is to just accept your destiny: there are no ideals in this world. I personally think that sooner or later the whole software stack will 'explode' in our faces. People will die or lose money left and right because of terrible code, or some bug from 2026 in a 9-level deep library/framework system built on top of ChromiumOS. And it will either be perceived as an absolutely normal situation, or people will be outraged and ask the government to step in. I would love to see how the government will regulate and review 500 million LOC. At least programmers are not going to be unemployed for a very, very long time. |
|||