package management discussion - Programming On Unix

sth
Long time nixers
hey all

i know a few folks here have developed their own package managers in the past, but i was having a hard time locating any threads specifically about writing package managers / discussions of what features are important and how different people approach implementation of those features.

if you've got a package manager project, or are involved with or just interested in existing package managers, post them here!

i'll start. i'm working on a small "secondary" package manager tentatively called 's3pkg' that uses an AWS S3 bucket/prefix as a repository. https://github.com/guinanseyebrows/s3pkg
my goal is to have an easy way to transfer scripts/small packages that are not included in the stock Amazon Linux repos to the fleet of 150-ish machines i take care of, without the hassle of creating RPMs for each or running an RPM repository server. i know it's kind of overkill to do this but i've been interested in breaking down the package management process for myself for a while, so it's been a fun exercise. right now it just handles search, install and remove, and i'll be the first to admit that it's clunky and some of the functions are not particularly elegant.

right now it's written in POSIX shell (but relies on a few GNUisms, i'll admit) but i am in the process of learning python well enough to rewrite it using boto3, because aws cli commands are dog-slow compared to boto3 API requests.
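For illustration, the core fetch-and-unpack step of such a tool could look something like this. The bucket/key layout and function names here are hypothetical, not s3pkg's actual code:

```shell
# Hypothetical sketch of an S3-backed install step; the object-key layout
# (prefix/name@version.tar.gz) and function names are made up for illustration.
pkg_key() {
    # compose the object key from prefix, name and version
    printf '%s/%s@%s.tar.gz\n' "$1" "$2" "$3"
}

s3pkg_install() {
    # fetch the package object and unpack it under $root
    bucket=$1 prefix=$2 name=$3 version=$4 root=${5:-/}
    aws s3 cp "s3://$bucket/$(pkg_key "$prefix" "$name" "$version")" - |
        tar -xzf - -C "$root"
}
```

A boto3 rewrite would replace the `aws s3 cp` invocation with a single `get_object` request, avoiding the CLI's startup cost on every call.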
-------
nvsbl.org
z3bra
Grey Hair Nixers
Now that's one topic I like to discuss !!

I wrote my own pack manager too: pm.

All it does is unpack tarballs into a $ROOT directory, and write down all the unpacked files for easier removal. Removing a pack means deleting all the files created on installation (i.e. those listed in the database file), plus the database entry (just a directory in $DATA). Updating a package is as simple as removing the current version and installing the new one.
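A minimal sketch of that workflow in shell (not pm's actual code, just the idea):

```shell
# Sketch of the pm idea: unpack into $ROOT, record the file list under
# $DATA/<name>/files, and delete those files on removal. Not pm's real code.
ROOT=${ROOT:-/tmp/root} DATA=${DATA:-/tmp/data}

pm_add() { # pm_add path/to/name@version.tbz
    pack=$1 base=${1##*/}
    name=${base%@*}
    mkdir -p "$ROOT" "$DATA/$name"
    tar -C "$ROOT" -xjf "$pack"
    # record regular entries only; directories are left behind on removal
    tar -tjf "$pack" | grep -v '/$' > "$DATA/$name/files"
}

pm_del() { # pm_del name
    (cd "$ROOT" && xargs rm -f < "$DATA/$1/files")
    rm -r "${DATA:?}/$1"
}
```

An update is then just `pm_del name; pm_add name@newversion.tbz`.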

As I see it, a package manager should only do that: keep track of what is installed, and at which version. The rest may or may not be needed, and should thus be handled outside (integrity checks, signature verification, repo syncing, dependency listing, ...).

The key feature is that because it installs to an arbitrary $ROOT, it works alongside distro-integrated package managers. I use it to install everything that is not packaged on the distros I run (crux, debian, void, openbsd), ships the wrong version for my needs, or requires recompilation to be configured (eg. suckless tools). I wrote a quick shell script that runs "make; make DESTDIR=$tmp install; tar -C $tmp -cjf name@version.tbz ." to create a package ready for install, so no build recipe is needed. I create packages interactively and install them this way.
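Expanded into a reusable function, that one-liner might look like this (the name and error handling are guesses, not the actual script):

```shell
# Rough expansion of the quoted one-liner: build, stage into a temporary
# DESTDIR, and tar the staging tree as name@version.tbz. Illustrative only.
mkpack() { # mkpack name version -- run from the root of the source tree
    tmp=$(mktemp -d) || return 1
    make &&
        make DESTDIR="$tmp" install &&
        tar -C "$tmp" -cjf "$PWD/$1@$2.tbz" .
    status=$?
    rm -rf "$tmp"
    return $status
}
```

Because the tarball is created from the staged DESTDIR, whatever `make install` would have put on the real system ends up in the pack instead.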

I install everything to either /usr/local or ~/.local, depending on whether I want it global or not.

I worked on a few other tools to make it usable with external repos, but never really used them. There is repo, to sync, download and cache remote packs locally. It can even use gopher:// as a hosting protocol ! You would use it like this:

Code:
# printf 'repo "https://repo.your.tld"\n' > /etc/repo.conf
# repo -s # fetch remote pack list
# pm -a -v $(repo sick)
installed sick (v1.2)

Note that for usability reasons, "pm" can call "repo" internally, so that "pm -a -v sick" would have the same effect. "repo" itself can also call "sick" internally, to verify package integrity using crypto keys, but this is well beyond scope here ☺
I think these three tools, each having its own purpose and being useful on its own, integrate well together, and lack nothing compared to the big names (apt, dnf, pacman, ..), besides perhaps dependency resolution.

Dependency resolution could be handled by the repo utility itself, assuming the remote pack list would list them.

Thanks for reading this far, you're an awesome person !

tl;dr
I wrote pm, sick and repo.
It features a simple package format: plain tarballs, whose only metadata is in the filename: name@version.tbz.
Installing a package unpacks it to $ROOT, and stores the list of installed files in $DATA/name/files. Removing the installed files is as easy as "cd $ROOT; rm $(<$DATA/name/files)", which is what the pack manager does.
Updating means "remove pack + install new pack".
sth
Long time nixers
z3bra - dependency resolution is not something i've gotten into yet but i plan on building that into my package manifests (maybe a single space-separated list of dependencies that trigger additional runs of the install function... but i know that's going to be a rabbit hole).
if there were a pure C SDK for AWS i would maybe try to learn it and modify pm! alas there is not.
z3bra
Grey Hair Nixers
Dependency resolution is a complex task with a lot of research to back it up. Doing it simply, however, is the hard part.
The idea I had was to rely on the remote repository to do it. When syncing the repo locally, you just fetch a pack list with a format like so:

Code:
libc    1.0    gopher://repo.your.tld/9/libc@1.0.tbz
pm      1.3    gopher://repo.your.tld/9/pm@1.3.tbz      libc,repo
repo    0.1    gopher://repo.your.tld/9/repo@0.1.tbz    libc,sick
sick    1.1    gopher://repo.your.tld/9/sick@1.1.tbz    libc

Then when calling the "repo" command to download a pack, you could ask it to fetch all dependencies using a -d flag, and it would print all resolved packs to stdout rather than just your pack. It could also be a totally different tool ! After all, repo is responsible for fetching remote packs; a separate tool could build the dependency graph when given a name. This would be easy to do.
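As a sketch of that idea, assuming the four-column list format above (name, version, url, comma-separated deps), resolution is a small worklist loop. None of this is from the real repo tool:

```shell
# Sketch: resolve a pack name against the list format shown above.
# Prints the requested pack plus every transitive dependency, each once.
resolve() { # resolve listfile name
    awk -v start="$2" '
        # remember each pack'\''s deps (4th column, comma-separated, may be empty)
        { split($4, d, ","); for (i in d) deps[$1] = deps[$1] " " d[i] }
        END {
            queue = start
            while ((n = split(queue, q, " ")) > 0) {
                queue = ""
                for (i = 1; i <= n; i++) {
                    if (seen[q[i]]) continue
                    seen[q[i]] = 1
                    print q[i]
                    queue = queue " " deps[q[i]]
                }
            }
        }' "$1"
}
```

The `seen` array doubles as cycle protection: a pack that depends (transitively) on itself is simply not queued twice.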

The hardest part is documenting these dependencies... But here we reach the package building part of the problem, which is also interesting, but way more complex !
sth
Long time nixers
right now this is what my manifest files look like:
---------------------------------------------------------------------------------
name: mypackage
version: 0.1
description: A short description of the package
{dir} /path/to/destination/folder 0755
{dir} /path/to/configdir 0600
file1 /path/to/destination/folder/file1 0755
file2 /path/to/other/file/folder/file2 0644
file3 /path/to/config/file3 0600
---------------------------------------------------------------------------------
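To make the semantics concrete, here is one way a manifest like the above could be applied. This is a guess at the behaviour for illustration, not s3pkg's actual install code:

```shell
# Illustrative interpreter for the manifest format above (not s3pkg's code):
# "{dir}" lines create a directory with the given mode; other lines copy a
# file from the package payload to its destination and set the mode.
apply_manifest() { # apply_manifest manifest payload_dir root
    while read -r a b c; do
        case $a in
            name:|version:|description:|'') ;;               # metadata / blank lines
            '{dir}') mkdir -p "$3$b" && chmod "$c" "$3$b" ;; # create directory
            *)       cp "$2/$a" "$3$b" && chmod "$c" "$3$b" ;; # install file
        esac
    done < "$1"
}
```

Prefixing every destination with a `$root` argument keeps the same code usable for both real installs (`root=`) and staged test installs.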

so what I was thinking is adding a line like
depends: extpkg otherpkg differentpkg

then, during the install process, before it installs the requested package, read that line and run the install function for each of those (which would recursively take care of any dependencies of those dependencies). but i have a feeling that might be oversimplifying things or ignoring potential pitfalls.

EDIT: now that i look at that manifest I realize i need to update my docs/manifest_example because this is totally not the current working format :P but it gets the point across.
sth
Long time nixers
(26-05-2020, 07:41 PM)sth Wrote: right now this is what my manifest files look like:
---------------------------------------------------------------------------------
name: mypackage
version: 0.1
description: A short description of the package
{dir} /path/to/destination/folder 0755
{dir} /path/to/configdir 0600
file1 /path/to/destination/folder/file1 0755
file2 /path/to/other/file/folder/file2 0644
file3 /path/to/config/file3 0600
---------------------------------------------------------------------------------

so what I was thinking is adding a line like
depends: extpkg otherpkg differentpkg

then, during the install process, before it installs the requested package, read that line and run the install function for each of those (which would recursively take care of any dependencies of those dependencies). but i have a feeling that might be oversimplifying things or ignoring potential pitfalls.

EDIT: now that i look at that manifest I realize i need to update my docs/manifest_example because this is totally not the current working format :P but it gets the point across.
what i think i may do with my python rewrite is a recursive dependency check: build a list of unique dependencies, as far down as the chain goes, that are not already present on the local system, then pass that list to the install function. that way no time is wasted checking for duplicates or dependencies that are already satisfied.
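A rough shape of that "collect first, install later" pass, sticking to shell for illustration (the manifest layout and the `installed` check are placeholders, not the real project):

```shell
# Sketch: walk "depends:" lines breadth-first and print each
# not-yet-installed package exactly once. Helper names are made up.
MANIFESTS=${MANIFESTS:-/tmp/manifests}          # one manifest file per package
installed() { [ -d "/tmp/installed-db/$1" ]; }  # placeholder for the real check

missing_deps() { # missing_deps pkg
    queue=$1 seen=""
    while [ -n "$queue" ]; do
        set -- $queue          # current level of the dependency tree
        queue=""
        for p; do
            case " $seen " in *" $p "*) continue ;; esac   # dedupe (and cycles)
            seen="$seen $p"
            installed "$p" || printf '%s\n' "$p"
            queue="$queue $(sed -n 's/^depends: //p' "$MANIFESTS/$p" 2>/dev/null)"
        done
    done
}
```

Feeding the output of `missing_deps` to the install function keeps installation itself dumb, which is exactly the decoupling described above.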
venam
Administrators
(26-05-2020, 05:55 PM)sth Wrote: i know a few folks here have developed their own package managers in the past, but i was having a hard time locating any threads specifically about writing package managers / discussions of what features are important and how different people approach implementation of those features.

There are those threads for inspiration: "The role of distributions &/or Unix flavors, where does pkg management stands", "The big research I did on distros", "pkg management, what do you expect, what you wished you had", and "GoboLinux and Package Management" — but none specifically about writing package managers and the details involved. Please be sure to take a look at those threads too.
z3bra
Grey Hair Nixers
@sth, you bring an interesting point with your manifest file: metadata.

Which ones to have ?
How to store them ?
What to do with them ?

I think the bare minimum is the name, version and list of installed files. Take out any of those and the package manager cannot do its duty. I chose to only include these for this reason. I'm however interested in hearing what you include in the manifest, and your reason for it ☺
sth
Long time nixers
metadata
(27-05-2020, 03:49 PM)z3bra Wrote: @sth, you bring an interesting point with your manifest file: metadata.

Which ones to have ?
How to store them ?
What to do with them ?

I think the bare minimum is the name, version and list of installed files. Take out any of those and the package manager cannot do its duty. I chose to only include these for this reason. I'm however interested in hearing what you include in the manifest, and your reason for it ☺

other than the future issue of dependencies, the only important thing i include that you didn't mention is file permissions, and that's due to the way i've decided to approach setting up directories.
that being said, i've considered changing it so that all files installed go into a specific directory (like /usr/local/s3pkg/<packagename>/) and then just creating symlinks to the binaries in /usr/local/bin or something. it would probably be a good idea to do that for a 'secondary' package manager like this, to avoid potential conflicts with the system's package manager.
the alternative to that, which is what i have done, is to allow the package maintainer to choose where the files should be installed, but also to keep track of any directories created explicitly for the purposes of installing the package, so that they can be cleanly removed later without affecting parent directories. i believe that info is documented in the MANIFESTS.md file in my repo but maybe it's not too clear.

either way it comes down to an architectural decision, and there are benefits and drawbacks to both approaches :)
z3bra
Grey Hair Nixers
(28-05-2020, 06:05 PM)sth Wrote: i've considered changing it so that all files installed go into a specific directory (like /usr/local/s3pkg/<packagename>/) and then just creating symlinks to the binaries in /usr/local/bin or something.

An even cooler idea would be to use a union FS mount point. Some internet friends made a distro a while back, and they had this idea (certainly ripped off from plan9) of installing every package in a particular place, say /packages/NAME/{bin,man,lib}, and then union mounting it directly on /.

The cool thing about union mounts is that they merge all the mount points together, and you can even specify which one will be written to in case someone writes to /bin (for example /opt/custom/bin).

It requires a lot more logic in the package manager though, as it would then need to handle mountpoints, versions and so on.

I could be wrong, but I think the Nix package manager does something similar.
jkl
Long time nixers
The only relevant problem here might be that it affects portability in a negative way. I agree that the historical file system hierarchy in Unix and wannabe-Unix was broken from the moment when binaries were spilling into /usr. Only one of the things where Plan 9 excels: /usr is for users, period.

edit: I shouldn’t type before my third cup of coffee.
z3bra
Grey Hair Nixers
Agreed that plan9 managed to keep things clear. As I said, this is because of the union mounts. The Linux kernel supports these too (see overlayfs), but they're barely used. This could definitely solve the issue of our 1km-long $PATH, by merging all binary locations into /bin. There is one issue with it though: permissions. plan9 has a namespace for every user, which means they can mount directories however they want and it will never affect other users. Under Linux distributions, however, all users are part of the same namespace, so mounting stuff over / requires administrative permissions.
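To make the overlayfs idea concrete, here is a small helper that builds the lowerdir= option from a /packages-style tree. The paths are invented, and the mount itself needs privileges (or a user namespace), so it is only shown as a comment:

```shell
# Build an overlayfs lowerdir= option from per-package directories, so they
# can all be union-mounted over a single directory. Paths are illustrative.
overlay_opts() { # overlay_opts pkgroot -> prints "lowerdir=a:b:..."
    root=$1
    set --
    for d in "$root"/*/; do
        set -- "$@" "${d%/}"    # collect each package directory, sans trailing /
    done
    printf 'lowerdir=%s\n' "$(IFS=:; printf '%s' "$*")"  # join with ":"
}
# Usage, as root (writes would additionally need upperdir= and workdir=):
#   mount -t overlay overlay -o "$(overlay_opts /packages)" /bin
```

The order of lowerdir entries matters to overlayfs: earlier directories shadow later ones when file names collide.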

The Linux kernel has decent namespace support though, including mount namespaces. It would be interesting to see what would happen if a new namespace were created for the user in their ~/.profile... ?

EDIT: Got my answer: http://www.halfdog.net/Security/2015/Use...WriteExec/
So it works, but can lead to a privilege escalation bug (though the bug is 5 years old, so I hope it's fixed by now :)).
Long story short, users in an unprivileged namespace receive the CAP_SYS_ADMIN capability, and can thus mount whatever they need inside that namespace. The fact that it is unprivileged means that no special permission is needed to create a new namespace. I'll try this out !
ckester
Members
(29-05-2020, 03:58 AM)jkl Wrote: edit: I shouldn’t type before my third cup of coffee.

I had to laugh, because on my system the timestamp of your message is shown as 11:58 PM.

Pulling all-nighters again, jkl?
vain
Long time nixers
(26-05-2020, 07:35 PM)z3bra Wrote: Dependency resolution is a complex task with a lot of research to back it up. Doing it simply, however, is the hard part.

I never tried to write a package manager (because if the distribution doesn't provide a good one, then I tend to not use that distro in the first place), so I might be a bit naïve here: Could you elaborate on what makes dependency resolution complex? Isn't building a dependency tree and then doing topological sort enough? (Possibly by simply resolving dependencies recursively?)

Or did I misunderstand and your point was that it can be hard to do it in a dead simple way? If so, why?
z3bra
Grey Hair Nixers
(31-05-2020, 02:47 AM)vain Wrote: Isn't building a dependency tree and then doing topological sort enough? (Possibly by simply resolving dependencies recursively?)

Well, that's already a bit complex. But the most complex part isn't to "find" the dependencies. It is to keep them up to date and accurate when you maintain your own package infrastructure, because you need a clean way to ensure that you actually listed every required dependency. One possible solution is to build packs in a chroot where only those deps are installed, for example.
And then you'll also face the "build VS. runtime" dependency problem…

Really, what's complex about dependency resolution is getting the dependencies right in the first place.

Oh, and there is the cyclic dependency handling problem too. How do you deal with this:

gcc: require libc, libstdc++, make, gcc
libc: require gcc

It is mostly an issue at build time (though you can find cyclic deps at runtime too…), but you'll encounter this fairly often, especially for the components of your toolchain.
sth
Long time nixers
(31-05-2020, 04:14 AM)z3bra Wrote: gcc: require libc, libstdc++, make, gcc
libc: require gcc

this is exactly why i plan on decoupling dependency resolution from package installation - check the manifests and recursively generate a list of all unique dependencies that are not already installed, then install each package in that list.
it won't avoid every issue but i'm hoping that approach will deal with most common problems.
jkl
Long time nixers
(29-05-2020, 05:24 PM)ckester Wrote: I had to laugh, because on my system the timestamp of your message is shown as 11:58 PM.

Pulling all-nighters again, jkl?

It’s CEST here. :-)
vain
Long time nixers
(31-05-2020, 04:14 AM)z3bra Wrote: But the most complex part isn't to "find" the dependencies. It is to keep them up to date and accurate when you maintain your own package infrastructure.

Ahh, understood. And agreed. One of the many reasons why I don’t want to do this myself. :) I’m very grateful that other people keep track of all that and make their work available to the world for free.
eadwardus
Members
I wrote a package manager: https://github.com/eltanin-os/venus
Complemented by a ports-like {source-based package manager/build system}: https://github.com/eltanin-os/ports

It fetches, does an integrity check, and unpacks the files into the $root directory. It's actually a little more complex than one would expect a "simple package manager" to be: it has its own archiver and its own manifest/config file format.
The "database" is a directory $root/var/pkg with the data files
$root/var/pkg/remote: packages that can be fetched
$root/var/pkg/*: arbitrary sources (such as a cd, another hd, etc.)
$root/var/pkg/local: installed packages
A "database entry" looks like this:
Code:
name:foo
version:1.0
license:MIT
description:dummy package
rdeps{
        # runtime dependencies
        coreutils#*
}
mdeps{
        # construction dependencies
        libc#1.0
}
files{
        # path fletcher32-hash
        bin/file01 86166cc0
}
It's painful to deal with cases where you want to statically link a package but some of its dependencies can only be linked dynamically, so you need to distinguish construction/static deps from runtime/dynamic ones.

It has a configuration file under /etc/venus.conf or $HOME/.config/venus.conf
See an example below using /usr/local as $root:
Code:
arch:x86_64
fetch:curl -LO
root:/usr/local
safeurl:https://myrepo.com
uncompress:lzip -dc
url:https://mymirror.com
There are two repository entries so that you don't need to trust a mirror: the integrity values (crypto hashes) and the entries themselves are downloaded from the official repository.

It has no automatic dependency resolution at the moment, although i am considering adding a simple recursive one (similar to the one used by the ports).

About the discussion on separating packages under /whatever/package_name, like nix/guix/janus/gobo do: it seems a good way to organize things at first, but then you hit the problem that the entire OS expects a different organization; most of the advantages of separating packages are lost when you try to solve this (union mounts maybe being an exception, but then you inherit their own problems).