Debtags for derivative distributions

Sometimes I do cool stuff and I forget to announce it.

Ok, so I recently announced a new Debtags website.

I forgot to say in the announcement that the new website does not only know of Debian packages: see for example this page, at the very bottom it says: "Distributions: oneiric, precise, sid, testing".

This means that already, here and now, debtags.debian.net can be used to tag packages from both Debian and Ubuntu, and can easily be extended to cover the entire Debian ecosystem.

If you are a package maintainer, you will notice that your maintainer page shows your packages from everywhere. If you want to filter things a bit, for example hide obsolete packages from an old Debian Stable or Ubuntu LTS, just click on the "Settings" link on the top right to configure the page.

How it works

The magic is in this mergepackages script, which is run daily, and exports merged Packages files at dde.debian.net. The debtags.debian.net concept of Packages and Sources files are just those all-merged.gz and all-merged-sources.gz.

The merging is simple: that rebuild script processes files in order, and the first version of a package that is found is chosen as the base for the one that will go in the merged Packages file. Some fields like "Description" are just taken from this pivot package, others like Architecture or dependencies are merged into it. It's arbitrary, but works for me: the result has all the packages with all their possible architectures and dependencies, and is ready to be indexed with apt-xapian-index.

At the moment I pull data from Debian and Ubuntu, but you can see that the script can easily be extended to pull data from any Debian-style ftp archive, so any Debian derivative can go in. I've already started negotiations with the Derivatives Census on how to add any Debian derivative and keep the list up to date.

How to export tags for your own distribution

I'll use Ubuntu as an example since the data is already available.

The way you add Debtags to the Ubuntu packages file is just this one:

  1. Get the full reviewed tag database
  2. Optionally filter out those packages that you are not interested in
  3. Tweak this script to build an overrides file.
  4. Give the overrides file to your favourite ftp archive building tool.

The make-overrides is a bit rusty: if you improve it, please send me your changes.

That is it, nothing else required, no excuses, it's ready, here, now!

Hitches and gotchas

This merged Packages file is a bit of a hack, and suffers from name conflicts across distributions, where two different softwares are packaged in two different distributions with the same name.

Ideally, name conflicts should not happen: if a derivative decided to package kate and call it gedit, they deserve to have it tagged uitoolkit::gtk. I think it's rather important that the whole Debian ecosystem works as much as possible with a single package namespace.

However, that reasoning fails if you take time into account: packages get renamed, like git and chromium, and may mean completely different things, for example, if you compare Debian Stable with Debian Sid.

This last is a problem caused by debtags only working with package names but not package versions. I have a strategy in mind based on being able to override the stable tag database using headers in debian/control; it still needs some details sorted out, but I'm confident we will be able to address these issues properly soon enough.

Why stop at the Debian ecosystem?

Why indeed. I'm clearly trying to use FOSDEM, and the CrossDistribution devroom as the venue to discuss just that.