DDE, Debian Data Export

After testing the idea and the prototype with my presentation at FOSDEM, I'm proud to announce DDE, Debian Data Export.

The problem

In Debian, we publish all sorts of information, in all sorts of data formats (often ad hoc and obscure), in hard-to-find places. Imagine, for example, an application that wants to access all of this information together:

  • Maintainer <-> Source package mapping
  • Popcon rankings
  • Changelogs
  • .desktop files of packages not installed
  • What is in the new queue
  • Package screenshots
  • Some statistics
  • Localisation information
  • uscan status
  • Buildd logs
  • sloccount run results
  • Debian Weather
  • Debian Pure Blend specific information

A nightmare, huh?

The solution

DDE is a way to make it simple to publish and download data. The aim is to be able to access all sorts of Debian information without worrying about data formats, protocols and access control, and to make it easy to discover what data is available.

DDE exports data as a big virtual tree. You can pick a node in the tree by its URL and download all the data that it contains, in a format of your choice: currently it supports JSON/JSONP, YAML, CSV and Python pickled objects.

This means that you can now get Debian data using a trivial HTTP client tool or library, and read it using commonly available decoders: both should be available in almost any programming language nowadays. Since JSON and JSONP are supported, this even includes JavaScript in Ajaxy web pages.
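As a rough illustration of how little client code this takes, here is a minimal Python sketch using only the standard library. The base URL `http://dde.example.org/q`, the `t` query parameter used to pick the format, and the node path in the usage note are all assumptions made for the example, not the documented DDE interface; check the real server for the actual names.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Hypothetical base URL: replace it with the address of a real DDE server.
DDE_BASE = "http://dde.example.org/q"


def node_url(path, fmt="json", base=DDE_BASE):
    """Build the URL of a tree node.

    The 't' parameter name for choosing the format is an assumption.
    """
    clean = "/".join(quote(part) for part in path.strip("/").split("/"))
    return f"{base}/{clean}?{urlencode({'t': fmt})}"


def decode_node(payload):
    """Decode the JSON bytes or text returned for a node."""
    if isinstance(payload, bytes):
        payload = payload.decode("utf-8")
    return json.loads(payload)


def fetch_node(path, base=DDE_BASE):
    """Download one node as JSON and return it as plain Python data."""
    with urlopen(node_url(path, "json", base)) as response:
        return decode_node(response.read())
```

With a real server behind `DDE_BASE`, something like `fetch_node("popcon/rankings")` (a made-up node path) would return ordinary dictionaries and lists, ready to use without any Debian-specific parser.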

DDE is not a competitor to UDD: UDD is about creating a central location where all the data can be accessed, while DDE is about giving people a simple way to access data or subsets of data.

In a way, DDE and UDD complement each other: the more data enters UDD, the more data is available for DDE. In turn, DDE gives a simple interface to the most popular and useful UDD queries.

The dream

Here are some hints at what can be done with this:

  • Autocompletion in HTML fields
  • Export data to feed external sites like <debtags.debian.net> or <screenshots.debian.net>
  • Have a way for package managers to easily access all sorts of data
  • Have a way to implement fancy tools that can query massive data sets without needing to download them locally

A call for action

You can add data to the DDE tree by just putting a data file in YAML, JSON or pickle format under ~/.dde (I've written a specific guide to this on the Debian wiki).
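To give an idea of what generating such a file could look like, here is a small Python sketch that dumps a data structure as JSON into a directory that defaults to ~/.dde. The function name `publish_static`, the `.json` file naming and the sample data in the usage note are assumptions made for illustration; the guide on the wiki is the reference for the layout DDE actually expects.

```python
import json
from pathlib import Path


def publish_static(data, name, dde_dir=None):
    """Write data as JSON to <dde_dir>/<name>.json and return the path.

    dde_dir defaults to ~/.dde; the file naming is an assumption.
    """
    target_dir = Path(dde_dir) if dde_dir is not None else Path.home() / ".dde"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{name}.json"
    # Write to a temporary file first, then rename it into place, so that
    # a reader never sees a half-written data file.
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(json.dumps(data, indent=2, sort_keys=True), encoding="utf-8")
    tmp.replace(target)
    return target
```

A tool that already computes some statistics could end its run with a call such as `publish_static({"packages": 3}, "mystats")`, where both the data and the name are invented for the example.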

If you wish to create new and fancy Debian statistics or compute other sorts of data, or if you already maintain tools that generate Debian data, such as:

  • Popcon rankings
  • .desktop files of packages not installed
  • Content of the new queue
  • Screenshots
  • All sorts of statistics
  • Debian Weather

then please have a look at http://wiki.debian.org/DDE/StaticData and try to publish your data in ~/.dde on merkel.debian.org.

For more complicated cases (like accessing a remote database), it is possible to extend DDE via Python plugins. Get in touch with me if you need to go that way.