Pages about Debtags.
fuss-launcher: an application launcher built on apt-xapian-index
Long ago I blogged about using apt-xapian-index to write an application launcher.
Now I just added a couple of new apt-xapian-index plugins that look like they have been made just for that.
In fact, they have indeed been made just for that.
After my blog post in 2008, people from Truelite and the FUSS project took up the challenge and wrote a launcher applet around my example engine.
The prototype has been quite successful in FUSS, and as a consequence I've been asked (and paid) to bring in some improvements.
The result, which I have just uploaded to NEW, is a package called
fuss-launcher:
* New upstream release
- Use newer apt-xapian-index: removed need of local index
- Dragging a file in the launcher shows the applications that can open it
- Remembers the applications launched more frequently
- Allow to set a list of favourite applications
To get it:
- apt-get install fuss-launcher (after it has passed NEW); or
- git clone http://git.fuss.bz.it/git/launcher.git/ and apt-get install python-gtk2 python-xapian python-xdg apt-xapian-index app-install-data
It requires apt-xapian-index >= 0.35.
To try it:
- Make sure your index is up to date, especially if you just installed app-install-data: just run update-apt-xapian-index as root.
- Run fuss-launcher.
- Click on the new tray icon to open the launcher dialog.
- Type some keywords and see the list of matching applications come to life as you type.
It's worth mentioning again that all this work was sponsored by Truelite and the Fuss project, which rocks.
Some screenshots:
When you open the launcher, by default it shows the most frequently started applications and the favourite applications:
When you type some keywords, you get results as you type, and context-sensitive completion:
When you drag a file on the launcher you only see the applications that can open that file:
New apt-xapian-index plugins
Besides a fair bit of refactoring and cleanup, I've recently added two new plugins to apt-xapian-index:
app-install
If app-install-data is installed, information about .desktop files will now enter the index.
This makes it possible, for example, to limit query results to only those packages that contain .desktop files, which is quite useful for building desktop-oriented package managers.
aliases
It reads term->aliases mappings from files in /etc/apt-xapian-index/aliases/
or /usr/share/apt-xapian-index/aliases/, and feeds them as
synonyms into the index.
apt-xapian-index ships an example alias file, to give people who know the wrong software names a chance to find the right ones:
# Aliases expanding names of popular applications
excel XToffice::spreadsheet
powerpoint XToffice::presentation
photoshop XTworks-with::image:raster
coreldraw XTworks-with::image:vector
autocad XTworks-with::3dmodel
Notice how it is possible to use index terms that happen to be Debtags tags as synonyms, which yields better results, language independence and extra coolness.
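To make the file format concrete, here is a minimal sketch of a parser for it. It assumes that the first word of each line is the term, that every following word is one of its aliases, and that lines starting with # are comments; the function name parse_aliases is made up for the example, and this is not the plugin's actual code.

```python
def parse_aliases(lines):
    """Map each term to the list of aliases that follow it on its line."""
    aliases = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and comments
        if not line or line.startswith("#"):
            continue
        # Assumption: first word is the term, the rest are its aliases
        term, *expansions = line.split()
        aliases.setdefault(term, []).extend(expansions)
    return aliases


example = """\
# Aliases expanding names of popular applications
excel XToffice::spreadsheet
photoshop XTworks-with::image:raster
"""
aliases = parse_aliases(example.splitlines())
print(aliases["excel"])
```

With a mapping like this, a search for "excel" can be expanded with the Debtags term for spreadsheets before it reaches the index.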
apt-xapian-index now comes with a query tool
I've just uploaded a new version of apt-xapian-index to unstable. Now it comes with a little query tool called axi-cache.
You can search this way:
axi-cache search foo bar baz facet::tag sec:section
In fact, you can use most of the things described here.
You can then say axi-cache more to get more results, or axi-cache again to
retry a search, or axi-cache again wibble wabble to add keywords to the
last search.
This allows you to start with a search and tweak it. For this to work, it needs
to save the last search so that again or more can amend it. Searches are saved in
~/.cache/axi-cache.state.
You can search tags instead of packages by adding --tags.
It will suggest extra terms for the search, and also suggest extra tags.
It can even correct spelling mistakes in the query terms once the index has
been rebuilt with this new version of update-apt-xapian-index.
I need to thank Carl Worth who, with notmuch, reminded me that if I just build a nice interface on top of Xapian's query parser I go quite a long way towards making a Xapian database extremely useful indeed.
axi-cache also integrates with bash-completion so that tab completion is
context-sensitive to the command line being typed:
$ axi-cache search image pro
probability process processors programmability provides
problem processing production pronounced proving
$ axi-cache search kernel pro
problems processor production proved provided
processing processors programming provide provides
Thanks to David Paleino who wrote the bash completion script.
Just for reference, this is the command line help:
$ axi-cache help
Usage: axi-cache [options] command [args]
Query the Apt Xapian index.
Commands:
axi-cache help show a summary of commands
axi-cache search [terms] start a new search
axi-cache again [query] repeat the last search, possibly adding query terms
axi-cache more [count] show more terms from the last search
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-s SORT, --sort=SORT sort by the given value, as listed in /var/lib/apt-
xapian-index/values
--tags show matching tags, rather than packages
--tabcomplete=TYPE suggest words for tab completion of the current
command line (type is 'plain' or 'partial')
If you install the package for the first time, you may need to rebuild the
index by running update-apt-xapian-index as root before using axi-cache.
Introducing apt-xapian-index
apt-xapian-index has just been approved into experimental, and in the next few days I'm going to blog more about it.
The package contains a tool called update-apt-xapian-index that indexes
Debian package metadata into a Xapian index located
at /var/lib/apt-xapian-index/index.
The index is read-only, except for update-apt-xapian-index;
however, it is world-readable: every user can query it, all the time, at the
same time, even during index updates.
The index can contain more than package descriptions.
update-apt-xapian-index indexes data using plugins located in
/usr/share/apt-xapian-index/plugins, and any package can add their own. For
example, debtags will provide a plugin to
index tags.
Since Xapian can index numeric values as well, if anyone makes a popcon package that downloads popcon information, they can provide a plugin to index popcon values. If anyone makes an iterating package that downloads ratings, they can provide a plugin to index ratings.
Another plugin could be a specialised Debian stemmer that generates tokens such as foo out of lib*foo*-dev.
I think you get the idea: it's very extensible. You can have a look at the initial set of plugins in subversion.
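As an illustration of the stemmer idea, here is a minimal sketch of the name-mangling part alone. The rule (strip a leading lib, a trailing -dev and any version digits in between) and the function name extra_tokens are my assumptions for the example; hooking it into a real plugin is left out.

```python
import re

# Hypothetical rule: lib<name><version>-dev -> <name>
LIB_DEV = re.compile(r"^lib(?P<name>.+?)[\d.\-]*-dev$")


def extra_tokens(pkgname):
    """Return extra index tokens derived from a package name."""
    match = LIB_DEV.match(pkgname)
    if match is None:
        return []
    return [match.group("name")]


for name in ("libfoo-dev", "libxapian15-dev", "python-xapian"):
    print(name, extra_tokens(name))
```

A search for "xapian" would then also match libxapian15-dev, even though that word never appears on its own in the package name.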
The index is also self-documenting, so that one can keep track of all the
interesting things that can be found in it. update-apt-xapian-index does not
only maintain the index, but also the file /var/lib/apt-xapian-index/README
that aggregates documentation provided by the plugins.
To query the index, you just use Xapian. Debian contains Xapian bindings for various languages:
- libxapian-dev for C++
- libsearch-xapian-perl for Perl
- libxapian-ruby1.8 for Ruby
- python-xapian for Python
- tclxapian for Tcl
- php5-xapian for PHP
In the next few days I am going to post various example queries and interesting tricks that the index allows you to do.
It's going to be fun.
Next in the series: performing a simple query.
Debtags interesting times
An unbelievable amount of interesting, fun and bleeding-edge things to do are coming out with Debtags.
Erich made a JavaScript tag cloud.
Interactive tag clouds are probably the coolest thing to provide as a link for packages.debian.org when people click on a tag. They can also replace the current navigator we have at http://debtags.alioth.debian.org/cgi-bin/index.cgi
I started using Xapian, which is cool. It does fast text search with all the cool things like stemming and whatnot. Check out a prototype package search interface using it.
Xapian can also find packages similar to a given one by looking at the description, which is cool. I can already use it to suggest tags, until we manage to put some of the bayesian code we are accumulating into production.
Plus, I had an IRC chat with a Xapian developer: he was really nice. We had one of those neat chats in which really cool ideas keep coming out, and they would not even be so hard to implement, and wow, that'd be so cool, let's do it!
Then there's the new web-based tag editing interface to write, to replace the old one. I can add the smart search idea to it, so that it displays tags matching the given keywords: that would give suggestions that automatically follow existing tag practices, which rocks.
The C++ daemon backend that we now have for the website is great, as it allows us to quickly intermix tag updates and queries and still hope to scale. Maybe it means that we can soon finally start handling debtags mail submissions on Alioth as well.
Distro-wise I have a plan to finally get rid of apt-index-watcher. The plan is already working on my computer, and just needs some more tests before hitting unstable.
I'd also like to have the Python Debtags interface packaged somewhere, so that people who don't grok C++ can have fun with the Debtags data. Work is underway.
Then there's the debtags-updatecontrol experiment, which didn't go quite as I expected, but still went well. I've already sent my first tag override update to aj incorporating data from control files.
Other ideas? For example, implementing Xapian indexing in libept, and finally have a package manager with an amazingly fast search-as-you-type.
And that, as a side effect, would create a Xapian package description in /var/something that is ready for any installed software to use.
Then there's popcon. That's another piece of data to make available to package managers, and I already know how to do it.
With Phil Hands at Debconf6 we had also started designing a way to use the unaggregated popcon database to implement an Amazon-style functionality that would go like "users who have a system similar to yours also have packages X and Y installed: would you like to have a look at them?"
Joey Hess suggested filtering popcon analysis data with debtags, so that one can get suggestions, for example, only limited to game packages. Oh so cool!
And this all feels like just a beginning...
apt-xapian-index: search as you type
I've recently posted:
- an introduction of apt-xapian-index;
- an example of how to query it;
- a way to add simple result filters to the query;
- a way to suggest keywords and tags to use to improve the query.
- a way to search for similar packages.
- a way to implement an adaptive cutoff on result quality.
- a smart way of querying tags
Note that I've rewritten all the old posts to only show the main code snippets: if you were put off by the large lumps of code, you may want to give it another go.
Today I'll show how to implement a very attractive feature for a user interface: search as you type. The idea is that you don't need to press enter to fire up a query: instead, the results materialise in front of your eyes as you type.
The example I created uses curses, but the idea is good on any interactive user interface.
The main thing to keep in mind with search as you type is that the last word is likely to be partially typed, unless maybe some timeout expired since the user's last keystroke.
Xapian comes to help here, as it allows us to expand the partially typed word into an OR query with all the terms that start with it. This means that if we are typing, for example, "progr", we can turn the query into "program OR programmer OR programming OR programmed [...and so on...]".
I won't show the UI code, except a simple input loop that triggers the query at every keystroke:
def mainloop(self):
    while True:
        c = self.win.getch()
        self.line += chr(c)
        self.results.update(self.line)
The interesting part is in the update function.
First we split the line in words and convert the words into a query:
# Split the line in words
args = self.splitline.split(line)
# Convert the words into terms for the query
terms = termsForSimpleQuery(args)
Then we expand the last word with all possible completions:
# Since the last word can be partially typed, we add all words that
# begin with the last one.
terms.extend([x.term for x in db.allterms(args[-1])])
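If you want to play with the prefix expansion without an index at hand, here is a minimal stand-in sketch that uses a plain sorted list of words instead of a Xapian database. The function expand_prefix and the tiny vocabulary are made up for the example; it only illustrates the idea and says nothing about how Xapian implements it.

```python
from bisect import bisect_left


def expand_prefix(sorted_terms, prefix):
    """Return every term in sorted_terms that starts with prefix."""
    # In a sorted list, all terms sharing a prefix are contiguous and
    # start at the insertion point of the prefix itself
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


vocabulary = sorted(["image", "process", "program", "programmer",
                     "programming", "provide"])
expanded = expand_prefix(vocabulary, "progr")
print(" OR ".join(expanded))
```

The expanded terms are then ORed together with the fully typed words, exactly as the partially typed "progr" example describes.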
Now we can build the query. Of course you can add all other sorts of things to the query, for example a boolean expression of tag filter like in axi-query-pkgtype.py; Xapian will cope.
# Build the query
query = xapian.Query(xapian.Query.OP_OR, terms)
Then we run the query. For bonus points you can do the adaptive cutoff trick to discard bad results.
In my case, since I don't implement scrolling of results, I also limit them to what fits in the window:
# Retrieve as many results as we can show
mset = enquire.get_mset(0, self.size - 1)
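As a reminder of what an adaptive cutoff can look like, here is a minimal sketch on plain (name, score) pairs rather than on a Xapian MSet. The rule used here, keeping only results that score at least 70% of the best one, is one simple choice assumed for the example; the function name and the scores are invented.

```python
def adaptive_cutoff(results, fraction=0.7):
    """Keep (name, score) pairs scoring at least fraction of the best score."""
    if not results:
        return []
    best = max(score for _, score in results)
    return [(name, score) for name, score in results
            if score >= best * fraction]


# Made-up scores, for illustration only
results = [("gimp", 20.0), ("krita", 16.0), ("xpaint", 9.0), ("linux-image", 3.0)]
kept = adaptive_cutoff(results)
print(kept)
```

The threshold follows the quality of the best match, so a vague query does not get an empty result list and a precise one does not drag in poor matches.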
Finally, draw the results on screen:
# Redraw the window
self.win.clear()
# Header
self.win.addstr(0, 0, "%i results found." % mset.get_matches_estimated(), curses.A_BOLD)
# Results
for y, m in enumerate(mset):
    # /var/lib/apt-xapian-index/README tells us that the Xapian document data
    # is the package name.
    name = m[xapian.MSET_DOCUMENT].get_data()
    # Get the package record out of the Apt cache, so we can retrieve the short
    # description
    pkg = cache[name]
    # Print the match, together with the short description
    self.win.addstr(y+1, 0, "%i%% %s - %s" % (m[xapian.MSET_PERCENT], name, pkg.summary))
self.win.refresh()
That's it, try it out.
You can use the wsvn interface to get to the full source code and the module it uses.
You can see a similar technique working in goplay, where it is also integrated with an interactive tag filter.
Next in the series: dynamic tag cloud.
apt-xapian-index: dynamically generated tag clouds
About apt-xapian-index, I have already posted:
- an introduction of apt-xapian-index;
- an example of how to query it;
- a way to add simple result filters to the query;
- a way to suggest keywords and tags to use to improve the query.
- a way to search for similar packages.
- a way to implement an adaptive cutoff on result quality.
- a smart way of querying tags
- how to implement search as you type
Today I'll show how to create tag clouds. Not only that, but I'll show how to implement tag clouds that change as the user types a query.
This example uses python-gtk, and has been created together with Matteo Zandi.
Generating a tag cloud out of any Xapian query is simple: it is just a matter of presenting as a tag cloud the information that you get with the technique shown in a smart way of querying tags. You get the tags related to the query, and you lay out their names with a font size proportional to their Xapian rank.
For the presentation, we can load pretty names from the Debtags vocabulary in
/var/lib/debtags/vocabulary:
from debian_bundle import deb822

# Facet name -> Short description
facets = dict()
# Tag name -> Short description
tags = dict()
for p in deb822.Deb822.iter_paragraphs(open("/var/lib/debtags/vocabulary", "r")):
    if "Description" not in p: continue
    desc = p["Description"].split("\n", 1)[0]
    if "Tag" in p:
        tags[p["Tag"]] = desc
    elif "Facet" in p:
        facets[p["Facet"]] = desc
The query then goes on as usual, and when we get the tags from the eset we
also record their score and normalise it between 0 and 1. I found that
computing the logarithm of scores helps to avoid having a tag cloud with a few
huge tags and a lot of tiny tiny tags:
class Filter(xapian.ExpandDecider):
    def __call__(self, term):
        return term[:2] == "XT"

def format(k):
    if k in tags:
        facet = k.split("::", 1)[0]
        if facet in facets:
            return "<i>%s: %s</i>" % (facets[facet], tags[k])
        else:
            return "<i>%s</i>" % tags[k]
    else:
        return k

taglist = []
maxscore = None
for res in enquire.get_eset(15, rset, Filter()):
    # Normalise the score in the interval [0, 1]
    weight = math.log(res.weight)
    if maxscore == None: maxscore = weight
    tag = res.term[2:]
    taglist.append(
        (tag, format(tag), float(weight) / maxscore)
    )
taglist.sort(key=lambda x:x[0])
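To see what the logarithm buys, here is a minimal sketch of the normalisation step alone, on made-up weights instead of a Xapian eset. It assumes all weights are greater than 1, so that their logarithm is positive; the function name cloud_sizes and the 150% maximum are choices made for the example.

```python
import math


def cloud_sizes(weighted_tags, max_percent=150):
    """Map (tag, weight) pairs to (tag, font size in percent), sorted by tag.

    Assumes every weight is greater than 1, so that its logarithm is positive.
    """
    if not weighted_tags:
        return []
    logs = [(tag, math.log(weight)) for tag, weight in weighted_tags]
    top = max(value for _, value in logs)
    return sorted((tag, round(value / top * max_percent)) for tag, value in logs)


# Made-up weights, for illustration only
sizes = cloud_sizes([("use::editing", 100.0), ("role::program", 10.0)])
print(sizes)
```

With a linear scale the second tag would be drawn at a tenth of the size of the first one; with the logarithm it gets half, which is what keeps the small tags readable.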
Finally, you mark up a gtkhtml2.Document to display in a gtkhtml2 widget:
def mark_text_up(result_list):
    # 0-100 score, key (facet::tag), description
    document = gtkhtml2.Document()
    document.clear()
    document.open_stream("text/html")
    document.write_stream("""<html><head>
<style type="text/css">
a { text-decoration: none; color: black; }
</style>
</head><body>""")
    for tag, desc, score in result_list:
        document.write_stream('<a href="%s" style="font-size: %d%%">%s</a> ' % (tag, score*150, desc))
    document.write_stream("</body></html>")
    document.close_stream()
    return document
That's it, try it out.
You can use the git web interface to get to the full source code and the module it uses.
Evaluating programming languages for playing with Debtags
Since having workable bindings for the C++ Debtags libraries seems to be still a bit in the future, I'm planning to build a bit of native infrastructure in some higher level language. First step is seeing what language I could start playing with.
The problem
At the most basic level, in Debtags we have a number of packages, each of which has a set of tags.
The way I usually save tags is a file with the format:
package1, package2: tag1, tag2, tag3
package3: tag1, tag2
That is, every line has a list of packages with the same tags, and the list of their tags.
Any script I'm going to write has to at least be able to parse the data
into something like a package -> tags hash, and then print it out.
Let's see how perl, python and ruby perform.
Tests
C++
The reference point for the experiment will be the C++ implementation,
tagcoll:
$ time tagcoll copy package-tags > /dev/null
real 0m0.421s
user 0m0.412s
sys 0m0.000s
Perl
First attempt is with Perl, creating the script that parses into a hash of
package => set of tags and prints the result.
There are set modules for Perl on CPAN, but I have none handy at the moment. However, since they are implemented using hashes, I can approximate them by using a hash.
Note that I also want to have a different copy of the tag set for every package, so that I can manipulate them in the future without unwanted side effects.
Here is the code:
#!/usr/bin/perl -w
use strict;

my %db;

# Read the tag database
while (<>)
{
    # Strip the trailing newline
    chomp;
    my ($pkgs, $tags) = split(': ');
    # Create the tagset using keys of a hash
    my %tags = map { $_ => undef } split(', ', $tags);
    for my $p (split(', ', $pkgs))
    {
        # Make a copy of the tagset
        $db{$p} = {%tags};
    }
}

# Write the tag database
while (my ($pkg, $tags) = each %db)
{
    print $pkg, ': ', join(', ', keys %$tags), "\n";
}
Here is the running time:
$ time ./parse.pl package-tags > /dev/null
real 0m0.448s
user 0m0.436s
sys 0m0.008s
Not so bad, comparable with tagcoll.
Python
Then comes Python. I'm not much of a Python fancier, but I'm rather attracted
by the new set native type introduced with Python 2.4, which seems to have
most of what I need nice and done.
Here is the script:
#!/usr/bin/python
import sys

input = sys.stdin
if len(sys.argv) > 1:
    input = open(sys.argv[1],"r")

# Read the tag database
db = {}
for line in input:
    # Is there a way to remove the last character of a line that does not
    # make a copy of the entire line?
    line = line.rstrip("\n")
    pkgs, tags = line.split(": ")
    # Create the tag set using the native set
    tags = set(tags.split(", "))
    for p in pkgs.split(", "):
        db[p] = tags.copy()

# Write the tag database
for pkg, tags in db.items():
    # Using % here seems awkward to me, but if I use calls to
    # sys.stdout.write it becomes a bit slower
    print "%s:" % (pkg), ", ".join(tags)
Here is the running time:
$ time ./parse.py package-tags > /dev/null
real 0m0.418s
user 0m0.376s
sys 0m0.036s
I'm pleased, very pleased. Using the native set seems to be not only handy, but efficient.
Ruby
Finally, Ruby. I like to use Ruby. In this case, however, it lacks a native set implementation, although it has a set module which is implemented using a hash.
Here is the script:
#!/usr/bin/ruby
require 'set'

infile = ARGV[0] ? File.new(ARGV[0]) : $stdin

# Read the tag database
db = {}
infile.each_line do |line|
    # chomp returns a new string, so the result has to be assigned
    line = line.chomp
    pkgs, tags = line.split(": ")
    # Create the set using the Set module
    tags = Set.new(tags.split(", "))
    pkgs.split(", ").each do |p|
        # Plain assignment would store a reference to the same Set,
        # so make a copy for each package
        db[p] = tags.dup
    end
end

# Write the tag database
db.each do |key, tags|
    # Ouch, Set does not do join by itself
    puts key + ": " + tags.to_a.join(", ")
end
Here is the running time:
$ time ./parse.rb package-tags > /dev/null
real 0m1.637s
user 0m1.572s
sys 0m0.052s
I hope I got something wrong in the script, but I can't see what.
Results
As much as I don't fancy Python, it looks like it's currently the best choice for playing around with Debtags. I hope the native sets will bring me joy.
If in the future I'll be asked "how come you chose Python for this Debtags thing?", I can point to this page.
libept 0.5.3 hit unstable
I prepared a new toy to play with at Debconf and uploaded it to unstable:
Package: libept-dev
Description: High-level library for managing Debian package information
The library defines a very minimal framework in which many sources of data
about Debian packages can be implemented and queried together.
.
The library includes four data sources:
.
* APT: access the APT database
* Debtags: access the Debtags tag information
* Popcon: access Popcon package scores
* TextSearch: fast Xapian-based full text search on package description
.
This is the development library.
Package: ept-cache
Description: Commandline tool to search the package archive
ept-cache is a simple commandline interface to the functions of libept.
.
It can currently search and display data from four sources:
.
* The APT database
* The Debtags tag information
* Popcon package scores
* A fast Xapian-based full text index on package descriptions
Yes, this finally brings lots of very cool data sources about packages together.
Try this one:
# Check if all data providers are active and give instructions on how
# to activate those that aren't
ept-cache info
# Follow the instructions to activate everything
# Show all GUI image editors, sorted by popularity, in reverse order
ept-cache search image editor -t gui -s p-
If you have the Xapian data provider enabled, the results of a search are given in relevance order, the most relevant first. And also, searches are done with proper stemming, so if you look for image editor it will also find image editing, although it would score image editor higher.
It's also quite lovely to work with it in C++. I'll improvise a few
examples here:
Print name and short description of every package
#include <ept/apt/apt.h>
#include <ept/apt/packagerecord.h>
#include <iostream>

using namespace std;
using namespace ept::apt;

void playWithApt()
{
    // Apt data source
    Apt apt;
    // Parser of package records
    PackageRecord rec;
    // Iterate all package records
    for (Apt::record_iterator i = apt.recordBegin();
         i != apt.recordEnd(); ++i)
    {
        rec.scan(*i);
        cout << rec.package() << " - " << rec.shortDescription() << endl;
    }
}
Show all image editors
#include <ept/debtags/debtags.h>
#include <set>
#include <string>

using namespace ept::debtags;

void playWithDebtags()
{
    // Apt data source
    Apt apt;
    // Parser of package records
    PackageRecord rec;
    // Debtags data source
    Debtags debtags;
    if (!debtags.hasData())
        return;
    set<Tag> tags;
    tags.insert(debtags.vocabulary().tagByName("works-with::image:raster"));
    tags.insert(debtags.vocabulary().tagByName("use::editing"));
    tags.insert(debtags.vocabulary().tagByName("role::program"));
    set<string> results = debtags.getItemsHavingTags(tags);
    for (set<string>::const_iterator i = results.begin();
         i != results.end(); ++i)
    {
        rec.scan(apt.rawRecord(*i));
        cout << rec.package() << " - " << rec.shortDescription() << endl;
    }
}
Print all package names, sorted by popularity
#include <ept/popcon/popcon.h>
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

using namespace ept::popcon;

// STL comparator
struct PopconCompare
{
    Popcon& popcon;
    PopconCompare(Popcon& popcon) : popcon(popcon) {}
    // std::sort calls the comparator through operator()
    bool operator()(const std::string& pkg1, const std::string& pkg2) const
    {
        return popcon[pkg1] < popcon[pkg2];
    }
};

void playWithPopcon()
{
    // Apt data source
    Apt apt;
    // Popcon data source
    Popcon popcon;
    vector<string> sorted;
    if (!popcon.hasData())
        return;
    // Get all package names in the vector
    copy(apt.begin(), apt.end(), back_inserter(sorted));
    // Sort it by popularity
    sort(sorted.begin(), sorted.end(), PopconCompare(popcon));
    // Print it out
    for (vector<string>::const_iterator i = sorted.begin();
         i != sorted.end(); ++i)
        cout << *i << endl;
}
Search for image viewer, but we don't want to view kernel images
#include <ept/textsearch/textsearch.h>
#include <xapian.h>

using namespace ept::textsearch;

void playWithXapian()
{
    TextSearch textsearch;
    vector<string> wanted;
    vector<string> notwanted;
    Xapian::Enquire enq(textsearch.db());
    // This will tokenise the search query into terms, stem them
    // and OR them together in a query. Xapian will score higher
    // those results in which more ORed terms match, which is what
    // we want.
    Xapian::Query want = textsearch.makeOrQuery("image viewer");
    Xapian::Query dontWant = textsearch.makeOrQuery("linux kernel");
    enq.set_query(Xapian::Query(Xapian::Query::OP_AND_NOT, want, dontWant));
    // Print the top 20 results, with their relevance percentage
    Xapian::MSet matches = enq.get_mset(0, 20);
    for (Xapian::MSetIterator i = matches.begin(); i != matches.end(); ++i)
    {
        // The get_data() of a document is the package name
        cout << i.get_document().get_data() << " ("
             << i.get_percent() << "%)" << endl;
    }
}
Improving package managers
I noticed two posts on improving package managers, neither of which mentions Debtags.
Daniel Burrows mentions various issues:
- the current sections in Synaptic are useless
- there are better keyword search technologies than strstr()
- we could use popularity contest data to sort results
- it would be cool to do amazon-like things using popcon data
David Nusinov mentions that the ideal package manager should look like Google, where you search for things using just a simple one line text entry and pick from the results what you want to install.
I should probably do a bit of recap of things that have been going on.
I'll go through that list again:
- The current sections in Synaptic are useless
Agreed. There used to be a bug about this, which was closed by Debtags more than one year ago. We now have much more useful category data for about 73% of the archive (including experimental), but what we lack is software using it.
Here's a quick trick to try:
- install debtags, and this gives you an easy to read text file in /var/lib/debtags/package-tags.
- from that file, pick packages that have the tags role::program, scope::application and interface::x11.
- display the results, and use the tags works-with::* and use::* to navigate the results.
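The trick can be sketched in a few lines of Python. This is only an illustration: it assumes that every line of the file has the form "package1, package2: tag1, tag2", it runs on a few made-up sample lines instead of the real /var/lib/debtags/package-tags, and all the function names are invented for the example.

```python
from collections import Counter

WANTED = {"role::program", "scope::application", "interface::x11"}


def parse_package_tags(lines):
    """Parse 'pkg1, pkg2: tag1, tag2' lines into a package -> set of tags dict."""
    db = {}
    for line in lines:
        line = line.rstrip("\n")
        if ": " not in line:
            continue
        pkgs, tags = line.split(": ", 1)
        tagset = set(tags.split(", "))
        for pkg in pkgs.split(", "):
            # Each package gets its own copy of the tag set
            db[pkg] = set(tagset)
    return db


def desktop_applications(db):
    """Names of the packages that carry all the WANTED tags."""
    return sorted(pkg for pkg, tags in db.items() if WANTED <= tags)


def navigation_tags(db, pkgs):
    """Count the works-with::* and use::* tags across the selected packages."""
    counts = Counter()
    for pkg in pkgs:
        for tag in db[pkg]:
            if tag.startswith(("works-with::", "use::")):
                counts[tag] += 1
    return counts


# Made-up sample data; on a real system read /var/lib/debtags/package-tags
sample = [
    "gimp: interface::x11, role::program, scope::application, use::editing, works-with::image:raster",
    "inkscape: interface::x11, role::program, scope::application, use::editing, works-with::image:vector",
    "bash: interface::shell, role::program, scope::application",
]
db = parse_package_tags(sample)
apps = desktop_applications(db)
facets = navigation_tags(db, apps)
print(apps)
print(facets.most_common())
```

The counts of the navigation tags are what an interface would show next to the results, so that one click on a tag narrows the list down.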
There is a python-debian package in experimental that has a debtags module you could play with.
Why is it that so far no one has written a simple package manager just for gamers, which uses only the game::* tags?
Do you think Debtags gives you too many tags? Then check out:
- The Debtags smart search, and especially how it does not show you all the tags, but it is able to infer the tags you want from your google-like query (hi David!).
- The Debtags tag editor, and especially the search-as-you-type feature on all the tags and the tag search (analogous to the Debtags smart search, but it only searches tags).
- The Debtags tag cloud, and if you don't like that one try to make your own: there are countless ways of generating tag clouds from Debtags data.
To summarise so far, we not only have better categories, but also a number of cool algorithms to use them, and some interface prototypes as well. Just don't expect me to write a package manager as well: that's a job that so far I decided to leave to someone else. adept gave it a try, with positive results.
- there are better keyword search technologies than strstr()
Indeed, Xapian for example. I use it as part of the backend of the Debtags smart search, and here's our Xapian-powered normal keyword based package search interface which does stemming, indexing and all you want to ask from a serious full text index.
In that page you don't see all the nice features of Xapian, but only the ones that I needed for my Debtags evil plans. Have a look at the documentation and give it a try.
Here is a way to see Xapian's similarity matching in action:
- go to the Go tagging! page
- click on a random untagged package
- the system gives you a rather relevant selection of tags
- look at it again: the package was untagged: how could the web engine possibly figure those tags out?
What is happening behind the scenes is that:
- I ask Xapian: "what packages are similar to this one?".
- I aggregate the tags of the resulting packages.
- I rank the tags by how many resulting packages have them.
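The aggregation and ranking steps are easy to sketch in Python. This is a minimal illustration, not the code running on the website: the list of similar packages is assumed to come from Xapian and is hard-coded here, and the tag assignments are made up.

```python
from collections import Counter


def suggest_tags(similar_packages, tags_of, limit=5):
    """Rank tags by how many of the similar packages carry them."""
    counts = Counter()
    for pkg in similar_packages:
        counts.update(tags_of.get(pkg, ()))
    return [tag for tag, _ in counts.most_common(limit)]


# Made-up data: in the real thing, Xapian provides the similar packages
tags_of = {
    "gimp": {"use::editing", "works-with::image:raster"},
    "krita": {"use::editing", "works-with::image:raster"},
    "inkscape": {"use::editing", "works-with::image:vector"},
}
suggested = suggest_tags(["gimp", "krita", "inkscape"], tags_of, limit=2)
print(suggested)
```

The untagged package itself never needs any tags for this to work: only its description is used, to find its neighbours.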
While we are on this topic, why don't we decide that we maintain a Xapian index
of our package descriptions in, for example, /var/lib/apt/fulltext/, so that
various applications can share it?
- we could use popularity contest data to sort results
Indeed. Would anyone like to implement this little "popcon" tool? Having the data easily accessible locally can encourage people to use them.
The Debtags Go tagging! page already uses popcon data to show the most common untagged packages at the top, for two reasons: it shows packages that more people are likely to know (and therefore likely to categorise) and it pushes for the most common packages to be tagged more urgently.
- it would be cool to do amazon-like things using popcon data
Indeed.
Does anyone volunteer to implement a prototype? The full unaggregated (but
anonymised) popcon data are accessible to every Debian Developer on the host
gluck.debian.org in the directory
/org/popcon.debian.org/popcon-mail/popcon-entries.
Ideally one can do many interesting things with this concept: besides tag suggestions, one could identify the packages that are most representative of an installed system, and also offer negative suggestions like: "people who have packages like yours usually don't have this package: would you like to remove it?".
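To give an idea of how small a prototype could be, here is a rough sketch of the "systems similar to yours" suggestion. Everything in it is an assumption made for the example: each popcon entry is reduced to a plain set of installed package names, similarity is measured with the Jaccard index, and the sample systems are invented. The real popcon entries would need parsing first.

```python
from collections import Counter


def jaccard(a, b):
    """Similarity of two package sets: shared packages over all packages."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_packages(mine, others, neighbours=2, limit=3):
    """Suggest packages found on the systems most similar to mine."""
    ranked = sorted(others, key=lambda other: jaccard(mine, other), reverse=True)
    counts = Counter()
    for other in ranked[:neighbours]:
        # Only count packages that I do not have yet
        counts.update(other - mine)
    return [pkg for pkg, _ in counts.most_common(limit)]


# Invented sample systems, for illustration only
mine = {"gimp", "inkscape", "vim"}
others = [
    {"gimp", "inkscape", "vim", "krita"},
    {"gimp", "inkscape", "krita", "blender"},
    {"apache2", "postfix"},
]
suggestions = suggest_packages(mine, others)
print(suggestions)
```

Negative suggestions would work the other way round: look at what I have installed that my nearest neighbours usually do not.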
There is more than all this that could be done. Recently, almost by accident,
I had the idea of querying packages by example, like pointing to a file and
finding packages that can work with it. I've asked Jeroen to have Mole collect
info on all files that could possibly get installed in
/usr/lib/mime/packages/ (as suggested by Bernhard R. Link), to see if that
prototype can be made more accurate.
Query by similarity would be nice: I don't like this program, but what else do we have that does the same job? This is best implemented using Debtags data, since it directly maps to semantic properties. Note that you don't have to show a single tag to the user to implement this kind of interface. Do we have a way to point at the X window of an application and get the name of the package that installed it? Wouldn't it be about time to have it?
Why don't we have a system updater utility that shows the Debian weather?
Why aren't more people playing with semantic web?
But more generally, the problem with package managers is that we seem to be irrationally compulsive in wanting to make the one and only big easy and complete interface for everyone. Other more reasonable people would tell you that if you have two very different kinds of users you may want to consider having two different user interfaces.
Ubuntu for example installs by default 3 package manager interfaces: Synaptic; the thing that you access from the application menu to add applications to it; and the update manager. Does it sound like a waste? To me it makes lots of sense.
We have lots of interesting, usable metadata; we have algorithms; we have prototypes; we have ideas for lots of cool, implementable features. The question is, are we able to write applications that just combine what is needed from all this treasure to provide the right interface(s) for our base(s) of users?
Even if my English in 2004 wasn't easy to understand, a read here might still be useful.
There is so much really cool stuff to be written, just within reach.



