Pages about Debtags.
apt-xapian-index: dynamically generated tag clouds
About apt-xapian-index, I have already posted:
- an introduction of apt-xapian-index;
- an example of how to query it;
- a way to add simple result filters to the query;
- a way to suggest keywords and tags to use to improve the query.
- a way to search for similar packages.
- a way to implement an adaptive cutoff on result quality.
- a smart way of querying tags
- how to implement search as you type
Today I'll show how to create tag clouds. Not only that, but I'll show how to implement tag clouds that change as the user types a query.
This example uses python-gtk, and has been created together with Matteo Zandi.
Generating a tag cloud out of any Xapian query is simple, and it is just a matter of presenting into a tag cloud the information that you get with the technique shown in a smart way of querying tags: you get the tags related to the query, and you lay out their names with a font size proportional to their Xapian rank.
For the presentation, we can load pretty names from the Debtags
vocabulary in /var/lib/debtags/vocabulary:
from debian_bundle import deb822 # Facet name -> Short description facets = dict() # Tag name -> Short description tags = dict() for p in deb822.Deb822.iter_paragraphs(open("/var/lib/debtags/vocabulary", "r")): if "Description" not in p: continue desc = p["Description"].split("\n", 1)[0] if "Tag" in p: tags[p["Tag"]] = desc elif "Facet" in p: facets[p["Facet"]] = desc
The query then goes on as usual, and when we get the tags from
the eset we also record their score and normalise it
between 0 and 1. I found that computing the logarithm of scores
helps to avoid having a tag cloud with a few huge tags and a lot of
tiny tiny tags:
class Filter(xapian.ExpandDecider): def __call__(self, term): return term[:2] == "XT" def format(k): if k in tags: facet = k.split("::", 1)[0] if facet in facets: return "<i>%s: %s</i>" % (facets[facet], tags[k]) else: return "<i>%s</i>" % tags[k] else: return k taglist = [] maxscore = None for res in enquire.get_eset(15, rset, Filter()): # Normalise the score in the interval [0, 1] weight = math.log(res.weight) if maxscore == None: maxscore = weight tag = res.term[2:] taglist.append( (tag, format(tag), float(weight) / maxscore) ) taglist.sort(key=lambda x:x[0])
Finally, you mark up a gtkhtml2.Document to display
in a gtkhtml2 widget:
def mark_text_up(result_list): # 0-100 score, key (facet::tag), description document = gtkhtml2.Document() document.clear() document.open_stream("text/html") document.write_stream("""<html><head> <style type="text/css"> a { text-decoration: none; color: black; } </style> </head><body>""") for tag, desc, score in result_list: document.write_stream('<a href="%s" style="font-size: %d%%">%s</a> ' % (tag, score*150, desc)) document.write_stream("</body></html>") document.close_stream() return document
That's it, try it out.
You can use the git web interface to get to the full source code and the module it uses.
Posted Tue 12 Aug 2008 11:21:18 CESTHandling concurrent update-apt-xapian-index runs
So, update-apt-xapian-index takes a bit of a long time to run. When you install the package, if the index isn't there then postinst will rebuild it in background, good.
Now, suppose you have an advanced package manager that after installing apt-xapian-index, detects that it's installed, detects that there is no database and runs an update with a nice GUI feedback. If this happened while the update is running in the background, the usual thing to do would be to abort the update; however, the package manager wouldn't be able to know when the indexing has finished and it can enable the advanced features.
Let's not do the usual thing, then:
# `progress' is the object that does progress reporting. Commandline options # can choose a silent one, a machine readable one or a user readable one. # Lock the session so that we prevent concurrent updates lockfd = os.open(XAPIANDBLOCK, os.O_RDWR | os.O_CREAT) lockpyfd = os.fdopen(lockfd) try: fcntl.lockf(lockpyfd, fcntl.LOCK_EX | fcntl.LOCK_NB) # `ServerProgress` is a proxy progress reporter that listens for # connections on a Unix socket, and broadcasts progress status updates both # to the local progress reporter and to whoever connects to the socket. progress = ServerProgress(progress) except IOError, e: if e.errno == errno.EACCES or e.errno == errno.EAGAIN: progress.notice("Another update is already running: showing its progress.") # `ClientProgress' is another proxy class that connects to the socket # forwards progress updates to whatever the user has chosen as the # local progress indicator. childProgress = ClientProgress(progress) childProgress.loop() sys.exit(0) else: raise
What happens here is that if the upload is in progress, the second update-apt-xapian-index will not be able to lock a lockfile. But instead of exiting, it connects to the other indexer via a unix socket, and shows the progress just as if it were the one doing the indexing.
So, now, suppose I run update-apt-xapian-index:
# update-apt-xapian-index
Reading Debtags database: done.
Rebuilding Xapian index... 20%
And suppose I run another one:
# update-apt-xapian-index
Another update is already running: showing its progress.
Rebuilding Xapian index... 20%
And suppose I run yet another one:
# update-apt-xapian-index --batch-mode
notice: Another update is already running: showing its progress.
begin: Rebuilding Xapian index
progress: 19/100
progress: 20/100
And suppose I run yet another one (running with -q shows nothing, but it will wait until the master indexer has finished):
# update-apt-xapian-index -q
This way, the advanced package manager (adept 3 does this) will show a progress indicator and correctly let the user know when indexing has finished... only, if an update is already running, the progress indicator will not start at 0% because the update was already in progress, and the user will think: "wow, I've been lucky!".
The server side of the socket only uses nonblocking I/O, so there's no risk of slave indexers causing the master indexer to hang.
It's been simple to implement, and brillant. It is in
apt-xapian-index 0.15, which I have just uploaded to
sid. Shame it's a bit late for lenny.
apt-xapian-index: search as you type
I've recently posted:
- an introduction of apt-xapian-index;
- an example of how to query it;
- a way to add simple result filters to the query;
- a way to suggest keywords and tags to use to improve the query.
- a way to search for similar packages.
- a way to implement an adaptive cutoff on result quality.
- a smart way of querying tags
Note that I've rewritten all the old posts to only show the main code snippets: if you were put off by the large lumps of code, you may want to give it another go.
Today I'll show how to implement a very attractive feature for a user interface: search as you type. The idea is that you don't need to press enter to fire up a query: instead, the results materialise in front of your eyes as you type them.
The example I created uses curses, but the idea is good on any interactive user interface.
The main thing to keep in mind with search as you type is that the last word is likely to be partially typed, unless maybe some timeout expired since the user's last keystroke.
Xapian comes into help here, as it allows us to expand the partially typed word into an OR query with all the terms that start with it. This means that if we are typing, for example, "progr", we can turn the query into "program OR programmer OR programming OR programmed [...and so on...]".
I won't show the UI code, except a simple input loop that triggers the query at every keystroke:
def mainloop(self):
while True:
c = self.win.getch()
self.line += chr(c)
self.results.update(self.line)
The interesting part is in the update function.
First we split the line in words and convert the words into a query:
# Split the line in words
args = self.splitline.split(line)
# Convert the words into terms for the query
terms = termsForSimpleQuery(args)
Then we expand the last word with all possible completions:
# Since the last word can be partially typed, we add all words that
# begin with the last one.
terms.extend([x.term for x in db.allterms(args[-1])])
Now we can build the query. Of course you can add all other sorts of things to the query, for example a boolean expression of tag filter like in axi-query-pkgtype.py; Xapian will cope.
# Build the query
query = xapian.Query(xapian.Query.OP_OR, terms)
Finally the query. For bonus points you can do the adaptive cutoff trick to discard bad results.
In my case, since I don't implement scrolling of results, I also limit them to what fits in the window:
# Retrieve as many results as we can show
mset = enquire.get_mset(0, self.size - 1)
Finally, draw the results on screen:
# Redraw the window
self.win.clear()
# Header
self.win.addstr(0, 0, "%i results found." % mset.get_matches_estimated(), curses.A_BOLD)
# Results
for y, m in enumerate(mset):
# /var/lib/apt-xapian-index/README tells us that the Xapian document data
# is the package name.
name = m[xapian.MSET_DOCUMENT].get_data()
# Get the package record out of the Apt cache, so we can retrieve the short
# description
pkg = cache[name]
# Print the match, together with the short description
self.win.addstr(y+1, 0, "%i%% %s - %s" % (m[xapian.MSET_PERCENT], name, pkg.summary))
self.win.refresh()
That's it, try it out.
You can use the wsvn interface to get to the full source code and the module it uses.
You can see a similar technique working in goplay, where it is also integrated with an interactive tag filter.
Next in the series: dynamic tag cloud.
Posted Tue 06 Nov 2007 23:02:46 CETapt-xapian-index: smart way of querying tags
I've recently posted:
- an introduction of apt-xapian-index;
- an example of how to query it;
- a way to add simple result filters to the query;
- a way to suggest keywords and tags to use to improve the query.
- a way to search for similar packages.
- a way to implement an adaptive cutoff on result quality.
Note that I've rewritten all the old posts to only show the main code snippets: if you were put off by the large lumps of code, you may want to give it another go.
Today I'll show how to implement a really good way of searching for Debtags tags. When I say really good, I mean the sort of good that after you run it you wonder how could it possibly manage to do it.
The idea is simple: you run a package search, but instead of showing the resulting packages, you ask Xapian to suggest tags like we saw in axi-query-expand.py.
For extra points, I'll use an adaptive cutoff in chosing the packages that go in the rset.
So, let's ask the user to enter some keywords to look for tags, and use them to run a normal package query:
# Build the base query query = xapian.Query(xapian.Query.OP_OR, termsForSimpleQuery(args)) # Perform the query enquire = xapian.Enquire(db) enquire.set_query(query)
Now, instead of showing the results of the query, we ask Xapian what are the tags in the index that are most relevant to this search.
First, we pick some representative packages for the expand:
# Use an adaptive cutoff to avoid to pick bad results as references matches = enquire.get_mset(0, 1) topWeight = matches[0].weight enquire.set_cutoff(0, topWeight * 0.7) # Select the first 10 documents as the key ones to use to compute relevant # terms rset = xapian.RSet() for m in enquire.get_mset(0, 10): rset.add_document(m[xapian.MSET_DID])
Then we define the filter that only keeps tags:
# Filter out all the keywords that are not tags class Filter(xapian.ExpandDecider): def __call__(self, term): "Return true if we want the term, else false" return term[:2] == "XT"
Then we print the tags:
# This is the "Expansion set" for the search: the 10 most relevant terms that # match the filter eset = enquire.get_eset(10, rset, Filter()) # Print out the results for res in eset: print "%.2f %s" % (res.weight, res.term[2:])
That's it. We turned a package search into a tag search, and this allows us to search for tags using keywords that are not present in the tag descriptions at all:
$ ./axi-query-tags.py explore the dungeons
27.50 game::rpg:rogue
26.14 use::gameplaying
17.53 game::rpg
10.27 uitoolkit::ncurses
...
$ ./axi-query-tags.py total world domination
7.55 use::gameplaying
5.68 x11::application
5.35 interface::x11
5.05 game::strategy
...
You can use the wsvn interface to get to the full source code and the module it uses.
You can see a similar technique working in the Debtags tag editor: enter a package, then choose "Available tags: search".
Next in the series: search as you type.
Posted Tue 06 Nov 2007 23:02:46 CETapt-xapian-index: adaptive quality cutoff
I've recently posted:
- an introduction of apt-xapian-index;
- an example of how to query it;
- a way to add simple result filters to the query;
- a way to suggest keywords and tags to use to improve the query.
- a way to search for similar packages.
Note that I've rewritten all the old posts to only show the main code snippets: if you were put off by the large lumps of code, you may want to give it another go.
Today I'll show how to implement an adaptive cutoff to get rid of the worse results.
Recall that Xapian shows results by decreasing order of quality, and as we pull more an more results out, we reach a point where the matches are so approximated that they look random.
This can be a problem if we want to change the order of the result, for example we may want to sort by package size, or by popcon popularity. There are many scenarios in which a really bad match could end up at the top of the results.
For most cases, you just want to say "discard all results whose quality is less than 70%". But sometimes you have queries that OR lots of terms, and even your top result, while still being a very good result, may be below the cutoff you decided.
Implementing an adaptive cutoff is extremely simple: first, you get the quality estimate of the top result:
# Retrieve the first result, and check its relevance
matches = enquire.get_mset(0, 1)
topWeight = matches[0].weight
Then you tell Xapian that you want a cutoff value that is, for example, 70% of that:
# Tell Xapian that we only want results that are at least 70% as good as that
enquire.set_cutoff(0, topWeight * 0.7)
Finally, you repeat the query. If you want, you can go for bigger result sets, as the cutoff will make it so that if you have lots of results, they will very likely be all good results:
matches = enquire.get_mset(0, 200)
This is it.
You can use the wsvn interface to get to the full source code and the module it uses.
Next in the series: smart way of querying tags.
Posted Sat 27 Oct 2007 21:51:43 CESTapt-xapian-index: performing a simple query
I've recently posted an introduction of apt-xapian-index.
Today I'll show how to make simple queries to apt-xapian-index. If you feel like reimplementing my examples in another language, let me know and I'll include it to the post.
What I'm going to build is a replacement for apt-cache
search that:
- is much faster than apt-cache search
- scores results by relevance, so you get the best matches first
- does stemming of search terms, so it matches 'edit' when you type 'editing'
- understands debtags tags.
This is just a beginning: in future blog posts I'll show how to enhance a search with interesting advanced features.
First thing, we need to import the Xapian module, found in the
package python-xapian. Documentation on the Python
Xapian API can be found in
/usr/share/doc/python-xapian.
import xapian
Then we open the apt-xapian-index database:
# Instantiate a xapian.Database object for read only access to the index db = xapian.Database("/var/lib/apt-xapian-index/index")
Now we build a query from the command line arguments. We'll
assume that if an argument is in the form foo::bar,
then it's a Debtags tag instead of a normal keyword.
For normal keywords, we also search for the stemmed version, so that we can, for example, find "editing" when the user searches for "edit".
# Stemmer function to generate stemmed search keywords stemmer = xapian.Stem("english") # Build the terms that will go in the query terms = [] for word in args: if word.islower() and word.find("::") != -1: # According to /var/lib/apt-xapian-index/README, Debtags tags are # indexed with the 'XT' prefix. terms.append("XT"+word) else: # If it is not a Debtags tag, then we consider it a normal keyword. word = word.lower() # The index stores keyword all in lowercase terms.append(word) # If the word has a stemmed version, add it to the query. # /var/lib/apt-xapian-index/README tells us that stemmed terms have a # 'Z' prefix. stem = stemmer(word) if stem != word: terms.append("Z"+stem)
Now we have the terms for the query, and we can create a query that ORs them together.
One may ask, why OR and not AND? The reason is that, contrarily to apt-cache, Xapian scores results according to how well they matched.
Matches that match all the terms will score higher than the others, so if we build an OR query what we really have is an AND query that gracefully degenerates to closer matches when they run out of perfect results.
This allows stemmed searches to work nicely: if you look for 'editing', then the query will be 'editing OR Zedit'. Packages with the word 'editing' will match both and score higher, and packages with the word 'edited' will still match 'Zedit' and get included in the results.
# OR the terms together into a Xapian query. # query = xapian.Query(xapian.Query.OP_OR, terms)
We then run the query. Queries are run through a
xapian.Enquire object:
# Perform the query
enquire = xapian.Enquire(db)
enquire.set_query(query)
The Enquire object returns results as an mset. An mset represents a view of the result set, and can be iterated to access the resulting documents. Here we iterate the mset and output the result of the query, looking up short descriptions with apt:
# Display the top 20 results, sorted by how well they match matches = enquire.get_mset(0, 20) print "%i results found." % matches.get_matches_estimated() print "Results 1-%i:" % matches.size() for m in matches: # /var/lib/apt-xapian-index/README tells us that the Xapian document data # is the package name. name = m[xapian.MSET_DOCUMENT].get_data() # Get the package record out of the Apt cache, so we can retrieve the short # description pkg = cache[name] # Print the match, together with the short description print "%i%% %s - %s" % (m[xapian.MSET_PERCENT], name, pkg.summary)
This is it.
You can use the wsvn interface to get to the full source code.
You can run the code passing keywords and Debtags tags. Try running it as:
./axi-query-simple.py role::program image edit
./axi-query-simple.py role::program game::arcade
./axi-query-simple.py kernel image
You can search Debtags tags using debtags
tagsearch. In a later blog post, I'll show how to use
apt-xapian-index to implement a
better-than-you-would-ever-have-thought-possible tag search.
Next in the series: adding simple result filters to the query.
Posted Fri 26 Oct 2007 16:43:21 CESTapt-xapian-index: searching for similar packages
I've recently posted:
- an introduction of apt-xapian-index;
- an example of how to query it;
- a way to add simple result filters to the query;
- a way to suggest keywords and tags to use to improve the query.
Today I'll show how to abuse Xapian to show a list of packages similar to a given one.
This time I'll try just linking to the code in wsvn and showing in the blog only show the most important bits.
So, we have a package name, and we want to show what are the packages similar to that one.
To do it, we simply build a big OR query with all the terms indexed for that package: Xapian will show us the packages whose terms are most similar, and that does the trick.
This works because Xapian gives us the best results first, therefore even if no package except the given one will give an exact match, we still get the nearest matches first.
In order to get the list of indexed terms given a package name we need to do two things:
- Get the Xapian document for the package.
- Get the termlist of the document.
To get the Xapian document we search for a term that only that document can have. In the index, the package name is indexed with the special prefix "XP", so we can search for that:
def docForPackage(pkgname): "Get the document corresponding to the package with the given name" # Query the term with the package name query = xapian.Query("XP"+pkgname) enquire = xapian.Enquire(db) enquire.set_query(query) # Get the top result only matches = enquire.get_mset(0, 1) if matches.size() == 0: return None else: m = matches[0] return m[xapian.MSET_DOCUMENT]
Then we build the big term list, by iterating the termlist of the document:
# Build a term list with all the terms in the given packages terms = [] # Get the document corresponding to the package name doc = docForPackage(pkgname) if not doc: continue # Retrieve all the terms in the document for t in doc.termlist(): if len(t.term) < 2 or t.term[:2] != 'XP': terms.append(t.term)
Note that it's trivial to fetch terms from more than one document, if you want to query "all packages a bit like this one and a bit like that one", although that's less of a useful feature.
Lastly, we build the final query:
# Build the big OR query query = xapian.Query(xapian.Query.OP_AND_NOT, # Terms we want xapian.Query(xapian.Query.OP_OR, terms), # AND NOT the input packages xapian.Query("XP"+pkgname))
I add an AND_NOT part with the input package name so that we don't get in the output the package that we asked for.
This is it:
$ ./axi-query-similar.py debtags
20309 results found.
Results 1-20:
33% debtags-edit - GUI browser and editor for Debian Package Tags
27% tagcolledit - GUI editor for tagged collections
25% libtagcoll2-dev - Functions used to manipulate tagged collections (development version)
24% tagcoll - Commandline tool to perform operations on tagged collections
19% packagesearch - GUI for searching packages and viewing package information
18% doodle - Desktop Search Engine (client)
18% doodled - Desktop Search Engine (daemon)
18% libept0 - High-level library for managing Debian package information
18% upgrade-system - system upgrader from Konflux
18% libept-dev - High-level library for managing Debian package information
17% ept-cache - Commandline tool to search the package archive
16% tracker-utils - metadata database, indexer and search tool - commandline tools
[...]
You can use the wsvn interface to get to the full source code and the module it uses.
Next in the series: adaptive quality cutoff.
Posted Fri 26 Oct 2007 16:40:33 CESTIntroducing apt-xapian-index
apt-xapian-index has just been approved into experimental, and in the next days I'm going to blog more about it.
The package contains a tool called
update-apt-xapian-index that indexes Debian package
metadata into a Xapian index
located at /var/lib/apt-xapian-index/index.
The index is read-only, except for
update-apt-xapian-index; however, it is
world-readable: every user can query it, all the time, at the same
time, even during index updates.
The index can contain more than package descriptions.
update-apt-xapian-index indexes data using plugins
located in /usr/share/apt-xapian-index/plugins, and
any package can add their own. For example, debtags will provide a
plugin to index tags.
Since Xapian can index numeric values as well, if anyone makes a popcon package that downloads popcon information, they can provide a plugin to index popcon values. If anyone makes an iterating package that downloads ratings, they can provide a plugin to index ratings.
Another plugin could be a specialised Debian stemmer that generates token such as foo out of libfoo-dev.
I think you get the idea: it's very extensible. You can have a look at the initial set of plugins in subversion.
The index is also self-documenting, so that one can
keep track of all the intresting things that can be found in it.
update-apt-xapian-index does not only maintain the
index, but also the file
/var/lib/apt-xapian-index/README that aggregates
documentation provided by the plugins.
To query the index, you just use Xapian. Debian contains Xapian bindings for various languages:
- libxapian-dev for C++
- libsearch-xapian-perl for Perl
- libxapian-ruby1.8 for Ruby
- python-xapian for Python
- tclxapian for Tcl
- php5-xapian for PHP
In the next days I am going to post various example queries and interesting tricks that the index allows you to do.
It's going to be fun.
Next in the series: performing a simple query.
Posted Fri 26 Oct 2007 15:52:32 CESTapt-xapian-index: suggesting new terms for improving the query
I've recently posted an introduction of apt-xapian-index, an example of how to query it and a way to add simple result filters to the query.
Today I'll show how to use a very interesting feature of Xapian: computing a list of the best terms to use to improve the query. I'll expand apt-query-pkgtype.py to write, after the results, the suggested terms.
If you feel like reimplementing my examples in another language, let me know and I'll include it to the post.
Let's start with a few definitions taken from the Xapian glossary:
RSet (Relevance Set)
The Relevance Set (RSet) is the set of documents which have been marked by the user as relevant. They can be used to suggest terms that the user may want to add to the query (these terms form an ESet), and also to adjust term weights to reorder query results.
ESet (Expand Set)
The Expand Set (ESet) is a ranked list of terms that could be used to expand the original query. These terms are those which are statistically good differentiators between relevant and non-relevant documents.
What I'm doing now is: after printing the normal result, I build an rset with all the results. For better results one can ask the user to pick the results they like (make a GUI with that: it should rock!), but starting with the top results in the RSet is a good enough default:
# Now, we ask Xapian what are the terms in the index that are most relevant to # this search. This can be used to suggest to the user the most useful ways of # refining the search. # Select the first 10 documents as the key ones to use to compute relevant # terms (matches is the mset returned by a normal query) rset = xapian.RSet() for m in matches: rset.add_document(m[xapian.MSET_DID])
Then I use the RSet to compute the ESet, and display the results to the user:
# This is the "Expansion set" for the search: the 10 most relevant terms eset = enquire.get_eset(10, rset) # Print it out. Note that some terms have a prefix from the database: can we # filter them out? Indeed: Xapian allow to give a filter to get_eset. # Read on... print print "Terms that could improve the search:", print ", ".join(["%s (%.2f%%)" % (res.term, res.weight) for res in eset])
You can also abuse this feature to show what are the tags that are most related to the search results. This allows you to turn a search based on keywords to a search based on semantic attributes, which would be an absolutely stunning feature in a GUI.
We can do it thanks to Xapian allowing to specify a filter for the output of get_eset. This filter filters out all the keywords that are not tags, or that were in the list of query terms:
class Filter(xapian.ExpandDecider): def __call__(self, term): "Return true if we want the term, else false" return term[:2] == "XT"
I just rerun get_eset like above, but adding the
filter, to show only a suggestion of tags:
# This is the "Expansion set" for the search: the 10 most relevant terms that # match the filter eset = enquire.get_eset(10, rset, Filter()) # Print out the resulting tags print print "Tags that could improve the search:", print ", ".join(["%s (%.2f%%)" % (res.term[2:], res.weight) for res in eset]) sys.exit(0)
This is it.
You can use the wsvn interface to get to the full source code and the module it uses. Try running it as:
axi-query-expand.py --type=gui image editor
axi-query-expand.py --type=cmdline image editor
axi-query-expand.py --type=game image editor
you will see that as the results change, the suggestions change as well.
I'm happy to see that Xapian often suggests tags: this means that tags are useful discriminators in search results, and that all the work done on Debtags has been a good work.
Notice that the suggestions change as you change any of the terms, the custom filter, and the tags you put in the search. Try to imagine how such a thing could improve Synaptic. What if I told you that Xapian is fast enough that all of this can happen live while you type?
Next in the series: searching for similar packages.
Posted Fri 26 Oct 2007 15:50:18 CESTapt-xapian-index: adding simple result filters to the query
I've recently posted an introduction of apt-xapian-index and an example of how to query it.
Today I'll show how to add simple Debtags-based result filters
to a query. I'll expand axi-query-simple.py
with a --type comman line option that allows to filter
for package type. As a starter I'll implement filtering for "game",
"gui", "cmdline" or "editor".
If you feel like reimplementing my examples in another language, let me know and I'll include it to the post.
The filters are implemented using Xapian queries. Xapian allows to build queries as well as combine existing queries using a large amount of operators.
This allows us to create a little database of ready made queries:
# This is our little database of simple Debtags filters we provide: the name # entered by the user in "--type" maps to a piece of Xapian query filterdb = dict( # We can do simple AND queries... game = xapian.Query(xapian.Query.OP_AND, ('XTuse::gameplaying', 'XTrole::program')), # Or we can do complicate binary expressions... gui = xapian.Query(xapian.Query.OP_AND, xapian.Query('XTrole::program'), xapian.Query(xapian.Query.OP_OR, 'XTinterface::x11', 'XTinterface::3d')), cmdline = xapian.Query(xapian.Query.OP_AND, 'XTrole::program', 'XTinterface::commandline'), editor = xapian.Query(xapian.Query.OP_AND, 'XTrole::program', 'XTuse::editing')) # Feel free to invent more
It's now trivial to lookup a query and AND it to the main query:
# Build the query by ORing together various terms query = xapian.Query(xapian.Query.OP_OR, termlist) # If a filter was requested, AND it with the query if extrafilter: query = xapian.Query(xapian.Query.OP_AND, query, filterdb[extrafilter])
That's it. Simple, trivial feature to add to a search, but very useful and simple to use.
You can use the wsvn interface to get to the full source code and the module it uses. Try running it as:
axi-query-pkgtype.py --type=gui image editor
axi-query-pkgtype.py --type=cmdline image editor
axi-query-pkgtype.py --type=game image editor
you will see that the results change radically.
What are the games that match "image editor", you would ask?
74% xjig - An X11 jigsaw puzzle
69% wmpuzzle - WindowMaker dock app 4x4 puzzle
66% xflip - programs to mirror-image or melt your display
Jigsaws, of course 
The code is now colourised thanks to Simon Ward who pointed me at the syntax plugin for ikiwiki.
Next in the series: suggesting new terms for improving the query.
Posted Fri 26 Oct 2007 15:50:18 CEST