I have already posted about apt-xapian-index before.

Today I'll show how to create tag clouds. Not only that, but I'll show how to implement tag clouds that change as the user types a query.

This example uses python-gtk, and has been created together with Matteo Zandi.

[Screenshot: axi-searchcloud]

Generating a tag cloud out of any Xapian query is simple: it is just a matter of presenting as a tag cloud the information that you get with the technique shown in "a smart way of querying tags". You get the tags related to the query, and you lay out their names with a font size proportional to their Xapian weight.

For the presentation, we can load pretty names from the Debtags vocabulary in /var/lib/debtags/vocabulary:

from debian import deb822  # called debian_bundle in older python-debian releases

# Facet name -> Short description
facets = dict()
# Tag name -> Short description
tags = dict()
with open("/var/lib/debtags/vocabulary", "r") as vocabulary:
    for p in deb822.Deb822.iter_paragraphs(vocabulary):
        if "Description" not in p:
            continue
        # The first line of the description is the short, human-readable name
        desc = p["Description"].split("\n", 1)[0]
        if "Tag" in p:
            tags[p["Tag"]] = desc
        elif "Facet" in p:
            facets[p["Facet"]] = desc
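
To see what that loop produces without a Debtags installation at hand, here is a minimal sketch that parses the same paragraph-based layout using only the standard library. The `parse_paragraphs` helper and the sample entries are made up for illustration: they stand in for `deb822.Deb822.iter_paragraphs` and for the real vocabulary file, and they only handle the simple cases shown here.

```python
# Sample text in the same "Field: value" paragraph layout as the vocabulary.
# The entries are illustrative, not copied from /var/lib/debtags/vocabulary.
SAMPLE = """\
Facet: role
Description: Role
 Role performed by the package

Tag: role::program
Description: Program
 Executable computer program

Tag: role::undocumented
"""


def parse_paragraphs(text):
    """Hypothetical stand-in for deb822: yield one dict per paragraph."""
    for block in text.strip().split("\n\n"):
        fields, last = {}, None
        for line in block.splitlines():
            if line[:1] in (" ", "\t") and last is not None:
                # Continuation line: append it to the previous field
                fields[last] += "\n" + line.strip()
            else:
                name, _, value = line.partition(":")
                last = name.strip()
                fields[last] = value.strip()
        yield fields


facets, tags = {}, {}
for p in parse_paragraphs(SAMPLE):
    if "Description" not in p:
        continue  # entries without a description get no pretty name
    desc = p["Description"].split("\n", 1)[0]
    if "Tag" in p:
        tags[p["Tag"]] = desc
    elif "Facet" in p:
        facets[p["Facet"]] = desc

print(facets)
print(tags)
```

The third sample entry has no description, so it is skipped and only the facet `role` and the tag `role::program` end up in the two dictionaries.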

The query then goes on as usual. When we get the tags from the eset we also record their score and normalise it to the interval [0, 1]. I found that taking the logarithm of the scores helps to avoid a tag cloud with a few huge tags and a lot of tiny ones:

import math
import xapian

class Filter(xapian.ExpandDecider):
    # Only accept Debtags tags, which are indexed with the "XT" prefix
    def __call__(self, term):
        return term[:2] == "XT"

def format(k):
    if k in tags:
        facet = k.split("::", 1)[0]
        if facet in facets:
            return "<i>%s: %s</i>" % (facets[facet], tags[k])
        else:
            return "<i>%s</i>" % tags[k]
    else:
        return k

taglist = []
maxscore = None
# The eset is sorted by decreasing weight, so the first entry has the top score
for res in enquire.get_eset(15, rset, Filter()):
    # Normalise the score in the interval [0, 1]
    weight = math.log(res.weight)
    if maxscore is None:
        maxscore = weight
    tag = res.term[2:]  # strip the "XT" prefix
    taglist.append(
        (tag, format(tag), float(weight) / maxscore)
    )
# Sort alphabetically by tag name for display
taglist.sort(key=lambda x: x[0])
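
To make the effect of the logarithm concrete, here is a small sketch that compares linear and logarithmic normalisation on made-up weights. The `normalise` helper is hypothetical, and it uses `math.log1p` instead of the plain `math.log` above so that the result stays positive even for weights below 1; that is a variation of mine, not what the example code does.

```python
import math


def normalise(weights, use_log=True):
    """Scale positive weights so that the largest one maps to 1.0."""
    # log1p(w) = log(1 + w) stays positive for any w > 0
    scaled = [math.log1p(w) if use_log else w for w in weights]
    top = max(scaled)
    return [s / top for s in scaled]


# Made-up expansion weights, highest first
weights = [120.0, 40.0, 10.0, 3.0]

linear = normalise(weights, use_log=False)
logged = normalise(weights)

for w, lin, lg in zip(weights, linear, logged):
    print("%6.1f  linear=%.2f  log=%.2f" % (w, lin, lg))
```

With linear scaling the smallest tag would be drawn at 2.5% of the size of the largest one; with the logarithm it stays at roughly 29%, which keeps it readable.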

Finally, you mark up a gtkhtml2.Document to display in a gtkhtml2 widget:

def mark_text_up(result_list):
    # result_list holds (tag, description, score) tuples, with score in [0, 1]
    document = gtkhtml2.Document()
    document.clear()
    document.open_stream("text/html")
    document.write_stream("""<html><head>
<style type="text/css">
a { text-decoration: none; color: black; }
</style>
</head><body>""")
    for tag, desc, score in result_list:
        document.write_stream('<a href="%s" style="font-size: %d%%">%s</a> ' % (tag, score*150, desc))
    document.write_stream("</body></html>")
    document.close_stream()
    return document
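
If you want to check the generated markup without a gtkhtml2 widget, the same idea can be sketched as a function that returns a plain string. The `cloud_html` name and the `min_percent` floor are my own additions for illustration: the floor keeps low-scoring tags from shrinking to an unreadable size. Unlike the function above, this sketch expects plain-text descriptions and escapes them.

```python
from html import escape


def cloud_html(entries, max_percent=150, min_percent=60):
    """Build the <a> elements of a tag cloud from (tag, desc, score) tuples.

    score is expected in [0, 1]; min_percent is a hypothetical lower bound
    on the font size, not part of the original example.
    """
    parts = []
    for tag, desc, score in entries:
        size = max(min_percent, int(round(score * max_percent)))
        parts.append(
            '<a href="%s" style="font-size: %d%%">%s</a>'
            % (escape(tag, quote=True), size, escape(desc))
        )
    return " ".join(parts)


# Made-up entries in the same shape as taglist
sample = [
    ("role::program", "Role: Program", 1.0),
    ("use::searching", "Purpose: Search & find", 0.1),
]
html_cloud = cloud_html(sample)
print(html_cloud)
```

The top-scoring tag is rendered at 150% and the low-scoring one is clamped to 60% instead of the 15% it would get from the score alone.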

That's it, try it out.

You can use the git web interface to get to the full source code and the module it uses.


2009-06-06 00:57:39+02:00