Package suggestions

I tried generating a list of 10 suggested packages for every package in the archive. You can get it at http://people.debian.org/~enrico/Suggestions.gz and build nice things with it.

Generation took two hours as a nice 18 process on gluck, which is heavily loaded, but it can be run on any faster or less busy machine that has a local copy of the unaggregated popcon data.

This is (roughly) the generating algorithm:

  1. Take a package P.
  2. Look up P in a Xapian index that stores popcon submissions as documents and the packages they list as terms.
  3. Get the first 20 resulting popcon votes.
  4. Score each package mentioned in those 20 results by combining how many votes mention it, its TF-IDF score in each vote, and the Xapian relevance score of those votes.
  5. Take the top 10 packages as suggestions for P.
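The steps above can be sketched in Python on in-memory data. This is not the code from the bzr branch: instead of a Xapian index it scans a hypothetical list of submissions (each a set of package names) and ranks them with a simplified BM25 formula as a stand-in for Xapian's relevance score. How the three factors in step 4 are combined is not stated above, so the weighting used here (vote count times the sum of relevance × TF-IDF) is an assumption, as are the function name `suggest` and the sample data.

```python
import math
from collections import Counter, defaultdict


def suggest(target, submissions, n_votes=20, n_suggestions=10, k1=1.2, b=0.75):
    """Hypothetical sketch of the suggestion steps on in-memory popcon data.

    submissions: list of sets of package names, one set per submission.
    """
    n_docs = len(submissions)
    avg_len = sum(len(s) for s in submissions) / n_docs
    # Document frequency: in how many submissions each package appears.
    df = Counter(pkg for s in submissions for pkg in s)

    def tfidf_idf(pkg):
        # Plain IDF: a package listed in every submission gets 0.
        return math.log(n_docs / df[pkg])

    def bm25_idf(pkg):
        # Smoothed BM25-style IDF, always positive.
        n = df[pkg]
        return math.log(1 + (n_docs - n + 0.5) / (n + 0.5))

    def relevance(sub):
        # Simplified one-term BM25 with term frequency 1; only a stand-in
        # for the relevance score Xapian would compute.
        norm = 1 - b + b * len(sub) / avg_len
        return bm25_idf(target) * (k1 + 1) / (1 + k1 * norm)

    # Steps 2-3: submissions mentioning the target, best-ranked first.
    hits = [s for s in submissions if target in s]
    hits.sort(key=relevance, reverse=True)
    top = hits[:n_votes]

    # Step 4: gather evidence for every other package in those votes.
    votes = Counter()
    weighted = defaultdict(float)
    for sub in top:
        rel = relevance(sub)
        for pkg in sub:
            if pkg == target:
                continue
            # Each package is listed once per submission, so tf = 1/len(sub).
            tfidf = tfidf_idf(pkg) / len(sub)
            votes[pkg] += 1
            weighted[pkg] += rel * tfidf
    scores = {pkg: votes[pkg] * weighted[pkg] for pkg in votes}

    # Step 5: best scores first, ties broken by package name.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:n_suggestions]


# Made-up sample data, for illustration only.
SUBMISSIONS = [
    {"vim", "vim-scripts", "exuberant-ctags", "bash"},
    {"vim", "vim-scripts", "bash"},
    {"vim", "exuberant-ctags", "bash"},
    {"emacs", "bash"},
    {"emacs", "bash", "aspell"},
]

if __name__ == "__main__":
    print(suggest("vim", SUBMISSIONS, n_suggestions=2))
    # prints ['exuberant-ctags', 'vim-scripts']
```

The TF-IDF factor is what keeps packages installed almost everywhere, such as `bash` in the sample, from dominating the suggestions: its IDF is zero, so it scores nothing however many votes mention it.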

The code can be fetched with:

bzr branch http://people.debian.org/~enrico/2007-01/popcon/