Filtering planet entries
Here is how to set up liferea so that it hides some entries from Planet Debian:
- Create a script that reads the rss from stdin, removes the entries you don't want and then writes the rss to stdout;
- From the feed properties in liferea, choose the source tab, enable the conversion filter and point that at your script.
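Since the contract in the two steps above is just "feed in on stdin, feed out on stdout", a do-nothing filter is a handy way to check that liferea is really calling your script before you start removing entries. This is only a minimal sketch in Python 3; the names passthrough and main are made up for illustration:

```python
#!/usr/bin/env python3
# Sketch: an identity conversion filter, useful only to test the wiring.
# The function names are illustrative, not part of liferea.
import shutil
import sys


def passthrough(src, dst):
    """Copy the feed from src to dst without changing it."""
    shutil.copyfileobj(src, dst)


def main():
    # Binary streams, so the feed's own encoding is left untouched
    passthrough(sys.stdin.buffer, sys.stdout.buffer)

# When installing this as a filter, finish the script with a call to main().
```

If the feed still shows up unchanged in liferea with this in place, the filter is hooked up and you can swap in the real thing.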
Now you just need a simple script that filters the RSS. Here is mine:
#!/usr/bin/python
# Copyright (C) 2007 Enrico Zini <enrico@debian.org>
# This software is licensed under the terms of the GNU General Public
# License, version 2 or later.
import libxml2, re

# What links we should filter out
unwanted = re.compile(r"^(http://feed1.example.com|http://feed2.example.com)")

# "-" makes libxml2 read the document from stdin
doc = libxml2.parseFile("-")
root = doc.getRootElement()

# Create an xpath context and register the namespaces
xpc = doc.xpathNewContext()
for d in root.nsDefs():
    if d.name is None:
        # The default namespace has no prefix: register it as "rss"
        xpc.xpathRegisterNs("rss", d.content)
    else:
        xpc.xpathRegisterNs(d.name, d.content)

# Remove unwanted items from the channel list
for x in xpc.xpathEval("/rdf:RDF/rss:channel/rss:items/rdf:Seq/rdf:li"):
    res = x.nsProp("resource", "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
    # Skip entries that lack the attribute instead of crashing on None
    if res and unwanted.match(res):
        x.unlinkNode()
        x.freeNode()

# Remove unwanted items from the item list
for x in xpc.xpathEval("/rdf:RDF/rss:item"):
    res = x.nsProp("about", "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
    if res and unwanted.match(res):
        x.unlinkNode()
        x.freeNode()

# Serialize the result to stdout. The return value (a byte count) is not
# printed, otherwise it would end up appended to the feed.
doc.saveFormatFile("-", True)
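For comparison, the same filtering idea can probably be done with nothing but Python's standard library. What follows is a rough sketch, not the script I actually use: it is Python 3, it relies on xml.etree.ElementTree, the names filter_rss10 and main are made up, and it assumes the feed is RSS 1.0 with http://purl.org/rss/1.0/ as its default namespace. One caveat: ElementTree does not preserve namespace prefixes it has not been told about, so prefixes other than the ones registered here may come out renamed (the document stays equivalent, it just looks different).

```python
#!/usr/bin/env python3
# Sketch only: filtering RSS 1.0 with xml.etree.ElementTree instead of libxml2.
# Function names and URLs are illustrative assumptions.
import re
import sys
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"  # assumed default namespace of the feed

# What links we should filter out (placeholder URLs)
unwanted = re.compile(r"^(http://feed1\.example\.com|http://feed2\.example\.com)")


def filter_rss10(xml_bytes):
    """Return the RSS 1.0 document (as bytes) with unwanted items removed."""
    # Keep the usual prefixes on output instead of ns0, ns1, ...
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("", RSS)
    root = ET.fromstring(xml_bytes)

    # Remove unwanted <rdf:li> entries from the channel's item list.
    # list() takes a copy so that removing while looping is safe.
    for seq in list(root.iter("{%s}Seq" % RDF)):
        for li in list(seq):
            if unwanted.match(li.get("{%s}resource" % RDF, "")):
                seq.remove(li)

    # Remove unwanted top-level <item> elements (findall returns a list)
    for item in root.findall("{%s}item" % RSS):
        if unwanted.match(item.get("{%s}about" % RDF, "")):
            root.remove(item)

    return ET.tostring(root, encoding="utf-8")


def main():
    # Read the feed from stdin and write the filtered feed to stdout
    sys.stdout.buffer.write(filter_rss10(sys.stdin.buffer.read()))

# When installing this as a filter, finish the script with a call to main().
```

The filtering lives in a function that takes and returns bytes, so it can be tried on a saved copy of the feed before pointing liferea at it.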
Now, getting to this simple script took some spitting blood. Basically, in Debian we seem to have lots of simple libraries for:
- parsing rss, but not outputting it;
- outputting rss, but not parsing it;
- parsing and outputting rss, but not modifying it.
I tried, in order:
- The standard ruby rss module, after seeing this. However, rss.channel.items doesn't seem to be a normal array anymore, and I could not find any documentation on how to modify it.
- python-feedparser allows you to read rss and change it, but not to serialize it.
- libxml-rss-perl can read, modify and serialize, but serializing loses all the content of the items. Try this script and see:
  #!/usr/bin/perl -w
  use strict;
  use warnings;
  use XML::RSS;
  my $rss = new XML::RSS;
  $rss->parsefile("/tmp/rss10.xml");
  print $rss->as_string;
  Update: Nemui Ailin told me that with the most recent upstream version it works. I've reported the bug.
- libxml-rsslite-perl does not serialize. Plus, it parses rss via crude regexps and its manpage has a longish list of things that can go wrong.
- libmrss0-dev has only a README that points to example files that are not packaged. I reported it as a bug.
- Every other module I could find that mentioned rss had a description that clearly showed it lacked one of the three features I needed (read, edit, reserialize). From a quick look at the code, I couldn't tell whether cl-rss supports serialisation.