Pages in English.
My rule to see if a framework is worthy of attention
I came up with a little rule:
In order to be worthy of any attention, a framework must be stable enough that I can charge money to train people to use it.
This probably applies to other kinds of software stacks, libraries, development environments and, well, to most software applications.
In the context of python web frameworks, this means that:
- If it changes API all the time it is not worthy of attention, because my customers won't get value for their money, as they'd continuously need retraining and would have to keep rewriting their software.
- If I see lots of DeprecationWarnings it is not worthy of attention, because my customers will see them and blame me for teaching them deprecated stuff.
- If fixes for bugs affecting the stable version are only distributed "in a recent git" or "in the next development version", and they are not backported into a new bugfix-only stable release, then it is not worthy of attention, because:
  - My customers' business is to develop their own products based on the framework.
  - My customers' business is not to be maintaining in-house stable updates of the framework. Although if the framework's community is nice enough they might end up giving a hand.
- If it requires virtualenv or can only be obtained through easy_install it is not worthy of attention, because:
  - My customers are not interested in maintaining custom deployment environments over time.
  - My customers are not interested in tracking each and every single library's upstream development to keep their production system free of bugs.
  - My customers are used to getting software through a proper distribution which also takes care of security updates.
  - I am paid to teach them how to use a framework, not a custom python-only package management system.
  - In my experience, if distributions have trouble keeping packages up to date, upstream is doing something fundamentally wrong.
In light of this rule, I regret to notice that I see very few python web frameworks worthy of any attention.
On python stable APIs
There is a theory which states that if ever anyone discovers exactly what the Universe is for and why it is here, it will instantly disappear and be replaced by something even more bizarre and inexplicable.
There is another theory which states that this has already happened.
In Debian testing:
/usr/lib/python2.6/dist-packages/sqlalchemy/types.py:547: SADeprecationWarning: The Binary type has been renamed to LargeBinary.
In Debian Lenny:
ImportError: cannot import name LargeBinary
I was starting to think that SQLAlchemy wasn't too bad, since I've been using it for 6 months and I haven't seen its API change yet.
But there it is, a beautiful reminder that SQLAlchemy, too, is part of the marvelously autistic Python ecosystem.
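A common way to live with a rename like this across two releases is a fallback import: try the new name, fall back to the old one. The sketch below is a generic version of that idiom; `first_available` is a hypothetical helper, not part of SQLAlchemy, and the standard library module in the last line only stands in for `sqlalchemy` so that the sketch runs anywhere.

```python
import importlib


def first_available(module_name, *names):
    """Return the first attribute of module_name that exists, trying names in order.

    Hypothetical helper: it generalises the usual
    "try: from m import New / except ImportError: from m import Old" idiom.
    """
    module = importlib.import_module(module_name)
    for name in names:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError("none of %s found in %s" % (", ".join(names), module_name))


# With SQLAlchemy installed, and assuming both names live in the top-level
# package, this would read:
#   LargeBinary = first_available("sqlalchemy", "LargeBinary", "Binary")
# Stand-in using the standard library:
OrderedDict = first_available("collections", "NoSuchName", "OrderedDict")
```

This silences neither the warning on the new release nor the need to retest on both; it only keeps one code base importable on each.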
Released cfget 0.16
I have released version 0.16 of cfget.
It is just a little bugfix release as I found a bug in the new expression parser, and while I was at it I simplified its code quite a bit.
On python, frameworks and TOOWTDI
The Python world is riddled with frameworks, microframeworks, metaframeworks and the like. They are often very clever things, but more often than not they are a tool of despair.
A very peculiar thing about Python web frameworkish things is that there are so many of them. There's cherrypy (in its various API redesigns), fapws, gunicorn, bottle and flask, paste, werkzeug and flup, tornado, pylons, turbogears 1 and 2, django, repoze who, what and whatnot, all the myriad of rendering engines and buffet as a metathing on top of them, diesel, twisted, and I apologise if I don't spend my day listing and hyperlinking them all, I hope I made my point.
Frameworks are supposed to standardise some aspects of programming; the nice thing about standards is that there are so many of them to choose from, and they all suck, so I'll make my own.
But wasn't Python supposed to be the world of TIOOWTDI?
Ok, everybody knows it isn't. Just in the standard library there are 2 implementations of pickle and 2 urllibs. But people like the TIOOWTDI idea.
I believe the reason people like the TIOOWTDI idea is because it creates a framework. It standardises some aspects of programming, and defines building blocks that guarantee that people doing similar jobs will be using similar sets of components.
Let's take for example the datetime module in the standard library. It is an embarrassing example of a badly designed module, so embarrassing that the standard library documentation continuously fails to document its fundamental design flaws and common work-arounds, hoping that no one notices them; as a consequence, each poor soul starting to use it for nontrivial things has to google for hours in despair to rediscover in how many ways it's broken.
But still, datetime works as a structure to hold those values that make a date, time, or full UTC timestamp. For that job it's become the standard, and as such it's an important component of the Python TIOOWTDI framework: one can use it to exchange datetimes among different libraries: for example ORMs are using it instead of rolling their own, which makes database programming so much easier when date/time is involved.
Even if the implementation is far from perfect, once we apply TIOOWTDI to dates and timestamps, python code from different authors can exchange dates without worries. This is much better than having 3 different superior datetime libraries and having to convert date objects from one to another when passing values from a web form to an ORM.
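As a small illustration of that exchange role, the sketch below parses a timestamp as it might arrive from a web form (the format string is an assumption) and hands the very same object to unrelated parts of the standard library, with no conversion layer in between.

```python
import calendar
import json
from datetime import datetime, timezone

# A timestamp as it might arrive from a web form (the format is an assumption).
form_value = "2010-07-09 23:44:00"
when = datetime.strptime(form_value, "%Y-%m-%d %H:%M:%S")

# Seconds since 1970, treating the naive value as UTC.
epoch = calendar.timegm(when.utctimetuple())

# ISO 8601 text, ready to be serialised.
as_json = json.dumps({"when": when.isoformat()})

# And back again from the epoch value to an equal naive datetime.
back = datetime.fromtimestamp(epoch, timezone.utc).replace(tzinfo=None)
```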
There is an often overlooked Python framework. The Python framework. It's called TIOOWTDI.
All the micro-mini-midi-maxi-meta-frameworks that people scatter around, are, or should be, just experiments, proofs of concept, competing ideas waiting to be distilled in The Only One Way, bringing the Python experience one step forward.
What is unfortunate is that this last distillation thing happens so rarely that people get used to the idea of having to use proof of concept code to get things done.
Update:
this post apparently wasn't very clear, so here is some clarification:
- that python, plus the idea of TIOOWTDI, in fact generate a framework, the framework;
- that framework currently includes very little web stuff;
- the web stuff is currently shipped in countless prototypes and proof of concept things;
- something from all the prototypes people develop ought to be brought into the TIOOWTDI "framework";
- otherwise people get used to (or worse, are forced into) using proof of concept throwaway code to get their job done.
Released cfget 0.15
I have released version 0.15 of cfget.
cfget is a tool to extract values from ini-style config files. A trivial thing, really.
It is also simple to install: it is a single python executable and it has no dependencies besides the python standard library.
It is trivial and simple, but because of the complex requirements (and sponsorship) of ISAC - CNR it has recently accumulated quite a set of features, and it manages to get a remarkable lot of things done.
There is quite a bit of news since 0.8, worth a rather major announcement:
Added --dump=pickle
Now all the contents of the configuration files, plus all the contents generated by cfget plugins, can be dumped in pickle format.
This provides a quick and dirty way to load all cfget-generated data into a python dict:
data = pickle.loads( subprocess.Popen( ["cfget", "--dump=pickle"], stdout=subprocess.PIPE).communicate()[0])
It sounds like a rather complicated way to read a configuration file, but if you have various plugins that compute nontrivial derived configuration values, that quick&dirty hack could be quite useful.
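The same idea can be wrapped in a small function. This is only a sketch: `load_dump` is a hypothetical name, the `general/mode` key layout is an assumption about what the dump contains, and the stand-in command is there so it can be tried without cfget installed.

```python
import pickle
import subprocess
import sys


def load_dump(argv):
    """Run a command that writes a pickle to stdout and return the loaded object."""
    raw = subprocess.check_output(argv)
    # Only unpickle output from commands you trust: pickle can run arbitrary code.
    return pickle.loads(raw)


# Real use, assuming cfget is on PATH:
#   data = load_dump(["cfget", "--dump=pickle"])
# Stand-in command producing a pickled dict (the key layout is made up):
fake_cfget = [
    sys.executable, "-c",
    "import pickle, sys;"
    " sys.stdout.buffer.write(pickle.dumps({'general/mode': 'show'}))",
]
data = load_dump(fake_cfget)
```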
Curly brace expansion
Suppose that you have a config file like this:
[general]
mode = show

[show]
command = display

[edit]
command = gimp
And you want to get the command from the section indicated by general/mode.
Now you can, very easily: cfget '{general/mode}/command'
When it notices curly braces, cfget will literally replace them with the result of querying their contents, then parse the expression again.
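The replace-then-parse-again behaviour can be sketched in a few lines. This is not cfget's code: the `CONFIG` dict, the `query` function and the regular expression are assumptions made for illustration, with the example configuration flattened to `section/key` paths.

```python
import re

# The example configuration, flattened to "section/key" paths.
# (Storing it as a dict is an assumption made for this sketch.)
CONFIG = {
    "general/mode": "show",
    "show/command": "display",
    "edit/command": "gimp",
}

# Matches a pair of braces that contains no other braces.
INNERMOST_BRACES = re.compile(r"\{([^{}]*)\}")


def query(path, config=CONFIG):
    """Replace {...} with the value of its contents, then look up the result."""
    while True:
        expanded = INNERMOST_BRACES.sub(lambda m: str(config[m.group(1)]), path)
        if expanded == path:
            # Nothing left to expand: this is a plain path.
            return config[path]
        # Something was replaced: parse the new expression again.
        path = expanded
```

With the configuration above, `query("{general/mode}/command")` first becomes a lookup of `show/command`.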
Simple expression support
ISAC - CNR are using cfget to configure the run of a rather complicated physical model, and use plugins to derive all sorts of values from the base configuration.
This works, but there are times when adding a function to a plugin sounds like
overkill: for example, sometimes one needs foo/bar + 1, or just the hour of a
timestamp.
For those simple cases, I've added support for simple expressions:
- operators: +, -, *, /, **
- grouping with parentheses
- function calls (int(), round()), with the possibility to define new functions via plugins.
So to compute a middle point one can now do this:
cfget "round((pos/start + pos/end) / 2)"
It needs a space around arithmetic operators to avoid conflicts with characters used to refer to configuration values, but with the space the expressions look nicer, so the result is that it generally does the right thing.
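To see why the spaces matter, here is a rough sketch of one way such expressions could be evaluated. It is not cfget's parser: it borrows Python's `ast` module for the arithmetic, and `CONFIG`, `REFERENCE` and `evaluate` are hypothetical names with made-up values. A reference is recognised as names joined by `/` with no spaces, so `pos/start` is a lookup while `x / 2` is a division.

```python
import ast
import operator
import re

# Hypothetical configuration values, flattened to "section/key" paths.
CONFIG = {"pos/start": 10, "pos/end": 20}

# Two or more names joined by "/" with no spaces in between.
REFERENCE = re.compile(r"[A-Za-z_]\w*(?:/[A-Za-z_]\w*)+")

OPERATORS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Pow: operator.pow,
}
FUNCTIONS = {"int": int, "round": round}  # plugins could add more entries


def _eval(node):
    """Evaluate the small subset of Python syntax used by the expressions."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
        return OPERATORS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in FUNCTIONS and not node.keywords):
        return FUNCTIONS[node.func.id](*[_eval(arg) for arg in node.args])
    raise ValueError("unsupported expression element: %r" % node)


def evaluate(expression, config=CONFIG):
    """Substitute references with their values, then evaluate the arithmetic."""
    substituted = REFERENCE.sub(lambda m: repr(config[m.group(0)]), expression)
    return _eval(ast.parse(substituted, mode="eval"))
```

Anything outside the whitelisted operators and functions raises `ValueError`, which keeps the sketch from turning into a general `eval`.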
Just as a scary thought, curly braces work with and in expressions:
cfget "values/val{round((pos/end - pos/start) / 2)} + 1"
Although if someone ends up having a hairy thing like that, it is worth considering replacing it with a dynamic value computed using a plugin.
Computing time offsets between EXIF and GPS
I like the idea of matching photos to GPS traces. In Debian there is gpscorrelate but it's almost unusable to me because of bug #473362 and it has an awkward way of specifying time offsets.
Here at SoTM10 someone told me that
exiftool gained -geosync and -geotag
options. So it's just a matter of creating a little tool that shows a photo and
asks you to type the GPS time you see in it.
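The arithmetic behind such a tool is small. A minimal sketch, assuming the offset is passed to `-geosync` as a signed `H:MM:SS` string and using made-up timestamps; `geosync_offset` is a hypothetical name:

```python
from datetime import datetime


def geosync_offset(exif_time, gps_time):
    """Return the signed offset to add to the camera clock to get GPS time."""
    seconds = int((gps_time - exif_time).total_seconds())
    sign = "+" if seconds >= 0 else "-"
    seconds = abs(seconds)
    return "%s%d:%02d:%02d" % (
        sign, seconds // 3600, (seconds // 60) % 60, seconds % 60)


# Hypothetical values: camera clock read from EXIF, GPS clock read off the photo.
exif_time = datetime(2010, 4, 5, 14, 30, 12)
gps_time = datetime(2010, 4, 5, 13, 29, 50)
print("-geosync=" + geosync_offset(exif_time, gps_time))
```

Both timestamps are treated as being in the same timezone; the sign is positive when the GPS clock is ahead of the camera clock.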
Apparently there are no bindings or GIR files for gtkimageview in Debian, so I'll have to use C.
Here is a C prototype:
/*
 * gpsoffset - Compute EXIF time offset from a photo of a gps display
 *
 * Use with exiftool -geosync=... -geotag trace.gpx DIR
 *
 * Copyright (C) 2009--2010 Enrico Zini <enrico@enricozini.org>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 */

#define _XOPEN_SOURCE 600 /* glibc2 needs this for strptime and setenv */
#include <time.h>
#include <gtkimageview/gtkimageview.h>
#include <libexif/exif-data.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read the EXIF DateTime tag of fname into tm; return 0 on success, -1 on error */
static int load_time(const char* fname, struct tm* tm)
{
    ExifData* exif_data = exif_data_new_from_file(fname);
    if (exif_data == NULL)
    {
        fprintf(stderr, "Cannot read EXIF data from %s\n", fname);
        return -1;
    }
    ExifEntry* exif_time = exif_data_get_entry(exif_data, EXIF_TAG_DATE_TIME);
    if (exif_time == NULL)
    {
        fprintf(stderr, "Cannot find EXIF timestamp\n");
        return -1;
    }

    char buf[1024];
    exif_entry_get_value(exif_time, buf, 1024);
    //printf("val2: %s\n", exif_entry_get_value(t2, buf, 1024));

    /* strptime only sets the fields it parses, so start from a clean struct */
    memset(tm, 0, sizeof(*tm));
    if (strptime(buf, "%Y:%m:%d %H:%M:%S", tm) == NULL)
    {
        fprintf(stderr, "Cannot match EXIF timestamp\n");
        return -1;
    }

    return 0;
}

static time_t exif_ts;
static GtkWidget* res_lbl;

/* Recompute the offset every time the text in the date entry changes */
void date_entry_changed(GtkEditable *editable, gpointer user_data)
{
    const gchar* text = gtk_entry_get_text(GTK_ENTRY(editable));
    struct tm parsed;
    memset(&parsed, 0, sizeof(parsed));
    if (strptime(text, "%Y-%m-%d %H:%M:%S", &parsed) == NULL)
    {
        gtk_label_set_text(GTK_LABEL(res_lbl), "Please enter a date as YYYY-MM-DD HH:MM:SS");
    } else {
        time_t img_ts = mktime(&parsed);
        int c;
        int res;
        if (exif_ts < img_ts)
        {
            c = '+';
            res = img_ts - exif_ts;
        } else {
            c = '-';
            res = exif_ts - img_ts;
        }
        char buf[1024];
        /* Use >= so that exactly one hour (or one minute) picks the right format */
        if (res >= 3600)
            snprintf(buf, 1024, "Result: %c%ds -geosync=%c%d:%02d:%02d",
                     c, res, c, res / 3600, (res / 60) % 60, res % 60);
        else if (res >= 60)
            snprintf(buf, 1024, "Result: %c%ds -geosync=%c%02d:%02d",
                     c, res, c, (res / 60) % 60, res % 60);
        else
            snprintf(buf, 1024, "Result: %c%ds -geosync=%c%d",
                     c, res, c, res);
        gtk_label_set_text(GTK_LABEL(res_lbl), buf);
    }
}

int main (int argc, char *argv[])
{
    // Work in UTC to avoid mktime applying DST or timezones
    setenv("TZ", "UTC", 1);

    const char* filename = "/home/enrico/web-eddie/galleries/2010/04-05-Uppermill/P1080932.jpg";

    gtk_init (&argc, &argv);

    struct tm exif_time;
    if (load_time(filename, &exif_time) != 0)
        return 1;

    printf("EXIF time: %s\n", asctime(&exif_time));
    exif_ts = mktime(&exif_time);

    GtkWidget* window = gtk_window_new(GTK_WINDOW_TOPLEVEL);
    GtkWidget* vb = gtk_vbox_new(FALSE, 0);
    GtkWidget* hb = gtk_hbox_new(FALSE, 0);
    GtkWidget* lbl = gtk_label_new("Timestamp:");
    GtkWidget* exif_lbl;
    {
        char buf[1024];
        strftime(buf, 1024, "EXIF time: %Y-%m-%d %H:%M:%S", &exif_time);
        exif_lbl = gtk_label_new(buf);
    }
    GtkWidget* date_ent = gtk_entry_new();
    res_lbl = gtk_label_new("Result:");
    GtkWidget* view = gtk_image_view_new();
    GdkPixbuf* pixbuf = gdk_pixbuf_new_from_file(filename, NULL);

    gtk_box_pack_start(GTK_BOX(hb), lbl, FALSE, TRUE, 0);
    gtk_box_pack_start(GTK_BOX(hb), date_ent, TRUE, TRUE, 0);

    gtk_signal_connect(GTK_OBJECT(date_ent), "changed", (GCallback)date_entry_changed, NULL);
    {
        /* Pre-fill the entry with the EXIF time so only the difference needs typing */
        char buf[1024];
        strftime(buf, 1024, "%Y-%m-%d %H:%M:%S", &exif_time);
        gtk_entry_set_text(GTK_ENTRY(date_ent), buf);
    }

    gtk_widget_set_size_request(view, 500, 400);
    gtk_image_view_set_pixbuf(GTK_IMAGE_VIEW(view), pixbuf, TRUE);
    gtk_container_add(GTK_CONTAINER(window), vb);
    gtk_box_pack_start(GTK_BOX(vb), view, TRUE, TRUE, 0);
    gtk_box_pack_start(GTK_BOX(vb), hb, FALSE, TRUE, 0);
    gtk_box_pack_start(GTK_BOX(vb), exif_lbl, FALSE, TRUE, 0);
    gtk_box_pack_start(GTK_BOX(vb), res_lbl, FALSE, TRUE, 0);
    gtk_widget_show_all(window);

    gtk_main ();

    return 0;
}
And here is its simple makefile:
CFLAGS=$(shell pkg-config --cflags gtkimageview libexif)
LDFLAGS=$(shell pkg-config --libs gtkimageview libexif)

gpsoffset: gpsoffset.c
It's a simple prototype but it's a working prototype and seems to do the job for me.
I currently cannot find out why after I click on the text box, there seems to be no way to give the focus back to the image viewer so I can control it with keys.
There is another nice algorithm to compute time offsets to be implemented: you choose a photo taken from a known place and drag it on that place on a map: you can then look for the nearest point on your GPX trace and compute the time offset from that.
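A minimal sketch of that second algorithm, assuming the trace has already been read into `(lat, lon, UTC time)` tuples and the photo position comes from wherever the user dropped it on the map. The function names and sample coordinates are made up for illustration.

```python
import math
from datetime import datetime


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula)."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlambda / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def offset_from_place(track, lat, lon, exif_time):
    """Find the trackpoint nearest to (lat, lon) and return gps_time - exif_time."""
    nearest = min(track, key=lambda p: distance_m(p[0], p[1], lat, lon))
    return nearest[2] - exif_time


# Hypothetical trace: (lat, lon, UTC time) tuples as they might come from a GPX file.
track = [
    (41.9800, 2.8200, datetime(2010, 7, 10, 9, 0, 0)),
    (41.9810, 2.8215, datetime(2010, 7, 10, 9, 1, 0)),
    (41.9825, 2.8230, datetime(2010, 7, 10, 9, 2, 0)),
]
offset = offset_from_place(track, 41.9811, 2.8214, datetime(2010, 7, 10, 10, 1, 30))
```

If the trace passes through the same place more than once the nearest point is ambiguous, so a real tool would probably want to show the candidates and let the user pick.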
I have seen that there are programs for geotagging photos that implement all such algorithms, and have a nice UI, but I haven't seen any in Debian.
Is there any such software that can be packaged?
If not, the interpolation and annotation tasks can now already be performed by exiftool, so it's just a matter of building a good UI, and I would love to see someone picking up the task.
Searching OSM nodes in Spatialite
Third step of my SoTM10 pet project: finding the POIs.
I put together a query to find all nodes with a given tag inside a bounding box, and also a query to find all the tag values for a given tag name inside a bounding box.
The result is this simple POI search engine:
#
# poisearch - simple geographical POI search engine
#
# Copyright (C) 2010 Enrico Zini <enrico@enricozini.org>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#

import simplejson
from pysqlite2 import dbapi2 as sqlite

class PoiDB(object):
    def __init__(self):
        self.db = sqlite.connect("pois.db")
        self.db.enable_load_extension(True)
        self.db.execute("SELECT load_extension('libspatialite.so')")
        # Rows of the last search, kept so that replay() can yield them again
        self.oldsearch = []
        self.bbox = None

    def set_bbox(self, xmin, xmax, ymin, ymax):
        '''Set bbox for searches'''
        self.bbox = (xmin, xmax, ymin, ymax)

    def tagid(self, name, val):
        '''Get the database ID for a tag'''
        c = self.db.cursor()
        c.execute("SELECT id FROM tag WHERE name=? AND value=?", (name, val))
        res = None
        for row in c:
            res = row[0]
        return res

    def tagnames(self):
        '''Get all tag names'''
        c = self.db.cursor()
        c.execute("SELECT DISTINCT name FROM tag ORDER BY name")
        for row in c:
            yield row[0]

    def tagvalues(self, name, use_bbox=False):
        '''
        Get all tag values for a given tag name,
        optionally in the current bounding box
        '''
        c = self.db.cursor()
        if self.bbox is None or not use_bbox:
            c.execute("SELECT DISTINCT value FROM tag WHERE name=? ORDER BY value", (name,))
        else:
            c.execute("SELECT DISTINCT tag.value FROM poi, poitag, tag"
                      " WHERE poi.rowid IN (SELECT pkid FROM idx_poi_geom WHERE ("
                      " xmin >= ? AND xmax <= ? AND ymin >= ? AND ymax <= ?) )"
                      " AND poitag.tag = tag.id AND poitag.poi = poi.id"
                      " AND tag.name=?",
                      self.bbox + (name,))
        for row in c:
            yield row[0]

    def search(self, name, val):
        '''Get all name:val tags in the current bounding box'''
        # First resolve the tagid
        tagid = self.tagid(name, val)
        if tagid is None: return

        c = self.db.cursor()
        c.execute("SELECT poi.name, poi.data, X(poi.geom), Y(poi.geom) FROM poi, poitag"
                  " WHERE poi.rowid IN (SELECT pkid FROM idx_poi_geom WHERE ("
                  " xmin >= ? AND xmax <= ? AND ymin >= ? AND ymax <= ?) )"
                  " AND poitag.tag = ? AND poitag.poi = poi.id",
                  self.bbox + (tagid,))
        self.oldsearch = []
        for row in c:
            self.oldsearch.append(row)
            yield row[0], simplejson.loads(row[1]), row[2], row[3]

    def count(self, name, val):
        '''Count all name:val tags in the current bounding box'''
        # First resolve the tagid
        tagid = self.tagid(name, val)
        if tagid is None: return

        c = self.db.cursor()
        c.execute("SELECT COUNT(*) FROM poi, poitag"
                  " WHERE poi.rowid IN (SELECT pkid FROM idx_poi_geom WHERE ("
                  " xmin >= ? AND xmax <= ? AND ymin >= ? AND ymax <= ?) )"
                  " AND poitag.tag = ? AND poitag.poi = poi.id",
                  self.bbox + (tagid,))
        for row in c:
            return row[0]

    def replay(self):
        '''Yield again the results of the last search'''
        for row in self.oldsearch:
            yield row[0], simplejson.loads(row[1]), row[2], row[3]
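To play with the shape of the two queries without Spatialite at hand, here is a sketch on plain SQLite. It swaps the spatial index lookup for ordinary comparisons on `lon` and `lat` columns, which is an assumption of this sketch and not how the code above works; the sample rows are made up.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE poi (id INTEGER PRIMARY KEY, name TEXT, lon REAL, lat REAL);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT, value TEXT);
    CREATE TABLE poitag (poi INTEGER NOT NULL, tag INTEGER NOT NULL);
""")
# Made-up sample rows, only there to exercise the queries.
db.executemany("INSERT INTO poi VALUES (?, ?, ?, ?)", [
    (1, "Fountain A", 2.61, 41.95),
    (2, "Fountain B", 3.50, 41.95),   # outside the box used below
    (3, "Bakery C", 2.62, 41.97),
])
db.executemany("INSERT INTO tag VALUES (?, ?, ?)", [
    (1, "amenity", "fountain"), (2, "shop", "bakery"),
])
db.executemany("INSERT INTO poitag VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])

# Plain comparisons standing in for the idx_poi_geom subquery.
BBOX = "poi.lon >= ? AND poi.lon <= ? AND poi.lat >= ? AND poi.lat <= ?"


def search(bbox, name, value):
    """Names of the POIs carrying name=value inside bbox (xmin, xmax, ymin, ymax)."""
    rows = db.execute(
        "SELECT poi.name FROM poi, poitag, tag WHERE " + BBOX +
        " AND poitag.poi = poi.id AND poitag.tag = tag.id"
        " AND tag.name = ? AND tag.value = ? ORDER BY poi.name",
        bbox + (name, value))
    return [row[0] for row in rows]


def tagvalues(bbox, name):
    """Distinct values of the tag called name that occur inside bbox."""
    rows = db.execute(
        "SELECT DISTINCT tag.value FROM poi, poitag, tag WHERE " + BBOX +
        " AND poitag.poi = poi.id AND poitag.tag = tag.id"
        " AND tag.name = ? ORDER BY tag.value",
        bbox + (name,))
    return [row[0] for row in rows]


box = (2.56, 2.90, 41.84, 42.00)
print(search(box, "amenity", "fountain"))
print(tagvalues(box, "shop"))
```

Without an index the comparisons scan the whole table, which is fine for a toy data set but is exactly what the R*Tree index avoids on a country-sized one.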
Problem 3 solved: now on to the next step, building a user interface for it.
Importing OSM nodes into Spatialite
Second step of my SoTM10 pet project: creating a searchable database with the points. What a fantastic opportunity to learn Spatialite.
Learning Spatialite is easy. For example, you can use the two tutorials with catchy titles that assume your best wish in life is to create databases out of shapefiles using a pre-built, i386-only executable GUI binary downloaded over an insecure HTTP connection.
To be fair, the second of those tutorials is called "An almost Idiot's Guide", thus making explicit the requirement of being an almost idiot in order to happily acquire and run software in that way.
Alternatively, you can use A quick tutorial to SpatiaLite which is so quick it has examples that lead you to write SQL queries that trigger all sorts of vague exceptions at insert time. But at least it brought me a long way forward, at which point I could just cross reference things with PostGIS documentation to find out the right way of doing things.
So, here's the importer script, which will probably become my reference example for how to get started with Spatialite, and how to use Spatialite from Python:
#!/usr/bin/python
#
# poiimport - import nodes from OSM into a spatialite DB
#
# Copyright (C) 2010 Enrico Zini <enrico@enricozini.org>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#

import xml.sax
import xml.sax.handler
from pysqlite2 import dbapi2 as sqlite
import simplejson
import sys
import os

class OSMPOIReader(xml.sax.handler.ContentHandler):
    '''
    Filter SAX events in a OSM XML file to keep only nodes with names
    '''
    def __init__(self, consumer):
        self.consumer = consumer

    def startElement(self, name, attrs):
        if name == "node":
            self.attrs = attrs
            self.tags = dict()
        elif name == "tag":
            self.tags[attrs["k"]] = attrs["v"]

    def endElement(self, name):
        if name == "node":
            lat = float(self.attrs["lat"])
            lon = float(self.attrs["lon"])
            id = int(self.attrs["id"])
            #dt = parse(self.attrs["timestamp"])
            uid = self.attrs.get("uid", None)
            uid = int(uid) if uid is not None else None
            user = self.attrs.get("user", None)

            self.consumer(lat, lon, id, self.tags, user=user, uid=uid)

class Importer(object):
    '''
    Create the spatialite database and populate it
    '''
    TAG_WHITELIST = set(["amenity", "shop", "tourism", "place"])

    def __init__(self, filename):
        self.db = sqlite.connect(filename)
        self.db.enable_load_extension(True)
        self.db.execute("SELECT load_extension('libspatialite.so')")
        self.db.execute("SELECT InitSpatialMetaData()")
        self.db.execute("INSERT INTO spatial_ref_sys (srid, auth_name, auth_srid,"
                        " ref_sys_name, proj4text) VALUES (4326, 'epsg', 4326,"
                        " 'WGS 84', '+proj=longlat +ellps=WGS84 +datum=WGS84"
                        " +no_defs')")
        self.db.execute("CREATE TABLE poi (id int not null unique primary key,"
                        " name char, data text)")
        self.db.execute("SELECT AddGeometryColumn('poi', 'geom', 4326, 'POINT', 2)")
        self.db.execute("SELECT CreateSpatialIndex('poi', 'geom')")
        self.db.execute("CREATE TABLE tag (id integer primary key autoincrement,"
                        " name char, value char)")
        self.db.execute("CREATE UNIQUE INDEX tagidx ON tag (name, value)")
        self.db.execute("CREATE TABLE poitag (poi int not null, tag int not null)")
        self.db.execute("CREATE UNIQUE INDEX poitagidx ON poitag (poi, tag)")
        self.tagid_cache = dict()

    def tagid(self, k, v):
        key = (k, v)
        res = self.tagid_cache.get(key, None)
        if res is None:
            c = self.db.cursor()
            c.execute("SELECT id FROM tag WHERE name=? AND value=?", key)
            for row in c:
                self.tagid_cache[key] = row[0]
                return row[0]
            self.db.execute("INSERT INTO tag (id, name, value) VALUES (NULL, ?, ?)", key)
            c.execute("SELECT last_insert_rowid()")
            for row in c:
                res = row[0]
            self.tagid_cache[key] = res
        return res

    def __call__(self, lat, lon, id, tags, user=None, uid=None):
        # Acquire tag IDs
        tagids = []
        for k, v in tags.iteritems():
            if k not in self.TAG_WHITELIST: continue
            for val in v.split(";"):
                tagids.append(self.tagid(k, val))

        # Skip elements that don't have the tags we want
        if not tagids: return

        geom = "POINT(%f %f)" % (lon, lat)
        self.db.execute("INSERT INTO poi (id, geom, name, data)"
                        " VALUES (?, GeomFromText(?, 4326), ?, ?)",
                        (id, geom, tags["name"], simplejson.dumps(tags)))

        for tid in tagids:
            self.db.execute("INSERT INTO poitag (poi, tag) VALUES (?, ?)", (id, tid))

    def done(self):
        self.db.commit()

# Get the output file name
filename = sys.argv[1]

# Ensure we start from scratch
if os.path.exists(filename):
    print >>sys.stderr, filename, "already exists"
    sys.exit(1)

# Import
parser = xml.sax.make_parser()
importer = Importer(filename)
handler = OSMPOIReader(importer)
parser.setContentHandler(handler)
parser.parse(sys.stdin)
importer.done()
Let's run it:
$ ./poiimport pois.db < pois.osm
SpatiaLite version ..: 2.4.0 Supported Extensions:
- 'VirtualShape' [direct Shapefile access]
- 'VirtualDbf' [direct Dbf access]
- 'VirtualText' [direct CSV/TXT access]
- 'VirtualNetwork' [Dijkstra shortest path]
- 'RTree' [Spatial Index - R*Tree]
- 'MbrCache' [Spatial Index - MBR cache]
- 'VirtualFDO' [FDO-OGR interoperability]
- 'SpatiaLite' [Spatial SQL - OGC]
PROJ.4 Rel. 4.7.1, 23 September 2009
GEOS version 3.2.0-CAPI-1.6.0
$ ls -l --si pois*
-rw-r--r-- 1 enrico enrico 17M Jul 9 23:44 pois.db
-rw-r--r-- 1 enrico enrico 37M Jul 9 16:20 pois.osm
$ spatialite pois.db
SpatiaLite version ..: 2.4.0 Supported Extensions:
- 'VirtualShape' [direct Shapefile access]
- 'VirtualDbf' [direct DBF access]
- 'VirtualText' [direct CSV/TXT access]
- 'VirtualNetwork' [Dijkstra shortest path]
- 'RTree' [Spatial Index - R*Tree]
- 'MbrCache' [Spatial Index - MBR cache]
- 'VirtualFDO' [FDO-OGR interoperability]
- 'SpatiaLite' [Spatial SQL - OGC]
PROJ.4 version ......: Rel. 4.7.1, 23 September 2009
GEOS version ........: 3.2.0-CAPI-1.6.0
SQLite version ......: 3.6.23.1
Enter ".help" for instructions
spatialite> select id from tag where name="amenity" and value="fountain";
24
spatialite> SELECT poi.name, poi.data, X(poi.geom), Y(poi.geom) FROM poi, poitag WHERE poi.rowid IN (SELECT pkid FROM idx_poi_geom WHERE (xmin >= 2.56 AND xmax <= 2.90 AND ymin >= 41.84 AND ymax <= 42.00) ) AND poitag.tag = 24 AND poitag.poi = poi.id;
Font Picant de la Cellera|{"amenity": "fountain", "name": "Font Picant de la Cellera"}|2.616045|41.952449
Font de Can Pla|{"amenity": "fountain", "name": "Font de Can Pla"}|2.622354|41.974724
Font de Can Ribes|{"amenity": "fountain", "name": "Font de Can Ribes"}|2.62311|41.979193
It's impressive: I've got all sorts of useful information for the whole of Spain in just 17MB!
Let's put it to practice: I'm thirsty, is there any water fountain nearby?
spatialite> SELECT count(1) FROM poi, poitag WHERE poi.rowid IN (SELECT pkid FROM idx_poi_geom WHERE (xmin >= 2.80 AND xmax <= 2.85 AND ymin >= 41.97 AND ymax <= 42.00) ) AND poitag.tag = 24 AND poitag.poi = poi.id;
0
Ouch! No water fountains mapped in Girona... yet.
Problem 2 solved: now on to the next step, trying to show the results in some usable way.
Filtering nodes out of OSM files
I have a pet project here at SoTM10: create a tool for searching nearby POIs while offline.
The idea is to have something in my pocket (FreeRunner or N900), which doesn't require an internet connection, and which can point me at the nearest fountains, post offices, ATMs, bars and so on.
The first step is to obtain a list of POIs.
In theory one can use Xapi but all the known Xapi servers appear to be down at the moment.
Another attempt is to obtain it by filtering all nodes with the tags we want out of a planet OSM extract. I downloaded the Spanish one and set to work.
First I tried with xmlstarlet, but it ate all the RAM and crashed my laptop, because for some reason, on my laptop the Linux kernels up to 2.6.32 (don't know about later ones) like to swap out ALL running apps to cache I/O operations, which means that heavy I/O operations swap out the very programs performing them, so the system gets caught in some infinite I/O loop and dies. Or at least this is what I've figured out so far.
So, we need SAX. I put together this prototype in Python, which can process a nice 8MB/s of OSM data for quite some time with a constant, low RAM usage:
#!/usr/bin/python
#
# poifilter - extract interesting nodes from OSM XML files
#
# Copyright (C) 2010 Enrico Zini <enrico@enricozini.org>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#

import xml.sax
import xml.sax.handler
import xml.sax.saxutils
import sys

class XMLSAXFilter(xml.sax.handler.ContentHandler):
    '''
    A SAX filter that is a ContentHandler.

    There is xml.sax.saxutils.XMLFilterBase in the standard library but it is
    undocumented, and most of the examples using it you find online are wrong.
    You can look at its source code, and at that point you find out that it is
    an offensive practical joke.
    '''
    def __init__(self, downstream):
        self.downstream = downstream

    # ContentHandler methods

    def setDocumentLocator(self, locator):
        self.downstream.setDocumentLocator(locator)

    def startDocument(self):
        self.downstream.startDocument()

    def endDocument(self):
        self.downstream.endDocument()

    def startPrefixMapping(self, prefix, uri):
        self.downstream.startPrefixMapping(prefix, uri)

    def endPrefixMapping(self, prefix):
        self.downstream.endPrefixMapping(prefix)

    def startElement(self, name, attrs):
        self.downstream.startElement(name, attrs)

    def endElement(self, name):
        self.downstream.endElement(name)

    def startElementNS(self, name, qname, attrs):
        self.downstream.startElementNS(name, qname, attrs)

    def endElementNS(self, name, qname):
        self.downstream.endElementNS(name, qname)

    def characters(self, content):
        self.downstream.characters(content)

    def ignorableWhitespace(self, chars):
        self.downstream.ignorableWhitespace(chars)

    def processingInstruction(self, target, data):
        self.downstream.processingInstruction(target, data)

    def skippedEntity(self, name):
        self.downstream.skippedEntity(name)

class OSMPOIHandler(XMLSAXFilter):
    '''
    Filter SAX events in a OSM XML file to keep only nodes with names
    '''
    PASSTHROUGH = ["osm", "bound"]
    TAG_WHITELIST = set(["amenity", "shop", "tourism", "place"])

    def startElement(self, name, attrs):
        if name in self.PASSTHROUGH:
            self.downstream.startElement(name, attrs)
        elif name == "node":
            self.attrs = attrs
            self.tags = []
            self.propagate = False
        elif name == "tag":
            if self.tags is not None:
                self.tags.append(attrs)
                if attrs["k"] in self.TAG_WHITELIST:
                    self.propagate = True
        else:
            self.tags = None
            self.attrs = None

    def endElement(self, name):
        if name in self.PASSTHROUGH:
            self.downstream.endElement(name)
        elif name == "node":
            if self.propagate:
                self.downstream.startElement("node", self.attrs)
                for attrs in self.tags:
                    self.downstream.startElement("tag", attrs)
                    self.downstream.endElement("tag")
                self.downstream.endElement("node")

    def ignorableWhitespace(self, chars):
        pass

    def characters(self, content):
        pass

# Simple stdin->stdout XML filter
parser = xml.sax.make_parser()
handler = OSMPOIHandler(xml.sax.saxutils.XMLGenerator(sys.stdout, "utf-8"))
parser.setContentHandler(handler)
parser.parse(sys.stdin)
Let's run it:
$ bzcat /store/osm/spain.osm.bz2 | pv | ./poifilter > pois.osm
[...]
$ ls -l --si pois.osm
-rw-r--r-- 1 enrico enrico 19M Jul 10 23:56 pois.osm
$ xmlstarlet val pois.osm
pois.osm - valid
Problem 1 solved: now on to the next step: importing the nodes in a database.
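For comparison, the standard library offers another streaming approach besides SAX: `xml.etree.ElementTree.iterparse`. The sketch below is a different technique from the filter above, not a rewrite of it; the sample document is made up and `interesting_nodes` is a hypothetical name.

```python
import io
import xml.etree.ElementTree as ET

TAG_WHITELIST = {"amenity", "shop", "tourism", "place"}


def interesting_nodes(stream):
    """Yield (id, lat, lon, tags) for nodes that carry a whitelisted tag."""
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "node":
            tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
            if TAG_WHITELIST.intersection(tags):
                yield (int(elem.get("id")), float(elem.get("lat")),
                       float(elem.get("lon")), tags)
        if elem.tag in ("node", "way", "relation"):
            # Drop attributes and children once the element has been handled.
            # The emptied elements stay attached to the root, so memory use
            # stays low but is not strictly constant.
            elem.clear()


# Tiny made-up sample in OSM XML shape.
SAMPLE = b"""<osm version="0.6">
  <node id="1" lat="41.95" lon="2.61"><tag k="amenity" v="fountain"/></node>
  <node id="2" lat="41.96" lon="2.62"><tag k="created_by" v="JOSM"/></node>
  <node id="3" lat="41.97" lon="2.63"/>
</osm>"""

nodes = list(interesting_nodes(io.BytesIO(SAMPLE)))
```

Unlike the SAX filter, this yields Python values instead of re-emitting XML, so it would fit more naturally in front of an importer than in a shell pipeline.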
A golden rule in Debian things
If there's something that doesn't go the way you think, write:
I don't understand, please explain it to me
instead of:
You don't understand, let me explain it to you
In the first case, you'll likely get an answer that starts with "I'm sorry". In the second case, you'll likely FAIL.