Software
Released cfget 0.16
I have released version 0.16 of cfget.
It is just a little bugfix release as I found a bug in the new expression parser, and while I was at it I simplified its code quite a bit.
Released cfget 0.15
I have released version 0.15 of cfget.
cfget is a tool to extract values from ini-style config files. A trivial thing, really.
It is also simple to install: it is a single python executable and it has no dependencies besides the python standard library.
It is trivial and simple, but because of the complex requirements (and sponsorship) of ISAC - CNR it has recently accumulated quite a set of features, and it manages to get a remarkable lot of things done.
There are several new features since 0.8, worth a rather major announcement:
Added --dump=pickle
Now all the contents of the configuration files, plus all the contents generated by cfget plugins, can be dumped in pickle format.
This provides a quick and dirty way to load all cfget-generated data into a python dict:
data = pickle.loads(
    subprocess.Popen(["cfget", "--dump=pickle"],
                     stdout=subprocess.PIPE).communicate()[0])
It sounds like a rather complicated way to read a configuration file, but if you have various plugins that compute nontrivial derived configuration values, that quick&dirty hack could be quite useful.
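The same trick can be wrapped in a small helper. This is only a sketch, written for Python 3: the function name `load_cfget_dump` and its `command` parameter are my own, and the stand-in command at the bottom exists only so the example runs on a machine without cfget. I am also assuming the dump unpickles to a dict, as described above. As with any pickle, only load data from a source you trust.

```python
import pickle
import subprocess
import sys


def load_cfget_dump(command=("cfget", "--dump=pickle")):
    """Run `command` and unpickle whatever it writes to standard output."""
    output = subprocess.check_output(list(command))
    return pickle.loads(output)


# Stand-in for cfget (an assumption for this example): a tiny Python
# one-liner that writes a pickled dict to stdout, like the real dump would.
FAKE_CFGET = (
    sys.executable, "-c",
    "import pickle, sys; "
    "sys.stdout.buffer.write(pickle.dumps({'general/mode': 'show'}))",
)

if __name__ == "__main__":
    print(load_cfget_dump(FAKE_CFGET))
```

With cfget installed, calling `load_cfget_dump()` with no arguments runs the real command.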
Curly brace expansion
Suppose that you have a config file like this:
[general]
mode = show

[show]
command = display

[edit]
command = gimp
And you want to get the command from the section indicated by general/mode.
Now you can, very easily: cfget '{general/mode}/command'
When it notices curly braces, cfget will literally replace them with the result of querying their contents, then parse the expression again.
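A rough sketch of that replace-then-parse-again loop, in Python 3. This is not cfget's code: the `expand` function, the regular expression and the dict standing in for the configuration are all assumptions made for illustration.

```python
import re

# Matches a brace group that contains no other braces, i.e. an innermost one.
INNERMOST = re.compile(r"\{([^{}]*)\}")


def expand(query, lookup):
    """Replace innermost {...} groups with lookup(contents) until none remain."""
    while True:
        expanded, count = INNERMOST.subn(
            lambda match: str(lookup(match.group(1))), query)
        if count == 0:
            return query
        query = expanded


# Hypothetical flattened view of the config file shown above.
config = {
    "general/mode": "show",
    "show/command": "display",
    "edit/command": "gimp",
}

path = expand("{general/mode}/command", config.__getitem__)
print(path, "->", config[path])
```

Working from the innermost group outwards means nested braces resolve naturally. Note that this sketch would loop forever if a looked-up value itself expanded to a brace expression referring back to itself.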
Simple expression support
ISAC - CNR are using cfget to configure the run of a rather complicated physical model, and use plugins to derive all sorts of values from the base configuration.
This works, but there are times when adding a function to a plugin sounds like
overkill: for example, sometimes one needs foo/bar + 1, or just the hour of a
timestamp.
For those simple cases, I've added support for simple expressions:
- operators: +, -, *, /, **
- grouping with parentheses
- function calls (int(), round()), with the possibility to define new functions via plugins.
So to compute a middle point one can now do this:
cfget "round((pos/start + pos/end) / 2)"
It needs a space around arithmetic operators to avoid conflicts with characters used to refer to configuration values, but with the space the expressions look nicer, so the result is that it generally does the right thing.
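To see why the spaces matter, here is a toy tokenizer in Python 3. It is an illustration of the idea only, not the parser that cfget uses; the names `tokenize`, `TOKEN` and `OPERATORS` are made up for the example.

```python
import re

OPERATORS = {"+", "-", "*", "/", "**"}

# A token is either a single parenthesis, or a run of characters containing
# neither whitespace nor parentheses.
TOKEN = re.compile(r"[()]|[^\s()]+")


def tokenize(expr):
    """Split an expression into (kind, text) pairs."""
    tokens = []
    for text in TOKEN.findall(expr):
        if text in ("(", ")"):
            kind = "paren"
        elif text in OPERATORS:
            kind = "op"
        else:
            # A configuration path, a number or a function name
            kind = "name"
        tokens.append((kind, text))
    return tokens


for kind, text in tokenize("round((pos/start + pos/end) / 2)"):
    print(kind, text)
```

Because a token only counts as an operator when it stands alone, `pos/start` stays a single configuration path while the isolated `/` is a division.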
Just as a scary thought, curly braces work with and in expressions:
cfget "values/val{round((pos/end - pos/start) / 2)} + 1"
Although if someone ends up having a hairy thing like that, it is worth considering replacing it with a dynamic value computed using a plugin.
Computing time offsets between EXIF and GPS
I like the idea of matching photos to GPS traces. In Debian there is gpscorrelate but it's almost unusable to me because of bug #473362 and it has an awkward way of specifying time offsets.
Here at SoTM10 someone told me that
exiftool gained -geosync and -geotag
options. So it's just a matter of creating a little tool that shows a photo and
asks you to type the GPS time you see in it.
Apparently there are no bindings or GIR files for gtkimageview in Debian, so I'll have to use C.
Here is a C prototype:
/*
 * gpsoffset - Compute EXIF time offset from a photo of a gps display
 *
 * Use with exiftool -geosync=... -geotag trace.gpx DIR
 *
 * Copyright (C) 2009--2010 Enrico Zini <enrico@enricozini.org>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 */

#define _XOPEN_SOURCE 600 /* glibc2 needs this for strptime and setenv */
#include <time.h>
#include <gtkimageview/gtkimageview.h>
#include <libexif/exif-data.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int load_time(const char* fname, struct tm* tm)
{
    ExifData* exif_data = exif_data_new_from_file(fname);
    if (exif_data == NULL)
    {
        fprintf(stderr, "Cannot load EXIF data from %s\n", fname);
        return -1;
    }
    ExifEntry* exif_time = exif_data_get_entry(exif_data, EXIF_TAG_DATE_TIME);
    if (exif_time == NULL)
    {
        fprintf(stderr, "Cannot find EXIF timestamp\n");
        exif_data_unref(exif_data);
        return -1;
    }

    char buf[1024];
    exif_entry_get_value(exif_time, buf, 1024);
    exif_data_unref(exif_data);

    // Start from a zeroed struct: strptime only sets the fields it parses
    memset(tm, 0, sizeof(*tm));
    if (strptime(buf, "%Y:%m:%d %H:%M:%S", tm) == NULL)
    {
        fprintf(stderr, "Cannot match EXIF timestamp\n");
        return -1;
    }

    return 0;
}

static time_t exif_ts;
static GtkWidget* res_lbl;

void date_entry_changed(GtkEditable *editable, gpointer user_data)
{
    const gchar* text = gtk_entry_get_text(GTK_ENTRY(editable));
    struct tm parsed;
    memset(&parsed, 0, sizeof(parsed));
    if (strptime(text, "%Y-%m-%d %H:%M:%S", &parsed) == NULL)
    {
        gtk_label_set_text(GTK_LABEL(res_lbl), "Please enter a date as YYYY-MM-DD HH:MM:SS");
    } else {
        time_t img_ts = mktime(&parsed);
        int c;
        int res;
        if (exif_ts < img_ts)
        {
            c = '+';
            res = img_ts - exif_ts;
        } else {
            c = '-';
            res = exif_ts - img_ts;
        }
        char buf[1024];
        // Use >= so that exactly one hour or one minute picks the right format
        if (res >= 3600)
            snprintf(buf, 1024, "Result: %c%ds -geosync=%c%d:%02d:%02d",
                    c, res, c, res / 3600, (res / 60) % 60, res % 60);
        else if (res >= 60)
            snprintf(buf, 1024, "Result: %c%ds -geosync=%c%02d:%02d",
                    c, res, c, (res / 60) % 60, res % 60);
        else
            snprintf(buf, 1024, "Result: %c%ds -geosync=%c%d",
                    c, res, c, res);
        gtk_label_set_text(GTK_LABEL(res_lbl), buf);
    }
}

int main (int argc, char *argv[])
{
    // Work in UTC to avoid mktime applying DST or timezones
    setenv("TZ", "UTC", 1);

    const char* filename = "/home/enrico/web-eddie/galleries/2010/04-05-Uppermill/P1080932.jpg";

    gtk_init (&argc, &argv);

    struct tm exif_time;
    if (load_time(filename, &exif_time) != 0)
        return 1;

    printf("EXIF time: %s\n", asctime(&exif_time));
    exif_ts = mktime(&exif_time);

    GtkWidget* window = gtk_window_new(GTK_WINDOW_TOPLEVEL);
    GtkWidget* vb = gtk_vbox_new(FALSE, 0);
    GtkWidget* hb = gtk_hbox_new(FALSE, 0);
    GtkWidget* lbl = gtk_label_new("Timestamp:");
    GtkWidget* exif_lbl;
    {
        char buf[1024];
        strftime(buf, 1024, "EXIF time: %Y-%m-%d %H:%M:%S", &exif_time);
        exif_lbl = gtk_label_new(buf);
    }
    GtkWidget* date_ent = gtk_entry_new();
    res_lbl = gtk_label_new("Result:");
    GtkWidget* view = gtk_image_view_new();
    GdkPixbuf* pixbuf = gdk_pixbuf_new_from_file(filename, NULL);

    gtk_box_pack_start(GTK_BOX(hb), lbl, FALSE, TRUE, 0);
    gtk_box_pack_start(GTK_BOX(hb), date_ent, TRUE, TRUE, 0);

    g_signal_connect(date_ent, "changed", G_CALLBACK(date_entry_changed), NULL);
    {
        char buf[1024];
        strftime(buf, 1024, "%Y-%m-%d %H:%M:%S", &exif_time);
        gtk_entry_set_text(GTK_ENTRY(date_ent), buf);
    }

    gtk_widget_set_size_request(view, 500, 400);
    gtk_image_view_set_pixbuf(GTK_IMAGE_VIEW(view), pixbuf, TRUE);
    gtk_container_add(GTK_CONTAINER(window), vb);
    gtk_box_pack_start(GTK_BOX(vb), view, TRUE, TRUE, 0);
    gtk_box_pack_start(GTK_BOX(vb), hb, FALSE, TRUE, 0);
    gtk_box_pack_start(GTK_BOX(vb), exif_lbl, FALSE, TRUE, 0);
    gtk_box_pack_start(GTK_BOX(vb), res_lbl, FALSE, TRUE, 0);

    gtk_widget_show_all(window);

    gtk_main ();

    return 0;
}
And here is its simple makefile:
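CFLAGS=$(shell pkg-config --cflags gtkimageview libexif)
# LDLIBS rather than LDFLAGS, so the libraries come after the source file
# on the link line of make's built-in rule
LDLIBS=$(shell pkg-config --libs gtkimageview libexif)

gpsoffset: gpsoffset.c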
It's a simple prototype but it's a working prototype and seems to do the job for me.
I currently cannot find out why after I click on the text box, there seems to be no way to give the focus back to the image viewer so I can control it with keys.
There is another nice algorithm to compute time offsets to be implemented: you choose a photo taken from a known place and drag it on that place on a map: you can then look for the nearest point on your GPX trace and compute the time offset from that.
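A minimal sketch of that second algorithm in Python 3, assuming the GPX trace has already been parsed into a list of `(time, lat, lon)` tuples. The function names and the sample coordinates are invented for the example, and the sign convention (GPS time minus EXIF time) is the same one the C prototype above uses for its `+`/`-` prefix.

```python
import math
from datetime import datetime


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    radius = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def offset_from_place(track, lat, lon, exif_time):
    """
    Find the track point nearest to where the photo was dropped on the map
    and return (GPS time - EXIF time) in seconds.
    """
    nearest = min(track, key=lambda p: haversine_m(lat, lon, p[1], p[2]))
    return (nearest[0] - exif_time).total_seconds()


# Made-up three-point trace and photo position
track = [
    (datetime(2010, 7, 10, 12, 0, 0), 41.9800, 2.8200),
    (datetime(2010, 7, 10, 12, 5, 0), 41.9850, 2.8250),
    (datetime(2010, 7, 10, 12, 10, 0), 41.9900, 2.8300),
]
offset = offset_from_place(track, 41.9851, 2.8249,
                           datetime(2010, 7, 10, 12, 4, 30))
print("camera clock offset: %+d seconds" % offset)
```

One caveat with this approach: if the trace passes through the same place more than once, the nearest point is ambiguous, so a real tool would probably need to let the user pick among candidates.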
I have seen that there are programs for geotagging photos that implement all such algorithms, and have a nice UI, but I haven't seen any in Debian.
Is there any such software that can be packaged?
If not, the interpolation and annotation tasks can now already be performed by exiftool, so it's just a matter of building a good UI, and I would love to see someone picking up the task.
Searching OSM nodes in Spatialite
Third step of my SoTM10 pet project: finding the POIs.
I put together a query to find all nodes with a given tag inside a bounding box, and also a query to find all the tag values for a given tag name inside a bounding box.
The result is this simple POI search engine:
#
# poisearch - simple geographical POI search engine
#
# Copyright (C) 2010 Enrico Zini <enrico@enricozini.org>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#

import simplejson
from pysqlite2 import dbapi2 as sqlite

class PoiDB(object):
    def __init__(self):
        self.db = sqlite.connect("pois.db")
        self.db.enable_load_extension(True)
        self.db.execute("SELECT load_extension('libspatialite.so')")
        self.oldsearch = []
        self.bbox = None

    def set_bbox(self, xmin, xmax, ymin, ymax):
        '''Set bbox for searches'''
        self.bbox = (xmin, xmax, ymin, ymax)

    def tagid(self, name, val):
        '''Get the database ID for a tag'''
        c = self.db.cursor()
        c.execute("SELECT id FROM tag WHERE name=? AND value=?", (name, val))
        res = None
        for row in c:
            res = row[0]
        return res

    def tagnames(self):
        '''Get all tag names'''
        c = self.db.cursor()
        c.execute("SELECT DISTINCT name FROM tag ORDER BY name")
        for row in c:
            yield row[0]

    def tagvalues(self, name, use_bbox=False):
        '''
        Get all tag values for a given tag name,
        optionally in the current bounding box
        '''
        c = self.db.cursor()
        if self.bbox is None or not use_bbox:
            c.execute("SELECT DISTINCT value FROM tag WHERE name=? ORDER BY value", (name,))
        else:
            c.execute("SELECT DISTINCT tag.value FROM poi, poitag, tag"
                      " WHERE poi.rowid IN (SELECT pkid FROM idx_poi_geom WHERE ("
                      "       xmin >= ? AND xmax <= ? AND ymin >= ? AND ymax <= ?) )"
                      "   AND poitag.tag = tag.id AND poitag.poi = poi.id"
                      "   AND tag.name=?",
                      self.bbox + (name,))
        for row in c:
            yield row[0]

    def search(self, name, val):
        '''Get all name:val tags in the current bounding box'''
        # First resolve the tagid
        tagid = self.tagid(name, val)
        if tagid is None: return

        c = self.db.cursor()
        c.execute("SELECT poi.name, poi.data, X(poi.geom), Y(poi.geom) FROM poi, poitag"
                  " WHERE poi.rowid IN (SELECT pkid FROM idx_poi_geom WHERE ("
                  "       xmin >= ? AND xmax <= ? AND ymin >= ? AND ymax <= ?) )"
                  "   AND poitag.tag = ? AND poitag.poi = poi.id",
                  self.bbox + (tagid,))
        self.oldsearch = []
        for row in c:
            self.oldsearch.append(row)
            yield row[0], simplejson.loads(row[1]), row[2], row[3]

    def count(self, name, val):
        '''Count all name:val tags in the current bounding box'''
        # First resolve the tagid
        tagid = self.tagid(name, val)
        if tagid is None: return

        c = self.db.cursor()
        c.execute("SELECT COUNT(*) FROM poi, poitag"
                  " WHERE poi.rowid IN (SELECT pkid FROM idx_poi_geom WHERE ("
                  "       xmin >= ? AND xmax <= ? AND ymin >= ? AND ymax <= ?) )"
                  "   AND poitag.tag = ? AND poitag.poi = poi.id",
                  self.bbox + (tagid,))
        for row in c:
            return row[0]

    def replay(self):
        '''Yield again the results of the last search'''
        for row in self.oldsearch:
            yield row[0], simplejson.loads(row[1]), row[2], row[3]
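The shape of those two queries can be tried out without Spatialite installed. The sketch below uses Python 3 and the standard `sqlite3` module, and it swaps the R*Tree spatial index for plain `lon`/`lat` columns compared with `BETWEEN`, so it illustrates the tag join only and says nothing about index performance. The table layout follows the one above except for those two columns; the sample rows are made up.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database with made-up POIs."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE poi (id INTEGER PRIMARY KEY, name TEXT, lon REAL, lat REAL);
        CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT, value TEXT);
        CREATE TABLE poitag (poi INTEGER NOT NULL, tag INTEGER NOT NULL);
    """)
    db.executemany("INSERT INTO poi VALUES (?, ?, ?, ?)", [
        (1, "Fountain A", 2.61, 41.95),
        (2, "Cafe B", 2.62, 41.97),
        (3, "Fountain C", 3.50, 40.00),
    ])
    db.executemany("INSERT INTO tag VALUES (?, ?, ?)", [
        (1, "amenity", "fountain"),
        (2, "amenity", "cafe"),
    ])
    db.executemany("INSERT INTO poitag VALUES (?, ?)", [(1, 1), (2, 2), (3, 1)])
    return db


def search(db, name, value, bbox):
    """All POIs tagged name=value inside bbox = (xmin, xmax, ymin, ymax)."""
    xmin, xmax, ymin, ymax = bbox
    cur = db.execute(
        "SELECT poi.name, poi.lon, poi.lat FROM poi"
        " JOIN poitag ON poitag.poi = poi.id"
        " JOIN tag ON tag.id = poitag.tag"
        " WHERE tag.name = ? AND tag.value = ?"
        "   AND poi.lon BETWEEN ? AND ? AND poi.lat BETWEEN ? AND ?"
        " ORDER BY poi.name",
        (name, value, xmin, xmax, ymin, ymax))
    return cur.fetchall()


def tag_values(db, name, bbox):
    """All distinct values of tag `name` used by POIs inside bbox."""
    xmin, xmax, ymin, ymax = bbox
    cur = db.execute(
        "SELECT DISTINCT tag.value FROM poi"
        " JOIN poitag ON poitag.poi = poi.id"
        " JOIN tag ON tag.id = poitag.tag"
        " WHERE tag.name = ?"
        "   AND poi.lon BETWEEN ? AND ? AND poi.lat BETWEEN ? AND ?"
        " ORDER BY tag.value",
        (name, xmin, xmax, ymin, ymax))
    return [row[0] for row in cur]


db = make_db()
bbox = (2.56, 2.90, 41.84, 42.00)
print(search(db, "amenity", "fountain", bbox))
print(tag_values(db, "amenity", bbox))
```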
Problem 3 solved: now on to the next step, building a user interface for it.
Importing OSM nodes into Spatialite
Second step of my SoTM10 pet project: creating a searchable database with the points. What a fantastic opportunity to learn Spatialite.
Learning Spatialite is easy. For example, you can use the two tutorials with catchy titles that assume your best wish in life is to create databases out of shapefiles using a pre-built, i386-only executable GUI binary downloaded over an insecure HTTP connection.
To be fair, the second of those tutorials is called "An almost Idiot's Guide", thus making explicit the requirement of being an almost idiot in order to happily acquire and run software in that way.
Alternatively, you can use A quick tutorial to SpatiaLite which is so quick it has examples that lead you to write SQL queries that trigger all sorts of vague exceptions at insert time. But at least it brought me a long way forward, at which point I could just cross reference things with PostGIS documentation to find out the right way of doing things.
So, here's the importer script, which will probably become my reference example for how to get started with Spatialite, and how to use Spatialite from Python:
#!/usr/bin/python
#
# poiimport - import nodes from OSM into a spatialite DB
#
# Copyright (C) 2010 Enrico Zini <enrico@enricozini.org>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#

import xml.sax
import xml.sax.handler
from pysqlite2 import dbapi2 as sqlite
import simplejson
import sys
import os

class OSMPOIReader(xml.sax.handler.ContentHandler):
    '''
    Filter SAX events in a OSM XML file to keep only nodes with names
    '''
    def __init__(self, consumer):
        self.consumer = consumer

    def startElement(self, name, attrs):
        if name == "node":
            self.attrs = attrs
            self.tags = dict()
        elif name == "tag":
            self.tags[attrs["k"]] = attrs["v"]

    def endElement(self, name):
        if name == "node":
            lat = float(self.attrs["lat"])
            lon = float(self.attrs["lon"])
            id = int(self.attrs["id"])
            #dt = parse(self.attrs["timestamp"])
            uid = self.attrs.get("uid", None)
            uid = int(uid) if uid is not None else None
            user = self.attrs.get("user", None)

            self.consumer(lat, lon, id, self.tags, user=user, uid=uid)

class Importer(object):
    '''
    Create the spatialite database and populate it
    '''
    TAG_WHITELIST = set(["amenity", "shop", "tourism", "place"])

    def __init__(self, filename):
        self.db = sqlite.connect(filename)
        self.db.enable_load_extension(True)
        self.db.execute("SELECT load_extension('libspatialite.so')")
        self.db.execute("SELECT InitSpatialMetaData()")
        self.db.execute("INSERT INTO spatial_ref_sys (srid, auth_name, auth_srid,"
                        " ref_sys_name, proj4text) VALUES (4326, 'epsg', 4326,"
                        " 'WGS 84', '+proj=longlat +ellps=WGS84 +datum=WGS84"
                        " +no_defs')")
        self.db.execute("CREATE TABLE poi (id int not null unique primary key,"
                        " name char, data text)")
        self.db.execute("SELECT AddGeometryColumn('poi', 'geom', 4326, 'POINT', 2)")
        self.db.execute("SELECT CreateSpatialIndex('poi', 'geom')")
        self.db.execute("CREATE TABLE tag (id integer primary key autoincrement,"
                        " name char, value char)")
        self.db.execute("CREATE UNIQUE INDEX tagidx ON tag (name, value)")
        self.db.execute("CREATE TABLE poitag (poi int not null, tag int not null)")
        self.db.execute("CREATE UNIQUE INDEX poitagidx ON poitag (poi, tag)")
        self.tagid_cache = dict()

    def tagid(self, k, v):
        '''Get the database ID for a tag, creating the tag if needed'''
        key = (k, v)
        res = self.tagid_cache.get(key, None)
        if res is None:
            c = self.db.cursor()
            c.execute("SELECT id FROM tag WHERE name=? AND value=?", key)
            for row in c:
                self.tagid_cache[key] = row[0]
                return row[0]
            self.db.execute("INSERT INTO tag (id, name, value) VALUES (NULL, ?, ?)", key)
            c.execute("SELECT last_insert_rowid()")
            for row in c:
                res = row[0]
            self.tagid_cache[key] = res
        return res

    def __call__(self, lat, lon, id, tags, user=None, uid=None):
        # Acquire tag IDs
        tagids = []
        for k, v in tags.iteritems():
            if k not in self.TAG_WHITELIST: continue
            for val in v.split(";"):
                tagids.append(self.tagid(k, val))

        # Skip elements that don't have the tags we want
        if not tagids: return

        geom = "POINT(%f %f)" % (lon, lat)
        # tags.get() rather than tags["name"]: a node can carry a whitelisted
        # tag without having a name, and is then stored with a NULL name
        self.db.execute("INSERT INTO poi (id, geom, name, data)"
                        "     VALUES (?, GeomFromText(?, 4326), ?, ?)",
                        (id, geom, tags.get("name"), simplejson.dumps(tags)))

        for tid in tagids:
            self.db.execute("INSERT INTO poitag (poi, tag) VALUES (?, ?)", (id, tid))

    def done(self):
        self.db.commit()

# Get the output file name
filename = sys.argv[1]

# Ensure we start from scratch
if os.path.exists(filename):
    print >>sys.stderr, filename, "already exists"
    sys.exit(1)

# Import
parser = xml.sax.make_parser()
importer = Importer(filename)
handler = OSMPOIReader(importer)
parser.setContentHandler(handler)
parser.parse(sys.stdin)
importer.done()
Let's run it:
$ ./poiimport pois.db < pois.osm
SpatiaLite version ..: 2.4.0
Supported Extensions:
- 'VirtualShape' [direct Shapefile access]
- 'VirtualDbf' [direct Dbf access]
- 'VirtualText' [direct CSV/TXT access]
- 'VirtualNetwork' [Dijkstra shortest path]
- 'RTree' [Spatial Index - R*Tree]
- 'MbrCache' [Spatial Index - MBR cache]
- 'VirtualFDO' [FDO-OGR interoperability]
- 'SpatiaLite' [Spatial SQL - OGC]
PROJ.4 Rel. 4.7.1, 23 September 2009
GEOS version 3.2.0-CAPI-1.6.0
$ ls -l --si pois*
-rw-r--r-- 1 enrico enrico 17M Jul 9 23:44 pois.db
-rw-r--r-- 1 enrico enrico 37M Jul 9 16:20 pois.osm
$ spatialite pois.db
SpatiaLite version ..: 2.4.0
Supported Extensions:
- 'VirtualShape' [direct Shapefile access]
- 'VirtualDbf' [direct DBF access]
- 'VirtualText' [direct CSV/TXT access]
- 'VirtualNetwork' [Dijkstra shortest path]
- 'RTree' [Spatial Index - R*Tree]
- 'MbrCache' [Spatial Index - MBR cache]
- 'VirtualFDO' [FDO-OGR interoperability]
- 'SpatiaLite' [Spatial SQL - OGC]
PROJ.4 version ......: Rel. 4.7.1, 23 September 2009
GEOS version ........: 3.2.0-CAPI-1.6.0
SQLite version ......: 3.6.23.1
Enter ".help" for instructions
spatialite> select id from tag where name="amenity" and value="fountain";
24
spatialite> SELECT poi.name, poi.data, X(poi.geom), Y(poi.geom) FROM poi, poitag WHERE poi.rowid IN (SELECT pkid FROM idx_poi_geom WHERE (xmin >= 2.56 AND xmax <= 2.90 AND ymin >= 41.84 AND ymax <= 42.00) ) AND poitag.tag = 24 AND poitag.poi = poi.id;
Font Picant de la Cellera|{"amenity": "fountain", "name": "Font Picant de la Cellera"}|2.616045|41.952449
Font de Can Pla|{"amenity": "fountain", "name": "Font de Can Pla"}|2.622354|41.974724
Font de Can Ribes|{"amenity": "fountain", "name": "Font de Can Ribes"}|2.62311|41.979193
It's impressive: I've got all sorts of useful information for the whole of Spain in just 17MB!
Let's put it to practice: I'm thirsty, is there any water fountain nearby?
spatialite> SELECT count(1) FROM poi, poitag WHERE poi.rowid IN (SELECT pkid FROM idx_poi_geom WHERE (xmin >= 2.80 AND xmax <= 2.85 AND ymin >= 41.97 AND ymax <= 42.00) ) AND poitag.tag = 24 AND poitag.poi = poi.id;
0
Ouch! No water fountains mapped in Girona... yet.
Problem 2 solved: now on to the next step, trying to show the results in some usable way.
Filtering nodes out of OSM files
I have a pet project here at SoTM10: create a tool for searching nearby POIs while offline.
The idea is to have something in my pocket (FreeRunner or N900), which doesn't require an internet connection, and which can point me at the nearest fountains, post offices, ATMs, bars and so on.
The first step is to obtain a list of POIs.
In theory one can use Xapi but all the known Xapi servers appear to be down at the moment.
Another attempt is to obtain it by filtering all nodes with the tags we want out of a planet OSM extract. I downloaded the Spanish one and set to work.
First I tried with xmlstarlet, but it ate all the RAM and crashed my laptop. For some reason, on my laptop the Linux kernels up to 2.6.32 (I don't know about later ones) like to swap out ALL running apps to cache I/O operations, which means that heavy I/O operations swap out the very programs performing them, so the system gets caught in some infinite I/O loop and dies. Or at least this is what I've figured out so far.
So, we need SAX. I put together this prototype in Python, which can process a nice 8MB/s of OSM data for quite some time with a constant, low RAM usage:
#!/usr/bin/python
#
# poifilter - extract interesting nodes from OSM XML files
#
# Copyright (C) 2010 Enrico Zini <enrico@enricozini.org>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#

import xml.sax
import xml.sax.handler
import xml.sax.saxutils
import sys

class XMLSAXFilter(xml.sax.handler.ContentHandler):
    '''
    A SAX filter that is a ContentHandler.

    There is xml.sax.saxutils.XMLFilterBase in the standard library but it is
    undocumented, and most of the examples using it you find online are wrong.
    You can look at its source code, and at that point you find out that it is
    an offensive practical joke.
    '''
    def __init__(self, downstream):
        self.downstream = downstream

    # ContentHandler methods

    def setDocumentLocator(self, locator):
        self.downstream.setDocumentLocator(locator)

    def startDocument(self):
        self.downstream.startDocument()

    def endDocument(self):
        self.downstream.endDocument()

    def startPrefixMapping(self, prefix, uri):
        self.downstream.startPrefixMapping(prefix, uri)

    def endPrefixMapping(self, prefix):
        self.downstream.endPrefixMapping(prefix)

    def startElement(self, name, attrs):
        self.downstream.startElement(name, attrs)

    def endElement(self, name):
        self.downstream.endElement(name)

    def startElementNS(self, name, qname, attrs):
        self.downstream.startElementNS(name, qname, attrs)

    def endElementNS(self, name, qname):
        self.downstream.endElementNS(name, qname)

    def characters(self, content):
        self.downstream.characters(content)

    def ignorableWhitespace(self, chars):
        self.downstream.ignorableWhitespace(chars)

    def processingInstruction(self, target, data):
        self.downstream.processingInstruction(target, data)

    def skippedEntity(self, name):
        self.downstream.skippedEntity(name)

class OSMPOIHandler(XMLSAXFilter):
    '''
    Filter SAX events in a OSM XML file to keep only nodes with names
    '''
    PASSTHROUGH = ["osm", "bound"]
    TAG_WHITELIST = set(["amenity", "shop", "tourism", "place"])

    def startElement(self, name, attrs):
        if name in self.PASSTHROUGH:
            self.downstream.startElement(name, attrs)
        elif name == "node":
            self.attrs = attrs
            self.tags = []
            self.propagate = False
        elif name == "tag":
            if self.tags is not None:
                self.tags.append(attrs)
                if attrs["k"] in self.TAG_WHITELIST:
                    self.propagate = True
        else:
            self.tags = None
            self.attrs = None

    def endElement(self, name):
        if name in self.PASSTHROUGH:
            self.downstream.endElement(name)
        elif name == "node":
            if self.propagate:
                self.downstream.startElement("node", self.attrs)
                for attrs in self.tags:
                    self.downstream.startElement("tag", attrs)
                    self.downstream.endElement("tag")
                self.downstream.endElement("node")

    def ignorableWhitespace(self, chars):
        pass

    def characters(self, content):
        pass

# Simple stdin->stdout XML filter
parser = xml.sax.make_parser()
handler = OSMPOIHandler(xml.sax.saxutils.XMLGenerator(sys.stdout, "utf-8"))
parser.setContentHandler(handler)
parser.parse(sys.stdin)
Let's run it:
$ bzcat /store/osm/spain.osm.bz2 | pv | ./poifilter > pois.osm
[...]
$ ls -l --si pois.osm
-rw-r--r-- 1 enrico enrico 19M Jul 10 23:56 pois.osm
$ xmlstarlet val pois.osm
pois.osm - valid
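The constant memory usage comes from SAX handing over one event at a time, so the handler only ever needs to remember the node it is currently inside. Here is a stripped-down illustration in Python 3 that just counts the matching nodes instead of re-emitting them. The `POICounter` class and the sample document are made up for the example; the whitelist is the same one used by poifilter.

```python
import io
import xml.sax
import xml.sax.handler


class POICounter(xml.sax.handler.ContentHandler):
    """Count nodes carrying at least one whitelisted tag."""
    TAG_WHITELIST = {"amenity", "shop", "tourism", "place"}

    def __init__(self):
        super().__init__()
        self.matching = 0
        self.in_node = False
        self.wanted = False

    def startElement(self, name, attrs):
        if name == "node":
            self.in_node = True
            self.wanted = False
        elif name == "tag" and self.in_node:
            # Tags of ways and relations are ignored: in_node is False there
            if attrs["k"] in self.TAG_WHITELIST:
                self.wanted = True

    def endElement(self, name):
        if name == "node":
            if self.wanted:
                self.matching += 1
            self.in_node = False


def count_pois(stream):
    """Parse an OSM XML byte stream and return the number of matching nodes."""
    handler = POICounter()
    xml.sax.parse(stream, handler)
    return handler.matching


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6">
  <node id="1" lat="41.95" lon="2.61"><tag k="amenity" v="fountain"/></node>
  <node id="2" lat="41.96" lon="2.62"><tag k="created_by" v="JOSM"/></node>
  <node id="3" lat="41.97" lon="2.63"/>
  <way id="4"><nd ref="1"/><tag k="shop" v="bakery"/></way>
</osm>
"""

print(count_pois(io.BytesIO(SAMPLE)))
```

The same function works on a real extract by passing `sys.stdin.buffer` instead of the `BytesIO` object.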
Problem 1 solved: now on to the next step: importing the nodes in a database.
Released nodm 0.7
I have released version 0.7 of nodm.
It only fixes one silly typo in autotools, which made it fail to build on Fedora.
Temporarily disabling file caching
Does it happen to you that you cp a big, big file (say, similar in order of magnitude to the amount of RAM) and the system becomes rather unusable?
It looks like Linux is saying "let's cache this", and as you copy it will try to free more and more RAM in order to cache the big file you're copying. In the end, all the RAM is full with file data that you are not going to need.
This varies according to how /proc/sys/vm/swappiness is set.
I learnt about posix_fadvise and I tried to play with it. The result is this
preloadable library that
hooks into open(2) and fadvises everything as POSIX_FADV_DONTNEED.
It is all rather awkward. fadvise in that way will discard existing cache pages if the file is already cached, which is too much. Ideally one would like to say "don't cache this because of me" without stepping on the feet of other system activities.
Also, I found I need to also hook into write(2) and run fadvise after every
single write, because you can't fadvise a file to be written in its entirety,
unless you pass fadvise the file size in advance. But the size of the output
file cannot be known by the preloaded library, so meh.
So, now I can run: nocache cp bigfile someplace/ without trashing the
existing caches. I can also run nocache tar zxf foo.tar.gz and so on.
I wish, of course, that there were no need to do so in the first place.
Here is the nocache library source code, for reference:
/*
 * nocache - LD_PRELOAD library to fadvise written files to not be cached
 *
 * Copyright (C) 2009--2010 Enrico Zini <enrico@enricozini.org>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 */

#define _GNU_SOURCE /* for RTLD_NEXT; also enables posix_fadvise */
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <dlfcn.h>
#include <stdarg.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

typedef int (*open_t)(const char*, int, ...);
typedef ssize_t (*write_t)(int fd, const void *buf, size_t count);

int open(const char *pathname, int flags, ...)
{
    static open_t func = 0;
    int res;
    if (!func)
        func = (open_t)dlsym(RTLD_NEXT, "open");

    // Note: I wanted to add O_DIRECT, but it imposes restriction on buffer
    // alignment
    if (flags & O_CREAT)
    {
        va_list ap;
        va_start(ap, flags);
        mode_t mode = va_arg(ap, mode_t);
        res = func(pathname, flags, mode);
        va_end(ap);
    } else
        res = func(pathname, flags);

    if (res >= 0)
    {
        int saved_errno = errno;
        // posix_fadvise returns the error number instead of setting errno
        int z = posix_fadvise(res, 0, 0, POSIX_FADV_DONTNEED);
        if (z != 0)
            fprintf(stderr, "Cannot fadvise on %s: %s\n", pathname, strerror(z));
        errno = saved_errno;
    }

    return res;
}

ssize_t write(int fd, const void *buf, size_t count)
{
    static write_t func = 0;
    ssize_t res;
    if (!func)
        func = (write_t)dlsym(RTLD_NEXT, "write");

    res = func(fd, buf, count);

    if (res > 0)
    {
        int saved_errno = errno;
        int z = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        if (z != 0)
            fprintf(stderr, "Cannot fadvise during write: %s\n", strerror(z));
        errno = saved_errno;
    }

    return res;
}
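The same idea can be tried from Python 3 without LD_PRELOAD, since `os.posix_fadvise` wraps the same call on systems that provide it. The sketch below is a plain file copy that advises the kernel after each chunk; the function name `copy_nocache` is made up. The `fsync` before advising is my own assumption: as far as I understand, pages that are still dirty cannot be dropped, so they are flushed first, at a noticeable cost in speed.

```python
import os


def copy_nocache(src, dst, chunk_size=1 << 20):
    """Copy src to dst, asking the kernel not to keep either file cached."""
    advise = getattr(os, "posix_fadvise", None)  # missing on some platforms
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            if advise is None:
                continue
            try:
                # Flush written data to disk so its pages are clean
                fout.flush()
                os.fsync(fout.fileno())
                # offset=0, len=0 means "the whole file so far"
                advise(fout.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
                advise(fin.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
            except OSError:
                # Advice is only a hint: carry on if the filesystem refuses it
                pass
```

Calling `copy_nocache("bigfile", "someplace/bigfile")` then behaves like a slow `cp` that tries not to leave the copied data in the page cache. Like the preloaded library, it also drops pages of the source file that were already cached for other reasons.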
Updates
Steve Schnepp writes:
Robert Love did a O_STREAMING patch for 2.4. It wasn't merged in 2.6 since POSIX_FADV_NOREUSE should be used instead.
But unfortunately it's currently mapped either as WILLNEED or as a noop.
It seems that there is a google code project that has spawned to control this.
Cropping images with GDAL
I am working to get a better integration between Meteosatlib and GDAL.
A nice aspect of GDAL is that it allows creating read/write drivers around two
functions: Open and CreateCopy.
Open opens a dataset read only, and to implement that all you have to do is to
implement read access to your data using the
GDALDataset interface.
To implement CreateCopy, all you have to do is to create a new file reading
information from a GDALDataset; then call Open on it. This means that there
is no need to support incremental updates, and that all the data required to
create a new file is readily available. This simplifies matters a lot.
GDAL provides some interesting image manipulation functions, that can work over just these two calls. The way it does it is by exploiting the concept of a virtual dataset, which wraps an existing dataset but changes some parameters on the fly.
This means that you can wrap a read only dataset with a virtual dataset that
transforms it somehow, and then pass the virtual dataset to CreateCopy to
save the transformed image in a format of your choice.
On top of that, for many transformations you do not need to create your own virtual datasets, but you can use the functions provided by the VRT GDAL driver.
One can learn a lot on how to use VRTDataset by reading the source code for
gdal_translate. By doing that I could come up with this code for cropping an
image, which is an interesting VRTDataset example.
GDALDataset* crop(GDALDataset* poDS, int xoff, int yoff, int xsize, int ysize)
{
    VRTDataset *poVDS = (VRTDataset*)VRTCreate(xsize, ysize);

    // Copy dataset info
    const char* pszProjection = poDS->GetProjectionRef();
    if (pszProjection != NULL && strlen(pszProjection) > 0)
        poVDS->SetProjection(pszProjection);

    double adfGeoTransform[6];
    if (poDS->GetGeoTransform(adfGeoTransform) == CE_None)
    {
        // Adapt the geotransform matrix to the subarea
        adfGeoTransform[0] += xoff * adfGeoTransform[1]
                            + yoff * adfGeoTransform[2];
        adfGeoTransform[3] += xoff * adfGeoTransform[4]
                            + yoff * adfGeoTransform[5];
        poVDS->SetGeoTransform(adfGeoTransform);
    }

    poVDS->SetMetadata(poDS->GetMetadata());

    // Here I also copy metadata from my own domain
    char **papszMD;
    papszMD = poDS->GetMetadata(MD_DOMAIN_MSAT);
    if (papszMD != NULL)
        poVDS->SetMetadata(papszMD, MD_DOMAIN_MSAT);

    for (int i = 0; i < poDS->GetRasterCount(); ++i)
    {
        GDALRasterBand* poSrcBand = poDS->GetRasterBand(i + 1);
        GDALDataType eBandType = poSrcBand->GetRasterDataType();
        poVDS->AddBand(eBandType, NULL);
        VRTSourcedRasterBand* poVRTBand = (VRTSourcedRasterBand*)poVDS->GetRasterBand(i + 1);
        poVRTBand->AddSimpleSource(poSrcBand,
                                   xoff, yoff, xsize, ysize,
                                   0, 0, xsize, ysize);
        poVRTBand->CopyCommonInfoFrom(poSrcBand);

        // Again, I copy my own metadata
        papszMD = poSrcBand->GetMetadata(MD_DOMAIN_MSAT);
        if (papszMD != NULL)
            poVRTBand->SetMetadata(papszMD, MD_DOMAIN_MSAT);
    }

    return poVDS;
}
This function wraps a dataset with a virtual dataset that crops it. Just pass
the resulting dataset to GDALCreateCopy to save it in the format that you
need.
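The only slightly subtle step in the function is adapting the geotransform, so here is that arithmetic on its own as a small Python 3 sketch. It relies on the usual GDAL convention that the six coefficients map pixel `(col, row)` to `x = gt[0] + col*gt[1] + row*gt[2]` and `y = gt[3] + col*gt[4] + row*gt[5]`; the function names and the sample numbers are made up.

```python
def pixel_to_geo(gt, col, row):
    """Apply a GDAL-style geotransform to a pixel position."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def crop_geotransform(gt, xoff, yoff):
    """Geotransform of the sub-image whose top-left pixel is (xoff, yoff)."""
    gt = list(gt)
    gt[0] += xoff * gt[1] + yoff * gt[2]
    gt[3] += xoff * gt[4] + yoff * gt[5]
    return tuple(gt)


# A north-up image with half-unit pixels, origin at (100, 200)
gt = (100.0, 0.5, 0.0, 200.0, 0.0, -0.5)
cropped = crop_geotransform(gt, 10, 20)
print(cropped)
# Pixel (0, 0) of the crop lands where pixel (10, 20) of the original was
print(pixel_to_geo(cropped, 0, 0), pixel_to_geo(gt, 10, 20))
```

Only the two origin coefficients change: the pixel size and rotation terms are the same for the cropped image as for the original.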
Custom function decorators with TurboGears 2
I am exposing some library functions using a TurboGears2 controller (see web-api-with-turbogears2). It turns out that some functions return a dict, some a list, some a string, and TurboGears 2 only allows JSON serialisation for dicts.
A simple work-around for this is to wrap the function result into a dict, something like this:

    @expose("json")
    @validate(validator_dispatcher, error_handler=api_validation_error)
    def list_colours(self, filter=None, productID=None, maxResults=100, **kw):
        # Call API
        res = self.engine.list_colours(filter, productID, maxResults)

        # Return result
        return dict(r=res)

It would be nice, however, to have an @webapi() decorator that
automatically wraps the function result with the dict:

    def webapi(func):
        def dict_wrap(*args, **kw):
            return dict(r=func(*args, **kw))
        return dict_wrap

    # ...in the controller...

    @expose("json")
    @validate(validator_dispatcher, error_handler=api_validation_error)
    @webapi
    def list_colours(self, filter=None, productID=None, maxResults=100, **kw):
        # Call API
        res = self.engine.list_colours(filter, productID, maxResults)

        # Return result
        return res

This works as long as @webapi appears last in the list of decorators.
Decorators are applied from the bottom up, so the last one listed is the first
to wrap the function, and it does not interfere with the tg.decorators machinery.
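The ordering rule is plain Python and can be checked without TurboGears. In this small sketch, tag is a made-up decorator factory that only records the moment it is applied to the function:

```python
calls = []


def tag(name):
    """Build a decorator that records when it is applied to a function."""
    def decorator(func):
        calls.append(name)
        return func
    return decorator


@tag("first in the list")
@tag("second in the list")
@tag("last in the list")
def list_colours():
    return ["red", "green"]


print(calls)
# ['last in the list', 'second in the list', 'first in the list']
```

The decorator written closest to the def line gets the original function, and every decorator above it receives whatever the one below returned.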
Would it be possible to create a decorator that can be put anywhere among the decorator list? Yes, it is possible but tricky, and it gives me the feeling that it may break in any future version of TurboGears:

    class webapi(object):
        def __call__(self, func):
            def dict_wrap(*args, **kw):
                return dict(r=func(*args, **kw))
            # Migrate the decoration attribute to our new function
            if hasattr(func, 'decoration'):
                dict_wrap.decoration = func.decoration
                dict_wrap.decoration.controller = dict_wrap
                delattr(func, 'decoration')
            return dict_wrap

    # ...in the controller...

    @expose("json")
    @validate(validator_dispatcher, error_handler=api_validation_error)
    @webapi()
    def list_colours(self, filter=None, productID=None, maxResults=100, **kw):
        # Call API
        res = self.engine.list_colours(filter, productID, maxResults)

        # Return result
        return res

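To convince myself that the attribute migration does what I expect, here is a sketch that runs without TurboGears. FakeDecoration, fake_expose and fake_validate are stand-ins I made up: they assume that the real decorators annotate the function through a decoration attribute and return it unchanged, which is what the code above relies on, but I have not checked them against the TurboGears source.

```python
class FakeDecoration(object):
    """Hypothetical stand-in for the object stored in func.decoration."""
    def __init__(self, controller):
        self.controller = controller
        self.exposed = False
        self.validated = False


def get_decoration(func):
    # Create the annotation on first use, then keep reusing it
    if not hasattr(func, "decoration"):
        func.decoration = FakeDecoration(func)
    return func.decoration


def fake_expose(func):
    """Stand-in for @expose: annotate the function, return it unchanged."""
    get_decoration(func).exposed = True
    return func


def fake_validate(func):
    """Stand-in for @validate: annotate the function, return it unchanged."""
    get_decoration(func).validated = True
    return func


class webapi(object):
    def __call__(self, func):
        def dict_wrap(*args, **kw):
            return dict(r=func(*args, **kw))
        # Migrate the decoration attribute to our new function
        if hasattr(func, "decoration"):
            dict_wrap.decoration = func.decoration
            dict_wrap.decoration.controller = dict_wrap
            delattr(func, "decoration")
        return dict_wrap


# @webapi() sits in the middle of the list on purpose
@fake_expose
@webapi()
@fake_validate
def list_colours():
    return ["red", "green"]


print(list_colours())  # {'r': ['red', 'green']}
print(list_colours.decoration.exposed, list_colours.decoration.validated)
```

The annotation made below @webapi() survives on the wrapper, and the one made above it lands on the same object, so both flags end up set.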
As a convenience, TurboGears 2 offers, in the decorators module, a way to
build decorator "hooks":

    class before_validate(_hook_decorator):
        '''A list of callables to be run before validation is performed'''
        hook_name = 'before_validate'

    class before_call(_hook_decorator):
        '''A list of callables to be run before the controller method is called'''
        hook_name = 'before_call'

    class before_render(_hook_decorator):
        '''A list of callables to be run before the template is rendered'''
        hook_name = 'before_render'

    class after_render(_hook_decorator):
        '''A list of callables to be run after the template is rendered.

        Will be run before it is returned returned up the WSGI stack'''
        hook_name = 'after_render'

The way these are invoked can be found in the _perform_call function in
tg/controllers.py.
To show an example use of those hooks, let's add some polygen wisdom to every data structure we return:

    class wisdom(decorators.before_render):
        def __init__(self, grammar):
            super(wisdom, self).__init__(self.add_wisdom)
            self.grammar = grammar

        def add_wisdom(self, remainder, params, output):
            from subprocess import Popen, PIPE
            output["wisdom"] = Popen(["polyrun", self.grammar], stdout=PIPE).communicate()[0]

    # ...in the controller...

    @wisdom("genius")
    @expose("json")
    @validate(validator_dispatcher, error_handler=api_validation_error)
    def list_colours(self, filter=None, productID=None, maxResults=100, **kw):
        # Call API
        res = self.engine.list_colours(filter, productID, maxResults)

        # Return result
        return res

These hooks cannot, however, be used for what I need, that is, to wrap the result inside a dict. The reason is that they are called in this way:

    controller.decoration.run_hooks(
        'before_render', remainder, params, output)

and not in this way:

    output = controller.decoration.run_hooks(
        'before_render', remainder, params, output)

So it is possible to modify the output (if it is a mutable structure) but not to replace it with something else.
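The difference between modifying and replacing is ordinary Python argument passing, and it can be reproduced in a few lines. Here run_hooks is a made-up stand-in that, like the call quoted above, ignores whatever the hooks return:

```python
def run_hooks(hooks, output):
    """Stand-in hook runner: call every hook and discard return values."""
    for hook in hooks:
        hook(output)


def add_wisdom(output):
    # Mutating the object in place is visible to the caller
    output["wisdom"] = "42"


def wrap_in_dict(output):
    # Rebinding the local name only changes this function's own variable
    output = dict(r=output)
    return output  # nobody looks at this


result = {"colours": ["red", "green"]}
run_hooks([add_wisdom, wrap_in_dict], result)
print(result)  # the "wisdom" key is there, the dict(r=...) wrapping is not
```

A hook can therefore add keys to a dict result, but if the controller returned a list or a string there is nothing a hook can do to turn it into a dict.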
Can we do even better? Sure we can. We can assimilate @expose and @validate
inside @webapi to avoid repeating those same many decorator lines over and
over again:

    class webapi(object):
        def __init__(self, error_handler = None):
            self.error_handler = error_handler

        def __call__(self, func):
            def dict_wrap(*args, **kw):
                return dict(r=func(*args, **kw))
            res = expose("json")(dict_wrap)
            res = validate(validator_dispatcher, error_handler=self.error_handler)(res)
            return res

    # ...in the controller...

    @expose("json")
    def api_validation_error(self, **kw):
        pylons.response.status = "400 Error"
        return dict(e="validation error on input fields", form_errors=pylons.c.form_errors)

    @webapi(error_handler=api_validation_error)
    def list_colours(self, filter=None, productID=None, maxResults=100, **kw):
        # Call API
        res = self.engine.list_colours(filter, productID, maxResults)

        # Return result
        return res

This gets rid of @expose and @validate, and provides almost all the
default values that I need. Unfortunately I could not find out how to access
api_validation_error from inside the decorator in order to pass it to the
validator, so I am left with the inconvenience of having to pass it
explicitly every time.