All entries

bologna cazzeggio dcg debian debian-tips debtags eng etiopia food index ita openmoko osm pdo ppy python rant sw taiwan tips turbogears twabo ubuntu

Creating pipelines with subprocess

It is possible to create process pipelines using subprocess.Popen, by just using stdout=subprocess.PIPE and stdin=otherproc.stdout.

Almost.

In a pipeline created in this way, the stdout of all processes except the last is opened twice: once in the script that has run the subprocess and another time in the standard input of the next process in the pipeline.

This is a problem because if a process closes its stdin, the previous process in the pipeline does not get SIGPIPE when trying to write to its stdout, because that pipe is still open on the caller process. If this happens, a wait on that process will hang forever: the child process waits for the parent to read its stdout, the parent process waits for the child process to exit.

The trick is to close the stdout of each process in the pipeline except the last just after creating them:

#!/usr/bin/python
# coding=utf-8

import subprocess

def pipe(*args):
    '''
    Takes as parameters several dicts, each with the same
    parameters passed to popen.

    Runs the various processes in a pipeline, connecting
    the stdout of every process except the last with the
    stdin of the next process.
    '''
    if len(args) < 2:
        raise ValueError, "pipe needs at least 2 processes"
    # Set stdout=PIPE in every subprocess except the last
    for i in args[:-1]:
        i["stdout"] = subprocess.PIPE

    # Runs all subprocesses connecting stdins and stdouts to create the
    # pipeline. Closes stdouts to avoid deadlocks.
    popens = [subprocess.Popen(**args[0])]
    for i in range(1,len(args)):
        args[i]["stdin"] = popens[i-1].stdout
        popens.append(subprocess.Popen(**args[i]))
        popens[i-1].stdout.close()

    # Returns the array of subprocesses just created
    return popens

At this point, it's nice to write a function that waits for the whole pipeline to terminate and returns an array of result codes:

def pipe_wait(popens):
    '''
    Given an array of Popen objects returned by the
    pipe method, wait for all processes to terminate
    and return the array with their return values.
    '''
    results = [0] * len(popens)
    while popens:
        last = popens.pop(-1)
        results[len(popens)] = last.wait()
    return results

And, look and behold, we can now easily run a pipeline and get the return codes of every single process in it:

process1 = dict(args='sleep 1; grep line2 testfile', shell=True)
process2 = dict(args='awk \'{print $3}\'', shell=True)
process3 = dict(args='true', shell=True)
popens = pipe(process1, process2, process3)
result = pipe_wait(popens)
print result
Posted Wed Jul 1 09:08:06 2009 Tags:

Anni 80

È morto Michael Jackson.

Manca solo Berlusconi e finalmente siamo fuori dagli anni 80.

Posted Fri Jun 26 05:50:17 2009 Tags:

Tips on using python's datetime module

Python's datetime module is one of those bits of code that tends not to do what one would expect them to do.

I have come to adopt some extra usage guidelines in order to preserve my sanity:

  • Avoid using str(datetime_object) or isoformat to serialize a datetime: there is no function in the library that can parse all its possible outputs
  • datetime.strptime silently throws away all timezone information. If you look very closely, it even says so in its documentation
  • Timezones do not exist, all datetime objects have to be naive. aware means broken.
  • datetime objects must always contain UTC information
  • datetime.now() is never to be used. Always use datetime.utcnow()
  • Be careful of 3rd party python modules: people have a dangerous tendency to use datetime.now()
  • If a conversion to some local time is needed, it shall be done via either some ugly thing like time.localtime(int(dt.strftime("%s"))) or via the pytz module
  • pytz must be used directly, and never via timezone aware datetime objects, because datetime objects fail in querying pytz:

That’s right, the datetime object created by a call to datetime.datetime constructor now seems to think that Finland uses the ancient “Helsinki Mean Time” which was obsoleted in the 1920s. The reason for this behaviour is clearly documented on the pytz page: it seems the Python datetime implementation never asks the tzinfo object what the offset to UTC on the given date would be. And without knowing it pytz seems to default to the first historical definition. Now, some of you fellow readers could insist on the problem going away simply by defaulting to the latest time zone definition. However, the problem would still persist: For example, Venezuela switched to GMT-04:30 on 9th December, 2007, causing the datetime objects representing dates either before, or after the change to become invalid.

From: http://blog.redinnovation.com/2008/06/30/relativity-of-time-shortcomings-in-python-datetime-and-workaround/

  • Timezone-aware datetime objects have other bugs: for example, they fail to compute Unix timestamps correctly. The following example shows two timezone-aware objects that represent the same instant but produce two different timestamps.
>>> import datetime as dt
>>> import pytz
>>> utc = pytz.timezone("UTC")
>>> italy = pytz.timezone("Europe/Rome")
>>> a = dt.datetime(2008, 7, 6, 5, 4, 3, tzinfo=utc)
>>> b = a.astimezone(italy)
>>> str(a)
'2008-07-06 05:04:03+00:00'
>>> a.strftime("%s")
'1215291843'
>>> str(b)
'2008-07-06 07:04:03+02:00'
>>> b.strftime("%s")
'1215299043'
Posted Thu Jun 25 19:18:25 2009 Tags:
Posted Thu Jun 25 06:13:54 2009 Tags:

Things you learn while booking a trip to Cacerés

  1. Spain is the only country in Europe that requires the airlines to ask you to enter your document information twice: once in the online check in and once in the "Advance Passenger Information" form.
  2. "Express" buses are more expensive, but at least they are slower.
  3. You only need a EU national ID card to enter the country, but you require a passport to book a bus.
  4. When you click on a form, websites like to reload the current page unchanged, then wait a few seconds to give you time to curse their coños de developers, then cunningly reload again to show the next page.
  5. The main train company seems to have lost their trains, and they are adding them to their database as soon as they find them. Right now there are 2 trains a day between Madrid and Cacerés, but tomorrow they may find an extra locomotor in the back of a drawer and maybe manage to schedule another service.

Anyway, somehow I'll be there.

Posted Tue Jun 23 10:52:48 2009 Tags:

Feedback democratico

Le elezioni sono passate, la frustrazione rimane.

Per la comodità di chi volesse dare un democratico feedback alle deludenti forze politiche in gioco, dallo spiegare pacatamente le proprie ragioni per l'aver votato un altro, al mandarli cortesemente a scoreggiare nella farina, ecco un po' di email:

Non dimenticate di mettere bene in chiaro che la mail di insulti che avete mandato non è da intendere come una richiesta di essere iscritti alle loro newsletter.

Posted Tue Jun 9 10:45:05 2009 Tags:

Mapping using the Openmoko FreeRunner headset

The FreeRunner has a headset which includes a microphone and a button. When doing OpenStreetMap mapping, it would be very useful to be able to keep tangogps on the display and be able to mark waypoints using the headset button, and to record an audio track using the headset microphone.

In this way, I can use tangogps to see where I need to go, where it's already mapped and where it isn't, and then I can use the headset to mark waypoints corresponding to the audio track, so that later I can take advantage of JOSM's audio mapping features.

Enter audiomap:

$ audiomap --help
Usage: audiomap [options]

Create a GPX and audio trackFind the times in the wav file when there is clear
voice among the noise

Options:
  --version      show program's version number and exit
  -h, --help     show this help message and exit
  -v, --verbose  verbose mode
  -m, --monitor  only keep the GPS on and monitor satellite status
  -l, --levels   only show input levels

If called without parameters, or with -v which is suggested, it will:

  1. Fix the mixer settings so that it can record from the headset and detect headset button presses.
  2. Show a monitor of GPS satellite information until it gets a fix.
  3. Synchronize the system time with the GPS time so that the timestamps of the files that are created afterwards are accurate.
  4. Start recording a GPX track.
  5. Start recording audio.
  6. Record a GPX waypoint for every headset button press.

When you are done, you stop audiomap with ^C and it will properly close the .wav file, close the tags in the GPX waypoint and track files and restore the mixer settings.

You can plug the headset out and record using the handset microphone, but then you will not be able to set waypoints until you plug the headset back in.

After you stop audiomap, you will have a track, waypoints and .wav file ready to be loaded in JOSM.

Big thanks go to Luca Capello for finding out how to detect headset button presses.

Posted Sun Jun 7 23:51:37 2009 Tags:

Archive of all entries