Index of categories

Pages about Ubuntu.

First practical lesson

Notes after today's training session.

Small index of the most-used shell commands:

  • ls - list directory contents
  • cp - copy files and directories
  • mv - move (rename) files
  • rm - remove files or directories
  • find - search for files in a directory hierarchy
  • cat - concatenate files and print on the standard output
  • more - file perusal filter for crt viewing
  • less - opposite of more (quit with 'q')
  • cd - Change the current directory to DIR. (use "help cd" instead of "man cd")
  • mkdir - make directories
  • rmdir - remove empty directories

Small index of commands useful for combining into pipelines:

  • grep, egrep, fgrep, rgrep - print lines matching a pattern
  • tail - output the last part of files
  • head - output the first part of files
  • sort - sort lines of text files
  • uniq - report or omit repeated lines
  • sed - stream editor
  • wc - print the number of newlines, words, and bytes in files
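
For example, several of these commands combine into a typical pipeline like this one (an illustration of mine, not from the lesson):

     # Count the login shells in use on the system, most common first
     sed 's/.*://' /etc/passwd | sort | uniq -c | sort -rn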

Problems found during the lesson:

  • You set the system default locale to Amharic, and the gdm login screen comes up in Amharic input mode. We didn't find out how to switch it back to inputting roman characters: right-clicking on the input field to set the input method doesn't work. Since usernames are not in Amharic, you're locked out.
  • So you press CTRL+ALT+F1, log in and try dpkg-reconfigure locales. On Ubuntu Dapper, it does not work anymore.
  • So you dig and dig and dig and finally find that you can force a locale in /etc/default/gdm (but not in /etc/gdm/locale.conf, nor in /etc/gdm/gdm.conf).
  • Then the internet works for a bit and you look up how to reconfigure locales in Ubuntu. It turns out you have to use localeconf, which is not installed by default, is not in universe (and thus not on the CDs), and needs to be downloaded from the Internet.
  • The Ubuntu wiki is all on HTTPS, which defeats any attempt at proxy caching.
  • An Internet proxy needs to be configured three times: in GNOME, in Firefox and in Synaptic (well, apt). This is especially tricky when you forget to set up the proxy in Synaptic and seemingly unrelated applications fail, like the Ubuntu language selector, which internally invokes the package manager to download missing langpacks.
  • Some short descriptions in the NAME section of manpages are hard to understand, or wrong. Noted on apt-get, apt-cache and less. Top prize goes to apt-cache:

     NAME
            apt-cache - APT package handling utility -- cache manipulator
     DESCRIPTION
            [...] apt-cache does not manipulate the state of the system but
            does provide operations to search and generate interesting output
            from the package metadata. [...]
    

    So apt-cache is a manipulator that doesn't manipulate. A possible improvement could be "query the APT package cache".

  • The language selector in Ubuntu Breezy doesn't really exit and keeps the package database locked. This seems to be fixed in Dapper, and had probably been fixed in some Breezy update. System updates here are a problem: my Dapper (with some universe things in it) wanted to download more than 120MB of data, and the university network was giving me 14Kbps. It was a nice opportunity to teach about fuser -uva and kill.
  • dict, squid and many other packages from 'main' are not on the normal Ubuntu CDs: is there an easy way to build a CD with them? Or do Ubuntu CDs with extra packages already exist? I'll have to find out.
  • cupsys has documentation outside of /usr/share/doc, in /usr/share/cups/doc-root.
  • man works on all commands except cd, which is an internal shell command and thus needs help instead of man. I should remember to ponder autogenerating manpages from help output (a sketch with help2man follows the script below).
  • Is there an index-like manpage with a list of the core Unix commands and their short descriptions? If there isn't, it's easy to generate one:

     #!/bin/sh
     # List the NAME lines from the manpages of all executables in a directory
     DIR=${1:-"/bin"}
     (
     find "$DIR" | while read FILE
     do
         # Skip directories; only look at executable files
         if [ -x "$FILE" ] && ! [ -d "$FILE" ]
         then
             # Print what lies between the NAME and SYNOPSIS headings
             LANG=C COLUMNS=2000 man "$(basename "$FILE")" | \
                      grep ^SYNOPSIS -B 100 | grep ^NAME -A 100 | \
                      tail -n +2 | head -n 2 | \
                      grep -v '^[ \t]*$'
         fi
     done
     ) | sort | uniq | sed 's/^ \+//'
    

    Try running it on /bin and /sbin: it's great! Also, since it doesn't redirect stderr, it nicely exposes a number of manpage problems.
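
    On the idea of autogenerating manpages from help output: help2man already does something close to this for commands that support --help. A rough, untested sketch for a shell builtin like cd, wrapping the builtin in a script that help2man can interrogate (the wrapper path and the --name text are made up):

     cat > /tmp/cd-wrapper <<'EOF'
     #!/bin/sh
     # Hypothetical wrapper: expose "help cd" as a --help-style program
     bash -c 'help cd'
     EOF
     chmod +x /tmp/cd-wrapper
     help2man --no-info --version-string=bash \
              --name "change the current directory" /tmp/cd-wrapper > cd.1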

Lots of bugs to report when I get home: from here it would take ages, and lots of money on the hotel internet connection, and some are Ubuntu-specific, so I'd need to do everything online with Malone.

As usual, teaching is one of the best ways to find bugs.

I propose an Etch training session a month before release.

Other things to do:

  • Find more info about that live CD with Wikipedia browsable without the Internet.
  • Make a collection of Free technical e-books: even those Indian low-cost book editions are too expensive here, so e-books mean a lot.

Update: Matt Zimmerman writes:

I read your blog entry at http://www.enricozini.org/blog/eng/second-day-in-addis and wanted to respond as follows:

  • localeconf is not the standard way to configure locales in Ubuntu; what documentation told you that? It's an unsupported package from Progeny. If what you wanted was to set the system default locale from the command line, editing /etc/environment is probably the best way.

  • I suggest filing a bug report at <https://launchpad.net/products/ubuntu-website> about the HTTPS issue; I don't think it's necessary for the entire wiki to be HTTPS, only authentication.

  • Synaptic may be able to use the GNOME proxy settings without introducing undesirable dependencies; please file a wishlist bug.

  • dict, squid and other packages from main are not on the Ubuntu CDs because there is no space. The DVD contains these packages.

  • The cupsys documentation bug was quite likely inherited from Debian and should be reported there

  • You can file bugs in Malone via email; this has been possible for a long time now. Please don't reinforce this misconception.

    https://help.launchpad.net/UsingMaloneEmail
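
For reference, setting the system default locale in /etc/environment as Matt suggests would look something like this (the values are just examples):

    LANG="en_US.UTF-8"
    LANGUAGE="en"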

Posted Sat Jun 6 00:57:39 2009 Tags:

Live CD on a removable disk

Eros is a hardware guru who happened to be the unknown guy sitting next to me on a plane.

He happens to be a happy Kubuntuer. While chatting, he told me that one of his systems is an external hard drive made by copying a Kubuntu live CD image onto it.

Why did you do so? I asked.

Because this way I can plug it into any computer, and it'll do hardware detection at boot. However, it's a hard drive, so it's fast, and I can keep my home and all my customisations on it.

I had never thought of it.

That's an interesting and smart (ab)use of a live CD.

Now I wonder: what would be required to plug the live CD boot-time hardware detection infrastructure onto an existing Debian or Ubuntu installation?

Update: slh on IRC suggests (a bit edited by me):

A lot of the former "obscure black magic" for live CDs isn't needed anymore. What is needed is a kernel with static usb-storage, libusual, ehci-hcd, ohci-hcd, uhci-hcd (or an appropriate initrd/initramfs). udev takes care of most hardware detection issues these days.

As long as everything needed to boot is contained in a single partition, you don't need an fstab: udev, hal and pmount take care of the rest; procfs, sysfs, devpts, usbfs and shm are mounted by sysvinit.

All that is left is a tool to create xorg.conf while booting (those tools exist and just need to be called early).

Everything else is just a matter of convenience: extending the life span of the USB key by moving volatile data onto tmpfs, etc.; if passwordless logins are required, then xsession and inittab need to be changed; new ssh host keys generated on boot; small stuff.

With ordinary flash storage, jffs2 and something to reduce write access are a good idea (perhaps unionfs for /var/ and /home/, bind mounting /tmp/ on /var/tmp/), but that's also not strictly necessary.

Mostly it boils down to running the xorg-creation script at every boot time.

There are various tools to do that. Some are here, but there is surely more. (Enrico's note: do we have anything in Debian that we can install and just does that?)

USB and PS/2 mice have shared the same device since kernel 2.6, so that part of xorg.conf doesn't strictly need to be detected; the same goes for the keyboard (ALPS and Synaptics touchpads can be easily detected), and X.org can use the screen's DDC info, although it's not always reliable.

It can boil down to just detecting the video chipset: something like this, that uses PCI IDs from discover1-data.

It can also become a lot easier with X.org's own DDC detection, which almost reduces the problem to configuring input devices and selecting the video driver. If I understand Daniel Stone correctly, X.org will soon improve its detection routines (fail-safe X (auto-)configuration) as well, in X.org 7.3.

xresprobe is in Debian: it's pretty similar to ddcxinfo-kanotix; both forked off Red Hat's kudzu package, and all of them fail miserably on amd64. That's why ddcxinfo has a fallback to 1024×768 @ 75 Hz, which "always works" (+ manual overrides).

Posted Sat Jun 6 00:57:39 2009 Tags:

Fixing problems after upgrade to Dapper

Laptop: Asus M3Ae

Problem: Can't mount root partition because of various ACPI errors. Breezy kernel works.

Solution:

  1. Boot with the old kernel.
  2. Add the libata option for the initramfs (note that sudo echo "..." >> file does not work, because the redirection happens outside of sudo; tee does the trick):

     echo "libata noacpi=1" | sudo tee -a /etc/mkinitramfs/modules

  3. Move the current initramfs out of the way:

     sudo mv /boot/initrd.img-2.6.15-25-686 /boot/initrd.img-2.6.15-25-686.backup

  4. Rebuild it:

     sudo mkinitramfs -o /boot/initrd.img-2.6.15-25-686 2.6.15-25-686

Thanks: Matthew Garrett

Posted Sat Jun 6 00:57:39 2009 Tags:

Live CD on a removable disk, the Debian way

In [live-cd-on-removable-disk] at some point I wrote:

Enrico's note: do we have anything in Debian that we can install and just does that?

Here are the answers:

Sven Mueller writes:

Well, Enrico, a tool I really grew fond of, which auto-configures X on Debian systems, is xdebconfigurator. It is not auto-run at each system start, which I consider a feature on normal systems, but for your proposed usage (i.e. a portable USB-storage based Debian system) it would certainly be the right thing.

Essentially, it has never failed me, except for VMware virtual machines, where all it did wrong was propose resolutions that were too high, which resulted from the dual-screen Windows setup I ran VMware on. You might want to give it a try.

Tollef Fog Heen writes:

I added the support in casper for doing this almost a year ago and it has saved me lots of debugging time. Booting the live CD that way is almost as fast as booting an installed system. If you couple this with using the persistent storage support in casper, you can get the configure-on-boot support together with persistency.

In a later update, slh is quoted saying that xresprobe doesn't work on AMD64. This is wrong: I wrote that support, based on code by Matthew Garrett, a little more than nine months ago. I wouldn't recommend incorporating it in newly-written code, though; rather use libx86.

And finally, Marco Amadori writes:

Without needing to look for tools external to Debian, there is already the Debian Live software in sid: live-package, that creates a live system, and casper, that generates an initramfs that can configure a Debian system on the fly.

So far there is no hard disk target for live-package, but the "Iso" target can already do the job quite well. At boot time, Casper's initramfs scans all the block devices, so it also works for USB keys and hard drives.

To obtain a hard drive image, you just need to invoke "make-live" with the options for the required software, then copy the contents of the ISO (or of the ./debian-live/binary directory) onto a partition and install the boot loader.

This is what the future "HD" target of live-package will do; so far it can only build ISO and Netboot images.

Posted Sat Jun 6 00:57:39 2009 Tags:
Posted Sat Jun 6 00:57:39 2009
pdo

Pages exported to http://planet.debian.org.

Work around Google evil .ics feeds

I've happily been using 2015/akonadi-install for my calendars, and yesterday I added an .ics feed export from Google, as a URL file source. It is a link of the form: https://www.google.com/calendar/ical/person%40gmail.com/private-12341234123412341234123412341234/basic.ics

After doing that, I noticed that the fan in my laptop was on more often than usual: akonadi-server and postgres were running very often, and doing quite a lot of processing.

The evil

I investigated and realised that Google seems to be doing everything they can to make their ical feeds hard to sync against efficiently. This is what I have observed Gmail doing to an unchanged ical feed:

  • Date: headers in HTTP replies are always now
  • If-Modified-Since: is not supported
  • DTSTAMP of each element is always now
  • VTIMEZONE entries appear in random order
  • ORGANIZER CN entries randomly change between full name and plus.google.com user ID
  • ATTENDEE entries randomly change between having a CN or not having it
  • TRIGGER entries change spontaneously
  • CREATED entries change spontaneously

This causes akonadi to download and reprocess the entire ical feed at every single poll, and I can't blame akonadi for doing it: in effect, Google is saying that there is a feed with several years' worth of daily appointments which all keep changing all the time.

The work-around

As a work-around, I have configured the akonadi source to point at a local file on disk, and I have written a script to update the file only if the .ics feed has actually changed.

Have a look at the script: I consider it far from trivial, since it needs to do a partial parsing of the .ics feed to throw away all the nondeterminism that Google pollutes it with.
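
For illustration, here is a minimal sketch of the idea (not the actual script: the URL and paths are placeholders, and a complete version would also need to canonicalise the order of VTIMEZONE entries):

    #!/bin/sh
    # Refresh the local copy of an .ics feed only when it has really changed
    set -e
    URL="https://www.google.com/calendar/ical/.../basic.ics"  # placeholder
    DEST="$HOME/calendars/google.ics"
    TMP=$(mktemp)
    trap 'rm -f "$TMP" "$TMP.norm"' EXIT
    curl -s "$URL" > "$TMP"
    # Strip the fields that Google randomises at every download (see the list above)
    grep -v -E '^(DTSTAMP|CREATED|TRIGGER|ORGANIZER|ATTENDEE)' "$TMP" > "$TMP.norm"
    # Only touch the file that akonadi watches if the normalised feed changed
    if ! cmp -s "$TMP.norm" "$DEST.norm"; then
        cp "$TMP" "$DEST"
        mv "$TMP.norm" "$DEST.norm"
    fi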

The setup

The script needs to be run periodically, and I used it as an opportunity to try systemd user timers:

    $ cat ~/.config/systemd/user/update-ical-feeds.timer
    [Unit]
    Description=Updates ical feeds every hour
    # Only run when on AC power
    ConditionACPower=yes

    [Timer]
    # Run every hour
    OnActiveSec=1h
    # Run a minute after boot
    OnBootSec=1m
    Unit=update-ical-feeds.service

    $ cat ~/.config/systemd/user/update-ical-feeds.service
    [Unit]
    Description=Update ICal feeds

    [Service]
    # Use oneshot to prevent two updates being run in case the previous one
    # runs for more time than the timer interval
    Type=oneshot
    ExecStart=/home/enrico/tmp/calendars/update

    $ systemctl --user start update-ical-feeds.timer
    $ systemctl --user list-timers
    NEXT                         LEFT       LAST                         PASSED UNIT                    ACTIVATES
    Wed 2015-03-25 22:19:54 CET  59min left Wed 2015-03-25 21:19:54 CET  2s ago update-ical-feeds.timer update-ical-feeds.service

    1 timers listed.
    Pass --all to see loaded but inactive timers, too.

To reload the configuration after editing: systemctl --user daemon-reload.

Further investigation

I wonder if ConditionACPower needs to be in the .timer or in the .service, since there is a [Unit] section in both. Update: I have been told it can be in the .timer.

I also wonder if there is a way to have the timer trigger only when online. There is a network-online.target and I do not know if it is applicable. I also do not know how to ask systemd if all the preconditions are currently met for a .service/.timer to run.
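
For what it's worth, a partial answer to that last question might be to query the unit's condition state; as far as I understand, this only reflects the most recent activation attempt rather than a live check:

    $ systemctl --user show -p ConditionResult update-ical-feeds.timer
    ConditionResult=yes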

Finally, I especially wonder if it is worth hoping that Google will ever make their .ics feeds play nicely with calendar clients.

Posted Wed Mar 25 21:50:21 2015 Tags:

Screen-dependent window geometry

I have an external monitor for my laptop on my work desk at home, and when I work I keep a few windows like IRC on the laptop screen and everything else on the external monitor. Then maybe I move to the sofa to watch a movie, or to the kitchen to cook, and I unplug from the external monitor to bring the laptop with me. Then maybe I go back to the external monitor to resume working.

The result of this (with openbox) is that when I disconnect the external monitor, all the windows that were on it get moved to the right edge of the laptop monitor, and when I reconnect the external monitor I need to rearrange them all again.

I would like to implement something that does the following:

  1. it keeps a dictionary mapping screen geometries to window geometries;
  2. every time a window's geometry or virtual desktop number changes, it gets recorded in the dictionary under the current screen geometry;
  3. every time the screen geometry changes, then for each window, if a window geometry + virtual desktop number was saved for it under the new screen geometry, it gets restored (a rough sketch follows this list).
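
As a rough illustration of the state involved, here is a manual save/restore sketch built on stock tools (wmctrl and xrandr; the state directory is made up, and a real implementation would react to events instead of being run by hand):

    #!/bin/sh
    # Key the saved layout on the current screen configuration
    KEY=$(xrandr --query | grep ' connected' | md5sum | cut -d' ' -f1)
    STATE="$HOME/.cache/window-layouts"
    mkdir -p "$STATE"
    case "$1" in
        save)
            # One line per window: id, desktop, x, y, width, height
            wmctrl -l -G | awk '{print $1, $2, $3, $4, $5, $6}' > "$STATE/$KEY"
            ;;
        restore)
            while read ID DESK X Y W H; do
                wmctrl -i -r "$ID" -t "$DESK"          # restore virtual desktop
                wmctrl -i -r "$ID" -e "0,$X,$Y,$W,$H"  # restore geometry
            done < "$STATE/$KEY"
            ;;
    esac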

Questions:

  1. Is anything like this already implemented? Where?
  2. If not, what would be a convenient way to implement it myself, ideally in a wmctrl-like way that does not depend on a specific WM?

Note: I am not interested in switching to a different WM unless it is openbox with this feature implemented in it.

Posted Mon Mar 16 21:29:36 2015 Tags:

Reuse passwords in /etc/crypttab

Today's scenario was a laptop with an SSD and a spinning disk, and the goal was to deploy a Debian system on it so that as many things as possible are encrypted.

My preferred option for this is to set up one big LUKS partition on each disk, and put an LVM2 Physical Volume inside each partition. At boot, the two LUKS partitions are opened, their contents are assembled into a Volume Group, and I can have everything I want inside it.
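
A minimal sketch of that layout, assuming /dev/sda2 on the SSD and /dev/sdb1 on the spinning disk as in the crypttab below (destructive commands, for illustration only):

    # Encrypt both partitions, then open them
    cryptsetup luksFormat /dev/sda2
    cryptsetup luksFormat /dev/sdb1
    cryptsetup luksOpen /dev/sda2 ssd
    cryptsetup luksOpen /dev/sdb1 spin
    # Assemble a single Volume Group spanning both encrypted devices
    pvcreate /dev/mapper/ssd /dev/mapper/spin
    vgcreate myvg /dev/mapper/ssd /dev/mapper/spin
    lvcreate -L 20G -n root myvg
    # Later, LVs can be moved between the disks while online, e.g.:
    #   pvmove -n root /dev/mapper/spin /dev/mapper/ssd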

This has advantages:

  • if either disk breaks, the other can still be unlocked, and it should still be possible to access the LVs inside it
  • once boot has happened, any layout of LVs can be used with no further worries about encryption
  • I can use pvmove to move partitions at will between the SSD and the spinning disk, which means I can renegotiate the trade-offs between speed and disk space at any time.

However, by default this causes cryptsetup to ask for the password once for each LUKS partition, even if the passwords are the same.

Searching for ways to mitigate this gave me unsatisfactory results, like:

  • decrypt the first disk, and use a file inside it as the keyfile to decrypt the second one. But in this case if the first disk breaks, I also lose the data in the second disk.
  • reuse the LUKS session key for the first disk in the second one. Same problem as before.
  • put a detached LUKS header in /boot and use it for both disks, then make regular backups of /boot. It is an interesting option that I have not tried.

The solution that I found was something that did not show up in any of my search results, so I'm documenting it here:

    # <target name> <source device>   <key file>   <options>
    ssd             /dev/sda2         main         luks,initramfs,discard,keyscript=decrypt_keyctl
    spin            /dev/sdb1         main         luks,initramfs,keyscript=decrypt_keyctl

This caches each password for 60 seconds, so that it can be reused to unlock other devices that use it. The documentation can be found at the beginning of /lib/cryptsetup/scripts/decrypt_keyctl, beware of the leopard™.

main is an arbitrary tag used to specify which devices use the same password.

This is also useful to work easily with multiple LUKS-on-LV setups:

    # <target name> <source device>          <key file>  <options>
    home            /dev/mapper/myvg-chome   main        luks,discard,keyscript=decrypt_keyctl
    backup          /dev/mapper/myvg-cbackup main        luks,discard,keyscript=decrypt_keyctl
    swap            /dev/mapper/myvg-cswap   main        swap,discard,keyscript=decrypt_keyctl
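
With a setup like this one, the caching is easy to observe from a running system (my guess at a convenient test, using Debian's cryptdisks_start; the second command should not prompt for a password while the cached one is still valid):

    sudo cryptdisks_start home
    sudo cryptdisks_start backup   # reuses the password cached by decrypt_keyctl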
Posted Thu Mar 12 22:45:57 2015 Tags:

Free as in Facebook

Yesterday we were in an airport. We tried to connect to the airport's "free" wifi. It had a captive portal that asked for a lot of personal information before one could maybe get on the internet, and we gave up. Bologna Airport: no matter what they do to pretend that they like you, it's always clear that they don't.

I looked at the captive portal screen and I said: «ah yes, "free" wifi. Free as in Facebook».

We figured we had coined an expression that wants to be reused.

Posted Mon Mar 9 10:58:49 2015 Tags:

Another day in the life of a poor developer

try:
    # After Python 3.3
    from collections.abc import Iterable
except ImportError:
    # This has changed in Python 3.3 (why, oh why?), reinforcing the idea that
    # the best Python version ever is still 2.7, simply because upstream has
    # promised that they won't touch it (and break it) for at least 5 more
    # years.
    from collections import Iterable

import shlex
if hasattr(shlex, "quote"):
    # New in version 3.3.
    shell_quote = shlex.quote
else:
    # Available since python 1.6 but deprecated since version 2.7: Prior to Python
    # 2.7, this function was not publicly documented. It is finally exposed
    # publicly in Python 3.3 as the quote function in the shlex module.
    #
    # Except everyone was using it, because it was the only way provided by the
    # python standard library to make a string safe for shell use
    #
    # See http://stackoverflow.com/questions/35817/how-to-escape-os-system-calls-in-python
    import pipes
    shell_quote = pipes.quote

import shutil
if hasattr(shutil, "which"):
    # New in version 3.3.
    shell_which = shutil.which
else:
    # Available since python 1.6:
    # http://stackoverflow.com/questions/377017/test-if-executable-exists-in-python
    from distutils.spawn import find_executable
    shell_which = find_executable
Posted Fri Feb 27 12:02:33 2015 Tags:

Akonadi client example

After many failed attempts I have managed to build a C++ Akonadi client. It has felt like one of the most frustrating programming experiences of my whole life, so I'm sharing the results hoping to spare others all the suffering.

First things first: the Akonadi client libraries are not in libakonadi-dev but in kdepimlibs5-dev, even if kdepimlibs5-dev does not show up in apt-cache search akonadi.

Then, kdepimlibs is built with Qt4. If your application uses Qt5 (mine did), you need to port it back to Qt4 if you want to talk to Akonadi.

Then, kdepimlibs does not seem to support qmake and does not ship pkg-config .pc files, and if you want to use kdepimlibs your build system needs to be cmake. I ported my code from qmake to cmake, and now qtcreator wants me to run cmake by hand every time I change the CMakeLists.txt file, and it has stopped allowing me to add, rename or delete sources.

Finally, most of the code and build system snippets found on the internet seem flawed in one way or another, because the build toolchain of Qt/KDE applications has undergone several redesigns over time, and the network is littered with examples from different eras. The way to obtain template code to start a Qt/KDE project is to use kapptemplate. I have found no getting-started tutorial on the internet that said "do not just copy the snippets from here, run kapptemplate instead so you get them up to date".

kapptemplate supports building an "Akonadi Resource" and an "Akonadi Serializer", but it does not support generating template code for an Akonadi client. That left me with the feeling that I was dealing with software that wants to be developed but does not want to be used.

Anyway, an example of how to interrogate Akonadi now exists on the internet. I hope that all the tears of blood that I cried this morning have not been cried in vain.

Posted Mon Feb 23 15:44:01 2015 Tags:

The wonders of missing documentation

Update: I have managed to build an example Akonadi client application.

I'm new here; I want to make a simple C++ GUI app that pops up a QCalendarWidget showing the appointments my local Akonadi has.

I open qtcreator, create a new app, hack away for a while, then of course I get undefined references for all Akonadi symbols, since I didn't tell the build system that I'm building with Akonadi. Ok.

How do I tell the build system that I'm building with akonadi? After 20 minutes of frantic looking around the internet, I still have no idea.

There is a package called libakonadi-dev which does not seem to have anything to do with this. That page mentions everything about making applications with Akonadi except how to build them.

There is a package called kdepimlibs5-dev which looks promising: it has no .a files, but it does have headers and cmake files. However, qtcreator is only integrated with qmake, and I would really like the handholding of an IDE at this stage.
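
For anyone retracing these steps, a quick way to see what build machinery a -dev package actually ships is plain dpkg querying:

    # List the cmake / pkg-config / static-library files in the package
    dpkg -L kdepimlibs5-dev | grep -E '\.(cmake|pc|a)$'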

I put something together naively doing just what looked right, and I managed to get an application that segfaults before main() is even called:

/*
 * Copyright © 2015 Enrico Zini <enrico@enricozini.org>
 *
 * This work is free. You can redistribute it and/or modify it under the
 * terms of the Do What The Fuck You Want To Public License, Version 2,
 * as published by Sam Hocevar. See the COPYING file for more details.
 */
#include <QDebug>

int main(int argc, char *argv[])
{
    qDebug() << "BEGIN";
    return 0;
}
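
And this is the qmake project file that goes with it:
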
QT       += core gui widgets
CONFIG += c++11

TARGET = wtf
TEMPLATE = app

LIBS += -lkdecore -lakonadi-kde

SOURCES += wtf.cpp

I didn't achieve what I wanted, but I feel like I achieved something magical and beautiful after all.

I shall now perform some haruspicy on those obscure cmake files to see if I can figure something out. But seriously, people?

Posted Mon Feb 23 11:36:18 2015 Tags:

Setting up Akonadi

Now that I have a CalDAV server that syncs with my phone I would like to use it from my desktop.

It looks like akonadi is able to sync with CalDAV servers, so I'm giving it a try.

First things first: let's give a meaning to the arbitrary name of this thing. Wikipedia says it is the oracle goddess of justice in Ghana. That still does not hint at all at personal information servers, but it seems quite nice. Ok. I gave up on software having purpose-related names ages ago.

# apt-get install akonadi-server akonadi-backend-postgresql

Akonadi wants a SQL database as a backend. By default it uses MySQL, but I had enough of MySQL ages ago.

I tried SQLite, but the performance with it is terrible. Terrible as in, it takes 2 minutes between adding a calendar entry and having it show up in the calendar. I'm fascinated by how Akonadi manages to use SQLite so badly, but since I currently just want to get a job done, next in line is PostgreSQL:

# su - postgres
$ createuser enrico
$ psql postgres
postgres=# alter user enrico createdb;

Then as enrico:

$ createdb akonadi-enrico
$ cat <<EOT > ~/.config/akonadi/akonadiserverrc
[%General]
Driver=QPSQL

[QPSQL]
Name=akonadi-enrico
StartServer=false
Host=
Options=
ServerPath=
InitDbPath=
EOT

I can now use kontact to connect Akonadi to my CalDAV server and it works nicely, both with calendar and with addressbook entries.

KDE has at least two clients for Akonadi: Kontact, which is a kitchen sink application similar to Evolution, and KOrganizer, which is just the calendar and scheduling component of Kontact.

Both work decently, and KOrganizer has a pretty decent startup time. I now have a usable desktop PIM application that is synced with my phone. W00T!

Next step is to port my swift little calendar display tool to use Akonadi as a back-end.

Posted Tue Feb 17 15:34:55 2015 Tags:

seat-inspect

Four months ago I wrote this somewhere:

Seeing a DD saying "this new dbus stuff scares me" would make most debian users scared. Seeing a DD who has an idea of what is going on, and who can explain it, would be an interesting and exciting experience.

So, let's be exemplary, competent and patient. Or at least, competent. Some may like or not like the changes, but do we all understand what is going on? Will we all be able to support our friends and customers running jessie?

I confess that although I understand the need for it, I don't feel competent enough to support systemd-based machines right now.

So, are we maybe in need of help, cheat sheets, arsenals of one-liners, diagnostic tools?

Maybe a round of posts on -planet like "one debian package a day" but with new features that jessie will have, and how to understand them and take advantage of them?

That was four months ago. In the meantime, I did some work, and it got better for me.

Yesterday, however, I saw an experienced Linux person frustrated because the shutdown function of the desktop was doing nothing whatsoever. Today I found John Goerzen's post on planet.

I felt like some more diagnostic tools were needed, so I spent the day making seat-inspect.

seat-inspect tries to make the status of the login/seat system visible, to help with understanding and troubleshooting.

The intent of running the code is to have an overview of the system status, both to see what the new facilities are about, and to figure out if there is something out of place.

The intent of reading the code is to have an idea of how to use these facilities: the code has been written to be straightforward and is annotated with relevant bits from the logind API documentation.

seat-inspect is not a finished tool, but a starting point. I put it on github hoping that people will fork it and add their own extra sanity checks and warnings, so that it can grow into a standard thing to run if a system acts weird.

As it is now, it should be able to issue warnings if some bits are missing for network-manager or shutdown functions to work correctly. I haven't really tested that, though, because I don't have a system at hand where they are currently not working fine.

Another nice thing about it is that when running seat-inspect -v you get a dump of what logind/consolekit think about your system. I found it an interesting way to explore the new functionality that we recently grew. The same can be done, and in more detail, with loginctl calls, but I lacked a summary.

After writing this I feel a bit more competent, probably enough to sit at somebody's computer and poke into loginctl bits. I highly recommend the experience.

Posted Tue Feb 10 18:06:43 2015 Tags:

Mozilla marketplace facepalm

This made me sad.

My view, which didn't seem to be considered in that discussion, is that people concerned about software freedom and security are likely to stay the hell away from such an app market and its feedback forms.

Also, that thread made me so sad about the state of that developer community that I seriously do not feel like investing energy into going through the hoops of getting an account in their bugtracker to point this out.

Sigh.

Posted Fri Jan 23 15:13:16 2015 Tags:
Posted Sat Jun 6 00:57:39 2009
dcg

Debian Community Guidelines.

Converging to a solution

Sustaining a discussion towards solving a problem is sometimes more important than solving the problem.

I can't decide if this is trivial or counterintuitive. Anyway it's been quite enlightening when it came out. I once took this note:

I found that with my projects, when someone posted a mail about a problem I would work maybe some days to find a solution, and just post the solution at the end.

However, I have now realised that it's more constructive to have the problem-solving process itself happen online. This way, instead of keeping people waiting in silence for a few days, they get quicker feedback and extra information, and they also have a chance to participate in solving the problem before I manage to.

For example, when I have to interrupt to go home or sleep, someone else can pick the problem up and do another step.

Plus, the entire problem-solving process remains documented, which will provide more written information for future readers.

This note is from a few months ago; however, I still fail to do it. Bad habits are sometimes hard to change. Please kick me about it :)

Posted Sat Jun 6 00:57:39 2009 Tags:

Reorganization of the DCG

I've recently received a substantial amount of feedback about the Debian Community Guidelines, and have done some reorganization of it.

The previous general section still stands as the Main Guidelines: those are the substantial few things to always keep in mind.

What was previously the long list of checklists is now split in two: the Debian-specific Guidelines, which should be a shorter list of non-obvious suggestions for people who already have experience with online life, and the General Guidelines, with the fuller checklists of useful suggestions for everyone.

I still haven't gone through the selection and reorganization of the Debian-specific and General parts, so at the moment they look fairly similar and mostly overlap. But the good news is that I finally found a structure that I like, and that can allow more experienced people to make use of the guidelines without getting bored with simpler things like "google before asking a question".

This division also suggests a little workflow: new suggestions can be added to the Debian-specific part, and then later moved to the general part when they become obvious for everyone.

I'm happy. This layout seems to be good at getting me unstuck on how to think about the DCG. More will come, of course, as I prepare my DCG talk for Debconf6.

Posted Sat Jun 6 00:57:39 2009 Tags:

Debconf6 talks material now online

I've finally put online slides and notes for my debconf6 talks:

Many people had asked me for the notes of the "Advanced ways of wasting time" talk: they're finally online, translated, and with the links pointing to English Wikipedia pages. Sorry it took me so long.

Posted Sat Jun 6 00:57:39 2009 Tags:

DCG mentioned on Linux.com

A friend of mine pointed me to the Debian Community Guidelines being mentioned in a linux.com article.

"[Mako] hopes that Garrett's resignation will give the Debian community an added impetus to adapt its own code of conduct, like the one proposed by Enrico Zini."

I'm very happy to see the DCG getting mentioned, although I don't think it makes sense to 'adopt' such a document.

What I'd like is for it to be mentioned as a suggested read, and linked from here and there. So today it happened, and I'm happy :)

Posted Sat Jun 6 00:57:39 2009 Tags:
Posted Sat Jun 6 00:57:39 2009

Localising free software for Taiwanese Aboriginal cultures.

People who participated so far:

Creating a new locale

I'm currently in Cilamitay, in the east of Taiwan. There is a little meeting of Taiwanese Free Software people and people from the Amis, Taroko and Puyuma tribes, with the idea of starting localisation efforts for some aboriginal languages.

These are some of the issues we are going to discuss:

Language code

A new ISO standard (639-3), which will include language codes for the Taiwanese aboriginal tribes, will hopefully be formalised in January. We'll have to work out some temporary solution, but there's good hope that it won't have to stay temporary for long.

List of characters

Because of Christian missionary influence, both Amis and Taroko use a roman alphabet, with accents. We need to work out the complete list of characters and accent combinations, see if everything is in Unicode, and see how they sort.

We then need to find a comfortable way to input them using the keyboards normally available here (English US layout): compose key? Dead keys? How about on Windows?

Womble2 on IRC tells me that on Windows one can work with MSKLC.

Technical terms and country list

We need to work out how to handle terms that do not exist in the language.

Technical terms are usually borrowed from Japanese.

Names for all the countries in the world probably do not exist.

Translation interface

We need to find an easy to use interface to input the translations.

There is Rosetta.

There is Pootle. (Thanks to Christian Perrier for pointing me at it)

There is Webpot.

Update: there is now a wiki page on the Debian wiki.

Posted Sat Jun 6 00:57:39 2009 Tags:

Amis and Paiwan input method and character set

Arne Götje (高盛華) created:

The scripts, especially Amis, make heavy use of Unicode combining characters. They should display well, at least with the DejaVu Sans font, in many applications.

Try it out: if it displays correctly, you should see:

  • accented letters instead of letters next to accents.
  • i with both the dot and the accent.

Update: there is now a wiki page on the Debian wiki.

Posted Sat Jun 6 00:57:39 2009 Tags:

Character list for the Amis language

We mapped the available glyphs and accents for the Amis language.

The letters in alphabetical order:

    a c d f ng h i k l m n o p r s t u w y

Each of them can get an acute or circumflex accent on top; ng can get a dot on top of the g.

The accents are literally on top: i would get the dot PLUS the accent on top.

Not all accented characters exist directly in Unicode; however, Unicode provides various kinds of combining features to take care of these cases.

We then need an input method that would insert ng instead of g and allow typing all the accent combinations.

Here is the full character set:

    a     á    â
    c     ć    ĉ
    d     d́    d̂
    f     f́    f̂
    ng    nǵ   nĝ  nġ
    h     h́    ĥ
    i     i̇́    i̇̂
    k     ḱ    k̂
    l     ĺ    l̂
    m     ḿ    m̂
    n     ń    n̂
    o     ó    ô
    p     ṕ    p̂
    r     ŕ    r̂
    s     ś    ŝ
    t     t́    t̂
    u     ú    û
    w     ẃ    ŵ
    y     ý    ŷ

Update: this character list has been improved and the good version is found in the Debian wiki.

The list is not displayed correctly with many fonts or rendering engines. Arne made a test page that explicitly sets a font that works.

The accents are not taken into account when sorting.

Uppercase letters are not used.

Note: the page has been updated to reflect further input from Unicode and Amis people.

Update: there is now a wiki page on the Debian wiki.

Posted Sat Jun 6 00:57:39 2009 Tags:

Happy new year

A year ago we got in touch with various Taiwanese aboriginal tribes to try to start localisation efforts.

Thanks to the research the Taroko people did during 2007 and to tonight's prototype work, the Taroko people in Taiwan can see the computer calendar of the new year in their own language:

(Screenshot: the GNOME calendar in the trv_TZW locale)

Posted Sat Jun 6 00:57:39 2009 Tags:

Character list for the Paiwan language

We mapped the available glyphs and accents for the Paiwan language.

The letters in alphabetical order:

a b c d e f h i j k l m n p q r s t u v w y z ḏ nġ ḻ ṟ ṯ 

No uppercase.

Update: this character list has been improved and the good version is found in the Debian wiki.

All the characters are in Unicode except nġ, which already needs to be requested for the Amis script.

We need to design an input method to enter the underlined letters and the nġ.

Update: there is now a wiki page on the Debian wiki.

Posted Sat Jun 6 00:57:39 2009 Tags:
Posted Sat Jun 6 00:57:39 2009

Pages about Debtags.

Evolution's old odd mail folders to mbox

Something went wrong in my dad's Evolution: it would just get stuck checking mail forever, with no useful diagnostic that I could find. Fun. Not.

Anyway, I solved it by resetting everything to factory defaults, moving away all the gconf entries and .evolution/ files. Then it started to work again; of course, I then needed to reconfigure it from scratch.

It turned out, however, that some old mail was only archived locally, in a kind of weird format that looks like this:

$ ls -la Enrico/
total 336
drwx------ 2 enrico enrico   4096 Jul 23 03:05 .
drwxr-xr-x 7 enrico enrico   4096 Jul 23 03:12 ..
-rw------- 1 enrico enrico   3230 Dec  4  2010 113.HEADER
-rw------- 1 enrico enrico  14521 Dec  4  2010 113.TEXT
-rw------- 1 enrico enrico   3209 Oct 22  2010 134.HEADER
-rw------- 1 enrico enrico   2937 Oct 22  2010 134.TEXT
-rw------- 1 enrico enrico   3116 Jun 27  2011 15.
-rw------- 1 enrico enrico   3678 Jun 27  2011 168.
-rw------- 1 enrico enrico     73 Apr 27  2009 22.1.MIME
-rw------- 1 enrico enrico   3199 Apr 27  2009 22.2
-rw------- 1 enrico enrico     88 Apr 27  2009 22.2.MIME
[...]

I couldn't even find the name of that mail folder layout, let alone conversion tools, so I had to sit down and waste my Sunday break writing software to convert it to an mbox file. Here's the tool, may it save you the awful time I had today: http://anonscm.debian.org/gitweb/?p=users/enrico/evo2mbox.git

Note: feel free to fork it or send patches, but don't bother with feature requests. Evolution isn't and won't be a personal interest of mine. Anything that makes an afternoon at my parents' more tiresome than a whole busy month of paid work doesn't deserve to be.

Luckily they now seem to have changed the local folder format to Maildir.

Posted Mon Jul 23 03:27:50 2012 Tags:

Giving away distromatch

At last year's Fosdem I tried to inject a lot of energy into distromatch, but shortly afterwards I had to urgently rewrite the nm.debian.org website.

After Lars Wirzenius' GTDFH talks in Bologna and Varese I wrote a tool which, among other things, is able to scan my home directory and list how many projects I'm working on.

The output was scary. Like, there are too many. Like, I couldn't even recite the list from memory. And since I couldn't do that, I had no idea there were so many. And I kept feeling stressed because I couldn't manage to take care of them all properly.

Now that I have become conscious of the situation, it's time to deal with it like a grown-up, and politely back off from some of my irresponsible responsibilities.

Distromatch is one of them. It started as a proof-of-concept prototype, and I had the vision that it could become the basis for a fantastic culture of sharing and exchange of information across distributions.

I need to distinguish the vision from the responsibility. I still have that vision for distromatch, but I cannot take responsibility for making it happen.

So I am giving it up to anyone who has the time and resources to pick up that responsibility.

Current status

It works well enough as a prototype. I believe it can successfully map a large enough slice of packages that one can prototype stuff based on it.

I have for example used it to export the Debtags categories for other distros, and the resulting file looked big enough to be used for prototyping category-based features on distributions that don't have them yet.

I think it also works well enough to support a few common use cases, like sharing screenshots, or doing most of the work of converting dependency lists from a distro to another.

And finally, anyone can deploy it, and work on it.

Existing data sources

Everything I index in the Debian distromatch deployment is available at http://dde.debian.net/exports/distromatch/. The rpm-based data in there comes from an export script I wrote that runs on Sophie, but which I cannot maintain properly.

This is an experimental export of Fedora and OpenSUSE data: http://tmp.vuntz.net/misc/distromatch/distromatch-opensuse-fedora.tar

All existing export scripts are found in distromatch git repo on gitorious.

Contacts I gathered at Fosdem

At Fosdem I devoted quite some work to get contacts from all possible distributions and software repositories, so that distromatch could be hooked into them. Here is a dump of what I have collected:

  • Debian: me
  • OpenSuse: Vincent Untz and Adrian Schröter
  • Fedora: Tom "Spot" Callaway
  • Arch: Tasser on IRC
  • CPAN: contact the people of https://metacpan.org/, on irc.perl.org:#metacpan or make an issue on github
  • NetBSD: ask on #netbsd on Freenode
  • FreeBSD: Baptiste Daroussin (bapt)
  • Mageia: Olivier Thauvin

Some of those contacts may have "expired" in the meantime: I wouldn't assume all of them still remember talking with me, although most probably still do.

My commitment for the time being

I am happy to commit, for the time being, to maintaining a working export of the Debian data. I can take responsibility for making sure it stays up to date, and for fixing it ASAP when that is not the case.

I hope that now someone can take distromatch over from me, and make it grow to achieve its great potential.

Posted Sat Jul 21 16:54:18 2012 Tags:

More diversity in Debian skills

This blog post has been co-authored with Francesca Ciceri.

In his Debconf talk, zack said:

We need to understand how to invite people with different backgrounds than packaging to join the Debian project [...] I don't know what exactly, but we need to do more to attract those kinds of people.

Francesca and I know what we could do: make other kinds of contributions visible.

Basically, we should track and acknowledge the contributions of webmasters, translators, programmers, sysadmins, event organisers, and so on, at the same level as what we do for packagers: DDPO, minechangelogs, Portfolio...

For any non-packaging activity that we can make visible and credited, we get:

  • to acknowledge the people who do it, and show that they are active contributors in the project;

  • to acknowledge the work that gets done, and show the actual amount of non-packaging work that gets done in Debian every day;

  • to allow non-packagers to have a reputation, too: first of all, they deserve it, and among other things, it would make nm processing trivial.  

Here's an example: who's the lead translator for German? And if you are German, who's the lead translator for Spanish? Czech? Thai? I (Enrico) don't know the answers, not even for Italian, but we all should! Or at least it should be trivial to find out.

Starting to change this is just a matter of programming.

Francesca has already worked on a list of trackable data sources, at least for translators.

Here are some more details, related to translation:

  • Translations can be tracked via the i18n robot (and relative statistics). This works only with teams who activated the robot and actively use the pseudo-urls in their messages on localisation mailing lists. Some translators don't bother to do it but it's ok to only support the main workflow. It beats extracting .po files from l10n-tagged BTS bugs at any rate.

  • DPN and website translations: for wml pages there's a specific field to be extracted for each translated page: grep for maintainer="name" on normal wml pages, while for DPN translations we have a specific translator="name" field. The problem is that this field is not mandatory, so sometimes there's no indication of the maintainer. Again, it's ok to only support the main workflow.

    Anyway, this is preferable to the cvs log: often the commit is done by the coordinator of the team and not by the actual translator. See above for the alternative solution of using the statistics provided by the i18n bot.

  • DDTSS: since the new release of DDTSS-Django, done by Martijn van Oosterhout about a year ago, the contributions are by default non-anonymous. This should be easy to track.

  • http://wiki.debian.org: it is more complicated because in the wiki we do not have a proper l10n translation workflow, so the only thing that can be tracked are changelogs $LANG/* pages. A nice idea would be to have translated pages list the version of the page that was translated and who did the translation.

  • translation of debian manuals and release notes: usually in the translation of manuals and long documentation there is a specific translator field.

And here are some notes about other fields:

  • DPN editors: for each issue there's a list of editors at the bottom of the page. In the wml: grep for editor=.

  • Artwork: artwork submitted via debianart are easy to track on the portal. Anyway usually you can find the author in the license and copyright file.

  • Programming: the only thing we have is the list of services which can be expanded if needed.

  • Press and publicity: there seems to be not much besides svn logs.

  • l10n-english: The Smith Review Project page has some tracking links. Other activities can probably only be tracked, at the moment, via mailing list activity.

  • Events: we can use the "main coordinator" field on www.debian.org/events/$year/$date-$eventname.wml: grep for <define-tag coord>; for events not published on the http://www.debian.org, but only on http://wiki.debian.org, the coordinator or the contact for the event is usually present on the page itself.

  • Sysadmins: we haven't asked DSA.

And finally, if you are still wondering who those translation coordinators are, they are listed here, although not all teams keep that page up to date.

Of course, when a data source is too hard to mine, it can make sense to see if the workflow could be improved, rather than spending months writing complicated mining code.

This is a fun project for people at Debconf to get together and try.

If by the end of the conference we had a way to credit some group of non-packaging contributors, even just one group such as translators or website contributors, at least we would finally have started having official trackers for the activities of non-packagers.

Posted Thu Jul 12 14:01:54 2012 Tags:

Debtags for derivative distributions

Sometimes I do cool stuff and I forget to announce it.

Ok, so I recently announced a new Debtags website.

I forgot to say in the announcement that the new website does not only know of Debian packages: see for example this page, at the very bottom it says: "Distributions: oneiric, precise, sid, testing".

This means that already, here and now, debtags.debian.net can be used to tag packages from both Debian and Ubuntu, and can easily be extended to cover the entire Debian ecosystem.

If you are a package maintainer, you will notice that your maintainer page shows your packages from everywhere. If you want to filter things a bit, for example hide obsolete packages from an old Debian Stable or Ubuntu LTS, just click on the "Settings" link on the top right to configure the page.

How it works

The magic is in this mergepackages script, which is run daily and exports merged Packages files at dde.debian.net. What debtags.debian.net uses as its Packages and Sources files are just those all-merged.gz and all-merged-sources.gz.

The merging is simple: the rebuild script processes the files in order, and the first version of a package that is found is chosen as the base for the one that will go into the merged Packages file. Some fields, like Description, are taken straight from this pivot package; others, like Architecture or the dependencies, are merged into it. It's arbitrary, but it works for me: the result has all the packages with all their possible architectures and dependencies, and is ready to be indexed with apt-xapian-index.

At the moment I pull data from Debian and Ubuntu, but you can see that the script can easily be extended to pull data from any Debian-style ftp archive, so any Debian derivative can go in. I've already started negotiations with the Derivatives Census on how to add any Debian derivative and keep the list up to date.

How to export tags for your own distribution

I'll use Ubuntu as an example since the data is already available.

The way you add Debtags to the Ubuntu Packages file is just this:

  1. Get the full reviewed tag database.
  2. Optionally filter out those packages that you are not interested in.
  3. Tweak this script to build an overrides file (a sketch of the idea follows this list).
  4. Give the overrides file to your favourite ftp archive building tool.
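
For illustration only (the real logic lives in the make-overrides script mentioned below, and file names here are placeholders): the full tag database has lines of the form "package: tag1, tag2", which is close to the extra-override format that apt-ftparchive understands, so the conversion can be as small as:

    # debtags database line:  <package>: <tag>, <tag>, ...
    # extra-override line:    <package> Tag <tag>, <tag>, ...
    sed 's/: / Tag /' package-tags > extra-overrides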

The make-overrides script is a bit rusty: if you improve it, please send me your changes.

That's it, nothing else required, no excuses: it's ready, here, now!

Hitches and gotchas

This merged Packages file is a bit of a hack, and it suffers from name conflicts across distributions, where two different pieces of software are packaged in two different distributions under the same name.

Ideally, name conflicts should not happen: if a derivative decided to package kate and call it gedit, they would deserve to have it tagged uitoolkit::gtk. I think it's rather important that the whole Debian ecosystem works as much as possible within a single package namespace.

However, that reasoning fails if you take time into account: packages get renamed, like git and chromium, and may mean completely different things if, for example, you compare Debian stable with Debian sid.

This last problem is caused by debtags only working with package names, not package versions. I have a strategy in mind based on being able to override the stable tag database using headers in debian/control; it still needs some details sorted out, but I'm confident we will be able to address these issues properly soon enough.

Why stop at the Debian ecosystem?

Why indeed. I'm clearly trying to use FOSDEM, and the CrossDistribution devroom as the venue to discuss just that.

Posted Fri Jan 20 15:12:33 2012 Tags:

Deploying distromatch

I have been working on allowing anyone to set up their own distromatch instance.

For Debian and Ubuntu, I can easily generate the distromatch input using UDD and the Contents files found in any mirrors.

For the whole RPM world, thanks to Olivier Thauvin I have been able to set up regular exports from the vast Sophie database.

I have set up distromatch access on DDE, which can also serve as a list of all working distributions so far. If you have access to the full dataset of package names and package contents for a distribution not in that list, please get in touch and we can add it.

I'm also exporting the full raw dataset which enables anyone to set up the same distromatch environment on their own machines.

Here is how:

# Get distromatch
git clone git://gitorious.org/appstream/distromatch.git
cd distromatch

# Fetch distribution information (updated every 2 days)
wget http://dde.debian.net/exports/distromatch-all.tar.gz

# Unpack it
mkdir data
tar -C data -zxf distromatch-all.tar.gz

# Reindex it (use --verbose if you are curious)
./distromatch --datadir=data --reindex --verbose

# Run it
./distromatch --datadir=data debian gedit

What does this mean? For example it means that if another distribution has some data (categories, screenshots...) that your distribution doesn't have, you can use distromatch to translate package names, then go and get it!

My next step is going to be to improve the distromatch functionality in DDE and possibly build a simple, user-friendly web interface to it. If you have some jQuery experience and would like to help, don't hesitate to get in touch.

Posted Fri Feb 18 13:46:30 2011 Tags:

update-apt-xapian-index on other distros

I've drafted a little HOWTO on using apt-xapian-index on non-Debian distributions.

The procedure has been tried on Mageia with some success, and there's no reason it wouldn't work everywhere else: the index itself does not depend on anything distro-specific.

Posted Tue Jan 25 23:01:45 2011 Tags:

A prototype webby markety appy thing

What better way to introduce my work at an Application Installer meeting than to show up with a prototype package browser, modeled after shopping sites and developed in just a few hours?

It's a little Flask webapp that just works on any Debian system, using the local apt-xapian-index as a backend. It has fast keyword search, faceted navigation and screenshots, and it runs on your system showing the packages that you have available.

Screenshot of packageshelf

To try it:

git clone git://git.debian.org/users/enrico/pkgshelf.git
cd pkgshelf
./web-server.py

Then visit http://localhost:5000

It hasn't had much interface polishing, as it's just a quick technology demo. However, you can see that:

  • keyword search is fast (fast enough that it could be made to search as you type);
  • relevant tags appear on the left, grouped by facets;
  • the most relevant tags are highlighted;
  • the less relevant tags could be hidden behind a [more] expander;
  • you can choose several strategies to hide packages you may find irrelevant.

Things that need doing:

  • hiding uninteresting facets;
  • making it pretty.

It's essentially JavaScript and CSS work. Anyone want to play?

Posted Sat Jan 22 01:40:50 2011 Tags:

Match package names across distributions

What would happen if we had a quick and reliable way to match package names across distributions?

These ideas came up at the appinstaller2011 meeting:

  • it would be easy to lookup screenshots in the local distro, and if there are none then fall back on other distributions;
  • it would be easy to port Debtags to other distributions, and possibly get changes back;
  • it would be trivial to add a [patches in $DISTRO] link to the PTS
  • it would be easy to point to other BTSes

We thought they were good ideas, so we started hacking.

To try it, you need to get the code and build the index first:

git clone git://git.debian.org/users/enrico/distromatch.git
cd distromatch
# Careful: 90Mb
wget http://people.debian.org/~enrico/dist-info.tar.gz
tar zxf dist-info.tar.gz
# Takes a long time to do the indexing
./distromatch --reindex --verbose

Then you can query it this way:

./distromatch $DISTRO $PKGNAME [$PKGNAME1 ...]

This would give you, for the package $PKGNAME in $DISTRO, the corresponding package names in all other distros for which we have data. If you do not provide package names, it automatically shows output for all packages in $DISTRO.

For example:

$ time ./distromatch debian libdigest-sha1-perl
debian:libdigest-sha1-perl fedora:perl-Digest-SHA1
debian:libdigest-sha1-perl mandriva:perl-Digest-SHA1
debian:libdigest-sha1-perl suse:perl-Digest-SHA1

real    0m0.073s
user    0m0.056s
sys 0m0.016s

Yes, it's quick. It builds a Xapian index with the information it needs, and then it reuses it. As soon as I find a moment, I intend to deploy an instance of it on DDE.
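
To give an idea of how such an index can work, here is a toy sketch (the "XBIN" term prefix is made up for illustration; it is not distromatch's actual term scheme): one Xapian document per (distro, package), with the package's distinctive contents as terms:

import xapian

# Toy index: one document per (distro, package), with the files it
# ships as terms. "XBIN" is an invented prefix for this sketch.
db = xapian.WritableDatabase("toy.db", xapian.DB_CREATE_OR_OPEN)
for distro, pkg, path in [
        ("debian", "gedit", "usr/bin/gedit"),
        ("fedora", "gedit", "usr/bin/gedit")]:
    doc = xapian.Document()
    doc.set_data("%s:%s" % (distro, pkg))
    doc.add_term("XBIN" + path)
    db.add_document(doc)
db.flush()

# Query: which packages, in any distro, ship usr/bin/gedit?
enquire = xapian.Enquire(db)
enquire.set_query(xapian.Query("XBIN" + "usr/bin/gedit"))
for match in enquire.get_mset(0, 10):
    print match.document.get_data()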

It is using a range of different heuristics:

  • match packages by name;
  • match packages by desktop files contained within;
  • match packages by pkg-config metadata files contained within;
  • match packages by [/usr]/bin/* files contained within;
  • match packages by shared library files contained within;
  • match packages by devel library files contained within;
  • match packages by man pages contained within;
  • match stemmed form of development library package names;
  • match stemmed form of shared library package names;
  • match stemmed form of perl library package names;
  • match stemmed form of python library package names.

This list may soon become obsolete as more heuristics get implemented.

Heuristics will never cover all the corner cases we surely have, but the idea is that if we can match a sizable amount of packages, the rest can somehow be fixed by hand as needed.
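
As an illustration of what "stemmed form" means here (with made-up rules, not the actual distromatch ones): Debian tends to call a development library libfoo-dev while Fedora calls it foo-devel, so both can be reduced to a common stem:

import re

def stem_devel(name):
    # Made-up stemming rules, for illustration only
    name = name.lower()
    name = re.sub(r"^lib", "", name)
    name = re.sub(r"-?(devel|dev)$", "", name)
    return name

print stem_devel("libfoo-dev")   # Debian style: prints "foo"
print stem_devel("foo-devel")    # Fedora style: prints "foo"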

The data it requires for a distribution should be rather straightforward to generate:

  1. a file which maps binary package names to source package names
  2. a file with the list of files in all the packages

For example:

$ ls -l dist-debian/
total 39688
-rw-r--r--  1 enrico enrico  1688249 Jan 20 17:37 binsrc
drwxr-xr-x  2 enrico enrico     4096 Jan 21 19:12 db
-rw-r--r--  1 enrico enrico 29960406 Jan 21 10:02 files.gz
-rw-r--r--  1 enrico enrico  8914771 Jan 21 18:39 interesting-files

$ head dist-debian/binsrc 
openoffice.org-dev openoffice.org
ext4-modules-2.6.32-5-4kc-malta-di linux-kernel-di-mipsel-2.6
linux-headers-2.6.30-2-common linux-2.6
libnspr4 nspr
ipfm ipfm
libforks-perl libforks-perl
med-physics debian-med
libntfs-3g-dev ntfs-3g
libguppi16 guppi
selinux selinux

$ zcat dist-debian/files.gz | head
memstat etc/memstat.conf
memstat usr/bin/memstat
memstat usr/share/doc/memstat/changelog.gz
memstat usr/share/doc/memstat/copyright
memstat usr/share/doc/memstat/memstat-tutorial.txt.gz
memstat usr/share/man/man1/memstat.1.gz
libdirectfb-dev usr/bin/directfb-config
libdirectfb-dev usr/bin/directfb-csource
libdirectfb-dev usr/include/directfb-internal/core/clipboard.h
libdirectfb-dev usr/include/directfb-internal/core/colorhash.h

interesting-files and db are generated when indexing.

To prove the usefulness of the idea (but does it need proving?), you can find in the same git repo a little example app (it took me 10 minutes to write) that uses the distromatch engine to export Debtags tags to other distributions:

$ ./exportdebtags fedora | head
memstat: admin::benchmarking, interface::commandline, role::program, use::monitor
libdirectfb-dev: devel::lang:c, devel::library, implemented-in::c, interface::framebuffer, role::devel-lib
libkonqsidebarplugin4a: implemented-in::c++, role::shared-lib, suite::kde, uitoolkit::qt
libemail-simple-perl: devel::lang:perl, devel::library, implemented-in::perl, role::devel-lib, role::shared-lib, works-with::mail
libpoe-component-pluggable-perl: devel::lang:perl, devel::library, implemented-in::perl, role::shared-lib
manpages-ja: culture::japanese, made-of::man, role::documentation
libhippocanvas-dev: devel::library, qa::low-popcon, role::devel-lib
libexpat-ocaml-dev: devel::lang:ocaml, devel::library, implemented-in::c, implemented-in::ocaml, role::devel-lib, works-with-format::xml
libgnutls-dev: devel::library, role::devel-lib, suite::gnu

Just in case this made you itch to play with Debtags in a non-Debian distribution, I've generated the full datasets for Fedora, Mandriva and OpenSUSE.

Others have been working on the same matching problem; we only became aware of the existing work after we started writing code:

I'd like to make use of those efforts, maybe to cross-validate results, maybe, even better, as yet another heuristic.

Update:

I built a simple distromatch query system into DDE!

Posted Sat Jan 22 01:40:50 2011 Tags:

Cross-distro Meeting on Application Installer

I have been to a Cross-distro Meeting on Application Installer which to the best of our knowledge is also the first one of its kind. Credit goes to Vincent Untz for organising it, to OpenSUSE for hosting it and to the various sponsors for getting us there.

It went surprisingly well. We got along, got stuff done, did as much work as possible to agree on as many formats, protocols and technologies as we possibly could.

The timing of it is very important, as most major distros would like to adopt some of the features that just became popular in the various new app markets and stores, such as screenshots, user comments and ratings. It looks like a lot of new code is about to be written, or a lot of existing code is about to gain quite a bit of popularity.

For my part, I presented the work on Debtags and apt-xapian-index.

With regards to Debtags, other distros seem to be missing a comprehensive classification system, and Debtags is, well, it.

With regards to apt-xapian-index, we just noticed that it's the perfect back-end for what everyone would like to do: the index structure is rather distribution-agnostic, and it has been road-tested with considerable success by at least software-center. It attracted quite a bit of interest, and will likely attract some more.

Just to prove a point I put together a prototype webby markety appy thing in just a few hours of work.

The meeting was also the ideal place to create a joint effort to match package names across distributions, which means that a lot of things that were hard to share before, such as screenshots, tags and patches, are suddenly not hard to share anymore.

Posted Sat Jan 22 01:40:50 2011 Tags:

fuss-launcher: an application launcher built on apt-xapian-index

Long ago I blogged about using apt-xapian-index to write an application launcher.

Now I just added a couple of new apt-xapian-index plugins that look like they have been made just for that.

In fact, they have indeed been made just for that.

After my blog post in 2008, people from Truelite and the FUSS project took up the challenge and wrote a launcher applet around my example engine.

The prototype has been quite successful in FUSS, and as a consequence I've been asked (and paid) to bring in some improvements.

The result, which I have just uploaded to NEW, is a package called fuss-launcher:

* New upstream release
   - Use newer apt-xapian-index: removed need of local index
   - Dragging a file in the launcher shows the applications that can open it
   - Remembers the applications launched more frequently
   - Allow to set a list of favourite applications

To get it:

  • apt-get install fuss-launcher (after it passed NEW);
  • or git clone http://git.fuss.bz.it/git/launcher.git/ and apt-get install python-gtk2 python-xapian python-xdg apt-xapian-index app-install-data

It requires apt-xapian-index >= 0.35.

To try it:

  1. Make sure your index is up to date, especially if you just installed app-install-data: just run update-apt-xapian-index as root.
  2. Run fuss-launcher.
  3. Click on the new tray icon to open the launcher dialog.
  4. Type some keywords and see the list of matching applications come to life as you type.

It's worth mentioning again that all this work was sponsored by Truelite and the Fuss project, which rocks.

Some screenshots:

When you open the launcher, by default it shows the most frequently started applications and the favourite applications:

launcher just opened

When you type some keywords, you get results as you type, and context-sensitive completion:

keyword search

When you drag a file on the launcher you only see the applications that can open that file:

drag files to the launcher

Posted Mon May 17 10:41:09 2010 Tags:
Posted Sat Jun 6 00:57:39 2009
ppy

Posts for Planet Python.

Custom function decorators with TurboGears 2

I am exposing some library functions using a TurboGears2 controller (see web-api-with-turbogears2). It turns out that some functions return a dict, some a list, some a string, and TurboGears 2 only allows JSON serialisation for dicts.

A simple work-around for this is to wrap the function result into a dict, something like this:

@expose("json")
@validate(validator_dispatcher, error_handler=api_validation_error)
def list_colours(self, filter=None, productID=None, maxResults=100, **kw):
    # Call API
    res = self.engine.list_colours(filter, productID, maxResults)

    # Return result
    return dict(r=res)

It would be nice, however, to have an @webapi() decorator that automatically wraps the function result with the dict:

def webapi(func):
    def dict_wrap(*args, **kw):
        return dict(r=func(*args, **kw))
    return dict_wrap

# ...in the controller...

    @expose("json")
    @validate(validator_dispatcher, error_handler=api_validation_error)
    @webapi
    def list_colours(self, filter=None, productID=None, maxResults=100, **kw):
        # Call API
        res = self.engine.list_colours(filter, productID, maxResults)

        # Return result
        return res

This works, as long as @webapi appears last in the list of decorators. This is because if it appears last it will be the first to wrap the function, and so it will not interfere with the tg.decorators machinery.

Would it be possible to create a decorator that can be put anywhere among the decorator list? Yes, it is possible but tricky, and it gives me the feeling that it may break in any future version of TurboGears:

class webapi(object):
    def __call__(self, func):
        def dict_wrap(*args, **kw):
            return dict(r=func(*args, **kw))
        # Migrate the decoration attribute to our new function
        if hasattr(func, 'decoration'):
            dict_wrap.decoration = func.decoration
            dict_wrap.decoration.controller = dict_wrap
            delattr(func, 'decoration')
        return dict_wrap

# ...in the controller...

    @expose("json")
    @validate(validator_dispatcher, error_handler=api_validation_error)
    @webapi
    def list_colours(self, filter=None, productID=None, maxResults=100, **kw):
        # Call API
        res = self.engine.list_colours(filter, productID, maxResults)

        # Return result
        return res

As a convenience, TurboGears 2 offers, in the decorators module, a way to build decorator "hooks":

class before_validate(_hook_decorator):
    '''A list of callables to be run before validation is performed'''
    hook_name = 'before_validate'

class before_call(_hook_decorator):
    '''A list of callables to be run before the controller method is called'''
    hook_name = 'before_call'

class before_render(_hook_decorator):
    '''A list of callables to be run before the template is rendered'''
    hook_name = 'before_render'

class after_render(_hook_decorator):
    '''A list of callables to be run after the template is rendered.

    Will be run before it is returned returned up the WSGI stack'''

    hook_name = 'after_render'

The way these are invoked can be found in the _perform_call function in tg/controllers.py.

To show an example use of those hooks, let's add some polygen wisdom to every data structure we return:

class wisdom(decorators.before_render):
    def __init__(self, grammar):
        super(wisdom, self).__init__(self.add_wisdom)
        self.grammar = grammar
    def add_wisdom(self, remainder, params, output):
        from subprocess import Popen, PIPE
        output["wisdom"] = Popen(["polyrun", self.grammar], stdout=PIPE).communicate()[0]

# ...in the controller...

    @wisdom("genius")
    @expose("json")
    @validate(validator_dispatcher, error_handler=api_validation_error)
    def list_colours(self, filter=None, productID=None, maxResults=100, **kw):
        # Call API
        res = self.engine.list_colours(filter, productID, maxResults)
    
        # Return result
        return res

These hooks cannot however be used for what I need, that is, to wrap the result inside a dict. The reason is that they are called in this way:

        controller.decoration.run_hooks(
                'before_render', remainder, params, output)

and not in this way:

        output = controller.decoration.run_hooks(
                'before_render', remainder, params, output)

So it is possible to modify the output (if it is a mutable structure) but not to exchange it with something else.

Can we do even better? Sure we can. We can assimilate @expose and @validate inside @webapi to avoid repeating those same many decorator lines over and over again:

class webapi(object):
    def __init__(self, error_handler = None):
        self.error_handler = error_handler

    def __call__(self, func):
        def dict_wrap(*args, **kw):
            return dict(r=func(*args, **kw))
        res = expose("json")(dict_wrap)
        res = validate(validator_dispatcher, error_handler=self.error_handler)(res)
        return res

# ...in the controller...

    @expose("json")
    def api_validation_error(self, **kw):
        pylons.response.status = "400 Error"
        return dict(e="validation error on input fields", form_errors=pylons.c.form_errors)

    @webapi(error_handler=api_validation_error)
    def list_colours(self, filter=None, productID=None, maxResults=100, **kw):
        # Call API
        res = self.engine.list_colours(filter, productID, maxResults)

        # Return result
        return res

This got rid of @expose and @validate, and provides almost all the default values that I need. Unfortunately I could not find out how to access api_validation_error from the decorator so that I can pass it to the validator, so I am stuck with the inconvenience of having to explicitly pass it every time.

Posted Wed Nov 4 17:52:38 2009 Tags:

Building a web-based API with Turbogears2

I am using TurboGears2 to export a python API over the web. Every API method is wrapped by a controller method that validates the parameters and returns the results encoded in JSON.

The basic idea is this:

@expose("json")
def list_colours(self, filter=None, productID=None, maxResults=100, **kw):
    # Call API
    res = self.engine.list_colours(filter, productID, maxResults)

    # Return result
    return res

To validate the parameters we can use forms; it's their job, after all:

class ListColoursForm(TableForm):
    fields = [
            # One field per parameter
            twf.TextField("filter", help_text="Please enter the string to use as a filter"),
            twf.TextField("productID", help_text="Please enter the product ID"),
            twf.TextField("maxResults", validator=twfv.Int(min=0), default=200, size=5, help_text="Please enter the maximum number of results"),
    ]
list_colours_form=ListColoursForm()

#...

    @expose("json")
    @validate(list_colours_form, error_handler=list_colours_validation_error)
    def list_colours(self, filter=None, productID=None, maxResults=100, **kw):
        # Parameter validation is done by the form
    
        # Call API
        res = self.engine.list_colours(filter, productID, maxResults)
    
        # Return result
        return res

All straightforward so far. However, this means that we need two exposed methods for every API call: one for the API call and one for the error handler. For every API call, we have to type the name several times, which is error-prone and risks getting things mixed up.

We can, however, have a single error handler for all methods:

def get_method():
    '''
    The method name is the first url component after the controller name that
    does not start with 'test'
    '''
    found_controller = False
    for name in pylons.c.url.split("/"):
        if not found_controller and name == "controllername":
            found_controller = True
            continue
        if name.startswith("test"):
            continue
        if found_controller:
            return name
    return None

class ValidatorDispatcher:
    '''
    Validate using the right form according to the value of the "method" field
    '''
    def validate(self, args, state):
        method = args.get("method", None)
        # Extract the method from the URL if it is missing
        if method is None:
            method = get_method()
            args["method"] = method
        return forms[method].validate(args, state)

validator_dispatcher = ValidatorDispatcher()

This validator will try to find the method name, either as a form field or by parsing the URL. It will then use the method name to find the form to use for validation, and pass control to the validate method of that form.

We then need to add an extra "method" field to our forms, and arrange the forms inside a dictionary:

class ListColoursForm(TableForm):
    fields = [
            # One hidden field to have a place for the method name
            twf.HiddenField("method"),
            # One field per parameter
            twf.TextField("filter", help_text="Please enter the string to use as a filter"),
    #...

forms = {}  # the dictionary of forms, indexed by method name
forms["list_colours"] = ListColoursForm()

And now our methods become much nicer to write:

    @expose("json")
    def api_validation_error(self, **kw):
        pylons.response.status = "400 Error"
        return dict(form_errors=pylons.c.form_errors)

    @expose("json")
    @validate(validator_dispatcher, error_handler=api_validation_error)
    def list_colours(self, filter=None, productID=None, maxResults=100, **kw):
        # Parameter validation is done by the form
    
        # Call API
        res = self.engine.list_colours(filter, productID, maxResults)
    
        # Return result
        return res

api_validation_error is interesting: it returns a proper HTTP error status, and a JSON body with the details of the error, taken straight from the form validators. It took me a while to find out that the form errors are in pylons.c.form_errors (and for reference, the form values are in pylons.c.form_values). pylons.response is a WebOb Response that we can play with.

So now our client side is able to call the API methods, and get a proper error if it calls them wrong.

But now that we have the forms ready, it doesn't take much to display them in web pages as well:

def _describe(self, method):
    "Return a dict describing an API method"
    ldesc = getattr(self.engine, method).__doc__.strip()
    sdesc = ldesc.split("\n")[0]
    return dict(name=method, sdesc = sdesc, ldesc = ldesc)

@expose("myappserver.templates.myappapi")
def index(self):
    '''
    Show an index of exported API methods
    '''
    methods = dict()
    for m in forms.keys():
        methods[m] = self._describe(m)
    return dict(methods=methods)

@expose('myappserver.templates.testform')
def testform(self, method, **kw):
    '''
    Show a form with the parameters of an API method
    '''
    kw["method"] = method
    return dict(method=method, action="/myapp/test/"+method, value=kw, info=self._describe(method), form=forms[method])

@expose(content_type="text/plain")
@validate(validator_dispatcher, error_handler=testform)
def test(self, method, **kw):
    '''
    Run an API method and show its prettyprinted result
    '''
    res = getattr(self, str(method))(**kw)
    return pprint.pformat(res)

In a few lines, we have all we need: an index of the API methods (including their documentation taken from the docstrings!), and for each method a form to invoke it and a page to see the results.

Make the forms children of AjaxForm, and you can even see the results together with the form.

Posted Thu Oct 15 15:45:39 2009 Tags:

Creating pipelines with subprocess

It is possible to create process pipelines using subprocess.Popen, by just using stdout=subprocess.PIPE and stdin=otherproc.stdout.

Almost.

In a pipeline created in this way, the stdout of all processes except the last is opened twice: once in the script that has run the subprocess and another time in the standard input of the next process in the pipeline.

This is a problem because if a process closes its stdin, the previous process in the pipeline does not get SIGPIPE when trying to write to its stdout, because that pipe is still open on the caller process. If this happens, a wait on that process will hang forever: the child process waits for the parent to read its stdout, the parent process waits for the child process to exit.

The trick is to close the stdout of each process in the pipeline except the last just after creating them:

#!/usr/bin/python
# coding=utf-8

import subprocess

def pipe(*args):
    '''
    Takes as parameters several dicts, each with the same
    parameters passed to popen.

    Runs the various processes in a pipeline, connecting
    the stdout of every process except the last with the
    stdin of the next process.
    '''
    if len(args) < 2:
        raise ValueError, "pipe needs at least 2 processes"
    # Set stdout=PIPE in every subprocess except the last
    for i in args[:-1]:
        i["stdout"] = subprocess.PIPE

    # Runs all subprocesses connecting stdins and stdouts to create the
    # pipeline. Closes stdouts to avoid deadlocks.
    popens = [subprocess.Popen(**args[0])]
    for i in range(1,len(args)):
        args[i]["stdin"] = popens[i-1].stdout
        popens.append(subprocess.Popen(**args[i]))
        popens[i-1].stdout.close()

    # Returns the array of subprocesses just created
    return popens

At this point, it's nice to write a function that waits for the whole pipeline to terminate and returns an array of result codes:

def pipe_wait(popens):
    '''
    Given an array of Popen objects returned by the
    pipe method, wait for all processes to terminate
    and return the array with their return values.
    '''
    results = [0] * len(popens)
    while popens:
        last = popens.pop(-1)
        results[len(popens)] = last.wait()
    return results

And, look and behold, we can now easily run a pipeline and get the return codes of every single process in it:

process1 = dict(args='sleep 1; grep line2 testfile', shell=True)
process2 = dict(args='awk \'{print $3}\'', shell=True)
process3 = dict(args='true', shell=True)
popens = pipe(process1, process2, process3)
result = pipe_wait(popens)
print result

Update: Colin Watson suggests an improvement to compensate for Python's nonstandard SIGPIPE handling.
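
My understanding of the issue: Python sets SIGPIPE to ignored at startup and child processes inherit that, so they get write errors instead of dying when the next process in the pipeline goes away. A sketch of the compensation, assuming that is the improvement meant, restoring the default handler in the children via preexec_fn:

import signal
import subprocess

def restore_sigpipe():
    # Python ignores SIGPIPE and child processes inherit that:
    # restore the default handler so a child dies normally when
    # its reader goes away.
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)

# The same pipeline as above, with the workaround applied
process1 = dict(args='sleep 1; grep line2 testfile', shell=True,
                preexec_fn=restore_sigpipe)
process2 = dict(args='awk \'{print $3}\'', shell=True,
                preexec_fn=restore_sigpipe)
popens = pipe(process1, process2)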

Colin Watson has a similar library for C.

Posted Wed Jul 1 09:08:06 2009 Tags:

Linking to self in turbogears

I want to put in my master.kid some icons that allow changing the current language for the session.

First, all user-accessible methods need to handle a 'language' parameter:

@expose(template="myapp.templates.foobar")
def index(self, someparam, **kw):
    if 'language' in kw: turbogears.i18n.set_session_locale(kw['language'])

Then, we need a way to edit the current URL so that we can generate modified links to self that preserve the existing path_info and query parameters. In your main controller, add:

def linkself(**kw):
    params = {}
    params.update(cherrypy.request.params)
    params.update(kw)
    url = cherrypy.request.browser_url.split('?', 1)[0]
    return url + '?' + '&'.join(['='.join(x) for x in params.iteritems()])

def add_custom_stdvars(vars):
    return vars.update({"linkself": linkself})

turbogears.view.variable_providers.append(add_custom_stdvars)

(see the turbogears stdvars documentation and the cherrypy request documentation (cherrypy 2 documentation at the bottom of the page))

And finally, in master.kid:

<div id="footer">
  <div id="langselector">
    <span class="language">
      <a href="${tg.linkself(language='it_IT')}">
        <img src="${tg.url('/static/images/it.png')}"/>
      </a>
    </span>

    <span class="language">
      <a href="${tg.linkself(language='C')}">
        <img src="${tg.url('/static/images/en.png')}"/>
      </a>
    </span>
  </div><!-- langselector -->
</div><!-- footer -->
Posted Sat Jun 6 00:57:39 2009 Tags:

Turbogears form quirk

I had a great idea:

@validate(model_form)
@error_handler()
@expose(template='kid:myproject.templates.new')
def new(self, id, tg_errors=None, **kw):
    """Create new records in model"""
    if tg_errors:
        # Ask until there is still something missing
        return dict(record = defaults, form = model_form)
    else:
        # We have everything: save it
        i = Item(**kw)
        flash("Item was successfully created.")
        raise redirect("../show/%d" % i.id)

It was perfect: one simple method, simple error handling, nice helpful messages all around. Except that check boxes and select fields would not get the default values, while all other fields would.

After two hours of searching and cursing and tracing things into widget code, I found this bit in InputWidget.adjust_value:

# there are some input fields that when nothing is checked/selected
# instead of sending a nice name="" are totally missing from
# input_values, this little workaround let's us manage them nicely
# without interfering with other types of fields, we need this to
# keep track of their empty status otherwise if the form is going to be
# redisplayed for some errors they end up to use their defaults values
# instead of being empty since FE doesn't validate a failing Schema.
# posterity note: this is also why we need if_missing=None in
# validators.Schema, see ticket #696.

What is happening here is that since check boxes and option fields don't behave nicely when unselected, TurboGears has to work around it. In order to detect the difference between "I selected 'None'" and "I didn't select anything", it reasons that if the input has been validated, then the user has made some selections, so it defaults to "the user selected 'None'". If the input has not been validated, then we're showing the form for the first time, and a missing value means "use the default provided".

Since I was doing the validation all the time, this meant that Checkboxes and Select fields would never use the default values.

Hence, if you use those fields then you necessarily need two different controller methods, one to present the form and one to save it:

@expose(template='kid:myproject.templates.new')
def new(self, id, **kw):
    """Create new records in model"""
    return dict(record = defaults(), form = model_form)

@validate(model_form)
@error_handler(new)
@expose()
def savenew(self, id, **kw):
    """Create new records in model"""
    i = Item(**kw)
    flash("Item was successfully created.")
    raise redirect("../show/%d"%i.id)

If someone else stumbles on the same problem, I hope they'll find this post and they won't have to spend another two awful hours tracking it down again.

Posted Sat Jun 6 00:57:39 2009 Tags:

Quirks when overriding SQLObject setters

Let's suppose you have a User that is, optionally, a member of a Company. In SQLObject you model it somehow like this:

    class Company(SQLObject):
        name = UnicodeCol(length=16, alternateID=True, alternateMethodName="by_name")
        display_name = UnicodeCol(length=255)

    class User(InheritableSQLObject):
        company = ForeignKey("Company", notNull=False, cascade='null')

Then you want to implement a user settings interface that uses a Select box to choose the company of the user.

For the Select widget to properly handle the validator for your data, you need to put a number in the first option. As my first option, I want to have the "None" entry, so I decided to use -1 to mean "None".

Now, to make it all blend nicely, I overrode the company setter to accept -1 and silently convert it to a None:

    class User(InheritableSQLObject):
        company = ForeignKey("Company", notNull=False, cascade='null')

        def _set_company(self, id):
            "Set the company id, using None if -1 is given"
            if id == -1: id = None
            self._SO_set_company(id)

In the controller, after parsing and validating all the various keyword arguments, I do something like this:

            user.set(**kw)

However, the overridden method didn't get called.

After some investigation, and with the help of NandoFlorestan on IRC, we figured out the following things:

  1. That method needs to be rewritten as _set_companyID:

            def _set_companyID(self, id):
                "Set the company id, using None if -1 is given"
                if id == -1: id = None
                self._SO_set_companyID(id)
    
  2. Methods overridden in that way are also called by user.set(**kw), but not by the User(**kw) constructor, so using, for example, a similar override to transparently encrypt passwords would give you plaintext passwords for new users and encrypted passwords after they changed it (see the sketch below).
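
A hypothetical sketch of that password case (encrypt() is a placeholder for your hashing function, not a real helper):

    class User(InheritableSQLObject):
        password = UnicodeCol(length=255)

        def _set_password(self, value):
            # Called by user.set(password=...), but NOT by the
            # User(password=...) constructor: new users would get
            # plaintext passwords stored unless you handle that case.
            self._SO_set_password(encrypt(value))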

Posted Sat Jun 6 00:57:39 2009 Tags:

TurboGears RemoteForm tip

In case your RemoteForm mysteriously behaves like a normal HTTP form, refreshing the page on submit, and the only hint that there's something wrong is this bit in Iceweasel's error console:

Errore: uncaught exception: [Exception... "Component returned failure
code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIXMLHttpRequest.open]"
nsresult: "0x80070057 (NS_ERROR_ILLEGAL_VALUE)"  location: "JS frame ::
javascript: eval(__firebugTemp__); :: anonymous :: line 1"  data: no]

the problem may simply be a missing action= attribute on the form.

I found out after:

  1. reading the TurboGears remoteform wiki: "For some reason, the RemoteForm is acting like a regular html form, serving up a new page instead of performing the replacements we're looking for. I'll update this page as soon as I figure out why this is happening."

  2. finding this page on Google and meditating for a while while staring at it. I don't speak German, but often enough I manage to solve problems after meditating over Google results in all sorts of languages unknown or unreadable to me. I will call this practice Webomancy.

Posted Sat Jun 6 00:57:39 2009 Tags:

Turbogears quirks when testing controllers that use SingleSelectField

Suppose you have a User that can be a member of a Company. In SQLObject you model it somehow like this:

    class Company(SQLObject):
        name = UnicodeCol(length=16, alternateID=True, alternateMethodName="by_name")
        display_name = UnicodeCol(length=255)

    class User(InheritableSQLObject):
        company = ForeignKey("Company", notNull=False, cascade='null')

Then you want to make a form that allows choosing the company of a user:

def companies():
    return [ [ -1, 'None' ] ] + [ [c.id, c.display_name] for c in Company.select() ]

class NewUserFields(WidgetsList):
    """Fields for editing general settings"""
    user_name = TextField(label="User name")
    companyID = SingleSelectField(label="Company", options=companies)

Ok. Now you want to run tests:

  1. nosetests imports the controller to see if there's any initialisation code.
  2. The NewUserFields class is created.
  3. The SingleSelectField is created.
  4. The SingleSelectField constructor tries to guess the validator and peeks at the first option.
  5. This calls companies.
  6. companies accesses the database.
  7. The testing database has not yet been created because nosetests imported the module before giving the test code a chance to setup the test database.
  8. Bang.

The solution is to add an explicit validator to disable this guessing code, which is a source of so much trouble:

class NewUserFields(WidgetsList):
    """Fields for editing general settings"""
    user_name = TextField(label="User name")
    companyID = SingleSelectField(label="Company", options=companies, validator=v.Int(not_empty=True))
Posted Sat Jun 6 00:57:39 2009 Tags:

Turbogears i18n quirks

Collecting strings from .kid files

tg-admin i18n collect won't collect strings from your .kid files: you need the toolbox web interface for that.

Indentation problems in .kid files

The toolbox web interface chokes on indentation errors in your .kid files.

To see the name of the .kid file that causes the error, look at the tg-admin toolbox output in the terminal for lines like Working on app/Foo/templates/bar.kid.

What happens is that the .kid files are converted to python files, and if there are indentation glitches they end up in the python files, and python will complain.

Once you see from the tg-admin toolbox standard error which .kid file has the problem, edit it and try to make sure that all closing tags are at exactly the same indentation level as their corresponding opening tags. Even a single space matters.

Bad i18n bug in TurboKid versions earlier than 1.0.1

faide on #turbogears also says:

It is of outmost importance that you use TurboKid 1.0.1 because it is the first version that corrects a BIG bug regarding i18n filters ...

The versions below had a bug where the filters kept being added at each page load, in such a way that after a few hundred pages you could have page loading times as long as 5 minutes!

If one has a previous version of TurboKid, one (and only one) of these is needed:

So, in short, all i18n users should upgrade to TurboGears 1.0.2.2 or patch TurboKid using http://trac.turbogears.org/ticket/1301.

Posted Sat Jun 6 00:57:39 2009 Tags:

File downloads with TurboGears

In TurboGears, I had to implement a file download method, but the file required access controls so it was put in a directory not exported by Apache.

In #turbogears I've been pointed at: http://cherrypy.org/wiki/FileDownload and this is everything put together:

from cherrypy.lib.cptools import serveFile
# In cherrypy 3 it should be:
#from cherrypy.lib.static import serve_file

@expose()
def get(self, *args, **kw):
    """Access the file pointed by the given path"""
    pathname = check_auth_and_compute_pathname()
    return serveFile(pathname)

Then I needed to export some CSV:

@expose()
def getcsv(self, *args, **kw):
    """Get the data in CSV format"""
    rows = compute_data_rows()
    headers = compute_headers(rows)
    filename = compute_file_name()

    cherrypy.response.headers['Content-Type'] = "application/x-download"
    cherrypy.response.headers['Content-Disposition'] = 'attachment; filename="'+filename+'"'

    csvdata = StringIO.StringIO()
    writer = csv.writer(csvdata)
    writer.writerow(headers)
    writer.writerows(rows)

    return csvdata.getvalue()

In my case it's not an issue, as I can only compute the headers after I have computed all the data, but I still have to find out how to serve the CSV file while I'm generating it, instead of storing it all in a big string and returning that (a possible approach is sketched below).
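
One possible approach, an untested sketch: csv.writer returns whatever the underlying write() returns, so you can feed it a fake file object and turn the output into a generator. Note this only actually streams if CherryPy is configured to stream generator output:

import csv

class Echo(object):
    """Fake file object: write() hands the formatted CSV line
    straight back instead of storing it anywhere."""
    def write(self, data):
        return data

def iter_csv(headers, rows):
    # Yield the CSV output one line at a time instead of
    # accumulating it in one big string.
    writer = csv.writer(Echo())
    yield writer.writerow(headers)
    for row in rows:
        yield writer.writerow(row)

The getcsv method would then return iter_csv(headers, rows) instead of csvdata.getvalue(); whether the response is actually streamed to the client depends on how CherryPy handles generator output in your configuration.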

Posted Sat Jun 6 00:57:39 2009 Tags:
Posted Sat Jun 6 00:57:39 2009

Pages related to my visit in Addis Ababa for a Linux training course.

Eighth day in Addis

Useful things to keep in mind when setting up a service:

  • always take note of what you do
  • always be able to explain to another person what you did
  • keep a copy of the configuration files before changing them, so that you can see what you changed
  • always be able to move the service to another computer
  • make sure that it works after a reboot

Example use of vim block selection:

  • ESC: exits insert mode.
  • ^V: starts block selection. Move the arrows to form a rectangle.
  • c: change. Type the new content for the line.
  • ESC: gets out of insert mode, and the change will happen in all the lines.

To change network configuration with config files, edit:

/etc/network/interfaces

To also setup DNS in /etc/network/interfaces, use dns-search and dns-nameservers (for this to work, you need to have the package resolvconf):

dns-search dream.edu.et
dns-nameservers 192.168.0.1 192.168.0.2

To make a router that connects to the internet on demand using a modem:

apt-get install diald

To see the path of network packets:

mtr 4.2.2.2

Basic NAT script:

OUT=eth2
IN=eth0

modprobe iptable_nat
iptables -t nat -A POSTROUTING -o $OUT -j MASQUERADE
echo 1 > /proc/sys/net/ipv4/ip_forward

What happens at system startup:

  1. the BIOS loads and runs the boot loader
  2. the boot loader loads the kernel and the initrd ramdisk and runs the kernel
  3. the kernel runs the script 'init' in the initrd ramdisk
  4. the script 'init' mounts the root directory
  5. the script 'init' runs the command /sbin/init in the new root directory
  6. 'init' starts the system with the configuration in /etc/inittab

To install a new startup script:

sudo ln -s /usr/local/sbin/firewall /etc/init.d
sudo update-rc.d firewall defaults 16 75

Normally you can just do:

sudo update-rc.d [servicename] defaults

To have a look at the start and stop order numbers, look at /etc/rc2.d for other start scripts and /etc/rc0.d for other stop scripts

To test a proxy, low level way:

$ telnet proxy 8080
Trying 192.168.0.6...
Connected to proxy.dream.edu.et.
Escape character is '^]'.
GET http://www.google.com HTTP/1.0 [press enter twice]
Posted Sat Jun 6 00:57:39 2009 Tags:

Third day in Addis

Believe it or not, a network that fails often is the best thing to have when you are teaching network troubleshooting.

Various tools useful for networking:

  • ifconfig - configure a network interface
  • dnsmasq - Simple DNS and DHCP server
  • host - DNS lookup utility
  • route - show / manipulate the IP routing table
  • arping - send ARP REQUEST to a neighbour host
  • mii-tool - view, manipulate media-independent interface status (IOW, see if the cable works)
  • nmap - Network exploration tool and security / port scanner

    Examples:

     # Look at what machines are active in the local network:
     nmap -sP 10.5.15.0/24
    
     # Look at what ports are open in a machine:
     nmap 10.5.15.26
    
  • tcpdump - dump traffic on a network

    It can be used to see if there is traffic, and to detect traffic that shouldn't be there.

Useful tip:

    # Convert a unix timestamp to a readable date
    date -d @1152841341

What happens when you browse a web page:

  1. type the address www.google.com in the browser
  2. the browser needs the IP address of the web server:
     • look for the DNS address in /etc/resolv.conf (/etc/resolv.conf is created automatically by the DHCP client)
     • try all the DNS servers in /etc/resolv.conf until one gives you the IP address of www.google.com
     • take the first address that comes from the DNS (in our case it was 64.233.167.104)
  3. figure out how to connect to 64.233.167.104, consulting the routing table to see if it's in the local network:
     • if it's in the local network, then look for the MAC address using ARP (Address Resolution Protocol)
     • if it's not in the local network, then send through the gateway (again using ARP to find the MAC address of the gateway)
  4. send out the HTTP request to the local web server or through the gateway, using the Ethernet physical protocol, and the MAC address to refer to the other machine.
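
Incidentally, the resolution and HTTP parts of this (everything except ARP) can be reproduced with a few lines of Python, which helps demystify what the browser does:

import socket

# Resolve the name using the system resolver
# (the same mechanism that reads /etc/resolv.conf)
addr = socket.gethostbyname("www.google.com")
print addr

# Connect to port 80 and send a minimal HTTP request
s = socket.create_connection((addr, 80))
s.sendall("GET / HTTP/1.0\r\nHost: www.google.com\r\n\r\n")
print s.recv(300)
s.close()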

Troubleshooting network problems:

  1. See if the network driver works: with ifconfig, see if you see the HWaddr: field. If you do not see it, then the Linux driver for the network card is not working. Unfortunately there's no exact way to tell that it works perfectly.

  2. See if you have an IP address with ifconfig. If you find out that you need to rerun DHCP (for example, if the network cable was disconnected when the system started), then you can do it either by deactivating/reactivating the Ethernet interface using System/Administration/Networking or, on a terminal, running:

    # ifdown eth0
    # ifup eth0

    If you don't get an IP, try to see if the DHCP server is reachable by running:

    $ arping -D [address of DHCP server]

  3. See if the local physical network works:

     • With sudo mii-tool, see if the cable link is ok. If it's not, then it's a problem in the cable or the plugs, or simply the device at the other end of the cable is turned off.
     • Try arping or ping -n on a machine in the local network (like the gateway) to see if the local network works.

  4. See if the DNS works:

     • Find out the DNS address:

       cat /etc/resolv.conf

     • If it's local, arping it.
     • If it's not local, ping -n it.
     • Try to resolve a famous name using that DNS:

       $ host [name] [IP address of the DNS]

     • Try to resolve the name of the machine you're trying to connect to. If you can resolve a famous name but not the name you need, then it's likely a problem with their DNS.

  5. If you use a proxy, see if the proxy is reachable: check if the proxy name resolves to an IP, if you can ping it, if you can telnet to the proxy address and port:

    $ telnet [proxy address] [proxy port]

    You quit telnet with ^] followed by quit.

  6. If you can connect directly to the web server, try to see if it answers:

    $ telnet [address] 80

    If you are connected, you can confirm that it's a web server:

    GET / HTTP/1.0 (then Enter twice)

    If it's a web server, it should give you something like a webpage or an HTTP redirect.

When you try to set up a service and it doesn't work:

  1. check that it's running:

    $ ps aux | grep dnsmasq
    
  2. check that it's listening on the right port:

    $ sudo netstat -lp
    
  3. check that it's listening from the outside:

    $ nmap [hostname]
    
  4. check for messages in /var/log/daemon.log or /var/log/syslog

  5. check that the configuration is correct and reload or restart the server to make sure it's running with the right configuration:

    # /etc/init.d/dnsmasq restart
    

dnsmasq:

By default: works as a DNS server that serves the data in /etc/hosts.

By default: uses /etc/resolv.conf to find addresses of other DNS to use when a name is not found in /etc/hosts.

To enable the DHCP server, uncomment:

    dhcp-range=192.168.0.50,192.168.0.150,12h

in /etc/dnsmasq.conf and set it to the range of addresses you want to serve. Pay attention to never put two DHCP servers on the same local network, or they will interfere with each other.

To test if the DHCP server is working, use dhcping (not installed by default on Ubuntu).

To communicate other information like DNS, gateway and netmask to the clients, use this piece of dnsmasq.conf:

    # For reference, the common options are:
    # subnet mask - 1
    # default router - 3
    # DNS server - 6
    # broadcast address - 28
    dhcp-option=1,255.255.255.0
    dhcp-option=3,192.168.0.1
    dhcp-option=6,192.168.0.1
    dhcp-option=28,192.168.0.255

Problems found today:

  • changing the name of the local machine in /etc/hosts breaks sudo, and without sudo it's impossible to edit the file. The only way to fix this is a reboot in recovery mode.

  • dhclient -n -w is different from dhclient -nw

Quick start examples with tar:

    # Create an archive
    tar zcvf nmap.tar.gz *.deb

    # Extract an archive
    tar zxvf nmap.tar.gz

    # Look at the contents of an archive
    tar ztvf nmap.tar.gz

Quick & dirty way to send a file between two computers without web server, e-mail, shared disk space or any other infrastructure:

    # To send
    nc -l -p 12345 -q 1 < nmap.tar.gz

    # To receive
    nc 10.5.15.123 12345 > nmap.tar.gz

    # To repeat the send command 20 times
    for i in `seq 1 20`; do nc -l -p 12345 -q 1 < nmap.tar.gz ; done

Update: Javier Fernandez-Sanguino writes:

Your "XXX day in Addis" is certainly good reading, nice to see somebody reviewing common tools from a novice point of view. Some comments:

  • Regarding your comments on how to troubleshoot network connectivity problems I just wanted to point you to the network test script I wrote and submitted to the debian-goodies package ages ago. It's available at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=307694 and should do automatically most of the stuff you commented on your blog.

  • Your example to test hosts alive in the network using nmap -sP 10.5.15.0/24 is good. However, newer (v4) versions can do ARP ping in the local network which is much more efficient (some systems might block outbound ICMP), that's the -PR option and should be enabled (by default). See http://www.insecure.org/nmap/man/man-host-discovery.html Also, you might want to add a '-n' there so that nmap does not try to do DNS resolution of the hosts (which might take up some time if your DNS does not include local IPs)

  • tcpdump, it would be wiser to turn novice users to ethereal since it has a much better UI than tcpdump and it is able to dissect (interpret) protocols that tcpdump can't analyse.

  • you are missing arp as a tool in itself, it is useful to debug network issues since if the host is local and does not show up in arp output either a) it's down or b) you don't have proper network connectivity. (If you are missing an ARP entry for your default gateway your setup is broken)

Update: Marius Gedminas writes:

Re: http://www.enricozini.org/blog/eng/third-day-in-addis

In my experience if sudo cannot resolve the hostname (e.g. if you break /etc/hosts), you can still use sudo, but you have to wait something like 30 seconds until the DNS request times out.

I tried to break my /etc/hosts (while keeping a root shell so I can fix it if something goes wrong), but couldn't even get the timeout now. Sudo just said unable to lookup $hostname via gethostbyname() and gave me a root shell.

Posted Sat Jun 6 00:57:39 2009 Tags:

First practical lesson

Notes after today's training session.

Small index of most used shell commands:

  • ls - list directory contents
  • cp - copy files and directories
  • mv - move (rename) files
  • rm - remove files or directories
  • find - search for files in a directory hierarchy
  • cat - concatenate files and print on the standard output
  • more - file perusal filter for crt viewing
  • less - opposite of more (quit with 'q')
  • cd - Change the current directory to DIR. (use "help cd" instead of "man cd")
  • mkdir - make directories
  • rmdir - remove empty directories

Small index of commands useful for combining in pipelines:

  • grep, egrep, fgrep, rgrep - print lines matching a pattern
  • tail - output the last part of files
  • head - output the first part of files
  • sort - sort lines of text files
  • uniq - report or omit repeated lines
  • sed - stream editor
  • wc - print the number of newlines, words, and bytes in files

Problems found during the lesson:

  • You set the system default locale to Amharic, and the gdm login will be in Amharic input mode. We didn't find out how to switch it back to input roman characters. Right click on the input field to set the input method doesn't work. Since usernames are not in Amharic, you're locked out.
  • So you CTRL+ALT+F1, login and try dpkg-reconfigure locales. On Ubuntu Dapper, it does not work anymore.
  • So you dig and dig and dig and finally find that you can force a locale in /etc/default/gdm (but not in /etc/gdm/locale.conf, nor in /etc/gdm/gdm.conf).
  • Then the internet works for a bit and you look up how to reconfigure locales in Ubuntu. Turns out you have to use localeconf, which is not installed by default, is not in universe and thus not on the CDs, and needs to be downloaded from the Internet.
  • The Ubuntu wiki is all on https, which defeats any attempt of proxy caching.
  • An Internet proxy needs to be configured 3 times: in Gnome, in Firefox and in Synaptic (well, apt). This is especially tricky when you forgot to setup the proxy in Synaptic and seemingly unrelated applications fail, like the Ubuntu language selector, which internally invokes the package manager to download missing langpacks.
  • Some short descriptions in the NAME section of manpages are hard to understand, or wrong. Noted on apt-get, apt-cache and less. Top prize goes to apt-cache:

     NAME
            apt-cache - APT package handling utility -- cache manipulator
     DESCRIPTION
            [...] apt-cache does not manipulate the state of the system but
            does provide operations to search and generate interesting output
            from the package metadata. [...]
    

    So apt-cache is a manipulator that doesn't manipulate. A possible improvement can be "query the APT package cache".

  • The language selector in Ubuntu Breezy doesn't really exit and keeps the package database locked. This seems to be fixed in Dapper, and probably had been fixed in some Breezy update. System updates here are a problem: my Dapper (with some Universe things in it) wanted to download more than 120Mb of data, and the Uni network was giving me 14Kbps. It's been a nice opportunity to teach about fuser -uva and kill.
  • dict, squid and many other packages from 'main' are not on the normal Ubuntu CDs: is there an easy way to build a CD with them? Or do Ubuntu CDs with extra packages already exist? I'll have to find out.
  • cupsys has documentation outside of /usr/share/doc, in /usr/share/cups/doc-root.
  • man works on all commands, except cd, which is an internal shell command and thus needs help instead of man. I should remember to ponder autogenerating manpages from help output.
  • Is there an index-like manpage with a list of the core Unix commands and their short descriptions? If there's not, it's easy to generate:

     #!/bin/sh
     DIR=${1:-"/bin"}
     (
     find $DIR | while read FILE
     do
         if [ -x $FILE ] && ! [ -d $FILE ]
         then
             LANG=C COLUMNS=2000 man `basename $FILE` | \
                      grep ^SYNOPSIS -B 100 | grep ^NAME -A 100 | \
                      tail -n +2 | head -n +2 | \
                      grep -v '^[ \t]*$'
         fi
     done
     ) | sort | uniq | sed 's/^ \+//'
    

    Try running it on /bin and /sbin: it's great! Also, since it doesn't redirect stderr, it nicely exposes a number of manpage problems.

Lots of bugs to report when I come home: from here it'll take ages, and lots of money on the hotel internet connection, and some are Ubuntu-specific so I'd need to do everything online with Malone.

As usual, teaching is one of the best ways to find bugs.

I propose an Etch training session a month before release.

Other things to do:

  • Find more info about that Wikipedia live CD with Wikipedia browsable without the Internet.
  • Make a collection of Free technical E-books: even those Indian low-cost book editions are too expensive here, so E-books mean a lot.

Update: Matt Zimmerman writes:

I read your blog entry at http://www.enricozini.org/blog/eng/second-day-in-addis and wanted to respond as follows:

  • localeconf is not the standard way to configure locales in Ubuntu; what documentation told you that? It's an unsupported package from Progeny. If what you wanted was to set the system default locale from the command line, editing /etc/environment is probably the best way.

  • I suggest filing a bug report at <https://launchpad.net/products/ubuntu-website> about the HTTPS issue; I don't think it's necessary for the entire wiki to be HTTPS, only authentication.

  • Synaptic may be able to use the GNOME proxy settings without introducing undesirable dependencies; please file a wishlist bug

  • dict, squid and other packages from main are not on the Ubuntu CDs because there is no space. The DVD contains these packages.

  • The cupsys documentation bug was quite likely inherited from Debian and should be reported there

  • You can file bugs in Malone via email; this has been possible for a long time now. Please don't reinforce this misconception.

    https://help.launchpad.net/UsingMaloneEmail

Posted Sat Jun 6 00:57:39 2009 Tags:

Tenth day in Addis

Procedure to check if all the services of Dream University are up and running

If a machine blocks pings, use arping instead.

  1. Test DHCP:

    $ sudo ifdown eth0
    $ sudo ifup eth0
    $ ifconfig
    
  2. Test the DNS:

    # See if the DNS machine is on
    # The network
    $ ping -n 192.168.0.1
    
    # See if the DNS resolves names
    $ host www.dream.edu.et
    
  3. Test the gateway:

    # Ping the gateway
    $ ping gateway
    # Ping an outside host
    $ ping -n 10.4.15.6
    
  4. Test the proxy:

    # Ping the proxy
    $ ping proxy
    # Open a web page and see if it displays
    # See if it caches
    http_proxy=http://proxy.dream.edu.et:3030/ wget -S -O/dev/null http://www.enricozini.org  2>&1 | grep X-Cache
    
  5. Test the mail server:

    $ ping smtp
    $ nmap smtp -p 25 |grep 25/tcp
    $ if nmap gateway -p 25 |grep 25/tcp | grep -q open ; then echo "It works"; fi
    # finally, send a mail and see if you receive it
    

To do more advanced network and service monitoring, try nagios.

New useful tools seen today

wget - The non-interactive network downloader.

Special devices

  • /dev/null:
    • On read, there is no data.
    • On write, discards data.
  • /dev/zero:
    • On read, reads an infinite amount of zero bits.
    • On write, discards data.
  • /dev/random, /dev/urandom
    • On read, reads random bits.
    • On write, discards data.
    • Difference: /dev/random is cryptographically secure, but it can hang waiting for system events

Example uses:

wget -O/dev/null http://www.example.org

dd if=/dev/zero of=testdisk bs=1M count=50
mke2fs testdisk
sudo mount -o loop testdisk  /mnt

Tiny little commands

  • true - do nothing, successfully
  • false - do nothing, unsuccessfully
  • yes - output a string repeatedly until killed

Example uses:

  • while /bin/true; do echo ciao; done
  • Using /bin/false as a shell
  • yes | boring-tool-that-asks-lots-of-silly-questions

Some more shell syntax

  • 2>&1 Redirects the standard error in the standard output
  • 2> Redirects the standard error instead of the standard output

Some people run commands ignoring the standard error: command 2> /dev/null this causes unexpected error messages to go unnoticed: please do not do it.

What to check if a machine is very slow

  • See if the RAM is full: $ free. If it is, see which are the fattest programs using top, pressing M to sort by memory usage.
  • See if there are lots of programs competing for CPU: $ top
  • Check if you have I/O bottlenecks: $ vmstat (but I don't know how to read it)
  • For a desktop on older hardware, you can try xubuntu instead of ubuntu

More VIM command mode

Command mode allows to perform various text editing functions.

You work by performing operations on selected blocks of text.

Some common operations:

  • y: copy ("yank")
  • p: paste
  • P: paste before
  • d: cut ("delete")
  • c: change
  • i: insert
  • a: append
  • .: repeat last operation

Some common blocks:

  • w: word
  • }: paragraph
  • left and right arrow: one character left or right
  • up and down arrow: this line and the one on top or below
  • f letter: from the cursor until the given letter
  • v: selection
  • V: line selection
  • ^V: block selection

Examples:

  • yw: copy word
  • dw: cut word
  • yy: copy line
  • dd: cut line
  • V (select lines) y: copy a selection of lines
  • V (select lines) d: cut a selection of lines
  • p: paste

The best way to learn more vim is always to run vimtutor.

Installing squirrelmail

To install squirrelmail:

  1. apt-get install squirrelmail
  2. Run /usr/sbin/squirrelmail-config and configure IMAP and SMTP.

    In our case, since we use IMAPS, the IMAP server is imap.dream.edu.et, port 993, secure IMAP enabled and SMTP is smtp.dream.edu.et.

  3. Read /usr/share/doc/squirrelmail/README.Debian.gz (with zless) for how to proceed with setup. A short summary:
    • link /etc/squirrelmail/apache.conf into the apache conf.d directory
    • customise /etc/squirrelmail/apache.conf for example setting up the virtual hosts, or running it only on SSL

To have different virtual hosts over HTTPS, you need a different IP for every virtual host: name-based virtual hosts do not work on HTTPS.

You can configure multiple IP addresses on the same computer: use network interfaces named: eth0:1, eth0:2, eth0:3... These are called interface aliases.

You cannot setup interface aliases using the graphical network configuration and you need to add them in /etc/network/interfaces:

    iface eth0:1 inet static
          address 192.168.0.201
          netmask 255.255.255.0
          gateway 192.168.0.3
    auto eth0:1

This is the trick commonly used to put different virtual HTTPS hosts on the same computer.

Posted Sat Jun 6 00:57:39 2009 Tags:

Fifth day in Addis

Samba

To get samba:

    apt-get install samba samba-doc smbclient

To get the Samba Web Administration Tool:

    apt-get install swat netkit-inetd

The configuration is in /etc/samba:

  • One [global] section with the general settings
  • One section per share

One could use swat at http://localhost:901/ but it does not work easily on Ubuntu.

To see what is shared:

    smbclient -L localhost

To access a share:

    smbclient //localhost/name-of-the-share

To add a new user:

    sudo smbpasswd -a username

To change the password of a user:

    sudo smbpasswd username

To test accessing a share as a user:

    smbclient //localhost/web -U yared

Documentation:

    man smb.conf

To force the user or group used to access a share:

    force user = enrico
    force group = www-data

To set the unix permissions for every created file:

    # For files
    create mask = 0664
    # For directories
    directory mask = 0775

Example share configuration for a webspace:

    mkdir /var/www/public
    chgrp www-data /var/www/public
    chmod 0775 /var/www/public

Then, in /etc/samba/smb.conf:

    [web]
       comment = Webspace
       path = /var/www
       writable = yes
       public = no
       force group = www-data
       create mask = 0664
       directory mask = 0775

Example share configuration for a read only directory where only a limited group of people can write:

    [documents]
       comment = Documents
       path = /home/enrico/Desktop/documents
       force user = enrico
       public = yes
       writable = no
       write list = enrico, yared
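
Before reloading samba it is worth checking the configuration for syntax errors; a quick sketch (testparm ships with samba; the init script path is an assumption for this release):

    # Check /etc/samba/smb.conf and print the parsed configuration
    testparm
    # Reload samba so the new share becomes visible
    sudo /etc/init.d/samba reload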

Print server (CUPS)

Installation:

    apt-get install cupsys

Configuration:

  • On the web (not enabled in Ubuntu):

     http://localhost:631/
    
  • On the desktop:

     System/Administration/Printing
    

Example IPP URIs:

    ipp://server[:port]/printers/queue
    http://server:631/printers/queue
    ipp://server[:port]/...

For example:

    ipp://server/printers/laserjet

"This printer uri scheme can be used to contact local or remote print services to address a particular queue on the named host in the uri. The "ipp" uri scheme is specified in the Internet Print Protocol specifications and is actually much more free form that listed above. All Solaris and CUPS based print queues will be accessed using the formats listed above. Access to print queues on other IPP based print servers requires use of the server supported ipp uri format. Generally, it will be one of the formats listed above."

LDAP Lightweight Directory Access Protocol

Installation:

    apt-get install ldap-utils slapd

The configuration is in /etc/ldap.

To access an LDAP server, a graphical client is gq:

    apt-get install gq
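
From the command line, ldap-utils provides ldapsearch; a minimal query sketch, where the base DN dc=example,dc=com is a placeholder for your own:

    # Simple bind (-x), listing everything under the given base DN
    ldapsearch -x -b "dc=example,dc=com" "(objectClass=*)"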

GRUB

The configuration file is /boot/grub/menu.lst.

The documentation can be accessed as info grub after installing the package grub-doc.

Quick list of keys for info:

  • arrows: move around
  • enter: enters a section
  • l: goes back
  • u: goes up one node
  • q: quit
  • /: search

Grub trick to have a memory checker:

  1. apt-get install memtest86+
  2. Add this to /boot/grub/menu.lst:

    title Memory test
        root (hd0,5)
        kernel /boot/memtest86+.bin
    

Firewall

With iptables:

    man iptables
    # Only allow in input the network packets
    # that are going to the web server
    iptables -P INPUT DROP
    iptables -A INPUT --protocol tcp --destination-port 80 -j ACCEPT
    # To reset the input chain as the default
    iptables -F INPUT
    iptables -P INPUT ACCEPT

Squid

Installation:

    apt-get install squid

The configuration is in /etc/squid/squid.conf.

To allow the local network to use the proxy:

    # Add this before "http_access deny all"
    acl our_networks src 10.4.15.0/24
    http_access allow our_networks

To use a parent proxy:

    cache_peer proxy.aau.edu.et     parent    8080  0  proxy-only no-query

Pay attention: /var/spool/squid will grow as the cache is used. The maximum cache size is set with the cache_dir directive.
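
A sketch of the directive, with arbitrary example values (type ufs, 1024Mb, 16 first-level and 256 second-level directories):

    # type  directory         Mb  L1-dirs  L2-dirs
    cache_dir ufs /var/spool/squid 1024 16 256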

Information about squid access control is at http://www.squid-cache.org/Doc/FAQ/FAQ-10.html

To check that the configuration has no syntactic errors: squid -k parse.

To match urls:

    acl forbiddensites url_regex [-i] regexp

For info about regular expressions:

    man regex

Example filtering by regular expression:

    acl skype url_regex -i [^A-Za-z]skype[^A-Za-z]
    http_access deny skype

Transparent proxy setup: http://www.tldp.org/HOWTO/TransparentProxy.html

Problems found today

Hiccups of the day:

  • swat does not run on Ubuntu because Ubuntu does not have inetd
  • swat does not allow root login on Ubuntu because root does not have a password
  • smbpasswd -a does not seem to update the timestamp of /var/lib/samba/passwd.tdb
  • cups web admin does not work on Ubuntu
  • LDAP is still not so intuitive to set up

Update: Marius Gedminas writes:

I think it would be a good idea to mention that running

     iptables -P INPUT DROP

in the shell is a Bad Idea if you're logged in remotely via SSH.
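
A safer order of operations, as a sketch: first accept SSH and already-established traffic, and only then flip the policy:

    # Keep established connections (including the current SSH session) alive
    iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    # Keep accepting new SSH connections
    iptables -A INPUT -p tcp --dport 22 -j ACCEPT
    # Only now make DROP the default
    iptables -P INPUT DROP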

Posted Sat Jun 6 00:57:39 2009 Tags:

Ethiopia

It is interesting, beautiful and sad at the same time to find yourself redefining the meaning of "Abissinia". And to curse that, for the first 30 years of your life, you heard that word only when some asshole sang "Faccetta nera".

Posted Sat Jun 6 00:57:39 2009 Tags:

First day in Addis

First day in Addis Ababa, after the introductory session for this 10-day Linux training.

Interesting new quotes I picked up from Dr. Dawit's excellent presentation:

Much that I bound I could not free
Much that I freed returned to me

(I didn't manage to transcribe the attribution)

And this one for Bubulle, about translation:

When you speak to me in my language you speak to my heart
when you speak to me in English you speak to my head

(sb.)

Incomplete list of questions I've been asked, in bogosort -n order:

  • How do I get support?
  • Are the configuration files always the same across different distributions?
  • What is the level of interoperability between the various Linux distributions? And between different Unix-like systems?
  • Does plug and play work well when I change hardware?
  • Can I access NTFS partitions?
  • How do I play multimedia files in restricted formats?
  • I heard that NFS has security problems: can it be secured, or are there other file sharing alternatives?
  • Can I access a desktop remotely?
  • Can I install Linux on a computer where there's Windows already? Do I need to partition?
  • Can I be sure to find drivers for my hardware?

I'm happy to find that we've been successful in building more and more good answers for these questions.

Posted Sat Jun 6 00:57:39 2009 Tags:

Fourth day in Addis

Unix file permissions:

    drwxr-xr-x   2 root root    38 2006-07-14 
    |
    +- Is a directory

    drwxr-xr-x   2 root root    38 2006-07-14 
     ---
      |
      +- User permissions (u)

    drwxr-xr-x   2 root root    38 2006-07-14 
        ---
         |
         +- Group permissions (g)

    drwxr-xr-x   2 root root    38 2006-07-14 
           ---
            |
            +- Permissions for others (o)

    drwxr-xr-x   2 root root    38 2006-07-14 
                   ----
                    |
                    +- Owner user

    drwxr-xr-x   2 root root    38 2006-07-14 
                        ----
                         |
            Owner group -+

Other bits:

  • 4000 Set user ID:

    • For executable files: run as the user who owns the file, instead of the user who runs the file
    • For directories: I think it's not used
  • 2000 Set group ID:

    • For executable files: run as the group who owns the file, instead of the group of the user who runs the file
    • For directories: when a file is created inside the directory, it belongs to the group of the directory instead of the default group of the user who created the file
  • 1000 Sticky bit:

    • For files: I think it's not used anymore
    • For directories: only the owner of a file can delete or rename the file

The executable bit for directories means "can access the files in the directory".

If a directory is readable but not executable, then I can see the list of files (with ls) but I cannot access the files.

To access a file, all the directories of its path up to / need to be executable.
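
A quick sketch to see this in action (demo is a scratch directory):

    mkdir demo
    echo hello > demo/file
    chmod 644 demo     # readable but not executable
    ls demo            # the file name is still listed (ls may print some errors)
    cat demo/file      # ...but reading the file fails with "Permission denied"
    chmod 755 demo     # restore access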

Commands to manipulate permissions:

  • chown - change file owner and group
  • chgrp - change group ownership
  • chmod - change file access permissions

  • sudo adduser enrico www-data adds the user enrico to the group www-data.

Example setup for a website for students:

    # Create the group 'students'
    addgroup students
    mkdir /var/www/students
    chgrp students /var/www/students
    chmod 2775 /var/www/students

    # If you don't want other users to read the files of the students:
    chmod 2770 /var/www/students
    # This way the web server can still read the pages:
    adduser www-data students

    # When you add a user to a group, it does not affect running processes:
    #  - users need to log out and in again
    #  - servers need to be restarted

Apache:

  • To install apache2 without a graphical interface:

     apt-cache search apache2 | less
     sudo apt-get install apache2
    
  • By default, /var/www is where the static website lives.

  • By default, ~/public_html is the personal webspace for every user, accessible as: http://localhost/~user

  • By default, /usr/lib/cgi-bin contains scripts that are executed when someone browses http://website/cgi-bin/script

  • By default, apache reads the server name from the DNS. If we don't have a name in the DNS and we want to use the IP, we need to set:

     ServerName 10.4.15.158
    

    in /etc/apache2/apache2.conf (set it to your IP address)

  • To access the Apache manual: http://localhost/doc/apache2-doc/manual/

  • http://localhost/doc/apache2-doc/manual/mod/mod_access.html The access control module

  • http://localhost/doc/apache2-doc/manual/mod/mod_auth.html The user authentication module

  • To edit a user password file, use:

     htpasswd - Manage user files for basic authentication
    
  • Example .htaccess file to password protect a directory:

     AuthUserFile /etc/apache2/students
     AuthType Basic
     AuthName "Students"
     Require valid-user
    
  • Information about .htaccess is in http://localhost/doc/apache2-doc/manual/howto/htaccess.html
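
  • For example, to create the password file used in the example above (a sketch: -c creates the file and must be used only for the first user, or it overwrites the existing file):

     sudo htpasswd -c /etc/apache2/students enrico
     sudo htpasswd /etc/apache2/students yared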

  • If you need to tell apache to listen on different ports, add a Listen directive to /etc/apache2/ports.conf. Then you can use:

     <VirtualHost www.training.aau.edu.et:9000>
     [...]
     </VirtualHost>
    
  • To setup an HTTPS website:

    • Documentation is in http://localhost/doc/apache2-doc/manual/ssl/
    • How to create a certificate: http://www.tc.umn.edu/~brams006/selfsign.html

    • Create a certificate:

      /usr/sbin/apache2-ssl-certificate -days 365

    • Create a virtual host on port 443:

      [...]

    • Enable SSL in the VirtualHost:

      SSLEngine On
      SSLCertificateFile /etc/apache2/ssl/apache.pem

    • Enable listening on the HTTPS port (/etc/apache2/ports.conf):

      Listen 443

Apache troubleshooting:

  • check that there are no errors in the configuration file:

     apache2ctl configtest
    

    This is always a good thing to do before restarting or reloading apache.

  • read logs in /var/log/apache2/

  • if you made a change but you don't see it on the web, it can be that you have the old page in the cache of the browser: try reloading a few times.

To install PHP

  • apt-get install libapache2-mod-php5
  • then by default, every file .php is executed as php code
  • Small but useful test php file:

     <?php phpinfo(); ?>
    

To install MySQL

  • apt-get install mysql-client mysql-server
  • for administration run mysql as root:

    • Create a database with:

      create database students;

  • Give a user access to the database:

     # Without password
     grant all on students.* to enrico;
    
     # With password
     grant all on students.* to enrico identified by "SECRET";
    
  • More information can be found at http://www-css.fnal.gov/dsg/external/freeware/mysqlAdmin.html

To use MySQL from PHP:

    apt-get install php5-mysqli php5-mysql

Problems found today:

  • the apache2 manual in /usr/share/doc/manual can only be viewed using apache because it uses MultiViews. So you need to have a working apache to read how to have a working apache.

  • chmod does not have examples in the manpage.

Posted Sat Jun 6 00:57:39 2009 Tags:

Seventh day in Addis

Setting up a mail server

Background

Some terminology:

  • MTA: Mail Transport Agent
  • MUA: Mail User Agent
  • MDA: Mail Delivery Agent
  • SMTP: Simple Mail Transfer Protocol
  • MX: Mail eXchange
  • POP: Post Office Protocol
  • IMAP: Internet Message Access Protocol

With SMTP you connect to a server and send two things: envelope and message.

The envelope looks like this:

MAIL FROM: <enrico@enricozini.org>
RCPT TO: <rms@fsf.org>
RCPT TO: <linus@linux.org>

The message looks like this:

From: <enrico@enricozini.org>
To: <rms@fsf.org>
Cc: <linus@linux.org>
Message-ID: <1234567@enricozini.org>
Subject: Test mail

Hi Richard,

this is a test mail.  I'm also writing
Linus to show how to send to more people.

Cheers,

Enrico

There is no authentication.

There is no encryption.

Two usual types of access control:

  1. Outbound e-mail is normally only accepted from an internal network
  2. Inbound e-mail is normally accepted from anywhere

The DNS is used to find the SMTP server to use to send a message:

$ host -t MX yahoo.com
yahoo.com MX 10 smtp1.yahoo.com
yahoo.com MX 20 smtp2.yahoo.com
yahoo.com MX 20 smtp3.yahoo.com

The process of sending an E-Mail:

  1. Enrico writes an E-Mail:

    From: Enrico Zini <enrico@enricozini.org>
    To: Richard Stallman <rms@fsf.org>
    Subject: Hello from Addis
    
    Hi Richard,
    
    Addis is a wonderful city, even if
    it rains a lot.
    
    Bye,  Enrico
    
  2. Enrico's MUA connects to the SMTP server (for example, port 25 of smtp.aau.edu.et):

    HELO enricozini.org
    200 OK Hello enricozini.org
    MAIL FROM: <enrico@enricozini.org>
    200 OK Mail from enrico@enricozini.org
    RCPT TO: <rms@fsf.org>
    

    Here, the SMTP server performs relay control: "do we relay mail to rms@fsf.org?":

    • Outbound e-mail is normally only accepted from an internal network
    • Inbound e-mail is normally accepted from anywhere

    A target address could be refused:

    413 ERR I don't relay for rms@fsf.org
    

    In this case, the destination is not local but the recipient is accepted because I'm inside the local network:

    200 OK Destination rms@fsf.org
    DATA
    200 OK Please send message body
    From: Enrico Zini <enrico@enricozini.org>
    To: Richard Stallman <rms@fsf.org>
    Subject: Hello from Addis
    Date: Mon, 17 Jul 2006 09:49:45 +0300
    Message-ID: <124372643@enricozini.org>
    
    Hi Richard,
    
    Addis is a wonderful city, even if
    it rains a lot.
    
    Bye,  Enrico
    .
    200 OK Message accepted
    QUIT
    200 OK Bye.
    
  3. The SMTP server needs to find out where to send the message, using the DNS:

    $ host -t MX fsf.org
    fsf.org MX 10 mail.fsf.org
    fsf.org MX 20 mail.gnu.org
    
  4. So the SMTP server tries the first one and connects to port 25 of mail.fsf.org:

    HELO smtp.aau.edu.et
    200 OK Hello smtp.aau.edu.et
    MAIL FROM: <enrico@enricozini.org>
    200 OK Mail from enrico@enricozini.org
    RCPT TO: <rms@fsf.org>
    

    The destination is accepted because it's for a local user:

    200 OK Destination rms@fsf.org
    DATA
    200 OK Please send message body
    From: Enrico Zini <enrico@enricozini.org>
    To: Richard Stallman <rms@fsf.org>
    Subject: Hello from Addis
    Date: Mon, 17 Jul 2006 09:49:45 +0300
    Message-ID: <124372643@enricozini.org>
    Received: by mail.aau.edu.et
      on Mon, 17 Jul 2006 09:55:53 +0300
      from 10.4.15.158
    
    Hi Richard,
    
    Addis is a wonderful city, even if
    it rains a lot.
    
    Bye,  Enrico
    .
    200 OK Message accepted
    QUIT
    200 OK Bye.
    
  5. Now, mail.fsf.org will invoke an MDA to write the mail in Richard Stallman's mailbox.

Example of problems with mail handling:

  • Accepting inbound connections:
    • Malicious input:
      • logic errors
      • buffer overflows
      • DoS (Denial Of Service) attacks
      • Connection floods
  • Performing outbound connections:
    • Programming errors:
      • Flooding of connections
  • Performing routing:
    • Unauthorised relays
    • Mail loops
  • Writing to the local hard drive:
    • Filling up the hard drive
    • Writing to the wrong files
  • Writing to the local hard drive as root:
    • In case of error or attack, any file in the system can potentially be compromised

RFC-822 is the original standard for E-mail. RFCs are standard Internet documents. Have a look at the RFC documents released on the 1st of April.

postfix

Common setup: "Internet site with smarthost".

More difficult to maintain: "Internet site".

A smarthost is a machine that will relay e-mail for you.

Questions asked with "Internet site with smarthost":

  • Mail name: aau.edu.et (name used to publicly identify the mail server)
  • Smarthost name: smtp.telecom.net.et (SMTP server that will relay our e-mail)

To test a mail server:

$ telnet localhost 25
HELO me
MAIL FROM: <a@b.c>
RCPT TO: <mail@of.a.local.user>
DATA

hi
.
QUIT

By default, you find locally delivered mail in /var/mail/username.

Postfix configuration files:

  • /etc/postfix/master.cf: configures how all the postfix components run together (man 5 master)
  • /etc/postfix/main.cf: Main postfix configuration (man 5 postconf)

To rewrite addresses:

  1. In /etc/postfix/main.cf:

    canonical_maps = hash:/etc/postfix/canonical
    
  2. Then in /etc/postfix/canonical you can add the rewrite rules, like:

    enrico   enrico@enricozini.org
    
  3. When /etc/postfix/canonical is modified you need to regenerate the index:

    sudo postmap canonical
    

    (the same applies when you change the alias file: sudo postalias /etc/aliases)

(see file:///usr/share/doc/postfix/html/ADDRESS_REWRITING_README.html)

Manipulating the message queue:

mailq - List the mail queue.

Example:

    mailq

postqueue - Postfix queue control

Examples:

    # Like mailq
    postqueue -p

    # Tries to send every message in the queue
    postqueue -f

    # Tries to send every message in the queue for that site
    postqueue -s site

postsuper - Postfix superintendent

Examples:

    # Deletes one message
    sudo postsuper -d 7C4D2EC0F5D

    # Deletes all messages held in the queue for later delivery
    sudo postsuper -d ALL deferred

Different mail queues in postfix:

  • incoming: mail that has just entered the system
  • active: mail to be delivered
  • deferred: mail to be delivered later because there were problems
  • hold: mail that should not be delivered until released from hold

Mail logs are in:

/var/log/mail.log
/var/log/mail.err
/var/log/mail.info
/var/log/mail.warn

Mail delivery

Mailbox formats:

  • mbox: single file, mail separated by "From " lines
  • maildir: one directory per folder, one file per mail
  • mh: similar to maildir, but not really used

An alternative MDA is procmail, which lets you filter mail automatically into different folders.

Mail forwarding with ~/.forward lets you redirect mail to a different address: just put the destination address in the file ~/.forward.
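
For example:

    # All mail for this user will be re-sent to the given address
    echo "enrico@enricozini.org" > ~/.forward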

POP or IMAP server

Installation:

apt-get install dovecot

Configuration is in:

/etc/dovecot/dovecot.conf

The main thing needed is to enable the mail protocols you want:

protocols = imaps
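
To check that the IMAPS port answers, you can use the openssl command line client (a quick test; quit with ^C after the server banner):

openssl s_client -connect localhost:993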

Server monitoring

To make all sorts of graphs:

apt-get install munin munin-node

Example: http://munin.ping.uio.no

To compute more statistics:

  • anteater
  • isoqlog
  • mailgraph

Monitor system logs: logcheck:

  • sends you mail with abnormal log lines
  • it's important to customise what counts as normal, and you do that with regular expressions

Filtering viruses and spam

clamav - Virus scanner

Virus scanning:

  • Postfix gives the mail to clamav that scans it and gives it back if it's clean.
  • Strategies for infected mail:
    • silently delete it
    • refuse the mail and send a notification to the sender
    • refuse the mail and send a notification to the receiver
    • quarantine the e-mail
    • refuse delivery with a SMTP error
    • deliver with an extra header that says that it's a virus

spamassassin - Spam filter

Spam scanning:

  • Postfix gives the mail to spamd that scans it and gives it back with some spam information.
  • Strategies for spam mail:
    • silently delete it
    • refuse the mail and send a notification to the sender
    • refuse the mail and send a notification to the receiver
    • quarantine the e-mail
    • refuse delivery with a SMTP error
    • deliver with an extra header that says that it's spam
  • New techniques:
    • greylisting: when you receive a mail from a host you've never seen before, refuse it with a temporary error, and accept it the second time (after some time delay). Spammers normally don't retry, and implementing retry would increase their cost of sending e-mail.
    • crossassassin: if more than some amount of your users receive a mail with the same message ID, throw it away. Sending mails with different headers would increase the cost of sending e-mail.

Man pages and sections

Man pages are divided in sections:

  • man man shows all the sections of the manpages
  • man 5 postconf shows the postconf manpage in the "configuration file" section
  • Normally manpages are referred to as manpage(section) (e.g. postconf(5) )

Authentication and encryption with SMTP (update by Marius Gedminas)

You can have authentication and encryption with SMTP:

Cheat sheet

Setting up the client (I assume Ubuntu)

  # vi /etc/postfix/main.cf

      relayhost = [hostname.of.your.ISPs.smtp.server]
      smtp_use_tls = yes
      smtp_enforce_tls = yes
      smtp_tls_enforce_peername = no
      smtp_sasl_auth_enable = yes
      smtp_sasl_password_maps = hash:/etc/postfix/smtp_auth
      smtp_sasl_security_options = noanonymous

  # vi /etc/postfix/smtp_auth

      [hostname.of.your.ISPs.smtp.server] username:password

  # chmod 600 /etc/postfix/smtp_auth
  # postmap /etc/postfix/smtp_auth
  # postfix reload

(It would be a good idea to make the client verify the server's certificate to prevent man-in-the-middle attacks, but I haven't figured out that part yet...)

Setting up the server

  # apt-get install sasl2-bin libsasl2-modules
  # saslpasswd2 -u hostname.of.the.server -c username1
  # saslpasswd2 -u hostname.of.the.server -c username2
  ...

        these commands create /etc/sasldb2

  # echo "pwcheck_method: auxprop" > /etc/postfix/sasl/smtpd.conf
  # touch /var/spool/postfix/etc/sasldb2
  # echo mount --bind /etc/sasldb2 /var/spool/postfix/etc/sasldb2 \
          > /etc/init.d/local-sasl-for-postfix
  # chmod +x /etc/init.d/local-sasl-for-postfix
  # ln -s ../init.d/local-sasl-for-postfix /etc/rc2.d/S19local-sasl-for-postfix
  # /etc/init.d/local-sasl-for-postfix
  # adduser postfix sasl

        these commands let postfix (which runs chrooted) access /etc/sasldb2

  # cd /etc/postfix
  # openssl req -new -outform PEM -out smtpd.cert -newkey rsa:2048 -nodes \
            -keyout smtpd.key -keyform PEM -days 365 -x509
  # chmod 600 smtpd.key

        these commands create a self-signed SSL certificate

  # vi main.cf

      smtpd_sasl_auth_enable = yes
      broken_sasl_auth_clients = yes
      smtpd_sasl_local_domain = hostname.of.the.server
      smtpd_recipient_restrictions = permit_mynetworks,
                                     permit_sasl_authenticated,
                                     reject_unauth_destination
      smtpd_use_tls = yes
      smtpd_tls_cert_file = /etc/postfix/smtpd.cert
      smtpd_tls_key_file = /etc/postfix/smtpd.key

  # /etc/init.d/postfix restart
Posted Sat Jun 6 00:57:39 2009 Tags:

Addis course Tasks & Skills questions

  • What does the command find /etc | less do?

  • What does the command ps aux do?

  • What does the command mii-tool do and when would you use it?

  • What does the command host www.google.com do?

  • How do you get the MAC address of your computer?

  • What can you use dnsmasq for?

  • What is in /etc/dnsmasq.conf?

  • What is the use of the dhcp-option configuration parameter of /etc/dnsmasq.conf?

  • What is the difference between chown, chgrp and chmod?

  • What would you use nmap for?

  • How do you check to see if a network service is running on your computer?

  • What does apache2ctl configtest do? When should you run it?

  • Consider this piece of configuration of apache:

     AuthUserFile /etc/apache2/students
     AuthType Basic
     AuthName "Students"
     Require valid-user
    

    What does it do?

    What command would you use to add a new username and password to /etc/apache2/students? (you can write the entire commandline if you know it, but just the name of the command is fine)

  • You created the configuration for a new apache site in /etc/apache2/sites-available. How do you activate the new site?

  • When do you need to add the line Listen 443 to /etc/apache2/ports.conf?

  • What do you normally find in /var/log/syslog, and when would you read it?

  • What does the command smbclient //localhost/web do?

  • What does the command sudo smbpasswd -a enrico do?

  • Where do you look for the explanation of the many directives found in /etc/samba/smb.conf?

  • What is the purpose of the package cupsys?

  • What is the purpose of the command iptables?

  • What is the difference between MDA, MTA and MUA?

  • In a normal mail server configuration, when should you accept a mail coming from outside your local network?

  • Suppose you are a mail software and you need to send a mail to addis@yahoo.com: how do you find out the internet host to which you should connect to send the mail?

  • What is the difference between man 5 postconf and man 8 postconf?

  • What is the different use of SMTP and IMAP?

  • What is a "smarthost" in the context of mail server configuration?

  • What does the command mailq do?

  • What does the command sudo postsuper -d ALL deferred do?

  • Postfix has four mail queues: "incoming", "active", "deferred" and "hold". What is the difference among them?

  • What does the package dovecot do?

  • In the file /etc/dovecot/dovecot.conf, what is the difference between having protocols = imap and protocols = imaps?

  • What happens if I put the line enrico@enricozini.org in the file /home/enrico/.forward?

  • Consider this list of possible strategies for handling mail classified as spam:

    • silently delete it
    • refuse the mail and send a notification to the sender
    • refuse the mail and send a notification to the receiver
    • quarantine the e-mail
    • refuse delivery with a SMTP error
    • deliver with an extra header that says that it's spam

    What are their advantages and disadvantages?

Posted Sat Jun 6 00:57:39 2009 Tags:
Posted Sat Jun 6 00:57:39 2009

Food and recipes.

Oriental-style coppone and spinach

Ingredients:

  • one coppone (pork neck) steak
  • frozen spinach
  • oil
  • garlic
  • ginger
  • chilli
  • star anise
  • soy sauce
  • toasted sesame oil
  • pepper

The supermarket often has small coppone steaks on sale. They are excellent grilled, but lacking a grill I once improvised this, and every now and then I make it again. It is a very quick dinner that can be put together when there is nothing in the house, with ingredients taken straight from the freezer.

Gently fry the garlic, ginger and star anise in the oil.

Add the coppone cut into small pieces and brown it. While it cooks, add the crumbled chilli, the sesame oil and a bit of soy sauce.

When the meat has taken on some colour, add the thawed spinach and sauté it together with the meat and its juices.

Adjust the salt with the soy sauce and dust with ground pepper before serving.

Posted Sat Mar 9 18:42:22 2013 Tags:

Coffee and anchovy béchamel

Ingredients:

Inspired by a coffee and broccoli soufflé eaten at the wonderful trattoria Antichi Sapori in Parma, I too tried combining coffee and broccoli.

The idea was to make a sauce to pour over freshly boiled broccoli. The chef Davide Sensi had talked about coffee and anchovies, so I decided that the flavour of the sauce should come from coffee and Thai fish sauce. To thicken them, even just a classic roux would do.

The result: a béchamel in which the roux is thinned not with milk, but with coffee, Thai fish sauce and the broccoli cooking water.

The first attempt came out a bit too heavy on the coffee. On broccoli, though, it works rather well.

Posted Fri Mar 8 19:17:04 2013 Tags:

Aubergine soup

I, too, have been guilty of discovering a shrivelled aubergine in the bottom of the fridge, and I think I improved on the recipe a bit.

First I softened the onion in butter, then I added the crushed garlic, the aubergine peeled and diced, and the cumin seeds. I let them all roast in the pan for a while, until the aubergines took some colour, then I added the stock. Carefully, as the pan with the roasting aubergines is far above 100°C and the first splash of water turns into steam very quickly.

When it was all soft and yummy, I added two spoonfuls of tahini, as a thickener, a generous grating of nutmeg, and blended the lot.

What came out is basically a soup version of baba ganoush, and it is yummy!

Posted Thu Nov 8 21:15:29 2012 Tags:

Spaghetti with friggitelli and mozzarella

Serves 4:

  • 300 g spaghetti
  • 300 g friggitelli
  • 70 g soft bread crumbs
  • 125 g mozzarella
  • grated Parmesan, to taste
  • 4 small mint leaves
  • extra-virgin olive oil, to taste
  • salt & pepper

Wash the peppers, remove the seeds and the stalk, dry them and cut them into small strips.

Chop the bread in the mixer and brown it in a pan with 3 tablespoons of oil until it turns crunchy, then set it aside.

Cut the mozzarella into small cubes and set it aside as well.

Heat another 5 tablespoons of oil, add the peppers and cook them over a lively flame for 10 minutes, seasoning them with salt and pepper.

Cook the pasta, drain it and toss it in the pan with the peppers.

Sauté everything over a lively flame for a few minutes.

Add the diced mozzarella, the crunchy bread and the Parmesan.

As a final touch, add the torn mint leaves and serve.

(via http://friggitelli.it/)

Made it today: good. For the bread crumbs I used leftovers from the stuffing of last night's Roman-style artichokes.

With the artichoke cooking water, risotto tonight.

Posted Mon Apr 30 16:08:55 2012 Tags:

Mushroom risotto (and a bit of banana)

I felt like experimenting, and I had bananas at home. What can one do with bananas?

Enter http://www.foodpairing.be/

This site groups ingredients according to the chemical compounds they share that give them their flavour. And who sits next to the banana? MUSHROOMS!

So let's make a mushroom risotto: the usual base of onion fried in butter until it turns translucent, then in goes a piece of banana cut into thin little slices, to fry along and caramelise a bit. Finally, a few pieces of dried porcini soaked back to life in lukewarm water.

Then add the rice, let it fry a little in the mixture as well, and then add the stock (I had a stock cube made especially for mushroom risotto, bought in the nearby little shop of beautiful things).

No salt, no pepper, no butter for creaming, nothing. Once cooked, I just let it rest for 5 minutes.

The result was delicious. "Did you put cream in it? How can it be so creamy?". Tasty but not sweet. And you can tell the banana is there, but you cannot tell that it is banana.

From today on, I think my secret ingredient in mushroom risotto will be a little piece of banana.

Note that, as http://khymos.org/pairings.php reports, banana also goes well with parsley, and so do mushrooms. I forgot the parsley in this experiment, but it will be in the next one: we even have it fresh in the garden. The same page also mentions a likely cocoa-mushroom molecular pairing... who knows.

Posted Sat Jun 6 00:57:39 2009 Tags:

Chicken parcels with spinach

For some time I have been trying to figure out how to cook a good steak, and I finally found a cooking site that speaks my language.

So let's play with the Maillard reaction. After a decent success with a cheap little steak, the time came to take on chicken, which is the only meat my girlfriend likes.

Chicken meat has proteins, but not enough sugars for the Maillard reaction to happen. Ergo, we marinate the meat in something that contains sugars.

Googling "pollo" and "marinata" turns up the nice recipe "Petti di pollo ripieni al miele e aceto balsamico" (chicken breasts stuffed with honey and balsamic vinegar). The recipe says: "brown the chicken (3-4 minutes per side, over medium heat)", but I wanted to look Maillard in the eye and "medium heat" was not enough for me; besides, you cannot find speck in England, and I had spinach in the house rather than "Tatsoi salad". Ergo, I tampered with the recipe as I usually do:

Chicken roast stuffed with spinach

Ingredients:

  • 3 sliced chicken breasts
  • 3 slices of lean, rindless bacon (we are in England...)
  • 6-7 cubes of frozen spinach
  • 2 tablespoons of honey (acacia if possible)
  • 2 tablespoons of balsamic vinegar
  • 1 tablespoon of soy sauce
  • 1 spring onion
  • crumbled dried chilli
  • oil, salt, black pepper
  • several cloves of garlic

I thawed the spinach in a small pan over low heat, together with 4 or 5 crushed cloves of garlic.

In the meantime I made the marinade with the honey, balsamic vinegar, soy sauce, the finely cut spring onion, a crushed clove or two of garlic, the chilli, two tablespoons of oil, salt and pepper.

I then flattened the chicken breasts a bit, laid a slice of bacon on each, plastered on a layer of spinach, rolled everything up and tied it with string.

Once the parcels were ready I put them to soak in the marinade. I left them there a good half hour, then turned them over and left them another half hour, so that they would soak up flavour and colour nicely.

At this point I fired up the oven at 180° (for later) and put a pan (I used the non-stick wok) on the stove with a little olive oil.

I made sure not to do damage with the lively flame: the Maillard reaction happens above 140°, the smoke point of olive oil is between 190° and 240°, and that of the non-stick pan's teflon is 300°, so there is margin.

High flame, hot oil, in goes the first chicken parcel: two minutes per side, with the pan covered to limit the damage from splashes. As each parcel was done on both sides I put it in a baking dish, poured a little oil over it and put everything in the oven for about 20 minutes, to stay on the safe side because, even over a high flame, 2 minutes per side did not seem enough to cook the chicken and the bacon inside.

Between one parcel and the next it is worth removing most of the cooking juices from the pan and setting them aside, otherwise over 3 parcels on 2 sides they risk burning. At the end, with the pan nicely crusted, I poured in some wine, added the juices I had set aside, and with low heat and a wooden spoon I deglazed everything. I then added a bit of sugar to counter the sourness of the wine and let it reduce, after which I strained it through a sieve and obtained a delicious little sauce to pour over the parcels when serving.

Unfortunately I have no photo because, what with the looks and the smell, all three parcels disappeared before it occurred to us to take one.

Since some inviting fat was left at the bottom of the baking dish and the oven was still hot, I then roasted some potatoes in it. In Italy it must be 40 degrees now, but here we struggle to reach 20.

All of it washed down with a bottle of dolcetto del monferrato that we had found on sale at the supermarket a while ago: tasty as the chicken was, a white wine would have had no hope.

Posted Sat Jun 6 00:57:39 2009 Tags:
Posted Sat Jun 6 00:57:39 2009

Posts containing useful tips.

Resolving IP addresses in vim

A friend on IRC said: "I wish vim had a command to resolve all the IP addresses in a block of text".

But it does:

:<block>!perl -MSocket -pe 's/(\d+\.\d+\.\d+\.\d+)/gethostbyaddr(inet_aton($1), AF_INET)/ge'

If you use it often, put the perl command in a one-liner script and you have an editor macro. It works in other editors, too, and even without an editor at all. And it can be scripted!
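
For example, saved as an executable script (the name resolve-ips and the || $1 fallback, which keeps unresolvable addresses as they are, are my own additions):

#!/bin/sh
# resolve-ips: replace every IPv4 address on stdin with its reverse DNS name
exec perl -MSocket -pe 's/(\d+\.\d+\.\d+\.\d+)/gethostbyaddr(inet_aton($1), AF_INET) || $1/ge'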

We live with the power of Unix every day, so much that we risk forgetting how awesome it is.

Posted Wed Mar 7 14:07:07 2012 Tags:

SQLAlchemy, MySQL and sql_mode=traditional

As everyone should know, by default MySQL is an embarrassingly stupid toy:

mysql> create table foo (val integer not null);
Query OK, 0 rows affected (0.03 sec)

mysql> insert into foo values (1/0);
ERROR 1048 (23000): Column 'val' cannot be null

mysql> insert into foo values (1);
Query OK, 1 row affected (0.00 sec)

mysql> update foo set val=1/0 where val=1;
Query OK, 1 row affected, 1 warning (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 1

mysql> select * from foo;
+-----+
| val |
+-----+
|   0 |
+-----+
1 row in set (0.00 sec)

Luckily, you can tell it to stop being embarrassingly stupid:

mysql> set sql_mode="traditional";
Query OK, 0 rows affected (0.00 sec)

mysql> update foo set val=1/0 where val=0;
ERROR 1365 (22012): Division by 0

(There is an even better sql mode you can choose, though: it is called "Install PostgreSQL")

Unfortunately, I've been hired to work on a project that relies on the embarrassingly stupid behaviour of MySQL, so I cannot set sql_mode=traditional globally or the existing house of cards will collapse.

Here is how you set it session-wide with SQLAlchemy 0.6.x: it took me quite a while to find out:

import sqlalchemy.interfaces
from sqlalchemy import create_engine

# Without this, MySQL will silently insert invalid values in the
# database, causing very long debugging sessions in the long run
class DontBeSilly(sqlalchemy.interfaces.PoolListener):
    def connect(self, dbapi_con, connection_record):
        cur = dbapi_con.cursor()
        cur.execute("SET SESSION sql_mode='TRADITIONAL'")
        cur = None
engine = create_engine(..., listeners=[DontBeSilly()])

Why it takes all that effort is beyond me. I'd have expected this to be turned on by default, possibly with a switch that insane people could use to turn it off.

Posted Mon Feb 27 19:45:58 2012 Tags:

Those friendly spammers at Aruba

Aruba decided, out of the blue, to subscribe me to all their newsletters.

The newsletters have no unsubscribe link. Or rather, maybe they have one, but it can only be seen by decoding the mail with programs that I have no intention of using. Unsubscribe link or not, why should I have to unsubscribe from newsletters I never subscribed to?

I sent this mail to abuse@staff.aruba.it, followed by 3 more reports after this one, all of which were of course ignored:

Good morning,

I am reporting this spam sent by yourselves (attached is the mail with
its headers intact).

Could you please proceed with disciplinary measures against yourselves?
Your behaviour on the internet violates the most basic rules of
netiquette, and it is in your interest, as a provider, to educate
yourselves about them and enforce them on yourselves.


Best regards,

Enrico

What can I say: a third-world country deserves third-world ISPs.

Still, it is an excellent excuse to study postfix's header_checks: now the Aruba newsletter mails, which incidentally are 300Kb lumps each, meet a REJECT right in the SMTP session:

550 5.7.1 Criminal third-world ISP spammers not accepted here.

To do that, I added to /etc/postfix/main.cf:

# Reject aruba spam right away
header_checks = pcre:/etc/postfix/known_idiots.pcre

And then I created the file /etc/postfix/known_idiots.pcre:

/^Received:.+smtpnewsletter[0-9]+.aruba.it/ REJECT Criminal third-world ISP spammers not accepted here.

In the meantime I sent an email to the Garante Privacy and one to AGCOM, more out of curiosity than anything else. I do not expect any answer, but if anything happens I will gladly add it here.

Posted Fri Oct 14 16:14:54 2011 Tags:

Award winning code

Yuwei and I had a fun day at hhhmcr (#hhhmcr) and even managed to put together a prototype that won the first prize \o/

We played with the gmp24 dataset kindly extracted from Twitter by Michael Brunton-Spall of the Guardian into a convenient JSON dataset. The idea was to find ways of making it easier to look at the data and making sense of it.

This is the story of what we did, including the code we wrote.

The original dataset has several JSON files, so the first task was to put them all together:

#!/usr/bin/python

# Merge the JSON data
# (C) 2010 Enrico Zini <enrico@enricozini.org>
# License: WTFPL version 2 (http://sam.zoy.org/wtfpl/)

import simplejson
import os

res = []
for f in os.listdir("."):
    if not f.startswith("gmp24"): continue
    data = open(f).read().strip()
    if data == "[]": continue
    parsed = simplejson.loads(data)
    res.extend(parsed)

print simplejson.dumps(res)

The results however were not ordered by date, as GMP had to use several accounts to twit because Twitter was putting Greater Manchester Police into jail for generating too much traffic. There would be quite a bit to write about that, but let's stick to our work.

Here is code to sort the JSON data by time:

#!/usr/bin/python

# Sort the JSON data
# (C) 2010 Enrico Zini <enrico@enricozini.org>
# License: WTFPL version 2 (http://sam.zoy.org/wtfpl/)

import simplejson
import sys
import datetime as dt

all_recs = simplejson.load(sys.stdin)
all_recs.sort(key=lambda x: dt.datetime.strptime(x["created_at"], "%a %b %d %H:%M:%S +0000 %Y"))

simplejson.dump(all_recs, sys.stdout)

I then wanted to play with Tf-idf for extracting the most important words of every tweet:

#!/usr/bin/python

# tfifd - Annotate JSON elements with Tf-idf extracted keywords
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

import sys, math
import simplejson
import re

# Read all the twits
records = simplejson.load(sys.stdin)

# All the twits by ID
byid = dict(((x["id"], x) for x in records))

# Stopwords we ignore
stopwords = set(["by", "it", "and", "of", "in", "a", "to"])

# Tokenising engine
re_num = re.compile(r"^\d+$")
re_word = re.compile(r"(\w+)")
def tokenise(tweet):
    "Extract tokens from a tweet"
    for tok in tweet["text"].split():
        tok = tok.strip().lower()
        if re_num.match(tok): continue
        mo = re_word.match(tok)
        if not mo: continue
        if mo.group(1) in stopwords: continue
        yield mo.group(1)

# Extract tokens from tweets
tokenised = dict(((x["id"], list(tokenise(x))) for x in records))

# Aggregate token counts
aggregated = {}
for d in byid.iterkeys():
    for t in tokenised[d]:
        if t in aggregated:
            aggregated[t] += 1
        else:
            aggregated[t] = 1

def tfidf(doc, tok):
    "Compute TFIDF score of a token in a document"
    return doc.count(tok) * math.log(float(len(byid)) / aggregated[tok])

# Annotate tweets with keywords
res = []
for name, tweet in byid.iteritems():
    doc = tokenised[name]
    keywords = sorted(set(doc), key=lambda tok: tfidf(doc, tok), reverse=True)[:5]
    tweet["keywords"] = keywords
    res.append(tweet)

simplejson.dump(res, sys.stdout)

I thought this was producing a nice summary of every tweet but nobody was particularly interested, so we moved on to adding categories to tweets.

Thanks to Yuwei, who put together some useful keyword sets, we managed to annotate each tweet with a place name (e.g. "Stockport"), a social place name (e.g. "pub", "bank") and a social category (e.g. "man", "woman", "landlord"...)

The code is simple; the biggest work in it was the dictionary of keywords:

#!/usr/bin/python

# categorise - Annotate JSON elements with categories
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
# Copyright (C) 2010  Yuwei Lin <yuwei@ylin.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

import sys, math
import simplejson
import re

# Electoral wards from http://en.wikipedia.org/wiki/List_of_electoral_wards_in_Greater_Manchester
placenames = ["Altrincham", "Sale West",
"Altrincham", "Ashton upon Mersey", "Bowdon", "Broadheath", "Hale Barns", "Hale Central", "St Mary", "Timperley", "Village",
"Ashton-under-Lyne",
"Ashton Hurst", "Ashton St Michael", "Ashton Waterloo", "Droylsden East", "Droylsden West", "Failsworth East", "Failsworth West", "St Peter",
"Blackley", "Broughton",
"Broughton", "Charlestown", "Cheetham", "Crumpsall", "Harpurhey", "Higher Blackley", "Kersal",
"Bolton North East",
"Astley Bridge", "Bradshaw", "Breightmet", "Bromley Cross", "Crompton", "Halliwell", "Tonge with the Haulgh",
"Bolton South East",
"Farnworth", "Great Lever", "Harper Green", "Hulton", "Kearsley", "Little Lever", "Darcy Lever", "Rumworth",
"Bolton West",
"Atherton", "Heaton", "Lostock", "Horwich", "Blackrod", "Horwich North East", "Smithills", "Westhoughton North", "Chew Moor", "Westhoughton South",
"Bury North",
"Church", "East", "Elton", "Moorside", "North Manor", "Ramsbottom", "Redvales", "Tottington",
"Bury South",
"Besses", "Holyrood", "Pilkington Park", "Radcliffe East", "Radcliffe North", "Radcliffe West", "St Mary", "Sedgley", "Unsworth",
"Cheadle",
"Bramhall North", "Bramhall South", "Cheadle", "Gatley", "Cheadle Hulme North", "Cheadle Hulme South", "Heald Green", "Stepping Hill",
"Denton", "Reddish",
"Audenshaw", "Denton North East", "Denton South", "Denton West", "Dukinfield", "Reddish North", "Reddish South",
"Hazel Grove",
"Bredbury", "Woodley", "Bredbury Green", "Romiley", "Hazel Grove", "Marple North", "Marple South", "Offerton",
"Heywood", "Middleton",
"Bamford", "Castleton", "East Middleton", "Hopwood Hall", "Norden", "North Heywood", "North Middleton", "South Middleton", "West Heywood", "West Middleton",
"Leigh",
"Astley Mosley Common", "Atherleigh", "Golborne", "Lowton West", "Leigh East", "Leigh South", "Leigh West", "Lowton East", "Tyldesley",
"Makerfield",
"Abram", "Ashton", "Bryn", "Hindley", "Hindley Green", "Orrell", "Winstanley", "Worsley Mesnes",
"Manchester Central",
"Ancoats", "Clayton", "Ardwick", "Bradford", "City Centre", "Hulme", "Miles Platting", "Newton Heath", "Moss Side", "Moston",
"Manchester", "Gorton",
"Fallowfield", "Gorton North", "Gorton South", "Levenshulme", "Longsight", "Rusholme", "Whalley Range",
"Manchester", "Withington",
"Burnage", "Chorlton", "Chorlton Park", "Didsbury East", "Didsbury West", "Old Moat", "Withington",
"Oldham East", "Saddleworth",
"Alexandra", "Crompton", "Saddleworth North", "Saddleworth South", "Saddleworth West", "Lees", "St James", "St Mary", "Shaw", "Waterhead",
"Oldham West", "Royton",
"Chadderton Central", "Chadderton North", "Chadderton South", "Coldhurst", "Hollinwood", "Medlock Vale", "Royton North", "Royton South", "Werneth",
"Rochdale",
"Balderstone", "Kirkholt", "Central Rochdale", "Healey", "Kingsway", "Littleborough Lakeside", "Milkstone", "Deeplish", "Milnrow", "Newhey", "Smallbridge", "Firgrove", "Spotland", "Falinge", "Wardle", "West Littleborough",
"Salford", "Eccles",
"Claremont", "Eccles", "Irwell Riverside", "Langworthy", "Ordsall", "Pendlebury", "Swinton North", "Swinton South", "Weaste", "Seedley",
"Stalybridge", "Hyde",
"Dukinfield Stalybridge", "Hyde Godley", "Hyde Newton", "Hyde Werneth", "Longdendale", "Mossley", "Stalybridge North", "Stalybridge South",
"Stockport",
"Brinnington", "Central", "Davenport", "Cale Green", "Edgeley", "Cheadle Heath", "Heatons North", "Heatons South", "Manor",
"Stretford", "Urmston",
"Bucklow-St Martins", "Clifford", "Davyhulme East", "Davyhulme West", "Flixton", "Gorse Hill", "Longford", "Stretford", "Urmston",
"Wigan",
"Aspull New Springs Whelley", "Douglas", "Ince", "Pemberton", "Shevington with Lower Ground", "Standish with Langtree", "Wigan Central", "Wigan West",
"Worsley", "Eccles South",
"Barton", "Boothstown", "Ellenbrook", "Cadishead", "Irlam", "Little Hulton", "Walkden North", "Walkden South", "Winton", "Worsley",
"Wythenshawe", "Sale East",
"Baguley", "Brooklands", "Northenden", "Priory", "Sale Moor", "Sharston", "Woodhouse Park"]

# Manual coding from Yuwei
placenames.extend(["City centre", "Tameside", "Oldham", "Bury", "Bolton",
"Trafford", "Pendleton", "New Moston", "Denton", "Eccles", "Leigh", "Benchill",
"Prestwich", "Sale", "Kearsley", ])
placenames.extend(["Trafford", "Bolton", "Stockport", "Levenshulme", "Gorton",
"Tameside", "Blackley", "City centre", "Airport", "South Manchester",
"Rochdale", "Chorlton", "Uppermill", "Castleton", "Stalybridge", "Ashton",
"Chadderton", "Bury", "Ancoats", "Whalley Range", "West Yorkshire",
"Fallowfield", "New Moston", "Denton", "Stretford", "Eccles", "Pendleton",
"Leigh", "Altrincham", "Sale", "Prestwich", "Kearsley", "Hulme", "Withington",
"Moss Side", "Milnrow", "outskirt of Manchester City Centre", "Newton Heath",
"Wythenshawe", "Mancunian Way", "M60", "A6", "Droylesden", "M56", "Timperley",
"Higher Ince", "Clayton", "Higher Blackley", "Lowton", "Droylsden",
"Partington", "Cheetham Hill", "Benchill", "Longsight", "Didsbury",
"Westhoughton"])


# Social categories from Yuwei
soccat = ["man", "woman", "men", "women", "youth", "teenager", "elderly",
"patient", "taxi driver", "neighbour", "male", "tenant", "landlord", "child",
"children", "immigrant", "female", "workmen", "boy", "girl", "foster parents",
"next of kin"]
for i in range(100):
    soccat.append("%d-year-old" % i)
    soccat.append("%d-years-old" % i)

# Types of social locations from Yuwei
socloc = ["car park", "park", "pub", "club", "shop", "premises", "bus stop",
"property", "credit card", "supermarket", "garden", "phone box", "theatre",
"toilet", "building site", "Crown court", "hard shoulder", "telephone kiosk",
"hotel", "restaurant", "cafe", "petrol station", "bank", "school",
"university"]


extras = { "placename": placenames, "soccat": soccat, "socloc": socloc }

# Normalise keyword lists
for k, v in extras.iteritems():
    # Remove duplicates
    v = list(set(v))
    # Sort by length, so that longer names are matched before their substrings
    v.sort(key=lambda x:len(x), reverse=True)
    # Store the normalised list back, or the cleanup is lost
    extras[k] = v

# Add keywords
def add_categories(tweet):
    text = tweet["text"].lower()
    for field, categories in extras.iteritems():
        for cat in categories:
            if cat.lower() in text:
                tweet[field] = cat
                break
    return tweet

# Read all the twits
records = (add_categories(x) for x in simplejson.load(sys.stdin))

simplejson.dump(list(records), sys.stdout)

All these scripts form a nice processing chain: each script takes a list of JSON records, adds some bit and passes it on.
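
For example, assuming the scripts above are saved as merge.py, sort.py, tfidf.py and categorise.py and made executable (the file names are my own):

./merge.py | ./sort.py | ./tfidf.py | ./categorise.py > tweets.json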

In order to see what we have so far, here is a simple script to convert the JSON twits to CSV so they can be viewed in a spreadsheet:

#!/usr/bin/python

# Convert the JSON twits to CSV
# (C) 2010 Enrico Zini <enrico@enricozini.org>
# License: WTFPL version 2 (http://sam.zoy.org/wtfpl/)

import simplejson
import sys
import csv

rows = ["id", "created_at", "text", "keywords", "placename"]

writer = csv.writer(sys.stdout)
for rec in simplejson.load(sys.stdin):
    rec["keywords"] = " ".join(rec["keywords"])
    rec["placename"] = rec.get("placename", "")
    writer.writerow([rec[row] for row in rows])

At this point we were coming up with lots of questions: "were there more reports on women or men?", "which place had most incidents?", "what were the incidents involving animals?"... Time to bring Xapian into play.

This script reads all the JSON tweets and builds a Xapian index with them:

#!/usr/bin/python

# toxapian - Index JSON tweets in Xapian
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

import simplejson
import sys
import os, os.path
import xapian

DBNAME = sys.argv[1]

db = xapian.WritableDatabase(DBNAME, xapian.DB_CREATE_OR_OPEN)

stemmer = xapian.Stem("english")
indexer = xapian.TermGenerator()
indexer.set_stemmer(stemmer)
indexer.set_database(db)

data = simplejson.load(sys.stdin)
for rec in data:
    doc = xapian.Document()
    doc.set_data(str(rec["id"]))

    indexer.set_document(doc)
    indexer.index_text_without_positions(rec["text"])

    # Index categories as categories
    if "placename" in rec:
        doc.add_boolean_term("XP" + rec["placename"].lower())
    if "soccat" in rec:
        doc.add_boolean_term("XS" + rec["soccat"].lower())
    if "socloc" in rec:
        doc.add_boolean_term("XL" + rec["socloc"].lower())

    db.add_document(doc)

db.flush()

# Also save the whole dataset so we know where to find it later if we want to
# show the details of an entry
simplejson.dump(data, open(os.path.join(DBNAME, "all.json"), "w"))

And this is a simple command line tool to query the database:

#!/usr/bin/python

# xgrep - Command line tool to query the GMP24 tweet Xapian database
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

import simplejson
import sys
import os, os.path
import xapian

DBNAME = sys.argv[1]

db = xapian.Database(DBNAME)

stem = xapian.Stem("english")

qp = xapian.QueryParser()
qp.set_default_op(xapian.Query.OP_AND)
qp.set_database(db)
qp.set_stemmer(stem)
qp.set_stemming_strategy(xapian.QueryParser.STEM_SOME)
qp.add_boolean_prefix("place", "XP")
qp.add_boolean_prefix("soc", "XS")
qp.add_boolean_prefix("loc", "XL")

query = qp.parse_query(sys.argv[2],
    xapian.QueryParser.FLAG_BOOLEAN |
    xapian.QueryParser.FLAG_LOVEHATE |
    xapian.QueryParser.FLAG_BOOLEAN_ANY_CASE |
    xapian.QueryParser.FLAG_WILDCARD |
    xapian.QueryParser.FLAG_PURE_NOT |
    xapian.QueryParser.FLAG_SPELLING_CORRECTION |
    xapian.QueryParser.FLAG_AUTO_SYNONYMS)

enquire = xapian.Enquire(db)
enquire.set_query(query)

count = 40
matches = enquire.get_mset(0, count)
estimated = matches.get_matches_estimated()
print "%d/%d results" % (matches.size(), estimated)

data = dict((str(x["id"]), x) for x in simplejson.load(open(os.path.join(DBNAME, "all.json"))))

for m in matches:
    rec = data[m.document.get_data()]
    print rec["text"]

print "%d/%d results" % (matches.size(), matches.get_matches_estimated())

total = db.get_doccount()
estimated = matches.get_matches_estimated()
print "%d results over %d documents, %d%%" % (estimated, total, estimated * 100 / total)

Neat! Now that we had a proper index supporting all sorts of cool things, like stemming, tag clouds, full text search with complex queries, lookup of similar documents, keyword suggestions and so on, it was only fair to put together a web service to share it with the other people at the event.

It helped that I had already written similar code for apt-xapian-index and dde before.

Here is the server, quickly built on bottle. The very last line starts the server and it is where you can configure the listening interface and port.

#!/usr/bin/python

# xserve - Make the GMP24 tweet Xapian database available on the web
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

import bottle
from bottle import route, post
from cStringIO import StringIO
import cPickle as pickle
import simplejson
import sys
import os, os.path
import xapian
import urllib
import math

bottle.debug(True)

DBNAME = sys.argv[1]
QUERYLOG = os.path.join(DBNAME, "queries.txt")

data = dict((str(x["id"]), x) for x in simplejson.load(open(os.path.join(DBNAME, "all.json"))))

prefixes = { "place": "XP", "soc": "XS", "loc": "XL" }
prefix_desc = { "place": "Place name", "soc": "Social category", "loc": "Social location" }

db = xapian.Database(DBNAME)

stem = xapian.Stem("english")

qp = xapian.QueryParser()
qp.set_default_op(xapian.Query.OP_AND)
qp.set_database(db)
qp.set_stemmer(stem)
qp.set_stemming_strategy(xapian.QueryParser.STEM_SOME)
for k, v in prefixes.iteritems():
    qp.add_boolean_prefix(k, v)

def make_query(qstring):
    return qp.parse_query(qstring,
        xapian.QueryParser.FLAG_BOOLEAN |
        xapian.QueryParser.FLAG_LOVEHATE |
        xapian.QueryParser.FLAG_BOOLEAN_ANY_CASE |
        xapian.QueryParser.FLAG_WILDCARD |
        xapian.QueryParser.FLAG_PURE_NOT |
        xapian.QueryParser.FLAG_SPELLING_CORRECTION |
        xapian.QueryParser.FLAG_AUTO_SYNONYMS)


@route("/")
def index():
    query = urllib.unquote_plus(bottle.request.GET.get("q", ""))

    out = StringIO()
    print >>out, '''
<html>
<head>
<title>Query</title>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>
<script type="text/javascript">
$(function(){
    $("#queryfield")[0].focus()
})
</script>
</head>
<body>
<h1>Search</h1>
<form method="POST" action="/query">
Keywords: <input type="text" name="query" value="%s" id="queryfield">
<input type="submit">
<a href="http://xapian.org/docs/queryparser.html">Help</a>
</form>''' % query

    print >>out, '''
<p>Example: "car place:wigan"</p>

<p>Available prefixes:</p>

<ul>
'''
    for pfx in prefixes.keys():
        print >>out, "<li><a href='/catinfo/%s'>%s - %s</a></li>" % (pfx, pfx, prefix_desc[pfx])
    print >>out, '''
</ul>
'''

    oldqueries = []
    if os.path.exists(QUERYLOG):
        total = db.get_doccount()
        fd = open(QUERYLOG, "r")
        while True:
            try:
                q = pickle.load(fd)
            except EOFError:
                break
            oldqueries.append(q)
        fd.close()

        def print_query(q):
            count = q["count"]
            print >>out, "<li><a href='/query?query=%s'>%s (%d/%d %.2f%%)</a></li>" % (urllib.quote_plus(q["q"]), q["q"], count, total, count * 100.0 / total)

        print >>out, "<p>Last 10 queries:</p><ul>"
        for q in oldqueries[:-11:-1]:
            print_query(q)
        print >>out, "</ul>"

        # Remove duplicates
        oldqueries = dict(((x["q"], x) for x in oldqueries)).values()

        print >>out, "<table>"
        print >>out, "<tr><th>10 queries with most results</th><th>10 queries with least results</th></tr>"
        print >>out, "<tr><td>"

        print >>out, "<ul>"
        oldqueries.sort(key=lambda x:x["count"], reverse=True)
        for q in oldqueries[:10]:
            print_query(q)
        print >>out, "</ul>"

        print >>out, "</td><td>"

        print >>out, "<ul>"
        nonempty = [x for x in oldqueries if x["count"] > 0]
        nonempty.sort(key=lambda x:x["count"])
        for q in nonempty[:10]:
            print_query(q)
        print >>out, "</ul>"

        print >>out, "</td></tr>"
        print >>out, "</table>"

    print >>out, '''
</body>
</html>'''
    return out.getvalue()

@route("/query")
@route("/query/")
@post("/query")
@post("/query/")
def query():
    query = bottle.request.POST.get("query", bottle.request.GET.get("query", ""))
    enquire = xapian.Enquire(db)
    enquire.set_query(make_query(query))

    count = 40
    matches = enquire.get_mset(0, count)
    estimated = matches.get_matches_estimated()
    total = db.get_doccount()

    out = StringIO()
    print >>out, '''
<html>
<head><title>Results</title></head>
<body>
<h1>Results for "<b>%s</b>"</h1>
''' % query

    if estimated == 0:
        print >>out, "No results found."
    else:
        # Give as results the first 30 documents; also use them as the key
        # ones to use to compute relevant terms
        rset = xapian.RSet()
        for m in enquire.get_mset(0, 30):
            rset.add_document(m.document.get_docid())

        # Compute the tag cloud
        class NonTagFilter(xapian.ExpandDecider):
            def __call__(self, term):
                return not term[0].isupper() and not term[0].isdigit()
        cloud = []
        maxscore = None
        for res in enquire.get_eset(40, rset, NonTagFilter()):
            # Normalise the log of the weight against the highest one
            weight = math.log(res.weight)
            if maxscore is None: maxscore = weight
            tag = res.term
            cloud.append([tag, float(weight) / maxscore])
        max_weight = cloud[0][1]
        min_weight = cloud[-1][1]
        cloud.sort(key=lambda x:x[0])

        def mklink(query, term):
            return "/query?query=%s" % urllib.quote_plus(query + " and " + term)
        print >>out, "<h2>Tag cloud</h2>"
        print >>out, "<blockquote>"
        for term, weight in cloud:
            size = 100 + 100.0 * (weight - min_weight) / (max_weight - min_weight)
            print >>out, "<a href='%s' style='font-size:%d%%; color:brown;'>%s</a>" % (mklink(query, term), size, term)
        print >>out, "</blockquote>"

        print >>out, "<h2>Results</h2>"
        print >>out, "<p><a href='/'>Search again</a></p>"

        print >>out, "<p>%d results over %d documents, %.2f%%</p>" % (estimated, total, estimated * 100.0 / total)
        print >>out, "<p>%d/%d results</p>" % (matches.size(), estimated)

        print >>out, "<ul>"
        for m in matches:
            rec = data[m.document.get_data()]
            print >>out, "<li><a href='/item/%s'>%s</a></li>" % (rec["id"], rec["text"])
        print >>out, "</ul>"

        fd = open(QUERYLOG, "a")
        qinfo = dict(q=query, count=estimated)
        pickle.dump(qinfo, fd)
        fd.close()

    print >>out, '''
<a href="/">Search again</a>

</body>
</html>'''
    return out.getvalue()

@route("/item/:id")
@route("/item/:id/")
def show(id):
    rec = data[id]

    out = StringIO()
    print >>out, '''
<html>
<head><title>Result %s</title></head>
<body>
<h1>Raw JSON record for tweet %s</h1>
<pre>''' % (rec["id"], rec["id"])

    print >>out, simplejson.dumps(rec, indent=" ")

    print >>out, '''
</pre>
</body>
</html>'''
    return out.getvalue()

@route("/catinfo/:name")
@route("/catinfo/:name/")
def catinfo(name):
    prefix = prefixes[name]
    out = StringIO()
    print >>out, '''
<html>
<head><title>Values for %s</title></head>
<body>
''' % name

    terms = [(x.term[len(prefix):], db.get_termfreq(x.term)) for x in db.allterms(prefix)]
    terms.sort(key=lambda x:x[1], reverse=True)
    freq_max = terms[0][1]
    freq_min = terms[-1][1]

    def mklink(name, term):
        return "/query?query=%s" % urllib.quote_plus(name + ":" + term)

    # Build tag cloud
    print >>out, "<h1>Tag cloud</h1>"
    print >>out, "<blockquote>"
    for term, freq in sorted(terms[:20], key=lambda x:x[0]):
        size = 100 + 100.0 * (freq - freq_min) / (freq_max - freq_min)
        print >>out, "<a href='%s' style='font-size:%d%%; color:brown;'>%s</a>" % (mklink(name, term), size, term)
    print >>out, "</blockquote>"

    print >>out, "<h1>All terms</h1>"
    print >>out, "<table>"
    print >>out, "<tr><th>Occurrences</th><th>Name</th></tr>"
    for term, freq in terms:
        print >>out, "<tr><td>%d</td><td><a href='/query?query=%s'>%s</a></td></tr>" % (freq, urllib.quote_plus(name + ":" + term), term)
    print >>out, "</table>"

    print >>out, '''
</body>
</html>'''
    return out.getvalue()

# Change here for bind host and port
bottle.run(host="0.0.0.0", port=8024)
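
A hypothetical way to start it (the database name is made up):

$ ./xserve tweets.db

Then point a browser at http://localhost:8024/.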

...and then we presented our work and ended up winning the contest.

This was the story of how we wrote this set of award winning code.

Posted Sat Oct 16 01:36:08 2010 Tags:

Computing time offsets between EXIF and GPS

I like the idea of matching photos to GPS traces. In Debian there is gpscorrelate, but it's almost unusable for me because of bug #473362, and it has an awkward way of specifying time offsets.

Here at SoTM10 someone told me that exiftool gained -geosync and -geotag options. So it's just a matter of creating a little tool that shows a photo and asks you to type the GPS time you see in it.
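
Once such a tool has computed the offset, the tagging run would then look something like this (file names made up; the -geosync value is whatever the tool reports):

$ exiftool -geosync=+0:01:23 -geotag trace.gpx photos/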

Apparently there are no bindings or GIR files for gtkimageview in Debian, so I'll have to use C.

Here is a C prototype:

/*
 * gpsoffset - Compute EXIF time offset from a photo of a gps display
 *
 * Use with exiftool -geosync=... -geotag trace.gpx DIR
 *
 * Copyright (C) 2009--2010  Enrico Zini <enrico@enricozini.org>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 */


#define _XOPEN_SOURCE 600 /* glibc needs this for strptime() and setenv() */
#include <time.h>
#include <gtkimageview/gtkimageview.h>
#include <libexif/exif-data.h>
#include <stdio.h>
#include <stdlib.h>

static int load_time(const char* fname, struct tm* tm)
{
    ExifData* exif_data = exif_data_new_from_file(fname);
    if (exif_data == NULL)
    {
        fprintf(stderr, "Cannot read EXIF data from %s\n", fname);
        return -1;
    }
    ExifEntry* exif_time = exif_data_get_entry(exif_data, EXIF_TAG_DATE_TIME);
    if (exif_time == NULL)
    {
        fprintf(stderr, "Cannot find EXIF timestamp\n");
        return -1;
    }

    char buf[1024];
    exif_entry_get_value(exif_time, buf, 1024);
    //printf("val2: %s\n", exif_entry_get_value(t2, buf, 1024));

    if (strptime(buf, "%Y:%m:%d %H:%M:%S", tm) == NULL)
    {
        fprintf(stderr, "Cannot match EXIF timetamp\n");
        return -1;
    }

    return 0;
}

static time_t exif_ts;
static GtkWidget* res_lbl;

void date_entry_changed(GtkEditable *editable, gpointer user_data)
{
    const gchar* text = gtk_entry_get_text(GTK_ENTRY(editable));
    struct tm parsed;
    if (strptime(text, "%Y-%m-%d %H:%M:%S", &parsed) == NULL)
    {
        gtk_label_set_text(GTK_LABEL(res_lbl), "Please enter a date as YYYY-MM-DD HH:MM:SS");
    } else {
        time_t img_ts = mktime(&parsed);
        int c;
        int res;
        if (exif_ts < img_ts)
        {
            c = '+';
            res = img_ts - exif_ts;
        }
        else
        {
            c = '-';
            res = exif_ts - img_ts;
        }
        char buf[1024];
        // Use >= so that exactly one hour or one minute formats correctly
        if (res >= 3600)
            snprintf(buf, 1024, "Result: %c%ds -geosync=%c%d:%02d:%02d",
                    c, res, c, res / 3600, (res / 60) % 60, res % 60);
        else if (res >= 60)
            snprintf(buf, 1024, "Result: %c%ds -geosync=%c%02d:%02d",
                    c, res, c, (res / 60) % 60, res % 60);
        else
            snprintf(buf, 1024, "Result: %c%ds -geosync=%c%d",
                    c, res, c, res);
        gtk_label_set_text(GTK_LABEL(res_lbl), buf);
    }
}

int main (int argc, char *argv[])
{
    // Work in UTC to avoid mktime applying DST or timezones
    setenv("TZ", "UTC");

    const char* filename = "/home/enrico/web-eddie/galleries/2010/04-05-Uppermill/P1080932.jpg";

    gtk_init (&argc, &argv);

    struct tm exif_time;
    if (load_time(filename, &exif_time) != 0)
        return 1;

    printf("EXIF time: %s\n", asctime(&exif_time));
    exif_ts = mktime(&exif_time);

    GtkWidget* window = gtk_window_new(GTK_WINDOW_TOPLEVEL);
    GtkWidget* vb = gtk_vbox_new(FALSE, 0);
    GtkWidget* hb = gtk_hbox_new(FALSE, 0);
    GtkWidget* lbl = gtk_label_new("Timestamp:");
    GtkWidget* exif_lbl;
    {
        char buf[1024];
        strftime(buf, 1024, "EXIF time: %Y-%m-%d %H:%M:%S", &exif_time);
        exif_lbl = gtk_label_new(buf);
    }
    GtkWidget* date_ent = gtk_entry_new();
    res_lbl = gtk_label_new("Result:");
    GtkWidget* view = gtk_image_view_new();
    GdkPixbuf* pixbuf = gdk_pixbuf_new_from_file(filename, NULL);

    gtk_box_pack_start(GTK_BOX(hb), lbl, FALSE, TRUE, 0);
    gtk_box_pack_start(GTK_BOX(hb), date_ent, TRUE, TRUE, 0);

    gtk_signal_connect(GTK_OBJECT(date_ent), "changed", (GCallback)date_entry_changed, NULL);
    {
        char buf[1024];
        strftime(buf, 1024, "%Y-%m-%d %H:%M:%S", &exif_time);
        gtk_entry_set_text(GTK_ENTRY(date_ent), buf);
    }

    gtk_widget_set_size_request(view, 500, 400);
    gtk_image_view_set_pixbuf(GTK_IMAGE_VIEW(view), pixbuf, TRUE);
    gtk_container_add(GTK_CONTAINER(window), vb);
    gtk_box_pack_start(GTK_BOX(vb), view, TRUE, TRUE, 0);
    gtk_box_pack_start(GTK_BOX(vb), hb, FALSE, TRUE, 0);
    gtk_box_pack_start(GTK_BOX(vb), exif_lbl, FALSE, TRUE, 0);
    gtk_box_pack_start(GTK_BOX(vb), res_lbl, FALSE, TRUE, 0);
    gtk_widget_show_all(window);

    gtk_main ();

    return 0;
}

And here is its simple makefile:

CFLAGS=$(shell pkg-config --cflags gtkimageview libexif)
LDFLAGS=$(shell pkg-config --libs gtkimageview libexif)

gpsoffset: gpsoffset.c
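
To build and run it, assuming the source is saved as gpsoffset.c next to the makefile:

$ make gpsoffset
$ ./gpsoffset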

It's a simple prototype but it's a working prototype and seems to do the job for me.

I currently cannot figure out why, after I click on the text box, there seems to be no way to give the focus back to the image viewer so that I can control it with the keyboard.

There is another nice algorithm to compute time offsets to be implemented: you choose a photo taken from a known place and drag it on that place on a map: you can then look for the nearest point on your GPX trace and compute the time offset from that.
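
Here is a minimal sketch of that idea in Python, assuming the GPX trace has already been parsed into a list of (datetime, lat, lon) trackpoints (the function names are made up):

import math

def nearest_trackpoint(track, lat, lon):
    '''Return the (time, lat, lon) trackpoint closest to (lat, lon).'''
    def sqdist(tp):
        # Equirectangular approximation: fine for points close to each other
        dx = (tp[2] - lon) * math.cos(math.radians(lat))
        dy = tp[1] - lat
        return dx * dx + dy * dy
    return min(track, key=sqdist)

def geosync_offset(track, lat, lon, exif_time):
    '''Offset in seconds to feed to exiftool -geosync (GPS time minus EXIF time).'''
    gps_time = nearest_trackpoint(track, lat, lon)[0]
    delta = gps_time - exif_time
    return delta.days * 86400 + delta.seconds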

I have seen that there are programs for geotagging photos that implement all such algorithms, and have a nice UI, but I haven't seen any in Debian.

Is there any such software that could be packaged?

If not, the interpolation and annotation tasks can already be performed by exiftool, so it's just a matter of building a good UI, and I would love to see someone pick up the task.

Posted Sun Jul 11 12:34:04 2010 Tags:

Searching OSM nodes in Spatialite

Third step of my SoTM10 pet project: finding the POIs.

I put together a query to find all nodes with a given tag inside a bounding box, and also a query to find all the tag values for a given tag name inside a bounding box.

The result is this simple POI search engine:

#
# poisearch - simple geographical POI search engine
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#

import simplejson
from pysqlite2 import dbapi2 as sqlite

class PoiDB(object):
    def __init__(self):
        self.db = sqlite.connect("pois.db")
        self.db.enable_load_extension(True)
        self.db.execute("SELECT load_extension('libspatialite.so')")
        self.oldsearch = []
        self.bbox = None

    def set_bbox(self, xmin, xmax, ymin, ymax):
        '''Set bbox for searches'''
        self.bbox = (xmin, xmax, ymin, ymax)

    def tagid(self, name, val):
        '''Get the database ID for a tag'''
        c = self.db.cursor()
        c.execute("SELECT id FROM tag WHERE name=? AND value=?", (name, val))
        res = None
        for row in c:
            res = row[0]
        return res

    def tagnames(self):
        '''Get all tag names'''
        c = self.db.cursor()
        c.execute("SELECT DISTINCT name FROM tag ORDER BY name")
        for row in c:
            yield row[0]

    def tagvalues(self, name, use_bbox=False):
        '''
        Get all tag values for a given tag name,
        optionally in the current bounding box
        '''
        c = self.db.cursor()
        if self.bbox is None or not use_bbox:
            c.execute("SELECT DISTINCT value FROM tag WHERE name=? ORDER BY value", (name,))
        else:
            c.execute("SELECT DISTINCT tag.value FROM poi, poitag, tag"
                      " WHERE poi.rowid IN (SELECT pkid FROM idx_poi_geom WHERE ("
                      "       xmin >= ? AND xmax <= ? AND ymin >= ? AND ymax <= ?) )"
                      "   AND poitag.tag = tag.id AND poitag.poi = poi.id"
                      "   AND tag.name=?",
                      self.bbox + (name,))
        for row in c:
            yield row[0]

    def search(self, name, val):
        '''Get all name:val tags in the current bounding box'''
        # First resolve the tagid
        tagid = self.tagid(name, val)
        if tagid is None: return

        c = self.db.cursor()
        c.execute("SELECT poi.name, poi.data, X(poi.geom), Y(poi.geom) FROM poi, poitag"
                  " WHERE poi.rowid IN (SELECT pkid FROM idx_poi_geom WHERE ("
                  "       xmin >= ? AND xmax <= ? AND ymin >= ? AND ymax <= ?) )"
                  "   AND poitag.tag = ? AND poitag.poi = poi.id",
                  self.bbox + (tagid,))
        self.oldsearch = []
        for row in c:
            self.oldsearch.append(row)
            yield row[0], simplejson.loads(row[1]), row[2], row[3]

    def count(self, name, val):
        '''Count all name:val tags in the current bounding box'''
        # First resolve the tagid
        tagid = self.tagid(name, val)
        if tagid is None: return

        c = self.db.cursor()
        c.execute("SELECT COUNT(*) FROM poi, poitag"
                  " WHERE poi.rowid IN (SELECT pkid FROM idx_poi_geom WHERE ("
                  "       xmin >= ? AND xmax <= ? AND ymin >= ? AND ymax <= ?) )"
                  "   AND poitag.tag = ? AND poitag.poi = poi.id",
                  self.bbox + (tagid,))
        for row in c:
            return row[0]

    def replay(self):
        for row in self.oldsearch:
            yield row[0], simplejson.loads(row[1]), row[2], row[3]
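
A minimal usage sketch (the coordinates are made up; the arguments follow the xmin, xmax, ymin, ymax order of set_bbox above):

db = PoiDB()
db.set_bbox(2.80, 2.85, 41.97, 42.00)
for name, data, x, y in db.search("amenity", "fountain"):
    print name, x, y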

Problem 3 solved: now on to the next step, building a user interface for it.

Posted Sat Jul 10 15:50:31 2010 Tags:

Importing OSM nodes into Spatialite

Second step of my SoTM10 pet project: creating a searchable database with the points. What a fantastic opportunity to learn Spatialite.

Learning Spatialite is easy. For example, you can use the two tutorials with catchy titles that assume your best wish in life is to create databases out of shapefiles using a pre-built, i386-only executable GUI binary downloaded over an insecure HTTP connection.

To be fair, the second of those tutorials is called "An almost Idiot's Guide", thus making explicit the requirement of being an almost idiot in order to happily acquire and run software that way.

Alternatively, you can use "A quick tutorial to SpatiaLite", which is so quick that it has examples leading you to write SQL queries that trigger all sorts of vague exceptions at insert time. But at least it brought me a long way forward, to the point where I could just cross-reference things with the PostGIS documentation to find out the right way of doing things.

So, here's the importer script, which will probably become my reference example for how to get started with Spatialite, and how to use Spatialite from Python:

#!/usr/bin/python

#
# poiimport - import nodes from OSM into a spatialite DB
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#

import xml.sax
import xml.sax.handler
from pysqlite2 import dbapi2 as sqlite
import simplejson
import sys
import os

class OSMPOIReader(xml.sax.handler.ContentHandler):
    '''
    Filter SAX events in a OSM XML file to keep only nodes with names
    '''
    def __init__(self, consumer):
        self.consumer = consumer

    def startElement(self, name, attrs):
        if name == "node":
            self.attrs = attrs
            self.tags = dict()
        elif name == "tag":
            self.tags[attrs["k"]] = attrs["v"]

    def endElement(self, name):
        if name == "node":
            lat = float(self.attrs["lat"])
            lon = float(self.attrs["lon"])
            id = int(self.attrs["id"])
            #dt = parse(self.attrs["timestamp"])
            uid = self.attrs.get("uid", None)
            uid = int(uid) if uid is not None else None
            user = self.attrs.get("user", None)

            self.consumer(lat, lon, id, self.tags, user=user, uid=uid)

class Importer(object):
    '''
    Create the spatialite database and populate it
    '''
    TAG_WHITELIST = set(["amenity", "shop", "tourism", "place"])

    def __init__(self, filename):
        self.db = sqlite.connect(filename)
        self.db.enable_load_extension(True)
        self.db.execute("SELECT load_extension('libspatialite.so')")
        self.db.execute("SELECT InitSpatialMetaData()")
        self.db.execute("INSERT INTO spatial_ref_sys (srid, auth_name, auth_srid,"
                        " ref_sys_name, proj4text) VALUES (4326, 'epsg', 4326,"
                        " 'WGS 84', '+proj=longlat +ellps=WGS84 +datum=WGS84"
                        " +no_defs')")
        self.db.execute("CREATE TABLE poi (id int not null unique primary key,"
                        " name char, data text)")
        self.db.execute("SELECT AddGeometryColumn('poi', 'geom', 4326, 'POINT', 2)")
        self.db.execute("SELECT CreateSpatialIndex('poi', 'geom')")
        self.db.execute("CREATE TABLE tag (id integer primary key autoincrement,"
                        " name char, value char)")
        self.db.execute("CREATE UNIQUE INDEX tagidx ON tag (name, value)")
        self.db.execute("CREATE TABLE poitag (poi int not null, tag int not null)")
        self.db.execute("CREATE UNIQUE INDEX poitagidx ON poitag (poi, tag)")
        self.tagid_cache = dict()

    def tagid(self, k, v):
        key = (k, v)
        res = self.tagid_cache.get(key, None)
        if res is None:
            c = self.db.cursor()
            c.execute("SELECT id FROM tag WHERE name=? AND value=?", key)
            for row in c:
                self.tagid_cache[key] = row[0]
                return row[0]
            self.db.execute("INSERT INTO tag (id, name, value) VALUES (NULL, ?, ?)", key)
            c.execute("SELECT last_insert_rowid()")
            for row in c:
                res = row[0]
            self.tagid_cache[key] = res
        return res

    def __call__(self, lat, lon, id, tags, user=None, uid=None):
        # Acquire tag IDs
        tagids = []
        for k, v in tags.iteritems():
            if k not in self.TAG_WHITELIST: continue
            for val in v.split(";"):
                tagids.append(self.tagid(k, val))

        # Skip elements that don't have the tags we want
        if not tagids: return

        geom = "POINT(%f %f)" % (lon, lat)
        self.db.execute("INSERT INTO poi (id, geom, name, data)"
                        "     VALUES (?, GeomFromText(?, 4326), ?, ?)", 
                (id, geom, tags["name"], simplejson.dumps(tags)))

        for tid in tagids:
            self.db.execute("INSERT INTO poitag (poi, tag) VALUES (?, ?)", (id, tid))


    def done(self):
        self.db.commit()

# Get the output file name
filename = sys.argv[1]

# Ensure we start from scratch
if os.path.exists(filename):
    print >>sys.stderr, filename, "already exists"
    sys.exit(1)

# Import
parser = xml.sax.make_parser()
importer = Importer(filename)
handler = OSMPOIReader(importer)
parser.setContentHandler(handler)
parser.parse(sys.stdin)
importer.done()

Let's run it:

$ ./poiimport pois.db < pois.osm 
SpatiaLite version ..: 2.4.0    Supported Extensions:
        - 'VirtualShape'        [direct Shapefile access]
        - 'VirtualDbf'          [direct Dbf access]
        - 'VirtualText'         [direct CSV/TXT access]
        - 'VirtualNetwork'      [Dijkstra shortest path]
        - 'RTree'               [Spatial Index - R*Tree]
        - 'MbrCache'            [Spatial Index - MBR cache]
        - 'VirtualFDO'          [FDO-OGR interoperability]
        - 'SpatiaLite'          [Spatial SQL - OGC]
PROJ.4 Rel. 4.7.1, 23 September 2009
GEOS version 3.2.0-CAPI-1.6.0
$ ls -l --si pois*
-rw-r--r-- 1 enrico enrico 17M Jul  9 23:44 pois.db
-rw-r--r-- 1 enrico enrico 37M Jul  9 16:20 pois.osm
$ spatialite pois.db
SpatiaLite version ..: 2.4.0    Supported Extensions:
        - 'VirtualShape'        [direct Shapefile access]
        - 'VirtualDbf'          [direct DBF access]
        - 'VirtualText'         [direct CSV/TXT access]
        - 'VirtualNetwork'      [Dijkstra shortest path]
        - 'RTree'               [Spatial Index - R*Tree]
        - 'MbrCache'            [Spatial Index - MBR cache]
        - 'VirtualFDO'          [FDO-OGR interoperability]
        - 'SpatiaLite'          [Spatial SQL - OGC]
PROJ.4 version ......: Rel. 4.7.1, 23 September 2009
GEOS version ........: 3.2.0-CAPI-1.6.0
SQLite version ......: 3.6.23.1
Enter ".help" for instructions
spatialite> select id from tag where name="amenity" and value="fountain";
24
spatialite> SELECT poi.name, poi.data, X(poi.geom), Y(poi.geom) FROM poi, poitag WHERE poi.rowid IN (SELECT pkid FROM idx_poi_geom WHERE (xmin >= 2.56 AND xmax <= 2.90 AND ymin >= 41.84 AND ymax <= 42.00) ) AND poitag.tag = 24 AND poitag.poi = poi.id;
Font Picant de la Cellera|{"amenity": "fountain", "name": "Font Picant de la Cellera"}|2.616045|41.952449
Font de Can Pla|{"amenity": "fountain", "name": "Font de Can Pla"}|2.622354|41.974724
Font de Can Ribes|{"amenity": "fountain", "name": "Font de Can Ribes"}|2.62311|41.979193

It's impressive: I've got all sorts of useful information for the whole of Spain in just 17MB!

Let's put it to practice: I'm thirsty, is there any water fountain nearby?

spatialite> SELECT count(1) FROM poi, poitag WHERE poi.rowid IN (SELECT pkid FROM idx_poi_geom WHERE (xmin >= 2.80 AND xmax <= 2.85 AND ymin >= 41.97 AND ymax <= 42.00) ) AND poitag.tag = 24 AND poitag.poi = poi.id;
0

Ouch! No water fountains mapped in Girona... yet.

Problem 2 solved: now on to the next step, trying to show the results in some usable way.

Posted Sat Jul 10 09:10:35 2010 Tags:

Filtering nodes out of OSM files

I have a pet project here at SoTM10: create a tool for searching nearby POIs while offline.

The idea is to have something in my pocket (FreeRunner or N900) which doesn't require an internet connection and which can point me at the nearest fountains, post offices, ATMs, bars and so on.

The first step is to obtain a list of POIs.

In theory one can use Xapi but all the known Xapi servers appear to be down at the moment.

Another approach is to obtain one by filtering all the nodes with the tags we want out of a planet OSM extract. I downloaded the Spanish one and set to work.

First I tried with xmlstarlet, but it ate all the RAM and crashed my laptop, because for some reason, on my laptop, Linux kernels up to 2.6.32 (I don't know about later ones) like to swap out ALL running apps to cache I/O operations. This means that heavy I/O operations swap out the very programs performing them, so the system gets caught in some infinite I/O loop and dies. Or at least this is what I've figured out so far.

So, we need SAX. I put together this prototype in Python, which can process a nice 8MB/s of OSM data for quite some time with a constant, low RAM usage:

#!/usr/bin/python

#
# poifilter - extract interesting nodes from OSM XML files
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#


import xml.sax
import xml.sax.handler
import xml.sax.saxutils
import sys

class XMLSAXFilter(xml.sax.handler.ContentHandler):
    '''
    A SAX filter that is a ContentHandler.

    There is xml.sax.saxutils.XMLFilterBase in the standard library but it is
    undocumented, and most of the examples using it you find online are wrong.
    You can look at its source code, and at that point you find out that it is
    an offensive practical joke.
    '''
    def __init__(self, downstream):
        self.downstream = downstream

    # ContentHandler methods

    def setDocumentLocator(self, locator):
        self.downstream.setDocumentLocator(locator)

    def startDocument(self):
        self.downstream.startDocument()

    def endDocument(self):
        self.downstream.endDocument()

    def startPrefixMapping(self, prefix, uri):
        self.downstream.startPrefixMapping(prefix, uri)

    def endPrefixMapping(self, prefix):
        self.downstream.endPrefixMapping(prefix)

    def startElement(self, name, attrs):
        self.downstream.startElement(name, attrs)

    def endElement(self, name):
        self.downstream.endElement(name)

    def startElementNS(self, name, qname, attrs):
        self.downstream.startElementNS(name, qname, attrs)

    def endElementNS(self, name, qname):
        self.downstream.endElementNS(name, qname)

    def characters(self, content):
        self.downstream.characters(content)

    def ignorableWhitespace(self, chars):
        self.downstream.ignorableWhitespace(chars)

    def processingInstruction(self, target, data):
        self.downstream.processingInstruction(target, data)

    def skippedEntity(self, name):
        self.downstream.skippedEntity(name)

class OSMPOIHandler(XMLSAXFilter):
    '''
    Filter SAX events in a OSM XML file to keep only nodes with names
    '''
    PASSTHROUGH = ["osm", "bound"]
    TAG_WHITELIST = set(["amenity", "shop", "tourism", "place"])

    def startElement(self, name, attrs):
        if name in self.PASSTHROUGH:
            self.downstream.startElement(name, attrs)
        elif name == "node":
            self.attrs = attrs
            self.tags = []
            self.propagate = False
        elif name == "tag":
            if self.tags is not None:
                self.tags.append(attrs)
                if attrs["k"] in self.TAG_WHITELIST:
                    self.propagate = True
        else:
            self.tags = None
            self.attrs = None

    def endElement(self, name):
        if name in self.PASSTHROUGH:
            self.downstream.endElement(name)
        elif name == "node":
            if self.propagate:
                self.downstream.startElement("node", self.attrs)
                for attrs in self.tags:
                    self.downstream.startElement("tag", attrs)
                    self.downstream.endElement("tag")
                self.downstream.endElement("node")

    def ignorableWhitespace(self, chars):
        pass

    def characters(self, content):
        pass

# Simple stdin->stdout XMl filter
parser = xml.sax.make_parser()
handler = OSMPOIHandler(xml.sax.saxutils.XMLGenerator(sys.stdout, "utf-8"))
parser.setContentHandler(handler)
parser.parse(sys.stdin)

Let's run it:

$ bzcat /store/osm/spain.osm.bz2 | pv | ./poifilter > pois.osm
[...]
$ ls -l --si pois.osm
-rw-r--r-- 1 enrico enrico 19M Jul 10 23:56 pois.osm
$ xmlstarlet val pois.osm 
pois.osm - valid

Problem 1 solved: now on to the next step: importing the nodes in a database.

Posted Fri Jul 9 16:28:15 2010 Tags:

Tweaking locale settings

I sometimes meet Italian programmers who prefer their systems to be in English, so that they get untranslated manpages and error messages.

I also notice that their solutions often leave them something to complain about.

Some set LANG=C and complain they can't see accented characters.

Some set LANG=en_US.UTF-8 and complain that OpenOffice Calc wants dates in the format MM/DD/YYYY which is an Abomination unto Nuggan, as well as unto Me.

But the locales system can do much better than that. In fact, most times people would be extremely happy with LC_MESSAGES in English, and everything else in Italian:

$ locale
LANG=it_IT.UTF-8
LC_CTYPE="it_IT.UTF-8"
LC_NUMERIC="it_IT.UTF-8"
LC_TIME="it_IT.UTF-8"
LC_COLLATE="it_IT.UTF-8"
LC_MONETARY="it_IT.UTF-8"
LC_MESSAGES=en_US.UTF-8
LC_PAPER="it_IT.UTF-8"
LC_NAME="it_IT.UTF-8"
LC_ADDRESS="it_IT.UTF-8"
LC_TELEPHONE="it_IT.UTF-8"
LC_MEASUREMENT="it_IT.UTF-8"
LC_IDENTIFICATION="it_IT.UTF-8"
LC_ALL=

A way to do this (is there a better one?) is to tell the display manager (GDM, KDM...) to use the Italian locale, and then override the right LC_* bits in ~/.xsessionrc:

$ cat ~/.xsessionrc
export LC_MESSAGES=en_US.UTF-8

That does the trick: English messages, with Proper currency, Proper dates, accented letters that sort Properly, Proper A4 printer paper, Proper SI units. Even Nuggan would be happy.
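
To try an override on a single command without logging out, you can also set the variable for that command only (assuming the locale in question has been generated):

$ LC_TIME=it_IT.UTF-8 date
$ LC_MESSAGES=en_US.UTF-8 ls --help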

Posted Sat Apr 17 15:39:19 2010 Tags:

Temporarily disabling file caching

Does it happen to you that you cp a big, big file (say, similar in order of magnitude to the amount of RAM) and the system becomes rather unusable?

It looks like Linux is saying "let's cache this", and as you copy it tries to free more and more RAM in order to cache the big file you're copying. In the end, all the RAM is full of file data that you are not going to need.

This varies according to how /proc/sys/vm/swappiness is set.
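
For example, you can inspect and tweak it like this (the new value is just an example):

$ cat /proc/sys/vm/swappiness
60
# sysctl vm.swappiness=10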

I learnt about posix_fadvise and I tried to play with it. The result is this preloadable library that hooks into open(2) and fadvises everything as POSIX_FADV_DONTNEED.

It is all rather awkward. Using fadvise in that way will discard existing cache pages if the file is already cached, which is too much. Ideally one would like to say "don't cache this because of me" without stepping on the toes of other system activities.

Also, I found I need to hook into write(2) as well and run fadvise after every single write, because you can't fadvise a file being written in its entirety unless you pass fadvise the file size in advance; but the preloaded library cannot know the size of the output file, so meh.

So, now I can run: nocache cp bigfile someplace/ without trashing the existing caches. I can also run nocache tar zxf foo.tar.gz and so on. I wish, of course, that there were no need to do so in the first place.

Here is the nocache library source code, for reference:

/*
 * nocache - LD_PRELOAD library to fadvise written files to not be cached
 *
 * Copyright (C) 2009--2010 Enrico Zini <enrico@enricozini.org>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 */

#define _GNU_SOURCE         /* for RTLD_NEXT */
#define _XOPEN_SOURCE 600   /* for posix_fadvise */
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <dlfcn.h>
#include <stdarg.h>
#include <errno.h>
#include <stdio.h>

typedef int (*open_t)(const char*, int, ...);
typedef ssize_t (*write_t)(int fd, const void *buf, size_t count);

int open(const char *pathname, int flags, ...)
{
    static open_t func = 0;
    int res;
    if (!func)
        func = (open_t)dlsym(RTLD_NEXT, "open");

    // Note: I wanted to add O_DIRECT, but it imposes restriction on buffer
    // alignment
    if (flags & O_CREAT)
    {
        va_list ap;
        va_start(ap, flags);
        mode_t mode = va_arg(ap, mode_t);
        res = func(pathname, flags, mode);
        va_end(ap);
    } else
        res = func(pathname, flags);

    if (res >= 0)
    {
        int saved_errno = errno;
        int z = posix_fadvise(res, 0, 0, POSIX_FADV_DONTNEED);
        if (z != 0) fprintf(stderr, "Cannot fadvise on %s: %m\n", pathname);
        errno = saved_errno;
    }

    return res;
}

ssize_t write(int fd, const void *buf, size_t count)
{
    static write_t func = 0;
    ssize_t res;
    if (!func)
        func = (write_t)dlsym(RTLD_NEXT, "write");

    res = func(fd, buf, count);

    if (res > 0)
    {
        int saved_errno = errno;
        int z = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        if (z != 0) fprintf(stderr, "Cannot fadvise during write: %m\n");
        errno = saved_errno;
    }

    return res;
}
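
For the record, a sketch of how such a library can be compiled and wrapped into the nocache command used above (the paths are made up):

$ gcc -shared -fPIC -o nocache.so nocache.c -ldl
$ cat nocache
#!/bin/sh
LD_PRELOAD=/usr/local/lib/nocache.so exec "$@"
$ nocache cp bigfile someplace/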

Updates

Steve Schnepp writes:

Robert Love did an O_STREAMING patch for 2.4. It wasn't merged in 2.6, since POSIX_FADV_NOREUSE should be used instead.

But unfortunately it's currently mapped either to WILLNEED or to a no-op.

It seems that a Google Code project has been spawned to address this.

Posted Mon Mar 8 13:26:28 2010 Tags:
Posted Sat Jun 6 00:57:39 2009

Bologna.

A criminal meal

We were queuing along the San Giovanni in Persiceto ring road on the way to a surgical appointment at the USL in Crevalcore, when we noticed on the right this jaunty little restaurant with a few badges on the door from various guides, including even the Michelin: http://www.ristorantegiardinetto.it/

Back home, we googled it and asked a few people from San Giovanni, and nobody knew it. A sane person would have said: "if the people of San Giovanni don't know it, there must be a reason". We said instead: "if nobody knows it, let's go and find out".

The truth is: "If the people of San Giovanni go to the Bertoldo instead, there must be a reason".

We walked in optimistic. Typical Bolognese menu: tortellini, passatelli, ricotta tortelloni, boiled meats. Plus a "menu of the day" dedicated to Sicily, with Pasta alla Norma and the like. Nice.

First courses ordered: Pasta alla Norma and tortellini in broth.

For the main courses: mixed roasted vegetables and a pork chop. I would have liked the zampone from the boiled meats trolley, but since the weather is still too warm they don't do boiled meats, so I fell back on the chop.

After the waiter left with our order, a terrible doubt struck me: "if it's not the season for boiled meats, what do they make the broth with?".

The first courses arrive. After a thorough search involving us both, we found a minuscule little strip of aubergine among the spaghetti, which confirmed that these indeed were the spaghetti alla Norma we had ordered and not a mistaken order of spaghetti with tomato sauce plus added grease.

The tortellini, on the other hand, were handmade and even small (the pasta maker knew her trade), and if you carefully drained away the broth they were actually good.

When we finished the first courses the waiter asked "everything all right?", and I candidly said "good tortellini, shame about the broth". He didn't take it well. He barely concealed a fit of rage and fled with the excuse of going to tell the cook. He came back shortly afterwards with some mumbo jumbo about the bottom of the pot, to which we didn't pay much attention. He had the good taste not to ask how the "spaghetti alla Norma" were.

The main courses arrive. Roasted vegetables scalding hot inside, as only a microwave knows how to heat them, and stuffed with breadcrumbs soaked in unpleasant grease. To make up for it, a pork chop so dry that just looking at it dried the tears from my eyes as well.

I tried sampling the chop at various points, but in the end I had to leave it on the plate. I then helped finish the saturated-fat vegetables, which compared to the chop were di-vi-ne.

When the waiter came to take the plates away, somewhat resigned, he asked "was the chop not good?". I answered "are you kidding?". He said "if you want I'll have it made again". I said "thanks, but better not". He insisted. I insisted that he bring the bill.

The bill took a while to arrive. In the end they charged us full price for everything, despite the complaints and the chop left on the plate. Later I'll go and look for the feedback email address of the Michelin guide.

The bill:

  • 2 cover charge and bread: €5 (the price of a pizza marinara)
  • 2 first courses: €19 (€8 spaghetti dressed with tomato and the grease squeezed out of baked aubergines; €11 good tortellini devastated by an unworthy broth)
  • 2 main courses: €20 (€10 vegetables au gratin of saturated fats and €10 yesterday's chop reheated with a hair dryer)
  • 1/2 house Sangiovese: €5 (whatever)
  • 1l sparkling water: €2.50
  • Total: €51.50

Toilet with no toilet paper and a loose seat.

Had we been in the centre of Venice in August, you'd write it off as a tourist trap. But in the Bassa, not knowing how to cook a pork chop is criminal.

I wanted to call 112: "Hello, police please. Good evening. I'm in a restaurant in San Giovanni, Circ.ne Italia 20. Yes. Please come at once. Yes. They are insulting the good name of the PIG. Yes, in the Bassa. San Giovanni in Persiceto, you heard right. OK, good, I'll be waiting."

Posted Thu Sep 17 23:59:44 2009 Tags:

Public hygiene

This morning:

  • Go to via Gramsci 12
  • Go to door 10, ground floor
  • Take the white ticket with the blue border
  • Wait (two hours)
  • Get the vaccination
  • Go to door 22, ground floor
  • Pay
  • Go back to door 10, ground floor
  • Hand in the receipt
  • Collect the certificate

I felt like the Logo turtle.

Posted Sat Jun 6 00:57:39 2009 Tags:

A high-class dinner

An interesting high-class dinner at the restaurant Il Sole in Trebbo di Reno:

[...] among the most appreciated chefs of the new generation, over the years they have transformed the cuisine by pairing the dishes of the Italian regional tradition with a lofty aspiration to research, with never-excessive experimentation, with a flair for novelty, making the balance between history and culinary innovation the ace up this inn's sleeve.

I didn't link their website because it is an evil all-Flash monolith and, as punishment, cannot be found in Google. You can find their e-mail address, to which I wrote asking about the New Year's Eve menu, but they never replied.

I went there with my parents several years ago and it left a good memory, so I went back with my girlfriend to see whether it is still worth it.

We ate (I'm going from memory, so these are not exact descriptions):

First courses

Me: amberjack cappellacci with mushroom flavour and black truffle ice cream.

Her: rice soup with sea bass tortelli, turnip tops and a scent of orange.

Main courses

One portion shared between us: sea bass with roasted scampi.

Dessert

Stracchino millefeuille with smoked salt and persimmon sauce.

Wines

A Pinot from San Michele Appiano; unfortunately I don't remember the winery.

A Spanish OLIVARES Dulce Monastrell. We asked the cook which region of Spain it came from, but he never came back to tell us. Google compensates: Murcia.

Comments

Waiting for the first course

A mini portion of raw amberjack with caramelised lemon, Taggiasca olive paté and pappa al pomodoro. The individual parts were good even if they didn't form a whole: a little taster, not astonishing but interesting.

Amberjack cappellacci with mushroom flavour and black truffle ice cream

The cappellacci pasta wasn't bad, although the filling lacked either salt or flavour, and you had to pay close attention to perceive it.

The mushrooms were as bitter as poison.

The black truffle ice cream, in honour of the black truffle, had the truffle's smell but not its taste. The taste was a vague sweetishness.

The taste left in the mouth was the unpleasant bitterness of the mushrooms.

Rice soup with sea bass tortelli, turnip tops and a scent of orange

The soup had a vinyl-like consistency and no flavour at all. At the first taste, both of us thought: "will they be offended if I ask for salt?".

The tortelli had good pasta and the filling was perceptible, but the fishy smell wasn't really offset by the orange (which worked reasonably well) so much as by the turnip tops, which were unpleasantly bitter.

The taste left in the mouth was the unpleasant bitterness of the turnip tops.

Sea bass with roasted scampi

At the first bite the sea bass wasn't bad, but at the second it no longer had much to say: I would have added a bit of salt. The scampi was good, but it was served in a sort of wool of fried black potato that tasted of chips and covered its flavour almost entirely.

Waiting for dessert

A scoop of strawberry ice cream with a strawberry on top.

The ice cream was made with good strawberries, which I appreciated, but ingredients aside the texture didn't amaze us: in Bologna one gets used to very high ice cream standards.

In one of the two saucers we found a foreign body that we are still trying to convince ourselves wasn't a fingernail. I regret not calling the waiter for an explanation; it's the traveller's habit of not making a fuss about this kind of thing.

Stracchino millefeuille with smoked salt and persimmon sauce

Definitely the highlight of the evening; it will be remembered for a long time.

The pastry of the millefeuille was a masterpiece: excellent from every point of view, truly a pleasure.

Breaking it with the spoon and feeling it under the teeth, the mouth expects, demands, to find something equally grandiose in the middle, and is disappointed when it finds, yes indeed, stracchino that tastes of stracchino.

Disappointment. But it's not over, because then the smoked salt kicks in, and at that point you enter a new world. I swear, I have never tasted anything so creatively revolting. I forced myself to finish it to try to make sense of it, but no: just thinking back on it makes me feel sick. It's the first time that the best word I can find to describe a dessert is offensive.

The persimmon sauce decorating the plate wasn't bad, but it could only watch the tragedy, helpless.

My stomach is still complaining: not because of a difficult digestion, but out of anger at seeing several parts, all of high quality, united in the havoc of such a disgusting cacophony.

It's the first time my stomach has refused to digest because it feels mocked.

After dessert

A tasting of more or less interesting petits fours, among which stood out a little bowl of custard that for both of us had a clear dominant note of chlorine, under which the egg flavour wasn't even bad.

Service

  • The other tables got more bread tasters than we did: we only got breadsticks, which seemed a bit stale to me and too greasy to her.
  • I would like to see the label on the bottle while I am given the wine to taste: instead, it was turned the other way.
  • I'm fine with a waiter who has the menu in English but doesn't speak English, but at least give me the time to translate. I won't tolerate an irritated face when I ask the waiter for a pause so I can translate.
  • If I ask you which part of Spain the dessert wine comes from and you are the cook who suggested it, I expect you to know. If you tell me you don't know and will go and check, then do come back and tell me.

The other customers

One might say they didn't take us seriously because we came from an afternoon strolling around the centre and were wearing jeans: style-wise, there is room for improvement.

Still... still, we don't go smoking in the toilet (which then stinks!), and after using the single-use mini cloth towel we put it in the basket for used single-use cloth towels, instead of folding it up and putting it back, wet, on the pile of clean ones.

Price

Things you can do for the same price:

  • A dinner at Buriani or at the Dolce e Salato.
  • A white truffle dinner at the trattoria La Rosa in Sant'Agostino.
  • A grand fish feast for two at the Giara in Altedo.
  • A white truffle lunch at Cà Gabrielli at Corno alle Scale plus a day on the snow, including petrol for the trip, ski pass and ski rental.

This blog entry was written to make peace with my stomach, which demanded a clear statement of position before starting digestion.

Ergo, clear statement of position: tonight we ate badly.

Let's see if my stomach now lets me sleep.

Posted Sat Jun 6 00:57:39 2009 Tags:

Hallucinations

Bologna, 14 December 2005.

This morning I was driving to Bologna (it rarely happens, and I hate it when it does).

At Primo Maggio I go straight on towards via Colombo, and out of the corner of my eye I catch a sale billboard on a warehouse on the right.

Something is odd. I look more carefully.

"No more phone calls! Total clearance sale due to change of management"

I smile: the warehouse was "Il Mobile di Castel Maggiore", a furniture shop ("mobile" is Italian for a piece of furniture as well as for a mobile phone).

Posted Sat Jun 6 00:57:39 2009 Tags:

Notes on the LinuxDay talk

Advanced tools for goofing off

From the Linux Day 2005 in Bologna.

Debian GNU/Linux is a stable, secure, complete system that addresses any kind of need. But what happens when every need is addressed? When every want is satisfied? When all our systems run reliably and we can't even keep ourselves busy supervising the twirling of a defrag? We need creative, ingenious, totally useless ways to waste our time. Fortunately, in this talk we'll see quite a few.

These are the notes I used for the talk. Fragmentary, but they should give the idea. The Linux Day 2005 Bologna page has the audio recordings, and will have the video ones when they are ready.

Introduction

Definition

I would define it as a way of spending time that is creative, but above all useless. Useless at least by the canons of society's current mass ritual, which dictate that the only useful things are the ones requiring anxiety and toil.

Goofing off in history

The pyramids.

Goofing off in literature

Goofing off in art

  • The Dadaists (again)
  • Piero Manzoni
  • Marti Guixé

In science

  • The Ig Nobel prizes

Part 1: Command-line software

Basic tools

  • sl
  • an (and then mistyping it for man)
  • tama
  • vigor
  • an, wordplay
  • sysvbanner
  • dpkg -L bsdgames | grep /usr/games
    • bcd, ppt, morse
    • countmail
    • hangman
    • number
    • pig (man pig)
    • pom
    • quiz
    • random
    • wargames
  • robotfindskitten
  • fortune

Advanced tools

Commands:

dpkg -L filters
formail -I "" -s < .mail/debian-legal | dadadodo -
polygen bloccotraffico | lynx -dump -stdin
polygen uforobot | lynx -dump -stdin | grep -v '^$' | cowsay
for i in *.cow; do echo $i | cowsay -f `basename $i .cow`; done | less
polygen pythoniser | fmt | b1ff | cowsay -f eyes
polygen -X 50 unieuro | dadadodo - | festival --tts --language italian
polygen screensaver
randtype
bogosort

Part 2: Graphical software

Basic tools

  • cappuccino
  • Clocks
    • sunclock, daliclock, xarclock -update 1
    • xearth
    • xplanet with gadgets
  • xteddy
  • kodo
  • xdesktopwaves
  • xlaby
  • xlaby + kodo

Advanced tools

xscreensaver, xscreensaver-gl, rss-glx:

/usr/lib/xscreensaver/noseguy -program "polygen unieuro"
mkfifo pippo
tail -f pippo | festival --tts --language italian
/usr/lib/xscreensaver/noseguy -program "polygen unieuro|tee /home/enrico/pippo |fmt"
ll /usr/lib/xscreensaver
phosphor -program bash
phosphor -program 'xscreensaver-text | tee /dev/stderr | festival --tts'
(how do you speed it up?)
phosphor -program 'polygen -X 50 unieuro | dadadodo - | tee /dev/stderr | festival --tts --language italian'
matrixview

Part 3: Goofing off with serious tools

  • Almost serious
    • debtags search game::toys
    • xtartan -list
    • gdesklets gkrellm
  • Serious
    • character map + ctrl-shift
    • guppi
    • LDAP -- GEEZ! Multisync can do LDAP synchronization! I could add a new user in my mobile phone and have a UNIX account automatically created for it! :)
    • graphviz

Conclusion

  • xfs_fsr
Posted Sat Jun 6 00:57:39 2009 Tags:

Linux Day 2006

Cool!

Last year we ended up in the newspapers; this year we even made it onto TG1, and with an excellent report, too.

The talks in Bologna were good, with a slant less towards programmers and more towards creative people: it was great to see how a musician and a photographer work with Linux.

The LIP (Linux Installation Party), on the other hand, was half a failure, with few participants. The prevailing explanation is that nowadays you no longer need much help to install Linux: except for those computers that require voodoo rites and holy cards of Sgala, modern distributions just install themselves.

I like creative talks: in Venezuela I saw a talk given by a Blender professional from the Plumiferos project: spectacular!

Equally spectacular was watching Daniele yesterday using Ardour and a sea of other synthesisers, effects and MIDI peripherals.

Obligatory goofing-off note: during dinner we created a polygen grammar to generate names of Italian IT companies. For example:

  • Sporcogel
  • Fossyjet
  • Plottigel
  • Polpydent
  • Pulicyd
  • Pulysnell
  • Pulytel

When is the next evening talk?

Addendum: article on Linux.com.

Posted Sat Jun 6 00:57:39 2009 Tags:
Posted Sat Jun 6 00:57:39 2009