Pages about Debian.
A proposal to solve gender imbalance in Debian
We've done all we can so far: Debian Women, the Diversity Statement, the anti-harassment contact, gender neutral language, lots of education all round, but we still suffer from a strong gender imbalance in Debian.
I think that the reason is that the majority group of cisgender men in the project, although they don't actively work /against/ the rest, still have /no incentive/ to be inclusive, and generally do not understand what bearing a female name online is like.
I think it is about time we addressed that, and after a lot of thinking and discussing with many other concerned debianers, I think I have just the right proposal, which is twofold.
The first part is this: since the goal is to have an equal gender perception in Debian, we can just decide to only approve one obviously-male-named DD for every obviously-female-named one. That's right: no new obviously-male-named DD unless an obviously-female-named DD has just been approved.
It may sound like affirmative action gone wild, but please stop a moment to think about it: this would create precisely the right incentive for the currently dominant group of developers to be inclusive! People shouldn't just assume they can get a Debian account regardless of what happens around them. We already ask NM candidates to fix RC bugs and, well, gender imbalance should be treated as an RC bug, and everyone should feel compelled to join the effort to fix it!
Now, of course, since there are currently many male names in NM but not a single female name, it would not be reasonable to just stop the flow of new developers into the project: that would just starve us of personpower.
So here is the second part of the proposal: the one-female-name-one-male-name policy will not be enforced for, say, a year. But during this year, everyone currently in NM or joining NM will be asked to adopt a female identity.
Crazy? No, genius! It's about time people understood what it means to get advances in private every time they make a public contribution! What better way than just trying it out for themselves?
On top of that, as more and more female names appear in Debian changelogs, fake or not, people will finally start to understand that what matters is not the name used to sign a contribution, but the contribution itself.
I don't know if this will give us a community where people realise female or male contributions are equally valuable, which is what I hope, or a community where people will think that everyone is a cisgender man even if they have female names. In the end, really, it does not matter. Either way, we finally get to have a community where everyone is /guaranteed/ to be treated the same.
But gender imbalance isn't the only imbalance we have in Debian. People accrue a reputation over time, good or bad, and this reputation tends to stick to you for years, regardless of how you may change, for better or for worse. When we evaluate the merit of a contribution, we should not be biased by the reputation of the contributor! How can new contributors be taken seriously otherwise? I believe we are losing lots of fresh, good ideas this way. And how much damage could be wreaked on the project by a well-respected contributor, like a Debian Account Manager, who is having a funny day?
I think we can address this just as I propose to address gender imbalance: let's swap identities from time to time, like it usually happens with nametags at the end of Debconfs. Let's see gregoa upload a patched version of python3.2, and enrico upload a new upstream version of eglibc! See if we won't have some peer review at last!
Reputation and real identities have many merits, but we have come to rely too much on them, and it is hurting us. It is time we did something about it, before it is too late!
Thanks for the group hug!
Francesca started a DPL game and I've been mentioned a few times, by people I like deeply. Thank you! However I don't intend to run, and I hope I won't disappoint those who nominated me by saying so. But I don't think of it in terms of letting people down: I can't let anyone down since I never mentioned I'd like to run in the first place. Rather, I like to think that I've just received a wonderful group hug, and hey, wow, come here and let me hug you back! <3
And let me hug some more:
- Gregor Herrmann, I'm in a constant state of awe for what you have done with the Perl team. If you think that some of what you did could also be done as a DPL, please go for it!
- Bdale Garbee, I didn't think there could be such a thing as a reliable source of common sense until I got to know you. And you've been DPL and quite a good deal of other things and definitely know the drill. And you can't say you're busy with your day job. The last thing I want is for you to get bored after your retirement! :P
- Paul Tagliamonte, you seem to bring an incredible energy to everything you get involved with. Let's admit it: Zack's been a perfect DPL in many aspects, he has been incredibly, inhumanly smooth, but he was a bit boring. Zack has left Debian as a perfectly working train steaming on professionally towards awesomeness. Let's give it colours! Let's give it excitement! Let's give it creativity, and silliness! We need someone to cheer Christian up, and if you can't do it, I can't think of anyone else who can!
I cannot think of a fourth DPL candidate right now and I don't want to postpone this post indefinitely. Think about it this way: you three are so good I can't think of a fourth one right now.
There are actually lots of people I admire in Debian. I tried to name a few without thinking, but I wasn't thinking, so I lost count as soon as I ran out of fingers. I know, however, that many enjoy staying out of the spotlight and keeping their fun focused on a few specific things. I am one of those myself.
Oh dear, FOSDEM was too short, can I have DebConf soon?
In the meantime, let's have some fun with the DPL campaign.
On praising people, and on success
This morning I was pointing out to friends how excellent mako's post on Aaron Swartz is, and I thought it'd be nice if we didn't have to wait for people to die before telling the world how awesome and inspirational they are.
Then Russ posted an article about work, success and motivation and I went to tell my friends how awesome and inspirational he is.
I, too, see myself as somehow successful, and I, too, don't identify with the usual stereotype of success. I don't want to stop being a craftsman to become a manager, I don't get a high from having power over other people, and I don't define my value in terms of my profits.
At a glance, people don't see me as successful, until they get to know me better. Then they realise that I'm not at all unhappy about my life.
I have a job that I like, I write Free Software and it gets used and appreciated, my colleagues are friends, who respect me and my opinion, and I respect them and theirs.
I can work from home. In fact, I can work from everywhere as long as I have my laptop with me. I can sustain a long distance relationship because I can work from the house of my partner when I'm visiting. Two days ago I worked from the bar of a farm on top of a hill, because I was on the road, it was close by, and what the hell, it's a wonderful place to be.
To me success means that I can care about the quality of my life, that I have the luxury of caring about little things that make my day, of trying to make good ideas sustainable, of working a bit more when I'm on fire, and of working a bit less when there's something wonderful in the world to see, or someone interesting in the world to meet.
Russ, the way I read your article, you are questioning what "success" means, and you are spot on. People should be able to define "success" as whatever works for them and pursue it freely. Only then does success become something worth praising when it is achieved. Only then does it become inspirational.
I like how you managed to put into words something that has been for a long time in some corner of my mind and I hadn't yet managed or bothered to bring into the spotlight.
You have the insight and the confidence to see something in a non-mainstream way, and to say "you know what? That actually makes sense."
Sometimes I read your posts, nod a lot and realise how important something actually is, how it is such an important part of myself. And now that you took it out for me to see, I can appreciate how valuable it is, and make sure I don't accidentally lose it.
Thanks! That's another one I owe you. It's just the kind of thing I shouldn't wait to let you know.
Evolution's old odd mail folders to mbox
Something went wrong in my dad's Evolution. It would just get stuck checking mail forever, with no useful diagnostic that I could find. Fun. Not.
Anyway, I solved it by resetting everything to factory defaults, moving away all gconf entries and .evolution/ files. Then it started to work again, although of course I then needed to reconfigure it from scratch.
It turned out however that some old mail was only archived locally, and in a kind of weird format that looks like this:
$ ls -la Enrico/
total 336
drwx------ 2 enrico enrico 4096 Jul 23 03:05 .
drwxr-xr-x 7 enrico enrico 4096 Jul 23 03:12 ..
-rw------- 1 enrico enrico 3230 Dec 4 2010 113.HEADER
-rw------- 1 enrico enrico 14521 Dec 4 2010 113.TEXT
-rw------- 1 enrico enrico 3209 Oct 22 2010 134.HEADER
-rw------- 1 enrico enrico 2937 Oct 22 2010 134.TEXT
-rw------- 1 enrico enrico 3116 Jun 27 2011 15.
-rw------- 1 enrico enrico 3678 Jun 27 2011 168.
-rw------- 1 enrico enrico 73 Apr 27 2009 22.1.MIME
-rw------- 1 enrico enrico 3199 Apr 27 2009 22.2
-rw------- 1 enrico enrico 88 Apr 27 2009 22.2.MIME
[...]
I couldn't even find the name of that mail folder layout, let alone conversion tools. So I had to sit down and waste my Sunday break writing software to convert it to an mbox file. Here's the tool, may it save you the awful time I had today: http://anonscm.debian.org/gitweb/?p=users/enrico/evo2mbox.git
Note: feel free to fork it, or send patches, but don't bother with feature requests. Evolution isn't and won't be a personal interest of mine. Anything that makes an afternoon at my parents' more tiresome than a whole busy month of paid work doesn't deserve to be.
Luckily they now seem to have changed the local folder format to Maildir.
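For anyone stuck with the same layout, here is a minimal Python sketch of the conversion idea; it is not the evo2mbox tool itself. It rests on assumptions that have not been checked against Evolution's code: files are named after a message number, `N.` holds a complete message, and `N.HEADER` plus `N.TEXT` are the header and body of one message. Messages that only exist as split MIME parts (like `22.1.MIME` and `22.2`) are reported as skipped rather than guessed at. The function names are hypothetical.

```python
import mailbox
import os
import re
from collections import defaultdict

# Assumption: every file is "<message number>.<suffix>", e.g. "113.HEADER",
# "15." (empty suffix) or "22.1.MIME".
NAME_RE = re.compile(r"^(\d+)\.(.*)$")


def read_bytes(path):
    with open(path, "rb") as f:
        return f.read()


def collect_parts(folder):
    """Group the files of an old Evolution folder by message number."""
    parts = defaultdict(dict)
    for name in os.listdir(folder):
        match = NAME_RE.match(name)
        if match:
            parts[int(match.group(1))][match.group(2)] = os.path.join(folder, name)
    return parts


def rebuild_message(files):
    """Return the raw bytes of one message, or None if it can't be rebuilt."""
    if "" in files:
        # Assumption: "N." is already a complete RFC 822 message
        return read_bytes(files[""])
    if "HEADER" in files and "TEXT" in files:
        header = read_bytes(files["HEADER"])
        # Make sure a blank line separates the headers from the body
        if not header.endswith((b"\n\n", b"\r\n\r\n")):
            header = header.rstrip(b"\r\n") + b"\n\n"
        return header + read_bytes(files["TEXT"])
    return None  # split MIME parts: not handled by this sketch


def convert(folder, mbox_path):
    """Append every rebuildable message to mbox_path; return skipped numbers."""
    parts = collect_parts(folder)
    box = mailbox.mbox(mbox_path)
    box.lock()
    skipped = []
    try:
        for number in sorted(parts):
            raw = rebuild_message(parts[number])
            if raw is None:
                skipped.append(number)
            else:
                box.add(mailbox.mboxMessage(raw))
    finally:
        box.close()  # flushes, unlocks and closes the mbox file
    return skipped
```

Under those assumptions, running `convert("Enrico", "Enrico.mbox")` on a folder like the one listed above would rebuild messages such as 113, 134, 15 and 168, and list 22 among the skipped ones, so at least you know which messages still need manual attention.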
Giving away distromatch
At last year's Fosdem I tried to inject a lot of energy into distromatch, but shortly afterwards I had to urgently rewrite the nm.debian.org website.
After Lars Wirzenius's GTDFH talks in Bologna and Varese I wrote a tool which, among other things, is able to scan my home dir and list how many projects I'm working on.
The output was scary. Like, there are too many. Like, I couldn't even recite the list from memory. And since I couldn't do that, I had no idea there were so many. And I kept feeling stressed because I couldn't manage to take care of them all properly.
Now that I became conscious of the situation, it's time to deal with it like a grown up, and politely back off from some of my irresponsible responsibilities.
Distromatch is one of them. It had just started as a proof of concept prototype, and I had the vision that it could be the basis for a fantastic culture of sharing and exchange of information across distributions.
I need to distinguish the vision from the responsibility. I still have that vision for distromatch, but I cannot take responsibility for making it happen.
So I am giving it up to anyone who has the time and resources to pick up that responsibility.
Current status
It works well enough as a prototype. I believe it can successfully map a large enough slice of packages that one can prototype stuff based on it.
I have for example used it to export the Debtags categories for other distros, and the resulting file looked big enough to be used for prototyping category-based features on distributions that don't have them yet.
I think it also works well enough to support a few common use cases, like sharing screenshots, or doing most of the work of converting dependency lists from a distro to another.
And finally, anyone can deploy it, and work on it.
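As an illustration of the dependency-list use case, here is a Python sketch that assumes the package name mappings have already been loaded into a plain dict from Debian names to names in the target distribution. The dict format and the example entries are hypothetical, not distromatch's actual export format. Names without a match are returned separately, since those are the ones that still need a human.

```python
def translate_depends(depends, mapping):
    """Translate Debian package names using a name mapping.

    Returns (translated, unmapped), so that names with no known
    equivalent can be reviewed by hand instead of silently dropped.
    """
    translated, unmapped = [], []
    for name in depends:
        target = mapping.get(name)
        if target is None:
            unmapped.append(name)
        elif target not in translated:
            translated.append(target)
    return translated, unmapped


# Hypothetical mapping entries, for illustration only
debian_to_fedora = {
    "libxml2-dev": "libxml2-devel",
    "zlib1g-dev": "zlib-devel",
}

print(translate_depends(["libxml2-dev", "zlib1g-dev", "libfoo-dev"], debian_to_fedora))
# (['libxml2-devel', 'zlib-devel'], ['libfoo-dev'])
```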
Existing data sources
Everything I index in the Debian distromatch deployment is available at http://dde.debian.net/exports/distromatch/. The rpm-based data in there comes from an export script I wrote that runs on Sophie, but which I cannot maintain properly.
This is an experimental export of Fedora and OpenSUSE data: http://tmp.vuntz.net/misc/distromatch/distromatch-opensuse-fedora.tar
All existing export scripts are found in distromatch git repo on gitorious.
Contacts I gathered at Fosdem
At Fosdem I devoted quite some work to getting contacts from all possible distributions and software repositories, so that distromatch could be hooked into them. Here is a dump of what I have collected:
- Debian: me
- OpenSuse: Vincent Untz and Adrian Schröter
- Fedora: Tom "Spot" Callaway
- Arch: Tasser on IRC
- CPAN: contact the people of https://metacpan.org/, on irc.perl.org:#metacpan, or make an issue on github
- NetBSD: ask on #netbsd on Freenode
- FreeBSD: Baptiste Daroussin (bapt)
- Mageia: Olivier Thauvin
Some of those contacts may have "expired" in the meantime: I wouldn't assume all of them still remember talking with me, although most probably still do.
My commitment for the time being
I am happy to commit, at the moment, to maintaining a working export of Debian data. I can take responsibility for keeping the Debian data for it up to date, and for fixing it asap if it isn't.
I hope that now someone can take distromatch over from me, and make it grow to achieve its great potential.
More diversity in Debian skills
This blog post has been co-authored with Francesca Ciceri.
In his Debconf talk, zack said:
We need to understand how to invite people with different backgrounds than packaging to join the Debian project [...] I don't know what exactly, but we need to do more to attract those kinds of people.
Francesca and I know what we could do: make other kinds of contributions visible.
Basically, we should track and acknowledge the contributions of webmasters, translators, programmers, sysadmins, event organisers, and so on, at the same level as what we do for packagers: DDPO, minechangelogs, Portfolio...
For any non-packaging activity that we can make visible and credited, we get to:
- acknowledge the people who do it, and show that they are active contributors in the project;
- acknowledge the work that gets done, and show the actual amount of non-packaging work that gets done in Debian every day;
- allow non-packagers to have a reputation, too: first of all, they deserve it, and among other things, it would make nm processing trivial.
Here's an example: who's the lead translator for German? And if you are German, who's the lead translator for Spanish? Czech? Thai? I (Enrico) don't know the answers, not even for Italian, but we all should! Or at least it should be trivial to find out.
Starting to change this is just a matter of programming.
Francesca already worked on a list of trackable data sources, at least for translators.
Here are some more details, related to translation:
- Translations can be tracked via the i18n robot (and related statistics). This works only with teams who activated the robot and actively use the pseudo-urls in their messages on localisation mailing lists. Some translators don't bother to do it, but it's ok to only support the main workflow. It beats extracting .po files from l10n-tagged BTS bugs at any rate.
- DPN and website translations: for wml pages there's a specific field to be extracted for each translated page: grep for maintainer="name" on normal wml pages, while for DPN translations we have a specific translator="name" field. The problem is that this field is not mandatory, so sometimes there's no indication of the maintainer. Again, it's ok to only support the main workflow. Anyway, this is preferable to the cvs log: often the commit is done by the coordinator of the team and not by the actual translator. See above for the alternative solution of using the statistics provided by the i18n bot.
- DDTSS: since the new release of DDTSS-Django, done by Martijn van Oosterhout about a year ago, contributions are non-anonymous by default. This should be easy to track.
- http://wiki.debian.org: it is more complicated because in the wiki we do not have a proper l10n translation workflow, so the only thing that can be tracked are the changelogs of $LANG/* pages. A nice idea would be to have translated pages list the version of the page that was translated and who did the translation.
- Translation of Debian manuals and release notes: usually in the translation of manuals and long documentation there is a specific translator field.
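Mining the wml fields mentioned above could start from something like this Python sketch. It assumes a local checkout of the website sources, that credits appear literally as maintainer="Name" or translator="Name" attributes, and that several people in one attribute are separated by commas; `credit_translators` is a hypothetical name. Pages without the field are simply skipped, in line with only supporting the main workflow.

```python
import os
import re
from collections import Counter

# Assumed attribute syntax, e.g. maintainer="Jane Doe" or translator="A, B"
CREDIT_RE = re.compile(r'\b(?:maintainer|translator)="([^"]*)"')


def credit_translators(root):
    """Count, per name, how many .wml pages under root credit that person."""
    credits = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".wml"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as f:
                text = f.read()
            for value in CREDIT_RE.findall(text):
                # Assumption: multiple people are comma separated
                for name in value.split(","):
                    name = name.strip()
                    if name:
                        credits[name] += 1
    return credits
```

Calling `credit_translators(...).most_common(5)` on one language directory would give a first, rough answer to the "who's the lead translator?" question, with the caveat that pages lacking the field are not counted at all.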
And here are some notes about other fields:
- DPN editors: for each issue there's a list of editors at the bottom of the page. In the wml: grep for editor=.
- Artwork: artwork submitted via debianart is easy to track on the portal. In any case, you can usually find the author in the license and copyright file.
- Programming: the only thing we have is the list of services, which can be expanded if needed.
- Press and publicity: there seems to be not much besides svn logs.
- l10n-english: The Smith Review Project page has some tracking links. Other activities can probably only be tracked, at the moment, via mailing list activity.
- Events: we can use the "main coordinator" field on www.debian.org/events/$year/$date-$eventname.wml: grep for <define-tag coord>; for events not published on http://www.debian.org, but only on http://wiki.debian.org, the coordinator or the contact for the event is usually present on the page itself.
- Sysadmins: we haven't asked DSA.
And finally, if you are still wondering who those translation coordinators are, they are listed here, although not all teams keep that page up to date.
Of course, when a data source is too hard to mine, it can make sense to see if the workflow could be improved, rather than spending months writing complicated mining code.
This is a fun project for people at Debconf to get together and try.
If by the end of the conference we had a way to credit some group of non-packaging contributors, even if just one like translators or website contributors, at least we would finally have started having official trackers for the activities of non-packagers.
Resolving IP addresses in vim
A friend on IRC said: "I wish vim had a command to resolve all the IP addresses in a block of text".
But it does:
:<block>!perl -MSocket -pe 's/(\d+\.\d+\.\d+\.\d+)/gethostbyaddr(inet_aton($1), AF_INET)/ge'
If you use it often, put the perl command in a one-line script and call it from an editor macro. It works in other editors, too, and even without an editor at all. And it can be scripted!
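As a sketch of the "put it in a script" idea, here is the same filter in Python: it reads text on stdin and writes it back with IPv4 addresses replaced by their reverse DNS names. One deliberate difference from the one-liner: in Perl a failed gethostbyaddr returns undef, so the address is replaced by an empty string, while this version leaves unresolvable addresses untouched. The script name resolve_ips.py is just an example, and the `lookup` parameter exists only so the function can be tested without DNS.

```python
import re
import socket
import sys

IPV4_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")


def resolve_ips(text, lookup=socket.gethostbyaddr):
    """Replace IPv4 addresses with reverse DNS names, keeping unresolvable ones."""
    def replace(match):
        ip = match.group(1)
        try:
            return lookup(ip)[0]  # gethostbyaddr returns (hostname, aliases, addresses)
        except OSError:  # socket.herror and socket.gaierror are OSError subclasses
            return ip
    return IPV4_RE.sub(replace, text)


if __name__ == "__main__":
    sys.stdout.write(resolve_ips(sys.stdin.read()))
```

From vim, select the lines and run :'<,'>!python3 resolve_ips.py to filter them through it.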
We live with the power of Unix every day, so much that we risk forgetting how awesome it is.
SQLAlchemy, MySQL and sql_mode=traditional
As everyone should know, by default MySQL is an embarrassingly stupid toy:
mysql> create table foo (val integer not null);
Query OK, 0 rows affected (0.03 sec)
mysql> insert into foo values (1/0);
ERROR 1048 (23000): Column 'val' cannot be null
mysql> insert into foo values (1);
Query OK, 1 row affected (0.00 sec)
mysql> update foo set val=1/0 where val=1;
Query OK, 1 row affected, 1 warning (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 1
mysql> select * from foo;
+-----+
| val |
+-----+
| 0 |
+-----+
1 row in set (0.00 sec)
Luckily, you can tell it to stop being embarrassingly stupid:
mysql> set sql_mode="traditional";
Query OK, 0 rows affected (0.00 sec)
mysql> update foo set val=1/0 where val=0;
ERROR 1365 (22012): Division by 0
(There is an even better sql mode you can choose, though: it is called "Install PostgreSQL")
Unfortunately, I've been hired to work on a project that relies on the embarrassingly stupid behaviour of MySQL, so I cannot set sql_mode=traditional globally or the existing house of cards will collapse.
Here is how you set it for each session with SQLAlchemy 0.6.x; it took me quite a while to find out:
from sqlalchemy import create_engine
import sqlalchemy.interfaces

# Without this, MySQL will silently insert invalid values in the
# database, causing very long debugging sessions in the long run
class DontBeSilly(sqlalchemy.interfaces.PoolListener):
    def connect(self, dbapi_con, connection_record):
        # Called for every new DBAPI connection the pool opens
        cur = dbapi_con.cursor()
        cur.execute("SET SESSION sql_mode='TRADITIONAL'")
        cur.close()

engine = create_engine(..., listeners=[DontBeSilly()])
Why it takes all that effort is beyond me. I'd have expected this to be turned on by default, possibly with a switch that insane people could use to turn it off.
Debtags for derivative distributions
Sometimes I do cool stuff and I forget to announce it.
Ok, so I recently announced a new Debtags website.
I forgot to say in the announcement that the new website does not only know of Debian packages: see for example this page, at the very bottom it says: "Distributions: oneiric, precise, sid, testing".
This means that already, here and now, debtags.debian.net can be used to tag packages from both Debian and Ubuntu, and can easily be extended to cover the entire Debian ecosystem.
If you are a package maintainer, you will notice that your maintainer page shows your packages from everywhere. If you want to filter things a bit, for example hide obsolete packages from an old Debian Stable or Ubuntu LTS, just click on the "Settings" link on the top right to configure the page.
How it works
The magic is in this mergepackages script, which is run daily, and exports merged Packages files at dde.debian.net. What debtags.debian.net considers its Packages and Sources files are just those all-merged.gz and all-merged-sources.gz.
The merging is simple: that rebuild script processes files in order, and the first version of a package that is found is chosen as the base for the one that will go in the merged Packages file. Some fields like "Description" are just taken from this pivot package, others like Architecture or dependencies are merged into it. It's arbitrary, but works for me: the result has all the packages with all their possible architectures and dependencies, and is ready to be indexed with apt-xapian-index.
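The merge strategy just described can be sketched in a few lines of Python. This is not the mergepackages script itself: stanzas are assumed to be already parsed into dicts, and the choice of merged fields (here Architecture and Depends) and their separators is an assumption for illustration.

```python
# Fields merged into the pivot, with the separator each one uses.
# Assumption: Architecture is whitespace separated, Depends is comma separated.
MERGED_FIELDS = {"Architecture": None, "Depends": ","}


def merge_field(old, new, sep):
    """Union of two list-valued fields, keeping first-seen order."""
    items = [item.strip() for item in (old.split(sep) if old else []) if item.strip()]
    for item in new.split(sep):
        item = item.strip()
        if item and item not in items:
            items.append(item)
    return " ".join(items) if sep is None else (sep + " ").join(items)


def merge_packages(sources):
    """Merge lists of Packages stanzas; the first stanza seen becomes the pivot."""
    merged = {}
    for stanzas in sources:  # processed in order: earlier sources win
        for stanza in stanzas:
            name = stanza["Package"]
            pivot = merged.get(name)
            if pivot is None:
                # Copy, so fields like Description come only from the pivot
                merged[name] = dict(stanza)
                continue
            for field, sep in MERGED_FIELDS.items():
                if field in stanza:
                    pivot[field] = merge_field(pivot.get(field), stanza[field], sep)
    return merged
```

Treating dependencies as plain strings is crude: `libc6 (>= 2.13)` and `libc6 (>= 2.15)` would both survive the merge. That may be acceptable for search indexing, but it would not be for an installable archive.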
At the moment I pull data from Debian and Ubuntu, but you can see that the script can easily be extended to pull data from any Debian-style ftp archive, so any Debian derivative can go in. I've already started negotiations with the Derivatives Census on how to add any Debian derivative and keep the list up to date.
How to export tags for your own distribution
I'll use Ubuntu as an example since the data is already available.
Adding Debtags to the Ubuntu Packages file takes just these steps:
- Get the full reviewed tag database
- Optionally filter out those packages that you are not interested in
- Tweak this script to build an overrides file.
- Give the overrides file to your favourite ftp archive building tool.
The make-overrides script is a bit rusty: if you improve it, please send me your changes.
That is it, nothing else required, no excuses, it's ready, here, now!
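The filter-and-convert steps could look roughly like this Python sketch. It assumes the tag database uses one "package: tag, tag" line per package, and it writes a three-column "package field value" layout, which is believed to match what apt-ftparchive reads from an ExtraOverride file; check your archive tool's documentation before relying on it. This is not the make-overrides script, and `make_overrides` is a hypothetical name.

```python
def make_overrides(tagdb_lines, wanted=None):
    """Turn 'package: tag, tag' lines into 'package Tag tag, tag' override lines.

    If `wanted` is given, only packages in that set are kept.
    """
    out = []
    for line in tagdb_lines:
        line = line.strip()
        if not line or ":" not in line:
            continue
        # Package names contain no ':', so the first colon ends the name
        # (tags themselves look like 'uitoolkit::gtk')
        pkg, tags = line.split(":", 1)
        pkg = pkg.strip()
        if wanted is not None and pkg not in wanted:
            continue
        tag_list = sorted(tag.strip() for tag in tags.split(",") if tag.strip())
        if tag_list:
            out.append(f"{pkg} Tag {', '.join(tag_list)}")
    return out
```

The `wanted` set covers the optional filtering step; everything else in the tag database is passed through unchanged.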
Hitches and gotchas
This merged Packages file is a bit of a hack, and suffers from name conflicts across distributions, where two different pieces of software are packaged with the same name in two different distributions.
Ideally, name conflicts should not happen: if a derivative decided to package kate and call it gedit, they deserve to have it tagged uitoolkit::gtk.
I think it's rather important that the whole Debian ecosystem works as much as possible with a single package namespace.
However, that reasoning fails if you take time into account: packages get renamed, like git and chromium, and the same name may mean completely different things if you compare, for example, Debian Stable with Debian Sid.
This last problem is caused by debtags working only with package names, not package versions. I have a strategy in mind based on being able to override the stable tag database using headers in debian/control; it still needs some details sorted out, but I'm confident we will be able to address these issues properly soon enough.
Why stop at the Debian ecosystem?
Why indeed. I'm clearly trying to use FOSDEM, and the CrossDistribution devroom, as the venue to discuss just that.
Python list gotcha
Suppose in Python you're building a list of buckets:
>>> a = [[]] * 10
>>> print(a)
[[], [], [], [], [], [], [], [], [], []]
Looks good. However:
>>> a[5].append(1)
>>> print(a)
[[1], [1], [1], [1], [1], [1], [1], [1], [1], [1]]
Surprising? What happens here is that multiplying the list replicates the reference to the same empty list. You have the exact same mutable list replicated 10 times: instead of 10 buckets, you have 10 references to 1 bucket, so appending to one looks like appending to all.
What you need here is a way to invoke the list constructor [] multiple times:
>>> a = [[] for i in range(10)]
>>> print(a)
[[], [], [], [], [], [], [], [], [], []]
>>> a[5].append(1)
>>> print(a)
[[], [], [], [], [], [1], [], [], [], []]
A mistake like this can take quite a bit of time to track down.
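When in doubt, Python can tell you whether the slots of a list are really the same object. This small sketch checks identity with `is` and counts distinct objects via `id()`; `distinct_objects` is just a helper name made up for this example.

```python
shared = [[]] * 3
separate = [[] for _ in range(3)]

# Every slot of `shared` is the very same list object...
assert shared[0] is shared[1] is shared[2]
# ...while the comprehension built three distinct lists.
assert separate[0] is not separate[1]


def distinct_objects(seq):
    """Count how many distinct objects a sequence actually holds."""
    return len({id(item) for item in seq})


print(distinct_objects(shared))    # 1
print(distinct_objects(separate))  # 3
```

The same trap shows up with dict.fromkeys(keys, []), where every key ends up sharing one list.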