Latest posts for tag debian

Talk notes

Intro

  • I'm not speaking for the whole of DAM
  • Motivation in part is personal frustration, and need to set boundaries and negotiate expectations

Debian Account Managers

  • history

Responsibility for official membership

  • approve account creation
  • manage the New Member Process and nm.debian.org
  • close MIA accounts
  • occasional emergency termination of accounts
  • handle Emeritus
  • with lots of help from FrontDesk and MIA teams (big shoutout)

What DAM is not

  • we are not mediators
  • we are not a community management team
  • nor a list or IRC moderation team
  • we are not responsible for vision or strategic choices about how people are expected to interact in Debian
  • We shouldn't try to solve things just because they need solving

Unexpected responsibilities

  • Over time, the community has grown larger and more complex, in a larger and more complex online environment
  • Enforcing the Diversity Statement and the Code of Conduct
  • Emergency list moderation
    • we have ended up using DAM warnings to compensate for the lack of list moderation, at least twice
  • contributors.debian.org (mostly only because of me, but it would be good to have its own team)

DAM warnings

  • except for rare glaring cases, patterns of behaviour / intentions / taking feedback in are more relevant than individual incidents
  • we do not set out to fix people. It is enough for us to get people to acknowledge a problem
    • if they can't acknowledge a problem they're probably out
    • once a problem is acknowledged, how to fix it can be left to them as an implementation detail
    • then again it's not that easy to get a number of troublesome people to acknowledge problems, so we go back to the problem of deciding when enough is enough

DAM warnings?

  • I got to a point where I look at DAM warnings as potential signals that DAM has ended up with the ball that everyone else in Debian dropped.
  • a DAM warning means we haven't reached a last-resort situation yet, which means it probably shouldn't be DAM dealing with the issue at this point
  • Everyone in the project can write to a person "do you realise there's an issue here? Can you do something to stop?", give them a chance to reflect on issues or ignore them, and let them build their reputation accordingly.
  • People in Debian should not have to endure, completely powerless, as trolls drag painful list discussions indefinitely until all the trolled people run out of energy and leave. At the same time, people who abuse a list should expect to be suspended or banned from the list, not have their Debian membership put into question (unless it is a recurring pattern of behaviour).
  • The push to grow DAM warnings as a tool is a sign of the rest of Debian passing on their responsibilities, and of DAM picking them up.
  • Then in DAM we end up passing on things, too, because we also don't have the energy to face another intensive megametathread, and as we take actions for things that shouldn't quite be our responsibility, we face a higher level of controversy, and therefore demotivation.
  • Also, as we take actions for things that shouldn't be our responsibility, and work on a higher level of controversy, our legitimacy is undermined (and understandably so)
    • there's a pothole on my street that never gets filled, so at some point I go out and fill it. Then people thank me, people complain I shouldn't have, people complain I didn't fill it right, people appreciate the gesture and invite me to learn how to fix potholes better, people point me out to more potholes, and then complain that potholes don't get fixed properly on the whole street. I end up being the problem, instead of whoever had responsibility of the potholes but wasn't fixing them
  • The Community Team, the Diversity Team, and individual developers have no energy or entitlement to explain what a healthy community looks like, and DAM is left with that responsibility in the form of accountability for its actions: to issue, say, a DAM warning for bullying, we are expected to explain what bullying is, and how that kind of behaviour constitutes bullying, in a way that is understandable by the whole project.
  • Since there isn't consensus in the project about what bullying looks like, we end up having to define it in a warning, which again is a responsibility we shouldn't have; we need to do it because we have an escalated situation at hand, but we can't do it right

House rules

Interpreting house rules

  • you can't encode common sense about people's behaviour in written rules: no matter how hard you try, people will find ways to game them
  • so one can use rules as a guideline, and have someone responsible for the bits that can't go into rules.
    • context matters, privilege/oppression matters, patterns matter, history matters
  • example:
    • call a person out for breaking a rule
    • get DARVO in response
    • state that DARVO is not acceptable
    • get concern trolling against marginalised people, who then get accused of DARVO if they complain
  • example: assume good intentions vs enabling
  • example: rule lawyering and Figure skating
  • this cannot be solved by GRs: I/we (DAM)/possibly also we (Debian) don't want to do GRs about evaluating people

Governance by bullying

  • How to DoS discussions in Debian
    • example: gender, minority groups, affirmative action, inclusion, anything about the community team itself, anything about the CoC, systemd, usrmerge, DAM warnings, expulsions
      • think of a topic. Think about sending a mail to debian-project about it. If you instinctively shiver at the thought, this is probably happening
      • would you send a mail about that to -project / -devel?
      • can you think of other topics?
    • it is an effective way of governance as it excludes topics from public discussion
  • A small number of people abuse all this, intentionally or not, to effectively manipulate decision making in the project.
  • Instead of using the rules of the community to bring forth the issues one cares about, it costs less energy to make it unthinkable or unbearable to have a discussion on issues one doesn't want to progress. What one can't stop constructively, one can oppose destructively.
  • even regularly diverting the discussion away from the original point or concern is enough to derail it without people realising you're doing it
  • This is an effective strategy for a few reckless people to unilaterally direct change, in the current state of Debian, at the cost of the health and the future of the community as a whole.
  • There are now a number of important issues nobody has the energy to discuss, because experience suggests that the energy required to bring them to the foreground and deal with the consequences would be disproportionate.
  • This is grave, as we're talking about trolling and bullying as malicious power moves to work around the accepted decision making structures of our community.
  • Solving this is out of scope for this talk, but it is urgent nevertheless, and can't be solved by expecting DAM to fix it

How about the Community Team?

  • It is also a small group of people who cannot pick up the responsibility of doing what the community isn't doing for itself
  • I believe we need to recover the Community Team: for years, every time they write something in public they get bullied by the same recurring small group of people (see governance by bullying above)

How about DAM?

  • I was just saying that we are not the emergency catch all
  • When the only enforcement you have is "nuclear escalation", there's nothing you can do until it's too late, and meanwhile lots of people suffer (this was written before Russia invaded Ukraine)
  • Also, when issues happen on public lists, the BTS, or on IRC, some of the perpetrators are also outside of the jurisdiction of DAM, which shows how DAM is not the tool for this

How about the DPL?

  • Talking about emergency catch alls, don't they have enough to do already?

Concentrating responsibility

  • Concentrating all responsibility on social issues on a single point creates a scapegoat: we're blamed for any conduct issue, and we're blamed for any action we take on conduct issues
    • also, when you are a small group you are personally identified with it. Taking action on a person may mean making a new enemy, and becoming a target for harassment, retaliation, or even just the general unwarranted hostility of someone who is left with an axe to grind
  • As long as responsibility is centralised, any action one takes in response to one micro-aggression (or one micro-aggression too many) is an overreaction. Distributing that responsibility allows a finer granularity of actions to be taken
    • you don't call the police to tell someone they're being annoying at the pub: the people at the pub will tell you you're being annoying, and the police get called if you beat them up in response
  • We are also a community with no tool to give feedback on posts, so it still looks good to nitpick stupid details with smart-looking, cutting one-liners, or elaborate confrontational put-downs, and one doesn't get the feedback of "that did not help". Compare with discussing on https://salsa.debian.org/debian/grow-your-ideas/, which does have this kind of feedback
    • the lack of moderation and enforcement makes the Debian community ideal for easy baiting, concern trolling, dog whistling, and related fun, and people who are not empowered can be manipulated into trolling those who are responsible
    • if you're fragile in Debian, people will play cat and mouse with you. It might be social awkwardness, or people taking themselves too seriously, but it can easily become bullying, and with no feedback it's hard to tell and course correct
  • Since DAM and DPL are where the ball stops, everyone else in Debian can afford to let the ball drop.
  • More generally, if only one group is responsible, nobody else is

Empowering developers

  • The police alone do not make a community safe: a community makes a community safe.
  • DDs currently have no power to act besides complaining to DAM, or complaining to Community Team that then can only pass complaints on to DAM.
    • you could act directly, but currently nobody has your back if the (micro-)aggression then starts extending to you, too
  • From no power comes no responsibility. And yet, the safety of a community is sustainable only if it is the responsibility of every member of the community.
  • don't wait for DAM as the only group who can do something
  • people should be able to address issues in smaller groups, without escalation at project level
  • but people don't have the tools for that
  • I/we've shouldered this responsibility for far too long because nobody else was doing it, and it's time the whole Debian community gets its act together and picks up this responsibility as it should. You don't get to not care just because there's a small number of people who are caring for you.

What needs to happen

  • distinguish DAM decisions from decisions that are more about vision and direction, and would require more representation
  • DAM warnings shouldn't belong in DAM
  • who is responsible for interpretation of the CoC?
  • deciding what to do about controversial people shouldn't belong in DAM
  • curation of the community shouldn't belong in DAM
  • can't do this via GRs: it's a mess to do a GR to decide how acceptable a specific person's behaviour is, and a lot of this requires smaller and more frequent micro-decisions than one would make via GRs

Back in 2017 I did some work to set up a cross-building toolchain for Qt Creator that takes advantage of Debian's packaging for the whole dependency ecosystem.

It ended with cbqt, a little script that sets up a chroot to hold cross-build dependencies, to avoid conflicting with packages in the host system, and sets up a qmake alternative to make use of them.

Today I'm dusting off that work, to ensure it works on Debian bullseye.

Resetting Qt Creator

To make things reproducible, I wanted to reset Qt Creator's configuration.

Besides purging and reinstalling the package, one needs to manually remove:

  • ~/.config/QtProject
  • ~/.cache/QtProject/
  • /usr/share/qtcreator/QtProject which is where configuration is stored if you used sdktool to programmatically configure Qt Creator (see for example this post, and Debian bug #1012561).

Updating cbqt

Easy start, change the distribution for the chroot:

-DIST_CODENAME = "stretch"
+DIST_CODENAME = "bullseye"

Adding LIBDIR

Something else does not work:

Test$ qmake-armhf -makefile
Info: creating stash file …/Test/.qmake.stash
Test$ make
[...]
/usr/bin/arm-linux-gnueabihf-g++ -Wl,-O1 -Wl,-rpath-link,…/armhf/lib/arm-linux-gnueabihf -Wl,-rpath-link,…/armhf/usr/lib/arm-linux-gnueabihf -Wl,-rpath-link,…/armhf/usr/lib/ -o Test main.o mainwindow.o moc_mainwindow.o   …/armhf/usr/lib/arm-linux-gnueabihf/libQt5Widgets.so …/armhf/usr/lib/arm-linux-gnueabihf/libQt5Gui.so …/armhf/usr/lib/arm-linux-gnueabihf/libQt5Core.so -lGLESv2 -lpthread
/usr/lib/gcc-cross/arm-linux-gnueabihf/10/../../../../arm-linux-gnueabihf/bin/ld: cannot find -lGLESv2
collect2: error: ld returned 1 exit status
make: *** [Makefile:146: Test] Error 1

I figured that now I also need to set QMAKE_LIBDIR and not just QMAKE_RPATHLINKDIR:

--- a/cbqt
+++ b/cbqt
@@ -241,18 +241,21 @@ include(../common/linux.conf)
 include(../common/gcc-base-unix.conf)
 include(../common/g++-unix.conf)

+QMAKE_LIBDIR += {chroot.abspath}/lib/arm-linux-gnueabihf
+QMAKE_LIBDIR += {chroot.abspath}/usr/lib/arm-linux-gnueabihf
+QMAKE_LIBDIR += {chroot.abspath}/usr/lib/
 QMAKE_RPATHLINKDIR += {chroot.abspath}/lib/arm-linux-gnueabihf
 QMAKE_RPATHLINKDIR += {chroot.abspath}/usr/lib/arm-linux-gnueabihf
 QMAKE_RPATHLINKDIR += {chroot.abspath}/usr/lib/

Now it links again:

Test$ qmake-armhf -makefile
Test$ make
/usr/bin/arm-linux-gnueabihf-g++ -Wl,-O1 -Wl,-rpath-link,…/armhf/lib/arm-linux-gnueabihf -Wl,-rpath-link,…/armhf/usr/lib/arm-linux-gnueabihf -Wl,-rpath-link,…/armhf/usr/lib/ -o Test main.o mainwindow.o moc_mainwindow.o   -L…/armhf/lib/arm-linux-gnueabihf -L…/armhf/usr/lib/arm-linux-gnueabihf -L…/armhf/usr/lib/ …/armhf/usr/lib/arm-linux-gnueabihf/libQt5Widgets.so …/armhf/usr/lib/arm-linux-gnueabihf/libQt5Gui.so …/armhf/usr/lib/arm-linux-gnueabihf/libQt5Core.so -lGLESv2 -lpthread

Making it work in Qt Creator

Time to try it in Qt Creator, and sadly it fails:

/armhf/usr/lib/arm-linux-gnueabihf/qt5/mkspecs/features/toolchain.prf:76: Variable QMAKE_CXX.COMPILER_MACROS is not defined.

QMAKE_CXX.COMPILER_MACROS is not defined

I traced it to this bit in armhf/usr/lib/arm-linux-gnueabihf/qt5/mkspecs/features/toolchain.prf (nonrelevant bits deleted):

isEmpty($${target_prefix}.COMPILER_MACROS) {
    msvc {
        # …
    } else: gcc|ghs {
        vars = $$qtVariablesFromGCC($$QMAKE_CXX)
    }
    for (v, vars) {
        # …
        $${target_prefix}.COMPILER_MACROS += $$v
    }
    cache($${target_prefix}.COMPILER_MACROS, set stash)
} else {
    # …
}

It turns out that qmake is not able to realise that the compiler is gcc, so vars does not get set, nothing is set in COMPILER_MACROS, and qmake fails.

Reproducing it on the command line

When run manually, however, qmake-armhf worked, so it would be good to know how Qt Creator is actually running qmake. Since it frustratingly does not show what commands it runs, I'll have to strace it:

strace -e trace=execve --string-limit=123456 -o qtcreator.trace -f qtcreator

And there it is:

$ grep qmake- qtcreator.trace
1015841 execve("/usr/local/bin/qmake-armhf", ["/usr/local/bin/qmake-armhf", "-query"], 0x56096e923040 /* 54 vars */) = 0
1015865 execve("/usr/local/bin/qmake-armhf", ["/usr/local/bin/qmake-armhf", "…/Test/Test.pro", "-spec", "arm-linux-gnueabihf", "CONFIG+=debug", "CONFIG+=qml_debug"], 0x7f5cb4023e20 /* 55 vars */) = 0

I run the command manually and indeed I reproduce the problem:

$ /usr/local/bin/qmake-armhf Test.pro -spec arm-linux-gnueabihf CONFIG+=debug CONFIG+=qml_debug
…/armhf/usr/lib/arm-linux-gnueabihf/qt5/mkspecs/features/toolchain.prf:76: Variable QMAKE_CXX.COMPILER_MACROS is not defined.

I try removing options until I find the one that breaks it and... now it's always broken! Even manually running qmake-armhf, like I did earlier, stopped working:

$ rm .qmake.stash
$ qmake-armhf -makefile
…/armhf/usr/lib/arm-linux-gnueabihf/qt5/mkspecs/features/toolchain.prf:76: Variable QMAKE_CXX.COMPILER_MACROS is not defined.

Debugging toolchain.prf

I tried purging and reinstalling qtcreator, and recreating the chroot, but qmake-armhf is staying broken. I'll let that be, and try to debug toolchain.prf.

By grepping gcc in the mkspecs directory, I managed to figure out that:

  • The } else: gcc|ghs { test is matching the value(s) of QMAKE_COMPILER
  • QMAKE_COMPILER can have multiple values, separated by space
  • If in armhf/usr/lib/arm-linux-gnueabihf/qt5/mkspecs/arm-linux-gnueabihf/qmake.conf I set QMAKE_COMPILER = gcc arm-linux-gnueabihf-gcc, then things work again.

Sadly, I failed to find reference documentation for QMAKE_COMPILER's syntax and behaviour. I also failed to find why qmake-armhf worked earlier, and I am also failing to restore the system to a situation where it works again. Maybe I dreamt that it worked, or maybe I had some manual change lying around from previous fiddling with things?

Anyway at least now I have the fix:

--- a/cbqt
+++ b/cbqt
@@ -248,7 +248,7 @@ QMAKE_RPATHLINKDIR += {chroot.abspath}/lib/arm-linux-gnueabihf
 QMAKE_RPATHLINKDIR += {chroot.abspath}/usr/lib/arm-linux-gnueabihf
 QMAKE_RPATHLINKDIR += {chroot.abspath}/usr/lib/

-QMAKE_COMPILER          = {chroot.arch_triplet}-gcc
+QMAKE_COMPILER          = gcc {chroot.arch_triplet}-gcc

 QMAKE_CC                = /usr/bin/{chroot.arch_triplet}-gcc

Fixing a compiler mismatch warning

In setting up the kit, Qt Creator also complained that the compiler from qmake did not match the one configured in the kit. That was easy to fix, by pointing at the host system cross-compiler in qmake.conf:

 QMAKE_COMPILER          = {chroot.arch_triplet}-gcc

-QMAKE_CC                = {chroot.arch_triplet}-gcc
+QMAKE_CC                = /usr/bin/{chroot.arch_triplet}-gcc

 QMAKE_LINK_C            = $$QMAKE_CC
 QMAKE_LINK_C_SHLIB      = $$QMAKE_CC

-QMAKE_CXX               = {chroot.arch_triplet}-g++
+QMAKE_CXX               = /usr/bin/{chroot.arch_triplet}-g++

 QMAKE_LINK              = $$QMAKE_CXX
 QMAKE_LINK_SHLIB        = $$QMAKE_CXX

Updated setup instructions

Create an armhf environment:

sudo ./cbqt ./armhf --create --verbose

Create a qmake wrapper that builds with this environment:

sudo ./cbqt ./armhf --qmake -o /usr/local/bin/qmake-armhf

Install the build-dependencies that you need:

# Note: :arch is added automatically to package names if no arch is explicitly specified
sudo ./cbqt ./armhf --install libqt5svg5-dev libmosquittopp-dev qtwebengine5-dev

Build with qmake

Use qmake-armhf instead of qmake and it works perfectly:

qmake-armhf -makefile
make

Set up Qt Creator

Configure a new Kit in Qt Creator:

  1. Tools/Options, then Kits, then Add
  2. Name: armhf (or anything you like)
  3. In the Qt Versions tab, click Add then set the path of the new Qt to /usr/local/bin/qmake-armhf. Click Apply.
  4. Back in the Kits, select the Qt version you just created in the Qt version field
  5. In Compilers, select the ARM versions of GCC. If they do not appear, install crossbuild-essential-armhf, then in the Compilers tab click Re-detect and then Apply to make them available for selection
  6. Dismiss the dialog with "OK": the new kit is ready

Now you can choose the default kit to build and run locally, and the armhf kit for remote cross-development.

I tried looking at sdktool to automate this step, and it requires a nontrivial amount of work to do it reliably, so these manual instructions will have to do.

Credits

This has been done as part of my work with Truelite.

Anarcat's "procmail considered harmful" post convinced me to get my act together and finally migrate my venerable procmail based setup to sieve.

My setup was nontrivial, so I migrated with an intermediate step in which sieve scripts would by default pipe everything to procmail, which allowed me to slowly move rules from procmailrc to sieve until nothing remained in procmailrc.

Here's what I did.

Literature review

https://brokkr.net/2019/10/31/lets-do-dovecot-slowly-and-properly-part-3-lmtp/ has a guide quite aligned with current Debian, and could be a starting point to get an idea of the work to do.

https://wiki.dovecot.org/HowTo/PostfixDovecotLMTP is way more terse, but more aligned with my intentions. Reading the former helped me in understanding the latter.

https://datatracker.ietf.org/doc/html/rfc5228 has the full Sieve syntax.

https://doc.dovecot.org/configuration_manual/sieve/pigeonhole_sieve_interpreter/ has the list of Sieve features supported by Dovecot.

https://doc.dovecot.org/settings/pigeonhole/ has the reference on Dovecot's sieve implementation.

https://raw.githubusercontent.com/dovecot/pigeonhole/master/doc/rfc/spec-bosch-sieve-extprograms.txt is the hard-to-find full reference for the functions introduced by the extprograms plugin.

Debugging tools:

  • doveconf to dump dovecot's configuration to see if what it understands matches what I mean
  • sieve-test parses sieve scripts: sieve-test file.sieve /dev/null is a quick and dirty syntax check

Backup of all mails processed

One thing I did with procmail was to generate a monthly mailbox with all incoming email, with something like this:

BACKUP="/srv/backupts/test-`date +%Y-%m`.mbox"

:0c
$BACKUP

I did not find an obvious way in sieve to create monthly mailboxes, so I redesigned that system using Postfix's always_bcc feature, piping everything to an archive user.

I'll then recreate the monthly archiving using a chewmail script that I can simply run via cron.
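
For reference, the Postfix side of that redesign boils down to a single main.cf parameter; this is a minimal sketch, and the local archive user name is a placeholder of my choosing:

# /etc/postfix/main.cf
# send a copy of every message this instance handles to the archive user
always_bcc = archive@localhost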

Configure dovecot

apt install dovecot-sieve dovecot-lmtpd

I added this to the local dovecot configuration:

service lmtp {
  unix_listener /var/spool/postfix/private/dovecot-lmtp {
    user = postfix
    group = postfix
    mode = 0666
  }
}

protocol lmtp {
  mail_plugins = $mail_plugins sieve
}

plugin {
  sieve = file:~/.sieve;active=~/.dovecot.sieve
}

This makes Dovecot ready to receive mail from Postfix via an LMTP unix socket created in Postfix's private chroot.

It also activates the sieve plugin, and uses ~/.sieve as a sieve script.

The script can be a file or a directory; if it is a directory, ~/.dovecot.sieve will be a symlink pointing to the .sieve file to run.

This is a feature I'm not yet using, but if one day I want to try enabling UIs to edit sieve scripts, that part is ready.

Delegate to procmail

To make sieve scripts that delegate to procmail, I enabled the sieve_extprograms plugin:

 plugin {
   sieve = file:~/.sieve;active=~/.dovecot.sieve
+  sieve_plugins = sieve_extprograms
+  sieve_extensions = +vnd.dovecot.pipe
+  sieve_pipe_bin_dir = /usr/local/lib/dovecot/sieve-pipe
+  sieve_trace_dir = ~/.sieve-trace
+  sieve_trace_level = matching
+  sieve_trace_debug = yes
 }

and then created a script for it:

mkdir -p /usr/local/lib/dovecot/sieve-pipe/
(echo "#!/bin/sh"; echo "exec /usr/bin/procmail") > /usr/local/lib/dovecot/sieve-pipe/procmail
chmod 0755 /usr/local/lib/dovecot/sieve-pipe/procmail

And I can have a sieve script that delegates processing to procmail:

require "vnd.dovecot.pipe";

pipe "procmail";

Activate the postfix side

These changes switched local delivery over to Dovecot:

--- a/roles/mailserver/templates/dovecot.conf
+++ b/roles/mailserver/templates/dovecot.conf
@@ -25,6 +25,8 @@
+auth_username_format = %Ln

diff --git a/roles/mailserver/templates/main.cf b/roles/mailserver/templates/main.cf
index d2c515a..d35537c 100644
--- a/roles/mailserver/templates/main.cf
+++ b/roles/mailserver/templates/main.cf
@@ -64,8 +64,7 @@ virtual_alias_domains =
-mailbox_command = procmail -a "$EXTENSION"
-mailbox_size_limit = 0
+mailbox_transport = lmtp:unix:private/dovecot-lmtp

Without auth_username_format = %Ln dovecot won't be able to understand usernames sent by postfix in my specific setup.

Moving rules over to sieve

This is mostly straightforward, with the luxury of being able to do it a bit at a time.

The last tricky bit was how to call spamc from sieve, as in some situations I reduce system load by running the spamfilter only on a prefiltered selection of incoming emails.

For this I enabled the filter directive in sieve:

 plugin {
   sieve = file:~/.sieve;active=~/.dovecot.sieve
   sieve_plugins = sieve_extprograms
-  sieve_extensions = +vnd.dovecot.pipe
+  sieve_extensions = +vnd.dovecot.pipe +vnd.dovecot.filter
   sieve_pipe_bin_dir = /usr/local/lib/dovecot/sieve-pipe
+  sieve_filter_bin_dir = /usr/local/lib/dovecot/sieve-filter
   sieve_trace_dir = ~/.sieve-trace
   sieve_trace_level = matching
   sieve_trace_debug = yes
 }

Then I created a filter script:

mkdir -p /usr/local/lib/dovecot/sieve-filter/
(echo "#!/bin/sh"; echo "exec /usr/bin/spamc") > /usr/local/lib/dovecot/sieve-filter/spamc
chmod 0755 /usr/local/lib/dovecot/sieve-filter/spamc

And now what was previously:

:0 fw
| /usr/bin/spamc

:0
* ^X-Spam-Status: Yes
.spam/

Can become:

require "vnd.dovecot.filter";
require "fileinto";

filter "spamc";

if header :contains "x-spam-level" "**************" {
    discard;
} elsif header :matches "X-Spam-Status" "Yes,*" {
    fileinto "spam";
}

Updates

Ansgar mentioned that it's possible to replicate the monthly mailbox using the variables and date extensions, with a hacky trick from the extensions' RFC:

# fileinto :create also needs the "fileinto" and "mailbox" extensions
require ["date", "variables", "fileinto", "mailbox"];

if currentdate :matches "month" "*" { set "month" "${1}"; }
if currentdate :matches "year" "*" { set "year" "${1}"; }

fileinto :create "${month}-${year}";

This morning we realised that a test case failed on Fedora 34 only (the link is in Italian) and we set to debugging.

The initial analysis

This is the initial reproducer:

$ PROJ_DEBUG=3 python setup.py test
test_recipe (tests.test_litota3.TestLITOTA3NordArkimetIFS) ... pj_open_lib(proj.db): call fopen(/lib64/../share/proj/proj.db) - succeeded
proj_create: Open of /lib64/../share/proj/proj.db failed
pj_open_lib(proj.db): call fopen(/lib64/../share/proj/proj.db) - succeeded
proj_create: no database context specified
Cannot instantiate source_crs
EXCEPTION in py_coast(): ProjP: cannot create crs to crs from [EPSG:4326] to [+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +over +units=m +no_defs]
ERROR

Note that opening /lib64/../share/proj/proj.db sometimes succeeds, sometimes fails. It's some kind of Schrödinger path, which works or not depending on how you observe it:

# ls -lad /lib64
lrwxrwxrwx 1 1000 1000 9 Jan 26  2021 /lib64 -> usr/lib64

$ ls -la /lib64/../share/proj/proj.db
-rw-r--r-- 1 root root 8925184 Jan 28  2021 /lib64/../share/proj/proj.db

$ cd /lib64/../share/proj/

$ cd /lib64
$ cd ..
$ cd share
-bash: cd: share: No such file or directory

And indeed, stat(2) finds it, and sqlite doesn't (the file is a sqlite database):

$ stat /lib64/../share/proj/proj.db
  File: /lib64/../share/proj/proj.db
  Size: 8925184     Blocks: 17432      IO Block: 4096   regular file
Device: 33h/51d Inode: 56907       Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-11-08 14:09:12.334350779 +0100
Modify: 2021-01-28 05:38:11.000000000 +0100
Change: 2021-11-08 13:42:51.758874327 +0100
 Birth: 2021-11-08 13:42:51.710874051 +0100

$ sqlite3 /lib64/../share/proj/proj.db
Error: unable to open database "/lib64/../share/proj/proj.db": unable to open database file

A minimal reproducer

Later on we started stripping layers of code towards a minimal reproducer: here it is. It works or doesn't work depending on whether proj is linked explicitly, or via MagPlus:

$ cat tc.cc
#include <magics/ProjP.h>

int main() {
    magics::ProjP p("EPSG:4326", "+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +over +units=m +no_defs");
    return 0;
}

$ g++ -o tc  tc.cc -I/usr/include/magics  -lMagPlus
$ ./tc
proj_create: Open of /lib64/../share/proj/proj.db failed
proj_create: no database context specified
terminate called after throwing an instance of 'magics::MagicsException'
  what():  ProjP: cannot create crs to crs from [EPSG:4326] to [+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +over +units=m +no_defs]
Aborted (core dumped)

$ g++ -o tc  tc.cc -I/usr/include/magics -lproj  -lMagPlus
$ ./tc

What is going on here?

A difference between the two is the path used to link to libproj.so:

$ ldd ./tc | grep proj
    libproj.so.19 => /lib64/libproj.so.19 (0x00007fd4919fb000)
$ g++ -o tc  tc.cc -I/usr/include/magics   -lMagPlus
$ ldd ./tc | grep proj
    libproj.so.19 => /lib64/../lib64/libproj.so.19 (0x00007f6d1051b000)

Common sense screams that this should not matter, but we chased an intuition and found that one of the ways proj looks for its database is relative to its shared library.

Indeed, gdb in hand, we saw that proj's dladdr call returns /lib64/../lib64/libproj.so.19.

From /lib64/../lib64/libproj.so.19, proj strips two path components from the end, presumably to go from something like /something/usr/lib/libproj.so to /something/usr.

So, dladdr returns /lib64/../lib64/libproj.so.19, which becomes /lib64/../, which becomes /lib64/../share/proj/proj.db, which exists on the file system and is used as a path to the database.

But depending how you look at it, that path might or might not be valid: it passes the stat(2) check that stops the lookup for candidate paths, but sqlite is unable to open it.
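
To make the lexical computation explicit, here is a small Python sketch of the path manipulation described above (an illustration only, not proj's actual code):

import os

libpath = "/lib64/../lib64/libproj.so.19"             # what dladdr reported
prefix = os.path.dirname(os.path.dirname(libpath))    # strip two components: /lib64/..
candidate = os.path.join(prefix, "share/proj/proj.db")
print(candidate)                   # /lib64/../share/proj/proj.db
print(os.path.exists(candidate))   # True on the affected system: stat() resolves it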

Why does the other path work?

By linking libproj.so in the other way, dladdr returns /lib64/libproj.so.19, which becomes /share/proj/proj.db, which doesn't exist, which triggers a fallback to a PROJ_LIB constant defined at compile time, which is a path that works no matter how you look at it.

Why that weird path with libMagPlus?

To complete the picture, we found that libMagPlus.so is packaged with an rpath set, which is known to cause trouble:

# readelf -d /usr/lib64/libMagPlus.so|grep rpath
 0x000000000000000f (RPATH)              Library rpath: [$ORIGIN/../lib64]

The workaround

We found that one can set PROJ_LIB in the environment to override the normal proj database lookup. Building on that, we came up with a simple way to override it on Fedora 34 only:

    if distro is not None and distro.linux_distribution()[:2] == ("Fedora", "34") and "PROJ_LIB" not in os.environ:
        self.env_overrides["PROJ_LIB"] = "/usr/share/proj/"

This has been a most edifying and educational debugging session, with only the necessary modicum of curses and swearwords. Working in a team of excellent people really helps.

help2man is quite nice for autogenerating manpages from command line help, making sure that they stay up to date as command line options evolve.

It works quite well, except for commands with subcommands, like Python programs that use argparse's add_subparsers.
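
For context, here is a minimal, hypothetical argparse program with subcommands; its --help output lists them as {scan,report}, which is the part the script below extracts:

import argparse

parser = argparse.ArgumentParser(prog="mycmd")
sub = parser.add_subparsers(dest="command")
sub.add_parser("scan", help="scan the input")
sub.add_parser("report", help="produce a report")
parser.parse_args()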

So, here's a quick hack that calls help2man for each subcommand, and stitches everything together in a simple manpage.

#!/usr/bin/python3

import re
import shutil
import sys
import subprocess
import tempfile

# TODO: move to argparse
command = sys.argv[1]

# Use setup.py to get the program version
res = subprocess.run([sys.executable, "setup.py", "--version"], stdout=subprocess.PIPE, text=True, check=True)
version = res.stdout.strip()

# Call the main commandline help to get a list of subcommands
res = subprocess.run([sys.executable, command, "--help"], stdout=subprocess.PIPE, text=True, check=True)
subcommands = re.sub(r'^.+\{(.+)\}.+$', r'\1', res.stdout, flags=re.DOTALL).split(',')

# Generate a help2man --include file with an extra section for each subcommand
with tempfile.NamedTemporaryFile("wt") as tf:
    print("[>DESCRIPTION]", file=tf)

    for subcommand in subcommands:
        res = subprocess.run(
                ["help2man", f"--name={command}", "--section=1",
                 "--no-info", "--version-string=dummy", f"./{command} {subcommand}"],
                stdout=subprocess.PIPE, text=True, check=True)
        subcommand_doc = re.sub(r'^.+.SH DESCRIPTION', '', res.stdout, flags=re.DOTALL)
        print(".SH ", subcommand.upper(), " SUBCOMMAND", file=tf)
        tf.write(subcommand_doc)

    with open(f"{command}.1.in", "rt") as fd:
        shutil.copyfileobj(fd, tf)

    tf.flush()

    # Call help2man on the main command line help, with the extra include file
    # we just generated
    subprocess.run(
            ["help2man", f"--include={tf.name}", f"--name={command}",
             "--section=1", "--no-info", f"--version-string={version}",
             "--output=arkimaps.1", "./arkimaps"],
            check=True)

I had this nightmare where I had a very, very important confcall.

I joined with Chrome. Chrome said Failed to access your microphone - Cannot use microphone for an unknown reason. Could not start audio source.

I joined with Firefox. Firefox chose Monitor of Built-in Audio Analog Stereo as a microphone, and did not let me change it. Not in the browser, not in pavucontrol.

I joined with the browser on my phone, and the webpage said This meeting needs to use your microphone and camera. Select *Allow* when your browser asks for permissions. But the question never came.

I could hear people talking. I had very important things to say. I tried typing them in the chat window, but they weren't seeing it. The meeting ended. I was on the verge of tears.

Tell me, Mr. Anderson, what good is a phone call when you are unable to speak?

Since this nightmare happened for real, including the bit about tears in the end, let's see that it doesn't happen again. I should now have three working systems, which hopefully won't all break again all at the same time.

Fixing Chrome

I can reproduce this reliably, on Bullseye's standard Chromium 90.0.4430.212-1, just launched on an empty profile, no extensions.

The webpage has camera and microphone allowed. Chrome doesn't show up in the recording tab of pulseaudio. Nothing on Chrome's stdout/stderr.

JavaScript console has:

Logger.js:154 2021-09-10Txx:xx:xx.xxxZ [features/base/tracks] Failed to create local tracks
Array(2)
DOMException: Could not start audio source

I found the answer here:

I had the similar problem once with chromium. i could solve it by switching in preferences->microphone-> from "default" to "intern analog stereo".

Opening the little popup next to the microphone/mute button allows choosing other microphones, which work. Only "Same as system (Default)" does not work.

Fixing Firefox

I have firefox-esr 78.13.0esr-1~deb11u1. In Jitsi, microphone selection is disabled on the toolbar and in the settings menu. In pavucontrol, changing the recording device for Firefox has no effect. If for some reason the wrong microphone got chosen, those are not ways of fixing it.

What I found works is to click on the camera permission icon, remove microphone permission, then reload the page. At that point Firefox will ask for permission again, and that microphone selection seems to work.

Relevant bugs: on Jitsi and on Firefox. Since this is well known (once you find the relevant issues), I'd have appreciated Jitsi at least showing a link to an explanation of workarounds on Firefox, instead of just disabling microphone selection.

Fixing Jitsi on the phone side

I really don't want to preemptively give camera and microphone permissions to my phone browser. I noticed that there's the Jitsi app on F-Droid and much as I hate to use an app when a website would work, at least in this case it's a way to keep the permission sets separate, so I installed that.

Fixing pavucontrol?

I tried to find out why I can't change the input device for Firefox in pavucontrol. I only managed to find an Ask Ubuntu question with no answer and a Unix StackExchange question with no answer.

I'm creating a program that uses the web browser for its user interface, and I'm reasonably sure I'm not the first person doing this.

Normally such a program would listen to a port on localhost, and tell the browser to connect to it. Bonus points for listening to a randomly allocated free port, so that one does not need to rely on luck to get the program started.

However, using a local port still means that any user on the local machine can connect to it, which is generally a security issue.

A possible solution would be to use AF_UNIX Unix Domain Sockets, which are supported by various web servers, but as far as I understand not currently by browsers. I checked Firefox and Chrome, and they currently seem to fail to even acknowledge the use case.

I'm reasonably sure I'm not the first person doing this, and yes, it's intended as an understatement.

So, dear Lazyweb, is there a way to securely use a browser as a UI for a user's program, without exposing access to the backend to other users in the system?

Access token in the URL

Emanuele Di Giacomo suggests adding an access token to the URL that gets passed to the browser.

This would work to protect access on localhost: even if the application cannot use HTTPS, other users cannot see packets that go through the local interface, so both the access token and the session cookie that one could send afterwards would be protected.
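
Here is a minimal Python sketch of that idea (hypothetical code, not the actual program): bind to a random free port on localhost, generate a token, open the browser on a URL that carries it, and reject requests that don't present it:

import secrets
import webbrowser
from http.server import BaseHTTPRequestHandler, HTTPServer

TOKEN = secrets.token_urlsafe(32)

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only serve requests that carry the token in the query string
        if f"token={TOKEN}" not in self.path.partition("?")[2]:
            self.send_error(403)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"hello from the backend\n")

# Port 0 asks the kernel for a random free port
server = HTTPServer(("127.0.0.1", 0), Handler)
port = server.server_address[1]
webbrowser.open(f"http://127.0.0.1:{port}/?token={TOKEN}")
server.serve_forever()

In a real application the first request would typically also set a session cookie, so that the token does not need to stay in the URL.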

Network namespaces

I thought about isolating server and browser in a private network namespace with something like unshare(1), but it seems to require root.

Johannes Schauer Marin Rodrigues wrote to correct that:

It's possible to unshare the network namespace by first unsharing the user namespace and thus becoming root which is possible without being root since #898446 got fixed.

For example you can run this as the normal user:

lxc-usernsexec -- lxc-unshare -s NETWORK -- ip addr

If you don't want to depend on lxc, you can write a wrapper in Perl or Python. I have a Perl implementation of that in mmdebstrap.

Firewalling

Martin Schuster wrote to suggest another option:

I had the same issue. My approach was "weird", but worked: Block /outgoing/ connections to the port, unless the uid is correct. That might be counter-intuitive, but of course all connections /to/ localhost will be done /from/ localhost also.

Something like:

iptables -A OUTPUT -p tcp -d localhost --dport 8123 -m owner --uid-owner joe -j ACCEPT

iptables -A OUTPUT -p tcp -d localhost --dport 8123 -j REJECT

User checking with /proc/net/tcp

23:37 #debian-rant < _jwilk:#debian-rant> enrico: Re https://www.enricozini.org/blog/2021/debian/run-a-webserver-for-a-specific-user-only/, on Linux you can check /proc/net/tcp to see if the connection comes from the right user. I've seen it implemented here: https://sources.debian.org/src/agedu/9723-1/httpd.c/#L389
23:37 #debian-rant < _jwilk:#debian-rant> But...
23:40 #debian-rant < _jwilk:#debian-rant> The trouble is that https://evil.example.org/ can include [a resource pointing at the local port] and the browser will happily make that request.
23:42 #debian-rant < _jwilk:#debian-rant> This is the same user from the OS point of view, so /proc/net/tcp or iptables trickery doesn't help.
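
For reference, here is a minimal sketch of the /proc/net/tcp lookup jwilk mentions (Linux-only, IPv4 only, and a hypothetical helper rather than agedu's actual code), with the caveat above that it does not protect against requests triggered by other websites running in the same browser:

import os
from typing import Optional

def uid_of_local_peer(peer_port: int) -> Optional[int]:
    """Return the uid owning the client side of a 127.0.0.1 connection."""
    want = f"0100007F:{peer_port:04X}"   # 127.0.0.1 in kernel byte order, port in hex
    with open("/proc/net/tcp") as fd:
        next(fd)                         # skip the header line
        for line in fd:
            fields = line.split()
            if fields[1] == want:        # local_address of the client's socket
                return int(fields[7])    # uid column
    return None

# e.g. in a request handler, where client_port comes from the accepted connection:
# if uid_of_local_peer(client_port) != os.getuid():
#     ...reject the request...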

This is part of a series of posts on ideas for an ansible-like provisioning system, implemented in Transilience.

Unit testing some parts of Transilience, like the apt and systemd actions, or remote Mitogen connections, can really use a containerized system for testing.

To have that, I reused my work on nspawn-runner to build a simple and very fast system of ephemeral containers, with minimal dependencies, based on systemd-nspawn and btrfs snapshots:

Setup

To be able to use systemd-nspawn --ephemeral, the chroots need to be btrfs subvolumes. If you are not running on a btrfs filesystem, you can create one to run the tests, even on a file:

fallocate -l 1.5G testfile
/usr/sbin/mkfs.btrfs testfile
sudo mount -o loop testfile test_chroots/

I created a script to setup the test environment, here is an extract:

mkdir -p test_chroots

cat << EOF > "test_chroots/CACHEDIR.TAG"
Signature: 8a477f597d28d172789f06886806bc55
# chroots used for testing transilience, can be regenerated with make-test-chroot
EOF

btrfs subvolume create test_chroots/buster
eatmydata debootstrap --variant=minbase --include=python3,dbus,systemd buster test_chroots/buster

CACHEDIR.TAG is a nice trick to tell backup software not to bother backing up the contents of this directory, since it can be easily regenerated.

eatmydata is optional, and it speeds up debootstrap quite a bit.

Running unittest with sudo

Here's a simple helper to drop root as soon as possible, and regain it only when needed. Note that it needs $SUDO_UID and $SUDO_GID, which are set by sudo, to know which user to drop into:

import contextlib
import os


class ProcessPrivs:
    """
    Drop root privileges and regain them only when needed
    """
    def __init__(self):
        self.orig_uid, self.orig_euid, self.orig_suid = os.getresuid()
        self.orig_gid, self.orig_egid, self.orig_sgid = os.getresgid()

        if "SUDO_UID" not in os.environ:
            raise RuntimeError("Tests need to be run under sudo")

        self.user_uid = int(os.environ["SUDO_UID"])
        self.user_gid = int(os.environ["SUDO_GID"])

        self.dropped = False

    def drop(self):
        """
        Drop root privileges
        """
        if self.dropped:
            return
        os.setresgid(self.user_gid, self.user_gid, 0)
        os.setresuid(self.user_uid, self.user_uid, 0)
        self.dropped = True

    def regain(self):
        """
        Regain root privileges
        """
        if not self.dropped:
            return
        os.setresuid(self.orig_suid, self.orig_suid, self.user_uid)
        os.setresgid(self.orig_sgid, self.orig_sgid, self.user_gid)
        self.dropped = False

    @contextlib.contextmanager
    def root(self):
        """
        Regain root privileges for the duration of this context manager
        """
        if not self.dropped:
            yield
        else:
            self.regain()
            try:
                yield
            finally:
                self.drop()

    @contextlib.contextmanager
    def user(self):
        """
        Drop root privileges for the duration of this context manager
        """
        if self.dropped:
            yield
        else:
            self.drop()
            try:
                yield
            finally:
                self.regain()


privs = ProcessPrivs()
privs.drop()

As soon as this module is loaded, root privileges are dropped, and can be regained for as little as possible using a handy context manager:

   with privs.root():
       subprocess.run(["systemd-run", ...], check=True, capture_output=True)

Using the chroot from test cases

The infrastructure to set up and spin down an ephemeral machine is relatively simple, once one has worked out the nspawn incantations:

import atexit
import logging
import os
import shlex
import subprocess
import uuid
from typing import Dict, Optional

log = logging.getLogger(__name__)


class Chroot:
    """
    Manage an ephemeral chroot
    """
    running_chroots: Dict[str, "Chroot"] = {}

    def __init__(self, name: str, chroot_dir: Optional[str] = None):
        self.name = name
        if chroot_dir is None:
            self.chroot_dir = self.get_chroot_dir(name)
        else:
            self.chroot_dir = chroot_dir
        self.machine_name = f"transilience-{uuid.uuid4()}"

    def start(self):
        """
        Start nspawn on this given chroot.

        The systemd-nspawn command is run contained into its own unit using
        systemd-run
        """
        unit_config = [
            'KillMode=mixed',
            'Type=notify',
            'RestartForceExitStatus=133',
            'SuccessExitStatus=133',
            'Slice=machine.slice',
            'Delegate=yes',
            'TasksMax=16384',
            'WatchdogSec=3min',
        ]

        cmd = ["systemd-run"]
        for c in unit_config:
            cmd.append(f"--property={c}")

        cmd.extend((
            "systemd-nspawn",
            "--quiet",
            "--ephemeral",
            f"--directory={self.chroot_dir}",
            f"--machine={self.machine_name}",
            "--boot",
            "--notify-ready=yes"))

        log.info("%s: starting machine using image %s", self.machine_name, self.chroot_dir)

        log.debug("%s: running %s", self.machine_name, " ".join(shlex.quote(c) for c in cmd))
        with privs.root():
            subprocess.run(cmd, check=True, capture_output=True)
        log.debug("%s: started", self.machine_name)
        self.running_chroots[self.machine_name] = self

    def stop(self):
        """
        Stop the running ephemeral containers
        """
        cmd = ["machinectl", "terminate", self.machine_name]
        log.debug("%s: running %s", self.machine_name, " ".join(shlex.quote(c) for c in cmd))
        with privs.root():
            subprocess.run(cmd, check=True, capture_output=True)
        log.debug("%s: stopped", self.machine_name)
        del self.running_chroots[self.machine_name]

    @classmethod
    def create(cls, chroot_name: str) -> "Chroot":
        """
        Start an ephemeral machine from the given master chroot
        """
        res = cls(chroot_name)
        res.start()
        return res

    @classmethod
    def get_chroot_dir(cls, chroot_name: str):
        """
        Locate a master chroot under test_chroots/
        """
        chroot_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "test_chroots", chroot_name))
        if not os.path.isdir(chroot_dir):
            raise RuntimeError(f"{chroot_dir} does not exist or is not a chroot directory")
        return chroot_dir


# We need to use atexit, because unittest won't run
# tearDown/tearDownClass/tearDownModule methods in case of KeyboardInterrupt
# and we need to make sure to terminate the nspawn containers at exit
@atexit.register
def cleanup():
    # Use a list to prevent changing running_chroots during iteration
    for chroot in list(Chroot.running_chroots.values()):
        chroot.stop()

And here's a TestCase mixin that starts a containerized systems and opens a Mitogen connection to it:

class ChrootTestMixin:
    """
    Mixin to run tests over a setns connection to an ephemeral systemd-nspawn
    container running one of the test chroots
    """
    chroot_name = "buster"

    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        import mitogen
        from transilience.system import Mitogen
        cls.broker = mitogen.master.Broker()
        cls.router = mitogen.master.Router(cls.broker)
        cls.chroot = Chroot.create(cls.chroot_name)
        with privs.root():
            cls.system = Mitogen(
                    cls.chroot.name, "setns", kind="machinectl",
                    python_path="/usr/bin/python3",
                    container=cls.chroot.machine_name, router=cls.router)

    @classmethod
    def tearDownClass(cls):
        super().tearDownClass()
        cls.system.close()
        cls.broker.shutdown()
        cls.chroot.stop()

Running tests

Once the tests are set up, everything goes on as normal, except one needs to run nose2 with sudo:

sudo nose2-3

Spin up time for containers is pretty fast, and the tests drop root as soon as possible, and only regain it for as little as needed.

Also, dependencies for all this are minimal and available on most systems, and the setup instructions seem pretty straightforward.

This is part of a series of posts on ideas for an ansible-like provisioning system, implemented in Transilience.

Mitogen is a great library, but scarily complicated, and I've been wondering how hard it would be to make alternative connection methods for Transilience.

Here's a wild idea: can I package a whole Transilience playbook, plus dependencies, in a zipapp, then send the zipapp to the machine to be provisioned, and run it locally?

It turns out I can.

Creating the zipapp

This is somewhat hackish, but until I can rely on Python 3.9's improved importlib.resources module, I cannot think of a better way:

    def zipapp(self, target: str, interpreter=None):
        """
        Bundle this playbook into a self-contained zipapp
        """
        import zipapp
        import jinja2
        import transilience
        if interpreter is None:
            interpreter = sys.executable

        if getattr(transilience.__loader__, "archive", None):
            # Recursively iterating module directories requires Python 3.9+
            raise NotImplementedError("Cannot currently create a zipapp from a zipapp")

        with tempfile.TemporaryDirectory() as workdir:
            # Copy transilience
            shutil.copytree(os.path.dirname(__file__), os.path.join(workdir, "transilience"))
            # Copy jinja2
            shutil.copytree(os.path.dirname(jinja2.__file__), os.path.join(workdir, "jinja2"))
            # Copy argv[0] as __main__.py
            shutil.copy(sys.argv[0], os.path.join(workdir, "__main__.py"))
            # Copy argv[0]/roles
            role_dir = os.path.join(os.path.dirname(sys.argv[0]), "roles")
            if os.path.isdir(role_dir):
                shutil.copytree(role_dir, os.path.join(workdir, "roles"))
            # Turn everything into a zipapp
            zipapp.create_archive(workdir, target, interpreter=interpreter, compressed=True)

Since the zipapp contains not just the playbook, the roles, and the roles' assets, but also Transilience and Jinja2, it can run on any system that has a Python 3.7+ interpreter, and nothing else!

I added it to the standard set of playbook command line options, so any Transilience playbook can turn itself into a self-contained zipapp:

$ ./provision --help
usage: provision [-h] [-v] [--debug] [-C] [--local LOCAL]
                 [--ansible-to-python role | --ansible-to-ast role | --zipapp file.pyz]
[...]
  --zipapp file.pyz     bundle this playbook in a self-contained executable
                        python zipapp

Loading assets from the zipapp

I had to create ZipFile varieties of some bits of infrastructure in Transilience, to load templates, files, and Ansible yaml files from zip files.

You can see above a way to detect if a module is loaded from a zipfile: check if the module's __loader__ attribute has an archive attribute.

Here's a Jinja2 template loader that looks into a zip:

import os
import zipfile

import jinja2


class ZipLoader(jinja2.BaseLoader):
    def __init__(self, archive: zipfile.ZipFile, root: str):
        self.zipfile = archive
        self.root = root

    def get_source(self, environment: jinja2.Environment, template: str):
        path = os.path.join(self.root, template)
        with self.zipfile.open(path, "r") as fd:
            source = fd.read().decode()
        return source, None, lambda: True
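
A hypothetical usage sketch (the archive layout and template names here are made up): point a Jinja2 environment at templates stored inside the running zipapp:

import sys
import zipfile

import jinja2

archive = zipfile.ZipFile(sys.argv[0])   # the zipapp we are running from
env = jinja2.Environment(loader=ZipLoader(archive, "roles/mail/templates"))
print(env.get_template("dovecot.conf").render(hostname="example.org"))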

I also created a FileAsset abstract interface to represent a local file, and had Role.lookup_file return an appropriate instance:

    def lookup_file(self, path: str) -> FileAsset:
        """
        Resolve a pathname inside the place where the role assets are stored.
        Returns a FileAsset instance for the file
        """
        if self.role_assets_zipfile is not None:
            return ZipFileAsset(self.role_assets_zipfile, os.path.join(self.role_assets_root, path))
        else:
            return LocalFileAsset(os.path.join(self.role_assets_root, path))

An interesting side effect of having smarter local file accessors is that I can cache the contents of small files and transmit them to the remote host together with the other action parameters, saving a potential network round trip for each builtin.copy action that has a small source.
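
As an illustration of that idea, here is a hypothetical sketch (not Transilience's actual API) of a FileAsset that inlines the contents of small files into its serialized form:

import os

INLINE_THRESHOLD = 16 * 1024  # assumption: files up to 16KiB travel with the action

class LocalFileAsset:
    def __init__(self, path: str):
        self.path = path

    def serialize(self) -> dict:
        res = {"path": self.path}
        if os.path.getsize(self.path) <= INLINE_THRESHOLD:
            with open(self.path, "rb") as fd:
                # small files are shipped together with the action parameters,
                # saving a network round trip on the remote side
                res["data"] = fd.read()
        return res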

The result

The result is kind of fun:

$ time ./provision --zipapp test.pyz

real    0m0.203s
user    0m0.174s
sys 0m0.029s

$ time scp test.pyz root@test:
test.pyz                                                                                                         100%  528KB 388.9KB/s   00:01

real    0m1.576s
user    0m0.010s
sys 0m0.007s

And on the remote:

# time ./test.pyz --local=test
2021-06-29 18:05:41,546 test: [connected 0.000s]
[...]
2021-06-29 18:12:31,555 test: 88 total actions in 0.00ms: 87 unchanged, 0 changed, 1 skipped, 0 failed, 0 not executed.

real    0m0.979s
user    0m0.783s
sys 0m0.172s

Compare with a Mitogen run:

$ time PYTHONPATH=../transilience/ ./provision
2021-06-29 18:13:44 test: [connected 0.427s]
[...]
2021-06-29 18:13:46 test: 88 total actions in 2.50s: 87 unchanged, 0 changed, 1 skipped, 0 failed, 0 not executed.

real    0m2.697s
user    0m0.856s
sys 0m0.042s

From a single test run, not a good benchmark, it's 0.203 + 1.576 + 0.979 = 2.758s with the zipapp and 2.697s with Mitogen. Even if I've been lucky, it's a similar order of magnitude.

What can I use this for?

This was mostly a fun hack.

It could however be the basis for a Fabric-based connector, or a clusterssh-based connector, or for bundling a Transilience playbook into an installation image, or to add a provisioning script to the boot partition of a Raspberry Pi. It looks like an interesting trick to have up one's sleeve.

One could even build an Ansible-based connector(!) in which a simple Ansible playbook, with no facts gathering, is used to build the zipapp, push it to remote systems and run it. That would be the wackiest way of speeding up Ansible, ever!

Next: using Systemd containers with unittest, for Transilience's test suite.

This is part of a series of posts on ideas for an ansible-like provisioning system, implemented in Transilience.

I thought a lot of what I managed to do so far with Transilience would be impossible, but then here I am. How about Ansible conditionals? Those must be impossible, right?

Let's give it a try.

A quick recon of Ansible sources

Looking into Ansible's sources, when: expressions are lists of strings that get AND-ed together.

The expressions are Jinja2 expressions that Ansible pastes into a mini-template, renders, and checks the string that comes out.
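
Roughly, the idea is something like this (a sketch of the concept, not Ansible's actual code):

import jinja2

env = jinja2.Environment()

def evaluate_when(expression: str, variables: dict) -> bool:
    # paste the expression into a mini-template, render it, and look at the string
    template = env.from_string("{% if " + expression + " %}True{% else %}False{% endif %}")
    return template.render(**variables) == "True"

print(evaluate_when("is_test is defined and is_test", {"is_test": 1}))  # True
print(evaluate_when("is_test is defined and is_test", {}))              # False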

A quick recon of Jinja2

Jinja2 has a convenient function (jinja2.Environment.compile_expression) that compiles a template snippet into a Python function.

It can also parse a template into an AST that can be inspected in various ways.
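
For example, compile_expression in a nutshell (a quick sketch using Jinja2 as documented):

import jinja2

env = jinja2.Environment()
expr = env.compile_expression("is_test is defined and is_test")
print(expr(is_test=True))   # True
print(expr())               # False: is_test is undefined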

Evaluating Ansible conditionals in Python

Environment.compile_expression seems to really do precisely what we need for this, straight out of the box.

There is an issue with the concept of "defined": for Ansible it seems to mean "the variable is present in the template context". In Transilience instead, all variables are fields in the Role dataclass, and can be None when not set.

This means that we need to remove variables that are set to None before passing the parameters to the compiled Jinja2 expression:

class Conditional:
    """
    An Ansible conditional expression
    """
    def __init__(self, engine: template.Engine, body: str):
        # Original unparsed expression
        self.body: str = body
        # Expression compiled to a callable
        self.expression: Callable = engine.env.compile_expression(body)

    def evaluate(self, ctx: Dict[str, Any]):
        ctx = {name: val for name, val in ctx.items() if val is not None}
        return self.expression(**ctx)

Generating Python code

Transilience does not only support running Ansible roles, but also converting them to Python code. I can keep this up by traversing the Jinja2 AST and generating Python expressions from it.

The code is straightforward enough that I can throw in a bit of pattern matching to make some expressions more idiomatic for Python:

class Conditional:
    def __init__(self, engine: template.Engine, body: str):
        ...
        parser = jinja2.parser.Parser(engine.env, body, state='variable')
        self.jinja2_ast: nodes.Node = parser.parse_expression()

    def get_python_code(self) -> str:
        return to_python_code(self.jinja2_ast)


def to_python_code(node: nodes.Node) -> str:
    if isinstance(node, nodes.Name):
        if node.ctx == "load":
            return f"self.{node.name}"
        else:
            raise NotImplementedError(f"jinja2 Name nodes with ctx={node.ctx!r} are not supported: {node!r}")
    elif isinstance(node, nodes.Test):
        if node.name == "defined":
            return f"{to_python_code(node.node)} is not None"
        elif node.name == "undefined":
            return f"{to_python_code(node.node)} is None"
        else:
            raise NotImplementedError(f"jinja2 Test nodes with name={node.name!r} are not supported: {node!r}")
    elif isinstance(node, nodes.Not):
        if isinstance(node.node, nodes.Test):
            # Special case match well-known structures for more idiomatic Python
            if node.node.name == "defined":
                return f"{to_python_code(node.node.node)} is None"
            elif node.node.name == "undefined":
                return f"{to_python_code(node.node.node)} is not None"
        elif isinstance(node.node, nodes.Name):
            return f"not {to_python_code(node.node)}"
        return f"not ({to_python_code(node.node)})"
    elif isinstance(node, nodes.Or):
        return f"({to_python_code(node.left)} or {to_python_code(node.right)})"
    elif isinstance(node, nodes.And):
        return f"({to_python_code(node.left)} and {to_python_code(node.right)})"
    else:
        raise NotImplementedError(f"jinja2 {node.__class__} nodes are not supported: {node!r}")

Scanning for variables

Lastly, I can implement scanning conditionals for variable references to add as fields to the Role dataclass:

class FindVars(jinja2.visitor.NodeVisitor):
    def __init__(self):
        self.found: Set[str] = set()

    def visit_Name(self, node):
        if node.ctx == "load":
            self.found.add(node.name)


class Conditional:
    ...
    def list_role_vars(self) -> Sequence[str]:
        fv = FindVars()
        fv.visit(self.jinja2_ast)
        return fv.found

The result in action

Take this simple Ansible task:

---
 - name: Example task
   file:
      state: touch
      path: /tmp/test
   when: (is_test is defined and is_test) or debug is defined

Run it through ./provision --ansible-to-python test and you get:

from __future__ import annotations
from typing import Any
from transilience import role
from transilience.actions import builtin, facts


@role.with_facts([facts.Platform])
class Role(role.Role):
    # Role variables used by templates
    debug: Any = None
    is_test: Any = None

    def all_facts_available(self):
        if ((self.is_test is not None and self.is_test)
                or self.debug is not None):
            self.add(
                builtin.file(path='/tmp/test', state='touch'),
                name='Example task')

Besides one harmless extra set of parentheses, what I wasn't sure would be possible is there, right there, staring at me with a mischievous grin.

Next: Building a Transilience playbook in a zipapp.