Enrico's blog

A common logging pattern in Python is to have loggers named after the modules that use them:

import logging

log = logging.getLogger(__name__)


class Bill:
    def load_bill(self, filename: str):
        log.info("%s: loading file", filename)

However, I often find myself wanting loggers tied to something context-dependent, like the kind of file being processed. For example, I'd like to log bill loading when it is done by the expenses module, but not when it is done by the printing module.

I came up with a little hack that keeps the same API as before, and allows a context-dependent logger to be propagated to the called code:

# Call this file log.py
from __future__ import annotations
import contextlib
import contextvars
import logging

_log: contextvars.ContextVar[logging.Logger] = contextvars.ContextVar('log', default=logging.getLogger())


@contextlib.contextmanager
def logger(name: str):
    """
    Set a default logger for the duration of this context manager
    """
    old = _log.set(logging.getLogger(name))
    try:
        yield
    finally:
        _log.reset(old)


def debug(*args, **kw):
    _log.get().debug(*args, **kw)


def info(*args, **kw):
    _log.get().info(*args, **kw)


def warning(*args, **kw):
    _log.get().warning(*args, **kw)


def error(*args, **kw):
    _log.get().error(*args, **kw)

And now I can do this:

from . import log

# …
    with log.logger("expenses"):
        bill = load_bill(filename)


# This code did not change!
class Bill:
    def load_bill(self, filename: str):
        log.info("%s: loading file", filename)

Anarcat's "procmail considered harmful" post convinced me to get my act together and finally migrate my venerable procmail based setup to sieve.

My setup was nontrivial, so I migrated with an intermediate step in which sieve scripts would by default pipe everything to procmail, which allowed me to slowly move rules from procmailrc to sieve until nothing remained in procmailrc.

Here's what I did.

Literature review

https://brokkr.net/2019/10/31/lets-do-dovecot-slowly-and-properly-part-3-lmtp/ has a guide quite aligned with current Debian, and could be a starting point to get an idea of the work to do.

https://wiki.dovecot.org/HowTo/PostfixDovecotLMTP is way more terse, but more aligned with my intentions. Reading the former helped me in understanding the latter.

https://datatracker.ietf.org/doc/html/rfc5228 has the full Sieve syntax.

https://doc.dovecot.org/configuration_manual/sieve/pigeonhole_sieve_interpreter/ has the list of Sieve features supported by Dovecot.

https://doc.dovecot.org/settings/pigeonhole/ has the reference on Dovecot's sieve implementation.

https://raw.githubusercontent.com/dovecot/pigeonhole/master/doc/rfc/spec-bosch-sieve-extprograms.txt is the hard to find full reference for the functions introduced by the extprograms plugin.

Debugging tools:

  • doveconf to dump dovecot's configuration to see if what it understands matches what I mean
  • sieve-test parses sieve scripts: sieve-test file.sieve /dev/null is a quick and dirty syntax check

Backup of all mails processed

One thing I did with procmail was to generate a monthly mailbox with all incoming email, with something like this:

BACKUP="/srv/backupts/test-`date +%Y-%m`.mbox"

:0c
$BACKUP

I did not find an obvious way in sieve to create monthly mailboxes, so I redesigned that system using Postfix's always_bcc feature, piping everything to an archive user.

I'll then recreate the monthly archiving using a chewmail script that I can simply run via cron.
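
To give an idea of the kind of cron job I have in mind, here is a sketch that uses Python's mailbox module instead of chewmail; the paths and mailbox layout are invented for the example:

#!/usr/bin/python3
# Sketch of a monthly archiver: move messages from the archive user's
# mailbox into one mbox per month. All paths are made up for the example.
import email.utils
import mailbox
import os

src = mailbox.mbox(os.path.expanduser("~/archive.mbox"))
src.lock()
try:
    for key, msg in list(src.items()):
        date = email.utils.parsedate_to_datetime(msg["Date"]) if msg["Date"] else None
        name = f"{date:%Y-%m}.mbox" if date else "undated.mbox"
        dst = mailbox.mbox(os.path.join("/srv/backups", name))
        dst.add(msg)
        dst.close()
        src.remove(key)
    src.flush()
finally:
    src.unlock()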

Configure dovecot

apt install dovecot-sieve dovecot-lmtpd

I added this to the local dovecot configuration:

service lmtp {
  unix_listener /var/spool/postfix/private/dovecot-lmtp {
    user = postfix
    group = postfix
    mode = 0666
  }
}

protocol lmtp {
  mail_plugins = $mail_plugins sieve
}

plugin {
  sieve = file:~/.sieve;active=~/.dovecot.sieve
}

This makes Dovecot ready to receive mail from Postfix via an LMTP unix socket created in Postfix's private chroot.

It also activates the sieve plugin, and uses ~/.sieve as a sieve script.

The script can be a file or a directory; if it is a directory, ~/.dovecot.sieve will be a symlink pointing to the .sieve file to run.

This is a feature I'm not yet using, but if one day I want to try enabling UIs to edit sieve scripts, that part is ready.

Delegate to procmail

To make sieve scripts that delegate to procmail, I enabled the sieve_extprograms plugin:

 plugin {
   sieve = file:~/.sieve;active=~/.dovecot.sieve
+  sieve_plugins = sieve_extprograms
+  sieve_extensions = +vnd.dovecot.pipe
+  sieve_pipe_bin_dir = /usr/local/lib/dovecot/sieve-pipe
+  sieve_trace_dir = ~/.sieve-trace
+  sieve_trace_level = matching
+  sieve_trace_debug = yes
 }

and then created a script for it:

mkdir -p /usr/local/lib/dovecot/sieve-pipe/
(echo "#!/bin/sh'; echo "exec /usr/bin/procmail") > /usr/local/lib/dovecot/sieve-pipe/procmail
chmod 0755 /usr/local/lib/dovecot/sieve-pipe/procmail

And I can have a sieve script that delegates processing to procmail:

require "vnd.dovecot.pipe";

pipe "procmail";

Activate the postfix side

These changes switched local delivery over to Dovecot:

--- a/roles/mailserver/templates/dovecot.conf
+++ b/roles/mailserver/templates/dovecot.conf
@@ -25,6 +25,8 @@
+auth_username_format = %Ln
+
diff --git a/roles/mailserver/templates/main.cf b/roles/mailserver/templates/main.cf
index d2c515a..d35537c 100644
--- a/roles/mailserver/templates/main.cf
+++ b/roles/mailserver/templates/main.cf
@@ -64,8 +64,7 @@ virtual_alias_domains =
-mailbox_command = procmail -a "$EXTENSION"
-mailbox_size_limit = 0
+mailbox_transport = lmtp:unix:private/dovecot-lmtp

Without auth_username_format = %Ln, Dovecot won't be able to understand the usernames sent by Postfix in my specific setup.

Moving rules over to sieve

This is mostly straightforward, with the luxury of being able to do it a bit at a time.

The last tricky bit was how to call spamc from sieve, as in some situations I reduce system load by running the spamfilter only on a prefiltered selection of incoming emails.

For this I enabled the filter directive in sieve:

 plugin {
   sieve = file:~/.sieve;active=~/.dovecot.sieve
   sieve_plugins = sieve_extprograms
-  sieve_extensions = +vnd.dovecot.pipe
+  sieve_extensions = +vnd.dovecot.pipe +vnd.dovecot.filter
   sieve_pipe_bin_dir = /usr/local/lib/dovecot/sieve-pipe
+  sieve_filter_bin_dir = /usr/local/lib/dovecot/sieve-filter
   sieve_trace_dir = ~/.sieve-trace
   sieve_trace_level = matching
   sieve_trace_debug = yes
 }

Then I created a filter script:

mkdir -p /usr/local/lib/dovecot/sieve-filter/
(echo "#!/bin/sh"; echo "exec /usr/bin/spamc") > /usr/local/lib/dovecot/sieve-filter/spamc
chmod 0755 /usr/local/lib/dovecot/sieve-filter/spamc

And now what was previously:

:0 fw
| /usr/bin/spamc

:0
* ^X-Spam-Status: Yes
.spam/

Can become:

require "vnd.dovecot.filter";
require "fileinto";

filter "spamc";

if header :contains "x-spam-level" "**************" {
    discard;
} elsif header :matches "X-Spam-Status" "Yes,*" {
    fileinto "spam";
}

Updates

Ansgar mentioned that it's possible to replicate the monthly mailbox using the variables and date extensions, with a hacky trick from the extensions' RFC:

require "date"
require "variables"

if currentdate :matches "month" "*" { set "month" "${1}"; }
if currentdate :matches "year" "*" { set "year" "${1}"; }

fileinto :create "${month}-${year}";

Suppose you have a tool that archives images, or scientific data, and it has a test suite. It would be good to collect sample files for the test suite, but they are often so big one can't really bloat the repository with them.

But does the test suite need everything that is in those files? Not necessarily. For example, if one is testing code that reads EXIF metadata, one doesn't care about what is in the image itself.

This technique of blanking out the bulk data while keeping the metadata works extremely well: I can take GRIB files that are several megabytes in size, zero out their data payload, and get nice 1Kb samples for the test suite.

I've started to collect and organise the little hacks I use for this into a tool I called mktestsample:

$ mktestsample -v samples1/*
2021-11-23 20:16:32 INFO common samples1/cosmo_2d+0.grib: size went from 335168b to 120b
2021-11-23 20:16:32 INFO common samples1/grib2_ifs.arkimet: size went from 4993448b to 39393b
2021-11-23 20:16:32 INFO common samples1/polenta.jpg: size went from 3191475b to 94517b
2021-11-23 20:16:32 INFO common samples1/test-ifs.grib: size went from 1986469b to 4860b

Those are massive savings, but I'm not satisfied with those almost 94Kb of JPEG:

$ ls -la samples1/polenta.jpg
-rw-r--r-- 1 enrico enrico 94517 Nov 23 20:16 samples1/polenta.jpg
$ gzip samples1/polenta.jpg
$ ls -la samples1/polenta.jpg.gz
-rw-r--r-- 1 enrico enrico 745 Nov 23 20:16 samples1/polenta.jpg.gz

I believe I did all I could: completely blank out image data, set quality to zero, maximize subsampling, and tweak quantization to throw everything away.

Still, the result is a 94Kb file that can be gzipped down to 745 bytes. Is there something I'm missing?

I suppose JPEG is better at storing an image than at storing the lack of an image. I cannot really complain :)

I can still commit compressed samples of large images to a git repository, taking very little data indeed. That's really nice!

This morning we realised that a test case failed only on Fedora 34 (the link is in Italian) and we set about debugging it.

The initial analysis

This is the initial reproducer:

$ PROJ_DEBUG=3 python setup.py test
test_recipe (tests.test_litota3.TestLITOTA3NordArkimetIFS) ... pj_open_lib(proj.db): call fopen(/lib64/../share/proj/proj.db) - succeeded
proj_create: Open of /lib64/../share/proj/proj.db failed
pj_open_lib(proj.db): call fopen(/lib64/../share/proj/proj.db) - succeeded
proj_create: no database context specified
Cannot instantiate source_crs
EXCEPTION in py_coast(): ProjP: cannot create crs to crs from [EPSG:4326] to [+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +over +units=m +no_defs]
ERROR

Note that opening /lib64/../share/proj/proj.db sometimes succeeds, sometimes fails. It's some kind of Schrödinger path, which works or not depending on how you observe it:

# ls -lad /lib64
lrwxrwxrwx 1 1000 1000 9 Jan 26  2021 /lib64 -> usr/lib64

$ ls -la /lib64/../share/proj/proj.db
-rw-r--r-- 1 root root 8925184 Jan 28  2021 /lib64/../share/proj/proj.db

$ cd /lib64/../share/proj/

$ cd /lib64
$ cd ..
$ cd share
-bash: cd: share: No such file or directory

And indeed, stat(2) finds it, and sqlite doesn't (the file is a sqlite database):

$ stat /lib64/../share/proj/proj.db
  File: /lib64/../share/proj/proj.db
  Size: 8925184     Blocks: 17432      IO Block: 4096   regular file
Device: 33h/51d Inode: 56907       Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-11-08 14:09:12.334350779 +0100
Modify: 2021-01-28 05:38:11.000000000 +0100
Change: 2021-11-08 13:42:51.758874327 +0100
 Birth: 2021-11-08 13:42:51.710874051 +0100

$ sqlite3 /lib64/../share/proj/proj.db
Error: unable to open database "/lib64/../share/proj/proj.db": unable to open database file

A minimal reproducer

Later on we started stripping layers of code towards a minimal reproducer: here it is. It works or doesn't work depending on whether proj is linked explicitly, or via MagPlus:

$ cat tc.cc
#include <magics/ProjP.h>

int main() {
    magics::ProjP p("EPSG:4326", "+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +over +units=m +no_defs");
    return 0;
}

$ g++ -o tc  tc.cc -I/usr/include/magics  -lMagPlus
$ ./tc
proj_create: Open of /lib64/../share/proj/proj.db failed
proj_create: no database context specified
terminate called after throwing an instance of 'magics::MagicsException'
  what():  ProjP: cannot create crs to crs from [EPSG:4326] to [+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +over +units=m +no_defs]
Aborted (core dumped)

$ g++ -o tc  tc.cc -I/usr/include/magics -lproj  -lMagPlus
$ ./tc

What is going on here?

A difference between the two is the path used to link to libproj.so:

$ ldd ./tc | grep proj
    libproj.so.19 => /lib64/libproj.so.19 (0x00007fd4919fb000)
$ g++ -o tc  tc.cc -I/usr/include/magics   -lMagPlus
$ ldd ./tc | grep proj
    libproj.so.19 => /lib64/../lib64/libproj.so.19 (0x00007f6d1051b000)

Common sense screams that this should not matter, but we chased an intuition and found that one of the ways proj looks for its database is relative to its shared library.

Indeed, gdb in hand, we could see that the dladdr(3) call proj uses to locate its own shared library returns /lib64/../lib64/libproj.so.19.

From /lib64/../lib64/libproj.so.19, proj strips two path components from the end, presumably to go from something like /something/usr/lib/libproj.so to /something/usr.

So, dladdr returns /lib64/../lib64/libproj.so.19, which becomes /lib64/../, which becomes /lib64/../share/proj/proj.db, which exists on the file system and is used as a path to the database.
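
In other words, the lookup roughly amounts to this path arithmetic (a sketch, not proj's actual code):

>>> import os.path
>>> libpath = "/lib64/../lib64/libproj.so.19"   # what dladdr returned
>>> prefix = os.path.dirname(os.path.dirname(libpath))
>>> prefix
'/lib64/..'
>>> os.path.join(prefix, "share/proj/proj.db")
'/lib64/../share/proj/proj.db'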

But depending on how you look at it, that path might or might not be valid: it passes the stat(2) check that stops the search for candidate paths, but sqlite is unable to open it.

Why does the other path work?

By linking libproj.so in the other way, dladdr returns /lib64/libproj.so.19, which becomes /share/proj/proj.db, which doesn't exist, which triggers a fallback to a PROJ_LIB constant defined at compile time, which is a path that works no matter how you look at it.

Why that weird path with libMagPlus?

To complete the picture, we found that libMagPlus.so is packaged with an rpath set, which is known to cause trouble:

# readelf -d /usr/lib64/libMagPlus.so|grep rpath
 0x000000000000000f (RPATH)              Library rpath: [$ORIGIN/../lib64]

The workaround

We found that one can set PROJ_LIB in the environment to override the normal proj database lookup. Building on that, we came up with a simple way to override it on Fedora 34 only:

    if distro is not None and distro.linux_distribution()[:2] == ("Fedora", "34") and "PROJ_LIB" not in os.environ:
        self.env_overrides["PROJ_LIB"] = "/usr/share/proj/"

This has been a most edifying and educational debugging session, with only the necessary modicum of curses and swearwords. Working in a team of excellent people really helps.

help2man is quite nice for autogenerating manpages from command line help, making sure that they stay up to date as command line options evolve.

It works quite well, except for commands with subcommands, like Python programs that use argparse's add_subparsers.

So, here's a quick hack that calls help2man for each subcommand, and stitches everything together in a simple manpage.

#!/usr/bin/python3

import re
import shutil
import sys
import subprocess
import tempfile

# TODO: move to argparse
command = sys.argv[1]

# Use setup.py to get the program version
res = subprocess.run([sys.executable, "setup.py", "--version"], stdout=subprocess.PIPE, text=True, check=True)
version = res.stdout.strip()

# Call the main commandline help to get a list of subcommands
res = subprocess.run([sys.executable, command, "--help"], stdout=subprocess.PIPE, text=True, check=True)
subcommands = re.sub(r'^.+\{(.+)\}.+$', r'\1', res.stdout, flags=re.DOTALL).split(',')

# Generate a help2man --include file with an extra section for each subcommand
with tempfile.NamedTemporaryFile("wt") as tf:
    print("[>DESCRIPTION]", file=tf)

    for subcommand in subcommands:
        res = subprocess.run(
                ["help2man", f"--name={command}", "--section=1",
                 "--no-info", "--version-string=dummy", f"./{command} {subcommand}"],
                stdout=subprocess.PIPE, text=True, check=True)
        subcommand_doc = re.sub(r'^.+.SH DESCRIPTION', '', res.stdout, flags=re.DOTALL)
        print(".SH ", subcommand.upper(), " SUBCOMMAND", file=tf)
        tf.write(subcommand_doc)

    with open(f"{command}.1.in", "rt") as fd:
        shutil.copyfileobj(fd, tf)

    tf.flush()

    # Call help2man on the main command line help, with the extra include file
    # we just generated
    subprocess.run(
            ["help2man", f"--include={tf.name}", f"--name={command}",
             "--section=1", "--no-info", f"--version-string={version}",
             "--output=arkimaps.1", "./arkimaps"],
            check=True)

I had to package a nontrivial Python codebase, and I needed to put dependencies in setup.py.

I could do git grep -h import | sort -u, then review the output by hand, but I lacked the motivation for it. Much better to take a stab at solving the general problem.

The result is at https://github.com/spanezz/python-devel-tools.

One fun part is scanning a directory tree, using ast to find import statements scattered around the code:

class Scanner:
    def __init__(self):
        self.names: Set[str] = set()

    def scan_dir(self, root: str):
        for dirpath, dirnames, filenames, dir_fd in os.fwalk(root):
            for fn in filenames:
                if fn.endswith(".py"):
                    with dirfd_open(fn, dir_fd=dir_fd) as fd:
                        self.scan_file(fd, os.path.join(dirpath, fn))
                st = os.stat(fn, dir_fd=dir_fd)
                if st.st_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
                    with dirfd_open(fn, dir_fd=dir_fd) as fd:
                        try:
                            lead = fd.readline()
                        except UnicodeDecodeError:
                            continue
                        if re_python_shebang.match(lead):
                            fd.seek(0)
                            self.scan_file(fd, os.path.join(dirpath, fn))

    def scan_file(self, fd: TextIO, pathname: str):
        log.info("Reading file %s", pathname)
        try:
            tree = ast.parse(fd.read(), pathname)
        except SyntaxError as e:
            log.warning("%s: file cannot be parsed", pathname, exc_info=e)
            return

        self.scan_tree(tree)

    def scan_tree(self, tree: ast.AST):
        for stm in tree.body:
            if isinstance(stm, ast.Import):
                for alias in stm.names:
                    if not isinstance(alias.name, str):
                        print("NAME", repr(alias.name), stm)
                    self.names.add(alias.name)
            elif isinstance(stm, ast.ImportFrom):
                if stm.module is not None:
                    self.names.add(stm.module)
            elif hasattr(stm, "body"):
                self.scan_tree(stm)
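
The excerpt relies on two small helpers defined elsewhere in the repository; here is a minimal sketch of what they could look like, with the names taken from the code above and the bodies being my guess:

import os
import re

# A shebang line such as "#!/usr/bin/python3" or "#!/usr/bin/env python3"
re_python_shebang = re.compile(r"^#!.*\bpython3?\b")


def dirfd_open(name: str, dir_fd: int, mode: str = "rt"):
    """Open `name` relative to an already open directory file descriptor"""
    return open(name, mode, opener=lambda path, flags: os.open(path, flags, dir_fd=dir_fd))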

Another fun part is grouping the imported module names by where in sys.path they have been found:

    scanner = Scanner()
    scanner.scan_dir(args.dir)

    sys.path.append(args.dir)
    by_sys_path: Dict[str, List[str]] = collections.defaultdict(list)
    for name in sorted(scanner.names):
        spec = importlib.util.find_spec(name)
        if spec is None or spec.origin is None:
            by_sys_path[""].append(name)
        else:
            for sp in sys.path:
                if spec.origin.startswith(sp):
                    by_sys_path[sp].append(name)
                    break
            else:
                by_sys_path[spec.origin].append(name)

    for sys_path, names in sorted(by_sys_path.items()):
        print(f"{sys_path or 'unidentified'}:")
        for name in names:
            print(f"  {name}")

An example. It's kind of nice how it can at least tell apart stdlib modules so one doesn't need to read through those:

$ ./scan-imports …/himblick
unidentified:
  changemonitor
  chroot
  cmdline
  mediadir
  player
  server
  settings
  static
  syncer
  utils
…/himblick:
  himblib.cmdline
  himblib.host_setup
  himblib.player
  himblib.sd
/usr/lib/python3.9:
  __future__
  argparse
  asyncio
  collections
  configparser
  contextlib
  datetime
  io
  json
  logging
  mimetypes
  os
  pathlib
  re
  secrets
  shlex
  shutil
  signal
  subprocess
  tempfile
  textwrap
  typing
/usr/lib/python3/dist-packages:
  asyncssh
  parted
  progressbar
  pyinotify
  setuptools
  tornado
  tornado.escape
  tornado.httpserver
  tornado.ioloop
  tornado.netutil
  tornado.web
  tornado.websocket
  yaml
built-in:
  sys
  time

Maybe such a tool already exists and works much better than this? From a quick search I didn't find it, and it was fun to (re)invent it.

Updates:

Jakub Wilk pointed me to an old python-modules script that finds Debian dependencies.

The AST scanning code should be refactored to use ast.NodeVisitor.
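
For reference, a minimal sketch of what a NodeVisitor-based version could look like (not what the tool currently does):

import ast
from typing import Set

class ImportVisitor(ast.NodeVisitor):
    """Collect imported module names from an AST"""
    def __init__(self) -> None:
        self.names: Set[str] = set()

    def visit_Import(self, node: ast.Import) -> None:
        for alias in node.names:
            self.names.add(alias.name)

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        if node.module is not None:
            self.names.add(node.module)

# visitor = ImportVisitor()
# visitor.visit(ast.parse(source, pathname))
# print(visitor.names)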

I had this nightmare where I had a very, very important confcall.

I joined with Chrome. Chrome said Failed to access your microphone - Cannot use microphone for an unknown reason. Could not start audio source.

I joined with Firefox. Firefox chose Monitor of Built-in Audio Analog Stereo as a microphone, and did not let me change it. Not in the browser, not in pavucontrol.

I joined with the browser on my phone, and the webpage said This meeting needs to use your microphone and camera. Select *Allow* when your browser asks for permissions. But the question never came.

I could hear people talking. I had very important things to say. I tried typing them in the chat window, but they weren't seeing it. The meeting ended. I was on the verge of tears.

Tell me, Mr. Anderson, what good is a phone call when you are unable to speak?

Since this nightmare happened for real, including the bit about tears in the end, let's see that it doesn't happen again. I should now have three working systems, which hopefully won't all break again all at the same time.

Fixing Chrome

I can reproduce this reliably, on Bullseye's standard Chromium 90.0.4430.212-1, just launched on an empty profile, no extensions.

The webpage has camera and microphone allowed. Chrome doesn't show up in the recording tab of pulseaudio. Nothing on Chrome's stdout/stderr.

JavaScript console has:

Logger.js:154 2021-09-10Txx:xx:xx.xxxZ [features/base/tracks] Failed to create local tracks
Array(2)
DOMException: Could not start audio source

I found the answer here:

I had the similar problem once with chromium. i could solve it by switching in preferences->microphone-> from "default" to "intern analog stereo".

Opening the little popup next to the microphone/mute button allows choosing other microphones, which work. Only "Same as system (Default)" does not work.

Fixing Firefox

I have firefox-esr 78.13.0esr-1~deb11u1. In Jitsi, microphone selection is disabled on the toolbar and in the settings menu. In pavucontrol, changing the recording device for Firefox has no effect. If for some reason the wrong microphone got chosen, neither of those is a way of fixing it.

What I found works is to click on the camera permission icon, remove microphone permission, then reload the page. At that point Firefox will ask for permission again, and that microphone selection seems to work.

Relevant bugs: on Jitsi and on Firefox. Since this is well known (once you find the relevant issues), I'd have appreciated Jitsi at least showing a link to an explanation of workarounds on Firefox, instead of just disabling microphone selection.

Fixing Jitsi on the phone side

I really don't want to preemptively give camera and microphone permissions to my phone browser. I noticed that there's the Jitsi app on F-Droid and much as I hate to use an app when a website would work, at least in this case it's a way to keep the permission sets separate, so I installed that.

Fixing pavucontrol?

I tried to find out why I can't change the input device for Firefox in pavucontrol. I only managed to find an Ask Ubuntu question with no answer and a Unix StackExchange question with no answer.

I'm creating a program that uses the web browser for its user interface, and I'm reasonably sure I'm not the first person doing this.

Normally such a program would listen on a port on localhost, and tell the browser to connect to it. Bonus points for listening on a randomly allocated free port, so that one does not need to rely on some amount of luck to get the program started.
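
The randomly allocated free port is easy to get from the kernel by binding to port 0; a minimal sketch:

import http.server

# Bind to 127.0.0.1 on port 0: the kernel picks a free port, which we can
# then put in the URL handed to the browser
server = http.server.HTTPServer(("127.0.0.1", 0), http.server.SimpleHTTPRequestHandler)
print(f"http://127.0.0.1:{server.server_port}/")
# server.serve_forever()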

However, using a local port still means that any user on the local machine can connect to it, which is generally a security issue.

A possible solution would be to use AF_UNIX Unix Domain Sockets, which are supported by various web servers, but as far as I understand not currently by browsers. I checked Firefox and Chrome, and they currently seem to fail to even acknowledge the use case.

I'm reasonably sure I'm not the first person doing this, and yes, it's intended as an understatement.

So, dear Lazyweb, is there a way to securely use a browser as a UI for a user's program, without exposing access to the backend to other users in the system?

Access token in the URL

Emanuele Di Giacomo suggests to add an access token to the URL that gets passed to the browser.

This would work to protect access on localhost: even if the application cannot use HTTPS, other users cannot see packets that go through the local interface, so both the access token and the session cookie that one could send afterwards would be protected.
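
A minimal sketch of that idea (the helper names are mine, not from Emanuele's suggestion):

import hmac
import secrets

# Generated once at startup, embedded in the URL passed to the browser,
# and checked on the first request before handing out a session cookie
ACCESS_TOKEN = secrets.token_urlsafe(32)

def url_for_browser(port: int) -> str:
    return f"http://127.0.0.1:{port}/?token={ACCESS_TOKEN}"

def token_is_valid(presented: str) -> bool:
    # Constant-time comparison to avoid leaking the token byte by byte
    return hmac.compare_digest(presented, ACCESS_TOKEN)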

Network namespaces

I thought about isolating server and browser in a private network namespace with something like unshare(1), but it seems to require root.

Johannes Schauer Marin Rodrigues wrote to correct that:

It's possible to unshare the network namespace by first unsharing the user namespace and thus becoming root which is possible without being root since #898446 got fixed.

For example you can run this as the normal user:

lxc-usernsexec -- lxc-unshare -s NETWORK -- ip addr

If you don't want to depend on lxc, you can write a wrapper in Perl or Python. I have a Perl implementation of that in mmdebstrap.

Firewalling

Martin Schuster wrote to suggest another option:

I had the same issue. My approach was "weird", but worked: Block /outgoing/ connections to the port, unless the uid is correct. That might be counter-intuitive, but of course all connections /to/ localhost will be done /from/ localhost also.

Something like:

iptables -A OUTPUT -p tcp -d localhost --dport 8123 -m owner --uid-owner joe -j ACCEPT

iptables -A OUTPUT -p tcp -d localhost --dport 8123 -j REJECT

User checking with /proc/net/tcp

23:37 #debian-rant < _jwilk:#debian-rant> enrico: Re https://www.enricozini.org/blog/2021/debian/run-a-webserver-for-a-specific-user-only/, on Linux you can check /proc/net/tcp to see if the connection comes from the right user. I've seen it implemented here: https://sources.debian.org/src/agedu/9723-1/httpd.c/#L389
23:37 #debian-rant < _jwilk:#debian-rant> But...
23:40 #debian-rant < _jwilk:#debian-rant> The trouble is that https://evil.example.org/ can include [a resource pointing at the local port] and the browser will happily make that request.
23:42 #debian-rant < _jwilk:#debian-rant> This is the same user from the OS point of view, so /proc/net/tcp or iptables trickery doesn't help.
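
For reference, here is a sketch of the kind of check agedu does, as far as I understand it. This only covers IPv4, a real implementation would also look at /proc/net/tcp6, and as _jwilk points out it does not help against requests triggered by a malicious website in the user's own browser:

def peer_uid(server_port: int):
    """
    Return the uid owning the client side of a connection to
    127.0.0.1:server_port, or None if it cannot be found.
    """
    with open("/proc/net/tcp") as fd:
        next(fd)  # skip the header line
        for line in fd:
            fields = line.split()
            remote, uid = fields[2], int(fields[7])
            rem_addr, rem_port = remote.split(":")
            # 0100007F is 127.0.0.1 in little-endian hex; the client's socket
            # has our server port as its remote port
            if rem_addr == "0100007F" and int(rem_port, 16) == server_port:
                return uid
    return None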

This is part of a series of posts on ideas for an ansible-like provisioning system, implemented in Transilience.

Unit testing some parts of Transilience, like the apt and systemd actions, or remote Mitogen connections, can really use a containerized system for testing.

To have that, I reused my work on nspawn-runner to build a simple and very fast system of ephemeral containers, with minimal dependencies, based on systemd-nspawn and btrfs snapshots.

Setup

To be able to use systemd-nspawn --ephemeral, the chroots need to be btrfs subvolumes. If you are not running on a btrfs filesystem, you can create one just for the tests, even backed by a file:

fallocate -l 1.5G testfile
/usr/sbin/mkfs.btrfs testfile
sudo mount -o loop testfile test_chroots/

I created a script to setup the test environment, here is an extract:

mkdir -p test_chroots

cat << EOF > "test_chroots/CACHEDIR.TAG"
Signature: 8a477f597d28d172789f06886806bc55
# chroots used for testing transilience, can be regenerated with make-test-chroot
EOF

btrfs subvolume create test_chroots/buster
eatmydata debootstrap --variant=minbase --include=python3,dbus,systemd buster test_chroots/buster

CACHEDIR.TAG is a nice trick to tell backup software not to bother backing up the contents of this directory, since it can be easily regenerated.

eatmydata is optional, and it speeds up debootstrap quite a bit.

Running unittest with sudo

Here's a simple helper to drop root as soon as possible, and regain it only when needed. Note that it needs $SUDO_UID and $SUDO_GID, that are set by sudo, to know which user to drop into:

class ProcessPrivs:
    """
    Drop root privileges and regain them only when needed
    """
    def __init__(self):
        self.orig_uid, self.orig_euid, self.orig_suid = os.getresuid()
        self.orig_gid, self.orig_egid, self.orig_sgid = os.getresgid()

        if "SUDO_UID" not in os.environ:
            raise RuntimeError("Tests need to be run under sudo")

        self.user_uid = int(os.environ["SUDO_UID"])
        self.user_gid = int(os.environ["SUDO_GID"])

        self.dropped = False

    def drop(self):
        """
        Drop root privileges
        """
        if self.dropped:
            return
        os.setresgid(self.user_gid, self.user_gid, 0)
        os.setresuid(self.user_uid, self.user_uid, 0)
        self.dropped = True

    def regain(self):
        """
        Regain root privileges
        """
        if not self.dropped:
            return
        os.setresuid(self.orig_suid, self.orig_suid, self.user_uid)
        os.setresgid(self.orig_sgid, self.orig_sgid, self.user_gid)
        self.dropped = False

    @contextlib.contextmanager
    def root(self):
        """
        Regain root privileges for the duration of this context manager
        """
        if not self.dropped:
            yield
        else:
            self.regain()
            try:
                yield
            finally:
                self.drop()

    @contextlib.contextmanager
    def user(self):
        """
        Drop root privileges for the duration of this context manager
        """
        if self.dropped:
            yield
        else:
            self.drop()
            try:
                yield
            finally:
                self.regain()


privs = ProcessPrivs()
privs.drop()

As soon as this module is loaded, root privileges are dropped, and can be regained for as little as possible using a handy context manager:

   with privs.root():
       subprocess.run(["systemd-run", ...], check=True, capture_output=True)

Using the chroot from test cases

The infrastructure to set up and spin down ephemeral machines is relatively simple, once one has worked out the nspawn incantations:

class Chroot:
    """
    Manage an ephemeral chroot
    """
    running_chroots: Dict[str, "Chroot"] = {}

    def __init__(self, name: str, chroot_dir: Optional[str] = None):
        self.name = name
        if chroot_dir is None:
            self.chroot_dir = self.get_chroot_dir(name)
        else:
            self.chroot_dir = chroot_dir
        self.machine_name = f"transilience-{uuid.uuid4()}"

    def start(self):
        """
        Start nspawn on this given chroot.

        The systemd-nspawn command is run contained in its own unit using
        systemd-run
        """
        unit_config = [
            'KillMode=mixed',
            'Type=notify',
            'RestartForceExitStatus=133',
            'SuccessExitStatus=133',
            'Slice=machine.slice',
            'Delegate=yes',
            'TasksMax=16384',
            'WatchdogSec=3min',
        ]

        cmd = ["systemd-run"]
        for c in unit_config:
            cmd.append(f"--property={c}")

        cmd.extend((
            "systemd-nspawn",
            "--quiet",
            "--ephemeral",
            f"--directory={self.chroot_dir}",
            f"--machine={self.machine_name}",
            "--boot",
            "--notify-ready=yes"))

        log.info("%s: starting machine using image %s", self.machine_name, self.chroot_dir)

        log.debug("%s: running %s", self.machine_name, " ".join(shlex.quote(c) for c in cmd))
        with privs.root():
            subprocess.run(cmd, check=True, capture_output=True)
        log.debug("%s: started", self.machine_name)
        self.running_chroots[self.machine_name] = self

    def stop(self):
        """
        Stop the running ephemeral containers
        """
        cmd = ["machinectl", "terminate", self.machine_name]
        log.debug("%s: running %s", self.machine_name, " ".join(shlex.quote(c) for c in cmd))
        with privs.root():
            subprocess.run(cmd, check=True, capture_output=True)
        log.debug("%s: stopped", self.machine_name)
        del self.running_chroots[self.machine_name]

    @classmethod
    def create(cls, chroot_name: str) -> "Chroot":
        """
        Start an ephemeral machine from the given master chroot
        """
        res = cls(chroot_name)
        res.start()
        return res

    @classmethod
    def get_chroot_dir(cls, chroot_name: str):
        """
        Locate a master chroot under test_chroots/
        """
        chroot_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "test_chroots", chroot_name))
        if not os.path.isdir(chroot_dir):
            raise RuntimeError(f"{chroot_dir} does not exists or is not a chroot directory")
        return chroot_dir


# We need to use atexit, because unittest won't run
# tearDown/tearDownClass/tearDownModule methods in case of KeyboardInterrupt
# and we need to make sure to terminate the nspawn containers at exit
@atexit.register
def cleanup():
    # Use a list to prevent changing running_chroots during iteration
    for chroot in list(Chroot.running_chroots.values()):
        chroot.stop()

And here's a TestCase mixin that starts a containerized system and opens a Mitogen connection to it:

class ChrootTestMixin:
    """
    Mixin to run tests over a setns connection to an ephemeral systemd-nspawn
    container running one of the test chroots
    """
    chroot_name = "buster"

    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        import mitogen
        from transilience.system import Mitogen
        cls.broker = mitogen.master.Broker()
        cls.router = mitogen.master.Router(cls.broker)
        cls.chroot = Chroot.create(cls.chroot_name)
        with privs.root():
            cls.system = Mitogen(
                    cls.chroot.name, "setns", kind="machinectl",
                    python_path="/usr/bin/python3",
                    container=cls.chroot.machine_name, router=cls.router)

    @classmethod
    def tearDownClass(cls):
        super().tearDownClass()
        cls.system.close()
        cls.broker.shutdown()
        cls.chroot.stop()

Running tests

Once the tests are set up, everything goes on as normal, except one needs to run nose2 with sudo:

sudo nose2-3

Spin-up time for containers is pretty fast, and the tests drop root as soon as possible, regaining it only for as little as needed.

Also, dependencies for all this are minimal and available on most systems, and the setup instructions seem pretty straightforward.

This is part of a series of posts on ideas for an ansible-like provisioning system, implemented in Transilience.

Mitogen is a great library, but scarily complicated, and I've been wondering how hard it would be to make alternative connection methods for Transilience.

Here's a wild idea: can I package a whole Transilience playbook, plus dependencies, in a zipapp, then send the zipapp to the machine to be provisioned, and run it locally?

It turns out I can.

Creating the zipapp

This is somewhat hackish, but until I can rely on Python 3.9's improved importlib.resources module, I cannot think of a better way:

    def zipapp(self, target: str, interpreter=None):
        """
        Bundle this playbook into a self-contained zipapp
        """
        import zipapp
        import jinja2
        import transilience
        if interpreter is None:
            interpreter = sys.executable

        if getattr(transilience.__loader__, "archive", None):
            # Recursively iterating module directories requires Python 3.9+
            raise NotImplementedError("Cannot currently create a zipapp from a zipapp")

        with tempfile.TemporaryDirectory() as workdir:
            # Copy transilience
            shutil.copytree(os.path.dirname(__file__), os.path.join(workdir, "transilience"))
            # Copy jinja2
            shutil.copytree(os.path.dirname(jinja2.__file__), os.path.join(workdir, "jinja2"))
            # Copy argv[0] as __main__.py
            shutil.copy(sys.argv[0], os.path.join(workdir, "__main__.py"))
            # Copy argv[0]/roles
            role_dir = os.path.join(os.path.dirname(sys.argv[0]), "roles")
            if os.path.isdir(role_dir):
                shutil.copytree(role_dir, os.path.join(workdir, "roles"))
            # Turn everything into a zipapp
            zipapp.create_archive(workdir, target, interpreter=interpreter, compressed=True)

Since the zipapp contains not just the playbook, the roles, and the roles' assets, but also Transilience and Jinja2, it can run on any system that has a Python 3.7+ interpreter, and nothing else!

I added it to the standard set of playbook command line options, so any Transilience playbook can turn itself into a self-contained zipapp:

$ ./provision --help
usage: provision [-h] [-v] [--debug] [-C] [--local LOCAL]
                 [--ansible-to-python role | --ansible-to-ast role | --zipapp file.pyz]
[...]
  --zipapp file.pyz     bundle this playbook in a self-contained executable
                        python zipapp

Loading assets from the zipapp

I had to create ZipFile varieties of some bits of infrastructure in Transilience, to load templates, files, and Ansible yaml files from zip files.

You can see above a way to detect if a module is loaded from a zipfile: check if the module's __loader__ attribute has an archive attribute.
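
As a standalone sketch of that check (a helper of my own, not part of Transilience):

import importlib

def loaded_from_zip(modname: str) -> bool:
    # When a module comes from a zip archive its __loader__ is a
    # zipimport.zipimporter, which exposes the archive path in `.archive`
    mod = importlib.import_module(modname)
    return getattr(getattr(mod, "__loader__", None), "archive", None) is not None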

Here's a Jinja2 template loader that looks into a zip:

class ZipLoader(jinja2.BaseLoader):
    def __init__(self, archive: zipfile.ZipFile, root: str):
        self.zipfile = archive
        self.root = root

    def get_source(self, environment: jinja2.Environment, template: str):
        path = os.path.join(self.root, template)
        with self.zipfile.open(path, "r") as fd:
            source = fd.read().decode()
        return source, None, lambda: True
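
Usage is then a matter of pointing a Jinja2 environment at the zipapp we are running from; the role path and template name here are made up for the example:

import sys
import zipfile
import jinja2

archive = zipfile.ZipFile(sys.argv[0])  # the zipapp itself
env = jinja2.Environment(loader=ZipLoader(archive, "roles/example/templates"))
template = env.get_template("motd.j2")
print(template.render(hostname="example.org"))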

I also created a FileAsset abstract interface to represent a local file, and had Role.lookup_file return an appropriate instance:

    def lookup_file(self, path: str) -> "FileAsset":
        """
        Resolve a pathname inside the place where the role assets are stored.
        Returns a FileAsset instance for the file.
        """
        if self.role_assets_zipfile is not None:
            return ZipFileAsset(self.role_assets_zipfile, os.path.join(self.role_assets_root, path))
        else:
            return LocalFileAsset(os.path.join(self.role_assets_root, path))
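
I won't paste the whole hierarchy here, but this is a minimal sketch of the shape FileAsset could take (the real classes in Transilience may differ):

import contextlib
import zipfile

class FileAsset:
    """Abstract interface to a file stored among a role's assets"""
    def open(self):
        raise NotImplementedError

class LocalFileAsset(FileAsset):
    def __init__(self, path: str):
        self.path = path

    @contextlib.contextmanager
    def open(self):
        with open(self.path, "rb") as fd:
            yield fd

class ZipFileAsset(FileAsset):
    def __init__(self, archive: zipfile.ZipFile, path: str):
        self.archive = archive
        self.path = path

    @contextlib.contextmanager
    def open(self):
        with self.archive.open(self.path, "r") as fd:
            yield fd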

An interesting side effect of having smarter local file accessors is that I can cache the contents of small files and transmit them to the remote host together with the other action parameters, saving a potential network round trip for each builtin.copy action that has a small source.

The result

The result is kind of fun:

$ time ./provision --zipapp test.pyz

real    0m0.203s
user    0m0.174s
sys 0m0.029s

$ time scp test.pyz root@test:
test.pyz                                                                                                         100%  528KB 388.9KB/s   00:01

real    0m1.576s
user    0m0.010s
sys 0m0.007s

And on the remote:

# time ./test.pyz --local=test
2021-06-29 18:05:41,546 test: [connected 0.000s]
[...]
2021-06-29 18:12:31,555 test: 88 total actions in 0.00ms: 87 unchanged, 0 changed, 1 skipped, 0 failed, 0 not executed.

real    0m0.979s
user    0m0.783s
sys 0m0.172s

Compare with a Mitogen run:

$ time PYTHONPATH=../transilience/ ./provision
2021-06-29 18:13:44 test: [connected 0.427s]
[...]
2021-06-29 18:13:46 test: 88 total actions in 2.50s: 87 unchanged, 0 changed, 1 skipped, 0 failed, 0 not executed.

real    0m2.697s
user    0m0.856s
sys 0m0.042s

From a single test run, not a good benchmark, it's 0.203 + 1.576 + 0.979 = 2.758s with the zipapp and 2.697s with Mitogen. Even if I've been lucky, it's a similar order of magnitude.

What can I use this for?

This was mostly a fun hack.

It could however be the basis for a Fabric-based connector, or a clusterssh-based connector, or for bundling a Transilience playbook into an installation image, or to add a provisioning script to the boot partition of a Raspberry Pi. It looks like an interesting trick to have up one's sleeve.

One could even build an Ansible-based connector(!) in which a simple Ansible playbook, with no facts gathering, is used to build the zipapp, push it to remote systems and run it. That would be the wackiest way of speeding up Ansible, ever!

Next: using Systemd containers with unittest, for Transilience's test suite.