Latest posts for tag debian
Anarcat's "procmail considered harmful" post convinced me to get my act together and finally migrate my venerable procmail based setup to sieve.
My setup was nontrivial, so I migrated with an intermediate step in which sieve scripts would by default pipe everything to procmail, which allowed me to slowly move rules from procmailrc to sieve until nothing remained in procmailrc.
Here's what I did.
Literature review
https://brokkr.net/2019/10/31/lets-do-dovecot-slowly-and-properly-part-3-lmtp/ has a guide quite aligned with current Debian, and could be a starting point to get an idea of the work to do.
https://wiki.dovecot.org/HowTo/PostfixDovecotLMTP is way more terse, but more aligned with my intentions. Reading the former helped me in understanding the latter.
https://datatracker.ietf.org/doc/html/rfc5228 has the full Sieve syntax.
https://doc.dovecot.org/configuration_manual/sieve/pigeonhole_sieve_interpreter/ has the list of Sieve features supported by Dovecot.
https://doc.dovecot.org/settings/pigeonhole/ has the reference on Dovecot's sieve implementation.
https://raw.githubusercontent.com/dovecot/pigeonhole/master/doc/rfc/spec-bosch-sieve-extprograms.txt is the hard to find full reference for the functions introduced by the extprograms plugin.
Debugging tools:
- doveconf to dump dovecot's configuration to see if what it understands matches what I mean
- sieve-test parses sieve scripts: sieve-test file.sieve /dev/null is a quick and dirty syntax check
Backup of all mails processed
One thing I did with procmail was to generate a monthly mailbox with all incoming email, with something like this:
BACKUP="/srv/backupts/test-`date +%Y-%m`.mbox"
:0c
$BACKUP
I did not find an obvious way in sieve to create monthly mailboxes, so I redesigned that system using Postfix's always_bcc feature, piping everything to an archive user.
I'll then recreate the monthly archiving using a chewmail script that I can simply run via cron.
Configure dovecot
apt install dovecot-sieve dovecot-lmtpd
I added this to the local dovecot configuration:
service lmtp {
unix_listener /var/spool/postfix/private/dovecot-lmtp {
user = postfix
group = postfix
mode = 0666
}
}
protocol lmtp {
mail_plugins = $mail_plugins sieve
}
plugin {
sieve = file:~/.sieve;active=~/.dovecot.sieve
}
This makes Dovecot ready to receive mail from Postfix via an LMTP unix socket created in Postfix's private chroot.
It also activates the sieve plugin, and uses ~/.sieve as a sieve script. The script can be a file or a directory; if it is a directory, ~/.dovecot.sieve will be a symlink pointing to the .sieve file to run.
This is a feature I'm not yet using, but if one day I want to try enabling UIs to edit sieve scripts, that part is ready.
Delegate to procmail
To make sieve scripts that delegate to procmail, I enabled the sieve_extprograms plugin:
plugin {
sieve = file:~/.sieve;active=~/.dovecot.sieve
+ sieve_plugins = sieve_extprograms
+ sieve_extensions = +vnd.dovecot.pipe
+ sieve_pipe_bin_dir = /usr/local/lib/dovecot/sieve-pipe
+ sieve_trace_dir = ~/.sieve-trace
+ sieve_trace_level = matching
+ sieve_trace_debug = yes
}
and then created a script for it:
mkdir -p /usr/local/lib/dovecot/sieve-pipe/
(echo "#!/bin/sh'; echo "exec /usr/bin/procmail") > /usr/local/lib/dovecot/sieve-pipe/procmail
chmod 0755 /usr/local/lib/dovecot/sieve-pipe/procmail
And I can have a sieve script that delegates processing to procmail:
require "vnd.dovecot.pipe";
pipe "procmail";
Activate the postfix side
These changes switched local delivery over to Dovecot:
--- a/roles/mailserver/templates/dovecot.conf
+++ b/roles/mailserver/templates/dovecot.conf
@@ -25,6 +25,8 @@
…
+auth_username_format = %Ln
+
…
diff --git a/roles/mailserver/templates/main.cf b/roles/mailserver/templates/main.cf
index d2c515a..d35537c 100644
--- a/roles/mailserver/templates/main.cf
+++ b/roles/mailserver/templates/main.cf
@@ -64,8 +64,7 @@ virtual_alias_domains =
…
-mailbox_command = procmail -a "$EXTENSION"
-mailbox_size_limit = 0
+mailbox_transport = lmtp:unix:private/dovecot-lmtp
…
Without auth_username_format = %Ln, dovecot won't be able to understand usernames sent by postfix in my specific setup.
Moving rules over to sieve
This is mostly straightforward, with the luxury of being able to do it a bit at a time.
The last tricky bit was how to call spamc from sieve, as in some situations I reduce system load by running the spamfilter only on a prefiltered selection of incoming emails.
For this I enabled the filter directive in sieve:
plugin {
sieve = file:~/.sieve;active=~/.dovecot.sieve
sieve_plugins = sieve_extprograms
- sieve_extensions = +vnd.dovecot.pipe
+ sieve_extensions = +vnd.dovecot.pipe +vnd.dovecot.filter
sieve_pipe_bin_dir = /usr/local/lib/dovecot/sieve-pipe
+ sieve_filter_bin_dir = /usr/local/lib/dovecot/sieve-filter
sieve_trace_dir = ~/.sieve-trace
sieve_trace_level = matching
sieve_trace_debug = yes
}
Then I created a filter script:
mkdir -p /usr/local/lib/dovecot/sieve-filter/
(echo "#!/bin/sh"; echo "exec /usr/bin/spamc") > /usr/local/lib/dovecot/sieve-filter/spamc
chmod 0755 /usr/local/lib/dovecot/sieve-filter/spamc
And now what was previously:
:0 fw
| /usr/bin/spamc
:0
* ^X-Spam-Status: Yes
.spam/
Can become:
require "vnd.dovecot.filter";
require "fileinto";
filter "spamc";
if header :contains "x-spam-level" "**************" {
discard;
} elsif header :matches "X-Spam-Status" "Yes,*" {
fileinto "spam";
}
Updates
Ansgar mentioned that it's possible to replicate the monthly mailbox using the variables and date extensions, with a hacky trick from the extensions' RFC:
require "date"
require "variables"
if currentdate :matches "month" "*" { set "month" "${1}"; }
if currentdate :matches "year" "*" { set "year" "${1}"; }
fileinto :create "${month}-${year}";
This morning we realised that a test case failed on Fedora 34 only (the link is in Italian) and we set to debugging.
The initial analysis
This is the initial reproducer:
$ PROJ_DEBUG=3 python setup.py test
test_recipe (tests.test_litota3.TestLITOTA3NordArkimetIFS) ... pj_open_lib(proj.db): call fopen(/lib64/../share/proj/proj.db) - succeeded
proj_create: Open of /lib64/../share/proj/proj.db failed
pj_open_lib(proj.db): call fopen(/lib64/../share/proj/proj.db) - succeeded
proj_create: no database context specified
Cannot instantiate source_crs
EXCEPTION in py_coast(): ProjP: cannot create crs to crs from [EPSG:4326] to [+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +over +units=m +no_defs]
ERROR
Note that opening /lib64/../share/proj/proj.db sometimes succeeds, sometimes fails. It's some kind of Schrödinger path, which works or not depending on how you observe it:
# ls -lad /lib64
lrwxrwxrwx 1 1000 1000 9 Jan 26 2021 /lib64 -> usr/lib64
$ ls -la /lib64/../share/proj/proj.db
-rw-r--r-- 1 root root 8925184 Jan 28 2021 /lib64/../share/proj/proj.db
$ cd /lib64/../share/proj/
$ cd /lib64
$ cd ..
$ cd share
-bash: cd: share: No such file or directory
And indeed, stat(2) finds it, and sqlite doesn't (the file is a sqlite database):
$ stat /lib64/../share/proj/proj.db
File: /lib64/../share/proj/proj.db
Size: 8925184 Blocks: 17432 IO Block: 4096 regular file
Device: 33h/51d Inode: 56907 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2021-11-08 14:09:12.334350779 +0100
Modify: 2021-01-28 05:38:11.000000000 +0100
Change: 2021-11-08 13:42:51.758874327 +0100
Birth: 2021-11-08 13:42:51.710874051 +0100
$ sqlite3 /lib64/../share/proj/proj.db
Error: unable to open database "/lib64/../share/proj/proj.db": unable to open database file
A minimal reproducer
Later on we started stripping layers of code towards a minimal reproducer: here it is. It works or doesn't work depending on whether proj is linked explicitly, or via MagPlus:
$ cat tc.cc
#include <magics/ProjP.h>
int main() {
magics::ProjP p("EPSG:4326", "+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +over +units=m +no_defs");
return 0;
}
$ g++ -o tc tc.cc -I/usr/include/magics -lMagPlus
$ ./tc
proj_create: Open of /lib64/../share/proj/proj.db failed
proj_create: no database context specified
terminate called after throwing an instance of 'magics::MagicsException'
what(): ProjP: cannot create crs to crs from [EPSG:4326] to [+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +over +units=m +no_defs]
Aborted (core dumped)
$ g++ -o tc tc.cc -I/usr/include/magics -lproj -lMagPlus
$ ./tc
What is going on here?
A difference between the two is the path used to link to libproj.so:
$ ldd ./tc | grep proj
libproj.so.19 => /lib64/libproj.so.19 (0x00007fd4919fb000)
$ g++ -o tc tc.cc -I/usr/include/magics -lMagPlus
$ ldd ./tc | grep proj
libproj.so.19 => /lib64/../lib64/libproj.so.19 (0x00007f6d1051b000)
Common sense screams that this should not matter, but we chased an intuition and found that one of the ways proj looks for its database is relative to its shared library.
Indeed, gdb in hand, that dladdr call returns /lib64/../lib64/libproj.so.19.
From /lib64/../lib64/libproj.so.19, proj strips two path components from the end, presumably to go from something like /something/usr/lib/libproj.so to /something/usr.
So, dladdr returns /lib64/../lib64/libproj.so.19, which becomes /lib64/../, which becomes /lib64/../share/proj/proj.db, which exists on the file system and is used as a path to the database.
But depending on how you look at it, that path might or might not be valid: it passes the stat(2) check that stops the lookup for candidate paths, but sqlite is unable to open it.
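The path arithmetic is easy to mimic in a few lines of Python (a sketch of the logic only; proj's real implementation is in C):

import os

def candidate_proj_db(libproj_path: str) -> str:
    # Strip two path components from the shared library location,
    # then look for share/proj/proj.db under what remains
    prefix = os.path.dirname(os.path.dirname(libproj_path))
    return os.path.join(prefix, "share", "proj", "proj.db")

# Linked through libMagPlus's rpath: stat() accepts this, sqlite refuses it
print(candidate_proj_db("/lib64/../lib64/libproj.so.19"))  # /lib64/../share/proj/proj.db

# Linked directly: this does not exist, so proj falls back to PROJ_LIB
print(candidate_proj_db("/lib64/libproj.so.19"))           # /share/proj/proj.db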
Why does the other path work?
By linking libproj.so in the other way, dladdr returns /lib64/libproj.so.19, which becomes /share/proj/proj.db, which doesn't exist, which triggers a fallback to a PROJ_LIB constant defined at compile time, which is a path that works no matter how you look at it.
Why that weird path with libMagPlus?
To complete the picture, we found that libMagPlus.so is packaged with an rpath set, which is known to cause trouble:
# readelf -d /usr/lib64/libMagPlus.so|grep rpath
0x000000000000000f (RPATH) Library rpath: [$ORIGIN/../lib64]
The workaround
We found that one can set PROJ_LIB in the environment to override the normal proj database lookup. Building on that, we came up with a simple way to override it on Fedora 34 only:
if distro is not None and distro.linux_distribution()[:2] == ("Fedora", "34") and "PROJ_LIB" not in os.environ:
self.env_overrides["PROJ_LIB"] = "/usr/share/proj/"
This has been a most edifying and educational debugging session, with only the necessary modicum of curses and swearwords. Working in a team of excellent people really helps.
help2man is quite nice for autogenerating manpages from command line help, making sure that they stay up to date as command line options evolve.
It works quite well, except for commands with subcommands, like Python programs that use argparse's add_subparsers.
So, here's a quick hack that calls help2man for each subcommand, and stitches everything together in a simple manpage.
#!/usr/bin/python3
import re
import shutil
import sys
import subprocess
import tempfile
# TODO: move to argparse
command = sys.argv[1]
# Use setup.py to get the program version
res = subprocess.run([sys.executable, "setup.py", "--version"], stdout=subprocess.PIPE, text=True, check=True)
version = res.stdout.strip()
# Call the main commandline help to get a list of subcommands
res = subprocess.run([sys.executable, command, "--help"], stdout=subprocess.PIPE, text=True, check=True)
subcommands = re.sub(r'^.+\{(.+)\}.+$', r'\1', res.stdout, flags=re.DOTALL).split(',')
# Generate a help2man --include file with an extra section for each subcommand
with tempfile.NamedTemporaryFile("wt") as tf:
print("[>DESCRIPTION]", file=tf)
for subcommand in subcommands:
res = subprocess.run(
["help2man", f"--name={command}", "--section=1",
"--no-info", "--version-string=dummy", f"./{command} {subcommand}"],
stdout=subprocess.PIPE, text=True, check=True)
subcommand_doc = re.sub(r'^.+.SH DESCRIPTION', '', res.stdout, flags=re.DOTALL)
print(".SH ", subcommand.upper(), " SUBCOMMAND", file=tf)
tf.write(subcommand_doc)
with open(f"{command}.1.in", "rt") as fd:
shutil.copyfileobj(fd, tf)
tf.flush()
# Call help2man on the main command line help, with the extra include file
# we just generated
subprocess.run(
["help2man", f"--include={tf.name}", f"--name={command}",
"--section=1", "--no-info", f"--version-string={version}",
"--output=arkimaps.1", "./arkimaps"],
check=True)
I had this nightmare where I had a very, very important confcall.
I joined with Chrome. Chrome said "Failed to access your microphone - Cannot use microphone for an unknown reason. Could not start audio source".
I joined with Firefox. Firefox chose "Monitor of Built-in Audio Analog Stereo" as a microphone, and did not let me change it. Not in the browser, not in pavucontrol.
I joined with the browser on my phone, and the webpage said "This meeting needs to use your microphone and camera. Select *Allow* when your browser asks for permissions". But the question never came.
I could hear people talking. I had very important things to say. I tried typing them in the chat window, but they weren't seeing it. The meeting ended. I was on the verge of tears.
Tell me, Mr. Anderson, what good is a phone call when you are unable to speak?
Since this nightmare happened for real, including the bit about tears in the end, let's see that it doesn't happen again. I should now have three working systems, which hopefully won't all break again all at the same time.
Fixing Chrome
I can reproduce this reliably, on Bullseye's standard Chromium 90.0.4430.212-1, just launched on an empty profile, no extensions.
The webpage has camera and microphone allowed. Chrome doesn't show up in the recording tab of pulseaudio. Nothing on Chrome's stdout/stderr.
JavaScript console has:
Logger.js:154 2021-09-10Txx:xx:xx.xxxZ [features/base/tracks] Failed to create local tracks
Array(2)
DOMException: Could not start audio source
I found the answer here:
I had the similar problem once with chromium. i could solve it by switching in preferences->microphone-> from "default" to "intern analog stereo".
Opening the little popup next to the microphone/mute button allows choosing other microphones, which work. Only "Same as system (Default)" does not work.
Fixing Firefox
I have firefox-esr 78.13.0esr-1~deb11u1. In Jitsi, microphone selection is disabled on the toolbar and in the settings menu. In pavucontrol, changing the recording device for Firefox has no effect. If for some reason the wrong microphone got chosen, those are not ways of fixing it.
What I found works is to click on the camera permission icon, remove microphone permission, then reload the page. At that point Firefox will ask for permission again, and that microphone selection seems to work.
Relevant bugs: on Jitsi and on Firefox. Since this is well known (once you find the relevant issues), I'd have appreciated Jitsi at least showing a link to an explanation of workarounds on Firefox, instead of just disabling microphone selection.
Fixing Jitsi on the phone side
I really don't want to preemptively give camera and microphone permissions to my phone browser. I noticed that there's the Jitsi app on F-Droid and much as I hate to use an app when a website would work, at least in this case it's a way to keep the permission sets separate, so I installed that.
Fixing pavucontrol?
I tried to find out why I can't change the input device for Firefox in pavucontrol. I only managed to find an Ask Ubuntu question with no answer and a Unix StackExchange question with no answer.
I'm creating a program that uses the web browser for its user interface, and I'm reasonably sure I'm not the first person doing this.
Normally such a program would listen to a port on localhost, and tell the browser to connect to it. Bonus points for listening to a randomly allocated free port, so that one does not need to involve some amount of luck to get the program started.
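For reference, grabbing a randomly allocated free port on localhost is straightforward in Python: bind to port 0 and ask the socket which port the kernel picked (a minimal sketch, not my actual program):

import http.server
import socketserver

# Bind to 127.0.0.1:0 and let the kernel pick a free port
with socketserver.TCPServer(("127.0.0.1", 0), http.server.SimpleHTTPRequestHandler) as httpd:
    host, port = httpd.server_address
    print(f"Point the browser at http://{host}:{port}/")
    httpd.serve_forever()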
However, using a local port still means that any user on the local machine can connect to it, which is generally a security issue.
A possible solution would be to use AF_UNIX Unix Domain Sockets, which are supported by various web servers, but as far as I understand not currently by browsers. I checked Firefox and Chrome, and they currently seem to fail to even acknowledge the use case.
I'm reasonably sure I'm not the first person doing this, and yes, it's intended as an understatement.
So, dear Lazyweb, is there a way to securely use a browser as a UI for a user's program, without exposing access to the backend to other users in the system?
Access token in the URL
Emanuele Di Giacomo suggests adding an access token to the URL that gets passed to the browser.
This would work to protect access on localhost: even if the application cannot use HTTPS, other users cannot see packets that go through the local interface, so both the access token and the session cookie that one could send afterwards would be protected.
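A minimal sketch of the idea (names and URL layout are made up for illustration): generate an unguessable token at startup, put it in the URL handed to the browser, and reject requests that do not present it:

import secrets

# Generated once at startup; other local users cannot guess it
ACCESS_TOKEN = secrets.token_urlsafe(32)

def is_authorized(request_token: str) -> bool:
    # Constant-time comparison, to avoid leaking the token via timing
    return secrets.compare_digest(request_token, ACCESS_TOKEN)

# The URL passed to the browser would look like
# http://127.0.0.1:PORT/?token=<ACCESS_TOKEN>
# and after the first valid request the server can switch to a session cookie.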
Network namespaces
I thought about isolating server and browser in a private network namespace with something like unshare(1), but it seems to require root.
Johannes Schauer Marin Rodrigues wrote to correct that:
It's possible to unshare the network namespace by first unsharing the user namespace and thus becoming root which is possible without being root since #898446 got fixed.
For example you can run this as the normal user:
lxc-usernsexec -- lxc-unshare -s NETWORK -- ip addr
If you don't want to depend on lxc, you can write a wrapper in Perl or Python. I have a Perl implementation of that in mmdebstrap.
Firewalling
Martin Schuster wrote to suggest another option:
I had the same issue. My approach was "weird", but worked: Block /outgoing/ connections to the port, unless the uid is correct. That might be counter-intuitive, but of course all connections /to/ localhost will be done /from/ localhost also.
Something like:
iptables -A OUTPUT -p tcp -d localhost --dport 8123 -m owner --uid-owner joe -j ACCEPT
iptables -A OUTPUT -p tcp -d localhost --dport 8123 -j REJECT
User checking with /proc/net/tcp
23:37 #debian-rant < _jwilk:#debian-rant> enrico: Re https://www.enricozini.org/blog/2021/debian/run-a-webserver-for-a-specific-user-only/,
on Linux you can check /proc/net/tcp to see if the connection comes from the right user. I've seen
it implemented here: https://sources.debian.org/src/agedu/9723-1/httpd.c/#L389
23:37 #debian-rant < _jwilk:#debian-rant> But...
23:40 #debian-rant < _jwilk:#debian-rant> The trouble is that https://evil.example.org/ can include
[a reference to http://localhost:PORT/] and the browser will happily make that request.
23:42 #debian-rant < _jwilk:#debian-rant> This is the same user from the OS point view, so /proc/net/tcp or iptables trickery doesn't help.
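For completeness, here is a rough Python sketch of the /proc/net/tcp check that agedu implements in C: given the ephemeral port of a peer connecting over localhost, find the socket entry bound to that port and read its uid column. As the quote above points out, this identifies the user, not the web page, so it does not help against a malicious site driving that same user's browser.

from typing import Optional

def uid_of_local_peer(peer_port: int) -> Optional[int]:
    """Return the uid owning the local socket bound to peer_port, if any."""
    with open("/proc/net/tcp") as fd:
        next(fd)  # skip the header line
        for line in fd:
            fields = line.split()
            local_address = fields[1]   # hexadecimal "ADDR:PORT"
            uid = int(fields[7])
            port = int(local_address.split(":")[1], 16)
            if port == peer_port:
                return uid
    # A real check would also match the address and look at /proc/net/tcp6
    return None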
This is part of a series of posts on ideas for an ansible-like provisioning system, implemented in Transilience.
Unit testing some parts of Transilience, like the apt and systemd actions, or remote Mitogen connections, can really use a containerized system for testing.
To have that, I reused my work on nspawn-runner to build a simple and very fast system of ephemeral containers, with minimal dependencies, based on systemd-nspawn and btrfs snapshots:
Setup
To be able to use systemd-nspawn --ephemeral, the chroots need to be btrfs subvolumes. If you are not running on a btrfs filesystem, you can create one to run the tests, even on a file:
fallocate -l 1.5G testfile
/usr/sbin/mkfs.btrfs testfile
sudo mount -o loop testfile test_chroots/
I created a script to setup the test environment, here is an extract:
mkdir -p test_chroots

cat << EOF > "test_chroots/CACHEDIR.TAG"
Signature: 8a477f597d28d172789f06886806bc55
# chroots used for testing transilience, can be regenerated with make-test-chroot
EOF

btrfs subvolume create test_chroots/buster
eatmydata debootstrap --variant=minbase --include=python3,dbus,systemd buster test_chroots/buster
CACHEDIR.TAG is a nice trick to tell backup software not to bother backing up the contents of this directory, since it can be easily regenerated.
eatmydata is optional, and it speeds up debootstrap quite a bit.
Running unittest with sudo
Here's a simple helper to drop root as soon as possible, and regain it only when needed. Note that it needs $SUDO_UID and $SUDO_GID, which are set by sudo, to know which user to drop into:
class ProcessPrivs:
    """
    Drop root privileges and regain them only when needed
    """
    def __init__(self):
        self.orig_uid, self.orig_euid, self.orig_suid = os.getresuid()
        self.orig_gid, self.orig_egid, self.orig_sgid = os.getresgid()

        if "SUDO_UID" not in os.environ:
            raise RuntimeError("Tests need to be run under sudo")

        self.user_uid = int(os.environ["SUDO_UID"])
        self.user_gid = int(os.environ["SUDO_GID"])

        self.dropped = False

    def drop(self):
        """
        Drop root privileges
        """
        if self.dropped:
            return
        os.setresgid(self.user_gid, self.user_gid, 0)
        os.setresuid(self.user_uid, self.user_uid, 0)
        self.dropped = True

    def regain(self):
        """
        Regain root privileges
        """
        if not self.dropped:
            return
        os.setresuid(self.orig_suid, self.orig_suid, self.user_uid)
        os.setresgid(self.orig_sgid, self.orig_sgid, self.user_gid)
        self.dropped = False

    @contextlib.contextmanager
    def root(self):
        """
        Regain root privileges for the duration of this context manager
        """
        if not self.dropped:
            yield
        else:
            self.regain()
            try:
                yield
            finally:
                self.drop()

    @contextlib.contextmanager
    def user(self):
        """
        Drop root privileges for the duration of this context manager
        """
        if self.dropped:
            yield
        else:
            self.drop()
            try:
                yield
            finally:
                self.regain()


privs = ProcessPrivs()
privs.drop()
As soon as this module is loaded, root privileges are dropped, and can be regained for as little as possible using a handy context manager:
with privs.root():
    subprocess.run(["systemd-run", ...], check=True, capture_output=True)
Using the chroot from test cases
The infrastructure to set up and spin down ephemeral machines is relatively simple, once one has worked out the nspawn incantations:
class Chroot:
    """
    Manage an ephemeral chroot
    """
    running_chroots: Dict[str, "Chroot"] = {}

    def __init__(self, name: str, chroot_dir: Optional[str] = None):
        self.name = name
        if chroot_dir is None:
            self.chroot_dir = self.get_chroot_dir(name)
        else:
            self.chroot_dir = chroot_dir
        self.machine_name = f"transilience-{uuid.uuid4()}"

    def start(self):
        """
        Start nspawn on this given chroot.

        The systemd-nspawn command is run contained into its own unit using
        systemd-run
        """
        unit_config = [
            'KillMode=mixed',
            'Type=notify',
            'RestartForceExitStatus=133',
            'SuccessExitStatus=133',
            'Slice=machine.slice',
            'Delegate=yes',
            'TasksMax=16384',
            'WatchdogSec=3min',
        ]

        cmd = ["systemd-run"]
        for c in unit_config:
            cmd.append(f"--property={c}")

        cmd.extend((
            "systemd-nspawn",
            "--quiet",
            "--ephemeral",
            f"--directory={self.chroot_dir}",
            f"--machine={self.machine_name}",
            "--boot",
            "--notify-ready=yes"))

        log.info("%s: starting machine using image %s", self.machine_name, self.chroot_dir)

        log.debug("%s: running %s", self.machine_name, " ".join(shlex.quote(c) for c in cmd))
        with privs.root():
            subprocess.run(cmd, check=True, capture_output=True)
        log.debug("%s: started", self.machine_name)
        self.running_chroots[self.machine_name] = self

    def stop(self):
        """
        Stop the running ephemeral containers
        """
        cmd = ["machinectl", "terminate", self.machine_name]
        log.debug("%s: running %s", self.machine_name, " ".join(shlex.quote(c) for c in cmd))
        with privs.root():
            subprocess.run(cmd, check=True, capture_output=True)
        log.debug("%s: stopped", self.machine_name)
        del self.running_chroots[self.machine_name]

    @classmethod
    def create(cls, chroot_name: str) -> "Chroot":
        """
        Start an ephemeral machine from the given master chroot
        """
        res = cls(chroot_name)
        res.start()
        return res

    @classmethod
    def get_chroot_dir(cls, chroot_name: str):
        """
        Locate a master chroot under test_chroots/
        """
        chroot_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "test_chroots", chroot_name))
        if not os.path.isdir(chroot_dir):
            raise RuntimeError(f"{chroot_dir} does not exist or is not a chroot directory")
        return chroot_dir


# We need to use atexit, because unittest won't run
# tearDown/tearDownClass/tearDownModule methods in case of KeyboardInterrupt
# and we need to make sure to terminate the nspawn containers at exit
@atexit.register
def cleanup():
    # Use a list to prevent changing running_chroots during iteration
    for chroot in list(Chroot.running_chroots.values()):
        chroot.stop()
And here's a TestCase mixin that starts a containerized system and opens a Mitogen connection to it:
class ChrootTestMixin:
    """
    Mixin to run tests over a setns connection to an ephemeral systemd-nspawn
    container running one of the test chroots
    """
    chroot_name = "buster"

    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        import mitogen
        from transilience.system import Mitogen
        cls.broker = mitogen.master.Broker()
        cls.router = mitogen.master.Router(cls.broker)
        cls.chroot = Chroot.create(cls.chroot_name)
        with privs.root():
            cls.system = Mitogen(
                    cls.chroot.name, "setns", kind="machinectl",
                    python_path="/usr/bin/python3",
                    container=cls.chroot.machine_name, router=cls.router)

    @classmethod
    def tearDownClass(cls):
        super().tearDownClass()
        cls.system.close()
        cls.broker.shutdown()
        cls.chroot.stop()
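A test case then just mixes it in; here is a minimal, hypothetical example (not from Transilience's actual test suite), mostly to show where the pieces end up:

import unittest

class TestInContainer(ChrootTestMixin, unittest.TestCase):
    # chroot_name defaults to "buster"; each test class gets its own
    # ephemeral container, started in setUpClass and torn down in tearDownClass
    def test_connection(self):
        # self.system is the Mitogen connection into the container,
        # ready to run actions against the ephemeral system
        self.assertIsNotNone(self.system)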
Running tests
Once the tests are set up, everything goes on as normal, except one needs to run nose2 with sudo:
sudo nose2-3
Spin-up time for containers is pretty fast, and the tests drop root as soon as possible, regaining it only for as little as needed.
Also, dependencies for all this are minimal and available on most systems, and the setup instructions seem pretty straightforward.
This is part of a series of posts on ideas for an ansible-like provisioning system, implemented in Transilience.
Mitogen is a great library, but scarily complicated, and I've been wondering how hard it would be to make alternative connection methods for Transilience.
Here's a wild idea: can I package a whole Transilience playbook, plus dependencies, in a zipapp, then send the zipapp to the machine to be provisioned, and run it locally?
It turns out I can.
Creating the zipapp
This is somewhat hackish, but until I can rely on Python 3.9's improved importlib.resources module, I cannot think of a better way:
def zipapp(self, target: str, interpreter=None):
"""
Bundle this playbook into a self-contained zipapp
"""
import zipapp
import jinja2
import transilience
if interpreter is None:
interpreter = sys.executable
if getattr(transilience.__loader__, "archive", None):
# Recursively iterating module directories requires Python 3.9+
raise NotImplementedError("Cannot currently create a zipapp from a zipapp")
with tempfile.TemporaryDirectory() as workdir:
# Copy transilience
shutil.copytree(os.path.dirname(__file__), os.path.join(workdir, "transilience"))
# Copy jinja2
shutil.copytree(os.path.dirname(jinja2.__file__), os.path.join(workdir, "jinja2"))
# Copy argv[0] as __main__.py
shutil.copy(sys.argv[0], os.path.join(workdir, "__main__.py"))
# Copy argv[0]/roles
role_dir = os.path.join(os.path.dirname(sys.argv[0]), "roles")
if os.path.isdir(role_dir):
shutil.copytree(role_dir, os.path.join(workdir, "roles"))
# Turn everything into a zipapp
zipapp.create_archive(workdir, target, interpreter=interpreter, compressed=True)
Since the zipapp contains not just the playbook, the roles, and the roles' assets, but also Transilience and Jinja2, it can run on any system that has a Python 3.7+ interpreter, and nothing else!
I added it to the standard set of playbook command line options, so any Transilience playbook can turn itself into a self-contained zipapp:
$ ./provision --help
usage: provision [-h] [-v] [--debug] [-C] [--local LOCAL]
[--ansible-to-python role | --ansible-to-ast role | --zipapp file.pyz]
[...]
--zipapp file.pyz bundle this playbook in a self-contained executable
python zipapp
Loading assets from the zipapp
I had to create ZipFile varieties of some bits of infrastructure in Transilience, to load templates, files, and Ansible yaml files from zip files.
You can see above a way to detect if a module is loaded from a zipfile: check if the module's __loader__ attribute has an archive attribute.
Here's a Jinja2 template loader that looks into a zip:
class ZipLoader(jinja2.BaseLoader):
def __init__(self, archive: zipfile.ZipFile, root: str):
self.zipfile = archive
self.root = root
def get_source(self, environment: jinja2.Environment, template: str):
path = os.path.join(self.root, template)
with self.zipfile.open(path, "r") as fd:
source = fd.read().decode()
return source, None, lambda: True
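Using it is the same as using any other Jinja2 loader; for example (paths and variables here are illustrative):

import sys
import zipfile
import jinja2

archive = zipfile.ZipFile(sys.argv[0])  # the running zipapp is itself a zip file
env = jinja2.Environment(loader=ZipLoader(archive, "roles/mailserver/templates"))
template = env.get_template("dovecot.conf")
print(template.render(myhostname="mail.example.org"))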
I also created a FileAsset abstract interface to represent a local file, and had Role.lookup_file return an appropriate instance:
def lookup_file(self, path: str) -> str:
"""
Resolve a pathname inside the place where the role assets are stored.
Returns a pathname to the file
"""
if self.role_assets_zipfile is not None:
return ZipFileAsset(self.role_assets_zipfile, os.path.join(self.role_assets_root, path))
else:
return LocalFileAsset(os.path.join(self.role_assets_root, path))
An interesting side effect of having smarter local file accessors is that I can cache the contents of small files and transmit them to the remote host together with the other action parameters, saving a potential network round trip for each builtin.copy action that has a small source.
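A sketch of the idea (this is not Transilience's actual FileAsset interface, just the gist of it): a file asset can hand over its contents directly when they are small enough to be worth inlining with the action:

from typing import Optional

class LocalFileAssetSketch:
    # Below this size it is cheaper to ship the contents along with the
    # action parameters than to pay an extra round trip for the file
    INLINE_THRESHOLD = 16 * 1024

    def __init__(self, path: str):
        self.path = path

    def cached_contents(self) -> Optional[bytes]:
        with open(self.path, "rb") as fd:
            data = fd.read(self.INLINE_THRESHOLD + 1)
        if len(data) <= self.INLINE_THRESHOLD:
            return data
        return None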
The result
The result is kind of fun:
$ time ./provision --zipapp test.pyz
real 0m0.203s
user 0m0.174s
sys 0m0.029s
$ time scp test.pyz root@test:
test.pyz 100% 528KB 388.9KB/s 00:01
real 0m1.576s
user 0m0.010s
sys 0m0.007s
And on the remote:
# time ./test.pyz --local=test
2021-06-29 18:05:41,546 test: [connected 0.000s]
[...]
2021-06-29 18:12:31,555 test: 88 total actions in 0.00ms: 87 unchanged, 0 changed, 1 skipped, 0 failed, 0 not executed.
real 0m0.979s
user 0m0.783s
sys 0m0.172s
Compare with a Mitogen run:
$ time PYTHONPATH=../transilience/ ./provision
2021-06-29 18:13:44 test: [connected 0.427s]
[...]
2021-06-29 18:13:46 test: 88 total actions in 2.50s: 87 unchanged, 0 changed, 1 skipped, 0 failed, 0 not executed.
real 0m2.697s
user 0m0.856s
sys 0m0.042s
From a single test run, not a good benchmark, it's 0.203 + 1.576 + 0.979 = 2.758s with the zipapp and 2.697s with Mitogen. Even if I've been lucky, it's a similar order of magnitude.
What can I use this for?
This was mostly a fun hack.
It could however be the basis for a Fabric-based connector, or a clusterssh-based connector, or for bundling a Transilience playbook into an installation image, or to add a provisioning script to the boot partition of a Raspberry Pi. It looks like an interesting trick to have up one's sleeve.
One could even build an Ansible-based connector(!) in which a simple Ansible playbook, with no facts gathering, is used to build the zipapp, push it to remote systems and run it. That would be the wackiest way of speeding up Ansible, ever!
Next: using Systemd containers with unittest, for Transilience's test suite.
This is part of a series of posts on ideas for an ansible-like provisioning system, implemented in Transilience.
I thought a lot of what I managed to do so far with Transilience would be impossible, but then here I am. How about Ansible conditionals? Those must be impossible, right?
Let's give it a try.
A quick recon of Ansible sources
Looking into Ansible's sources, when expressions are lists of strings AND-ed together.
The expressions are Jinja2 expressions that Ansible pastes into a mini-template, renders, and checks the string that comes out.
A quick recon of Jinja2
Jinja2 has a convenient function (jinja2.Environment.compile_expression) that compiles a template snippet into a Python function.
It can also parse a template into an AST that can be inspected in various ways.
Evaluating Ansible conditionals in Python
Environment.compile_expression seems to really do precisely what we need for this, straight out of the box.
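A quick demonstration with plain Jinja2 and nothing else, using the same conditional that appears later in this post:

import jinja2

env = jinja2.Environment()
expr = env.compile_expression("(is_test is defined and is_test) or debug is defined")
print(expr(is_test=True))   # True
print(expr(debug=1))        # True
print(expr())               # False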
There is an issue with the concept of "defined": for Ansible it seems to mean
"the variable is present in the template context". In Transilience instead, all
variables are fields in the Role dataclass, and can be None
when not set.
This means that we need to remove variables that are set to None
before
passing the parameters to the compiled Jinjae expression:
class Conditional:
"""
An Ansible conditional expression
"""
def __init__(self, engine: template.Engine, body: str):
# Original unparsed expression
self.body: str = body
# Expression compiled to a callable
self.expression: Callable = engine.env.compile_expression(body)
def evaluate(self, ctx: Dict[str, Any]):
ctx = {name: val for name, val in ctx.items() if val is not None}
return self.expression(**ctx)
Generating Python code
Transilience does not only support running Ansible roles, but also converting them to Python code. I can keep this up by traversing the Jinja2 AST generating Python expressions.
The code is straightforward enough that I can throw in a bit of pattern matching to make some expressions more idiomatic for Python:
class Conditional:
def __init__(self, engine: template.Engine, body: str):
...
parser = jinja2.parser.Parser(engine.env, body, state='variable')
self.jinja2_ast: nodes.Node = parser.parse_expression()
def get_python_code(self) -> str:
        return to_python_code(self.jinja2_ast)
def to_python_code(node: nodes.Node) -> str:
if isinstance(node, nodes.Name):
if node.ctx == "load":
return f"self.{node.name}"
else:
raise NotImplementedError(f"jinja2 Name nodes with ctx={node.ctx!r} are not supported: {node!r}")
elif isinstance(node, nodes.Test):
if node.name == "defined":
return f"{to_python_code(node.node)} is not None"
elif node.name == "undefined":
return f"{to_python_code(node.node)} is None"
else:
raise NotImplementedError(f"jinja2 Test nodes with name={node.name!r} are not supported: {node!r}")
elif isinstance(node, nodes.Not):
if isinstance(node.node, nodes.Test):
# Special case match well-known structures for more idiomatic Python
if node.node.name == "defined":
return f"{to_python_code(node.node.node)} is None"
elif node.node.name == "undefined":
return f"{to_python_code(node.node.node)} is not None"
elif isinstance(node.node, nodes.Name):
return f"not {to_python_code(node.node)}"
return f"not ({to_python_code(node.node)})"
elif isinstance(node, nodes.Or):
return f"({to_python_code(node.left)} or {to_python_code(node.right)})"
elif isinstance(node, nodes.And):
return f"({to_python_code(node.left)} and {to_python_code(node.right)})"
else:
raise NotImplementedError(f"jinja2 {node.__class__} nodes are not supported: {node!r}")
Scanning for variables
Lastly, I can implement scanning conditionals for variable references to add as fields to the Role dataclass:
class FindVars(jinja2.visitor.NodeVisitor):
def __init__(self):
self.found: Set[str] = set()
def visit_Name(self, node):
if node.ctx == "load":
self.found.add(node.name)
class Conditional:
...
def list_role_vars(self) -> Sequence[str]:
fv = FindVars()
fv.visit(self.jinja2_ast)
return fv.found
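For example, fed the same conditional used later in this post (and assuming the FindVars class above is in scope), it digs out its two variables:

import jinja2
import jinja2.parser

env = jinja2.Environment()
parser = jinja2.parser.Parser(
        env, "(is_test is defined and is_test) or debug is defined",
        state='variable')
fv = FindVars()
fv.visit(parser.parse_expression())
print(fv.found)  # {'is_test', 'debug'}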
The result in action
Take this simple Ansible task:
---
- name: Example task
file:
state: touch
path: /tmp/test
when: (is_test is defined and is_test) or debug is defined
Run it through ./provision --ansible-to-python test
and you get:
from __future__ import annotations
from typing import Any
from transilience import role
from transilience.actions import builtin, facts
@role.with_facts([facts.Platform])
class Role(role.Role):
# Role variables used by templates
debug: Any = None
is_test: Any = None
def all_facts_available(self):
if ((self.is_test is not None and self.is_test)
or self.debug is not None):
self.add(
builtin.file(path='/tmp/test', state='touch'),
name='Example task')
Besides one harmless set of parentheses too many, what I wasn't sure would be possible is there, right there, staring at me with a mischievous grin.
This is part of a series of posts on ideas for an Ansible-like provisioning system, implemented in Transilience.
The time has come for me to try and prototype if it's possible to load some Transilience roles from Ansible's YAML instead of Python.
The data models of Transilience and Ansible are not exactly the same. Some of the differences that come to mind:
- Ansible has a big pot of global variables; Transilience has a well defined set of role-specific variables.
- Roles in Ansible are little more than a chunk of playbook that one includes; Roles in Transilience are self-contained and isolated, support pipelined batches of tasks, and can use full Python logic.
- Transilience does not have a template action: the equivalent is a copy action that uses the Role's rendering engine to render the template.
- Handlers in Ansible are tasks identified by a name in a global namespace; handlers in Transilience are Roles, identified by their Python classes.
To simplify the work, I'll start from loading a single role out of Ansible, not an entire playbook.
TL;DR: scroll to the bottom of the post for the conclusion!
Loading tasks
The first problem of loading an Ansible task is to figure out which of the keys is the module name. I have so far failed to find precise reference documentation about what keywords are used to define a task, so I'm going by guesswork, and if needed a look at Ansible's sources.
My first attempt goes by excluding all known non-module keywords:
candidates = []
for key in task_info.keys():
if key in ("name", "args", "notify"):
continue
candidates.append(key)
if len(candidates) != 1:
raise RoleNotLoadedError(f"could not find a known module in task {task_info!r}")
modname = candidates[0]
if modname.startswith("ansible.builtin."):
name = modname[16:]
else:
name = modname
This means that Ansible keywords like when or with will break the parsing, and it's fine since they are not supported yet.
args seems to carry arguments to the module, when the module main argument is not a dict, as may happen at least with the command module.
Task parameters
One can do all sorts of chaotic things to pass parameters to Ansible tasks: for example, string lists can be lists of strings or strings with comma-separated values, they can be preprocessed via Jinja2 templating, and they can be complex data structures that might contain strings that need Jinja2 preprocessing.
I ended up mapping the behaviours I encountered in an AST-like class hierarchy which includes recursive complex structures.
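The gist is something like this sketch (class names here are made up; the real hierarchy also handles recursive lists and dicts), matching the scalar, template_string and template_path node types that show up in the AST dumps further down:

class Parameter:
    """A parsed Ansible task parameter"""
    def resolve(self, role):
        raise NotImplementedError

class ScalarParameter(Parameter):
    """A plain value, passed through unchanged"""
    def __init__(self, value):
        self.value = value

    def resolve(self, role):
        return self.value

class TemplateStringParameter(Parameter):
    """A string rendered through the role's Jinja2 engine"""
    def __init__(self, value):
        self.value = value

    def resolve(self, role):
        return role.render_string(self.value)

class TemplatePathParameter(Parameter):
    """A path to a template file, rendered through the role's engine"""
    def __init__(self, value):
        self.value = value

    def resolve(self, role):
        return role.render_file(self.value)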
Variables
Variables look hard: Ansible has a big free messy cauldron of global variables, and Transilience needs a predefined list of per-role variables.
However, variables are mainly used inside Jinja2 templates, and Jinja2 can parse to an Abstract Syntax Tree and has useful methods to examine its AST.
Using that, I managed with reasonable effort to scan an Ansible role and generate a list of all the variables it uses! I can then use that list, filter out facts-specific names like ansible_domain, and use them to add variable definitions to the Transilience roles. That is exciting!
Handlers
Before loading tasks, I load handlers as one-action roles, and index them by name. When an Ansible task notifies a handler, I can then look up by name the roles I generated in the earlier pass, and I have all that I need.
Parsed Abstract Syntax Tree
Most of the results of all this parsing started looking like an AST, so I changed the rest of the prototype to generate an AST.
This means that, for a well defined subset of Ansible's YAML, there exists now a tool that is able to parse it into an AST and reason about it.
Transilience's playbooks gained a --ansible-to-ast option to parse an Ansible role and dump the resulting AST as JSON:
$ ./provision --help
usage: provision [-h] [-v] [--debug] [-C] [--ansible-to-python role]
[--ansible-to-ast role]
Provision my VPS
optional arguments:
[...]
-C, --check do not perform changes, but check if changes would be
needed
--ansible-to-ast role
print the AST of the given Ansible role as understood
by Transilience
The result is extremely verbose, since every parameter is itself a node in the tree, but I find it interesting.
Here is, for example, a node for an Ansible task which has a templated parameter:
{
"node": "task",
"action": "builtin.blockinfile",
"parameters": {
"path": {
"node": "parameter",
"type": "scalar",
"value": "/etc/aliases"
},
"block": {
"node": "parameter",
"type": "template_string",
"value": "root: {{postmaster}}\n{% for name, dest in aliases.items() %}\n{{name}}: {{dest}}\n{% endfor %}\n"
}
},
"ansible_yaml": {
"name": "configure /etc/aliases",
"blockinfile": {},
"notify": "reread /etc/aliases"
},
"notify": [
"RereadEtcAliases"
]
},
Here's a node for an Ansible template task converted to Transilience's model:
{
"node": "task",
"action": "builtin.copy",
"parameters": {
"dest": {
"node": "parameter",
"type": "scalar",
"value": "/etc/dovecot/local.conf"
},
"src": {
"node": "parameter",
"type": "template_path",
"value": "dovecot.conf"
}
},
"ansible_yaml": {
"name": "configure dovecot",
"template": {},
"notify": "restart dovecot"
},
"notify": [
"RestartDovecot"
]
},
Executing
The first iteration of prototype code for executing parsed Ansible roles is a little exercise in closures and dynamically generated types:
def get_role_class(self) -> Type[Role]:
# If we have handlers, instantiate role classes for them
handler_classes = {}
for name, ansible_role in self.handlers.items():
handler_classes[name] = ansible_role.get_role_class()
# Create all the functions to start actions in the role
start_funcs = []
for task in self.tasks:
start_funcs.append(task.get_start_func(handlers=handler_classes))
# Function that calls all the 'Action start' functions
def role_main(self):
for func in start_funcs:
func(self)
if self.uses_facts:
role_cls = type(self.name, (Role,), {
"start": lambda host: None,
"all_facts_available": role_main
})
role_cls = dataclass(role_cls)
role_cls = with_facts(facts.Platform)(role_cls)
else:
role_cls = type(self.name, (Role,), {
"start": role_main
})
role_cls = dataclass(role_cls)
return role_cls
Now that the parsed Ansible role is a proper AST, I'm considering redesigning that using a generic Role class that works as an AST interpreter.
Generating Python
I maintain a library that can turn an invoice into Python code, and I have a convenient AST. I can't not generate Python code out of an Ansible role!
$ ./provision --help
usage: provision [-h] [-v] [--debug] [-C] [--ansible-to-python role]
[--ansible-to-ast role]
Provision my VPS
optional arguments:
[...]
--ansible-to-python role
print the given Ansible role as Transilience Python
code
--ansible-to-ast role
print the AST of the given Ansible role as understood
by Transilience
And will you look at this annotated extract:
$ ./provision --ansible-to-python mailserver
from __future__ import annotations
from typing import Any
from transilience import role
from transilience.actions import builtin, facts
# Role classes generated from Ansible handlers!
class ReloadPostfix(role.Role):
def start(self):
self.add(
builtin.systemd(unit='postfix', state='reloaded'),
name='reload postfix')
class RestartDovecot(role.Role):
def start(self):
self.add(
builtin.systemd(unit='dovecot', state='restarted'),
name='restart dovecot')
# The role, including a standard set of facts
@role.with_facts([facts.Platform])
class Role(role.Role):
# These are the variables used by Jinja2 template files and strings. I need
# to use Any, since Ansible variables are not typed
aliases: Any = None
myhostname: Any = None
postmaster: Any = None
virtual_domains: Any = None
def all_facts_available(self):
...
# A Jinja2 string inside a string list!
self.add(
builtin.command(
argv=[
'certbot', 'certonly', '-d',
self.render_string('mail.{{ansible_domain}}'), '-n',
'--apache'
],
creates=self.render_string(
'/etc/letsencrypt/live/mail.{{ansible_domain}}/fullchain.pem'
)),
name='obtain mail.* letsencrypt certificate')
# A converted template task!
self.add(
builtin.copy(
dest='/etc/dovecot/local.conf',
src=self.render_file('templates/dovecot.conf')),
name='configure dovecot',
# Notify referring to the corresponding Role class!
notify=RestartDovecot)
# Referencing a variable collected from a fact!
self.add(
builtin.copy(dest='/etc/mailname', content=self.ansible_domain),
name='configure /etc/mailname',
notify=ReloadPostfix)
...
Conclusion
Transilience can load a (growing) subset of Ansible syntax, one role at a time, which contains:
- All actions defined in Transilience's builtin.* namespace
- Ansible's template module (without block_start_string, block_end_string, lstrip_blocks, newline_sequence, output_encoding, trim_blocks, validate, variable_end_string, variable_start_string)
- Jinja2 templates in string parameters, even when present inside lists and dicts and nested lists and dicts
- Variables from facts provided by transilience.actions.facts.Platform
- Variables used in Jinja2 templates, both in strings and in files, provided by host vars, group vars, role parameters, and facts
- Notify using handlers defined within the role. Notifying handlers from other roles is not supported, since roles in Transilience are self-contained
The role loader in Transilience now looks for YAML when it does not find a Python module, and runs it pipelined and fast!
There is code to generate Python code from an Ansible role: you can take an Ansible role, convert it to Python, and then work on it to add more complex logic, or clean it up for adding it to a library of reusable roles!
Next: Ansible conditionals
This is part of a series of posts on ideas for an ansible-like provisioning system, implemented in Transilience.
I added check mode to Transilience, to do everything except perform changes, like Ansible does:
$ ./provision --help
usage: provision [-h] [-v] [--debug] [-C] [--to-python role]
Provision my VPS
optional arguments:
-h, --help show this help message and exit
-v, --verbose verbose output
--debug verbose output
-C, --check do not perform changes, but check if changes would be ← NEW!
needed ← NEW!
It was quite straightforward to add a new field to the base Action class, and tweak the implementations not to perform changes if it is True:
# Shortcut function to annotate dataclass fields with documentation metadata
def doc(default: Any, doc: str, **kw):
return field(default=default, metadata={"doc": doc})
@dataclass
class Action:
...
check: bool = doc(False, "when True, check if the action would perform changes, but do nothing")
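To show the pattern, here is a hedged sketch of what a check-aware action can look like (a hypothetical action, not Transilience's actual builtin code, reusing the doc() helper and Action base class above): figure out whether a change is needed, and only touch the system when check is False:

import os
from dataclasses import dataclass

@dataclass
class EnsureFileExists(Action):
    # Hypothetical example action, for illustration only
    path: str = doc("", "file to create if it is missing")

    def run(self):
        if os.path.exists(self.path):
            return              # nothing to change
        if self.check:
            return              # a change is needed, but check mode skips it
        with open(self.path, "wb"):
            pass                # perform the change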
Like with Ansible, check mode takes about the same time as a normal run which does not perform changes.
Unlike Ansible, with Transilience this is actually pretty fast! ;)
Next step: parsing YAML!