Himblick one day later

This is part of a series of posts on the design and technical steps of creating Himblick, a digital signage box based on the Raspberry Pi 4.

One day after the first deploy, we went to check how the system was doing, and noticed that some fine-tuning was needed, some of it rather urgent.


Inspecting

Since the system runs on a read-only rootfs with a writable tmpfs overlay, one can inspect the contents of /live/cow and see exactly what files were written since the last boot. ncdu -x /live/cow is a wonderful, wonderful thing.

In this way, we can quickly identify disk/memory usage leaks, and other surprises, like an unexpectedly updated apt package database.

An unexpectedly updated apt package database, with apt sources that may publish broken software, raised very loud alarm bells.
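
For a scripted version of the same inspection, here is a minimal Python sketch that sums what was written under the overlay, grouped by top-level directory. It is only an illustration: the function name overlay_usage is made up for this example, and /live/cow is assumed to be the overlay mount point as described above.

```python
import os
from collections import Counter


def overlay_usage(root="/live/cow"):
    """Sum the apparent size of files under root, grouped by top-level directory."""
    usage = Counter()
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        # Like ncdu -x, do not descend into other mounted filesystems
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev
        ]
        rel = os.path.relpath(dirpath, root)
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # the file vanished while we were walking
            usage[top] += st.st_size
    return usage
```

Calling overlay_usage().most_common(10) on the box would list the ten top-level directories that grew the most since boot.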

Disable apt timers

It looks like Raspbian ships with the automatic apt update/upgrade timer services enabled. In our case, that would give us a system that works when turned on, upgrades overnight, and won't play videos the next day until rebooted. Rebooting resets the tmpfs overlay, so it works again until the next nightly upgrade, and so on.

In other words, a flaky system that would thankfully fix itself at boot, but break one day after booting. A system that would be very hard to debug. A system that would soon lose the trust of its users.

The first hotfix after deployment of Himblick was then to update the provisioning procedure to disable automatic package updates:

systemctl disable apt-daily.timer
systemctl mask apt-daily.timer
systemctl disable apt-daily-upgrade.timer
systemctl mask apt-daily-upgrade.timer
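
If the provisioning procedure is driven from Python, the same four commands could be generated rather than typed out. This is a sketch, not Himblick's actual provisioning code: the function names are invented for the example, and the optional root argument relies on systemctl's --root= option to operate on a mounted system image instead of the running system.

```python
import subprocess

APT_TIMERS = ("apt-daily.timer", "apt-daily-upgrade.timer")


def apt_timer_commands(root=None):
    """Build the systemctl calls that disable and mask the apt timers."""
    base = ["systemctl"]
    if root is not None:
        # Operate on an offline system tree rather than the running system
        base.append(f"--root={root}")
    return [
        base + [action, timer]
        for timer in APT_TIMERS
        for action in ("disable", "mask")
    ]


def disable_apt_timers(root=None, run=subprocess.run):
    """Run the commands, stopping at the first one that fails."""
    for cmd in apt_timer_commands(root):
        run(cmd, check=True)
```

Passing run as a parameter is only there so the function can be exercised without touching a real system.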

Of course, the first system to be patched was on top of a very tall ladder close to a museum ceiling.

journald disk usage

Logging takes an increasing amount of space. In theory, using a systemd.volatile setup, journald does the right thing by default. Since we need to use dracut's hack instead of systemd.volatile, we need to take manual steps to bound the amount of disk space used.

Thankfully, it looks easy to fine-tune journald's disk usage.
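
One way to do it is a journald.conf drop-in that caps the journal size, written at provisioning time. The sketch below is an assumption about how that could look, not what Himblick actually ships: the 64M limit and the himblick.conf file name are placeholders. SystemMaxUse= bounds the journal under /var/log/journal and RuntimeMaxUse= the one under /run/log/journal, so setting both covers either storage mode.

```python
import os


def journald_dropin(max_use="64M"):
    """Render a journald.conf drop-in that caps the journal size."""
    return (
        "[Journal]\n"
        f"SystemMaxUse={max_use}\n"
        f"RuntimeMaxUse={max_use}\n"
    )


def write_journald_dropin(root="/", max_use="64M", name="himblick.conf"):
    """Write the drop-in under root and return its path."""
    confdir = os.path.join(root, "etc/systemd/journald.conf.d")
    os.makedirs(confdir, exist_ok=True)
    path = os.path.join(confdir, name)
    with open(path, "w") as fd:
        fd.write(journald_dropin(max_use))
    return path
```

journald reads the drop-in when it starts, so on a provisioned image it takes effect from the first boot.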

Limit the growth of .xsession-errors

The .xsession-errors file grows indefinitely during the X session, and it cannot be rotated without restarting X. Deleting it won't help, as the X session still has the file open and keeps it allocated and growing on disk. At most, it can be occasionally truncated.

The file is created by /etc/X11/Xsession before sourcing other configuration files, so one cannot override its location with, say, /dev/null, or a pipe to some command, without editing the Xsession script itself.

Still, .xsession-errors is extremely useful for finding unexpected error output from X programs when something goes wrong.

In our case, himblick-player is the only program run in the X session. We can greatly limit the growth of .xsession-errors by making it log to a file instead of stderr, using one of Python's rotating logging handlers to cap the amount of stored logs. Alternatively, we can send Himblick's log directly to journald, and let journald take care of disk allocation.
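
The rotating-file option could look roughly like this, using RotatingFileHandler from the standard library. The logger name, size limit and backup count are assumptions chosen for illustration, not Himblick's actual settings.

```python
import logging
import logging.handlers


def setup_rotating_log(path, max_bytes=1_000_000, backups=3):
    """Log to path, keeping at most backups rotated files of max_bytes each."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(name)s %(levelname)s %(message)s"))
    log = logging.getLogger("himblick")
    log.setLevel(logging.INFO)
    log.addHandler(handler)
    # Keep records away from any root logger handler that writes to stderr
    log.propagate = False
    return log
```

With these settings, disk usage stays at roughly max_bytes * (backups + 1), as long as no single record is larger than max_bytes.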

Once that is sorted, we can change Himblick to capture the players' stdout and stderr, and log it, to avoid it going to .xsession-errors.
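
A minimal sketch of that capture, assuming the player is started as a subprocess: stderr is merged into stdout, and every line is forwarded to a logger instead of reaching the X session's stderr. The run_logged helper and the logger name are hypothetical, and the real code may well be structured differently.

```python
import logging
import subprocess

log = logging.getLogger("himblick.player")


def run_logged(cmd):
    """Run cmd, forwarding each line of its stdout/stderr to the logger."""
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same pipe
        text=True,
        errors="replace",  # tolerate output that is not valid UTF-8
    )
    with proc.stdout:
        for line in proc.stdout:
            log.info("%s: %s", cmd[0], line.rstrip())
    # Return the exit status once the output is drained
    return proc.wait()
```

Reading the pipe line by line until it closes also keeps a chatty player from blocking on a full pipe buffer.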