Latest posts for tag sw

This post is part of a series about trying to set up a gitlab runner based on systemd-nspawn. I published the polished result as nspawn-runner on GitHub.

New goal: make it easier to configure and maintain chroots. For example, it should be possible to maintain a rolling testing or sid chroot without the need to manually log into it to run apt upgrade.

It should also be easy to have multiple runners reasonably in sync by carrying around a few simple configuration files, representing the set of images available to the CI.

Ideally, those configuration files could simply be one ansible playbook per chroot. nspawn-runner could have a 'chroot-maintenance' command that runs all the playbooks on their corresponding chroots, and that would be all I need.
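
For illustration, such a playbook could be as small as this (a sketch: the file name and the upgrade task are just examples):

# e.g. /var/lib/nspawn-runner/sid.yaml, applied to the sid chroot
- hosts: all
  tasks:
   - name: Update apt cache and upgrade all packages
     apt:
        update_cache: yes
        upgrade: dist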

ansible and systemd-nspawn

ansible being inadequate as usual, it still does not have an nspawn or machinectl connector, even though the need is there, and one can find requests, pull requests, and implementation attempts by all sorts of people, including me.

However, I don't want to have nspawn-runner depend on random extra plugins. There's a machinectl become plugin available in ansible from buster-backports, but no matter how I read its scant documentation, looked around the internet, and tried all sorts of things, I couldn't manage to figure out what it is for.

This said, simply using systemd-nspawn instead of chroot is quite trivial: use ansible_connection: chroot, set ansible_chroot_exe to this shell script, and it just works, with things properly mounted, internet access, correct hostnames, and everything:

#!/bin/sh
chroot="$1"
shift
exec systemd-nspawn --console=pipe -qD "$chroot" -- "$@"
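
For example, an inventory to run a playbook on the buster chroot could look like this (a sketch: the path where the script is installed is hypothetical):

# inventory file: with the chroot connection, the host name is the chroot path
[chroots]
/var/lib/nspawn-runner/buster ansible_connection=chroot ansible_chroot_exe=/usr/local/bin/nspawn-chroot

Then ansible-playbook -i inventory playbook.yml runs the playbook inside the container.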

I guess that's a, uhm, plan, I guess?

Running playbooks

As an initial prototype, I made nspawn-runner check the list of chroots in /var/lib/nspawn-runner and, for each directory found there, check if there's an equivalent .yaml or .yml playbook next to it.

For each chroot+playbook combination, it creates an inventory with the right setup, including the custom chroot command, and runs ansible-playbook.

As a prototype it works. I assume that once it sees usage there will be feedback and fine-tuning; meanwhile, I have the beginning of some automated maintenance for the CI chroots.

Next step

It would be nice to also have nspawn-runner create the chroots from configuration files if they are missing, so that a new runner can be deployed with minimal effort, generating all the required images in a single command.

For this, I'd like to find a clean way to store the chroot creation command inside the playbooks, to keep just one configuration file per chroot.

I'd also like to have it flexible enough to run debootstrap, as well as commands for different distributions.

Time will tell.

This is probably enough for study/design posts on my blog. Further updates will be in the issue tracker.

This post is part of a series about trying to set up a gitlab runner based on systemd-nspawn. I published the polished result as nspawn-runner on GitHub.

systemd-nspawn has an interesting --ephemeral option that sets up temporary copy-on-write filesystem snapshots on filesystems that support it, like btrfs.

Using copy on write means that one could perform maintenance on the source chroots, without disrupting existing CI sessions.

btrfs and copy on write

btrfs snapshots work on subvolumes.

As I understand it, if one uses btrfs subvolume create instead of mkdir, what is inside the resulting directory is managed as a subvolume that can be snapshotted and managed in all sorts of interesting ways.

I managed to delete a subvolume equally well with btrfs subvolume delete and with rm -r.

btrfs subvolume snapshot src dst is similar to cp -a, but it makes a copy-on-write snapshot of a btrfs subvolume.
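
Putting it together, the basic subvolume operations look like this (the paths are just examples):

# create a subvolume instead of a plain directory
btrfs subvolume create /var/lib/nspawn-runner/buster
# make a cheap copy-on-write snapshot of it
btrfs subvolume snapshot /var/lib/nspawn-runner/buster /var/lib/nspawn-runner/buster.tmp
# and remove the snapshot again (rm -r works, too)
btrfs subvolume delete /var/lib/nspawn-runner/buster.tmp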

If I change nspawn-runner to manage each chroot in its own subvolume, I should be able to build on all these features, and systemd-nspawn should be able to do that, too.

There's a cute shortcut to migrate a subdirectory to a subvolume: create the subvolume, then use cp -r --reflink to populate the subvolume with the directory contents.
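
A sketch of that migration, assuming the chroot lives in /var/lib/nspawn-runner/buster (using cp -a so ownership and permissions are preserved, too):

mv /var/lib/nspawn-runner/buster /var/lib/nspawn-runner/buster.old
btrfs subvolume create /var/lib/nspawn-runner/buster
# reflinks share the underlying extents: fast, and no extra space used
cp -a --reflink /var/lib/nspawn-runner/buster.old/. /var/lib/nspawn-runner/buster/
rm -r /var/lib/nspawn-runner/buster.old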

systemd-nspawn and btrfs

Passing -x/--ephemeral to systemd-nspawn makes it do all the transient copy-on-write work automatically:

# systemd-nspawn -xD buster
Spawning container buster-7fd47ac79296c5d3 on /var/lib/nspawn-runner/t/.#machine.buster0939fbc61fcbca28.
Press ^] three times within 1s to kill container.
root@buster-7fd47ac79296c5d3:~# mkdir foo
root@buster-7fd47ac79296c5d3:~# ls -la
total 12
drwx------ 1 root root  62 Mar 13 16:30 .
drwxr-xr-x 1 root root 154 Mar 13 16:26 ..
-rw------- 1 root root 102 Mar 13 16:26 .bash_history
-rw-r--r-- 1 root root 570 Mar 13 16:26 .bashrc
-rw-r--r-- 1 root root 148 Mar 13 16:26 .profile
drwxr-xr-x 1 root root   0 Mar 13 16:30 foo
root@buster-7fd47ac79296c5d3:~# logout
Container buster-7fd47ac79296c5d3 exited successfully.
root@runner2:/var/lib/nspawn-runner/t# ls -la buster/root/
total 12
drwx------ 1 root root  56 Mar 13 16:26 .
drwxr-xr-x 1 root root 154 Mar 13 16:26 ..
-rw------- 1 root root 102 Mar 13 16:26 .bash_history
-rw-r--r-- 1 root root 570 Mar 13 16:26 .bashrc
-rw-r--r-- 1 root root 148 Mar 13 16:26 .profile

It also works on a directory that is not a subvolume, making reflinks of its contents instead of a subvolume snapshot, although this has a performance penalty on setup:

Snapshotting a subvolume:

# time systemd-nspawn -xD buster ls
Spawning container buster-7ab8f4123420b5d5 on /var/lib/nspawn-runner/t/.#machine.bustercd54ef4971229ff5.
Press ^] three times within 1s to kill container.
bin  boot  dev  etc  home  lib  lib32  lib64  libx32  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
Container buster-7ab8f4123420b5d5 exited successfully.

real    0m0.164s
user    0m0.032s
sys 0m0.014s

Reflink-ing a subdirectory:

# time systemd-nspawn -xD buster ls
Spawning container buster-ebc9dc77db0c972d on /var/lib/nspawn-runner/.#machine.buster2ecbcbd1a1a058b8.
Press ^] three times within 1s to kill container.
bin  boot  dev  etc  home  lib  lib32  lib64  libx32  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
Container buster-ebc9dc77db0c972d exited successfully.

real    0m3.022s
user    0m0.326s
sys 0m2.406s

Detecting filesystem type

I can change nspawn-runner to use btrfs-specific features only when /var/lib/nspawn-runner is on btrfs. Here's a command to detect the filesystem type:

# stat -f -c %T /var/lib/nspawn-runner/
btrfs

nspawn-runner updated

I've refactored nspawn-runner, splitting backend and frontend code, and implemented multiple backends based on the filesystem type of /var/lib/nspawn-runner/.

It works really nicely, with no special configuration required: if /var/lib/nspawn-runner is on btrfs, things run faster, with fewer kludges, and one can do maintenance on the base chroots without interfering with running CI jobs.
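
The backend choice boils down to something like this (a sketch in shell; nspawn-runner itself implements it in Python):

#!/bin/sh
# pick a backend based on the filesystem hosting the chroots
case "$(stat -f -c %T /var/lib/nspawn-runner)" in
    btrfs) echo "btrfs backend: subvolumes and snapshots" ;;
    *)     echo "default backend: plain directories and overlays" ;;
esac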

Next step

The next step is making it easier to configure and maintain chroots. For example, it should be possible to maintain a rolling testing or sid chroot without the need to manually log into it to run apt upgrade.

.gitlab-ci.yml supports image to select the environment in which the script gets run. The documentation says "Used to specify a Docker image to use for the job", but that's clearly a bug in the documentation, because we can do it with nspawn-runner, too.

It turns out that most of the environment variables available to CI runs are also available to custom runner scripts. In this case, the value passed as image can be found as $CUSTOM_ENV_CI_JOB_IMAGE in the custom runner scripts' environment.
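
In other words, a custom prepare script can pick the chroot with something like this (a sketch; the buster fallback is hypothetical):

# use the image requested in .gitlab-ci.yml, defaulting to buster
IMAGE="${CUSTOM_ENV_CI_JOB_IMAGE:-buster}"
CHROOT="/var/lib/nspawn-runner/$IMAGE"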

After some experimentation I made this commit that makes every chroot under /var/lib/nspawn-runner available as an image:

# Set up 3 new images for CI jobs:
nspawn-runner chroot-create buster
nspawn-runner chroot-create bullseye
nspawn-runner chroot-create sid

That's it, CI scripts can now use image: buster, image: bullseye or image: sid, as they please. You can manually set up other chroots under /var/lib/nspawn-runner and they'll be automatically available.
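
For example, a job in .gitlab-ci.yml could now look like this:

tests:
 image: sid
 script:
  - env | sort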

You can also now choose a default image in config.toml in case the CI script doesn't specify one:

prepare_args = ["--verbose", "prepare", "--default-image=buster"]

Updated:

  • New step! taking advantage of btrfs features, if available

This post is part of a series about trying to set up a gitlab runner based on systemd-nspawn. I published the polished result as nspawn-runner on GitHub.

gitlab-runner supports adding extra arguments to the custom scripts, and I can take advantage of that to pack all the various scripts that I prototyped so far into an all-in-one nspawn-runner command:

usage: nspawn-runner [-h] [-v] [--debug]
                     {chroot-create,chroot-login,prepare,run,cleanup,gitlab-config,toml}
                     ...

Manage systemd-nspawn machines for CI runs.

positional arguments:
  {chroot-create,chroot-login,prepare,run,cleanup,gitlab-config,toml}
                        sub-command help
    chroot-create       create a chroot that serves as a base for ephemeral
                        machines
    chroot-login        enter the chroot to perform maintenance
    prepare             start an ephemeral system for a CI run
    run                 run a command inside a CI machine
    cleanup             cleanup a CI machine after it's run
    gitlab-config       configuration step for gitlab-runner
    toml                output the toml configuration for the custom runner

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         verbose output
  --debug               verbose output

chroot maintenance

chroot-create and chroot-login are similar to what pbuilder, cowbuilder, schroot, debspawn and similar tools do.

They only take a chroot name, and default the rest of the paths to where nspawn-runner expects things to be, under /var/lib/nspawn-runner.

gitlab-runner setup

nspawn-runner toml <chroot-name> outputs a snippet to add to /etc/gitlab-runner/config.toml to configure the CI.

For example:

$ ./nspawn-runner toml buster
[[runners]]
  name="buster"
  url="TODO"
  token="TODO"
  executor = "custom"
  builds_dir = "/var/lib/nspawn-runner/.build"
  cache_dir = "/var/lib/nspawn-runner/.cache"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.custom]
    config_exec = "/home/enrico/…/nspawn-runner/nspawn-runner"
    config_args = ["gitlab-config"]
    config_exec_timeout = 200
    prepare_exec = "/home/enrico/…/nspawn-runner/nspawn-runner"
    prepare_args = ["prepare", "buster"]
    prepare_exec_timeout = 200
    run_exec = "/home/enrico/dev/nspawn-runner/nspawn-runner"
    run_args = ["run"]
    cleanup_exec = "/home/enrico/…/nspawn-runner/nspawn-runner"
    cleanup_args = ["cleanup"]
    cleanup_exec_timeout = 200
    graceful_kill_timeout = 200
    force_kill_timeout = 200

One needs to remember to set url and token, and the runner is configured.

The end, for now

This is it, it works! Time will tell what issues or ideas will come up: for now, it's a pretty decent first version.

The various prepare, run, cleanup steps are generic enough that they can be used outside of gitlab-runner: feel free to build on them, and drop me a note if you find this useful!

Updated: Issues noticed so far, that could go into a new version:

  • updating the master chroot would disturb the running CI jobs that use it. Using nspawn's btrfs-specific features would prevent this problem, and possibly simplify the implementation even more. This is now done!
  • New step! trivially implementing support for multiple OS images

This post is part of a series about trying to set up a gitlab runner based on systemd-nspawn. I published the polished result as nspawn-runner on GitHub.

The plan

Back to custom runners, here's my plan:

  • config can be a noop
  • prepare starts the nspawn machine
  • run runs scripts with machinectl shell
  • cleanup runs machinectl stop

The scripts

Here are the scripts based on Federico's work:

base.sh with definitions sourced by all scripts:

MACHINE="run-$CUSTOM_ENV_CI_JOB_ID"
ROOTFS="/var/lib/gitlab-runner-custom-chroots/buster"
OVERLAY="/var/lib/gitlab-runner-custom-chroots/$MACHINE"

config.sh doing nothing:

#!/bin/sh
exit 0

prepare.sh starting the machine:

#!/bin/bash

source $(dirname "$0")/base.sh
set -eo pipefail

# trap errors as a CI system failure
trap "exit $SYSTEM_FAILURE_EXIT_CODE" ERR

logger "gitlab CI: preparing $MACHINE"

mkdir -p "$OVERLAY"

systemd-run \
  -p 'KillMode=mixed' \
  -p 'Type=notify' \
  -p 'RestartForceExitStatus=133' \
  -p 'SuccessExitStatus=133' \
  -p 'Slice=machine.slice' \
  -p 'Delegate=yes' \
  -p 'TasksMax=16384' \
  -p 'WatchdogSec=3min' \
  systemd-nspawn --quiet -D "$ROOTFS" \
    --overlay="$ROOTFS:$OVERLAY:/" \
    --machine="$MACHINE" --boot --notify-ready=yes

run.sh running the provided scripts in the machine:

#!/bin/bash
logger "gitlab CI: running $@"
source $(dirname "$0")/base.sh

set -eo pipefail
trap "exit $SYSTEM_FAILURE_EXIT_CODE" ERR

systemd-run --quiet --pipe --wait --machine="$MACHINE" /bin/bash < "$1"

cleanup.sh stopping the machine and removing the writable overlay directory:

#!/bin/bash
logger "gitlab CI: cleanup $@"
source $(dirname "$0")/base.sh

machinectl stop "$MACHINE"
rm -rf "$OVERLAY"

Trying out the plan

I tried a manual invocation of gitlab-runner, and it worked perfectly:

# mkdir /var/lib/gitlab-runner-custom-chroots/build/
# mkdir /var/lib/gitlab-runner-custom-chroots/cache/
# gitlab-runner exec custom \
    --builds-dir /var/lib/gitlab-runner-custom-chroots/build/ \
    --cache-dir /var/lib/gitlab-runner-custom-chroots/cache/ \
    --custom-config-exec /var/lib/gitlab-runner-custom-chroots/config.sh \
    --custom-prepare-exec /var/lib/gitlab-runner-custom-chroots/prepare.sh \
    --custom-run-exec /var/lib/gitlab-runner-custom-chroots/run.sh \
    --custom-cleanup-exec /var/lib/gitlab-runner-custom-chroots/cleanup.sh \
    tests
Runtime platform                                    arch=amd64 os=linux pid=18662 revision=775dd39d version=13.8.0
Running with gitlab-runner 13.8.0 (775dd39d)
Preparing the "custom" executor
Using Custom executor...
Running as unit: run-r1be98e274224456184cbdefc0690bc71.service
executor not supported                              job=1 project=0 referee=metrics
Preparing environment

Getting source from Git repository

Executing "step_script" stage of the job script
WARNING: Starting with version 14.0 the 'build_script' stage will be replaced with 'step_script': https://gitlab.com/gitlab-org/gitlab-runner/-/issues/26426

Job succeeded

Deploy

The remaining step is to deploy all this in /etc/gitlab-runner/config.toml:

concurrent = 1
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "nspawn runner"
  url = "http://gitlab.siweb.local/"
  token = "…"
  executor = "custom"
  builds_dir = "/var/lib/gitlab-runner-custom-chroots/build/"
  cache_dir = "/var/lib/gitlab-runner-custom-chroots/cache/"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.custom]
    config_exec = "/var/lib/gitlab-runner-custom-chroots/config.sh"
    config_exec_timeout = 200
    prepare_exec = "/var/lib/gitlab-runner-custom-chroots/prepare.sh"
    prepare_exec_timeout = 200
    run_exec = "/var/lib/gitlab-runner-custom-chroots/run.sh"
    cleanup_exec = "/var/lib/gitlab-runner-custom-chroots/cleanup.sh"
    cleanup_exec_timeout = 200
    graceful_kill_timeout = 200
    force_kill_timeout = 200

Next steps

My next step will be polishing all this in a way that makes deploying and maintaining a runner configuration easy.

This post is part of a series about trying to set up a gitlab runner based on systemd-nspawn. I published the polished result as nspawn-runner on GitHub.

Here I try to figure out possible ways of invoking nspawn for the prepare, run, and cleanup steps of gitlab custom runners. The resulting invocations might be useful beyond Gitlab's scope of application.

I begin with a chroot which will be the base for our build environments:

debootstrap --variant=minbase --include=git,build-essential buster workdir

Fully ephemeral nspawn

This would be fantastic: set up a reusable chroot, mount readonly, run the CI in a working directory mounted on tmpfs. It sets up quickly, it cleans up after itself, and it would make prepare and cleanup noops:

mkdir workdir/var/lib/gitlab-runner
systemd-nspawn --read-only --directory workdir --tmpfs /var/lib/gitlab-runner "$@"

However, run gets run multiple times, so I need the side effects of run to persist inside the chroot between runs.

Also, if the CI uses a large amount of disk space, tmpfs may get into trouble.

nspawn with overlay

Federico used --overlay to keep the base chroot readonly while allowing persistent writes on a temporary directory on the filesystem.

Note that using --overlay requires systemd and systemd-container from buster-backports because of systemd bug #3847.

Example:

mkdir -p tmp-overlay
systemd-nspawn --quiet -D workdir \
  --overlay="`pwd`/workdir:`pwd`/tmp-overlay:/"

I can run this twice, and changes in the file system will persist between systemd-nspawn executions. Great! However, any process will be killed at the end of each execution.

machinectl

I can give a name to systemd-nspawn invocations using --machine, and it allows me to run multiple commands during the machine lifespan using machinectl and systemd-run.

In theory machinectl can also fully manage chroots and disk images in /var/lib/machines, but I haven't found a way with machinectl to start multiple machines sharing the same underlying chroot.

It's ok, though: I managed to do that with systemd-nspawn invocations.

I can use the --machine=name argument to systemd-nspawn to make it visible to machinectl. I can use the --boot argument to systemd-nspawn to start enough infrastructure inside the container to allow machinectl to interact with it.

This gives me any number of persistent and named running systems, that share the same underlying chroot, and can cleanup after themselves. I can run commands in any of those systems as I like, and their side effects persist until a system is stopped.

The chroot needs systemd and dbus for machinectl to be able to interact with it:

debootstrap --variant=minbase --include=git,systemd,dbus,build-essential buster workdir

Let's boot the machine:

mkdir -p overlay
systemd-nspawn --quiet -D workdir \
    --overlay="`pwd`/workdir:`pwd`/overlay:/" \
    --machine=test --boot

Let's try machinectl:

# machinectl list
MACHINE CLASS     SERVICE        OS     VERSION ADDRESSES
test    container systemd-nspawn debian 10      -

1 machines listed.
# machinectl shell --quiet test /bin/ls -la /
total 60
[…]

To run commands, rather than machinectl shell, I need to use systemd-run --wait --pipe --machine=name, otherwise machined won't forward the exit code. The result however is pretty good, with working stdin/stdout/stderr redirection and forwarded exit code.
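
A quick way to see the difference, assuming the test machine from above is still running:

# exit code is forwarded: echo prints 3
systemd-run --quiet --pipe --wait --machine=test /bin/sh -c 'exit 3'; echo $?
# machinectl shell does not forward it: echo prints 0
machinectl shell --quiet test /bin/sh -c 'exit 3'; echo $?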

Good, I'm getting somewhere.

The terminal where I ran systemd-nspawn is currently showing a nice getty for the booted system, which is cute, and not what I want for the setup process of a CI.

Spawning machines without needing a terminal

machinectl uses /lib/systemd/system/systemd-nspawn@.service to start machines. I suppose there's limited magic in there: start systemd-nspawn as a service, use --machine to give it a name, and machinectl manages it as if it started it itself.

What if, instead of installing a unit file for each CI run, I try to do the same thing with systemd-run?

systemd-run \
  -p 'KillMode=mixed' \
  -p 'Type=notify' \
  -p 'RestartForceExitStatus=133' \
  -p 'SuccessExitStatus=133' \
  -p 'Slice=machine.slice' \
  -p 'Delegate=yes' \
  -p 'TasksMax=16384' \
  -p 'WatchdogSec=3min' \
  systemd-nspawn --quiet -D `pwd`/workdir \
    --overlay="`pwd`/workdir:`pwd`/overlay:/" \
    --machine=test --boot

It works! I can interact with it using machinectl, and fine tune DevicePolicy as needed to lock CI machines down.

This setup has a race condition where if I try to run a command inside the machine in the short time window before the machine has finished booting, it fails:

# systemd-run […] systemd-nspawn […] ; machinectl --quiet shell test /bin/ls -la /
Failed to get shell PTY: Protocol error
# machinectl shell test /bin/ls -la /
Connected to machine test. Press ^] three times within 1s to exit session.
total 60
[…]

systemd-nspawn has the option --notify-ready=yes that solves exactly this problem:

# systemd-run […] systemd-nspawn […] --notify-ready=yes ; machinectl --quiet shell test /bin/ls -la /
Running as unit: run-r5a405754f3b740158b3d9dd5e14ff611.service
total 60
[…]

On nspawn's side, I should now have all I need.

Next steps

My next step will be wrapping it all together in a gitlab runner.

This is the first post in a series about trying to set up a gitlab runner based on systemd-nspawn. I published the polished result as nspawn-runner on GitHub.

The goal

I need to set up gitlab runners, and I try not to involve docker in my professional infrastructure if I can avoid it.

Let's try systemd-nspawn. It's widely available and reasonably reliable.

I'm not the first to have this idea: Federico Ceratto made a setup based on custom runners and Josef Kufner one based on ssh runners.

I'd like to skip the complication of ssh, and to expand Federico's version to persist not just filesystem changes but also any other side effect of CI commands. For example, one CI command may bring up a server and the next CI command may want to test interfacing with it.

Understanding gitlab-runner

First step: figuring out gitlab-runner.

Test runs of gitlab-runner

I found that I can run gitlab-runner manually without needing to go through a push to Gitlab. It needs a local git repository with a .gitlab-ci.yml file:

mkdir test
cd test
git init
cat > .gitlab-ci.yml << EOF
tests:
 script:
  - env | sort
  - pwd
  - ls -la
EOF
git add .gitlab-ci.yml
git commit -am "Created a test repo for gitlab-runner"

Then I can go in the repo and test gitlab-runner:

gitlab-runner exec shell tests

It doesn't seem to use /etc/gitlab-runner/config.toml, and it needs all the arguments passed on its command line; I used the shell runner for a simple initial test.

Later I'll try to brew a gitlab-runner exec custom invocation that uses nspawn.

Basics of custom runners

A custom runner runs a few scripts to manage the run:

  • config, which can override the run configuration by outputting JSON data
  • prepare, to prepare the environment
  • run, to run scripts in the environment (might be run multiple times)
  • cleanup, to clean up the environment

run gets at least one argument which is a path to the script to run. The other scripts get no arguments by default.

The runner configuration controls the paths of the scripts to run, and optionally extra arguments to pass to them.
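
For example, a config script that moves the build and cache directories could output something like this (a sketch: the keys come from the custom executor documentation, the paths are illustrative):

#!/bin/sh
# override parts of the run configuration by printing JSON on stdout
cat << EOF
{
  "builds_dir": "/var/lib/gitlab-runner-custom-chroots/build",
  "cache_dir": "/var/lib/gitlab-runner-custom-chroots/cache"
}
EOF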

Next steps

My next step will be to figure out possible ways of invoking nspawn for the prepare, run, and cleanup scripts.

This is part of a series of posts on compiling a custom version of Qt5 in order to develop for both amd64 and a Raspberry Pi.

Building Qt5 takes a long time. The build server I was using had plenty of CPUs and RAM, but was very slow on I/O. I was very frustrated by that, and I started evaluating alternatives. I ended up setting up scripts to automatically provision a throwaway cloud server at Hetzner.

Initial setup

I got an API key from my customer's Hetzner account.

I installed hcloud-cli, currently only in testing and unstable:

apt install hcloud-cli

Then I configured hcloud with the API key:

hcloud context create

Spin up

I wrote a quick and dirty script to spin up a new machine, which grew a bit with little tweaks:

#!/bin/sh

# Create the server
hcloud server create --name buildqt --ssh-key … --start-after-create \
                     --type cpx51 --image debian-10 --datacenter …

# Query server IP
IP="$(hcloud server describe buildqt -o json | jq -r .public_net.ipv4.ip)"

# Update ansible host file
echo "buildqt ansible_user=root ansible_host=$IP" > hosts

# Remove old host key
ssh-keygen -f ~/.ssh/known_hosts -R "$IP"

# Update login script
echo "#!/bin/sh" > login
echo "ssh root@$IP" >> login
chmod 0755 login

I picked a datacenter in the same location as our other servers, to get quicker data transfers.

I like that CLI tools have JSON output that I can cleanly pick at with jq. Sadly, my ISP doesn't do IPv6 yet.

Since the server just got regenerated, I remove a possibly cached host key.

Provisioning the machine

One git server I need is behind HTTP authentication. Here's a quick hack to pass the relevant .netrc credentials to ansible before provisioning:

#!/usr/bin/python3

import subprocess
import netrc
import tempfile
import json

login, account, password = netrc.netrc().authenticators("…")

with tempfile.NamedTemporaryFile(mode="wt", suffix=".json") as fd:
    json.dump({
        "repo_user": login,
        "repo_password": password,
    }, fd)
    fd.flush()

    subprocess.run([
        "ansible-playbook",
        "-i", "hosts",
        "-l", "buildqt",
        "--extra-vars", f"@{fd.name}",
        "provision.yml",
        ], check=True)

And here's the ansible playbook:

#!/usr/bin/env ansible-playbook

- name: Install and configure buildqt
  hosts: all
  tasks:
   - name: Update apt cache
     apt:
        update_cache: yes
        cache_valid_time: 86400

   - name: Create build user
     user:
        name: build
        comment: QT5 Build User
        shell: /bin/bash

   - name: Create sources directory
     become: yes
     become_user: build
     file:
        path: ~/sources
        state: directory
        mode: 0755

   - name: Download sources
     become: yes
     become_user: build
     get_url:
        url: "https://…/{{item}}"
        dest: "~/sources/{{item}}"
        mode: 0644
     with_items:
      - "qt-everywhere-src-5.15.1.tar.xz"
      - "qt-creator-enterprise-src-4.13.2.tar.gz"

   - name: Populate home directory
     become: yes
     become_user: build
     copy:
        src: build
        dest: ~/
        mode: preserve

   - name: Write .netrc
     become: yes
     become_user: build
     copy:
        dest: ~/.netrc
        mode: 0600
        content: |
           machine …
           login {{repo_user}}
           password {{repo_password}}

   - name: Write .screenrc
     become: yes
     become_user: build
     copy:
        dest: ~/.screenrc
        mode: 0644
        content: |
           hardstatus alwayslastline
           hardstatus string '%{= cw}%-Lw%{= KW}%50>%n%f* %t%{= cw}%+Lw%< %{= kK}%-=%D %Y-%m-%d %c%{-}'
           startup_message off
           defutf8 on
           defscrollback 10240

   - name: Install base packages
     apt:
        name: git,mc,ncdu,neovim,eatmydata,devscripts,equivs,screen
        state: present

   - name: Clone git repo
     become: yes
     become_user: build
     git:
        repo: https://…@…/….git
        dest: ~/…

   - name: Copy Qt license
     become: yes
     become_user: build
     copy:
        src: qt-license.txt
        dest: ~/.qt-license
        mode: 0600

Now everything is ready for a 16-core, 32GB RAM build on SSD storage.

Tear down

When done:

#!/bin/sh
hcloud server delete buildqt

The whole spin up plus provisioning takes around a minute, so I can do it when I start a work day, and take it down at the end. The build machine wasn't that expensive to begin with, and this way it will even be billed by the hour.

A first try on a CPX51 machine has just built the full Qt5 Everywhere Enterprise including QtWebEngine and all its frills, for amd64, in under 1 hour and 40 minutes.

This is part of a series of posts on compiling a custom version of Qt5 in order to develop for both amd64 and a Raspberry Pi.

After some success using a sysroot to get a Qt5 cross-build environment that includes QtWebEngine, the next step is packaging the sysroot, so that it can be used both to build the cross-build environment and to do cross-development with it.

The result is this Debian source package which takes a Raspberry Pi OS disk image, provisions it in-place, extracts its contents, and packages them.

Yes. You may want to reread the last paragraph.

It works directly in the disk image to avoid a nasty filesystem issue on emulated 32bit Linux over a 64bit mounted filesystem.

This feels like the most surreal Debian package I've ever created, and this saga looks like one of the hairiest yaks I've ever shaved.

Integrating this monster codebase, full of bundled code and hacks, into a streamlined production and deployment system has been for me a full stack nightmare, and I have a renewed and growing respect for the people in the Qt/KDE team in Debian, who manage to stay on top of this mess, so that it all just works when we need it.

Lite extra ball, from https://www.flickr.com/photos/st3f4n/143623902

This is part of a series of posts on compiling a custom version of Qt5 in order to develop for both amd64 and a Raspberry Pi.

The previous rounds of attempts ended in one issue too many to investigate in the allocated hourly budget.

Andreas Gruber wrote:

Long story short, a fast solution for the issue with EGLSetBlobFuncANDROID is to remove libraspberrypi-dev from your sysroot and do a full rebuild. There will be some changes to the configure results, so please review them - if they are relevant for you - before proceeding with your work.

That got me unstuck! dpkg --purge libraspberrypi-dev in the sysroot, and we're back in the game.

While Qt5's build has proven extremely fragile, I was surprised that some customization from Raspberry Pi hadn't yet broken something. In the end, they didn't disappoint.

More i386 issues

The run now stops with a new 32bit issue related to v8 snapshots:

qt-everywhere-src-5.15.0/qtwebengine/src/core/release$ /usr/bin/g++ -pie -Wl,--fatal-warnings -Wl,--build-id=sha1 -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,-z,defs -Wl,--as-needed -m32 -pie -Wl,--disable-new-dtags -Wl,-O2 -Wl,--gc-sections -o "v8_snapshot/mksnapshot" -Wl,--start-group @"v8_snapshot/mksnapshot.rsp"  -Wl,--end-group  -ldl -lpthread -lrt -lz
/usr/bin/ld: skipping incompatible //usr/lib/x86_64-linux-gnu/libz.so when searching for -lz
/usr/bin/ld: skipping incompatible //usr/lib/x86_64-linux-gnu/libz.a when searching for -lz
/usr/bin/ld: cannot find -lz
collect2: error: ld returned 1 exit status

Attempted solution: apt install zlib1g-dev:i386.

Alternative solution (untried): configure Qt5 with -no-webengine-v8-snapshot.

It builds!

Installation paths

Now it tries to install files into debian/tmp/home/build/sysroot/opt/qt5custom-armhf/.

I realise that I now need to package the sysroot itself, both as a build-dependency of the Qt5 cross-compiler, and as a runtime dependency of the built cross-builder.

Conclusion

The current work in progress, patches, and all, is at https://github.com/Truelite/qt5custom/tree/master/debian-cross-qtwebengine

It blows my mind how ridiculously broken the Qt5 cross-compiler build is, for a use case that, judging by how many people are trying, seems to be one of the main ones for the cross-builder.