Docker and systemd, getting rid of dreaded 'Failed to connect to bus' error

The following error has been the bane of my existence for the past few months:

TASK [geerlingguy.containerd : Ensure containerd is started and enabled at boot.] ***
fatal: [instance]: FAILED! => {
  "changed": false,
  "cmd": "/bin/systemctl",
  "msg": "Failed to connect to bus: No such file or directory",
  "rc": 1,
  "stderr": "Failed to connect to bus: No such file or directory",
  "stderr_lines": [
    "Failed to connect to bus: No such file or directory"
  "stdout": "",
  "stdout_lines": []

Since I use Molecule with my Ansible roles and playbooks to test them in identical CI environments both locally and in GitHub Actions, I can maintain an identical environment inside which tests are run. And many of my roles and playbooks need to test whether systemd services are configured and run correctly.

But Docker recently switched from cgroups v1 to cgroups v2, and that started this 'Failed to connect to bus' business—systemd relied on some configuration that was easy enough to add in the past: just run your containers with these options:

Automating my Homelab with Ansible (AnsibleFest 2022)

At AnsibleFest 2022, I presented Ansible for the Homelab.

Jeff Geerling's Homelab Rack in 2022

In the presentation, I gave a tour of my homelab, highlighting it's growth from a modem and 5-port switch to a full 24U rack with a petabyte of storage and multiple 10 gigabit switches!

Then I spent some time discussing how various components are automated using Ansible, mostly using open source projects on GitHub.

Unfortunately for attendees, the room my session was in was packed, and a lot of people who wanted to see it were turned away.

Clearing Cloudflare and Nginx caches with Ansible

Since being DDoS continuously earlier this year, I've set up extra caching in front of my site. Originally I just had Nginx's proxy cache, but that topped out around 100 Mbps of continuous bandwidth and maybe 5-10,000 requests per second on my little DigitalOcean VPS.

So then I added Cloudflare's proxy caching service on top, and now I've been able to handle months with 5-10 TB of traffic (with multiple spikes of hundreds of mbps per second).

That's great, but caching comes with a tradeoff—any time I post a new article, update an old one, or a post receives a comment, it can take anywhere between 10-30 minutes before that change is reflected for end users.

I used to use Varnish, and with Varnish, you could configure cache purges directly from Drupal, so if any operation occurred that would invalidate cached content, Drupal could easily purge just that content from Varnish's cache.

apt_key deprecated in Debian/Ubuntu - how to fix in Ansible

For many packages, like Elasticsearch, Docker, or Jenkins, you need to install a trusted GPG key on your system before you can install from the official package repository.

Traditionally, you'd run a command like:

wget -qO - | sudo apt-key add -

But if you do that in modern versions of Debian or Ubuntu, you get the following warning:

Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).

This way of adding apt keys still works for now (in mid-2022), but will stop working in the next major releases of Ubuntu and Debian (and derivatives). So it's better to stop the usage now. In Ansible, you would typically use the ansible.builtin.apt_key module, but even that module has the following deprecation warning:

Using an Ansible playbook with an SSH bastion / jump host

Since I've set this up a number of times, but I just realized I've never documented it on my blog, I thought I'd finally do that.

I have a set of servers that are running on a private network. That network is connected to the Internet through a single reverse proxy / 'bastion' host.

But I still want to be able to manage the servers on the private network behind the bastion from outside.

Method 1 - Inventory vars

The first way to do it with Ansible is to describe how to connect through the proxy server in Ansible's inventory. This is helpful for a project that might be run from various workstations or servers without the same SSH configuration (the configuration is stored alongside the playbook, in the inventory).

In my Ansible project, I had an inventory file like the following:

Ansible playbook to upgrade Ubuntu/Debian servers and reboot if needed

I realized I've never posted this playbook to my blog... I needed to grab it for a project I'm working on, so I figured I'd post it here for future reference.

Basically, I need a playbook I can run whenever, that will ensure all packages are upgraded, then checks if a reboot is required, and if so, reboots the server. Afterwards, it removes any dependencies no longer required.

- hosts: all
  gather_facts: yes
  become: yes

    - name: Perform a dist-upgrade.
        upgrade: dist
        update_cache: yes

    - name: Check if a reboot is required.
        path: /var/run/reboot-required
        get_md5: no
      register: reboot_required_file

    - name: Reboot the server (if required).
      when: reboot_required_file.stat.exists == true

    - name: Remove dependencies that are no longer required.
        autoremove: yes

Install Python 3.9 on Raspberry Pi OS or Debian 10 (for Ansible or other uses)

I've started getting a lot of bug reports on my repos to the effect of "Ansible won't install on my Raspberry Pi anymore". Accompanying it is a debug message like one of the following:

$ python3 -m pip install ansible
No matching distribution found for ansible-core<2.13,>=2.12.0 (from ansible)

# Alternatively:
ERROR: No matching distribution found for ansible-core<2.13,>=2.12.0

The problem is ansible-core 2.12 has a new hard requirement for Python 3.8 or newer. And ansible-core 2.12 is included in Ansible 5.0.0, which was recently released. Raspberry Pi OS, which was based on Debian 10 ("Buster") until recently, includes Python 3.7, which is too old to satisfy Ansible's installation requirements.

There was recently a fix that makes it so Ansible 5.x won't get installed on these older systems, but who wants to get stuck on old unsupported Ansible versions?

There are three options:

Automating the Uncommon - AnsibleFest 2021 presentation

At AnsibleFest 2021, I presented a session titled Automating the Uncommon - Ansible automates everything!.

Since watching on-demand versions of the AnsibleFest sessions requires a signup, I thought I'd also post the session to my YouTube channel, so everyone can learn from it without registering. The session seemed well-received, and I hope it shows that, as I state in my 'Rule of Golden Hammers':

Jeff's rule of Golden Hammers - If you know a tool well enough, and the tool is good enough, it's okay to do weird things with it.

I demonstrate how I use Ansible to:

Allowing Ansible playbooks to work with new user groups on first run

For a long time, I've had some Ansible playbooks—most notably ones that would install Docker then start some Docker containers—where I had to split them in two parts, or at least run them twice, because they relied on the control user having a new group assigned for some later tasks.

The problem is, Ansible would connect over SSH to a server, and use that connection for subsequent tasks. If you add a group to the user (e.g. docker), then keep running more tasks, that new group assignment won't be picked up until the SSH connection is reset (similar to how if you're logged in, you'd have to log out and log back in to see your new groups).

The easy fix for this? Add a reset_connection meta task in your play to force Ansible to drop its persistent SSH connection and reconnect to the server: