docker

Resolving 'Temporary failure in name resolution' on Pi OS 12 Bookworm

Raspberry Pi OS version 12 (based on Debian 12 Bookworm) uses NetworkManager instead of dhcpcd for managing network connections, DNS resolution settings, DHCP, etc.

I've already mentioned using nmcli and nmtui for managing WiFi settings, but I ran into a strange issue after installing Docker on a fresh Raspberry Pi OS installation today. Suddenly DNS stopped working.

Trying to ping anything on the Internet gave me:

$ ping www.google.com
ping: www.google.com: Temporary failure in name resolution

As always, it was DNS. It was like DNS just gave up the ghost! Changing settings via nmtui didn't seem to work (I tried DHCP for IPv4 with manual DNS, and that wasn't working either).

Luckily, I found this post and follow-up comments mentioning the proper nmcli incantation to override DNS settings for an interface, so here it is (assuming built-in Ethernet):
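The exact command is not reproduced in this excerpt, so what follows is only a minimal sketch of how such an override is commonly done with nmcli, not the post's own incantation. The profile name "Wired connection 1" and the resolver addresses are assumptions; list your own profiles with `nmcli connection show` first.

```shell
# Sketch: pin DNS servers on one NetworkManager connection profile.
# Assumptions: the profile name and DNS addresses are examples only,
# not values taken from the original post.
set_dns() {
  conn="$1"   # connection profile name, e.g. "Wired connection 1"
  shift       # the remaining arguments are DNS server addresses

  # Use the given servers and ignore DNS servers handed out by DHCP.
  sudo nmcli connection modify "$conn" ipv4.dns "$*" ipv4.ignore-auto-dns yes

  # Re-activate the profile so the new settings take effect.
  sudo nmcli connection up "$conn"
}

# Example call (commented out so that sourcing this file changes nothing):
# set_dns "Wired connection 1" 1.1.1.1 8.8.8.8
```

Wrapping the two commands in a function keeps the profile name and servers in one place, which helps if you need the same fix for a WiFi profile as well.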

Docker and systemd, getting rid of dreaded 'Failed to connect to bus' error

The following error has been the bane of my existence for the past few months:

TASK [geerlingguy.containerd : Ensure containerd is started and enabled at boot.] ***
fatal: [instance]: FAILED! => {
  "changed": false,
  "cmd": "/bin/systemctl",
  "msg": "Failed to connect to bus: No such file or directory",
  "rc": 1,
  "stderr": "Failed to connect to bus: No such file or directory",
  "stderr_lines": [
    "Failed to connect to bus: No such file or directory"
  ],
  "stdout": "",
  "stdout_lines": []
}

I use Molecule to test my Ansible roles and playbooks, so the tests run in an identical environment both locally and in GitHub Actions. And many of my roles and playbooks need to test whether systemd services are configured and run correctly.

But Docker recently switched from cgroups v1 to cgroups v2, and that started this 'Failed to connect to bus' business—systemd relied on some configuration that was easy enough to add in the past: just run your containers with these options:
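The options themselves are cut off in this excerpt. As a rough sketch based on what systemd-in-Docker setups commonly use (an assumption, not a quote from the post): under cgroups v1, a privileged container with a read-only /sys/fs/cgroup mount was usually enough. On cgroups v2 hosts, the container generally also needs the host cgroup namespace and a read-write mount. The image name in the example call is just an illustration.

```shell
# Sketch: start a systemd-capable test container on a cgroups v2 host.
# Assumption: the image passed in runs systemd as its init process.
# The older cgroups v1 pattern was typically just:
#   --privileged --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro
run_systemd_container() {
  image="$1"
  docker run --detach --privileged \
    --cgroupns=host \
    --volume=/sys/fs/cgroup:/sys/fs/cgroup:rw \
    "$image"
}

# Example call (commented out so that sourcing this file starts nothing):
# run_systemd_container geerlingguy/docker-ubuntu2204-ansible:latest
```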

Using a reverse-NFS mount to access Docker container's data from macOS

For years, Mac users have dealt with slow filesystem performance for Docker volumes when using Docker for Mac. This is because of the virtualized filesystem, which used osxfs for a while and will soon be upgraded to use VirtioFS.

But if you need to do large operations on huge codebases inside a shared directory, even using NFS to share from the Mac into Docker is a lot slower than running a native Docker volume or just using files inside the container's own filesystem.

[Image: macOS Disk Utility APFS Case Insensitive filesystem]

New Docker for Mac VirtioFS file sync is 4x faster

Docker for Mac's shared volume performance saga continues!

After monitoring the issue "File system performance improvements" for years (discussion has since moved to a newer issue), it seems like the team behind Docker Desktop for Mac has finally settled on the next generation of filesystem sync.

For years, the built-in osxfs sync performance has been abysmal. For a Drupal developer like me, running a default shared volume could lead to excruciating slowdowns as PHP applications like Symfony and Drupal scan thousands of files when building app caches.

Or God forbid you ever have to install dependencies using Composer or NPM over a shared volume!

It got to the point where I started using NFS to speed up volume performance. Heck, the Docker team almost added Mutagen sync, which I tested successfully, but it caused problems for too many projects.

Allowing Ansible playbooks to work with new user groups on first run

For a long time, I've had some Ansible playbooks—most notably ones that would install Docker then start some Docker containers—where I had to split them in two parts, or at least run them twice, because they relied on the control user having a new group assigned for some later tasks.

The problem is, Ansible would connect over SSH to a server, and use that connection for subsequent tasks. If you add a group to the user (e.g. docker), then keep running more tasks, that new group assignment won't be picked up until the SSH connection is reset (just as, in an interactive session, you'd have to log out and log back in to see your new groups).

The easy fix for this? Add a reset_connection meta task in your play to force Ansible to drop its persistent SSH connection and reconnect to the server:
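The task itself is not shown in this excerpt, so here is a minimal sketch of the pattern. It writes a small example playbook to disk; the file name is hypothetical, and it assumes the docker group already exists and that `ansible_user` is defined for the host.

```shell
# Sketch: write an example playbook that resets the connection after a
# group change. Assumptions: the file name is illustrative, the docker
# group exists, and ansible_user is set in the inventory.
cat > reset-connection-example.yml <<'EOF'
---
- hosts: all
  tasks:
    - name: Add the connecting user to the docker group.
      ansible.builtin.user:
        name: "{{ ansible_user }}"
        groups: docker
        append: true
      become: true

    - name: Reset the SSH connection so the new group membership applies.
      ansible.builtin.meta: reset_connection

    - name: Confirm the user can now talk to Docker without sudo.
      ansible.builtin.command: docker info
      changed_when: false
EOF
```

You could then run it with `ansible-playbook -i <your inventory> reset-connection-example.yml`. The last task runs without `become`, so it should only succeed if the new group membership was actually picked up.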

Resolving intermittent Fedora DNF error "No such file or directory: '/var/lib/dnf/rpmdb_lock.pid'"

For many of my Ansible playbooks and roles, I have CI tests which run over various distributions, including CentOS, Ubuntu, Debian, and Fedora. Many of my Docker Hub images for Ansible testing include systemd so I can test services that are installed inside. For the most part, systemd-related issues are rare, but it seems with Fedora and DNF, I often encounter random test failures which invariably have an error message like:

No such file or directory: '/var/lib/dnf/rpmdb_lock.pid'

The full Ansible traceback is:

Be careful, Docker might be exposing ports to the world

Recently, I noticed logs for one of my web services had strange entries that looked like a bot trying to perform scripted attacks on an application endpoint. I was surprised, because all the endpoints that were exposed over the public Internet were protected by some form of authentication, or were locked down to specific IP addresses—or so I thought.

I had re-architected the service using Docker in the past year, and in the process of doing so, I changed the way the application ran—instead of having one server per process, I ran a group of processes on one server, and routed traffic to them using DNS names (one per process) and Nginx to proxy the traffic.

In this new setup, I built a custom firewall using iptables rules (since I had to control for a number of legacy services that I have yet to route through Docker—someday it will all be in Kubernetes), installed Docker, and set up a Docker Compose file (one per server) that ran all the processes in containers, using ports like 1234, 1235, etc.

The Docker Compose port declaration for each service looked like this:
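The declaration is not included in this excerpt, so the following is a hedged sketch of the general pattern rather than the post's actual file. A port published as HOST:CONTAINER binds to every host interface by default, and Docker manages its own iptables rules for published ports, so the port can end up reachable despite a custom firewall. Prefixing the mapping with 127.0.0.1 limits it to loopback, so only a local proxy such as Nginx can reach it. The file, service, and image names below are hypothetical.

```shell
# Sketch: write an example Compose file with a loopback-only port mapping.
# Assumptions: file name, service name, and image are placeholders.
cat > docker-compose.example.yml <<'EOF'
services:
  app:
    image: example/app:latest
    ports:
      # "1234:1234" would listen on 0.0.0.0 (every interface).
      # Binding to loopback keeps the port private to this host:
      - "127.0.0.1:1234:1234"
EOF
```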

Revisiting Docker for Mac's performance with NFS volumes

tl;dr: Docker's default bind mount performance for projects requiring lots of I/O on macOS is abysmal. It's acceptable (but still very slow) if you use the cached or delegated option. But it's actually fairly performant using the barely-documented NFS option!

July 2020 Update: Docker for Mac may soon offer built-in Mutagen sync via the :delegated sync option, and I did some benchmarking here. Hopefully that feature makes it to the standard Docker for Mac version soon.

September 2020 Update: Alas, Docker for Mac will not be getting built-in Mutagen support at this time. So, read on.

Molecule fails on converge and says test instance was already 'created' and 'prepared'

I hit this problem every once in a while; basically, I run molecule test or molecule converge (in this case it was for a Kubernetes Operator I was building with Ansible), and it says the instance is already created/prepared—even though it is not—and then Molecule fails on the 'Gathering Facts' portion of the converge step:

Running Drupal in Kubernetes with Docker in production

Update: Since posting this, there have been some interesting new developments in this area, for example:

  • There is now a Drupal/Kubernetes SIG which meets every other Wednesday.
  • There are Kubernetes Drupal Operators which can manage Drupal instances in Kubernetes; I maintain the geerlingguy/drupal-operator but there are a couple others out there in development.

Since 2014, I've been working on various projects which containerized Drupal in a production environment. There have always been a few growing pains, and there will be for some time, as there are so few places actually using Docker or containers in a production environment (at least in a 'cloud native' way, without tons of volume mounts), though this is changing. Adoption was slow at first, but it's picking up speed.