
Clearing Cloudflare and Nginx caches with Ansible

Since being DDoSed continuously earlier this year, I've set up extra caching in front of my site. Originally I just had Nginx's proxy cache, but that topped out around 100 Mbps of continuous bandwidth and maybe 5,000-10,000 requests per second on my little DigitalOcean VPS.

So then I added Cloudflare's proxy caching service on top, and now I've been able to handle months with 5-10 TB of traffic (with multiple spikes of hundreds of Mbps).

That's great, but caching comes with a tradeoff: any time I post a new article, update an old one, or a post receives a comment, it can take anywhere from 10 to 30 minutes before that change is reflected for end users.

I used to use Varnish, which let me configure cache purges directly from Drupal: whenever an operation would invalidate cached content, Drupal could purge just that content from Varnish's cache.
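
Cloudflare exposes a similar purge API, and it's easy to call from an Ansible task or handler. Here's a minimal sketch using the ansible.builtin.uri module; the cloudflare_zone_id and cloudflare_api_token variables and the example URL are placeholders you'd supply from your own inventory or vault:

# Hypothetical handler: purge updated URLs from Cloudflare's cache.
# cloudflare_zone_id and cloudflare_api_token are assumed to be defined
# elsewhere (e.g. in a vaulted vars file).
- name: Purge updated URLs from Cloudflare's cache.
  ansible.builtin.uri:
    url: "https://api.cloudflare.com/client/v4/zones/{{ cloudflare_zone_id }}/purge_cache"
    method: POST
    headers:
      Authorization: "Bearer {{ cloudflare_api_token }}"
    body_format: json
    body:
      files:
        - "https://www.example.com/blog/updated-post"
    status_code: 200

A handler like this could be notified from any task that changes site content, much the way the old Varnish purges were triggered from Drupal.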

Allowing Ansible playbooks to work with new user groups on first run

For a long time, I've had some Ansible playbooks (most notably ones that install Docker and then start some Docker containers) that I had to split into two parts, or at least run twice, because they relied on the control user having a new group assigned for some later tasks.

The problem is that Ansible connects to a server over SSH and reuses that connection for subsequent tasks. If you add a group to the user (e.g. docker) and then keep running more tasks, the new group assignment won't be picked up until the SSH connection is reset (just as, in an interactive login session, you'd have to log out and log back in to see your new groups).

The easy fix for this? Add a reset_connection meta task in your play to force Ansible to drop its persistent SSH connection and reconnect to the server:
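
A minimal sketch (the docker group and the follow-up docker ps check are just illustrative):

- name: Add the remote user to the docker group.
  ansible.builtin.user:
    name: "{{ ansible_user }}"
    groups: docker
    append: true
  become: true

# Drop the persistent SSH connection; the next task reconnects and picks
# up the new group membership.
- name: Reset the SSH connection.
  ansible.builtin.meta: reset_connection

- name: Verify Docker works without sudo, using the new group.
  ansible.builtin.command: docker ps
  changed_when: false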

Kubernetes 101 livestream series starts Nov 18th!

On November 18th, at 11 a.m., the first episode of my upcoming Kubernetes 101 livestream series will start on my YouTube channel.

Kubernetes 101 Series Artwork

The first episode will be available on YouTube: Kubernetes 101 - Episode 1 - Hello, Kubernetes!

You can find more details about the series on my Kubernetes 101 site, and there is also an open-source Kubernetes 101 GitHub repository which will contain all the code examples for the series.

In the spring, I presented a similar livestream series, Ansible 101, covering all the basics of Ansible and setting people up for success in infrastructure automation.

Ansible 101 by Jeff Geerling - YouTube streaming series

Ansible 101 Header Image

After the incredible response I got from making my Ansible books free for the rest of March to help people learn new automation skills, I tried to think of some other things I could do to help developers who may be experiencing hardship during the coronavirus pandemic and market upheaval.

So I asked on Twitter:

Real World DevOps

This blog post contains a written transcript of my NEDCamp 2018 keynote, Real World DevOps, edited to match the style of this blog. Accompanying resources: presentation slides, video.

Jeff Geerling at NEDCamp 2018 - New England Drupal Camp

I'm Jeff Geerling; you probably know that because my name appears in huge letters at the top of every page on this site, including the post you're reading right now. I currently work at Acquia as a Senior Technical Architect, building hosting infrastructure projects using some buzzword-worthy tech like Kubernetes, AWS, and 'the cloud'.

NEDCamp 2018 - Keynote on DevOps

Over the past decade, I've enjoyed presenting sessions at many DrupalCamps, DrupalCons, and other tech conferences. The conferences are some of the highlights of my year (at least discounting all the family things I do!), and lately I've been appreciative of the local communities I meet and get to be a part of (even if for a very short time) at Drupal Camps.

The St. Louis Drupal Users Group has chosen to put off its annual Camp until 2019, so we're guiding people to DrupalCorn Camp, which is only a little bit north of us, in Iowa.

NEDCamp New England Drupal Camp logo

Properly deploying updates to or shutting down Jenkins

One of my most popular Ansible roles is the geerlingguy.jenkins role, and for good reason: Jenkins is pretty much the premier open source CI tool, and has been used for many years by Ops and Dev teams all over the place.

As Jenkins (or any other CI tool) is adopted more fully for automating all aspects of infrastructure work, you begin to realize how important the Jenkins server(s) become to your daily operations. Then you realize you need CI for your CI, and version control and deployment processes for things like Jenkins updates, job updates, etc. The geerlingguy.jenkins role helps a lot with the main component, automating Jenkins' installation and configuration, and on top of that you can add tasks that copy a config.xml file with each job definition into your $JENKINS_HOME, to ensure every job and every configuration is in code...
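
For example, a pair of tasks along these lines can lay down job definitions (a sketch; the job names and the /var/lib/jenkins home path are assumptions that vary per install):

- name: Ensure a directory exists for each job.
  ansible.builtin.file:
    path: "/var/lib/jenkins/jobs/{{ item }}"
    state: directory
    owner: jenkins
    group: jenkins
  loop:
    - deploy-app
    - nightly-backup

- name: Copy each job's config.xml into $JENKINS_HOME.
  ansible.builtin.copy:
    src: "jobs/{{ item }}/config.xml"
    dest: "/var/lib/jenkins/jobs/{{ item }}/config.xml"
    owner: jenkins
    group: jenkins
    mode: '0644'
  loop:
    - deploy-app
    - nightly-backup
  notify: restart jenkins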

Getting Munin-node to monitor Nginx and Apache, the easy way

Since this is something I think I've bumped into at least eight times in the past decade, I thought I'd document, comprehensively, how I get Munin to monitor Apache and/or Nginx using the apache_* and nginx_* Munin plugins that come with Munin itself.

Besides the obvious step of symlinking the plugins into Munin's plugins folder, you should, to avoid any surprises, explicitly configure the env.url for all Apache and Nginx servers. As an example, in your munin-node configuration (on RedHat/CentOS, add a file in /etc/munin/plugin-conf.d named something like apache or nginx):

# For Nginx:
[nginx*]
env.url http://localhost/nginx_status

# For Apache:
[apache*]
env.url http://localhost/server-status?auto
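
If the server is managed with Ansible, the symlinking step mentioned above is easy to automate too. A sketch, assuming the bundled plugins live in /usr/share/munin/plugins (the usual location on RedHat-family systems):

- name: Symlink the Nginx plugins into Munin's active plugin directory.
  ansible.builtin.file:
    src: "/usr/share/munin/plugins/{{ item }}"
    dest: "/etc/munin/plugins/{{ item }}"
    state: link
  loop:
    - nginx_request
    - nginx_status
  notify: restart munin-node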

Now, something that often trips me up—especially since I maintain a variety of servers and containers, with some running ancient forms of CentOS, while others are running more recent builds of Debian, Fedora, or Ubuntu—is that localhost doesn't always mean what you'd think it means.

Fix for Ansible hanging when used with Docker and TTY

For almost all my Ansible roles on Ansible Galaxy, I have a comprehensive suite of tests that run against all supported OSes on Travis CI, and the only way that's possible is using Docker containers (one container for each OS/test combination).

For the past year or so, I've been struggling with strange issues in some of the test suites when I use docker exec --tty (which passes through Ansible's pretty coloration) with Ansible playbooks running inside Docker containers in Travis CI. It seems that certain services, when restarted on OSes running sysvinit (like Ubuntu 14.04 and CentOS 6), cause ansible-playbook to hang indefinitely, resulting in a build failure.
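
For context, the invocation in question looks roughly like this .travis.yml excerpt (a sketch; the container ID variable and playbook path are assumptions modeled on my usual test setup):

# Hypothetical .travis.yml script step; ${container_id} points at a file
# holding the ID captured when the container was started.
script:
  # Allocating a TTY (--tty) passes Ansible's colorized output through.
  - 'docker exec --tty "$(cat ${container_id})" env TERM=xterm ansible-playbook /etc/ansible/roles/role_under_test/tests/test.yml'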