
Getting Munin-node to monitor Nginx and Apache, the easy way

Since this is something I think I've bumped into at least eight times in the past decade, I thought I'd document, comprehensively, how I get Munin to monitor Apache and/or Nginx using the apache_* and nginx_* Munin plugins that come with Munin itself.

Besides the obvious step of symlinking the plugins into Munin's plugins folder, you should explicitly configure env.url for all Apache and Nginx servers to avoid any surprises. As an example, in your munin-node plugin configuration (on RedHat/CentOS, that's /etc/munin/plugin-conf.d; add a file named something like apache or nginx):

# For Nginx:
[nginx*]
env.url http://localhost/nginx_status

# For Apache:
[apache*]
env.url http://localhost/server-status?auto
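
These env.url values assume each server actually exposes its status page on localhost. If it doesn't already, a minimal sketch of the server-side configuration (adjust paths and syntax for your distribution; the Require line is Apache 2.4 syntax) looks like this:

# Nginx: inside a server block that listens on localhost.
location /nginx_status {
    stub_status on;
    access_log off;
    allow 127.0.0.1;
    deny all;
}

# Apache: requires mod_status (ExtendedStatus On gives the plugins more detail).
<Location /server-status>
    SetHandler server-status
    Require ip 127.0.0.1
</Location>

With that in place, munin-node-configure --suggest should list the apache_* and nginx_* plugins as usable once munin-node is restarted.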

Now, something that often trips me up—especially since I maintain a variety of servers and containers, with some running ancient forms of CentOS, while others are running more recent builds of Debian, Fedora, or Ubuntu—is that localhost doesn't always mean what you'd think it means.

Stripping the 'Vary: Host' header from an Apache response using Varnish

A colleague of mine discovered that many static resource requests that should have been cached upstream by a CDN were not being cached, and the culprit was an extra Vary HTTP header being sent with the response: in this case, Host.

It was hard to reproduce the issue, but in the end we found it was related to Apache bug #58231. Basically, since we used some RewriteConds that evaluated the HTTP_HOST value before a RewriteRule, we ran into a bug where Apache would add a Vary: Host header to the response. When that header was present, it effectively bypassed Varnish's cache as well as our upstream CDN... and since it applied to all image, CSS, JS, XML, and similar requests, we saw a lot of unexpected volume hitting the backend Apache servers.

To fix the issue, at least until the upstream bug is fixed in Debian, we decided to strip Host from the Vary header inside our Varnish default.vcl, in the vcl_backend_response subroutine.
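
A minimal sketch of the technique (assuming Varnish 4 or later, where backend responses are handled in vcl_backend_response; not necessarily the exact VCL used here):

sub vcl_backend_response {
  if (beresp.http.Vary) {
    # Remove "Host" from the Vary header, plus any comma it leaves behind.
    set beresp.http.Vary = regsub(beresp.http.Vary, "(?i),? *Host *", "");
    set beresp.http.Vary = regsub(beresp.http.Vary, "^, *", "");
    # If Host was the only value, drop the header entirely.
    if (beresp.http.Vary == "") {
      unset beresp.http.Vary;
    }
  }
}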

Using MaxMind's free GeoIP databases with the official Docker PHP image

I recently had to add support for the MaxMind free GeoIP database to a PHP container build that was based on the official Docker PHP image on Docker Hub. Unfortunately, it seems nobody else who's added this support has documented it, so I figured I'd post this so that the next poor soul who needs to implement the functionality doesn't have to spend half a day doing it!

First, you need the PHP geoip extension, which is available via PECL (note: if you can make the PHP project itself use a Composer library, there are a few better and more current GeoIP libraries available via Packagist!). Here's how to install it in one of the php:5.6-apache or php:7.0-apache images (note that 7.1 uses Debian Stretch instead of Jessie... but the instructions should be the same there):
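
A sketch of the Dockerfile additions (assuming the php:7.0-apache base image; geoip-1.1.1 is the PHP 7-compatible PECL beta, so adjust the version to match your PHP):

FROM php:7.0-apache

# The PECL geoip extension compiles against the system GeoIP C library.
RUN apt-get update \
    && apt-get install -y --no-install-recommends libgeoip-dev \
    && rm -rf /var/lib/apt/lists/*

# Install and enable the extension (on php:5.6, a plain `pecl install geoip` should do).
RUN pecl install geoip-1.1.1 \
    && docker-php-ext-enable geoip

You still need to download MaxMind's free legacy database files (e.g. GeoIP.dat) into the directory the extension reads from, typically /usr/share/GeoIP on Debian-based images.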

Apache, fastcgi, proxy_fcgi, and empty POST bodies with chunked transfer

I've been working on building a reproducible configuration for Drupal Photo Gallery, a project born out of this year's Acquia Build Hackathon.

We originally built the site on an Acquia Cloud CD environment, and this environment uses a pretty traditional LAMP stack. We didn't encounter any difficulty using AWS Lambda to post image data back to the Drupal site via Drupal's RESTful Web Services API.

The POST request is built in Node.js using:

Streaming PHP - disabling output buffering in PHP, Apache, Nginx, and Varnish

For the past few days, I've been diving deep into testing Drupal 8's experimental new BigPipe feature, which allows Drupal page requests for authenticated users to be streamed and loaded in stages: cached elements (usually the majority of a page) are loaded almost immediately, so the end user can start interacting with the main elements on the page very quickly, while the uncacheable elements are streamed in as Drupal is able to render them.

Here's a very quick demo of an extreme case, where a particular bit of content takes five seconds to load; BigPipe hugely improves the usability and perceived performance of the page by streaming the majority of the page content from cache immediately, then streaming the harder-to-generate parts as they become available.
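
Streaming like this only works when nothing between PHP and the browser buffers the whole response. A simple way to test the full stack (a generic test script, not code from BigPipe itself) is to emit output in timed chunks with PHP's own buffering disabled:

<?php
// stream-test.php: prints a line every second. If the page appears all at
// once after ~5 seconds, something in the stack is still buffering.
header('Content-Type: text/html; charset=utf-8');
header('X-Accel-Buffering: no'); // Hint to Nginx not to buffer this response.
ini_set('zlib.output_compression', 'Off');

// Flush any existing PHP output buffers and enable implicit flushing.
while (ob_get_level() > 0) {
    ob_end_flush();
}
ob_implicit_flush(true);

for ($i = 1; $i <= 5; $i++) {
    print "Chunk $i rendered at " . date('H:i:s') . "<br />\n";
    flush();
    sleep(1);
}

Varnish 4 and later stream backend responses by default (beresp.do_stream), so it's usually PHP and the web server's own output buffering that need attention.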

Upgrade an Apache Solr Search index from 1.4 to 3.6 (and later versions)

Recently I had to upgrade someone's Apache Solr installation from 1.4 to 5.x (the latest version at the time), and for the most part, a Solr upgrade is straightforward, especially if you're doing it for a Drupal site that uses the Search API or Solr Search modules, as the Solr configuration files are already upgraded for you (you just need to switch them out when you do the upgrade, making any necessary customizations).

However, I ran into the following error when I tried loading the core under Apache Solr 4.x or 5.x:

org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported (resource: MMapIndexInput(path="/var/solr/cores/[corename]/data/spellchecker2/_1m.cfx") [slice=_1m.fdx]): 1 (needs to be between 2 and 3). This version of Lucene only supports indexes created with release 3.0 and later.

To fix this, you need to upgrade your index using Solr 3.5.0 or later, then you can upgrade to 4.x, then 5.x (using each version of Solr to upgrade from the previous major version):
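
The heavy lifting can be done with Lucene's IndexUpgrader tool, run once per major-version hop against the core's index directory while Solr is stopped (a sketch with illustrative jar versions; the lucene-core jar ships with each Solr release, and 5.x also needs lucene-backward-codecs on the classpath):

# 1.4 -> 3.x: rewrite the old index with the 3.5+ upgrader.
java -cp lucene-core-3.5.0.jar \
  org.apache.lucene.index.IndexUpgrader -verbose /var/solr/cores/[corename]/data/index

# 3.x -> 4.x:
java -cp lucene-core-4.10.4.jar \
  org.apache.lucene.index.IndexUpgrader -verbose /var/solr/cores/[corename]/data/index

# 4.x -> 5.x:
java -cp "lucene-core-5.5.0.jar:lucene-backward-codecs-5.5.0.jar" \
  org.apache.lucene.index.IndexUpgrader -verbose /var/solr/cores/[corename]/data/index

Derived indexes like the spellchecker* directories in the error above can usually just be deleted and rebuilt by Solr rather than upgraded.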

Solr for Drupal Developers, Part 3: Testing Solr locally

In earlier Solr for Drupal Developers posts, you learned about Apache Solr and its history in, and integration with, Drupal. In this post, I'm going to walk you through a quick guide to getting Apache Solr running on your local workstation so you can test it out with a Drupal site you're working on.

The guide below is for those using Mac or Linux workstations, but if you're using Windows (or even if you run Mac or Linux), you can use Drupal VM instead, which optionally installs Apache Solr alongside Drupal.

As an aside, I am writing this series of blog posts from the perspective of a Drupal developer who has worked with large-scale, highly customized Solr search for Mercy (example), and with a variety of small-to-medium sites that use Hosted Apache Solr, a service I've been running as part of Midwestern Mac since early 2011.

Installing Apache Solr in a Virtual Machine

Apache Solr can be run directly from any computer that has Java 1.7 or later, so technically you could run it on any modern Mac, Windows, or Linux workstation natively. But to keep your local workstation cleaner, and to save time and hassle (especially if you don't want to kludge your computer with a Java runtime!), this guide will show you how to set up an Apache Solr virtual machine using Vagrant, VirtualBox, and Ansible.

Let's get started:
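
In outline, the process looks something like this, assuming Vagrant, VirtualBox, and Ansible are already installed (the Galaxy roles and Ubuntu box named below are illustrative, not necessarily the exact project this guide uses):

# Pull in Ansible roles that can install Java and Solr inside the VM.
ansible-galaxy install geerlingguy.java geerlingguy.solr

# Create a Vagrantfile based on an Ubuntu base box...
vagrant init geerlingguy/ubuntu1604

# ...add an ansible provisioner block pointing at a small playbook that
# applies the two roles above, then build the VM.
vagrant up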

Highly-Available PHP infrastructure with Ansible

I just posted a large excerpt from Ansible for DevOps over on the Server Check.in blog: Highly-Available Infrastructure Provisioning and Configuration with Ansible. In it, I describe a simple set of playbooks that configures a highly-available infrastructure primarily for PHP-based websites and web applications, using Varnish, Apache, Memcached, and MySQL, each configured in a way optimal for high-traffic and highly-available sites.
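
At a high level, the playbook layout pairs each tier of the stack with a role, something like the sketch below (the host group names are hypothetical, and the real playbooks in the excerpt are considerably more involved):

---
# site.yml
- hosts: varnish_balancers
  roles: [geerlingguy.varnish]

- hosts: webservers
  roles: [geerlingguy.apache, geerlingguy.php]

- hosts: memcached
  roles: [geerlingguy.memcached]

- hosts: databases
  roles: [geerlingguy.mysql]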

Here's a diagram of the ultimate infrastructure being built:

[Diagram: Highly Available Infrastructure]

Solr for Drupal Developers, Part 2: Solr and Drupal, A History

Drupal has included basic site search functionality since its first public release. Search administration was added in Drupal 2.0.0 in 2001, and search quality, relevance, and customization was improved dramatically throughout the Drupal 4.x series, especially in Drupal 4.7.0. Drupal's built-in search provides decent database-backed search, but offers a minimal set of features, and slows down dramatically as the size of a Drupal site grows beyond thousands of nodes.

In the mid-2000s, when most custom search solutions were relatively niche products, and the Google Search Appliance dominated the field of large-scale custom search, Yonik Seeley started working on Solr for CNet Networks. Solr was designed to work with Lucene, and offered fast indexing, extremely fast search, and as time went on, other helpful features like distributed search and geospatial search. Once the project was open-sourced and released under the Apache Software Foundation's umbrella in 2006, the search engine became one of the most popular engines for customized and more performant site search.

As an aside, I am writing this series of blog posts from the perspective of a Drupal developer who has worked with large-scale, highly customized Solr search for Mercy (example), and with a variety of small-to-medium sites that use Hosted Apache Solr, a service I've been running as part of Midwestern Mac since early 2011.

Timeline of Apache Solr and Drupal Solr Integration


A brief history of Apache Solr Search and Search API Solr

Only two years after Apache Solr was released, the first module that integrated Solr with Drupal, Apache Solr Search, was created. Originally, the module was written for Drupal 5.x, but it has been actively maintained for many years and was ported to Drupal 6 and 7, with some relatively major rewrites and modifications to keep the module up to date, easy to use, and integrated with all of Apache Solr's new features over time. As Solr gained popularity, many Drupal sites started switching from using core search or the Views module to using Apache Solr.

Solr for Drupal Developers, Part 1: Intro to Apache Solr

It's common knowledge in the Drupal community that Apache Solr (and other text-optimized search engines like Elasticsearch) blow database-backed search out of the water in terms of speed, relevance, and functionality. But most developers don't really know why, or just how much an engine like Solr can help them.

I'm going to be writing a series of blog posts on Apache Solr and Drupal, and while some parts of the series will be very Drupal-centric, I hope I'll be able to illuminate why Solr and other search engines like it are so effective, and why you should be using them instead of simple database-backed search (like Drupal core's Search module uses by default), even for small sites where search isn't a primary feature.

As an aside, I am writing this series of blog posts from the perspective of a Drupal developer who has worked with large-scale, highly customized Solr search for Mercy (example), and with a variety of small-to-medium sites that use Hosted Apache Solr, a service I've been running as part of Midwestern Mac since early 2011.

Why not Database?

Apache Solr's wiki leads off its Why Use Solr page with the following:

If your use case requires a person to type words into a search box, you want a text search engine like Solr.

At a basic level, databases are optimized for storing and retrieving bits of data, usually either a record at a time or in batches. And relational databases like MySQL, MariaDB, PostgreSQL, and SQLite are set up in such a way that data is stored in various tables and fields, rather than in one large bucket per record.

In Drupal, a typical node entity will have a title in the node table, a body in the field_data_body table, maybe an image with a description in another table, an author whose name is in the users table, etc. Usually, you want to allow users of your site to enter a keyword in a search box and search through all the data stored across all those fields.

Drupal's Search module avoids making ugly and slow search queries by building an index of all the search terms on the site, and storing that index inside a separate database table, which is then used to map keywords to entities that match those keywords. Drupal's venerable Views module will even enable you to bypass the search indexing and search directly in multiple tables for a certain keyword. So what's the downside?

Mainly, performance. Databases are built to be efficient query engines—provide a specific set of parameters, and the database returns a specific set of data. Most databases are not optimized for arbitrary string-based search. Queries where you use LIKE '%keyword%' are not that well optimized, and will be slow—especially if the query is being used across multiple JOINed tables! And even if you use the Search module or some other method of pre-indexing all the keyword data, relational databases will still be less efficient (and require much more work on a developer's part) for arbitrary text searches.
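
For example, a naive keyword search across a couple of Drupal 7 tables might look like the sketch below; the leading % wildcard prevents MySQL from using an index on either column, so every row has to be scanned:

-- Hypothetical keyword search across node titles and body values.
SELECT n.nid, n.title
FROM node n
LEFT JOIN field_data_body b ON b.entity_id = n.nid
WHERE n.title LIKE '%keyword%'
   OR b.body_value LIKE '%keyword%';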

If you're simply building lists of data based on very specific parameters (especially where the conditions for your query all utilize speedy indexes in the database), a relational database like MySQL will be highly effective. But usually, for search, you don't just have a couple options and maybe a custom sort—you have a keyword field (primarily), and end users have high expectations that they'll find what they're looking for by simply entering a few keywords and clicking 'Search'.