Stripping the 'Vary: Host' header from an Apache response using Varnish

A colleague of mine noticed that many static resource requests which should've been cached upstream by a CDN were not being cached, and the reason was an extra Vary HTTP header being sent with the response, in this case Vary: Host.

It was hard to reproduce the issue, but in the end we traced it to Apache bug #58231. Because we used some RewriteConds that evaluated the HTTP_HOST value before a RewriteRule, Apache would add a Vary: Host header to the response. When this header was set, responses effectively bypassed Varnish's cache as well as our upstream CDN, and since it applied to all image, CSS, JS, XML, etc. requests, we saw a lot of unexpected traffic hitting the backend Apache servers.

To fix the issue, at least until the upstream bug is fixed in Debian, we decided to strip Host from the Vary header inside our Varnish default.vcl. Inside the vcl_backend_response, we added:
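The snippet itself is not included in this excerpt. As a rough sketch only (not necessarily the exact code from the post), stripping Host from Vary in Varnish 4 or later could look something like the following. The regular expressions are my own assumptions, and the block is meant to be merged into an existing default.vcl that already declares a VCL version and a backend:

```vcl
sub vcl_backend_response {
    # Work around Apache bug #58231: remove "Host" from the Vary header
    # so responses can be cached again. The patterns below are illustrative.
    if (beresp.http.Vary ~ "(?i)(^|,)\s*Host\s*(,|$)") {
        # Drop the Host entry along with any comma that precedes it.
        set beresp.http.Vary = regsuball(beresp.http.Vary, "(?i)(^|,)\s*Host\s*(?=,|$)", "");
        # If Host was the first entry, a leading comma is left behind.
        set beresp.http.Vary = regsub(beresp.http.Vary, "^\s*,\s*", "");
        # If Host was the only entry, remove the now-empty header.
        if (beresp.http.Vary == "") {
            unset beresp.http.Vary;
        }
    }
}
```

With this in place, a backend response carrying "Vary: Accept-Encoding, Host" should reach the cache as "Vary: Accept-Encoding", while a bare "Vary: Host" is dropped entirely. Other Vary values, such as Accept-Encoding, are left alone so that legitimate cache variations still work.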

Streaming PHP - disabling output buffering in PHP, Apache, Nginx, and Varnish

For the past few days, I've been diving deep into testing Drupal 8's experimental new BigPipe feature, which allows Drupal page requests for authenticated users to be streamed and loaded in stages—cached elements (usually the majority of a page) are loaded almost immediately, meaning the end user can interact with the main elements on the page very quickly, then other uncacheable elements are loaded in as Drupal is able to render them.

Here's a very quick demo of an extreme case, where a particular bit of content takes five seconds to load; BigPipe hugely improves the usability and perceived performance of the page by streaming the majority of the page content from cache immediately, then streaming the harder-to-generate parts as they become available:

Yes, Drupal 8 is slower than Drupal 7 - here's why

tl;dr: Drupal 8's defaults make most Drupal sites perform faster than equivalent Drupal 7 sites, so be wary of benchmarks which tell you Drupal 7 is faster based solely on installation defaults or raw PHP execution speed. Architectural changes have made Drupal's codebase slightly slower in some ways, but the same changes make the overall experience of using Drupal and browsing a Drupal 8 site much faster.

When some people see reports of Drupal 8 being 'dramatically' slower than Drupal 7, they wonder why, and they also use this performance change as ammunition against some of the major architectural changes that were made during Drupal 8's development cycle.

First, I wanted to give some more concrete data behind why Drupal 8 is slower (specifically, what kinds of things does Drupal 8 do that make it take longer per request than Drupal 7 on an otherwise-identical system), and also why this might or might not make any difference in your choice to upgrade to Drupal 8 sooner rather than later.

Use Drupal 8 Cache Tags with Varnish and Purge

[Image: Varnish cache hit in Drupal 8]

Over the past few months, I've been reading about BigPipe, Cache Tags, Dynamic Page Cache, and all the other amazing-sounding new features for performance in Drupal 8. I'm working on a blog post that more comprehensively compares and contrasts Drupal 8's performance with Drupal 7, but that's a topic for another day. In this post, I'll focus on cache tags in Drupal 8, and particularly their use with Varnish to make cached content expiration much easier than it ever was in Drupal 7.

Always getting X-Drupal-Cache: MISS? Check for messages

I spent about an hour yesterday debugging a Varnish page caching issue. I combed the site configuration and code for anything that might be setting cache to 0 (effectively disabling caching), I checked and re-checked the /admin/config/development/performance settings, verifying the 'Expiration of cached pages' (page_cache_maximum_age) had a non-zero value and that the 'Cache pages for anonymous users' checkbox was checked.

After scratching my head a while, I realized that the headers I was seeing when using curl --head [url] were specified as the defaults in drupal_page_header(), and were triggered any time there was a message displayed on the page (e.g. via drupal_set_message()):

X-Drupal-Cache: MISS
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0
X-Content-Type-Options: nosniff

Highly-Available PHP infrastructure with Ansible

I just posted a large excerpt from Ansible for DevOps over on the Server blog: Highly-Available Infrastructure Provisioning and Configuration with Ansible. In it, I describe a simple set of playbooks that configures a highly-available infrastructure primarily for PHP-based websites and web applications, using Varnish, Apache, Memcached, and MySQL, each configured in a way optimal for high-traffic and highly-available sites.

Here's a diagram of the ultimate infrastructure being built:

[Diagram: Highly Available Infrastructure]

Debugging Varnish VCL configuration files

If you're a Drupal or PHP developer used to debugging or troubleshooting some code by adding a print $variable; or dpm($object); to your PHP, and then refreshing the page to see the debug message (or using XDebug, or using watchdog logging...), debugging Varnish's VCL language can be intimidating.

VCL uses C-like syntax, and is compiled when Varnish starts, so you can't just modify a .vcl file and refresh to see changes or debug something. And there are only a few places where you can simply stick a debug statement. So, I'll explain four different ways I use to debug VCLs in this post (note: don't do this on a production server!):

Simple Error statements (like print in PHP)

Sometimes, all you need to do is see the output of a variable, like req.http.Cookie, inside vcl_recv(). In these cases, you can just add an error statement to throw an error in Varnish and output the contents of a string, like the Cookie:
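The example is cut off in this excerpt, so here is a minimal sketch of the idea, assuming Varnish 3 syntax (where the error keyword is available). The 503 status code is an arbitrary choice for illustration:

```vcl
sub vcl_recv {
    # Varnish 3 syntax: stop processing and return an error page whose
    # message is the contents of the Cookie header.
    error 503 req.http.Cookie;

    # In Varnish 4 and later the error keyword no longer exists; the rough
    # equivalent would be:
    #   return (synth(503, "Cookie: " + req.http.Cookie));
}
```

After reloading the VCL, every request passing through vcl_recv should produce Varnish's error page with the cookie string shown as the message, which is one more reason to keep this off production servers.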