Benchmarking PHP 7 vs HHVM - Drupal and Wordpress

[Multiple updates: I've added results for concurrencies of 1 and 10, results on bare metal vs. VMware instances, tested Drupal 8 vs Drupal 7 vs Wordpress 4.4, and I've also retested every single benchmark at least twice! Please make sure you're read through the entire post prior to contesting these benchmark results!]

tl;dr: Always test your own application, and trust, but verify every benchmark you see. PHP 7 is actually faster than HHVM in many cases, neck-in-neck in others, and slightly slower in others. Both PHP 7 and HHVM blow PHP ≤ 5.6 out of the water.

Skip to benchmark results:

Lessons Learned building the Raspberry Pi Dramble

Raspberry Pi Dramble Bramble Cluster

Edit: Many people have been asking for more technical detail, benchmarks, etc. There is much more information available on the Raspberry Pi Dramble Wiki (e.g. Power Consumption, microSD card benchmarks, etc.), if you're interested.

After the Raspberry Pi 2 model B was released, I decided the Pi was finally a fast enough computing platform (with its 4-core 900 MHz ARMv7 architecture) with enough memory (1 GB per Pi) to actually use for web infrastructure. If not in a production environment (I would definitely avoid putting the Pi into a role as a high-load 24x7x365 server), then for development and/or testing purposes. And for some fun!

Highly-Available PHP infrastructure with Ansible

I just posted a large excerpt from Ansible for DevOps over on the Server blog: Highly-Available Infrastructure Provisioning and Configuration with Ansible. In it, I describe a simple set of playbooks that configures a highly-available infrastructure primarily for PHP-based websites and web applications, using Varnish, Apache, Memcached, and MySQL, each configured in a way optimal for high-traffic and highly-available sites.

Here's a diagram of the ultimate infrastructure being built:

Highly Available Infrastructure

NFS, rsync, and shared folder performance in Vagrant VMs

It's been a well-known fact that using native VirtualBox or VMWare shared folders is a terrible idea if you're developing a Drupal site (or some other site that uses thousands of files in hundreds of folders). The most common recommendation is to switch to NFS for shared folders.

NFS shared folders are a decent solution, and using NFS does indeed speed up performance quite a bit (usually on the order of 20-50x for a file-heavy framework like Drupal!). However, it has it's downsides: it requires extra effort to get running on Windows, requires NFS support inside the VM (not all Vagrant base boxes provide support by default), and is not actually all that fast—in comparison to native filesystem performance.

I was developing a relatively large Drupal site lately, with over 200 modules enabled, meaning there were literally thousands of files and hundreds of directories that Drupal would end up scanning/including on every page request. For some reason, even simple pages like admin forms would take 2+ seconds to load, and digging into the situation with XHProf, I found a likely culprit:

PHP: It doesn't have to be a bad experience

It gets a little bit under my skin when I see a link to PHP: A fractal of bad design posted in the comments on every article mentioning PHP on tech sites, blogs, and forums.

Computer Reaction Face - Suspicious

PHP developers get it: PHP is full of ugly warts, and is not perfect. Far from it.

Developers brazen enough to admit they don't detest every minute of programming in PHP are blithely dismissed by 'real' developers. Ironically, many of these 'real' developers enjoy using some other dynamically-typed language to which 80% of the arguments against PHP equally apply.

Meet Phergie, an efficient PHP IRC bot

The Drupal community uses IRC extensively for collaboration and community building. A permanent and ever-helpful fixture of the official #drupal-* IRC channels, and in the Drupal community itself, is the humble Druplicon bot. Druplicon is a Drupal-based IRC bot that was created in 2005, and is still going strong as part of the Bot module for Drupal.

Bots like Druplicon do a lot of nice things—they can remind people of things after they were away for a while, they can store facts, track karma, throw people virtual beers, store and retrieve helpful facts, and relay important information. For example, when a build fails in Jenkins, a bot can post a message in IRC. Similarly, if a server goes down, or is under heavy load, the bot could post a message.

Drupal 2014 - New Year's Resolutions

2014 is going to be a big year for Drupal. I spent a lot of 2013 sprucing up services like Hosted Apache Solr and Server (both running on Drupal 7 currently), and porting some of my Drupal projects to Drupal 8.

So far I've made great progress on Honeypot and Wysiwyg Linebreaks, which I started migrating a while back. Both modules work and pass all tests on Drupal's current dev/alpha release, and I plan on following through with the D8CX pledges I made months ago.

CI: Deployments and Static Code Analysis with Drupal/PHP

CI: Deplyments and Code Quality

tl;dr: Get the Vagrant profile for Drupal/PHP Continuous Integration Server from GitHub, and create a new VM (see the README on the GitHub project page). You now have a full-fledged Jenkins/Phing/SonarQube server for PHP/Drupal CI.

In this post, I'm going to explain how Jenkins, Phing and SonarQube can help you with your Drupal (or hany PHP-based project) deployments and code quality, and walk you through installing and configuring them to work with your codebase. Bear with me... it's a long post!

Code Deployment

If you manage more than one environment (say, a development server, a testing/staging server, and a live production server), you've probably had to deal with the frustration of deploying changes to your code to these servers.

In the old days, people used FTP and manually copied files from environment to environment. Then FTP clients became smarter, and allowed somewhat-intelligent file synchronization. Then, when version control software became the norm, people would use CVS, SVN, or more recently Git, to push or check out code to different servers.

All the aforementioned deployment methods involved a lot of manual labor, usually involving an FTP client or an SSH session. Modern server management tools like Ansible can help when there are more complicated environments, but wouldn't everything be much simpler if there were an easy way to deploy code to specific environments, especially if these deployments could be automated to either run on a schedule or whenever someone commits something to a particular branch?

Jenkins Logo

Enter Jenkins. Jenkins is your deployment assistant on steroids. Jenkins works with a wide variety of tools, programming languages and systems, and allows the automation (or radical simplification) of tasks surrounding code changes and deployments.

In my particular case, I use a dedicated Jenkins server to monitor a specific repository, and when there are commits to a development branch, Jenkins checks out that branch from Git, runs some PHP code analysis tools on the codebase using Phing, archives the code and other assets in a .tar.gz file, then deploys the code to a development server and runs some drush commands to complete the deployment.

Static Code Analysis / Code Review

If you're a solo developer, and you're the only one planning on ever touching the code you write, you can use whatever coding standards you want—spacing, variable naming, file structure, class layout, etc. don't really matter.

But if you ever plan on sharing your code with others (as a contributed theme or module), or if you need to work on a shared codebase, or if there's ever a possibility you will pass on your code to a new developer, it's a good idea to follow coding standards and write good code that doesn't contain too many WTFs/min.

The Drupal Way™

The Drupal Way

I've worked with a wide variety of developers, designers, content managers, and the other Drupal users in the past few years, and I'm pretty sure I have a handle on most of the reasons people think Drupal is a horrible platform. But before I get to that, I have to set up the rest of this post with the following quote:

There are not a hundred people in America who hate the Catholic Church. There are millions of people who hate what they wrongly believe to be the Catholic Church — which is, of course, quite a different thing.

Forgive me for diverging slightly into my faith, but this quote is from the late Fulton J. Sheen, and is in reference to the fact that so many people pour hatred on the Catholic Church not because of what the Church actually teaches, but because of what they think the Catholic Church teaches. Once someone comes to understand the actual teaching, they are free to agree or disagree with it—but there are comparatively few people who disagree with teachings they actually understand.

Similarly, the problems most people have with Drupal—and with systems like it—are problems not with Drupal, but with their perception of Drupal.

Java Jane: One-off vs. Flexible Design

A Java developer (let's call her Jane) is used to creating a bunch of base object classes and a schema for a database by hand, then deploying an application and managing the database through her own wrapper code. Jane is assigned to a Drupal project, takes one look at the database, and decides that no sane person would ever design a schema with hundreds of tables named field_data_* and field_revision_* for every single data point in the application!

Why does Drupal have So Many Database Tables?

In reality, Drupal is doing this because The Drupal Way dictates that things like field data should be: flexible (able to be used by different kinds of entities (content)), able to be translated, able to be revised with a trackable history, and able to be stored in different storage backends (e.g. MySQL, MariaDB, MongoDB, SQLite, etc.). If the fields were all stored in a per-entity table as separate columns, these different traits would be much more difficult to implement.

Thus, The Drupal Way is actually quite beneficial—if you want a flexible content management system.

I think a lot of developers hate Drupal because they know they could build a more efficient web application that only has the minimal required features they need by simply writing everything from scratch (or using a barebones framework). But what about the next 72 times you have to build the exact same thing, except slightly different each time, with a feature that's different here, translation abilities there, integration with Active Directory for user login here, integration with a dozen APIs there, etc.?

There's a maxim that goes something like: Every seasoned web developer started with plain HTML and CSS, or some hosted platform, then discovered a dynamic scripting language and built his own CMS-like system. Then, after building the CMS into a small system like many others but hopelessly insecure and unmaintainable, the developer realized that thousands of other people went through the same progression and ultimately worked together on systems like Drupal. Then said developer starts using Drupal, and the rest is history.

I know you could build a small system that beats the pants off Drupal performance-wise, and handles the three features you need done now. But why spend hours on a login form (that probably has security holes), session handling (ditto), password storage (ditto) forms in general (ditto), content CRUD interfaces, a translation system, a theme layer, etc., when you can have that out of the box, and just spend a little time making it look and behave like you want it? The shoulders of giants and all that...

.Net Neil: Letting Contrib/Bespoke Code Let You Down

A .Net developer (lets call him Neil) joins a Drupal project team after having worked on a small custom .Net application for a few years. Not only does he not know PHP (so he's learning by seeing the code already in use), he is also used to a tightly-controlled application code structure, which he knows and owns end-to-end.

Setting a max_execution_time limit for PHP CLI

PHP's command line interface doesn't respect the max_execution_time limit within your php.ini settings. This can be both a blessing and a curse (but more often the latter). There are some drush scripts that I run concurrently for batch operations that I want to make sure don't run away from me, because they perform database operations and network calls, and can sometimes slow down and block other operations.

Memory usage - PHP and MySQL locked from runaway threads
Can you tell when the batch got backlogged? CPU usage spiked to 20, and threads went from 100 to 400.

I found that some large batch operations (where there are hundreds of thousands of items to work on) would hold the server hostage and cause a major slowdown, and when I went to the command line and ran:

<br />
$ drush @site-alias ev "print ini_get('max_execution_time');"<br />