Preventing Form Spam

Spam email folder - Gmail interface

There are many different techniques for preventing form spam on your website. The hard part of the battle is the constant tension between giving your 'real' users a good experience and keeping spammers and automated bots from lowering the quality of the content on your website.

A Constant User-Experience Battle

Usually, the first thing someone will do after having trouble fighting spammers by manual comment/content moderation is place a complex CAPTCHA system on their forms. Something like this:

Spam CAPTCHA text difficult to read

Besides the fact that this kind of text is difficult for a normal person to read, it's even harder for those with poor eyesight. In addition, many (if not most) CAPTCHA implementations are inaccessible to blind people or those using assistive devices to read your website.

What happens, in essence, is that you end up silencing a large portion of the people who may want to comment on your post or send you feedback: not just people with disabilities, but also those who would rather not waste an extra ten seconds of their lives leaving a comment!

Form Usability vs. Spam Deterrence

Form spam prevention vs. usability chart

In the chart above, I illustrate the conundrum that web developers and content administrators must face when dealing with all types of forms—article submissions, forums, comments, etc. You can usually steer your forms towards either more user-friendliness, or better spam prevention, but there's no way to get the best of both worlds. You'll always need to compromise in some way; the rest of this article will lead you through my typical process for determining where, exactly, the happy medium resides.

So, what can we do? Spend hours moderating content? Disable comment forms? Those options aren't necessarily wrong in every situation, but I'd propose a more measured approach: take small steps towards spam prevention, and only use CAPTCHAs (or other techniques which require extra user attention/time) in the most dire circumstances.

Before the CMS: Basic Principles of Spam Prevention

There are a few common-sense approaches to spam prevention that require minimal effort but can already go a long way towards preventing spam.

  • Stop it before it starts.
    On many of my sites, I use a 'comment disabler' system that automatically sets comments to 'read only' on my posts after a few weeks. By that time, most of the relevant discussion has already happened, and I can turn comments back on if the post is more timeless or deserves more time for discussion. Spammers can't spam when there's no comment form!
  • Don't give comments center stage.
    Many spammers only spam to get links back to their websites. If you can put your comments on a page separate from your main post, or not display them until the user clicks a 'show comments' link (as many news sites do), that will help make spammers' links less potent. (You should also always add a rel="nofollow" attribute to all links posted in comments.)
  • Don't allow anonymous comments/posts (require a login).
    This can be a good thing and a bad thing; if you have a smaller site, and want to encourage people to post, requiring a user account can be a difficult barrier. However, many sites and communities can effectively reduce spam by requiring users to create an account or log into the site using Facebook or Twitter (or a commenting system like Disqus). 

Obviously, these are very low-tech methods, and they only work in certain cases... but they're very effective if you don't need to leave comments open, and if you want to radically reduce the amount of spam you're getting with minimal effort.
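To make two of the ideas above concrete, here is a minimal Python sketch of a 'comment disabler' check and of rendering comment links with rel="nofollow". It is only an illustration under stated assumptions, not how any particular CMS does it: the names comments_open, render_comment, and COMMENT_WINDOW are made up for this example, and the three-week window is just one reading of 'a few weeks'.

```python
import html
import re
from datetime import datetime, timedelta

# Assumption: "a few weeks" is taken to mean three weeks here.
COMMENT_WINDOW = timedelta(weeks=3)

# Deliberately simple URL matcher; real-world linkifiers handle more cases.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def comments_open(published_at: datetime, now: datetime,
                  keep_open: bool = False) -> bool:
    """Return True while the comment form should still be shown.

    keep_open is a manual override for 'timeless' posts.
    """
    return keep_open or (now - published_at) <= COMMENT_WINDOW


def render_comment(text: str) -> str:
    """Escape a plain-text comment and link its URLs with rel="nofollow"."""
    parts = []
    last = 0
    for match in URL_RE.finditer(text):
        # Escape the text between links so user-supplied HTML is shown inert.
        parts.append(html.escape(text[last:match.start()]))
        url = html.escape(match.group(0))
        parts.append(f'<a href="{url}" rel="nofollow">{url}</a>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

Treating comments as plain text and generating the links yourself means every link gets the attribute, since commenters never write the anchor tags themselves.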

Basic Form Protections

The first thing I usually do on any website (no matter how small) is enable one or two minimal form spam prevention techniques. For Drupal sites, I always turn on the Honeypot module and enable it for at least the user registration and comment forms. I also make sure that all comments and publicly-visible postings on the site are emailed to me or another moderator, along with quick links to remove the post (or, in some cases, approve it).

The Honeypot module is a very basic defense that works very well against weaker spammers who prey on smaller sites that typically don't have any spam protection.

The module is aptly named: it adds an invisible field to comment forms (and other forms), and if that invisible field has any data entered into it, the form is not accepted. Like a pot of honey in front of Pooh (a spammer), the extra field is irresistible to a spam script, which goes through the form and throws a value into any field it can find. An illustration:

Add a hidden field in a comment form

Honeypot also adds 'timestamp' protection, which requires a minimum amount of time to pass before the form can be submitted (by default, 5 seconds). Usually, humans can't read an article, type a comment, and click 'Submit' within 5 seconds. However, spam scripts that want to post spam comments on hundreds of forms every second will try submitting the form in less than a second, and Honeypot will stop that from happening.

So, at a basic level (preventing spam from automated scripts and bots), two protections are pretty effective:

  • A 'honeypot' field (with a common title like 'homepage' or 'url', to make it even more tantalizing) that is hidden from normal users using CSS or JavaScript, and is not allowed to have any content entered into it.
  • A time-based protection, which attaches a time value to a form, and requires a certain amount of time to pass before form submission is accepted.
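
The two protections in the list above can be sketched in a few lines of Python. This is a rough, framework-agnostic illustration and not the Drupal Honeypot module's actual code: the field names (homepage, form_token), the SECRET_KEY, and the function names are all assumptions made for the example. The timestamp is signed with an HMAC so that a bot can't simply send an old-looking time value of its own.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"    # assumption: a per-site secret kept on the server
HONEYPOT_FIELD = "homepage"  # tempting field name, hidden from humans via CSS
MIN_SECONDS = 5              # the 5-second default mentioned above


def sign_timestamp(ts: int) -> str:
    """Return 'timestamp:signature' so the time value can't be forged."""
    mac = hmac.new(SECRET_KEY, str(ts).encode(), hashlib.sha256).hexdigest()
    return f"{ts}:{mac}"


def issue_form_token(now=None) -> str:
    """Create the hidden token to embed when the form is rendered."""
    ts = int(time.time() if now is None else now)
    return sign_timestamp(ts)


def looks_like_spam(form: dict, now=None) -> bool:
    """Apply the honeypot check and the time-based check to a submission."""
    now = time.time() if now is None else now

    # 1. Honeypot: real users never see this field, so it must stay empty.
    if form.get(HONEYPOT_FIELD, "").strip():
        return True

    # 2. Timestamp: the token must be well-formed and carry a valid signature.
    token = form.get("form_token", "")
    ts_text, _, _ = token.partition(":")
    if not (ts_text.isascii() and ts_text.isdigit()):
        return True
    expected = sign_timestamp(int(ts_text))
    if not hmac.compare_digest(expected.encode(), token.encode()):
        return True

    # 3. Too fast: the form came back sooner than a human could manage.
    return now - int(ts_text) < MIN_SECONDS
```

In use, issue_form_token() would be called when the form is rendered and its value placed in a hidden input; looks_like_spam() would then run on the submitted fields before the comment is saved.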

Advanced Form Protections

Unfortunately, there are many situations where simple spam prevention techniques will fail. Typically, the more popular a site gets, and the more PageRank it gets, the more likely spammers will outsmart your protections.

Spammers may customize their spam scripts to wait five seconds before posting a comment, and they may be able to detect invisible fields and work around them.

In these cases, you need to start using more intelligent spam prevention. There are three systems I've tested (and use) on my sites, and all three are highly effective in preventing spam, but come with a price:

  • Mollom - a newer spam prevention service that analyzes content and also offers accessible CAPTCHAs, either by default or only when a submitted form is flagged as potential spam. Has a Drupal module and WordPress plugin.
  • Akismet - a well-established service used for spam prevention, has a Drupal module and WordPress plugin, among others.
  • External comment services: My favorites (and the most spam-free, in my experience) are Disqus and Livefyre. Both have integrations with most CMSes, and both have many different pricing options.

Hosting comments externally has many benefits, but it means you may not have as much control over the comment integration, display, and data as you would if you kept the comments on-site. For-pay spam prevention services can allow you to keep your comments on-site and high quality.

When the Going Gets Rough

I would highly recommend following the same progression of spam prevention techniques that I outline above rather than blindly installing a CAPTCHA or other usability nightmare. Doing so will encourage real people to comment without distractions that may cause them to discard their comments, especially if your CAPTCHA system is broken.

However, there are certain cases where I still employ CAPTCHAs, and believe them to be helpful (especially on smaller forms or registration forms that won't have enough data for a spam prevention service like Mollom to work effectively). When I do use CAPTCHAs, I make sure they are accessible and not too difficult to read; I'd rather have to do some work moderating new users and have some false positives than deny a real person access to my site!

If you do use a CAPTCHA or something else that requires extra end-user work, add an audio CAPTCHA alternative so people with eyesight problems can still submit your forms. And make sure it's not impossible to decipher the images! One final thing to make sure of: don't lose the form data if a user submits the form with an incorrect CAPTCHA answer (having to fill out a form all over again is quite annoying).

There are also alternatives to CAPTCHAs that are easier to complete; one example is VouchSafe. Another is a simple addition or subtraction challenge. Yet another is a question like 'click the third word in this sentence'. Creative CAPTCHA alternatives are a very good way to keep forms extremely secure while making life easier for your site's users.
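
As an example of the simple addition or subtraction challenge, here is a small Python sketch. The function names make_challenge and check_answer are hypothetical, and the sketch assumes the expected answer is kept in a server-side session between rendering the form and checking the submission (never in a hidden field, where a bot could read it).

```python
import random
from typing import Tuple


def make_challenge(rng=random) -> Tuple[str, int]:
    """Return (question text, expected answer) for a one-digit sum or difference."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    if rng.random() < 0.5:
        return f"What is {a} + {b}?", a + b
    # Subtract the smaller number from the larger so the answer is never negative.
    high, low = max(a, b), min(a, b)
    return f"What is {high} - {low}?", high - low


def check_answer(user_input: str, expected: int) -> bool:
    """Compare what the user typed with the stored answer, tolerating whitespace."""
    try:
        return int(user_input.strip()) == expected
    except ValueError:
        return False
```

If the answer is wrong, the form should be shown again with everything the user already typed still filled in, plus a freshly generated question.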

Comments

You have included a can of SPAM image at the top of your post, which talks exclusively about spam. You are violating the Trademark of the Hormel Foods LLC by associating the two. Please see http://www.spam.com/about/internet.aspx for further details. I also hate entering email addresses into forms.

I have honestly never seen Hormel's notice about the use of their image associated with electronic spam, and I apologize for having violated their terms. (Of course, if it were me, I'd be happy for the incredible amount of free publicity... I had never heard of spam as a kid, and the first time I found out about the mystery meat was when I started using email—which prompted me to actually buy some and try it!).

Additionally, I have a twofold reason for requiring an email address: First, it tends to keep people slightly more honest, and second, there have been literally hundreds of times where I've sent personal emails in response to comments (especially for off-topic or personal comments that still deserve a response). I take the privacy of my commenters extremely seriously, and also only keep secure backups of my databases to ensure the security of those who choose to comment anonymously.

Of course, you've figured out an easy way to circumvent this particular feature, which is fine by me :)

It also seems Hormel has been quite unsuccessful in litigating over the use of the term 'Spam' (capitalized or not) in association with electronic mail or other communications. I did remove the picture though; shouldn't've let that one slip by.

Hi Jeff. I enjoyed your post, and I've probably got more than a few choice comments to make about comment spam in Drupal. I think even requiring a user account (major disincentive to pass-through commenters) and using Mollom/Akismet doesn't quite cut it. Even with those features in place, I've found a number of spambots signing up for accounts and posting spam comments, entries and/or profile fields.

There's a smart plugin for WordPress which actually works wonders at reducing comment spam - it's called Cookies for Comments and what it does is note a minimum delay between the page loading and the comment being posted. If the comment is posted within - say 5 seconds - the plugin will classify the comment as spam. And in principle it makes sense - time yourself leaving a legitimate comment compared with a bot automatically pasting content into fields. Most people take upwards of 10 seconds. It's certainly worth having a look at.

Honeypot does exactly the same thing, by default, and also adds honeypot protection... This alone, though, is not enough for many sites. For some of my sites, I have Honeypot + Mollom, and for others I use Disqus (though I usually like leaving comments on my own site...).