My unorthodox CAPTCHA blocked thousands of spam comments every week

I wrote a custom image-less CAPTCHA for my blog a while ago. I didn’t write it as a plugin, so I lost it when I upgraded WordPress a couple of weeks ago. Going without this protection was an eye-opening experience, and it vindicated what I asserted in my original posts: a naive question-and-answer system is highly effective at stopping spammers, probably as effective as scrambled images. Read on for the details.

In my original article, I hypothesized that CAPTCHAs with scrambled images just make it hard for real people to use websites, and probably don’t provide any additional protection over less obnoxious methods. I thought there was probably a sweet spot at which humans don’t find the system intrusive, and yet it’s just a tiny bit too hard for most spammers to bother cracking it. After all, comment spammers are mostly targeting wide-open WordPress installations. Why work hard at the small fraction that resist comment spam when there are so many easy targets?

(Actually, knowing what I know about search engine optimization, I’d go after the hard-to-get ones myself if I wanted quality links, but the comment spam I get is clearly about quantity, with not even an attempt to look like quality.)

How much spam do I get?

My little system of multiple-choice questions such as “which of the following is blue? a) sky b) grass …” seemed to cut out the vast majority of comment spam, but I never quite knew how much until I took it away and replaced it with a default installation of WordPress 2.1. In the old system, I had to delete a comment or two a day from the moderation queue. Wanna guess how much spam I built up in a week with nothing but Akismet in the new installation? From Sunday night May 13th to the next Sunday night, I got over 1,800 spam comments.
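For readers who want to picture how small such a system can be, here is a minimal sketch in Python. It is not the code I ran on this blog; the question bank and the names QUESTIONS, make_challenge, and check_answer are made up for illustration. The idea is only that the server picks a question, shows the choices in random order, and compares the submitted answer to the stored one.

```python
import random

# Hypothetical per-blog question bank: (prompt, choices, correct answer).
# Writing your own questions is the point: a unique bank is what makes
# a generic spam bot fail.
QUESTIONS = [
    ("Which of the following is blue?", ["sky", "grass", "fire"], "sky"),
    ("Which of the following is an animal?", ["rock", "dog", "chair"], "dog"),
]

def make_challenge(rng=random):
    """Pick a question and return (question id, prompt, shuffled choices)."""
    qid = rng.randrange(len(QUESTIONS))
    prompt, choices, _ = QUESTIONS[qid]
    shuffled = list(choices)
    rng.shuffle(shuffled)
    return qid, prompt, shuffled

def check_answer(qid, submitted):
    """Return True only if the submitted text matches the stored answer."""
    if not 0 <= qid < len(QUESTIONS):
        return False  # unknown question id, e.g. a blindly replayed form
    return submitted.strip().lower() == QUESTIONS[qid][2]
```

A comment form would embed the question id in a hidden field, render the choices as radio buttons, and reject the comment when check_answer returns False. A determined attacker can defeat this trivially, which is consistent with the argument here: it only has to be slightly harder than the wide-open blog next door.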

What about Akismet?

“Ah,” you say, “but that’s really no problem. You say you had Akismet installed; it should catch most of them.” Yes, but it also catches valid comments, which I value highly and don’t want to throw away. I had to pore through the spam queue and find them. If you’ve ever tried that with 1,800 comments in the spam bucket — holy cow, that’s all but impossible. I had to log into my MySQL database at the command line and start nuking them with LIKE patterns just to get it down to something manageable. Even a couple dozen spam comments a day in the spam queue would push me over the edge. If I had to deal with thousands in the spam bucket, and dozens that weren’t caught by Akismet, I’d turn off comments.
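The LIKE-pattern cleanup can be sketched roughly as follows. This is an illustration under assumptions, not the exact statements I typed: it uses Python's sqlite3 module with an in-memory table as a stand-in for MySQL, the table mimics only three columns of WordPress's wp_comments, the patterns are invented, and delete_spam_like is a hypothetical helper name.

```python
import sqlite3

def delete_spam_like(conn, patterns):
    """Delete spam-flagged comments matching any LIKE pattern; return the count."""
    deleted = 0
    for pattern in patterns:
        cur = conn.execute(
            "DELETE FROM wp_comments "
            "WHERE comment_approved = 'spam' AND comment_content LIKE ?",
            (pattern,),
        )
        deleted += cur.rowcount
    conn.commit()
    return deleted

# In-memory stand-in for the real database (assumption: simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_comments ("
    "comment_ID INTEGER PRIMARY KEY, comment_content TEXT, comment_approved TEXT)"
)
conn.executemany(
    "INSERT INTO wp_comments (comment_content, comment_approved) VALUES (?, ?)",
    [
        ("Buy cheap pills online now", "spam"),
        ("Great post, this saved me an afternoon", "spam"),  # false positive to rescue
        ("online casino bonus", "spam"),
        ("Cheap pills discussion, approved earlier", "1"),   # approved, never touched
    ],
)

removed = delete_spam_like(conn, ["%cheap pills%", "%casino%"])
remaining = [
    row[0]
    for row in conn.execute(
        "SELECT comment_content FROM wp_comments ORDER BY comment_ID"
    )
]
```

Restricting the DELETE to rows already flagged as spam keeps approved comments safe, and whatever survives the patterns is a much shorter list to check by hand for valid comments.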

I needed a challenge question just to stop the hemorrhaging. Instead of writing my own this time, I decided to try using a pre-built plugin. I chose the popular “did you pass math?” plugin. It is, like most WordPress plugins, not perfect — but it’s good enough. I’m down to about 15 spam comments a day in the moderation queue now. With Akismet helping, that becomes quite manageable.

Notice — and this surprised me — the “did you pass math” plugin lets through more spam than my custom solution. I’d bet dollars to donuts that’s because it’s both popular and not customized per-blog. My system was unique, so it makes sense that it worked better.

So much for the naysayers

There’s a lot of “wisdom” floating around the web (some of it in the comments on my earlier posts, showing me how easy it would be to bypass my custom solution) that says CAPTCHAs don’t work at all, and you should just use Bayesian filters and the like. I never believed it. Now I have proof. Was my system easy to break? Absolutely, and that’s why it wasn’t a hassle for real people to use. Did it work great despite its flaws? You bet.

I may rewrite my solution as a plugin at some point, if I get time. Till then, good enough is good enough, just as it always has been.

5 Responses to “My unorthodox CAPTCHA blocked thousands of spam comments every week”


  1. Bill Minton

    To me, captchas are similar to challenge/response email filtering. While many abhor them, I think it’s impossible to beat them with content (Bayesian) filtering. Similarly, it would be nearly impossible for content filtering to match the accuracy of your custom solution.

  2. Aaron Bassett

    You might want to take a look at wp-gatekeeper by Eric Meyer. It’s sort of like the did-you-pass-math plugin, but it lets you set simple questions for your users to answer (I use it on my own blog and it seems quite successful). The only slight problem I have come across with it is that some users who don’t have English as a first language can find some of the questions confusing.

    But then that is probably more to do with the questions I have written than with the plugin itself.

  3. sapphirecat

    Interesting - I am currently blocking 100% of spam on my blog with a negative captcha. I removed the email address field from my comment form, and I reject any comment with an email address present. I only used to see a few per week, so my spam-load was nothing like yours, but it was still enough to be frustrating to me.

    I don’t expect this to last, though. I set up a captcha on an old UBBThreads forum for someone; prior to doing so, I added a hidden field with a random value, and about 50% of the spambots there correctly passed the hidden field.

  4. Jay Pipes

    I got more than 250K (yes, 250,000) spam comments in my blog in one month and subsequently turned off comments altogether, which is, according to some, even worse than not consistently blogging! ;) I use Serendipity, and an old version at that. Just trying to find the time to upgrade/enhance it is so very hard! But, I’m not surprised at all that your CAPTCHA caught more spam than a published plugin… the spammers will always target the published ones first! :)

    -Jay

  5. MM

    I get most of my spam blocked, not only the comments on my blog but also in my email.

    I usually find, especially for email spam, that as long as you don’t enter your email or your information in mailing lists (I got my explanation at http://www.advertisingonlinesite.com/MailingListBrokers.html and so far it’s held up pretty well), the amount of spam you receive declines significantly.

    The only exception is my work email account. Recently a virus infiltrated the system, and spam is sent from the email addresses of many of my co-workers. A big pain.
