Why Akismet Doesn't Catch All Blog Comment Spam
Akismet is the biggest name in WordPress-based spam comment filtering, and with good reason. They've been around for a long time, and they've steadily improved their tools throughout that whole time. They're easy to get set up and running, and the basic anti-spam offering they have is free.
And yet, if you look, you'll see dozens of articles and hundreds of reviews posted by people complaining about how the Akismet plugin fails to catch spam. WordPress blogs end up flooded with spam in waves, it seems, and it always feels just a little too late by the time Akismet pushes an update to address the issue.
So why does Akismet let spam through? What's going on?
The Forever War
There's a war going on. You don't see it, because it's on the internet. It's hidden just below the surface, hidden behind careful screens and obfuscation. It's a war between people who desperately want your attention, and other people who… also want your attention.
It's a war over rules. It's a war over laws. It's a war over control, and money, and power. Like all wars, it comes with an arms race.
The aggressors in this war are the spammers, the scammers, the cheats, and the frauds. They're the people who clamor for your attention, so they can lead you on, string you along with a compelling, too-good-to-be-true offer, only to steal your money and cut and run at the last minute.
These are the people behind blog comment spam, behind spam emails, and behind robocalls. These are the people making communication via modern means a crapshoot, making education about phishing a requirement, and making life as an old person incomprehensibly difficult.
The defenders in this war are not always the best, but they're better. They want your attention because they want your service. They, too, want your money, but they want it while providing you with value. The defenders are companies like Akismet, like Google with their anti-spam algorithms in search and their anti-spam filters in Gmail. Like Microsoft, doing the same thing with Outlook. Like phone and app companies, trying to give you a way to block robocalls.
The arms race is the battleground of technology. Spammers build bots, pieces of software meant to spam their message to as many people as possible, in as many places as possible, as quickly as possible. Anti-spam companies develop ways to detect and block that spam.
The weapons are things like captcha, like automated keyword filters, like link blocking. They can be very sophisticated, or they can be blunt objects wielded like a nuclear bomb, blocking communications both valid and spam.
The war goes on forever. As long as there are people who can fall for scams, and as long as there are people with more greed than ethics, willing to put scams into motion, there will be spam. New weapons are developed for the war effort, new defenses are built to counter those weapons, and newer weapons are built in turn.
That's the reality of the teeming underbelly of the internet: a world where spammers are developing new strategies constantly, and where anti-spam defenders have to be reactive to keep up.
Sometimes the spammers emerge victorious. Sometimes the defenders make breakthroughs. Sometimes we discover that people aren't playing by the rules, and that they have agents on the inside, willing to undermine tools like the Do Not Call list in exchange for personal gain. The war goes on.
Perhaps now you can see why Akismet, a single company in this unending torrent of ever-escalating spam, struggles to keep up. They employ machine learning, they have massive databases of spam and the techniques used to get around filters, and they have a huge array of different ways they can target and prevent spammers from getting through. They work with other agencies to share data and make life harder for spammers. And yet, all the while, spammers are analyzing their means of prevention and finding ways around them.
Akismet looks for patterns. If you're just looking for a pattern in your own website's data, you might not be able to see one. Every spam comment looks more or less unique. You can't employ a single URL filter, because they have new URLs every time. You can't block keywords, both because they don't always use the right keywords, and because you risk blocking legitimate comments.
Akismet can look at patterns on a global scale. They have access not just to your site, but to every site that they're installed on, and to data from other anti-spam companies. They can analyze broad patterns across millions of websites, and develop algorithms to prevent spam that would otherwise slip through.
What kind of data do they look for and analyze?
- Personal information. When blog spammers leave spam comments, they often fill out the personal information fields with names, websites, and other information. Patterns can be seen in the names used, the URLs used, and other quirks of those fields.
- The comments themselves. Take a look at your email's spam inbox; you'll likely see patterns in the spam you get, if your email address has been around long enough to end up on a ton of lists. Common uses of emojis, of formatting or special characters, and so on. The same goes for spam comments on blogs.
- IP addresses. Spammers often run through lists of proxies, through VPN endpoints, through Tor nodes, or even just on public connections. They tend to come from, or route through, third world countries. If 100,000 spam comments arrive from one set of 50 IP addresses, those IPs can be blacklisted.
- Access information. Bots still have to request a web page, so they have to present themselves the way a browser would. The user agent string they send can be used against them as well, at least in conjunction with other data.
This is a lot of data, and a lot of patterns for their algorithms to analyze. Those algorithms do a pretty good job.
The problem arises when the arms race escalates. When the spammers take a new tactic. When they create a new bot, with a new user agent, mimicking what normal users look like. When they use a new list of IP addresses or a new way to hide their traffic.
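To make the "global scale" idea concrete, here is a minimal sketch of one such pattern: an IP address that looks harmless on any single blog but shows up in spam reports from many different blogs. This is not Akismet's actual algorithm; the report format, the function name, and the `min_sites` threshold are all assumptions made for illustration.

```python
from collections import defaultdict

def ips_seen_across_sites(spam_reports, min_sites=3):
    """Return IPs reported for spam on at least `min_sites` distinct sites.

    `spam_reports` is an iterable of (ip_address, site) pairs. Both the data
    shape and the threshold are hypothetical, chosen only for this sketch.
    """
    sites_by_ip = defaultdict(set)
    for ip, site in spam_reports:
        sites_by_ip[ip].add(site)
    return {ip for ip, sites in sites_by_ip.items() if len(sites) >= min_sites}

reports = [
    ("203.0.113.5", "blog-a.example"),
    ("203.0.113.5", "blog-b.example"),
    ("203.0.113.5", "blog-c.example"),
    # Three reports, but all from one site: not enough to stand out globally.
    ("198.51.100.7", "blog-a.example"),
    ("198.51.100.7", "blog-a.example"),
    ("198.51.100.7", "blog-a.example"),
]
flagged = ips_seen_across_sites(reports)
```

A single site owner would only ever see one row for the first address, which is exactly why the pattern is invisible locally.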
There is, and will always be, a moment of hesitation. When the first few spam comments come in, it takes time for a company like Akismet, or a company like Google, to analyze the new patterns and develop a new way to block them. They have teams constantly working to do just that, but it will always take time. That's why spam emails slip through into your inbox, it's why robocalls get past filters, and it's why comments make it through to your blog.
It's also just so, so much easier for the spammers. A spammer whose proxy list is broadly banned needs only rotate in a few new proxies to begin again. A spammer whose comments are filtered needs only run them through a spinner to get dozens of new variations. A spammer whose common names are blocked only has to change their seeded names. It's a matter of mere minutes for a spammer to be back in action, while it's a matter of hours or days before the defenders put new methods in place to prevent it.
How to Help Prevent Spam
Yes, Akismet will occasionally let spam through. There's not really anything you can do about that on your own. There are, however, a number of different steps you can take to try to minimize the spam you receive, and help others along the way.
Pay for Akismet Pro. According to the different plans available, Akismet doesn't actually provide any better anti-spam services for their paid version than they do for their free version. However, paying for it does fund the company, and that's money they can then put toward more advanced anti-spam technology. Keeping the company alive, at least so long as they're mostly effective, is a good idea.
There are a few posts out there about how to fight spam without a plugin. The sentiment is fine; you can do a lot of the filtering yourself, and plugins have a chance to slow down your site. However, if everyone decided to do it this way, the spammers, who all work together, would have a much greater advantage. I don't recommend trying the DIY approach here, honestly.
Add Sucuri to the mix. Sucuri is another plugin, or rather a suite of plugins, you can use for various forms of website security. Foremost among those offerings is the website firewall. This uses some advanced profiling, machine learning, and correlation to identify bad traffic and filter it before it can even reach your site. You don't have to prevent the bot from leaving a spam comment when the bot can't even reach your site!
You'll need to pay for this one too, around $17 per month or so – $200 per year – for the cheapest plan. You can get more expensive plans to get better features, malware removal assistance, and more frequent malware scans if you think your site is under threat.
Hold all comments for moderation. One of the built-in features in WordPress is the ability to hold all comments for moderation. When a user submits a comment, it goes into the moderation queue, and you – or another moderator or admin for your site with the appropriate access – have to go into your dashboard and either approve or deny the comment.
This is a good way to manually filter the comments that make it through your plugins, and prevent spam from ever being published. It is, unfortunately, also a great way to backlog yourself with work you put off until it's no longer relevant. Far too many sites never bother to approve comments more than once every few months, and that utterly kills your users' ability to hold conversations. I don't like the option, but I can see why it's valuable to many people. You just have to stay on top of it. I recommend adding email notifications when a comment is added to the queue so you can approve it quickly.
Disallow links. Different spammers have different goals in spamming, but almost all of them eventually come back to a link. They want link juice, they want traffic, they want people to swing in and get scammed. If you don't allow links in your comments, then a lot of spammers will remove your site from their target list. That, or the comments will just bounce, because the bots try to submit a URL and the comment is rejected.
The downside to this is that you may have real readers who want to share valuable links, and disabling links doesn't actually prevent URLs from being posted with a gap to prevent them from parsing as links. It's just a stopgap, but it can help in some cases.
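As a rough illustration of both the filter and its gap problem, here is a small sketch that flags ordinary links as well as a couple of common "spaced out" forms. The regular expressions, the short TLD list, and the function name are assumptions for the example, not what WordPress or any plugin actually uses, and a heuristic like this will occasionally flag innocent text.

```python
import re

# Explicit links: anything starting with http://, https://, or www.
LINK_RE = re.compile(r"\b(?:https?://|www\.)\S+", re.IGNORECASE)

# Spaced-out domains such as "example . com" or "example dot com".
# The TLD list is a tiny illustrative sample, not a complete one.
OBFUSCATED_RE = re.compile(
    r"\b[a-z0-9-]+\s*(?:\.|\(dot\)|\bdot\b)\s*(?:com|net|org|info|biz)\b",
    re.IGNORECASE,
)

def contains_link(comment):
    """Return True if the comment appears to contain a URL, plain or disguised."""
    return bool(LINK_RE.search(comment) or OBFUSCATED_RE.search(comment))
```

For example, `contains_link("great deals at cheap-pills . com")` returns `True`, while an ordinary sentence with no domain in it returns `False`.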
Disallow special characters. Special characters are being used more and more often. Unicode includes a lot of characters outside the basic Latin alphabet that look very similar to regular letters or letters with formatting, but are in fact different characters. Most of them are obvious and look a little off, but some are near identical and you can only tell they're not regular letters by looking at the raw version of the text. Unfortunately, using those characters gets around basic word filters, so you need to add the special characters to your filtered list.
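One way to approach this, sketched below, is to fold look-alike characters back to plain letters before the word filter runs, rather than listing every variant by hand. The sketch uses Python's standard `unicodedata.normalize` with the NFKC form, which handles things like fullwidth and "styled" letters, plus a very small hand-written table for a few Cyrillic look-alikes that NFKC leaves alone. The table and the blocked word list are hypothetical and far from complete.

```python
import unicodedata

# Tiny illustrative sample of look-alikes that NFKC does not fold.
# A real filter would need a much larger table.
CONFUSABLES = str.maketrans({
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
})

BLOCKED_WORDS = {"casino", "viagra"}  # hypothetical word filter

def normalize(text):
    """Fold compatibility forms, case, and a few look-alikes to plain letters."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return folded.translate(CONFUSABLES)

def hits_word_filter(text):
    """Return True if any blocked word appears after normalization."""
    cleaned = normalize(text)
    return any(word in cleaned for word in BLOCKED_WORDS)
```

With this in place, a comment that spells a blocked word in fullwidth letters, or swaps in a Cyrillic "а" and "о", is caught by the same filter as the plain spelling.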
Restrict anonymous comments. Bots very rarely bother to register accounts with any sort of verification. For one thing, they don't want to have to juggle email addresses to verify accounts. For another, having to deal with logging in and out all the time just means they have less time to spam, and more steps where their spam can be blocked. Requiring some kind of registration – and making anonymous commenting impossible – helps cut back on both the low quality comments and the spam comments you're likely to see on your blog.
Enable "comment author must have previously approved comment." This is a feature you can find in the WordPress comment admin dashboard. It's a great way to make sure your regular readers are always able to comment freely, while still requiring initial moderation for comments from newcomers. The first comment a user leaves will require approval, but after that, the commenter will be freely allowed to leave comments without being held for moderation, unless their comment violates another rule.
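The rule itself is simple enough to sketch in a few lines. This is a simplified illustration keyed on email address only, not WordPress's actual implementation; the function name and the `approved_authors` set are assumptions.

```python
def needs_moderation(author_email, approved_authors):
    """Hold a comment unless its author already has an approved comment.

    `approved_authors` is a set of lowercase email addresses that have had at
    least one comment approved. Matching on email alone is a simplification.
    """
    return author_email.strip().lower() not in approved_authors

approved_authors = {"regular@example.com"}
```

Here `needs_moderation("Regular@example.com", approved_authors)` returns `False`, so the regular reader posts freely, while a first-time address is held for review.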
Add a social login requirement. You can use a social login, Facebook comments, Disqus comments, or another comment system that requires OAuth to help prevent spam as well. It's a lot of hoops to jump through to register spam social media accounts for things like Facebook, and a lot of spammers won't do it. Many will, of course, so this doesn't cut out spam entirely, but it's one more option you have available to you.
Add a Google reCAPTCHA requirement. Captchas are a great illustration of the ongoing arms race between spammers and security types. The original captchas were merely there to prevent text readers from inputting a verification text. They've steadily grown more and more advanced. Meanwhile, captcha breakers are tools spammers use to automatically fill out all but the most advanced captchas.
The most advanced available captcha system is Google's reCAPTCHA. It uses data similar to what Akismet uses to identify a bot, but also tracks things like mouse movement, time to response, and more. On top of that, it often just requires a single click, and maybe some image verification. Installing it isn't too difficult either.
Use a honeypot field. One of my favorite ways to help prevent bot spam is to use a honeypot. A honeypot is a baited trap, essentially. They've been used for all manner of traps throughout history.
For comment spam honeypots, the general idea is to add a form field to your comment section. This field will be invisible to normal users, but visible to bots. Bots fill it out, and that traps them. Since ONLY bots can see the field, you know that anyone filling it out is going to be a spammer, so you can freely dumpster the comments without a second thought.
The downside is that this method is difficult to set up. I believe the Antispam Bee plugin uses this method as one means of spam protection, though, so you can give that a try as well.
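The server-side half of a honeypot is actually quite small. Here is a minimal sketch, assuming the comment form includes an extra field that is hidden from human visitors with CSS; the field name `website_url_confirm` and the shape of `form_data` are made up for the example and don't come from any particular plugin.

```python
# Hypothetical name of the extra form field that humans never see.
HONEYPOT_FIELD = "website_url_confirm"

def is_probable_bot(form_data):
    """Return True when the hidden honeypot field was filled in.

    `form_data` is a dict of submitted form fields. A human leaves the hidden
    field empty (or it is missing entirely); a bot that fills in every field
    it finds gives itself away.
    """
    return bool(form_data.get(HONEYPOT_FIELD, "").strip())

human = {"author": "Ann", "comment": "Nice post!", HONEYPOT_FIELD: ""}
bot = {"author": "Deals", "comment": "Buy now", HONEYPOT_FIELD: "http://spam.example"}
```

`is_probable_bot(human)` returns `False` and `is_probable_bot(bot)` returns `True`. The harder part in practice is hiding the field in a way bots don't detect, which is where a plugin earns its keep.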
Disable comments on old posts. Older sites often have an issue where moderating comments is a huge task. If you've been publishing three posts per week for ten years, that's a lot of posts up with comment sections ready to be spammed. Moderating all of those spam comments is a thankless and tedious task.
WordPress gives you the ability to disable comments on posts if those posts are older than X number of days, which you can set. This means your older posts are no longer an attack surface, so you have fewer comments to moderate.
The downside here is that older evergreen content might still get valuable discussion, and by disabling comments you lose out on it entirely. I don't like this method, but if your site doesn't really get valuable comments on old content, feel free to use it.
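The age check behind a setting like this boils down to one comparison. A minimal sketch, with an assumed 14-day cutoff and a hypothetical function name:

```python
from datetime import datetime, timedelta, timezone

def comments_open(published_at, now, max_age_days=14):
    """Return True while the post is no older than the cutoff."""
    return (now - published_at) <= timedelta(days=max_age_days)

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
recent_post = datetime(2024, 5, 25, tzinfo=timezone.utc)
old_post = datetime(2019, 6, 1, tzinfo=timezone.utc)
```

`comments_open(recent_post, now)` returns `True` and `comments_open(old_post, now)` returns `False`. Picking a longer cutoff is one way to keep evergreen posts open a while longer while still shrinking the attack surface.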
Disable comments entirely. Many sites disable comments entirely these days. Some of them replace comments with the ability to engage with the site owner on Twitter, Discord, Skype, or a web forum or Facebook group. Others simply don't mourn the loss of potential engagement and don't think twice about it.
I prefer to keep comments enabled and deal with the occasional spam message and the moderation queue, because comments can often be very valuable. However, I can see the argument that it's not worth the effort, so that's a personal choice you have to make for your own site.