Dealing with Spam
Background
When you allow user-contributed content of any form (stories, comments, forum posts, ...) on your Geeklog site, you will sooner or later have to deal with spam. That is, unfortunately, the reality of today's web, and it is not going to change any time soon: spamming is cheap, it can be done safely through open proxies or from countries, hosting services, and ISPs that don't have an anti-spam policy, and, most of all, people keep buying from spammers.
The Spam-X Plugin
Spam protection in Geeklog is mostly based on the Spam-X Plugin, originally developed by Tom Willet. It has a modular architecture, so it can be extended with new modules to fight the spammers' latest tricks, should the need arise.
Spam detection
Geeklog and the Spam-X plugin will check the following for spam:
- Story submissions
- Comments
- Trackbacks and Pingbacks
- Event submissions
- Link submissions
- The text sent with the "Email story to a friend" option
- A user's profile
Other plugins can also use the Spam-X plugin to filter their content for spam. The Forum plugin does that, for example.
Modules
Geeklog ships with the following Spam-X modules:
MT-Blacklist
MT-Blacklist was a blacklist, i.e. a listing of URLs that were used in spam posts, originally developed for Movable Type (hence the name) and maintained by Jay Allen.
Maintaining a blacklist is a lot of work, and you're continually playing catch-up with the spammers. Therefore, Jay Allen eventually discontinued MT-Blacklist on the assumption that new and better methods to detect spam are now available.
The MT-Blacklist modules were removed from Geeklog as of version 1.4.1. Instead, Geeklog now ships modules for SLV (Spam Link Verification).
Personal Blacklist
The Personal Blacklist module lets you add keywords and URLs that typically exist in spam posts. When you're being hit by spam, make sure to add the URLs of those spam posts to your Personal Blacklist so that they can be filtered out automatically, should the spammer try to post them again.
This will also help you get rid of spam that made it through, as you can then use the Mass Delete Comments and Mass Delete Trackbacks modules to easily remove larger numbers of spam posts from your database.
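The idea behind a personal blacklist can be sketched in a few lines. The following Python example is only an illustration and not the code Spam-X uses: the entries in PERSONAL_BLACKLIST and the function name matches_blacklist are made up, and entries are assumed to be matched as literal, case-insensitive text.

```python
import re

# Hypothetical blacklist entries; a real list would hold the keywords and
# URLs collected from spam posts on your own site.
PERSONAL_BLACKLIST = ["cheap-pills.example", "casino-bonus"]


def matches_blacklist(text, blacklist=PERSONAL_BLACKLIST):
    """Return the first blacklist entry found in text, or None."""
    for entry in blacklist:
        # re.escape makes the entry match as literal text, not as a pattern
        # (an assumption of this sketch).
        if re.search(re.escape(entry), text, re.IGNORECASE):
            return entry
    return None
```

A post for which matches_blacklist() returns an entry would be treated as spam; the same check can be run over stored comments to find spam that made it through earlier.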
IP Filter
Sometimes you will encounter spam that is coming from one or only a few IP addresses. By simply adding those IP addresses to the IP Filter module, any posts from these IPs will be blocked automatically.
Please note that IP addresses aren't really a good filter criterion. While some ISPs and hosting services are known to host spammers, it won't help much to block an IP address that belongs to one of the well-known ISPs. Often, the spammer will get a new IP address the next time they connect to the internet, while the blocked address is reassigned and may end up being used by some innocent user.
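As a rough illustration of what an IP filter does, here is a minimal Python sketch. It is not the Spam-X implementation: the name is_blocked and the addresses (taken from the ranges reserved for documentation) are hypothetical, and support for whole networks in CIDR notation is an assumption of this sketch.

```python
import ipaddress

# Hypothetical filter entries: one single address and one whole network.
BLOCKED = ["192.0.2.15", "198.51.100.0/24"]


def is_blocked(remote_addr, blocked=BLOCKED):
    """Return True if remote_addr matches any entry in the blocked list."""
    addr = ipaddress.ip_address(remote_addr)
    # ip_network() turns a single address into a network of size one,
    # so both kinds of entry can be checked the same way.
    return any(addr in ipaddress.ip_network(entry) for entry in blocked)
```

Blocking a whole network makes the problem described above worse, since every innocent user in that range is locked out as well.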
IP of URL Filter
This module is only useful in a few special cases. Here you enter the IP address of a webserver that hosts domains which show up in spam. Some spammers have a lot of their sites on only a few webservers, so instead of adding lots of domains to your blacklist, you only add the IP addresses of those webservers. The Spam-X module will then check all the URLs in a post to see whether any of them is hosted on one of those blacklisted webservers.
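The check described above (extract the URLs, resolve their host names, compare the results with the blacklisted webserver addresses) could look roughly like the following Python sketch. All names here, including BLACKLISTED_SERVER_IPS and links_to_blacklisted_server, are hypothetical, the URL pattern is deliberately simple, and the actual module may work differently.

```python
import re
import socket
from urllib.parse import urlparse

# Hypothetical address of a webserver known to host spam domains.
BLACKLISTED_SERVER_IPS = {"203.0.113.50"}

# Simple pattern for http/https URLs; real posts may need something stricter.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def resolve_host(hostname):
    """Resolve a host name to an IPv4 address, or None if the lookup fails."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        return None


def links_to_blacklisted_server(text, blacklisted=BLACKLISTED_SERVER_IPS,
                                resolver=resolve_host):
    """Return True if any URL in text is hosted on a blacklisted webserver."""
    for url in URL_PATTERN.findall(text):
        hostname = urlparse(url).hostname
        if hostname and resolver(hostname) in blacklisted:
            return True
    return False
```

The resolver is passed in as a parameter so that the check can be tried out without real DNS lookups. Keep in mind that every URL in a post costs one lookup, which can slow down posting.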
Experimental Modules
As mentioned above, Spam-X can easily be extended by dropping new modules into the /path/to/geeklog/plugins/spamx/ directory. The following modules are currently under development:
Filters
These modules make use of external services to rate posts as spam (or not spam):
- Akismet modules
- The spam-merge wiki spam list is a shared blacklist of several Wiki communities and something of a successor to the discontinued MT-Blacklist. For more information and how to use it with Geeklog, see here and here.
Actions
The Spam-X plugin also lets you define actions to be performed once a post has been recognized as spam. The default actions are to delete the post and, optionally, send an email to the site admin.
The following modules also pass the information from a spam post on to other plugins (see below) to block any further spam posts from the same source automatically:
- Bad Behavior module (Note: For use with Bad Behavior 1 only - won't work with Bad Behavior 2)
- Ban plugin module
Other plugins
Bad Behavior
Bad Behavior is a collection of scripts that check for broken HTTP requests as well as signatures of known spambots. It aims at stopping spambots before they even get the chance to post spam.
Bad Behavior was written by Michael Hampton as a plugin for WordPress but has since been ported to Geeklog.
Ban Plugin
"The Ban Plugin is designed to do one thing. Provide an easy way to ban people from your web site." (quoted from Tom Willet's site).
Other methods
When you can identify a spammer by their HTTP request, i.e. by the IP address or by certain characteristics of the request such as an unusual user agent string, the most efficient way to block them is directly on the webserver.
On an Apache webserver, that is usually done in the .htaccess file, as explained here. This method is also useful against script kiddies and worms.
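To make the idea of matching on request characteristics more concrete, here is a small Python sketch of a user agent check. It only illustrates the matching logic and is not Apache configuration; on a real site the rule would live in the webserver setup as described above. The user agent fragments and the function name should_reject are invented for this example.

```python
# Hypothetical fragments of user agent strings seen in spam requests.
BLOCKED_AGENT_FRAGMENTS = ["examplespambot", "evil-harvester"]


def should_reject(user_agent, fragments=BLOCKED_AGENT_FRAGMENTS):
    """Return True if the user agent contains any blocked fragment."""
    # Compare in lower case so that "ExampleSpamBot" and "examplespambot"
    # are treated alike; a missing user agent is not rejected here.
    agent = (user_agent or "").lower()
    return any(fragment in agent for fragment in fragments)
```

A request for which should_reject() returns True would be answered with an error (typically 403 Forbidden) before it ever reaches Geeklog.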
Resources
- How to use Spam-X in your own plugin (for plugin developers)
- The geeklog-spam mailing list is the best place to discuss new anti-spam techniques for Geeklog as well as report spam sightings.
- Ann Elisabeth Nordbo aka Spam Huntress provides lots of useful information regarding web spam on her site and collects information about known spammers in her wiki.
- Other anti-spam services that could possibly be integrated into Spam-X