UMassWiki:Blocking Spam In Mediawiki
Note: Some of these instructions are a bit out of date as of early 2009, but it's still a good starting point.
Wiki spam is a fact of life these days. Most places on the net advise wiki operators to disallow anonymous editing or even block open registration altogether to prevent some or all linkspam. That's not an ideal solution. Wikis thrive on openness.
This HOWTO is a brief step-by-step guide which should greatly reduce or even stop spam on your MediaWiki installation. Blocking spam is essential to any open MediaWiki installation. This multi-pronged approach works very well for me in fighting spam on UMassWiki, a wiki which gets a fair amount of traffic and is listed in many places on the web.
In fact, this formula works so well, I avoided having a single successful spam insertion on UMassWiki for over 14 months at one stretch, and I went back to an open edit policy (no registration required) in August 2006.
I'll list the steps to blocking spam in order from simplest to most difficult, though none of them are very hard. Even if you just apply the first step, you'll throw up a significant roadblock to spammers targeting your wiki.
If you want easier access to find out what IPs your registered users are coming from, see UMassWiki:Installing CheckUser.
Five steps to blocking spam in MediaWiki
Blank User Agents
Very VERY few legitimate clients leave their user agent field blank, and those that do should fix this behavior. You can stop spam at the door by forbidding access to your wiki to anyone connecting with a blank user agent. If you don't already have a .htaccess file in the root of your site, create one. Then edit .htaccess and add this:
# block blank user agents
SetEnvIf User-Agent ^$ spammer=yes
Order allow,deny
allow from all
deny from env=spammer
This will return a 403 Forbidden error to any robot connecting with a blank user agent. If you want to return a custom 403 page, you'll need it at a separate domain or subdomain (otherwise they'll just get another 403). I have a separate subdomain for error pages, so I added the following to my .htaccess below the above lines:
ErrorDocument 403 http://error.umasswiki.com/403.html
Thanks to Spam Huntress (love that name) and others for this one.
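To make the rule's logic concrete, here is a minimal Python sketch of the same check. It is only an illustration, not how Apache implements it: the function name `status_for_request` is hypothetical, and the sketch assumes that a missing User-Agent header should be treated the same as an empty one.

```python
import re

# Same pattern as the SetEnvIf rule above: it matches an empty value.
BLANK_UA = re.compile(r"^$")


def status_for_request(headers):
    """Return the HTTP status this sketch would send for a request.

    headers: dict of request header names to values (hypothetical input).
    """
    # Assumption: a missing User-Agent header counts as a blank one.
    user_agent = headers.get("User-Agent", "")
    if BLANK_UA.match(user_agent):
        return 403  # Forbidden, as in the .htaccess rule
    return 200


if __name__ == "__main__":
    print(status_for_request({"User-Agent": ""}))
    print(status_for_request({"User-Agent": "Mozilla/5.0"}))
```

Any client that sends a non-empty user agent string passes through untouched, which is why this step costs legitimate visitors almost nothing.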
SpamBlacklist
NOTICE: This section is somewhat out of date. Please see http://www.mediawiki.org/wiki/Extension:SpamBlacklist
There's an extension of sorts for MediaWiki called the SpamBlacklist extension. Right now it's not well documented and a bit of a pain to download since you have to save each file individually from CVS into a directory in your extensions/ directory. However, it works pretty well.
SpamBlacklist blocks spam by analyzing edits to the wiki and searching for URLs that are known spamvertised sites. If it finds one, it will refuse the edit and tell the user what's wrong with the page.
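The idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the extension's actual code: the function names and the example blacklist entries are hypothetical, and the sketch assumes each blacklist line is a regular expression fragment matched against the host part of a URL, with `#` starting a comment.

```python
import re


def build_blacklist_regex(lines):
    """Combine blacklist lines (regex fragments) into one compiled pattern."""
    fragments = []
    for line in lines:
        # Assumption: '#' starts a comment; blank lines are ignored.
        line = line.split("#", 1)[0].strip()
        if line:
            fragments.append(line)
    if not fragments:
        return None
    # Match a URL whose host part contains any of the fragments.
    pattern = r"https?://[a-z0-9_\-.]*(?:" + "|".join(fragments) + ")"
    return re.compile(pattern, re.IGNORECASE)


def find_spam_link(edit_text, blacklist_regex):
    """Return the first blacklisted URL prefix in the text, or None."""
    if blacklist_regex is None:
        return None
    match = blacklist_regex.search(edit_text)
    return match.group(0) if match else None


# Hypothetical blacklist entries, for illustration only.
blacklist = build_blacklist_regex([
    r"cheap-pills\.example   # hypothetical entry",
    r"casino-[0-9]+\.example",
])

if __name__ == "__main__":
    print(find_spam_link("Buy at http://www.cheap-pills.example/now", blacklist))
    print(find_spam_link("See http://umasswiki.com/ for details", blacklist))
```

An edit would be refused whenever `find_spam_link` returns something other than `None`, and the returned text is what the user would be told about.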
SpamBlacklist, by default, downloads Wikimedia's blacklist and protects your wiki using this. However, you can maintain your own local blacklist as a page right in your wiki. You can use your local blacklist instead of or in addition to Wikimedia's list.
I strongly recommend setting up a local blacklist in your wiki. If you get a specific spam attack not covered by Wikimedia's list, you can add the term to your local blacklist. Then, you can use the cleanup.php script included with the extension to go through your wiki and revert spam edits. Be warned, the cleanup script is aggressive. If the only existing revision of a page contains spam, it will be blanked. (You can always revert, of course. All changes made by the script appear in Special:Recentchanges.)
Here is what I added to my LocalSettings.php after putting the SpamBlacklist extension in extensions/SpamBlacklist:
# SpamBlacklist extension
require_once( "$IP/extensions/SpamBlacklist/SpamBlacklist.php");
$wgSpamBlacklistFiles = array(
"http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1", // Wikimedia's list
"DB: umasswikidb Spam_Blacklist",
);
The first entry in the array tells SpamBlacklist to use Wikimedia's blacklist. You have to list it explicitly, because the extension will not load it by default once you add anything to $wgSpamBlacklistFiles.
The DB: line points to my local blacklist for UMassWiki. The first word is the name of my wiki database from $wgDBname. The second is the page title of my blacklist page, [[Spam Blacklist]]. Note that you must convert space characters to _ (underscore) characters.
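As a small illustration of that format, the following Python sketch builds the entry from a database name and a page title. The helper name `blacklist_db_entry` is hypothetical; it only shows the space-to-underscore conversion described above.

```python
def blacklist_db_entry(db_name, page_title):
    """Build a 'DB: <database> <Page_title>' entry for $wgSpamBlacklistFiles."""
    # Spaces in the page title must become underscores.
    return "DB: {} {}".format(db_name, page_title.strip().replace(" ", "_"))


if __name__ == "__main__":
    print(blacklist_db_entry("umasswikidb", "Spam Blacklist"))
```

For my wiki this produces the same string used in the array above.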
In a couple of days of testing, SpamBlacklist failed to load the Wikimedia blacklist at first, falling back to my local blacklist only. After a few hours, it started working on its own. This may have been the result of network problems.
The main drawback of this approach is that it may slow posting of edits, particularly on slow or overloaded servers.
Find the instructions for download and installation here.
To use the cleanup.php script to automatically revert edits which match your spam blacklists, just run 'php cleanup.php' in the extensions/SpamBlacklist/ directory you created once your LocalSettings.php is configured properly.
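The revert-or-blank behavior described earlier can be sketched as follows. This is a rough Python model of the decision only, not the script's real code: `clean_page` and `is_spam` are hypothetical names, and the sketch assumes that a page is blanked whenever every one of its revisions matches the blacklist.

```python
def clean_page(revisions, is_spam):
    """Pick the text a page should have after cleanup.

    revisions: list of revision texts, oldest first.
    is_spam: callable returning True if a text matches the blacklist.
    Returns a (new_text, changed) tuple.
    """
    if not revisions:
        return "", False
    if not is_spam(revisions[-1]):
        # The current revision is clean, so leave the page alone.
        return revisions[-1], False
    # Walk back from the newest revision to the latest clean one.
    for text in reversed(revisions[:-1]):
        if not is_spam(text):
            return text, True
    # Assumption: if every revision contains spam, the page is blanked.
    return "", True
```

A page whose only revision is spam ends up empty under this model, which is why the script should be run with some care on a wiki that has many new pages.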
See also
- http://mediawiki.org/wiki/manual:combating_spam
- http://mediawiki.org/wiki/manual:combating_vandalism
