

UMassWiki:Blocking Spam In Mediawiki

THIS HOWTO IS LONG OUT OF DATE AS OF MID 2012, BUT MIGHT STILL BE A GOOD STARTING POINT. Thanks to all the MediaWiki community who've helped me over the years. I let things go to seed for 18 months and ended up deleting about 7500 pages and 4000 accounts; fighting wiki spam is a constant arms race. You MUST keep up with it on a public wiki. GMorehou(talk) 03:48, 5 July 2012 (EDT)

Wiki spam is a fact of life these days. Most places on the net advise wiki operators to disallow anonymous editing or even block open registration altogether to prevent some or all linkspam. That's not an ideal solution. Wikis thrive on openness.

This HOWTO is a brief step-by-step guide that should greatly reduce or even stop spam on your MediaWiki installation. Blocking spam is essential to any open MediaWiki installation. This multi-pronged approach works very well for me on UMassWiki, a wiki which gets a fair amount of traffic and is listed in many places on the web.

In fact, this formula works so well, I avoided having a single successful spam insertion on UMassWiki for over 14 months at one stretch, and I went back to an open edit policy (no registration required) in August 2006.

I'll list the steps to blocking spam in order from simplest to most difficult, though none of them are very hard. Even if you just apply the first step, you'll throw up a significant roadblock to spammers targeting your wiki.

If you want easier access to find out what IPs your registered users are coming from, see UMassWiki:Installing CheckUser.


Five steps to blocking spam in MediaWiki

CSS Hidden Spam

Block CSS-hidden spam. Tons of wiki spam these days is hidden in specially constructed div tags. MediaWiki allows div tags, and I like to use them, so I use this solution, which still allows the tags but rejects any edit in which a tag's style attribute sets a property that can be used for hiding content and has little or no use in a wiki: display, position, overflow, visibility, or height. Note that the pattern is not limited to div; it matches any tag.

Edit your LocalSettings.php file and add the following line:

$wgSpamRegex = "/\<.*style.*?(display|position|overflow|visibility|height)\s*:.*?>/i";
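
Before deploying a pattern like this, it can help to try it against sample markup. The sketch below does that in Python rather than PHP, on the assumption that Python's re module treats this particular pattern the same way PHP's regex engine does. The sample strings are invented for illustration.

```python
import re

# Same pattern as the $wgSpamRegex line, without the PHP delimiters;
# the trailing /i flag becomes re.IGNORECASE.
SPAM_REGEX = re.compile(
    r"<.*style.*?(display|position|overflow|visibility|height)\s*:.*?>",
    re.IGNORECASE,
)


def is_blocked(wikitext: str) -> bool:
    """Return True if the pattern matches somewhere in the text."""
    return SPAM_REGEX.search(wikitext) is not None


# Invented samples: markup -> whether the pattern should reject it.
samples = {
    '<div style="display:none">cheap pills</div>': True,
    '<DIV STYLE="overflow: auto; height: 1px">links</DIV>': True,
    '<div style="color:red">warning</div>': False,
    '<div class="note">plain div</div>': False,
    # Side effect worth knowing: "line-height" contains "height".
    '<span style="line-height:1.5">text</span>': True,
}

for text, expected in samples.items():
    assert is_blocked(text) == expected, text
```

The last sample shows the cost of the approach: harmless properties whose names end in one of the listed words, such as line-height, are rejected too.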

Blank User Agents

Very VERY few legitimate clients leave their user agent field blank, and those that do should fix this behavior. You can stop spam at the door by forbidding access to your wiki to anyone connecting with a blank user agent. If you don't already have a .htaccess file in the root of your site, create one. Then edit .htaccess and add this:

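# Block blank user agents (Apache does not allow a comment on the same line as a directive)
SetEnvIf User-Agent ^$ spammer=yes

# Apache 2.2 access-control syntax; Apache 2.4 needs mod_access_compat for these directives
Order allow,deny
Allow from all
Deny from env=spammer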

This will return a 403 Forbidden error to any robot connecting with a blank user agent. If you want to return a custom 403 page, you'll need it at a separate domain or subdomain (otherwise they'll just get another 403). I have a separate subdomain for error pages, so I added the following to my .htaccess below the above lines:

ErrorDocument 403 http://error.umasswiki.com/403.html
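
One way to check the rule is to mirror it in a few lines of Python and, optionally, probe your own server. This is a minimal sketch: the function names are hypothetical, and treating a missing header the same as an empty one is an assumption about how the server evaluates the rule.

```python
import re
import urllib.error
import urllib.request
from typing import Optional

# Same pattern as in the SetEnvIf line: matches only an empty string.
BLANK_UA = re.compile(r"^$")


def would_be_blocked(user_agent: Optional[str]) -> bool:
    """Mirror the rule locally; a missing header (None) is assumed to count as empty."""
    return BLANK_UA.search(user_agent or "") is not None


def probe(url: str, user_agent: str) -> int:
    """Fetch url with the given User-Agent and return the HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code is on the exception.
        return err.code
```

Calling probe() with an empty string against your own wiki's URL should return 403 once the rule is active, while a normal browser string should still get through. Note that a user agent consisting of a single space is not blank as far as this pattern is concerned.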

Thanks to Spam Huntress (love that name) and others for this one.

ConfirmEdit

Install the ConfirmEdit extension. ConfirmEdit uses captchas (simple tests to "prove" an editor is human before registering an account or allowing certain types of edits) to prevent many spam robots from inserting spam into a wiki.

You can use "fancy" image captchas with squiggly letters, the kind everyone is used to seeing these days, or you can opt to use simple math problems. The image-based captchas present problems for blind people, but it's probably easier to write a spam robot to detect and do math problems, so there's a tradeoff. I use the math problems on UMassWiki for maximum accessibility. They're very simple addition and subtraction only.

See the ConfirmEdit page on Meta for download and installation instructions. If you want to configure your captchas for minimum annoyance and maximum accessibility, you can configure them just as I have:

  • Not to bother Sysops and (registered) bots
  • Only to bother registered users when they are adding external links
  • To always require a captcha on newly created accounts

You'll need to edit ConfirmEdit.php and change the following settings. Defaults for them are already set further down in the file, so change those lines in place rather than adding new ones at the top or bottom.

$wgGroupPermissions['*'            ]['skipcaptcha'] = false;
$wgGroupPermissions['user'         ]['skipcaptcha'] = false;
$wgGroupPermissions['autoconfirmed']['skipcaptcha'] = false;
$wgGroupPermissions['bot'          ]['skipcaptcha'] = true; // registered bots
$wgGroupPermissions['sysop'        ]['skipcaptcha'] = true;

$wgCaptchaTriggers['edit']          = false; // Would check on every edit
$wgCaptchaTriggers['addurl']        = true;  // Check on edits that add URLs
$wgCaptchaTriggers['createaccount'] = true;  // Special:Userlogin&type=signup

If you're having problems with vandal bots or malfunctioning spambots damaging your pages without inserting URLs, you can let all registered users skip the captchas but require unregistered users to solve one with every edit. I don't recommend this as a first-line defense. For about 6 weeks I dealt with a minor botnet attack that inserted 8-16 random alphanumeric characters into random pages on UMassWiki, until I finally got fed up and turned on captchas for every edit by an unregistered user. Note that there is currently no easy way to have both captchas on every unregistered edit and captchas for registered users when they add a URL. Here are the settings:

$wgGroupPermissions['*'            ]['skipcaptcha'] = false;
$wgGroupPermissions['user'         ]['skipcaptcha'] = true;
$wgGroupPermissions['autoconfirmed']['skipcaptcha'] = true;
$wgGroupPermissions['bot'          ]['skipcaptcha'] = true; // registered bots
$wgGroupPermissions['sysop'        ]['skipcaptcha'] = true;

$wgCaptchaTriggers['edit']          = true;  // Check on every edit
$wgCaptchaTriggers['addurl']        = true;  // Check on edits that add URLs
$wgCaptchaTriggers['createaccount'] = true;  // Special:Userlogin&type=signup
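
To see why the two goals conflict, here is a simplified model of the decision in Python. It is a sketch under stated assumptions, not ConfirmEdit's actual code: it assumes a user skips captchas if any of their groups has skipcaptcha set, and that the triggers are only consulted otherwise. The function and variable names are hypothetical.

```python
from typing import Dict, Iterable


def needs_captcha(
    groups: Iterable[str],
    action: str,
    adds_url: bool,
    skip: Dict[str, bool],
    triggers: Dict[str, bool],
) -> bool:
    """Simplified model: skipcaptcha in ANY group exempts the user entirely."""
    if any(skip.get(group, False) for group in groups):
        return False
    if action == "createaccount":
        return triggers.get("createaccount", False)
    if action == "edit":
        if triggers.get("edit", False):
            return True
        return adds_url and triggers.get("addurl", False)
    return False


ANON = ["*"]
REGISTERED = ["*", "user"]
SYSOP = ["*", "user", "sysop"]

# First configuration: captcha only when a URL is added or an account is created.
skip_a = {"*": False, "user": False, "autoconfirmed": False, "bot": True, "sysop": True}
trig_a = {"edit": False, "addurl": True, "createaccount": True}

# Second configuration: captcha on every edit by an unregistered user.
skip_b = {"*": False, "user": True, "autoconfirmed": True, "bot": True, "sysop": True}
trig_b = {"edit": True, "addurl": True, "createaccount": True}

assert not needs_captcha(ANON, "edit", False, skip_a, trig_a)
assert needs_captcha(REGISTERED, "edit", True, skip_a, trig_a)
assert not needs_captcha(SYSOP, "edit", True, skip_a, trig_a)
assert needs_captcha(ANON, "edit", False, skip_b, trig_b)
# The tradeoff: registered users now skip the URL check as well.
assert not needs_captcha(REGISTERED, "edit", True, skip_b, trig_b)
```

In this model the triggers are global and the exemption is all-or-nothing per group, so turning on the edit trigger forces you to exempt registered users completely, which also exempts them from the URL check.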

Bad Behavior

Even though plenty of spammers are dumb enough to be stopped by the blank user agent block, plenty are not. They use programs which fake the user agent information, but you can use a program which analyzes their requests for suspicious behavior, poorly faked agent strings, or connections originating from known spam addresses.

Download and install the Bad Behavior extension. As of version 2.0.10, installation was dead simple -- just unpack into your extensions/ directory, change a config setting or two, and add a line to your LocalSettings.php as documented in the instructions. It takes less than 10 minutes and begins blocking spam instantly.

Bad Behavior 2 Extended

I have written a simple additional extension for MediaWiki which adds a special page detailing Bad Behavior's blocking activity. You can easily download and install it so that you can avoid having to dig into your database to see what is going on. See UMassWiki:Bad Behavior 2 Extended.

Bugfixes

On MediaWiki 1.7.1, there is a minor bug which adds whitespace to certain pages (mainly Special:Recentchanges and the edit page) which can break up bits of the user interface in an annoying but nondestructive fashion. See bug 7424 on the MediaWiki Bugzilla for more information. There may also be a workaround.

As of MediaWiki 1.10.0, installing Bad Behavior can result in a blank white page because the extension uses the deprecated wfQuery() call to access the database. Adding this line just below the if (!defined('MEDIAWIKI')) die(); line in bad-behavior-mediawiki.php appears to fix it:

require_once( "$IP/includes/DatabaseFunctions.php" );

See also http://www.gossamer-threads.com/lists/wiki/mediawiki/101502 for more on this issue.

Caveats

Bad Behavior may not be appropriate for large, open-access wikis which absolutely must not hamper access to a human under any circumstances. There's a small but significant chance that a person using a virus-infected or trojaned computer could be blocked if their computer is listed in certain DNS blacklists. This may be unacceptable to certain wikis. I stopped using Bad Behavior for a while due to the very occasional block of a legitimate user, and I still didn't have any successful spam insertions for many months without it. I am currently again investigating its use after a botnet attack on one of my wikis.

For smaller sites which can afford to break one or two eggs making a 10,000 egg omelette, Bad Behavior is great. And, Bad Behavior returns an explanation page to blocked clients in case there's a problem.
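
For readers unfamiliar with the DNS blacklists mentioned in the caveat above, the usual convention is to reverse the octets of the client's IPv4 address, append the blacklist's zone, and do an ordinary DNS lookup; if the name resolves, the address is listed. The Python sketch below illustrates that convention only. Whether Bad Behavior performs its lookups exactly this way is an assumption, and the zone name in the usage note is a placeholder.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the lookup name: reversed IPv4 octets followed by the blacklist zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the lookup name resolves, i.e. the address is on the list."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # Name does not resolve: the address is not listed (or the lookup failed).
        return False
```

For example, dnsbl_query_name("192.0.2.10", "dnsbl.example.org") produces the name 10.2.0.192.dnsbl.example.org. Because the check looks only at the address, a legitimate person on an infected or previously abused machine is indistinguishable from a spammer, which is the source of the occasional false block.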

SpamBlacklist

NOTICE: This section is somewhat out of date. Please see http://www.mediawiki.org/wiki/Extension:SpamBlacklist

There's an extension of sorts for MediaWiki called the SpamBlacklist extension. Right now it's not well documented and a bit of a pain to download since you have to save each file individually from CVS into a directory in your extensions/ directory. However, it works pretty well.

SpamBlacklist blocks spam by analyzing edits to the wiki and searching for URLs that are known spamvertised sites. If it finds one, it will refuse the edit and tell the user what's wrong with the page.
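
The idea can be sketched in a few lines of Python. This is a simplified model, not the extension's code: it assumes a blacklist is a list of regular-expression fragments, one per line, with # starting a comment, and the sample entries and URLs are invented.

```python
import re
from typing import List, Optional

# Rough pattern for external links in edit text (an approximation).
URL_PATTERN = re.compile(r'https?://[^\s\]<>"]+', re.IGNORECASE)

# Invented blacklist content for illustration.
SAMPLE_BLACKLIST = r"""
# local additions
cheap-pills\.example
casino.*\.invalid
"""


def parse_blacklist(text: str) -> List[re.Pattern]:
    """Compile each non-empty, non-comment line as a case-insensitive regex."""
    patterns = []
    for line in text.splitlines():
        fragment = line.split("#", 1)[0].strip()
        if fragment:
            patterns.append(re.compile(fragment, re.IGNORECASE))
    return patterns


def first_blacklisted_url(edit_text: str, blacklist: List[re.Pattern]) -> Optional[str]:
    """Return the first URL in the edit that matches a blacklist entry, else None."""
    for url in URL_PATTERN.findall(edit_text):
        if any(pattern.search(url) for pattern in blacklist):
            return url
    return None
```

With the sample list, an edit containing [http://www.cheap-pills.example/buy Buy now] would be refused and the offending URL reported, while an edit linking only to https://www.mediawiki.org/ would pass.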

SpamBlacklist, by default, downloads Wikimedia's blacklist and uses it to protect your wiki. However, you can also maintain your own local blacklist as a page right in your wiki, and use it instead of or in addition to Wikimedia's list.

I strongly recommend setting up a local blacklist in your wiki. If you get a specific spam attack not covered by Wikimedia's list, you can add the term to your local blacklist. Then, you can use the cleanup.php script included with the extension to go through your wiki and revert spam edits. Be warned, the cleanup script is aggressive. If the only existing revision of a page contains spam, it will be blanked. (You can always revert, of course -- all changes made by the script appear in Special:Recentchanges.)
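
The revert-or-blank behavior described above can be modeled as follows. This is a hypothetical Python sketch, not the script's actual logic: it assumes the script restores the newest revision that does not match the blacklist, and it extends the "blank the page" case to any page where every revision contains spam.

```python
from typing import Callable, List


def cleanup_page(revisions: List[str], is_spam: Callable[[str], bool]) -> str:
    """Given revisions ordered oldest to newest, return the text the page ends up with."""
    for text in reversed(revisions):
        if not is_spam(text):
            # Newest clean revision wins; if it is the latest one, nothing changes.
            return text
    # No clean revision exists, so the page is blanked.
    return ""


def contains_spam(text: str) -> bool:
    """Stand-in for a blacklist check; the domain is invented."""
    return "spam.example" in text
```

For a history of "good v1", "good v2", "buy at http://spam.example", the result is "good v2". A page whose only revision is spam comes back empty, which is the aggressive case the warning is about.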

Here is what I added to my LocalSettings.php after putting the SpamBlacklist extension in extensions/SpamBlacklist:

# SpamBlacklist extension
    require_once( "$IP/extensions/SpamBlacklist/SpamBlacklist.php");
    $wgSpamBlacklistFiles = array(
        "http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1", // Wikimedia's list
        "DB: umasswikidb Spam_Blacklist",
    );

The first line in the array tells SpamBlacklist to use Wikimedia's blacklist, which it will not do by default once you add anything to $wgSpamBlacklistFiles.

The DB: line points to my local blacklist for UMassWiki. The first word is the name of my wiki database from $wgDBname. The second is the page title of my blacklist page, [[Spam Blacklist]]. Note that you must convert space characters to _ (underscore) characters.
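
As a small illustration of that naming rule, the Python helper below builds the DB: entry from a database name and a page title. The helper name is hypothetical; only the output format comes from the example above.

```python
def db_blacklist_source(db_name: str, page_title: str) -> str:
    """Build a 'DB:' entry: database name, then the page title with spaces as underscores."""
    return "DB: {} {}".format(db_name, page_title.strip().replace(" ", "_"))
```

Calling db_blacklist_source("umasswikidb", "Spam Blacklist") gives the same string used in the array above.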

In a couple of days of testing, SpamBlacklist at first failed to load the Wikimedia blacklist, falling back to my local blacklist only. After a few hours, it started working on its own. This may have been the result of network problems.

The main drawback of this approach is that it may slow the posting of edits, particularly on slow or overloaded servers.

Find the instructions for download and installation here.

To use the cleanup.php script to automatically revert edits which match your spam blacklists, just run 'php cleanup.php' in the extensions/SpamBlacklist/ directory you created once your LocalSettings.php is configured properly.
