Wiki - combating spam

From NoskeWiki

About

NOTE: This page is a daughter page of: Wiki


It's sad, but wiki pages are often "spammed" with external links (most of them to porn sites) or just junk text. This is usually done by "robots" - programs that navigate the internet and collect and/or modify information - rather than by people. When I notice spam starting to appear on my own wiki site, I usually do the following:


Spam Avoidance

To prevent, or at least reduce, further spam on your MediaWiki site there are several techniques that can work, and these are listed here. Some of these solutions involve add-ons to your wiki which search large blacklists of known bad websites or bad IP addresses. Two of the best methods are described below:


CAPTCHA (to prevent edits by bots)

Probably the most popular solution is to use a "CAPTCHA" - a simple response test (usually a distorted image of text) to check the user is human. The easiest way to use CAPTCHA is to install the "ConfirmEdit" extension, which lets you choose between several effective CAPTCHA techniques, including: ReCAPTCHA, specifying your own questions (proven very effective at preventing most spam from bots), solving math problems or choosing the image of a cat among images of dogs! With the right CAPTCHA you can eliminate almost all attacks by robots, but it won't protect you from abuse by humans... for that you may want to protect pages from anonymous edits (edits by people who haven't created an account). One method to prevent future spamming is to protect all pages by going through them manually.
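
As a rough sketch of the "own questions" option, lines like the following could be added to LocalSettings.php. This assumes a MediaWiki version that loads extensions with wfLoadExtensions() and a ConfirmEdit release that ships its QuestyCaptcha module as a sub-extension; other releases are configured differently, so check the extension's own page for your version. The question and answer are made-up placeholders.

```php
# Load ConfirmEdit together with its QuestyCaptcha module
# (assumes the extension is already unpacked in extensions/ConfirmEdit)
wfLoadExtensions( [ 'ConfirmEdit', 'ConfirmEdit/QuestyCaptcha' ] );

# Placeholder question => answer pair; pick questions specific to your own site
$wgCaptchaQuestions = [
    'What colour is the sky on a clear day?' => 'blue',
];

# Ask the question on the actions that bots abuse most
$wgCaptchaTriggers['edit']          = true;   # every edit
$wgCaptchaTriggers['create']        = true;   # creating a new page
$wgCaptchaTriggers['addurl']        = true;   # edits that add external links
$wgCaptchaTriggers['createaccount'] = true;   # registering an account

# Sysop users never have to answer the question
$wgGroupPermissions['sysop']['skipcaptcha'] = true;
```

Questions that only make sense for your site tend to hold up better than generic ones, since a bot cannot reuse answers it learned on another wiki.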


Protect All Pages with Protection Rules

Although it's possible for a sysop user to protect pages individually, this can get a bit tedious if you have many pages. An easier method is to apply group permission rules to all pages by adding the following lines to LocalSettings.php:

$wgGroupPermissions['*']['edit'] = false;
$wgGroupPermissions['user']['edit'] = false;
$wgGroupPermissions['sysop']['edit'] = true;

This prevents anyone (except sysop users) from editing OR creating new pages. NOTE: The third line is actually redundant (see here)
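
If locking out everyone is too strict, a milder variant (my own sketch, using the same $wgGroupPermissions array) is to stop only anonymous edits and leave registered users alone:

```php
# Anonymous visitors ('*') can still read pages but can no longer edit them
$wgGroupPermissions['*']['edit'] = false;

# Logged-in users keep their edit right; spelled out here only for clarity
$wgGroupPermissions['user']['edit'] = true;
```

Rights are additive across groups, so a logged-in user can edit as long as at least one of their groups grants 'edit'.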

You could also create more specific rules. The following code allows only users registered for over a week to create new pages (see: "MediaWiki: Preventing access"):

# Prevent anonymous users from creating pages:
$wgGroupPermissions['*']['createpage'] = false;
 
# Allow only users with accounts seven days old or older to create pages (*requires MW 1.6 or higher):
$wgGroupPermissions['*'            ]['createpage'] = false;
$wgGroupPermissions['user'         ]['createpage'] = false;
$wgGroupPermissions['autoconfirmed']['createpage'] = true;
$wgAutoConfirmAge = 86400 * 7;    # seven days times 86400 seconds/day

NOTE: In addition to 'edit' and 'createpage', you can also create rules for 'edit-talk', 'createtalk' and many others (see MediaWiki:User rights)

Although it's possible to create protection rules, this is not ideal. Wikis thrive on openness, so it's probably quite annoying for a user to have to register and then wait seven days before creating a page. You should also know that many bots have noticed this tendency and will not only create accounts - typically with random character names - but will also wait multiple days before creating spam pages. To avoid bots, the better solution is to use CAPTCHA; for bad human edits, you may want to just wait until a page is spammed before undoing the damage and protecting abused pages individually.


Spam Correction

In many cases the damage to your site is already done - the human or robot spammers have already vandalized your existing pages or created hundreds of new pages. The techniques I've listed here are ones that I've used and can help identify, undo and protect spammed pages in your MediaWiki.


Identify Spammed Pages

  • Log in as a system operator user (eg: WikiSysop).
  • Go to the "Special:RecentChanges" page (usually featured on the left navigation bar) and look for any abuse - these edits are usually marked with a red exclamation mark and made by users with either nothing or a random string of characters as their name (eg: rkXmTSne).
  • Click the page name of a suspicious entry to check it has indeed been spammed: sometimes this will be obvious (a hyperlink at the top, or most of the page deleted), and sometimes it will be more subtle.
  • Open the "Special:UncategorizedPages" page (under "Special Pages" on the left navigation bar) and this usually shows all the pages created by robots which I want to delete.
    • NOTE: This worked only because I add category tags at the bottom of all my pages (eg: "[[Category:Computers]]"), but the bots creating new pages didn't add these tags.


Block Abusive Users

  • Click "History", then click on "last" for the first entry.
  • The changes this user made are shown in green - if these are clearly malicious, make a mental note of their IP address, and click "block" (on the right).
  • Change expiry to "infinite" (you can leave the other fields as default) and click "Block this user". Their IP address (i.e. computer) is now blacklisted from making changes to your site.
  • Now click "Back" in your browser to return to the history page.
    • NOTE: Unfortunately, if your spam was caused by a bot (instead of a person) then the bot is almost certainly running on a huge number of other computers / IP addresses, so this isn't a great fix - hence the importance of CAPTCHA.
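
Because blocking addresses one at a time doesn't scale against bots, MediaWiki can also check the IP addresses of editors against a DNS blacklist, which is one form of the "bad IP address" lists mentioned earlier. A minimal sketch for LocalSettings.php, assuming you have already chosen a blacklist provider; the hostname below is a placeholder, not a real service:

```php
# Check the IP address of anyone trying to edit against a DNS blacklist
$wgEnableDnsBlacklist = true;

# Placeholder hostname: replace it with the DNSBL zone of a provider you trust
$wgDnsBlacklistUrls = [ 'dnsbl.example.org.' ];
```

A blacklist can also catch legitimate users on shared or recycled addresses, so it's worth keeping an eye on complaints after turning it on.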


Revert the Page (Undo the Damage)

  • On the history page, scroll down to find the last entry by yourself or a different IP address, then click "last". NOTE: the same IP address may have spammed you several hundred times, so you may have to click "500".
  • Here you can click "Previous diff" and "Next diff" to identify where your page was first abused.
  • Find the page with the good version, click "Edit" and simply copy the text with [Ctrl+a] then [Ctrl+c].
  • Click "Page" (at the top), and then "Edit" to go to edit the current version of the page.
  • Paste in the desired text with [Ctrl+a] then [Ctrl+v] and click "Save Page".
    • NOTE: There are other methods involving "undo"/"rollback", but I find this way easier.


Protect the Page

  • It's now a good idea to click "Protect" (at the top) to prevent anyone (except system operators) from changing this page.
    • NOTE: A page spammed by one IP address or person is likely to be spammed by another in the near future.


Conclusion

This page has shown several methods to avoid and undo spam on your MediaWiki site. Note that none of these techniques is a "silver bullet". If your site becomes popular you may have to combine different techniques (eg: CAPTCHA + permission rules) to prevent both robotic and human spam, and even with a good CAPTCHA in place it's still recommended that you check "Recent Changes" every once in a while for suspicious activity. More reading is provided in the links below.


Links