r69 - 29 May 2007 - 22:55:53 - PeterThoeny

WikiSpam

NOTE: All administrators of public TWiki sites are encouraged to upgrade to the latest BlackListPlugin, version 29 Mar 2007 (r13238). It prevents known wiki spam from being saved in TWiki topics or uploaded as HTML attachments, makes scripted registrations harder, and protects the site from excessive use by a single IP address. It also protects TWiki sites from topic text and attachments containing malicious script that uses eval() and escape().
NOTE: TWiki administrators are encouraged to subscribe to the TWikiAnnounceMailingList to be alerted to security issues, new TWiki releases, and important spam-related announcements.
NOTE: A new type of spam has emerged: HTML attachment spam. This spam is nasty because it might cause TWiki sites themselves to be identified as spam sites.

Wiki spam is a growing problem on public wiki sites. It is not isolated to wikis: any website that can be updated by its users, such as a blog or bulletin board, is a potential target for spam. What is considered spam on a website? In the broadest sense, any content that is off-topic and unwanted by the website's users. The most common spam on writable websites is Wikipedia:Link_spam: spammers add links to their websites in many wikis, blogs, and bulletin boards in the hope that search engines will raise the ranking of their pages. In other words, the spam is added not for human consumption but for search engine spiders. Unfortunately, this strategy works: spam sites are listed on the first page of search results, as can be seen in this Google search for Alprazolam. You will understand why if you search for Alprazolam and wiki.

What can you do as an administrator of a public wiki site?

  • Rule 1: Enable spam protection
  • Rule 2: Remove spam as quickly as possible when it happens. Reason: Spammers identify easy targets by searching sites for known spam keywords. It pays off to spam sites where spam survives long enough for search engines to spider the content.

Most of the wiki spam happens on wiki pages. Anonymous users or newly registered users add link spam to their wiki homepage and other wiki pages. Spammers are getting more sophisticated. A new type of spam has emerged recently: HTML attachment spam on wikis where users can attach HTML files to a wiki. This spam is nasty because it might identify wiki sites as spam sites. It works like this: A spammer attaches a web page to a wiki with lots of ads for what they sell. Then they add link spam to many wikis, blogs and bulletin boards to raise the page ranking of the HTML page attached to the wiki.

Public TWiki sites have been spam targets for some time. TWiki has a BlackListPlugin that is quite effective in fighting spam. The Plugin gets updated every time a new spam twist is discovered, such as an HTML redirect obfuscated in a JavaScript eval statement. The BlackListPlugin fights spam on several fronts:

  • Multiple registrations by the same IP address in rapid succession
  • Multiple page saves by the same IP address in rapid succession
  • Saving text with known wiki-spam (spam list is maintained and shared by TWiki, MoinMoin and MediaWiki sites)
  • Attaching files with known wiki-spam
  • Attaching files with JavaScript eval statements
  • Manually maintained BLACKLIST of malicious IP addresses
  • Automatically updated BANLIST of IP addresses with suspicious activities
  • Registration form with magic number in hidden form field to make scripted registrations harder
  • Add a rel="nofollow" parameter to external URLs to defeat the purpose of spamming TWiki sites
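
The last item in the list can be illustrated with a minimal sketch. This is not the BlackListPlugin's own code. It is a Python illustration under stated assumptions: the host name `wiki.example.org` and the function name `add_nofollow` are hypothetical, and the simple regular expression only handles anchors whose `href` is written with double quotes.

```python
import re
from urllib.parse import urlparse

# Hypothetical host name of the wiki; links to it count as internal.
SITE_HOST = "wiki.example.org"

# Matches an opening <a ...> tag with a double-quoted href attribute.
ANCHOR_RE = re.compile(r'<a\s+([^>]*?)href="([^"]+)"([^>]*)>', re.IGNORECASE)


def add_nofollow(html, site_host=SITE_HOST):
    """Add rel="nofollow" to anchors that point to other hosts."""

    def fix(match):
        before, url, after = match.groups()
        host = urlparse(url).netloc
        is_external = bool(host) and host.lower() != site_host
        already_tagged = "rel=" in (before + after).lower()
        if not is_external or already_tagged:
            return match.group(0)  # leave internal or already tagged links alone
        return f'<a {before}href="{url}"{after} rel="nofollow">'

    return ANCHOR_RE.sub(fix, html)
```

Search engines that honor `rel="nofollow"` do not count such links toward the ranking of the target page, which removes the incentive for link spam.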

Administrators of public TWikis are strongly encouraged to install the latest BlackListPlugin. The reality, however, is that there are still many public TWiki sites that do not even have this Plugin installed. To address the issue, we sent out several spam-related alerts to the twiki-announce mailing list, and we sent personal e-mails to some site owners not on the list. Still, awareness of wiki spam needs to be raised so that more site owners take action.

(Above content largely taken from Peter Thoeny's Wiki Corner on Wiki Spam on Public Wikis)

Related links on wiki spam:

-- Contributors: PeterThoeny

Discussions

At TWiki.org we occasionally have an issue with people deleting or altering content, either on purpose or because they do not understand how a wiki works. This happens mainly in the Main web, and sometimes in the TWiki web. It is a minor annoyance that can be fixed quickly.

Yesterday we had the first case of Wiki:WikiSpam: HasitRuparel, with IP address 219.65.75.99, spammed over a dozen user home pages by adding these URLs: (edit page to see the URL)

The log suggests that this person edited the pages manually:

| 10 Feb 2004 - 03:20 | Main.HasitRuparel | save | Main.HasitRuparel | repRev 1.1 Main.HasitRuparel 2004/02/10 10:52:00 | 219.65.75.99 |
| 10 Feb 2004 - 03:27 | Main.HasitRuparel | save | Main.AndreaMarchetti |  | 219.65.75.99 |
| 10 Feb 2004 - 03:28 | Main.HasitRuparel | save | Main.AndreasKapp |  | 219.65.75.99 |
| 10 Feb 2004 - 03:30 | Main.HasitRuparel | save | Main.BillLeeney |  | 219.65.75.99 |
| 10 Feb 2004 - 03:31 | Main.HasitRuparel | save | Main.BillKelly |  | 219.65.75.99 |
| 10 Feb 2004 - 03:32 | Main.HasitRuparel | save | Main.AldenWilner |  | 219.65.75.99 |
| 10 Feb 2004 - 03:32 | Main.HasitRuparel | save | Main.BenjaminDrieu |  | 219.65.75.99 |
| 10 Feb 2004 - 03:33 | Main.HasitRuparel | save | Main.AllenBierbaum |  | 219.65.75.99 |
| 10 Feb 2004 - 03:33 | Main.HasitRuparel | save | Main.BalthasarSieber |  | 219.65.75.99 |
| 10 Feb 2004 - 03:34 | Main.HasitRuparel | save | Main.AndreaSterbini |  | 219.65.75.99 |
| 10 Feb 2004 - 03:58 | Main.HasitRuparel | save | Main.ChristopheVermeulen |  | 219.65.75.99 |
| 10 Feb 2004 - 03:58 | Main.HasitRuparel | save | Main.BrillPappin |  | 219.65.75.99 |
| 10 Feb 2004 - 03:59 | Main.HasitRuparel | save | Main.ChristopheVermeulen | repRev 1.4 Main.HasitRuparel 2004/02/10 11:59:00 | 219.65.75.99 |
| 10 Feb 2004 - 03:59 | Main.HasitRuparel | save | Main.BjornStadil |  | 219.65.75.99 |
| 10 Feb 2004 - 03:59 | Main.HasitRuparel | save | Main.BrillPappin | repRev 1.2 Main.HasitRuparel 2004/02/10 11:59:00 | 219.65.75.99 |
| 10 Feb 2004 - 04:00 | Main.HasitRuparel | save | Main.ChrisMcLennan |  | 219.65.75.99 |
| 10 Feb 2004 - 04:05 | Main.HasitRuparel | save | Main.CarrieCoy |  | 219.65.75.99 |
| 10 Feb 2004 - 04:10 | Main.HasitRuparel | save | Main.DacreWroe |  | 219.65.75.99 |
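
One way to read such a log is to look at the pace of the saves. The following is a rough Python sketch, assuming rows in exactly the six-column layout shown above (time, user, action, topic, extra info, IP address); the function names `parse_row` and `save_pace` are made up for this illustration.

```python
from collections import Counter
from datetime import datetime


def parse_row(row):
    """Split one '| time | user | action | topic | extra | ip |' log row."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    when, user, action, topic, _extra, ip = cells
    return datetime.strptime(when, "%d %b %Y - %H:%M"), user, action, topic, ip


def save_pace(rows):
    """Return {ip: (total saves, saves in busiest minute, minutes spanned)}."""
    times_by_ip = {}
    for row in rows:
        when, _user, action, _topic, ip = parse_row(row)
        if action == "save":
            times_by_ip.setdefault(ip, []).append(when)
    summary = {}
    for ip, times in times_by_ip.items():
        busiest = max(Counter(times).values())
        span = (max(times) - min(times)).total_seconds() / 60
        summary[ip] = (len(times), busiest, span)
    return summary
```

For the rows above this gives 18 saves over 50 minutes, with at most 3 saves in any single minute, a pace that fits the reading that the pages were edited by hand.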

Now, a spammer could raise the Google ranking of his web site by spamming Wiki pages in an automated way, which is a scary thing for public Wikis.

Here is an interesting related post on the WikiForum by Arno Hollosi:

John Abbe wrote:
> Well, the NCDD wiki was found, and spammed by a robot before we even
> have gone public. I'm trying to nudge the team off a sudden interest
> in HardSecurity. At the same time, on Wiki:ReverseLinkDisabled
> there's mention of turning away IPs with a high request rate.
> Can anyone offer good starting settings for such a protection - how
> many requests in how short a time to trigger it?

On Sensei's Library (http://senseis.xmp.net/), which is one of the largest non-Wikipedia wikis, I have a 3-step measure:

  • limit requests/minute: anything beyond 30 requests within 60 seconds and the IP address is disabled for 5 minutes. If after that the maximum gets exceeded again within an hour then the IP address is disabled for 24 hours.

  • shield resource-intensive requests (or edit links etc.) by checking for an HTTP referer header that originates from your site. Effective as well. Some browsers (privacy proxies, ...) suppress the referer header. Those people have to set a (preference) cookie in order to access those functions.

  • one of the most effective measures is adding a "trap link". I.e. if the link is followed, the IP address is immediately added to the block list (at Sensei's for 48 hours). Mark this link as "Disallow" in your robots.txt file so compliant robots don't follow the link. At Sensei's, look at the source and search for "Blockme" to find the trap link - users are not able to activate it, as it contains no link text.
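
The first measure in the list above (request limiting) could be sketched roughly as follows in Python. The thresholds are the ones quoted in the post; everything else is an assumption made for illustration, including the class name `RateLimiter`, the in-memory storage, and the reading of "again within an hour" as "within an hour after the first block has ended". It is not the code used at Sensei's Library.

```python
import time
from collections import defaultdict, deque

MAX_REQUESTS = 30          # requests allowed per window
WINDOW = 60                # window length in seconds
SHORT_BLOCK = 5 * 60       # first offence: 5 minutes
LONG_BLOCK = 24 * 60 * 60  # repeat offence: 24 hours
REPEAT_WINDOW = 60 * 60    # a repeat counts within an hour of the last block


class RateLimiter:
    def __init__(self, clock=time.time):
        self.clock = clock               # injectable so the logic can be tested
        self.hits = defaultdict(deque)   # ip -> timestamps of recent requests
        self.blocked_until = {}          # ip -> end time of the latest block

    def allow(self, ip):
        """Return True if the request may proceed, False if the IP is blocked."""
        now = self.clock()
        if now < self.blocked_until.get(ip, 0):
            return False
        hits = self.hits[ip]
        hits.append(now)
        # Drop requests that have fallen out of the sliding window.
        while hits and hits[0] <= now - WINDOW:
            hits.popleft()
        if len(hits) <= MAX_REQUESTS:
            return True
        # Limit exceeded: block for longer if an earlier block ended recently.
        previous_end = self.blocked_until.get(ip)
        repeat = previous_end is not None and now - previous_end <= REPEAT_WINDOW
        self.blocked_until[ip] = now + (LONG_BLOCK if repeat else SHORT_BLOCK)
        hits.clear()
        return False
```

A request handler would call `allow(ip)` once per request and answer with an "access blocked" page whenever it returns False.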

In my experience of running this high traffic site, the trap link in combination with the referer header is most effective. The requests/minute is only there so that people don't mirror the wiki with wget or some other tool.
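
The combination of trap link and referer check described in the post might look roughly like this Python sketch. The 48-hour block is the figure quoted for Sensei's Library. The host name, the trap path `/blockme`, the protected path prefixes, and the class name `Gate` are hypothetical and chosen only for illustration.

```python
import time
from urllib.parse import urlparse

SITE_HOST = "wiki.example.org"             # hypothetical host name of the wiki
TRAP_PATH = "/blockme"                     # hypothetical trap URL, disallowed in robots.txt
TRAP_BLOCK = 48 * 60 * 60                  # block for 48 hours
PROTECTED_PREFIXES = ("/edit", "/search")  # hypothetical resource-intensive actions


class Gate:
    def __init__(self):
        self.blocked_until = {}  # ip -> time at which the block expires

    def check(self, ip, path, referer=None, has_pref_cookie=False, now=None):
        """Return 'ok', 'denied' (referer check failed) or 'blocked'."""
        now = time.time() if now is None else now
        if now < self.blocked_until.get(ip, 0):
            return "blocked"
        if path == TRAP_PATH:
            # Only a robot that ignores robots.txt should ever get here.
            self.blocked_until[ip] = now + TRAP_BLOCK
            return "blocked"
        if path.startswith(PROTECTED_PREFIXES):
            from_site = referer is not None and urlparse(referer).netloc == SITE_HOST
            # Visitors whose browser suppresses the referer need the cookie.
            if not (from_site or has_pref_cookie):
                return "denied"
        return "ok"
```

With these assumed names, the matching robots.txt entry would be a line `Disallow: /blockme` under `User-agent: *`, so that compliant robots never request the trap URL.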

See also: http://senseis.xmp.net/?AccessBlocked

Is it time to write a WikiSpamPlugin?

-- PeterThoeny - 12 Feb 2004

Looks like the 'apocalypse' mentioned in SpamProofingOfComments (first comment) has finally arrived :-(

There are two quite different sorts of WikiSpam, I believe:

  • SpamProofingOfComments - what just happened at TWiki.org, could be manual or automated. SlashDot style 'you can't edit another page for N minutes' tests could help here, particularly if people are creating unique userids and not using TWikiGuest. IP-based checking is useful for the TWikiGuest account. Another approach would be to use SpamAssassin to check the content of comments/edits to TWiki pages, which is probably the only defence against patient manual or automated comment spam (i.e. the user or robot does 1 comment every 20 minutes, say).
  • Excessive page views by robots (search engines or mirroring) - this is the target of the Sensei's Library feature mentioned above. It is a component in SpamProofingOfComments.

-- RichardDonkin - 12 Feb 2004

We had wiki spam in the Main web again: spam similar to the above was added to almost 100 pages from IP addresses 203.88.152.253 and 203.88.152.17. The BlackListPlugin is the answer to this spam; the Plugin is installed at TWiki.org. Please provide feedback on the Plugin at BlackListPluginDev.

Edit this topic in case you want to see the topic save logs of the spammer and the content of spammed user home pages.

-- PeterThoeny - 21 Mar 2004

Last night we had the same issue again (same spam content), this time with 46 dummy users registered and 36 existing user topics spammed, using IP addresses 203.88.155.244 and 203.88.155.135. I updated the BLACKLIST in BlackListPlugin accordingly. JohnTalintyre and I scrubbed the walls clean. Quickly removing graffiti is key: there are plenty of walls elsewhere for the artist to use if the artwork disappears quickly on TWiki.org. The real solution, of course, is to enhance the Plugin as described in BlackListPluginDev.

Edit this topic in case you want to see the topic save logs of the spammer and the content of spammed pages.