
HTML Attachment Spam - Beware of New Type of Spam

ALERT! NOTE: All administrators of public TWiki sites are encouraged to upgrade to the latest BlackListPlugin, version 2013-03-22. It prevents known wiki spam from being saved in TWiki topics and uploaded as HTML attachments, makes scripted registrations harder, and protects the site from excessive use by a single IP address. It also protects TWiki sites from topic text and attachments containing malicious scripts that use eval() and escape().

Recently a new type of spam has been seen that can have very negative consequences for TWiki site owners.

Traditionally, spammers have added links to their porn/medicine/whatever websites in an effort to get a higher Google ranking.

Naturally this meant that their URLs quickly got blacklisted everywhere.

Now they use our TWiki sites as web servers.

So far, they have registered as regular users and edited only their own user topic.

They then add a number of attachments to the user page. The attachments are all HTML pages, which Apache serves like any other static file, without involving TWiki, because they are served straight out of the pub directory.

After they have planted their files, they start adding links directly to these HTML files from various other wikis, blogs, guestbooks, forums, etc. And YOUR site ends up ranking high on Google when people search for various product names.

And soon YOUR IP address and domain name will be on the common blacklists. If your TWiki is important to you and/or your business, you should be concerned about this.

The first thing you should do when you read this is look for a user called NeoVertigo. He has spammed quite a few TWiki sites in the past few days.

The second thing is to look in the pub/Main directory and check your users' attachments. Look for .htm or .html files.
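The check of pub/Main can also be scripted. The following is a minimal sketch in Python. It assumes the usual TWiki layout, where each topic's attachments live in pub/<Web>/<Topic>/. The path /var/www/twiki/pub/Main is only an example. Extensions are matched case-insensitively, so names like PILLS.HTM are caught as well.

```python
import os
from pathlib import Path

# Extensions that Apache will normally serve as web pages (compared lowercase)
HTML_EXTENSIONS = {".htm", ".html"}


def find_html_attachments(pub_dir):
    """Return sorted paths of .htm/.html files anywhere under pub_dir.

    The extension check is case-insensitive, so SPAM.HTML is found too.
    A missing pub_dir simply yields an empty list.
    """
    hits = []
    for root, _dirs, files in os.walk(pub_dir):
        for name in files:
            if Path(name).suffix.lower() in HTML_EXTENSIONS:
                hits.append(os.path.join(root, name))
    return sorted(hits)


if __name__ == "__main__":
    # Example path only; point this at your own twiki/pub/Main directory
    for path in find_html_attachments("/var/www/twiki/pub/Main"):
        print(path)
```

Each hit's parent directory name is the user topic that owns the attachment, which tells you which account to deal with.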

If you find something, be aware that, in my experience, the spammer has invested quite a lot of energy in getting his pub directory a high Google score, so he will keep re-registering and adding his junk again and again. One spammer was seen re-registering 4 times in 3 hours on the same site. It is not enough to delete his user account. You need to make it useless and prevent re-registration.

  • Remove all content from his home page.
  • Delete the ,v file so there is no history. Delete the pub directory that belongs to his account.
  • Either add the setting * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup to his topic,
  • Or, if possible, change the owner of both the topic and its directory under pub to root and chmod them to 400. This way the spammer will get nothing but errors.
  • Finally, change his entry in .htpasswd to an invalid password and, importantly, if you run TWiki 4, change the email address in .htpasswd to your own email address.
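The .htpasswd step can be scripted too. The following is a minimal sketch in Python. It assumes TWiki 4's .htpasswd layout of one login:hash:email line per user. Older versions store only login:hash, so check your file first. The marker *LOCKED* is an arbitrary string chosen here because it is not a valid crypt, MD5 or SHA hash, so it should never match a typed password. The address admin@yourdomain.com is a placeholder for your own.

```python
def lock_htpasswd_entry(lines, login, admin_email, marker="*LOCKED*"):
    """Return a copy of .htpasswd lines with one account disabled.

    Assumes TWiki 4 style entries "login:hash:email" (an assumption; check
    your own file). The hash is replaced by `marker`, which is not a valid
    password hash, and the email field is set to the administrator's address.
    """
    result = []
    found = False
    for line in lines:
        # The login name is everything before the first colon
        if line.split(":", 1)[0] == login:
            found = True
            line = f"{login}:{marker}:{admin_email}\n"
        result.append(line)
    if not found:
        raise KeyError(f"user {login!r} not found in .htpasswd")
    return result
```

To apply it, read the file with readlines(), pass the list in, and write the returned lines back after keeping a backup copy of the original file.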

How can we prevent him from making a new account and starting all over? We cannot. But we can annoy him.

In your httpd.conf you can add this

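# Send direct requests for .htm/.html attachments in Main to the notice page
RedirectMatch .*pub/Main.*\.htm.*$ http://yourdomain.com/nospam.htm
# Same for viewfile requests whose query string contains "htm" (e.g. filename=spam.html)
RewriteEngine On
RewriteCond %{QUERY_STRING} ^.*htm.*$
RewriteRule bin/viewfile/Main.* http://yourdomain.com/nospam.htm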

and put a friendly message for both the spammer and other visitors on that page.

These lines prevent anyone from viewing .htm or .html files in any user's pub directory.

If you never attach HTML files to any topic, you can use this broader version:

RedirectMatch .*pub/.*\.htm.*$ http://yourdomain.com/nospam.htm
RewriteEngine On
RewriteCond %{QUERY_STRING} ^.*htm.*$
RewriteRule bin/viewfile/.* http://yourdomain.com/nospam.htm
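You can try the two RedirectMatch patterns before putting them into the Apache config. The following is a minimal sketch in Python. Apache looks for the pattern anywhere in the URL path without anchoring it, and re.search does the same. For simple expressions like these, Python's re module and Apache's PCRE engine should agree. The example URL paths assume a TWiki installed under /twiki.

```python
import re

# The two RedirectMatch patterns from the config above, unchanged
MAIN_ONLY = re.compile(r".*pub/Main.*\.htm.*$")  # users' pub directories only
ALL_WEBS = re.compile(r".*pub/.*\.htm.*$")       # every web


def is_redirected(pattern, url_path):
    """Return True if a request for url_path would go to nospam.htm.

    Apache looks for the pattern anywhere in the URL path (it is not
    anchored), which is what re.search does as well.
    """
    return pattern.search(url_path) is not None


if __name__ == "__main__":
    # Hypothetical URL paths for a TWiki installed under /twiki
    for path in ("/twiki/pub/Main/NeoVertigo/cheap.html",
                 "/twiki/pub/Support/SomeTopic/configure.htm",
                 "/twiki/pub/Main/JohnDoe/photo.jpg"):
        print(path, is_redirected(MAIN_ONLY, path), is_redirected(ALL_WEBS, path))
```

Both patterns are case-sensitive, so a file named SPAM.HTML is not redirected. Prefixing a pattern with (?i) should make Apache's match case-insensitive. The trailing .* also means that a file renamed to cheap.htm.txt is still caught.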

In TWiki 4 you can have TWiki rename uploaded .htm and .html files by appending .txt to the file name. This does not prevent people from seeing the spam file, but it is then shown as raw HTML, which is not at all interesting to anyone. This filter works in all webs and also prevents legitimate users from uploading HTML files to the pub directories, but it is rare that anyone really needs to.

It has been suggested to add htm|html to $TWiki::cfg{UploadFilter}, but this is of little use, because the most popular browser, Internet Explorer, does not care and shows the contents as a web page anyway.
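Internet Explorer may render a file as HTML based on its content rather than its name. A check that looks inside every attachment is therefore harder to fool than one that only looks at the extension. The following is a minimal sketch in Python. The tag list and the 1024-byte window are assumptions chosen for illustration, not the rules that Internet Explorer or the BlackListPlugin actually apply.

```python
import re

# Opening tags that suggest a browser will render the file as HTML
# (illustrative list, not exhaustive)
HTML_MARKERS = re.compile(
    rb"<\s*(?:html|head|body|script|iframe|meta|a\s+href)\b", re.IGNORECASE
)
SNIFF_BYTES = 1024  # how much of the file to inspect (assumed value)


def looks_like_html(data):
    """Return True if HTML markup appears near the start of `data` (bytes)."""
    return HTML_MARKERS.search(data[:SNIFF_BYTES]) is not None


def attachment_looks_like_html(path):
    """Check a file on disk, whatever its extension (.txt, .jpg, ...)."""
    with open(path, "rb") as f:
        return looks_like_html(f.read(SNIFF_BYTES))
```

Running attachment_looks_like_html over every file in pub, not just the .htm ones, also catches spam pages that were renamed to .txt.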

Here is a command you can run in your twiki/data directory to check for recent uploads of files ending with .htm

egrep ' upload .*\.htm' log201503.txt

Here is an improved command you can put into an "ths" alias (mnemonic "type html spam") for easy access:

alias ths='egrep " upload .*\.htm" log`date +%Y%m`.txt | sed "s/upload . \([^\.]*\)\.\([^ ]*\) . /http:\/\/twiki.org\/p\/pub\/\1\/\2\//" | tail -20'

This is twiki.org specific; it produces a complete URL for copy & paste, such as:

11 May 2006 - 09:38 Main.FirstLast http://twiki.org/p/pub/Support/ModPerlWindowsInstallProblems/configure.htm
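The same report can be produced without the sed one-liner. The following is a minimal sketch in Python. It assumes each log line is a pipe-separated row of the form | date | Main.User | upload | Web.Topic | file.htm | ip |, which is the layout the sed expression above expects. The base URL http://twiki.org/p/pub is the twiki.org-specific value from the alias and must be changed for other sites.

```python
import re

# Field layout assumed from the sed expression in the alias:
# | date | Main.User | upload | Web.Topic | file.htm | ip |
UPLOAD_RE = re.compile(
    r"^\|\s*(?P<date>[^|]+?)\s*\|\s*(?P<user>[^|]+?)\s*\|\s*upload\s*\|"
    r"\s*(?P<web>[^.|\s]+)\.(?P<topic>[^|\s]+)\s*\|"
    r"\s*(?P<file>[^|]+?\.html?)\s*\|"
)


def html_upload_urls(log_lines, base="http://twiki.org/p/pub", limit=20):
    """Return "date user url" lines for the last `limit` .htm/.html uploads.

    `base` is the twiki.org value from the alias; change it for your site.
    """
    rows = []
    for line in log_lines:
        m = UPLOAD_RE.search(line)
        if m:
            url = f"{base}/{m['web']}/{m['topic']}/{m['file']}"
            rows.append(f"{m['date']} {m['user']} {url}")
    return rows[-limit:]  # like "tail -20"


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py data/log200605.txt
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as log:
            print("\n".join(html_upload_urls(log)))
```

Unlike the egrep, this sketch only reports file names that end in .htm or .html, so a renamed file such as spam.htm.txt is skipped. Loosen the file pattern if you want those as well.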

The spammers' next move is to use the Trash web: they upload first and then move the topic to the Trash web. So it is wise to block access to all files in the Trash web's pub directory.

This can be done in the Apache config file, for example:

# Apache 2.2 syntax; on Apache 2.4 the equivalent is "Require all denied"
<Directory "/var/www/twiki/pub/Trash">
   Deny from all
</Directory>

We hope people will come up with more and maybe better countermeasures. Take this seriously, since you do not want your IP address or domain to end up blacklisted.

-- Contributors: KennethLavrsen, PeterThoeny


The BlackListPlugin should be enhanced to also check uploaded HTML files.

-- PeterThoeny - 26 May 2006

Since Internet Explorer shows .txt files with HTML inside as web pages, there is little point in adding htm|html to the upload filter. So I removed it again from the text above.

-- KennethLavrsen - 26 May 2006

Just had the first case here on twiki.org: someone registered as NewGood and attached 6 spam HTML pages to the user home page. I removed the account, trashed the attachments, and set the attachment directory to read-only.

-- PeterThoeny - 27 May 2006

A new version of the BlackListPlugin that scans for HTML attachment spam is available in the TWiki 4 branch of SVN. I will release it in the next few days (after investigating an issue with the beforeAttachmentSaveHandler on Solaris, Bugs:Item2390, which breaks file uploads). Please help test the Plugin.

-- PeterThoeny - 01 Jun 2006

Wiki spam isn't a new problem, of course, but as spammers and scammers get more desperate they are likely to more actively pursue new ways of conducting their "business". So it's very good that you've enhanced the BlackListPlugin to try to deal with this. Unfortunately, this is likely to be a continuing battle, as those of us who have fought on the email spam front have discovered, and the very nature of wikis makes it especially hard to combat.

-- MeredithLesly - 03 Jun 2006

As long as spammers are successful in their marketing (I am starting to think the criterion for this is that just one person actually buys their stuff...), this will continue. The problem is only really solved when people choose to do something other than respond to the spam :-/.

But as this is not going to happen any time soon, let's continue the "fight" :-)

If we look at spam sent over SMTP at the moment, most of it (75%, based on a quick review of my own share, ~90 mails/day) is sent as image attachments (JPEGs). That means traditional regex-based filters are ruled out, and the same is going to happen to TWiki spamming.

For now, the spammers appear to be using HTML-based spam against TWiki (because they are experimenting with search engine ratings), but as soon as the BlackListPlugin and other automatic measures are effective against this HTML/text-based kind, we will start to see JPEG/image/PPT/PDF/choose-your-binary-format-based spam instead.

Trying to automate this through OCR and other technical initiatives is one way to go, but getting your community focused on this problem is a better and simpler approach in my view.

One thing we could consider in the meantime: Google automatically interprets many binary formats, so a regular Google check of your site is one of the easy things you can schedule, e.g. a simple search like http://www.google.com/search?q=%2Bviagra+%2Binurl:twiki (add a +site:mysite.com parameter). Unfortunately, leveraging this through e.g. the BlackListPlugin is quite a job (remove content, trace/remove the contributor's account, ban an IP address range or similar). A community can apply more IQ up front and is readily available :-)
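You can generate the search above for a list of spam words, which makes it easy to bookmark or to open from a scheduled reminder. The following is a minimal sketch in Python. The function name and its parameters are hypothetical. It only builds the URL and does not query Google itself.

```python
from urllib.parse import urlencode


def spam_check_url(words, site=None):
    """Build a Google search URL for pages with "twiki" in the URL that
    contain every word in `words`, optionally restricted to one site."""
    terms = [f"+{word}" for word in words] + ["+inurl:twiki"]
    if site:
        terms.append(f"+site:{site}")
    # urlencode quotes "+" as %2B and ":" as %3A, and turns spaces into "+"
    return "http://www.google.com/search?" + urlencode({"q": " ".join(terms)})


# Example: spam_check_url(["viagra"], site="mysite.com")
```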

By the way, if you explore some of the links from the search, you will notice that much of the spam is not actually displayed to a visiting user but is just hidden in the source (typically using style=display:none). So we need to be aware that the purpose of this kind of spam is often something other than displaying links to the user (higher ratings and more referrers are just as important).
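Hidden links of this kind can be spotted mechanically. The following is a minimal sketch in Python using the standard html.parser module. It reports links that sit inside elements whose inline style contains display:none, or that have that style themselves. It is a simplification: it assumes reasonably well-nested markup and ignores hiding done through CSS classes or external style sheets.

```python
import re
from html.parser import HTMLParser

DISPLAY_NONE = re.compile(r"display\s*:\s*none", re.IGNORECASE)
# Elements that never have a closing tag, so they must not change the depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HiddenLinkFinder(HTMLParser):
    """Collect hrefs of links inside elements styled with display:none."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0  # > 0 while inside a hidden element
        self.hidden_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Does this element start a new hidden block?
        starts_hidden = (not self.hidden_depth
                         and DISPLAY_NONE.search(attrs.get("style") or ""))
        if tag not in VOID_TAGS and (self.hidden_depth or starts_hidden):
            self.hidden_depth += 1
        if tag == "a" and attrs.get("href") and (self.hidden_depth or starts_hidden):
            self.hidden_links.append(attrs["href"])

    def handle_endtag(self, tag):
        if self.hidden_depth and tag not in VOID_TAGS:
            self.hidden_depth -= 1


def hidden_links(html_text):
    """Return hrefs of links hidden from visitors but present in the source."""
    finder = HiddenLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.hidden_links
```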

-- SteffenPoulsen - 04 Jun 2006

Yet another twist has emerged: e-mail spam pointing to HTML attachment spam that immediately redirects to a spam site (in order to earn clicks). To complicate matters, the redirect is obfuscated with JavaScript. So far I have identified three different approaches to obfuscating the spam site URL. Examples:

Case 1, in HTML body:

<SCRIPT language=JavaScript>
(vj20+so26==6) eval(fak+sqg5+hez8+mek13+rgi18+vbb25+awq27+dh4+je15);}

Case 2, in HTML body:

<script>var t='<span id="sp" sty'+'le="dis'+'play:n'+'one"><h'+'1>404 
Not Found'+'</'+'h1><span 
ref=escape(document.referrer); var s="<scr"+"ipt 

Case 3, between HTML head and body:

<script src='x.js' type='text/javascript'>

And with this content in the x.js file:


The latest BlackListPlugin does not yet protect against obfuscated URLs; until it does, I suggest that site operators monitor HTML attachments daily.
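A daily review can be narrowed down with a few textual indicators taken from the three cases above. The following is a minimal sketch in Python. The indicator list is an assumption based only on those samples. A hit means the file deserves a manual look, not that it is proven spam.

```python
import re

# Indicators drawn from the three obfuscation examples above
INDICATORS = {
    "eval() call": re.compile(r"\beval\s*\("),
    "escape() call": re.compile(r"\bescape\s*\("),
    "reads document.referrer": re.compile(r"document\.referrer"),
    # "<scr"+"ipt": a <script> tag split into concatenated strings
    "split <script> tag": re.compile(r"""<\s*scr["']\s*\+\s*["']ipt""", re.IGNORECASE),
    "external script file": re.compile(r"""<script[^>]*\bsrc\s*=""", re.IGNORECASE),
}


def suspicious_features(text):
    """Return the sorted names of all indicators found in an HTML/JS file."""
    return sorted(name for name, rx in INDICATORS.items() if rx.search(text))
```

Combined with a list of recent HTML uploads, this lets you open only the attachments that show at least one indicator.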

-- PeterThoeny - 09 Jun 2006

It's also important to make sure that WIKIWEBMASTER is set in Main.TWikiPreferences, so that you receive notifications of new registrations. Many public sites have the notices going to webmaster@example.com.

-- MeredithLesly - 09 Jun 2006

After adding the RedirectMatch line to my .htaccess file, I noticed that HTML files were still accessible via the viewfile script. I've added the following lines to the recommendations above. They seem to work well for my site, but I would be grateful if someone else could try them too.

RewriteEngine On
RewriteCond %{QUERY_STRING} ^.*htm.*$
RewriteRule ^bin/viewfile/.* http://yourdomain.com/nospam.htm

-- AndyPryke - 25 Jun 2006

The evil spammers :-) have placed hundreds if not thousands of links to the spam they placed on my website. I get a great number of hits on these pages.

In order to discourage the spammers, I think it makes sense to try and undermine their business model. If they spend the effort to create these links, but their potential "customers" or victims are put off, then their attempts at spamming have a negative effect on their profits. I think that this is the ultimate disincentive.

I have redirected the spam pages to a message which says

This page of spam advertising has been removed.
It was not placed on this website by the website owner.
If you try and buy drugs from spammers they will steal your credit card details.
You're also very unlikely to receive any pills, and if you do, they will probably be complete fakes or dangerous.
If you do feel that you need to obtain medicines, please see a qualified doctor who can advise you about their suitability and side effects.

You can see the page here: http://www.the-data-mine.com/spam_message.html

I'd recommend that other TWiki admins do the same, as I believe it will have a genuine effect against the spammers.

Any comments?

-- AndyPryke - 03 Jul 2006

Doesn't work. Doesn't cost them money. Just keeps eating your bandwidth.

-- MeredithLesly - 03 Jul 2006

It helps if humans click on the link. The spammers spam wiki sites mainly in the hope that search engines index the spam and raise its ranking because many sites point to it.

Andy, in order to reduce spam on your site it is better to remove all spam as quickly as possible. Spammers search for "viagra" and other spam words to find easy targets.

Update your robots.txt file and ask Google to update the cache of your site, http://www.google.com/support/webmasters/bin/topic.py?topic=8459

-- PeterThoeny - 03 Jul 2006

Updating the robots.txt to block indexing of the "pub" directory makes sense - I'll do this.
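Before relying on a robots.txt change, it is worth checking that the rule really covers the attachment URLs. The following is a minimal sketch using Python's standard urllib.robotparser. The /twiki/pub/ path and yourdomain.com are placeholders for your own installation. Keep in mind that robots.txt only asks well-behaved crawlers to stay away. It does not block access to the files.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a TWiki installed under /twiki; adjust the path
ROBOTS_TXT = """\
User-agent: *
Disallow: /twiki/pub/
"""


def crawler_may_fetch(robots_txt, url, agent="Googlebot"):
    """Return True if robots_txt allows `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Checking one attachment URL and one normal view URL is usually enough to catch a typo in the Disallow path.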

You're right that the spammers who place links in TWiki pages are doing this to increase their search ranking.

However, the spammers who add HTML as an attachment have a different tactic. This is evidenced by their (massive but presumably scripted) efforts to provide links to the pages they have uploaded. For example, Google reports 21,000 links to the pages they uploaded to my site, though if you click through the Google listings, they stop at about page 98 or so. I get a few hundred click-throughs a day at the moment.

Each of those 21k (or 980) links is competing for people who will click on dodgy dr_gs links. The people who click through are mostly "warm leads". If TWiki sites get a reputation for (i) quickly removing HTML spam and (ii) putting off those warm leads, then spammers will go elsewhere - they're in this to make money, not to cause us problems (it's just that they don't mind causing us problems).

In terms of bandwidth, the impact of a simple text page is negligible.

By the way, I also got an email from NewGood saying that they were having problems changing their password! I did respond but their inbox was full.

-- AndyPryke - 04 Jul 2006 (either forgot to add last time or it got deleted by accident)

All I can relate, from many years of experience in fighting email spam (and blocking it for my clients), is that spammers adapt faster than we ever can. The only thing that's changed at all is that a few ISPs that used to be willing to host spammers don't any more. The result? Spammers created viruses that turn Windows machines into spamming machines, either sending out mail or hosting sites (primarily the former). I venture to say 90% of the spam that is sent to my clients comes from infected machines. This is, btw, why blocking IP numbers doesn't work and in fact has a negative effect: the IP numbers belong to innocent people who don't know they're infected, who might be interested in a wiki site and will wind up being blocked from registering through no fault of their own (other than running an infected machine, of course).

If you (or anyone else who is concerned with this problem) want to learn more about fighting spammers, read the various news.admin.net-abuse.* news groups. This battle has been going on for a long long time, with sadly few positive effects.

-- MeredithLesly - 04 Jul 2006

I don't agree that blacklisting IPs of zombified Windows machines is a bad idea - the owners of such machines need to learn how to keep their machine secure through auto-updates, anti-virus/spyware scanning, etc, so it is a good thing if a lot of sites start to block them. Then, maybe they'll call their ISP who will diagnose that their PC's been zombified and help them clean it up. Of course, this will only work if the blacklist address list is very widely used and they visit a blog/wiki that uses it.

-- RichardDonkin - 05 Jul 2006

TopicClassification TWikiDeployment
TopicSummary Warning to admins about new type of spam that can get your domain blacklisted

RelatedTopics WikiSpam
Topic revision: r21 - 2007-03-19 - PeterThoeny
Copyright © 1999-2015 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.