are not even sent to the browser.
- A TWiki administrator has to approve new users before they can post URLs - like earning a positive reputation first.
- A topic saved with external URLs that are not on a positive list goes into a "hold" state and requires an administrator to approve the submission.
- If you are a member of the administrator group you get an extra action button - maybe only in the "More Topic Actions" page - where you can delete the last revision (?cmd=delRev) with one click and one confirmation click. This does not prevent spam, but it makes it much easier to remove.
- One place where an administrator can remove a user - the password entry from .htpasswd, the user topic, and all the user's edits - in one easy operation. Again, this does not prevent spam, but it makes it easy to fight, which also makes spamming less interesting in the first place.
- URLs to domains that are not on a positive list are padded with extra text, similar to email addresses. You cannot click them and search engines do not follow them, but a human can copy and paste them if needed (see the sketch below).
- URLs are left out, or padded with NOSPAM, only for TWikiGuest (and thus for Google). You have to log in and authenticate to see URLs and be able to click on them. Google will never authenticate, so adding spam becomes pointless.
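A minimal sketch of the URL padding idea in Perl (the padUrls function, the NOSPAM marker format, and the example domain list are all made up for illustration; a real implementation would hook into the rendering pipeline and check the actual login state):

<verbatim>
use strict;
use warnings;

# Positive list of domains whose URLs are rendered normally (example values).
my %goodDomains = map { $_ => 1 } qw( twiki.org sourceforge.net );

# Pad every URL whose host is not on the positive list, but only for
# unauthenticated visitors such as TWikiGuest (and hence Google).
sub padUrls {
    my ( $text, $isGuest ) = @_;
    return $text unless $isGuest;    # authenticated users see real, clickable URLs
    $text =~ s{(https?://([\w.-]+)\S*)}{
        my ( $url, $host ) = ( $1, lc $2 );
        $goodDomains{$host} ? $url : "NOSPAM-$url-NOSPAM";
    }ge;
    return $text;
}
</verbatim>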
I think there are even more ideas that could be added to the BlackListPlugin or to a new antispam plugin.
Blacklisting IP addresses helps only for a few minutes; spammers come back with a new IP minutes later. The current blocking of save patterns works well, but we will always be behind new spammers and new URLs, so some more generic countermeasures are needed.
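For reference, the pattern blocking amounts to something like this sketch (the pattern list is invented; in a real plugin the check would run from the beforeSaveHandler and abort the save on a match):

<verbatim>
use strict;
use warnings;

# Invented examples of spam patterns, e.g. imported from a shared blacklist.
my @spamPatterns = (
    qr/buy[\s-]*viagra/i,
    qr/online[\s-]*casino/i,
);

# Return the first pattern that matches the submitted text, or undef if clean.
sub matchesSpamPattern {
    my ($text) = @_;
    for my $re (@spamPatterns) {
        return $re if $text =~ $re;
    }
    return undef;
}
</verbatim>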
--
KennethLavrsen - 18 Mar 2006
I think a key way of addressing this is to retrieve the web page referenced by a URL and then apply anti-spam rules, including Bayesian filtering, to the contents of that page. This requires far less involvement from the administrator than maintaining a blacklist of IP addresses, URLs or keywords.
SpamAssassin has what I think is a very good approach, integrating keyword rules, Bayesian filtering, etc., and is Perl-based. There may also be something from the world of blogs that is reusable.
Blocking posting of external URLs would stop people using TWiki normally, but SpamAssassin-type filtering of the pages referred to by URLs would probably be quite accurate.
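A rough sketch of what such filtering could look like, using LWP::Simple to fetch the linked page and the Mail::SpamAssassin Perl API to score it (the fake mail headers and the threshold of 5 are assumptions; SpamAssassin parses mail messages, not raw HTML):

<verbatim>
use strict;
use warnings;
use LWP::Simple qw(get);
use Mail::SpamAssassin;

# Fetch the page a submitted URL points to and let SpamAssassin score it.
sub urlLooksSpammy {
    my ($url) = @_;
    my $html = get($url);
    return 0 unless defined $html;    # unreachable page: benefit of the doubt

    # Wrap the HTML in minimal mail headers so SpamAssassin can parse it.
    my $message = "From: checker\@example.com\n"
                . "Subject: URL content check\n"
                . "Content-Type: text/html\n\n"
                . $html;

    my $sa     = Mail::SpamAssassin->new();
    my $status = $sa->check( $sa->parse($message) );
    my $spammy = $status->get_score() >= 5;    # 5 is SpamAssassin's usual default
    $status->finish();
    return $spammy;
}
</verbatim>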
The easy deletion of the last revision, for the administrator group only, is a good idea - not sure about Wikipedia, but maybe they do this. I also like the ability to remove a user with a single operation; it makes it easier to clean up the cases not handled by an automated approach.
--
RichardDonkin - 18 Mar 2006
Even a quick plugin to show an administrator all the outbound links would be useful.
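Something like this would do as a starting point (a standalone sketch; hooking it into a plugin and reporting per web is left out):

<verbatim>
use strict;
use warnings;

# Pull every external link out of a topic's text, without duplicates,
# so an administrator can review them at a glance.
sub outboundLinks {
    my ($topicText) = @_;
    my %seen;
    return grep { !$seen{$_}++ } $topicText =~ m{(https?://\S+)}g;
}

# Example: read a topic from stdin and print one link per line.
my $text = do { local $/; <STDIN> };
print "$_\n" for outboundLinks($text);
</verbatim>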
--
MartinCleaver - 19 Mar 2006
Didn't IE have an option at one time to summarise all links? Or was that only for printout?
--
AntonAylward - 19 Mar 2006
I do not think additional measures that prevent Google from indexing topics will help since spammers do not read a "why it does not make sense to spam our site" note. They simply spam away with automated scripts in the hope that the spam does not get cleaned up on a few target sites.
Most of the spam is posted by new users on the home page, and sometimes by TWikiGuest on an arbitrary topic.
I think the best additional spam defences are:
- An easy way to remove a user and his/her traces
- A quick way to share known spam signatures
The shared list that we import is useful, but we have had several cases where a new spammer attacked TWiki sites at almost the same time. The current alert via the twiki-dev mailing list is not quick enough. Ideally we would establish a shared spam-signature list for participating TWiki sites, where admins can push out new spam signatures to participating sites at 10-15 minute intervals.
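A pull-based variant of this is easy to sketch: each participating site mirrors a shared signature file on a short cron interval (the URL and file path below are placeholders):

<verbatim>
use strict;
use warnings;
use LWP::Simple qw(mirror is_success);

# Placeholder locations; run from cron, e.g.:  */15 * * * *  perl fetch-signatures.pl
my $listUrl   = 'http://example.com/shared-spam-signatures.txt';
my $cacheFile = '/var/twiki/spam-signatures.txt';

# mirror() only downloads when the remote file is newer than the local copy
# (it returns HTTP 304 when nothing has changed).
my $rc = mirror( $listUrl, $cacheFile );
warn "could not fetch signature list: HTTP $rc\n"
    unless is_success($rc) || $rc == 304;
</verbatim>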
--
PeterThoeny - 22 Mar 2006
SpamAssassin may cover quick sharing of spam signatures, or could at least provide a model, though it's SMTP-oriented - it uses several real-time blacklists (RBLs) as well as efforts such as CloudMark, which enable end users to quickly share their 'this is spam' markings on emails. I think we should be tackling this alongside the blog developers, who already have some solutions here - there are certainly plugins for MovableType and WordPress, for example.
By contributing a limited amount of refactoring or perhaps just a shim to enable a blog-oriented plugin to work with TWiki, we could avoid reinventing the wheel and benefit from a broader developer base for countering blog and wiki spam.
BlogSpam and WikiSpam are quite similar, and somewhat different to email spam:
- Created by a web form served by a web application - no SMTP involved
- The main point of the spam is to point to spam sites, not to directly encourage a sale like email spam. Hence there is often less useful text in the spam entry and more on the actual site. Checks on linking, and on the content of the sites linked to, are more important than checks on the spam entry itself.
- Many sites tend to be hit at the same time (wiki and blog) - so a shared 'this is spam' model as in CloudMark could help.
Let's at least spend a bit of time looking at what the blog community (and other wikis) have done about spam before we develop our own system.
Google:blog+spam+plugin has quite a few candidate plugins.
--
RichardDonkin - 22 Mar 2006