are not even sent to the browser.
- A TWiki administrator has to approve new users before they can post URLs - like earning a positive reputation first.
- A topic saved with external URLs that are not on a positive list goes into a "hold" state and requires an administrator to approve the submission.
- If you are a member of the administrator group you get an extra action button - maybe only in the "More Topic Actions" page - where you can delete the last revision (?cmd=delRev) with one click and one confirmation click. This does not prevent spam, but it makes it much easier to remove.
- One place where an administrator can remove a user - the password entry from .htpasswd, the user topic, and all the user's edits - in one easy operation. Again, this does not prevent spam, but it makes it easy to fight, which also makes spamming less interesting in the first place.
- URLs to domains that are not on a positive list are padded with extra text, similar to email addresses. You cannot click them and search engines do not follow them, but a human can copy and paste them if needed (see the sketch below).
- URLs are left out, or padded with NOSPAM, only for TWikiGuest (and thus for Google). You have to log in and authenticate to see URLs and be able to click on them. Google will never authenticate, so adding spam becomes pointless.
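A minimal sketch of the URL padding idea in Perl (the padUrls function, the NOSPAM marker format, and the example domain list are all made up for illustration; a real implementation would hook into the rendering pipeline and check the actual login state):

<verbatim>
use strict;
use warnings;

# Positive list of domains whose URLs are rendered normally (example values).
my %goodDomains = map { $_ => 1 } qw( twiki.org sourceforge.net );

# Pad every URL whose host is not on the positive list, but only for
# unauthenticated visitors such as TWikiGuest (and hence Google).
sub padUrls {
    my ( $text, $isGuest ) = @_;
    return $text unless $isGuest;    # authenticated users see real, clickable URLs
    $text =~ s{(https?://([\w.-]+)\S*)}{
        my ( $url, $host ) = ( $1, lc $2 );
        $goodDomains{$host} ? $url : "NOSPAM-$url-NOSPAM";
    }ge;
    return $text;
}
</verbatim>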
I think there are even more ideas that could be added to the BlackListPlugin or to a new antispam plugin.
Blacklisting IP addresses helps only for a few minutes; spammers come back with a new IP minutes later. The current blocking of save patterns works well, but we will always be behind new spammers and new URLs, so some more generic countermeasures are needed.
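For reference, the pattern blocking amounts to something like this sketch (the pattern list is invented; in a real plugin the check would run from the beforeSaveHandler and abort the save on a match):

<verbatim>
use strict;
use warnings;

# Invented examples of spam patterns, e.g. imported from a shared blacklist.
my @spamPatterns = (
    qr/buy[\s-]*viagra/i,
    qr/online[\s-]*casino/i,
);

# Return the first pattern that matches the submitted text, or undef if clean.
sub matchesSpamPattern {
    my ($text) = @_;
    for my $re (@spamPatterns) {
        return $re if $text =~ $re;
    }
    return undef;
}
</verbatim>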
--
KennethLavrsen - 18 Mar 2006
I think a key way of addressing this is to retrieve the web page referenced by a URL and then apply anti-spam rules, including Bayesian filtering, to the contents of that page. This requires far less involvement from the administrator than maintaining a blacklist of IP addresses, URLs or keywords.
SpamAssassin has what I think is a very good approach, integrating keyword rules, Bayesian filtering, etc., and is Perl-based. There may also be something from the world of blogs that is reusable.
Blocking posting of external URLs would stop people using TWiki normally, but SpamAssassin-type filtering of the pages referred to by URLs would probably be quite accurate.
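A rough sketch of what such filtering could look like, using LWP::Simple to fetch the linked page and the Mail::SpamAssassin Perl API to score it (the fake mail headers and the threshold of 5 are assumptions; SpamAssassin parses mail messages, not raw HTML):

<verbatim>
use strict;
use warnings;
use LWP::Simple qw(get);
use Mail::SpamAssassin;

# Fetch the page a submitted URL points to and let SpamAssassin score it.
sub urlLooksSpammy {
    my ($url) = @_;
    my $html = get($url);
    return 0 unless defined $html;    # unreachable page: benefit of the doubt

    # Wrap the HTML in minimal mail headers so SpamAssassin can parse it.
    my $message = "From: checker\@example.com\n"
                . "Subject: URL content check\n"
                . "Content-Type: text/html\n\n"
                . $html;

    my $sa     = Mail::SpamAssassin->new();
    my $status = $sa->check( $sa->parse($message) );
    my $spammy = $status->get_score() >= 5;    # 5 is SpamAssassin's usual default
    $status->finish();
    return $spammy;
}
</verbatim>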
The easy deletion of the last revision, for the administrator group only, is a good idea - not sure about Wikipedia, but maybe they do this. I also like the ability to remove a user with a single operation; it makes it easier to clean up the cases not handled by an automated approach.
--
RichardDonkin - 18 Mar 2006
Even a quick plugin to show an administrator all the outbound links would be useful.
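Something like this would do as a starting point (a standalone sketch; hooking it into a plugin and reporting per web is left out):

<verbatim>
use strict;
use warnings;

# Pull every external link out of a topic's text, without duplicates,
# so an administrator can review them at a glance.
sub outboundLinks {
    my ($topicText) = @_;
    my %seen;
    return grep { !$seen{$_}++ } $topicText =~ m{(https?://\S+)}g;
}

# Example: read a topic from stdin and print one link per line.
my $text = do { local $/; <STDIN> };
print "$_\n" for outboundLinks($text);
</verbatim>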
--
MartinCleaver - 19 Mar 2006
Didn't IE have an option at one time to summarise all links? Or was that only for printout?
--
AntonAylward - 19 Mar 2006
I do not think additional measures that prevent Google from indexing topics will help since spammers do not read a "why it does not make sense to spam our site" note. They simply spam away with automated scripts in the hope that the spam does not get cleaned up on a few target sites.
Most of the spam is posted by new users on the home page, and sometimes by TWikiGuest on an arbitrary topic.
I think the best additional spam defences are:
- An easy way to remove a user and his/her traces
- A quick way to share known spam signatures
The shared list that we import is useful, but we have had several cases where a new spammer attacked TWiki sites at almost the same time. The current alert via the twiki-dev mailing list is not quick enough. Ideally we would establish a shared spam-signature list for participating TWiki sites, where admins can push out new spam signatures to participating sites at 10-15 minute intervals.
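A pull-based variant of this is easy to sketch: each participating site mirrors a shared signature file on a short cron interval (the URL and file path below are placeholders):

<verbatim>
use strict;
use warnings;
use LWP::Simple qw(mirror is_success);

# Placeholder locations; run from cron, e.g.:  */15 * * * *  perl fetch-signatures.pl
my $listUrl   = 'http://example.com/shared-spam-signatures.txt';
my $cacheFile = '/var/twiki/spam-signatures.txt';

# mirror() only downloads when the remote file is newer than the local copy
# (it returns HTTP 304 when nothing has changed).
my $rc = mirror( $listUrl, $cacheFile );
warn "could not fetch signature list: HTTP $rc\n"
    unless is_success($rc) || $rc == 304;
</verbatim>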
--
PeterThoeny - 22 Mar 2006
SpamAssassin may cover quick sharing of spam signatures, or could at least provide a model, though it's SMTP-oriented - it uses several real-time blacklists (RBLs) as well as efforts such as CloudMark, which enable end users to quickly share their 'this is spam' markings on emails. I think we should be tackling this alongside the blog developers, who already have some solutions here - there are certainly plugins for MovableType and WordPress, for example.
By contributing a limited amount of refactoring or perhaps just a shim to enable a blog-oriented plugin to work with TWiki, we could avoid reinventing the wheel and benefit from a broader developer base for countering blog and wiki spam.
BlogSpam and WikiSpam are quite similar, and somewhat different to email spam:
- Created by a web form served by a web application - no SMTP involved
- The main point of the spam is to point to spam sites, not to directly encourage a sale like email spam. Hence there is often less useful text in the spam entry and more on the actual site. Checks on linking, and on the content of the sites linked to, are more important than checks on the spam entry itself.
- Many sites tend to be hit at the same time (wiki and blog) - so a shared 'this is spam' model as in CloudMark could help.
Let's at least spend a bit of time looking at what the blog community (and other wikis) have done about spam before we develop our own system.
Google:blog+spam+plugin has quite a few candidate plugins.
--
RichardDonkin - 22 Mar 2006