
Archive of BlackListPluginDev Discussions

We had again WikiSpam in the Main web, spam added to almost 100 pages by IP addresses GeoIP:203.88.152.253 and GeoIP:203.88.152.17. This Plugin is the answer to this spam.

Feel free to enhance the Plugin. A useful enhancement would be to add spammers automatically to the list based on access frequency. If more than, say, 10 pages are accessed in one minute, the IP address gets added to the BLACKLIST. We could add a WHITELIST for IP addresses that should always be excluded.

-- PeterThoeny - 21 Mar 2004

If someone is interested in implementing MeatBall:EditThrottling, I think it should be a separate plugin/add-on of its own (but one that could work in conjunction with this plugin).

-- WillNorris - 21 Mar 2004

We just had another incident of WikiSpam with the same spam content. This time with IP address GeoIP:203.88.155.244 and GeoIP:203.88.155.135.

Will: The functionality could be combined in this Plugin since it is related. Additional functionality besides the BLACKLIST:

  • A ban-list that gets updated automatically based on event logs
  • A WHITELIST to exclude IP addresses from the ban-list check

Pseudo code, run at topic init time, for a combined blacklist/whitelist/banlist functionality that keeps out badly behaving spiders and humans:

  • if on blacklist or banlist:
    • sleep for one minute
    • output warning message
  • else unless on whitelist:
    • read event log
    • remove expired events
    • add this event (content: timestamp, IP-address, action)
    • save event log
    • for all events with this IP address:
      • calculate score:
        • for each save and upload add 10 points
        • for registration add 20 points
        • for other activity add 1 point
      • if score over limit:
        • add IP address to banlist
        • sleep for one minute
        • output warning message
      • end if
    • end for
  • end if
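A minimal in-memory sketch of the pseudo code above, written in Python rather than the Plugin's Perl. The point values come from the spec; SCORE_LIMIT and WINDOW_SECONDS are made-up placeholders, the Guard class is hypothetical, and the real Plugin keeps the event log in a file and sleeps before printing its warning instead of just returning False.

```python
import time
from dataclasses import dataclass, field

# Points per action, taken from the pseudo code; anything else scores OTHER_POINTS.
POINTS = {"save": 10, "upload": 10, "register": 20}
OTHER_POINTS = 1
SCORE_LIMIT = 100      # hypothetical threshold
WINDOW_SECONDS = 300   # hypothetical lifetime of an event in the log


@dataclass
class Guard:
    blacklist: set = field(default_factory=set)
    whitelist: set = field(default_factory=set)
    banlist: set = field(default_factory=set)
    events: list = field(default_factory=list)  # (timestamp, ip, action)

    def check(self, ip, action, now=None):
        """Return True if the request may proceed, False if it is blocked."""
        now = time.time() if now is None else now
        if ip in self.blacklist or ip in self.banlist:
            return False  # real Plugin: sleep one minute, output warning
        if ip in self.whitelist:
            return True
        # remove expired events, then add this one
        self.events = [e for e in self.events if now - e[0] < WINDOW_SECONDS]
        self.events.append((now, ip, action))
        # score all remaining events for this IP address
        score = sum(POINTS.get(a, OTHER_POINTS)
                    for _, i, a in self.events if i == ip)
        if score > SCORE_LIMIT:
            self.banlist.add(ip)
            return False
        return True
```

With these placeholder numbers, ten saves within the window reach exactly 100 points and still pass; the eleventh save pushes the address onto the banlist.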

-- PeterThoeny - 22 Mar 2004

Could a mass renaming trigger blacklisting?

  • It would not, since the Plugin becomes active only once, at init time of a CGI script -- PTh - 23 Mar 2004

-- SamHasler - 23 Mar 2004

I am working on an enhancement based on the above spec.

-- PeterThoeny - 23 Mar 2004

It turned out that SourceForge's load-balanced web servers did not have synchronized clocks; their times differed by up to 7 minutes. This confuses the Plugin (still work in progress). In the meantime SF has fixed this issue.

-- PeterThoeny - 01 Apr 2004

Since IP subnets can now be on any bit boundary in a 32-bit IPv4 address, I think we need to allow the ban list and white list to use addresses with CIDR prefixes (commonly used in networking and by most modern OSes; see Google:CIDR+IP+address and specifically this O'Reilly book excerpt). For example, '1.2.3.0/24' is the same as the current '1.2.3.', while '1.2.3.0/30' covers only four IP addresses, from 1.2.3.0 to 1.2.3.3.

Alternatively, just using ranges of IP addresses might be better - easier to use and certainly easier to debug.

Hopefully there won't be any IPv6 WikiSpam for some time to come, but that would be good to support at some point.
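A sketch of how list entries could accept CIDR prefixes, plain ranges, and the current trailing-dot form, using Python's standard ipaddress module (which also handles IPv6). The hyphenated range syntax and the function names are hypothetical; only the '1.2.3.' == '1.2.3.0/24' equivalence comes from the text above.

```python
import ipaddress


def parse_entry(entry):
    """Turn a list entry into (first, last) address bounds.

    Accepts CIDR ('1.2.3.0/30'), a hyphenated range ('1.2.3.0-1.2.3.9'),
    a single address, or the legacy trailing-dot prefix ('1.2.3.').
    """
    entry = entry.strip()
    if "-" in entry:
        lo, hi = (ipaddress.ip_address(p.strip()) for p in entry.split("-", 1))
        return lo, hi
    if entry.endswith("."):
        # legacy prefix: '1.2.3.' means '1.2.3.0/24' (whole octets only)
        octets = entry.rstrip(".").split(".")
        padded = ".".join(octets + ["0"] * (4 - len(octets)))
        entry = f"{padded}/{8 * len(octets)}"
    net = ipaddress.ip_network(entry, strict=False)  # single address -> /32 or /128
    return net.network_address, net.broadcast_address


def is_listed(ip, entries):
    """True if ip falls inside any entry; IPv4 and IPv6 never match each other."""
    addr = ipaddress.ip_address(ip)
    for entry in entries:
        lo, hi = parse_entry(entry)
        if lo.version == addr.version and lo <= addr <= hi:
            return True
    return False
```

Comparing by (first, last) bounds lets CIDR blocks and arbitrary ranges share one code path, which should also keep debugging simple.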

-- RichardDonkin - 07 Apr 2004

This plugin still needs some work. A couple of weeks ago core team member SvenDowideit blacklisted himself by experimenting with UsingTopicToDefineCSS and yesterday the active developer JohnCoe was blacklisted while developing CoEdit.

I'm at a loss for suggestions for fixing the plugin, but I'd put forward that penalising a couple of the most active developers we have at present is not a good idea. :-)

-- MattWilkie - 24 May 2004

As mentioned before, CoreTeam members can add active people to the whitelist. Simply contact one of us and tell us your IP address(es) or address range. JohnCoe is now on the whitelist.

-- PeterThoeny - 25 May 2004

Perhaps TWikiCommunityGroup could be added to ALLOWTOPICCHANGE on BlackListPlugin? (please smile)

-- WillNorris - 27 Aug 2004

Thanks for asking. What issue do you have?

I'd rather not open the Plugin to all community members because I prefer to know what is going on at TWiki.org if there is a potential issue. I get an e-mail once in a while with the request to remove an IP address, which I resolve quickly.

-- PeterThoeny - 28 Aug 2004

Slogger (an add-on for Mozilla and Firefox) generates a lot of extra traffic and may push innocent users over the limit. Maybe we should use the following as the error message:

      * Set BLACKLISTMESSAGE = You are black listed at the %WIKITOOLNAME% web site due to excessive access or suspicious activities. Please contact site administrator %WIKIWEBMASTER% if you got on the list by mistake. If you are using web logging software such as Slogger, make sure you block %WIKITOOLNAME%. Black listed IP addresses will be submitted to major blacklist databases.

-- RichBlinne - 12 Dec 2004

Enhancement request from RawParamLeaksEmailAddresses: bump up the score by a higher amount when a view with the raw parameter is encountered. This is to protect against e-mail harvesters and site scrapers.

-- PeterThoeny - 19 Jan 2005

I just added this functionality on TWiki.org. I will post the new Plugin after a day or so. Please report any problems here.

-- PeterThoeny - 19 Jan 2005

New Plugin version with "raw" score feature posted on Plugin topic.

-- PeterThoeny - 19 Jan 2005

Something is happening at my site that confuses me.

I've manually added my home site to the WHITELIST. That's fine.

I've manually added the list at WikiSpammersOnTWikiSites to the BLACKLIST, as suggested in that topic. The text at BlackListPlugin says that this is a manually maintained list.

When I do that I suddenly get all those addresses added to the BANLIST, which is a bit confusing.

But even more confusing, they appear in the BANLIST configuration:

    * BANLIST configuration, comma delimited list of: Points for registration, points for each save and upload, 
      points for other actions like view, threshold to add to BANLIST, measured over time (in seconds)
       o Set BANLISTCONFIG = 30, 5, 1, 150, 300
       o Your current score: 203.88.152., 203.88.155., 218.87.226.118, 219.65.75., 221.217.48.121, 221.237.3.67,
         221.237.5.169, 222.183.118.77 for IP address <my home IP address>

That doesn't seem right. What's happening?

  • Anton, this looks very odd. I glanced over the code, but it was not obvious why your installation would behave like this. The "Your current score" line should just show something like "12 for IP address 12.34.56.78", as you can verify in TWiki.org's BlackListPlugin. The BANLIST should not repeat the BLACKLIST. Check your Plugin's settings: is BANLISTCONFIG a real bullet, and is the line immediately below it a real bullet? Otherwise it is time for some TWikiDebugging -- PeterThoeny - 06 Apr 2005

-- AntonAylward - 03 Apr 2005

Here's a patch that:

  • allows adding/removing multiple IPs at once (e.g. copy-paste the banlist from BlackListPlugin into the add form)
  • fixes being black-sheep-ed directly if the blacklist is empty

Remaining issue: the plugin reads and writes the banlist without acquiring a lock with flock().
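A sketch of lock-protected banlist updates, using Python's fcntl.flock as the counterpart of Perl's flock() (Unix only). The function names are hypothetical; the one-address-per-line format is an assumption about _ban_list.txt. The key point is that the read and the write happen while the same exclusive lock is held, so two CGI processes cannot interleave.

```python
import fcntl
import os


def add_to_banlist(path, ip):
    """Add ip to a one-address-per-line ban list; return False if already present."""
    with open(path, "a+") as fh:        # creates the file if missing, never truncates
        fcntl.flock(fh, fcntl.LOCK_EX)  # held until the file is closed
        fh.seek(0)
        current = {line.strip() for line in fh.read().splitlines() if line.strip()}
        if ip in current:
            return False
        fh.write(ip + "\n")             # 'a' mode always appends at end of file
        return True                     # close flushes, then releases the lock


def remove_from_banlist(path, ip):
    """Rewrite the ban list without ip, holding the lock for the whole rewrite."""
    with open(path, "r+") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        kept = [line.strip() for line in fh.read().splitlines()
                if line.strip() and line.strip() != ip]
        fh.seek(0)
        fh.truncate()
        fh.writelines(entry + "\n" for entry in kept)
```

Relying on close() to release the lock guarantees the buffered data is flushed before another process can acquire it.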

-- MichaelDaum - 05 Apr 2005

Thanks Michael, I will take that into the next release.

MichaelDaum also suggested by e-mail to protect the pub dir from web access. Create a pub/TWiki/BlackListPlugin/.htaccess file with this content:

<Files "*">
      deny from all
</Files>

This will also go into the next release.

-- PeterThoeny - 06 Apr 2005

I've done a nicer user interface to the BlackListPlugin: see here.

-- MichaelDaum - 01 Jul 2005

Neat anti-spam trick (Perl code)

This code fragment, placed somewhere early in TWiki.pm, makes use of the RBL database.

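my $spammerIp = $ENV{'REMOTE_ADDR'};
# an RBL is queried with the octets reversed: d.c.b.a.list.dsbl.org
my ( $o1, $o2, $o3, $o4 ) = split( /\./, $spammerIp );
my $rblName = "$o4.$o3.$o2.$o1.list.dsbl.org";
# a listed address resolves; an unlisted one returns undef
if( gethostbyname( $rblName ) ) {
    print "Status: 302 Found\nLocation: http://dsbl.org/listing?$spammerIp\n\n";
    exit 0;
}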

  • Upside: it uses an existing RBL database, so you don't have to maintain one.
  • Downside: it's someone else's database.
  • Downside: it's for mail spam.

but hey ...

-- AntonAylward - 14 Jul 2005

Checked the .zip into CVS.

-- WillNorris - 19 Jul 2005

Sometimes an authorized editor becomes blocked when s/he edits a lot of topics. It would be nice if the plugin always allowed edits from a group of authorized users (e.g. the TWikiAdminGroup).

-- AndreaSterbini - 24 Aug 2005

The idea is to use the WHITELIST for that, but I see your point, it would be more convenient to specify a TWiki group.

-- PeterThoeny - 24 Aug 2005

A look behind the scenes. For performance I am using two cache files (located in pub/TWiki/BlackListPlugin) to merge the external and internal lists:

  • _spam_list.txt is the internal spam regex list (manually maintained by add/remove spam regex pattern form in Plugin topic)
  • _spam_merge.txt cache: Cached text of external spam regex list, pulled from external web site, refreshed once every 60 min (by default) on topic save activity
  • _spam_regex.txt cache: Internal and external list merged into one, formatted as a ready to use regex string, refreshed once every 10 min (by default) on topic save activity.

I noticed that the external spam regex source http://arch.thinkmo.de/cgi-bin/spam-merge sometimes returns an empty list. If this happens, the Plugin disregards it, uses the old cache, and retries after the refresh period. In that case _spam_merge_err.txt contains the failed content.
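A sketch of the refresh-with-fallback behaviour described above. The function names are hypothetical, `fetch` is a caller-supplied stand-in for the HTTP request, and treating "no non-blank, non-comment lines" as an empty list is an assumption. Re-stamping the old cache's modification time is one way to get "retries after the refresh period".

```python
import os
import time

MERGE_REFRESH = 60 * 60  # external list refreshed once every 60 min (default per the text)


def _has_entries(text):
    # assumption: a list with no non-blank, non-comment lines counts as "empty"
    return any(l.strip() and not l.lstrip().startswith("#") for l in text.splitlines())


def _read(path):
    with open(path) as fh:
        return fh.read()


def get_merge_text(cache_path, err_path, fetch, now=None, refresh=MERGE_REFRESH):
    """Return the external spam list text, using the cache file while it is fresh."""
    now = time.time() if now is None else now
    have_cache = os.path.exists(cache_path)
    if have_cache and now - os.path.getmtime(cache_path) < refresh:
        return _read(cache_path)              # cache still fresh
    text = fetch()                            # hypothetical HTTP fetch
    if _has_entries(text):
        with open(cache_path, "w") as fh:
            fh.write(text)
        os.utime(cache_path, (now, now))      # stamp cache with this refresh time
        return text
    with open(err_path, "w") as fh:           # keep failed content for inspection
        fh.write(text)
    if not have_cache:
        return ""                             # nothing to fall back on yet
    os.utime(cache_path, (now, now))          # keep old cache; retry after `refresh`
    return _read(cache_path)
```

The _spam_regex.txt cache (10 min) could follow the same pattern with its own refresh period.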

Possible enhancements:

  • Automatically remove items from the internal list if found in the external list
  • Automatic (semi-automatic) way of feeding local regex list back to external list (what protocol?)

Spammers work on many wiki sites, one after the other. Ideally a two-way sync should be done at 15-minute intervals to catch active spammers in their tracks.

-- PeterThoeny - 29 Oct 2005

Peter, I'm seeing the same bug I reported back in Cairo on 03 April, and not only in the 'score' but also in the local spam pattern list: once again the ban list IP addresses are appearing.

My thoughts are that this is not a rendering problem with Dakar, since I saw it in Cairo as well, but some sensitivity in the code that makes it stick with one file in /pub.

I also get this in my error log:


[Sat Oct 29 21:09:23 2005] [error] [client 192.168.254.15] [Sat Oct 29 21:09:22 2005] view: Useless use of a constant in void context at /var/www/html/dakar/lib/TWiki/Plugins/BlackListPlugin.pm line 237.
[Sat Oct 29 21:09:23 2005] [error] [client 192.168.254.15] [Sat Oct 29 21:09:22 2005] view: "my" variable $refresh masks earlier declaration in same scope at /var/www/html/dakar/lib/TWiki/Plugins/BlackListPlugin.pm line 264.
[Sat Oct 29 21:09:23 2005] [error] [client 192.168.254.15] [Sat Oct 29 21:09:22 2005] view: Useless use of a constant in void context at /var/www/html/dakar/lib/TWiki/Plugins/BlackListPlugin.pm line 265.

-- AntonAylward - 30 Oct 2005

The three errors can be fixed easily, thanks for the heads up.

Regarding the render issue, follow up in Bugs:Item797. It was possibly also in a post-Cairo Beta release that had the preferences module refactored.

-- PeterThoeny - 30 Oct 2005

I'm getting the following message whenever I save any page: Wiki-spam detected: "" is a banned word and cannot be saved... This is occurring even though my IP is in the WHITELIST.

We are running version 04 Sep 2004 $Rev: 1742 $, Plugin API version 1.025, with today's BlackListPlugin.

Any idea where to start debugging this? (Turning DEBUG on didn't help much.) Comparing the pages with http://arch.thinkmo.de/cgi-bin/spam-merge is an onerous task, but I didn't see any matches.

-- BruceDawson - 07 Nov 2005

It looks like the regex matches an empty string. I checked the code; all newlines and carriage returns are filtered correctly, so this should not happen. What do you have in the local SPAMLIST? Cache files _spam_merge.txt and _spam_list.txt in the plugin attachment directory should contain lines of regexes; _spam_regex.txt should start with http://[\w\.\-:\@/]*?, followed by all regexes separated by pipe symbols.
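A sketch of how such a combined regex can be assembled, and why an empty pattern list is dangerous. Wrapping the alternation in a capturing group is inferred from the "http://[\w\.\-:\@/]*?()" content reported in this thread; build_spam_regex is a hypothetical name. With an empty group, every http:// link matches and the captured "banned word" is "", which is exactly the reported symptom.

```python
import re

URL_PREFIX = r"http://[\w\.\-:\@/]*?"  # prefix quoted from the text above


def build_spam_regex(patterns):
    """Join spam patterns into one 'prefix(p1|p2|...)' regex.

    Returns None when no patterns remain: an empty group '()' would match
    every http:// link and report "" as the banned word.
    """
    cleaned = [p.strip() for p in patterns if p.strip()]
    if not cleaned:
        return None
    return re.compile(URL_PREFIX + "(" + "|".join(cleaned) + ")")
```

A caller would skip the spam check (and perhaps log a warning) when this returns None, instead of rejecting every save.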

-- PeterThoeny - 07 Nov 2005

I saw the same symptom BruceDawson reports when the pub/TWiki/BlackListPlugin/ directory was not writable by CGI scripts.

-- RonRisley - 07 Nov 2005

# ll -d pub/TWiki/BlackListPlugin/
drwxrwsr-x    2 apache   gnhlug.o     4096 Nov  6 21:56 pub/TWiki/BlackListPlugin/

SPAMLIST is at http://wiki.gnhlug.org/twiki2/bin/view/TWiki/BlackListPlugin

_spam_regex.txt has "http://[\w\.\-:\@/]*?()", but there is no _spam_list.txt. There are _spam_merge.txt (empty), _spam_merge_err.txt (2 comment lines), _ban_list.txt (about 12 IP addresses), _magic.txt (2 numbers; the rightmost looks like a date), and _event_log.txt (3 lines with date, IP address, and URIs to bin/search, a null string, and bin/view).

-- BruceDawson - 07 Nov 2005

Immediate workaround to get you going with topic save: Add one entry in the local SPAMLIST, e.g.: johan-gauss\.org

An empty _spam_merge.txt is an indication that the external spam list could not be read. However, a _spam_merge_err.txt with two comment lines indicates that reading the external page succeeds. Try to add some debug statements in _handleSpamList to see what is going on.

-- PeterThoeny - 07 Nov 2005

I noticed the following behaviors inconsistent with the documentation:

  • Every page view added 1 point (using the default configuration) to the ban list score, while the setting seems to indicate that it should add 20 (setting is 20, 5, 1, 20, 120, 300). The documentation seems to say that the fourth item is the "Points for other actions like view".
  • Under the bullet BANLIST there is a form to add IP addresses manually. However, entering a partial address, as explained in the comment under BLACKLIST just above, does not work (maybe BANLIST does not take partial IP addresses?).

-- ThomasWeigert - 08 Nov 2005

  • BANLISTCONFIG setting: Documentation error, view and view raw need to be reversed.
  • BANLIST: The BANLIST supports only full IP addresses; only WHITELIST and BLACKLIST support partial IP addresses (as documented)
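A sketch of reading the six-value BANLISTCONFIG with view and view raw in the corrected order (registration, save/upload, view, view raw, threshold, period). That field order is inferred from this exchange: it makes "20, 5, 1, 20, 120, 300" give 1 point per view, as observed. The class and function names are hypothetical.

```python
from typing import NamedTuple


class BanListConfig(NamedTuple):
    register: int   # points for registration
    save: int       # points for each save and upload
    view: int       # points for a plain view (and other actions)
    view_raw: int   # points for a view with the raw parameter
    threshold: int  # score at which the IP address goes on the BANLIST
    period: int     # measurement window in seconds


def parse_banlist_config(value):
    fields = [int(x) for x in value.split(",")]  # int() ignores surrounding spaces
    if len(fields) != len(BanListConfig._fields):
        raise ValueError(f"expected 6 values, got {len(fields)}")
    return BanListConfig(*fields)


def points_for(cfg, action, raw=False):
    if action == "register":
        return cfg.register
    if action in ("save", "upload"):
        return cfg.save
    if action == "view" and raw:
        return cfg.view_raw
    return cfg.view  # plain view and any other action
```

Naming each field makes a swapped pair like view/view raw much harder to get wrong than positional indexing.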

-- PeterThoeny - 08 Nov 2005

Adding an entry to SPAMLIST appeared to fix the problem. I didn't bother to add debug statements in _handleSpamList. Thanks!

-- BruceDawson - 08 Nov 2005

It fixes the symptom, not the problem. Your site is not protected against the known list of spam sites until _spam_merge.txt contains a few thousand lines of regular expressions.

-- PeterThoeny - 08 Nov 2005

Small problem in actual use.

First I want to say that this plugin is working great. It has really helped a lot over the past days.

I just got caught in my own BlackListPlugin for the 2nd time. There is a small bug. Even when you are on the whitelist you get added to the banned list. But you are not excluded. You cannot save a topic with banned links so that is good.

It turned out that my Sandbox WebHome had been spammed and I had not noticed one of those spams that are invisible between <div> tags. So when you install BlackListPlugin, you risk catching an innocent user who edits a topic that was previously spammed.

The spam may have been added by Mr. Bad Guy the day before the new regex was added to the SPAMLIST. Then Mr. Innocent comes along to edit the topic, does not notice the illegal link, and gets banned right away. That is a sad side effect, and I wonder how it could be avoided. You could also catch someone saving something innocent that happens to match a regex.

I do not have a clear solution to this. We need to fight spam. And we need to treat the users of our TWikis nicely. A dilemma.

-- KennethLavrsen - 10 Nov 2005

This is an issue that can't be solved easily. It can be minimized if the site admin proactively removes spam. That is actually the best defense against spam, since spammers google the Internet for existing spam, knowing that those wiki sites are easy targets.

-- PeterThoeny - 12 Nov 2005

Topic save failed on TWiki.org due to a regex pattern error in the external merge spam list. Faulty entry:

texas-hold-em-(4u|555|winner).(com # 2005-11-24:LOC

Save failed with an "Unmatched ( in regex" error. As a workaround I manually removed the line from the cache file and set the cache refresh to one week.

-- PeterThoeny - 24 Nov 2005

The external spam list is now fixed. We have a dependency now on external content: TWiki sites using the BlackListPlugin can fail if the external spam list contains a regex error. Not sure how to guard against this.
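One possible guard, sketched in Python: compile each entry on its own and drop the ones that fail, so a single bad line costs one pattern instead of every save. compile_entries is a hypothetical name. Python's re syntax is not identical to Perl's, so inside the Plugin the same check would use Perl itself (e.g. wrapping qr/$pattern/ in eval { }).

```python
import re


def compile_entries(lines):
    """Split spam-list lines into (valid, rejected) patterns.

    Trailing '# ...' comments are stripped first; each remaining pattern is
    test-compiled on its own so one broken entry cannot break the merged regex.
    """
    valid, rejected = [], []
    for line in lines:
        pattern = re.sub(r"\s*#.*", "", line).strip()  # strip trailing comment
        if not pattern:
            continue
        try:
            re.compile(pattern)
        except re.error as exc:
            rejected.append((pattern, str(exc)))
        else:
            valid.append(pattern)
    return valid, rejected
```

The rejected list could be written to _spam_merge_err.txt, so a faulty upstream entry such as the texas-hold-em line is visible without taking down topic saves.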

-- PeterThoeny - 24 Nov 2005

I am having the following problem with BlackListPlugin: It appears that every user coming in as TWikiGuest receives the ban message. However, the BANLIST only shows three IP addresses which from the log have questionable behavior, and none of these matches any of the rejected users (and neither does the BLACKLIST). The log shows many entries where the event is "blacklist" for TWikiGuest. However, the banlogs etc. in pub for BlackListPlugin show no activity.

What could be going wrong here?

-- ThomasWeigert - 27 Nov 2005

Not sure what is going on. Try to enable DEBUG and watch debug.txt. The processing happens in initPlugin; check what happens with $remoteAddr.

-- PeterThoeny - 27 Nov 2005

BlackListPlugin catches innocent users because </pre> is in the spam list.

It seems the BlackListPlugin is now catching innocent people who just open an existing topic and save it.

I have tracked the issue to people saving a topic in which </pre> is present.

When I go to http://arch.thinkmo.de/cgi-bin/spam-merge, I see that some idiot has added </pre> to the list.

Two questions:

  • How can we make a quick fix to avoid this?
  • How do I contact the owner of this list? There is no information at all about whom to write to.

-- KennethLavrsen - 25 Dec 2005

The latest Plugin should already filter out HTML tags. The _getSpamListRegex function does the following because I noticed HTML in the merge list:

    # merge public and local spam list
    my $text = _getSpamMergeText() . "\n" . _handleSpamList( "read", "" );
    $text =~ s/<[^>]*//go;      # strip <tags>
    $text =~ s/ *\#.*//go;      # strip comments

It looks like your users face a different issue. Possibly saving a topic that already had bad text before upgrading the Plugin?

Not sure who the owner of the merge list is. Try MoinMoin:AntiSpamGlobalSolution, possibly maintained by MoinMoin:ThomasWaldmann.

-- PeterThoeny - 25 Dec 2005

Since my last posting another 3 people have been blacklisted without reason.

I use the latest version of the plugin (November 8th, on Cairo).

People get blacklisted because they save a topic with something as simple as

http://myname.dhsfs.com   >

I have put 3 spaces between the com and the > to avoid getting blacklisted from twiki.org.

I tried deleting the data files in pub/TWiki/BlackListPlugin and letting the plugin create new ones.

The _spam_merge.txt contains the line

</pre> # 2005-11-12:WM

and this becomes |>| in _spam_regex.txt

The minute I delete the two files, the plugin stops blacklisting these innocent entries. But soon the files are overwritten and all hell breaks loose again. I believe I now know why.

Looking at your regex

$text =~ s/<[^>]*//go;      # strip <tags>

Doesn't this exactly leave behind the >?

Shouldn't it be

$text =~ s/<[^>]*>//go;      # strip <tags>
or something similar.

I implemented this fix and now I no longer get the > in _spam_regex.txt

-- KennethLavrsen - 03 Jan 2006

Thanks Kenneth, indeed there is a bug in the regex. I examined the list and discovered also entries with spaces such as CAPAZ MESMO, ISTO E....

New Plugin version posted in BlackListPlugin topic:

  • Filter lines with space from spam list
  • Fixed bug that improperly filtered HTML from spam list
  • Added Crawford's fix for end/postRenderingHandler spec change issue of Dakar Release

Change:

     # merge public and local spam list
     my $text = _getSpamMergeText() . "\n" . _handleSpamList( "read", "" );
-    $text =~ s/<[^>]*//go;      # strip <tags>
     $text =~ s/ *\#.*//go;      # strip comments
+    $text =~ s/.*?[ <>].*//go;  # remove all lines that have spaces or HTML <tags>
     $text =~ s/^[\n\r]+//os;
     $text =~ s/[\n\r]+$//os;
     $text =~ s/[\n\r]+/\|/gos;  # build regex
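For checking what ends up in _spam_regex.txt, here is a line-for-line Python rendering of the patched merge step above. merge_spam_lists is a made-up name; its two inputs stand in for _getSpamMergeText() and the local SPAMLIST. As in the Perl, `.` does not cross newlines, so each substitution works per line, and comments are stripped before lines containing spaces or < > are dropped.

```python
import re


def merge_spam_lists(public_text, local_text):
    """Mirror of the patched Perl: clean both lists and join them with '|'."""
    text = public_text + "\n" + local_text
    text = re.sub(r" *#.*", "", text)       # strip comments
    text = re.sub(r".*?[ <>].*", "", text)  # remove lines with spaces or HTML <tags>
    text = re.sub(r"^[\n\r]+", "", text)    # trim leading newlines
    text = re.sub(r"[\n\r]+$", "", text)    # trim trailing newlines
    return re.sub(r"[\n\r]+", "|", text)    # build regex alternation
```

With the old `s/<[^>]*//go` only "</pre" was removed and a bare ">" survived as an alternative; here the whole "</pre>" line disappears.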

-- PeterThoeny - 03 Jan 2006

Topic revision: r2 - 2006-08-05 - PeterThoeny
This site is powered by the TWiki collaboration platform Powered by Perl Hosted by OICcam.com Ideas, requests, problems regarding TWiki? Send feedback. Ask community in the support forum.
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.