
Bug: Email addresses can be harvested by spammers

When viewing topics with the 'raw' modifier, email addresses are returned unobscured.

Test case

me@examplePLEASENOSPAM.com

The above email address is obscured on the original page at

http://twiki.org/cgi-bin/view/Codev/RawParamLeaksEmailAddresses

but available in clear text at

http://twiki.org/cgi-bin/view/Codev/RawParamLeaksEmailAddresses?raw=on

This allows email harvesting of otherwise obscured email addresses.

Possible countermeasures:

  • robots.txt (possible?) - usually not honored by spammers
  • restrict read access to URLs using the "raw=on" URL parameter at the TWiki level for unauthorized users (possible?)
  • restrict read/any access to URLs using the "raw=on" URL parameter at the TWiki level for everybody (possible?)
  • restrict any access to URLs containing "raw=on" on .htaccess layer (should work by means of rewrite rules)
  • restrict any access to URLs containing "raw=on" on httpd.conf layer (should work by means of rewrite rules)
  • add a captcha type thing for raw=on urls
  • any other ideas?

The goal should be to be as unrestrictive as possible and to keep the functionality of the "raw" option available for as many users as possible while still making original email addresses unavailable to external and internal indexing mechanisms.

This should probably be considered a problem with the default TWiki configuration and the installation instructions, not a bug in the code itself.

Environment

TWiki version: TWikiBetaRelease2004x10x30
TWiki plugins: DefaultPlugin, EmptyPlugin, InterwikiPlugin
Server OS: Debian GNU/Linux
Web server: Apache 2.0.52
Perl version: 5.8.4
Client OS: Debian GNU/Linux
Web Browser: Firefox 1.0

-- AlsterWassermann - 13 Jan 2005

Follow up

Agreed, it should be thought of as a config/documentation problem. I would argue strongly against munging the text produced by "raw", as it is critical to TWikiApplications that rip that text for external processing.

-- CrawfordCurrie - 13 Jan 2005

I'd favour an ALLOWTOPICREADRAW (or just ALLOWTOPICRAW), not only for email leakage protection but also because some sites consider their TWikiApplications to be intellectual capital: protecting SEARCH/FORMs etc. from being ripped off by anyone who looks at the source could be key to their wider adoption of TWiki as a solution.

-- MartinCleaver - 13 Jan 2005

How about just making the raw option require an authenticated user (similar to Martin's option, but his idea is more controllable - would require authenticated view script). Or perhaps a separate viewraw script for ease of .htaccess setup, so only viewraw needs authentication.

-- RichardDonkin - 13 Jan 2005

I would be irritated by someone understanding TWikiApplications as intellectual property unless public property is meant. Check LicensingAndCopyrightFAQ

-- AlsterWassermann - 18 Jan 2005

I don't understand why this would not be classified as a bug and just a configuration issue. How can I right now and without mod_rewrite restrict access to anonymous raw views?

-- MattWilkie - 19 Jan 2005

An alternative is to enhance the BlackListPlugin to watch out for raw parameters and bump up the score by a bigger value. That will catch a harvester quickly.

-- PeterThoeny - 19 Jan 2005

This is now implemented on TWiki.org via BlackListPlugin. Each regular view increases the score by one point, each "raw" view by 20 points.

There is a potential "gotcha": people could get on the blacklist by looking at several pages in raw mode in quick succession. There is a whitelist for the contributors. As before, please contact me or one of the CoreTeam members with your IP address and we can put you on the whitelist.
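The scoring described above can be sketched roughly as follows. This is only an illustration, not the BlackListPlugin's actual code: the point values (1 and 20) come from this thread, while the `ScoreBoard` class, the threshold of 100 and the example IP addresses are assumptions made for the example. The sketch also ignores any time window over which scores might be counted.

```python
REGULAR_VIEW_POINTS = 1     # from the thread: each regular view adds one point
RAW_VIEW_POINTS = 20        # from the thread: each "raw" view adds 20 points
BLACKLIST_THRESHOLD = 100   # hypothetical value, not taken from the plugin


class ScoreBoard:
    """Hypothetical per-IP score keeper illustrating the weighting idea."""

    def __init__(self, whitelist=()):
        self.whitelist = set(whitelist)
        self.scores = {}

    def record_view(self, ip, raw=False):
        """Add points for one view; return True if ip has reached the threshold."""
        if ip in self.whitelist:
            # Whitelisted contributors are never scored.
            return False
        points = RAW_VIEW_POINTS if raw else REGULAR_VIEW_POINTS
        self.scores[ip] = self.scores.get(ip, 0) + points
        return self.scores[ip] >= BLACKLIST_THRESHOLD
```

With these assumed numbers, five raw views in a row are enough to trip the threshold, which is the "gotcha" for people who browse raw pages quickly.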

Let us know if there are any issues.

-- PeterThoeny - 19 Jan 2005

As AntonAylward noted earlier, this cries out for the recognition that TWikiGuest currently conflates two different classes of users: anonymous but logged-in users, and users who are not logged in at all. If we create a new NotLoggedIn user then we can do things like DENYCOMMENT = NotLoggedIn, DENYWEBVIEW = NotLoggedIn, and the like. This would also provide an easy way to prevent access to raw page views, spidering of old revision pages, etc.

-- MattWilkie - 19 Jan 2005

Although the BlackListPlugin may stop harvesting from the site directly, there is still the Google cache. It is possible to search Google for raw user topics and view them in Google's cache. Therefore we also need SearchEngineIndexOnlyPlainView to stop Google and other search engines from indexing and caching raw pages.
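One way to picture the idea is a view layer that emits a different robots meta tag for raw views. The sketch below is not the SearchEngineIndexOnlyPlainView proposal itself; the function name `robots_meta` is hypothetical, and it assumes the page template can vary its meta tag by query string. The `noindex` and `noarchive` values ask well-behaved search engines not to index the page and not to keep a cached copy. Harvesters that ignore such hints are not affected.

```python
from urllib.parse import parse_qs


def robots_meta(query_string):
    """Return a robots meta tag for a view request (hypothetical helper)."""
    params = parse_qs(query_string, keep_blank_values=True)
    if "raw" in params:
        # Raw views: ask search engines neither to index nor to cache.
        return '<meta name="robots" content="noindex, noarchive" />'
    # Plain rendered views stay indexable.
    return '<meta name="robots" content="index, follow" />'
```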

-- SamHasler - 25 Jan 2005

Fixed on my install by restricting these URLs to authenticated users with this RewriteRule:

RewriteCond %{QUERY_STRING} raw=
RewriteRule ^/view/(.*) /viewauth/$1 [R]
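To make the effect of these two lines easier to follow, here is a small Python model of what they do. It is a sketch of the matching logic only, not of mod_rewrite itself: the function name `redirect_target` is hypothetical, and it returns just the path and query string, whereas a real `[R]` redirect sends a full URL. It relies on the fact that mod_rewrite keeps the original query string when the substitution has none of its own.

```python
import re


def redirect_target(path, query_string):
    """Return the path the rule would redirect to, or None if it does not apply."""
    # RewriteCond %{QUERY_STRING} raw=   (an unanchored regex search)
    if not re.search(r"raw=", query_string):
        return None
    # RewriteRule ^/view/(.*) /viewauth/$1 [R]
    match = re.match(r"^/view/(.*)", path)
    if match is None:
        return None
    # The original query string is carried over to the redirect target.
    return "/viewauth/" + match.group(1) + "?" + query_string
```

Because the condition is unanchored, a parameter such as `draw=1` would also trigger the redirect in this model. That is harmless here (it only forces a login), but a stricter pattern may be preferable. Note also that the leading slash in `^/view/` matches in server or virtual-host context; in a .htaccess file the leading slash is stripped before matching, so the pattern would need adjusting there.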

-- BenVoui - 23 Feb 2005

Fix record

Topic revision: r14 - 2005-02-23 - BenVoui
 