Documentation Proposal: Dealing With Robots

TWikiAdminCookBook has a section on how to deal with robots

I don't know if this is the kind of documentation you guys can use or not, but if it is, please feel free to take it and do whatever you want. Or email me and let me know: pauljohn AT ku DOT edu.

-- PaulJohnson - 11 Nov 2004

What about creating a robots.txt based on the topic names in the wiki (to list which pages are allowed)?
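
A minimal sketch of that idea in Python, not an existing TWiki feature. The function name `build_robots_txt`, the `/twiki/bin/view` prefix and the (web, topic) input format are all assumptions made for illustration. It also relies on the `Allow` directive and the `$` end-of-path anchor, which major crawlers honor but older robots may ignore.

```python
def build_robots_txt(topics, view_prefix="/twiki/bin/view"):
    """Return robots.txt text that allows only the listed topics.

    topics      -- iterable of (web, topic) pairs, e.g. ("Main", "WebHome")
    view_prefix -- assumed URL path of the view script (site-specific)
    """
    lines = ["User-agent: *"]
    # De-duplicate and sort so the generated file is stable between runs.
    for web, topic in sorted(set(topics)):
        # The trailing "$" keeps URLs with a query string (old revisions,
        # other skins) out of the allowed set.
        lines.append(f"Allow: {view_prefix}/{web}/{topic}$")
    # Everything not explicitly allowed above is off limits.
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt([("Main", "WebHome"), ("TWiki", "WebHome")]), end="")
```

The file would have to be regenerated whenever topics are created, renamed or deleted, so in practice this would probably run from a scheduled job.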

-- WillNorris - 11 Nov 2004

Isn't it also possible to identify robots from the way they identify themselves to the webserver and present different skins to them? I am not sure if TWiki has the capability, but it should be possible to use a different skin for Internet Explorer, Mozilla, Opera, and one for bots. The bots could be presented with only the plain topic text and no sidebars.
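
As a rough illustration of that approach, the sketch below picks a skin from the User-Agent header. It is not TWiki code: the function name `choose_skin`, the substring list and the skin names `pattern` and `plain` are assumptions. User-Agent strings can be spoofed or left empty, so this is only a heuristic.

```python
import re

# Assumed substrings that commonly appear in crawler User-Agent strings.
# This list is illustrative, not exhaustive.
BOT_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)


def choose_skin(user_agent, default_skin="pattern", bot_skin="plain"):
    """Return the skin name to render for the given User-Agent header."""
    if user_agent and BOT_PATTERN.search(user_agent):
        return bot_skin
    # Missing or unrecognized agents get the normal skin.
    return default_skin
```

For example, `choose_skin("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")` selects the bot skin, while a typical browser string falls through to the default.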

-- ChristopherOezbek - 27 Jan 2005

PreventGoogleToIndexRevisions has an example robots.txt for blocking access to actions such as edit/attach/diff etc.
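
The robots.txt in that topic is not reproduced here. As a hedged illustration of the kind of rules it describes, the Python sketch below embeds an assumed rule set and checks it with the standard library's `urllib.robotparser`. The `/twiki/bin` prefix and the choice of scripts (`edit`, `attach`, `rdiff`) are assumptions to adapt to the local installation.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules: block the action scripts, leave the view script crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /twiki/bin/edit
Disallow: /twiki/bin/attach
Disallow: /twiki/bin/rdiff
"""


def is_allowed(url_path, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if the given agent may fetch url_path under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url_path)
```

With these rules, `is_allowed("/twiki/bin/view/Main/WebHome")` is true and `is_allowed("/twiki/bin/edit/Main/WebHome")` is false. Checking a rule set this way before deploying it helps catch prefix mistakes.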

SearchEngineIndexOnlyPlainView should solve most of the other problems mentioned above when it is implemented.

-- SamHasler - 01 Feb 2005

Even though the skins may have a <meta name="robots" ... statement in them, it is edited out in View.pm for all except older revisions.

This makes no sense to me. What is the point of making the site unconditionally indexable?

-- AntonAylward - 17 Jul 2005

Not so. In CairoRelease it is never edited out. In DevelopBranch it is edited out only if you have enabled {AntiSpam}{RobotsAreWelcome} in configure.

-- CrawfordCurrie - 18 Jul 2005

My users asked me to explicitly allow our corporate intranet search engine to index my TWiki. In addition to a robots.txt file, I seem to have found an efficient approximation. I have replaced the meta element that usually excludes robots with the following conditional:

%IF{ "$ QUERYSTRING" then="<meta name='robots' content='noindex, nofollow' />"}%

Unlike robots.txt, this does not prevent the spider from visiting the pages. However, it works fine against indexing any of:

  1. old revisions
  2. non-default skin or cover (like "printable")
  3. the many sortcol variants of pages containing sortable/editable tables
  4. ...or any combination of the above.
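
The effect of the conditional can be sketched in Python as follows. This only models the decision and is not how TWiki evaluates %IF; the function name `robots_meta` is an assumption.

```python
def robots_meta(query_string):
    """Return the meta element to emit for a request, or "" for none.

    Any non-empty query string (rev=, skin=, cover=, sortcol=, ...) marks
    the page as a variant that should stay out of the index.
    """
    if query_string:
        return "<meta name='robots' content='noindex, nofollow' />"
    # A plain view of the current revision carries no exclusion tag.
    return ""
```

All four cases in the list share one property, a non-empty query string, which is why a single test covers them.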

-- HaraldJoerg - 20 Mar 2006

This should not be necessary in TWiki 4.0, since the robots noindex meta tag is already present if there is a query string. Technically, the skin's view templates have a robots noindex meta tag that twiki/lib/UI/View.pm removes only when there is no query string, the view type is indexable, and robots are welcome:

    if( $indexableView &&
          $TWiki::cfg{AntiSpam}{RobotsAreWelcome} &&
            !$query->param() ) {
        # it's an indexable view type, there are no parameters
        # on the url, and robots are welcome. Remove the NOINDEX meta tag
        $tmpl =~ s/<meta name="robots"[^>]*>//goi;
    }

-- PeterThoeny - 22 Mar 2006

My fault. I had "slightly" changed the meta element in my custom skin, so that the regex never removed the element, even with RobotsAreWelcome set to $TRUE. Eventually I removed the meta element and, as a consequence, had to compensate with what I quoted. Now, with the correct meta element in place, everything works fine.
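
To show why a modified element slips through, here is a Python rendering of the View.pm condition and pattern quoted above. It is a sketch, not the TWiki code itself: the function name `strip_noindex` and the two sample templates are assumptions, and the `custom` string is just one hypothetical way the element might have been altered.

```python
import re

# Same pattern as in the Perl excerpt, matched case-insensitively.
ROBOTS_META = re.compile(r'<meta name="robots"[^>]*>', re.IGNORECASE)


def strip_noindex(tmpl, indexable_view, robots_welcome, params):
    """Remove the robots meta tag only for plain, indexable views."""
    if indexable_view and robots_welcome and not params:
        return ROBOTS_META.sub("", tmpl)
    return tmpl


# Element written the way the pattern expects.
stock = '<head><meta name="robots" content="noindex" /></head>'
# Hypothetical altered element: different quoting and attribute order.
custom = "<head><meta content='noindex' name='robots' /></head>"
```

Passing `stock` with no URL parameters yields a template without the tag, while `custom` comes back unchanged, so every page keeps its noindex tag even when robots are welcome.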

All that cool stuff about how to control robots (with /bin/configure, robots.txt, httpd.conf) would make a nice HowTo recipe. Let's see when (or whether) I get round to it...

-- HaraldJoerg - 22 Mar 2006

Topic revision: r8 - 2006-03-22 - HaraldJoerg