There are ways to keep sites or pages from being crawled by search engine indexing robots.




The last time I checked which WikiLearn pages Google had indexed, I found many duplicates, because Google was indexing WebChanges and other pages that run dynamic searches.

Because a robots.txt file must be placed at the top level of a site (like http://twiki.org/robots.txt), and the WikiLearn site is currently hosted at twiki.org along with several other webs, I can't use the robots.txt approach.
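For comparison, here is a minimal sketch of what site-level robots.txt rules would do if they were available, using Python's standard urllib.robotparser. The rules, the URL paths, and the "ExampleBot" user agent are assumptions made up for illustration; they are not twiki.org's actual robots.txt or URL layout.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: these paths are assumptions for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/view/Wikilearn/WebChanges
Disallow: /cgi-bin/search/
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical topic URLs; "ExampleBot" is a made-up crawler name.
changes_url = "http://twiki.org/cgi-bin/view/Wikilearn/WebChanges"
topic_url = "http://twiki.org/cgi-bin/view/Wikilearn/RobotsTest"

print(parser.can_fetch("ExampleBot", changes_url))
print(parser.can_fetch("ExampleBot", topic_url))
```

A crawler that honors these rules would skip the WebChanges URL and still fetch the ordinary topic. The rules only take effect when the file is served from the site root, which is the limitation described above.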

The alternative, which not all robots honor, is the robots meta tag:

From the HTML Author's Guide to the Robots META tag:

<meta name="robots" content="noindex,nofollow">
<meta name="description" content="This page ....">

I may try this at the top of the content of some WikiLearn topics to see if it works even though it will not be at the required/recommended location (in the head).
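To check where the tag actually ends up in a rendered topic, something like the following sketch could be used. It relies only on Python's standard html.parser; the class name, the helper function, and the sample HTML are hypothetical, and it says nothing about how any particular robot treats a meta tag found outside the head.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect robots meta tags and note whether each one sits inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = []  # list of (directives, inside_head) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "robots":
                content = attr.get("content") or ""
                directives = [d.strip().lower() for d in content.split(",") if d.strip()]
                self.found.append((directives, self.in_head))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_robots_meta(html):
    """Return every robots meta tag found in the given HTML text."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.found


# Hypothetical rendered topic with the tag placed in the body, as in the test described above.
SAMPLE = """<html><head><title>RobotsTest</title></head>
<body><meta name="robots" content="noindex,nofollow">
<p>Topic text</p></body></html>"""

for directives, inside_head in find_robots_meta(SAMPLE):
    print(directives, "in head" if inside_head else "outside head")
```

Running it against the saved HTML of a test topic shows whether the tag was emitted at all and whether it landed in the head or in the body.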

See the last link under resources (Hack #100) for "noarchive" and other possibilities.

Pages being tested:

  • RobotsTest
  • WebChanges (added 7 May 2003)


See Resource Recommendations. Feel free to add additional resources to these lists, but please follow the guidelines on ResourceRecommendations including Guidelines_for_Rating_Resources.



  • () RandyKramer - 15 Apr 2003
  • If you edit this page: add your name here; move this to the next line; and include your comment marker (initials), if you have created one, in parenthesis before your WikiName.

Topic revision: r2 - 2003-05-07 - RandyKramer
Copyright 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.