This ongoing testing is an attempt to learn how often Google indexes TWiki. I also did some one-time testing to determine how useful the results of a search are. The overall objective is to decide how useful Google is for searching TWiki (Wikilearn).

Update/ToDo: After the next updates of either index (try mid-May, 2003), I want to see if the RobotsTest page has been indexed. (I've added a RobotExclusion metatag at the top of the TWiki content and want to see if it is effective from that location.)
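
For reference, a robots exclusion metatag conventionally has the form `<meta name="robots" content="noindex">` and normally sits in the page's `<head>`. A minimal sketch in Python of how one could check whether a page carries such a tag, and whether it is inside `<head>` or in the body. The names `RobotsMetaFinder` and `find_robots_meta` are hypothetical, and the sample HTML is invented for illustration (it is not the actual RobotsTest page):

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects robots meta tags and notes whether each sits inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = []  # list of (content, inside_head) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "robots":
                content = (attr.get("content") or "").lower()
                self.found.append((content, self.in_head))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_robots_meta(html):
    """Return a list of (content, inside_head) for every robots meta tag."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.found


# Invented sample: the tag appears in the body, as it would if it were
# added at the top of the topic text rather than in a template's <head>.
sample = (
    "<html><head><title>RobotsTest</title></head>"
    '<body><meta name="robots" content="noindex, nofollow">Some text</body></html>'
)
result = find_robots_meta(sample)
print(result)
```

This only reports where the tag is. Whether a robot honors a tag found outside `<head>` is exactly what the test above is meant to find out.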

So far there are two problems:

  • The last index by Google occurred almost two months ago (as of 11/27/01)
  • Google indexes more pages than I'd like, including the (very) dynamic ones like WebChanges. Although indexing this particular page helps my current effort (to find out how often Google indexes TWiki), it is usually not a useful search result; it just adds clutter. I might be able to reduce some of this clutter by modifying templates to prevent robots from indexing these types of pages. (More discussion buried below -- see #Some_Results_of_a_Successful_Search_on_20011115.)

Update: There be weirdness here! Two anomalous results:

  • Yesterday (27 Nov 01), my little check on WebChanges showed the last cached copy was dated 4 Oct 01 -- today it shows 13 Nov 01 -- is there some "latency"?? Am I doing something wrong in this test?
  • Also today (28 Nov 01), I checked the WebChanges last cached copy. Although there are not necessarily revisions every day, this shows the last cached copy was taken sometime on 04 Nov 01 or later. (Well, probably 04 Nov 01 -- as the WebStatistics are updated every day.)
  • And trying the WebChanges test on twiki.org and twiki.sourceforge.net gives still different results -- 13 Nov 01. -- Oh, wait, that makes some sense -- they're two different domains (?) so it's logical that they are indexed at two different times. That also explains why I got distinctly different search results from the two domains -- probably more pages at the time in one index than the other. So, I may get more up-to-date results at any time on one site rather than the other. Interesting, I'll have to use that to my advantage. So, for purposes of this test, I'll have to have two test queries, one for sourceforge, one for twiki.org. But, wait, see next item:
  • But, I'm getting more and more confused -- searching "Wikilearn WideTextTestPage site:twiki.org" finds it (in the domain indexed on 13 Nov 01), but on twiki.sourceforge.net does not find it (in the domain indexed on 4 Nov 01). The page has been there since 24 Oct 01. What??? Interchanged some dates, not quite as surprising, but still, surprising!
  • Ok, just a little less weirdness on my part (but maybe more on Google's part). Wikilearn on twiki.org was last indexed on 13 Nov 01, and on twiki.sourceforge.net on 4 Nov 01. Codev is just the opposite: Codev on twiki.org was last indexed on 4 Nov 01, and on twiki.sourceforge.net on 13 Nov 01.

I'll try again tomorrow. (Ok, I just did -- last bullet was from 30 Nov 01).

See AboutThesePages.


Record of Successful and Unsuccessful Searches and Dates

Here is an edited list of pages (some deleted) that existed on 11/15/01. I have added notes to indicate when I first successfully found the page using Google, and also a list of dates when I searched Google for these pages.

Dates of Google searches:

  • 11/15/01
  • 11/16, 17, 20, 21, 23, 26, 27, 28/01
  • <Fell asleep at the wheel!>

The last index by Google of Wikilearn as of 11/23/01 is 10/04/01.

Note: A way to determine the approximate date of the last indexing of Wikilearn by Google is to search for "Codev WebChanges site:twiki.org" and then view the cached page. (Thanks to AndrewDalgleish.) (I'd make the previous link direct to the cached copy, but I'm not absolutely sure that it changes at the next index -- will test at the next index.) The date is approximate because:

  • It depends on whether there have been changes every day (major cause of uncertainty) (I switched the search from Wikilearn to Codev because changes almost always occur daily on Codev.)
  • It is not clear that all of Wikilearn is indexed at the same time (though I would suspect it is, thus I see this as a minor cause of uncertainty)

A better page to check is the cached copy of WebStatistics.

Here is the current Google link to the 4 Oct 01 cached copy, for future comparison: http://www.google.com/search?num=100&hl=en&q=Codev+WebChanges+site%3Atwiki.org

Here is the current (28 Nov 01) Google link to the 5 Nov 01 cached copy, for future comparison: http://www.google.com/search?q=cache:qqmlo6EYn1U:twiki.org/cgi-bin/view/Codev/WebChanges+Codev+WebChanges+site:twiki.org&hl=en

Note they are clearly different, so different as to make me think I copied the wrong thing. I'll have to see what happens the next time the site is indexed.
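
One visible difference is that the first link is an ordinary search URL, while the second is a `cache:` query. A minimal sketch in Python of how the first form can be rebuilt from the plain test query. The function name `google_search_url` is hypothetical, and the `num` and `hl` parameters are simply copied from the link recorded above:

```python
from urllib.parse import urlencode


def google_search_url(query, num=100, lang="en"):
    """Build a search URL in the same form as the first link recorded above."""
    params = {"num": num, "hl": lang, "q": query}
    # urlencode turns spaces into "+" and ":" into "%3A".
    return "http://www.google.com/search?" + urlencode(params)


url = google_search_url("Codev WebChanges site:twiki.org")
print(url)
```

The same function works for the twiki.sourceforge.net test query by changing the `site:` part of the query string.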

Test query for twiki.org: Codev WebChanges site:twiki.org, and then view the cached page.

Dates below are approximate, and I might have missed some.

Dates of Google indexing of Wikilearn (i.e., twiki.org):

  • 04 Oct 01
  • 13 Nov 01 (recognized on 28 Nov 01, not on 27 Nov 01 -- not sure why the delay)
  • 15 Dec 01
  • 02 Jan 02
  • 04 Feb 02
  • ... <not tested for a while>
  • 12 Apr 03 -- I must have been mistaken about this; checking a few days later, the latest page indexed seems to be around 12 Mar 03.

Test query for twiki.sourceforge.net: Codev WebChanges site:twiki.sourceforge.net, and then view the cached page.

Dates of Google indexing of Wikilearn (i.e., twiki.sourceforge.net):

  • 04 Nov 01 (discovered 28 Nov 01, but not effectively tested before then)
  • 15 Dec 01
  • 28 Jan 02
  • ... <not tested for a while>
  • 11 Mar 03
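
Since the point of these lists is to learn how often Google indexes the site, the gaps between consecutive dates are worth working out. A minimal sketch in Python, assuming the dates in the two lists above are taken at face value (they are approximate, and some indexes may have been missed). Entries after the "not tested for a while" gaps are left out, and the helper name `gaps_in_days` is hypothetical:

```python
from datetime import date

# Index dates as recorded in the two lists above (approximate).
# Entries after the "not tested for a while" gaps are left out.
index_dates = {
    "twiki.org": [
        date(2001, 10, 4), date(2001, 11, 13), date(2001, 12, 15),
        date(2002, 1, 2), date(2002, 2, 4),
    ],
    "twiki.sourceforge.net": [
        date(2001, 11, 4), date(2001, 12, 15), date(2002, 1, 28),
    ],
}


def gaps_in_days(dates):
    """Days between each pair of consecutive dates."""
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]


gaps = {site: gaps_in_days(dates) for site, dates in index_dates.items()}
for site, days in gaps.items():
    print(site, days, "mean: %.1f days" % (sum(days) / len(days)))
```

On these dates the gaps come out to 40, 32, 18 and 33 days for twiki.org and 41 and 44 days for twiki.sourceforge.net, which fits an interval of roughly a month or a bit more, subject to the uncertainty in the dates themselves.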

Comparison of hits:

Codev WebChanges

| Date | twiki.org | twiki.sourceforge.net |
| 29 Nov 01 | 234 | 118 |

Test query for twiki.sourceforge.net: Wikilearn WideTextTestPage site:twiki.sourceforge.net.

Test query for twiki.org: Wikilearn WideTextTestPage site:twiki.org.

WikiLearn WideTextTestPage

| Date | twiki.org | twiki.sourceforge.net |
| 29 Nov 01 | 12 | 6 |

Some Results of a Successful Search on 20011115

A search on "Wikilearn DataStorageForMultiLevelWikiWebs site:twiki.org" showed 11 hits (16 total with 5 suppressed because very close to duplicate).

Presumably, if I had searched on both twiki.sourceforge.net and twiki.org, it would have found twice this number. (Tested -- showed 4 (22 total with 18 suppressed) -- interesting, but I'm not going to analyze those results.) (I have to recheck the second point: I don't recall how I tested, and I don't know that I could have had two "site:" clauses in my query, so quite possibly I should have added these results to the previous 11 (= 16 - 5) results.)

DataStorageForMultiLevelWikiWebs was the most recent of the WikiLearn pages that I could find on 20011115.

I copied the 16 hits here in order to analyze them to some extent, and then keep notes.

Darn, the second page was a cached search page -- if it had shown the date the page was cached (or the date the search was performed), it would have told me when Google crawled TWiki. Darn! (Ok, see above for a better approach using the cached WebChanges page.)

Edited results of the search (all 16 listed; the five excluded from the 11 are marked "(Excluded):").

TWiki . Wikilearn . DataStorageForMultiLevelWikiWebs
twiki.org/cgi-bin/view/Wikilearn/DataStorageForMultiLevelWikiWebs
This is the (only) page I really wanted to find (I think).

TWiki . Wikilearn . WebChanges 
twiki.org/cgi-bin/view/Wikilearn/WebChanges 
No point in finding this page.

TWiki . Wikilearn . WebChanges 
twiki.org/cgi-bin/view/Wikilearn/WebChanges?skin=print 
This is even less desirable -- same as above with a different skin!

(Excluded): TWiki . Wikilearn . DataStorageForMultiLevelWikiWebs3 
twiki.org/cgi-bin/view/Wikilearn/DataStorageForMultiLevelWikiWebs3 
This is OK, page probably existed at the time Google indexed TWiki, but has since been deleted.  I wonder if I can find the date it was deleted?

(Excluded): TWiki . Wikilearn . DataStorageForMultiLevelWikiWebs1
twiki.org/cgi-bin/view/Wikilearn/DataStorageForMultiLevelWikiWebs1 
This is OK, page probably existed at the time Google indexed TWiki, but has since been deleted.  I wonder if I can find the date it was deleted?

(Excluded): TWiki . Wikilearn . DataStorageForMultiLevelWikiWebs2
twiki.org/cgi-bin/view/Wikilearn/DataStorageForMultiLevelWikiWebs2 
This is OK, page probably existed at the time Google indexed TWiki, but has since been deleted.  I wonder if I can find the date it was deleted?

TWiki . Wikilearn . WebIndex
twiki.org/cgi-bin/view/Wikilearn/WebIndex
No point in seeing this! 

(Excluded): TWiki . Wikilearn . RandyKramersPagesOnTest
twiki.org/cgi-bin/view/Wikilearn/RandyKramersPagesOnTest 
This is the case of a page with a search field included.  Not sure I can prevent these in the general case.

TWiki . Wikilearn . DataStorageForMultiLevelWikiWebs1
twiki.org/cgi-bin/view/Wikilearn/DataStorageForMultiLevelWikiWebs1?skin=print 
Another print skin!  That's interesting, the ones without the print skin were among those excluded -- this (these) were left in???

TWiki . Wikilearn . DataStorageForMultiLevelWikiWebs3
skin=print 
Oops, deleted too much, but another print skin!  That's interesting, the ones without the print skin were among those excluded -- this (these) were left in???

TWiki . Wikilearn . DataStorageForMultiLevelWikiWebs2
twiki.org/cgi-bin/view/Wikilearn/DataStorageForMultiLevelWikiWebs2?skin=print 
Another print skin!  That's interesting, the ones without the print skin were among those excluded -- this (these) were left in???

TWiki . Wikilearn . WebStatistics
twiki.org/cgi-bin/view/Wikilearn/WebStatistics 
No point in finding this (normally, at least).

(Excluded): TWiki . Wikilearn . WebIndex
twiki.org/cgi-bin/view/Wikilearn/WebIndex?skin=print 
Another print skin!  That's interesting, the ones without the print skin were among those excluded -- this (these) were left in???

TWiki . Wikilearn . WebStatistics
twiki.org/cgi-bin/view/Wikilearn/WebStatistics?skin=print 
Another print skin!

TWiki . Codev . TopicListAndWebListVariable
twiki.org/cgi-bin/view/Codev/TopicListAndWebListVariable 
I can't tell why this page was found, even by searching the cached page!!??

TWiki . Codev . FeatureBrainstorming
twiki.org/cgi-bin/view/Codev/FeatureBrainstorming 
This is understandable, page contains a dynamic search and this same page title exists in the Codev web.
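
The notes on the hits above suggest two kinds of clutter: print-skin duplicates (`?skin=print`) and dynamic pages such as WebChanges, WebIndex and WebStatistics. A minimal sketch in Python of filtering a result list on that basis. The function name `is_clutter` and the `CLUTTER_TOPICS` set are my own assumptions drawn from these notes, not a TWiki or Google setting, and only a few of the hits are repeated here:

```python
from urllib.parse import urlsplit, parse_qs

# Topic names treated as clutter; an assumption based on the notes above.
CLUTTER_TOPICS = {"WebChanges", "WebIndex", "WebStatistics"}


def is_clutter(url):
    """True if the hit is a print-skin duplicate or one of the dynamic pages."""
    parts = urlsplit("http://" + url)  # the listed hits have no scheme
    topic = parts.path.rsplit("/", 1)[-1]
    skins = parse_qs(parts.query).get("skin", [])
    return "print" in skins or topic in CLUTTER_TOPICS


hits = [
    "twiki.org/cgi-bin/view/Wikilearn/DataStorageForMultiLevelWikiWebs",
    "twiki.org/cgi-bin/view/Wikilearn/WebChanges",
    "twiki.org/cgi-bin/view/Wikilearn/WebChanges?skin=print",
    "twiki.org/cgi-bin/view/Wikilearn/WebStatistics?skin=print",
    "twiki.org/cgi-bin/view/Codev/FeatureBrainstorming",
]
kept = [u for u in hits if not is_clutter(u)]
print(kept)
```

A filter like this does not catch pages that match only because they embed a search (such as FeatureBrainstorming), which is the case I'm not sure can be prevented in general.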

Contributors:

  • RandyKramer - 15 Nov 2001
  • <If you edit this page, add your name here, move this to the next line>
Topic revision: r11 - 2003-05-01 - RandyKramer
 