I am posting the following email to the Mandrake expert and newbie mailing lists, pointing out that the text of the letter is also posted here and that "adventurous" people can respond on this wiki (TWiki) page. Later I expect to submit it to the AbiWord mailing lists, and perhaps others.

See AboutThesePages.

Better Ways to Search Mail List Archives?, Potential RFE

I'm looking for better ways to search recent archives of mail lists, starting with the Mandrake and AbiWord archives. Initially, I've focused on the Mandrake archives, for example: http://www.mandrake.com/en/archives/newbie/.


Background

First some background on what I'm trying to do:

I'm maintaining the WikiLearn site as a learning notebook as I attempt to learn Linux, C++, Python, and related open source technology. See http://twiki.org/cgi-bin/view/Wikilearn/AboutThesePages and http://twiki.org/cgi-bin/view/Wikilearn/WhyWikiLearn.

BTW, anyone can help, and may use WikiLearn as their learning notebook, for the benefit of themselves and others.

One of the ways I learn is by reading various mail lists. When I find a piece of information on a mail list that I think is worth recording for either my own or someone else's future use, I create a new page or edit an existing page and do one of three things with the information:

  • Paraphrase it entirely into my own words, often but not always adding the originator of the post as a contributor to the page.
  • Quote all or part of it, with proper attribution (ideally this attribution could include a link to the original email in the archives, just like the next item).
  • Link to the original email in the archives, with perhaps some words of explanation about what is covered in the linked email.

With the latter two, my hope is to eventually "refactor" the page into my own words (or have someone else refactor it into their words), and list the author of the post (if appropriate) and the "refactorer" as contributors.

The Problem: Finding Posts in the Archives

The problem I have is finding the email in the archives. Here are the ways I've tried searching the Mandrake archives and some of the problems I've run into. (And some questions.)

Use a Search Site like Google

There are two problems with this:

  • Recent posts might be in the archives, but unless I'm very lucky, they have usually not yet been indexed by Google. (Google may index the site only every four weeks or less often.) Is there a way to get Google to index a site daily? Is there another publicly available search site that indexes mail list archive sites daily?

  • Although I can limit a Google search to one site with the site directive (like site:http://www.mandrake.com/), there is no convenient way to get more specific (like site:http://www.mandrake.com/en/archives/newbie/). I know I can add "en archives newbie" or "/en/archives/newbie/" to the search string, but neither is convenient, and the second uses up my one chance to include a phrase in the search. (I don't know whether that is still true; when I wrote it, I was assuming that Google could handle only one phrase per query.) I have written to Google about this issue, and will post if I get a useful response.

(Aside: Ideally, I'd like to create some personal web pages (maybe on TWiki) that include a text box (where I can enter search terms) and a command button that would initiate a search within a specific archive. An attempt at an example: "Search the Mandrake expert archives for [             ] ". If anyone can explain how I can do that in HTML (or whatever), that would be helpful, but it is not currently my highest priority.)
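
As a rough illustration of the "search within a specific archive" idea, here is a minimal Python sketch that builds a Google search URL from the search terms plus a site restriction. The function name archive_search_url is made up for this example, and whether Google honors a path (rather than just a host name) after site: is an assumption that needs checking against Google itself.

```python
from urllib.parse import urlencode


def archive_search_url(terms, archive="www.mandrake.com/en/archives/newbie/"):
    """Return a Google search URL for `terms`, limited to one archive.

    Assumption: the search engine accepts a path after "site:".
    If it only accepts a host name, pass archive="www.mandrake.com".
    """
    query = f"{terms} site:{archive}"
    # urlencode percent-escapes the query so it is safe to use in a URL.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(archive_search_url('"enlarge a linux partition"'))
```

A personal page with a text box and a button would only need to produce the same URL from whatever is typed into the box.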

Use Search Tools on the Archive Site

Mandrake Archives

On Mandrake I've tried the "Search this Site" text box, which doesn't limit the search to just the archives -- and thus the searches usually don't work very well. For an example of the problem, try:

  • go to, for example, http://www.mandrake.com/en/archives/newbie/2002-06/
  • copy the title of one of the emails into the search box (with or without enclosing quotes) (I copied "enlarge a linux partition", an email within the first 20 on that page.)
  • press enter to initiate a search
  • view the results

I never get a hit on an email that I know is there.

AbiWord Archives

On AbiWord, no search tool is provided with the archive -- the best you can do is display the emails in various orders (like date, author, thread, subject) and then scroll through or use your browser's search tool.

Use Your Browser's Search Tool

Go to a specific page of the archives (like all the posts for a specific month), then use the browser's search feature to search that page. This is not very convenient for two reasons:

  • First, you must know the month the post was made, and then load that month's page (or load and search page after page until you find what you are looking for). (Mandrake is even less convenient, because they divide a single month into multiple pages.)

  • Pages tend to be big (1000 posts or more), so I have to wait for these long pages to load (over my dial up line) before I can conduct the search.
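
The page-by-page hunt could be scripted once a month's index page has been saved. The sketch below uses only Python's standard html.parser to list the links whose text contains a phrase. It assumes each post appears on the index page as an ordinary <a href="..."> link with the subject as its text; the file names in the sample (msg00017.php and so on) are invented for illustration. It does nothing about the download time, only about the scrolling.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from an archive index page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_posts(index_html, phrase):
    """Return the links whose text contains `phrase`, ignoring case."""
    parser = LinkCollector()
    parser.feed(index_html)
    parser.close()
    wanted = phrase.lower()
    return [(href, text) for href, text in parser.links
            if wanted in text.lower()]


if __name__ == "__main__":
    # Hypothetical fragment of a month index page.
    sample = ('<ul><li><a href="msg00017.php">enlarge a linux partition</a></li>'
              '<li><a href="msg00018.php">Re: sound card</a></li></ul>')
    print(find_posts(sample, "Linux partition"))
```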

The Question

Does anyone have better ways to search the archives?

SubQuestion

Is the Message-ID: (like <20020629014222.GD11389@mandrakesoft.com>) archived with a message, and is that a useful thing to search on, either in the archives or in Google? Update: Nope -- just tried it myself, apparently Google does not index the Message-ID, nor is it included on the archive pages.

The Suggestions (RFEs)

If not (and maybe even if so), I have a few suggestions on potential ways to make the use of archives more convenient:

Add Link to the Archive before Sending to List Members

Enter a post in the archives as soon as it gets to the mail list server, and then include a link to the archived email with the post that goes out to the subscribers. An example of the link: "This post has been archived as http://www.mandrake.com/en/archives/newbie/2002-06/msg02005.php."

(Was that clear? Normally, when someone posts to a mail list, IIUC, it first goes to the mail list server, and is sent from there to the subscribers. What I'm suggesting is that it take a momentary detour when it arrives at the mail list server, during which it is added to the archives and a line is added to the post (perhaps a header, or a footer) which says, for example, "This post has been archived as http://www.mandrake.com/en/archives/newbie/2002-06/msg02005.php.") (I recognize the archive could be moved. I'm hoping that even if it does move, things like the "2002-06/msg02005.php" would be preserved (along with stuff to identify the archive, like "en/archives/newbie").)
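
To make the "momentary detour" concrete, here is a minimal Python sketch of what a list server might do to each message before sending it on. It is not how any real list manager works: the function add_archive_link, the ARCHIVE_BASE constant, and the idea that the server already knows the month and message number are all assumptions for illustration. The header name Archived-At comes from RFC 5064, which postdates this page; an X- header would serve the same purpose. The sketch handles only simple single-part, plain-text messages.

```python
from email import message_from_string

# Assumed location of the archive; taken from the example URL above.
ARCHIVE_BASE = "http://www.mandrake.com/en/archives/newbie/"


def add_archive_link(raw_message, month, number):
    """Return the message with an archive link added as header and footer.

    Assumes a single-part, plain-text message, and that the archiver has
    already assigned `month` (like "2002-06") and `number` to the post.
    """
    msg = message_from_string(raw_message)
    url = f"{ARCHIVE_BASE}{month}/msg{number:05d}.php"
    msg["Archived-At"] = f"<{url}>"
    body = msg.get_payload()
    msg.set_payload(f"{body.rstrip()}\n\nThis post has been archived as {url}\n")
    return msg.as_string()


if __name__ == "__main__":
    raw = ("From: someone@example.com\n"
           "Subject: enlarge a linux partition\n"
           "\n"
           "How do I do it?\n")
    print(add_archive_link(raw, "2002-06", 2005))
```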

Dedicated Search Engine on Site with Daily Index Update

Provide a search engine on the site specifically for the archives, with the index updated at least daily. (Preferably more often -- there should be little reason it can't be done every hour with a cron job. If it indexes only the added posts, it should not be a major burden: little or no worse than doing the same thing every 24 hours.)
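
The claim that indexing only the added posts is cheap can be sketched in a few lines of Python. This is a toy in-memory index, not a proposal for a particular search engine: the ArchiveIndex class, the integer post numbers, and the assumption that post numbers only ever increase are all invented for the example. Each run of update (say, from an hourly cron job) skips everything up to the highest post number it has already seen.

```python
import re
from collections import defaultdict


class ArchiveIndex:
    """Toy inverted index that only ever indexes posts it has not seen."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of post numbers
        self.last_indexed = 0             # highest post number indexed so far

    def update(self, posts):
        """Index posts (a dict of number -> text) newer than last_indexed.

        Returns the number of posts added on this run.
        """
        added = 0
        for post_id in sorted(posts):
            if post_id <= self.last_indexed:
                continue  # already indexed on an earlier run
            for word in re.findall(r"[a-z0-9]+", posts[post_id].lower()):
                self.postings[word].add(post_id)
            self.last_indexed = post_id
            added += 1
        return added

    def search(self, query):
        """Return the sorted post numbers that contain every query word."""
        words = re.findall(r"[a-z0-9]+", query.lower())
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits)


if __name__ == "__main__":
    index = ArchiveIndex()
    posts = {2004: "sound card trouble", 2005: "enlarge a linux partition"}
    print(index.update(posts))            # first run indexes both posts
    posts[2006] = "Re: enlarge a linux partition"
    print(index.update(posts))            # later run indexes only the new post
    print(index.search("linux partition"))
```

The work per run is proportional to the number of new posts, which is why running hourly costs about the same in total as running once a day.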

Comments

Add your comments here. This is a sample comment to suggest a method of adding your comments. You can also add comments in the body of the message above. In either case, please sign your comments. Near the bottom of the page in edit mode, you will find your signature in a form that you can copy and paste to sign your comments. If you put comments within the message above, please show them in italic by enclosing them with underbars, as I've done here. -- RandyKramer - 17 Jul 2002

-- RandyKramer - 17 Jul 2002

Contributors

  • () RandyKramer - 16 Jul 2002
  • <If you edit this page: add your name here; move this to the next line; and include your comment marker (initials), if you have created one, in parentheses before your WikiName.>

Revision Record

  • 01 Oct 2003 — minor phrasing and formatting "improvements" (I hope)


Topic revision: r10 - 2003-10-01 - RandyKramer
 