
Proposal to change Wiki Word syntax

For example, Summer2006forSeniors and Md5sumsAfterBurning (the latter from TWiki:TWiki.WikiWord) should both be wikiwords. Having to be (excessively) creative to find a good wikiname is very unfriendly to users and liable to lead to confusion. Only allowing numbers at the end is unnecessarily restrictive and, again, completely unintuitive.

-- MeredithLesly in http://develop.twiki.org/~develop/cgi-bin/view/Bugs/Item2556


The GaGaParser lives :-)

Defining the regexp is basically a matter of deciding which words you want to show up as autolinked and balancing that against performance.

A number is allowed anywhere after UlU, not only at the end (see TWiki:TWiki.WikiWord / "Syntax of a WikiWord").

I like the current balance, but you could try to start discussion on this in a Codev topic and raise interest for a re-definition.

(I got a conflict on save, seems like merging of form fields doubles content).

-- SteffenPoulsen - 28 Jun 2006

If users create what would seem to be a wikiword and it's not auto-linked, retaining that for (alleged) performance reasons is a very bad idea. I'd be very surprised if there were performance issues, however, as it involves a minor tweak to the wikiWordRegex. I also thought we were trying to reduce the user-unfriendliness of TWiki, so I'm surprised that you would like the current balance. From my reading, TWiki is out of the mainstream on this one.

I hope you reported the form bug.

ML

Could you post the tweak to the regex you use / suggest, so we could try it out for performance? (The cost is not so much the regex itself, but in looking up candidates on topic rendering).

-- SteffenPoulsen - 28 Jun 2006

Please re-open http://develop.twiki.org/~develop/cgi-bin/view/Bugs/Item2556 when you reach a consensus. FWIW I agree with Meredith; I have always been frustrated that you can't sensibly use numbers in wikiword, and treating them as equivalent to upper-case letters seems to work quite well.

-- CrawfordCurrie - 30 Jun 2006

The change has been made in the scratch branch TWikiWithTags. I changed the regex so that numbers are treated as equivalent to the non-capitalised part of the word, as in the two examples above. I'm not convinced that treating them as upper case letters is all that helpful, but I don't have a violent objection to it either.

"Consensus". What have you been smoking?

-- MeredithLesly - 30 Jun 2006

http://develop.twiki.org/svn/twiki/scratch/TWikiWithTags/lib/TWiki/Regexes.pm lists the regex as:

qr/[$TWiki::regex{upperAlpha}]+[$TWiki::regex{lowerAlphaNum}]+[$TWiki::regex{upperAlpha}]+[$TWiki::regex{mixedAlphaNum}]*/o;

I think this looks like an ok suggestion, and performance seems ok as well. It will turn words like W3C, I18N, I10N, J2EE, M17N and that kind of semi-abbreviations into real WikiWords so there might be a documentation job in explaining wikiwords against abbreviations with this.
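To make the effect concrete, here is a minimal sketch that compares the regex quoted above with the previous definition, which allowed only lowercase letters in the second group. The character classes are ASCII-only stand-ins chosen for illustration (TWiki builds its own, possibly locale-aware, classes), and `is_wikiword` is a hypothetical helper, not a TWiki function.

```perl
use strict;
use warnings;

# Assumption: plain ASCII stand-ins for TWiki's character classes.
my %regex = (
    upperAlpha    => 'A-Z',
    lowerAlpha    => 'a-z',
    lowerAlphaNum => 'a-z0-9',
    mixedAlphaNum => 'A-Za-z0-9',
);

# Previous definition: only lowercase letters in the second group.
my $old = qr/[$regex{upperAlpha}]+[$regex{lowerAlpha}]+[$regex{upperAlpha}]+[$regex{mixedAlphaNum}]*/;

# Definition from the TWikiWithTags branch: digits count as lowercase.
my $new = qr/[$regex{upperAlpha}]+[$regex{lowerAlphaNum}]+[$regex{upperAlpha}]+[$regex{mixedAlphaNum}]*/;

# Hypothetical helper: true if the whole word matches the given pattern.
sub is_wikiword {
    my ($re, $word) = @_;
    return $word =~ /\A$re\z/ ? 1 : 0;
}

for my $word (qw(WikiWord Summer2006forSeniors Md5sumsAfterBurning W3C I18N J2EE M17N)) {
    printf "%-22s old=%d new=%d\n",
        $word, is_wikiword($old, $word), is_wikiword($new, $word);
}
```

Anchoring with `\A` and `\z` tests a word in isolation; topic rendering also has to decide where a candidate word starts and ends, which this sketch does not model.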

http://develop.twiki.org/svn/twiki/scratch/TWikiWithTags/data/TWiki/WikiWord.txt and the JavaScript regexes (for WebTopicCreator) look like they are not updated yet?

-- SteffenPoulsen - 30 Jun 2006

Meredith has got something right here.

We seem to have 3 choices that could make sense.

  • Numbers are like lowercase: Linux2000Conference would be a Wikiword. Linux2000 would not. W3C would be a wiki word. I18n would not.
  • Numbers are like UPPERCASE. Linux2000Conference would still be a Wikiword. Linux2000 would be a WikiWord. W3C would not be a wiki word but it would be seen as an acronym if the W3C topic exists in the same web. I18n would not be a wikiword still.
    • Currently acronyms cannot contain numbers. But we could consider touching the abbreviation regex at the same time if this would make the whole picture clearer after merge. -- SteffenPoulsen - 30 Jun 2006
  • Numbers are just Ignored: Linux2000Conference would be a Wikiword. Linux2000 would not. W3C would not be a wiki word. I18n would not either. I think this would be confusing to document and understand.
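The three options can be compared side by side with a small sketch. Only the "lowercase" pattern corresponds to a regex actually quoted in this discussion; the "uppercase" pattern and the digit-stripping approach for "ignored" are one possible reading of the bullets above, and the ASCII character classes and the `classify` helper are assumptions made for illustration.

```perl
use strict;
use warnings;

# Assumption: ASCII-only character classes.
my %c = (upper => 'A-Z', lower => 'a-z', digit => '0-9');
my $any = "$c{upper}$c{lower}$c{digit}";

my %pattern = (
    # Digits behave like lowercase letters (the regex quoted in this topic).
    lowercase => qr/\A[$c{upper}]+[$c{lower}$c{digit}]+[$c{upper}]+[$any]*\z/,
    # Hypothetical: digits behave like uppercase letters, except as first character.
    uppercase => qr/\A[$c{upper}][$c{upper}$c{digit}]*[$c{lower}]+[$c{upper}$c{digit}]+[$any]*\z/,
);

# Digit-free WikiWord pattern, used by the "ignored" option.
my $plain = qr/\A[$c{upper}]+[$c{lower}]+[$c{upper}]+[$c{upper}$c{lower}]*\z/;

# Hypothetical helper: returns 1 if $word is a WikiWord under $option, else 0.
sub classify {
    my ($option, $word) = @_;
    if ($option eq 'ignored') {
        (my $stripped = $word) =~ tr/0-9//d;    # drop digits, then match
        return $stripped =~ $plain ? 1 : 0;
    }
    return $word =~ $pattern{$option} ? 1 : 0;
}

for my $word (qw(Linux2000Conference Linux2000 W3C I18n)) {
    printf "%-20s lowercase=%d uppercase=%d ignored=%d\n",
        $word, map { classify($_, $word) } qw(lowercase uppercase ignored);
}
```

Under these assumptions the sketch gives the same answers as the bullets for Linux2000Conference, Linux2000, W3C and I18n. It does not model acronym detection, so the case where W3C is linked as an acronym is left out.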

I can live with both the uppercase and lower case solution.

If we look at the main negative effects on current users and current topics, they are:

  • Lowercase: W3C, I18N etc. become wikiwords
  • Uppercase: Linux2000 becomes a wikiword

My first reaction is that Meredith's lowercase interpretation is the least intrusive. Naturally some old topics will suddenly have ?-marks, but they are not unreadable, so I think we win more than we lose by making this simple change in spec.

So I support Meredith's proposal and her implementation in TWiki 4.1. And that is without having been smoking anything ;-) Numbers should be allowed in WikiWords and be treated as lower case letters.

-- KennethLavrsen - 30 Jun 2006

Note to self: also update wikiword regexes in pub/TWiki/TWikiJavascripts/twiki.js.

-- ArthurClemens - 30 Jun 2006

This one was accepted but since Meredith is off the radar screen someone needs to implement this. Any volunteers?

-- KennethLavrsen - 26 Sep 2006

Only from my shallow memories.... I think it has been implemented in Code (DEVELOP?), but not in test cases. I am running the change in one of my "productive" TWikis but doubt that I "developed" it myself - I rather guess I merged some diff from somewhere...

-- HaraldJoerg - 27 Sep 2006

The code change is simple and already made but not in TWiki4 branch.

As I see it, the spec that seems most popular is the original proposal that numbers are lowercase.

This is the simple patch that introduces that.

Index: TWiki.pm
===================================================================
--- TWiki.pm    (revision 11638)
+++ TWiki.pm    (working copy)
@@ -370,7 +370,7 @@
     $regex{headerPatternNoTOC} = '(\!\!+|%NOTOC%)';

     # TWiki concept regexes
-    $regex{wikiWordRegex} = qr/[$regex{upperAlpha}]+[$regex{lowerAlpha}]+[$regex{upperAlpha}]+[$regex{mixedAlphaNum}]*/o;
+    $regex{wikiWordRegex} = qr/[$regex{upperAlpha}]+[$regex{lowerAlphaNum}]+[$regex{upperAlpha}]+[$regex{mixedAlphaNum}]*/o;
     $regex{webNameBaseRegex} = qr/[$regex{upperAlpha}]+[$regex{mixedAlphaNum}_]*/o;
     $regex{webNameRegex} = qr/$regex{webNameBaseRegex}(?:(?:[\.\/]$regex{webNameBaseRegex})+)*/o;
     $regex{defaultWebNameRegex} = qr/_[$regex{mixedAlphaNum}_]+/o;

Next is to implement the unit test case. I think I understand how to but I cannot make the unit test cases run without creating a near 200 kbyte error log.

And we need Arthur to update twiki.js.

-- KennethLavrsen - 01 Oct 2006

The decision for the spec change has been made and I accept that. (/me thinks that this can break existing content, such as Y2K suddenly showing a spurious questionmark link; /me thinks also from an educational perspective that the redefined WikiWord is not as easy to explain to new users ("UPPER lower UPPER lower" vs. "UPPER lower or numbers UPPER alphanum"))

-- PeterThoeny - 01 Oct 2006

Yes. Y2K does become a wikiword with this change, as also discussed above. It is a spec change and it is not backwards compatible, which is why it needed to be discussed at a release meeting - which it was - and where it was accepted.

However the way it is not backward compatible is not breaking the readability. You get the spurious question mark link but you can still read the topics. But we gain a feature which many of us, and our users have been missing.

The educational side of this is important.

On the release note side this feature needs to be highlighted in 4.1. I think all spec changes and API changes should be highlighted at the top of the release note and not just be an entry in the table of bug items. This way the admins will notice that they have an educational task (like sending an email to all users when they upgrade).

And the way that the WikiWord definition is presented in the docs should be simpler.

It should simply say Upper Lower Upper Anything.

And then as a 2nd sentence say that numbers are considered lower case.

I think that is simpler to understand.

-- KennethLavrsen - 02 Oct 2006

  • The code in TWiki.pm is updated
  • The unit tests are updated to cover new spec
  • Doc is updated. The only definition found was in WikiWord. Also fixed a couple of mistakes while I was at it.
  • TWikiJavascripts updated. It was easy so Arthur can focus on other things. Good you told us Arthur. I would not have thought about it.

So I consider this one complete.

-- KennethLavrsen - 04 Oct 2006

After all, I am not too happy with this spec change; I have seen that it breaks a lot of content. The latest example is Rev 3 of HowToInstallCpanModules (now fixed), which shows unwanted raw links caused by INSTALLMAN1DIR in the textarea box. This spec change also results in many new questionmark links popping up in existing text, with common words such as I18N, L10N, PERL5LIB etc. For example, the Codev web uses I18N 378 times.

Since this spec change made it into a production release it is too late to revert it, so I suggest keeping it as is. Going forward we should evaluate very carefully the impact a proposed spec change has on the millions of existing pages and apps out there, and make a community-based decision based on the TWikiRelease04x01Process. Ideally we should incorporate feedback from a TWikiEndUsersGroup which represents the needs of the end users.

-- PeterThoeny - 05 Mar 2007

Another example of user confusion this spec change causes: Bugs:Item3826

-- PeterThoeny - 02 Apr 2007

I don't understand what

DataCenterEnergyEfficiency(SomeTWikiWord)[1].pdf (sic)

has to do with adding numbers to WikiWords. The bug you quote seems to me to be an unrelated issue, to do more with the scope of the = being ignored.

-- SvenDowideit - 02 Apr 2007

The actual example is DataCenterEnergyEfficiency(TUI3004B)[1].pdf. TUI3004B is now a WikiWord because numbers are treated as lower case letters; it is linked in the above example because of the leading parenthesis.
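A rough sketch of why the parenthesis matters: assume a link candidate is only considered at the start of the text or directly after whitespace or an opening parenthesis. That rule, the ASCII character classes, and the `link_candidates` helper are assumptions for illustration and not TWiki's actual rendering code.

```perl
use strict;
use warnings;

# Assumption: ASCII-only versions of the WikiWord regex before and after the change.
my $old = qr/[A-Z]+[a-z]+[A-Z]+[A-Za-z0-9]*/;
my $new = qr/[A-Z]+[a-z0-9]+[A-Z]+[A-Za-z0-9]*/;

# Hypothetical helper: collect words that would be picked up as link candidates.
sub link_candidates {
    my ($re, $text) = @_;
    my @found;
    # A candidate must start at the beginning of the text, or directly
    # after whitespace or an opening parenthesis (assumed rule).
    while ($text =~ /(?:\A|(?<=[\s(]))($re)/g) {
        push @found, $1;
    }
    return @found;
}

my $text = 'DataCenterEnergyEfficiency(TUI3004B)[1].pdf';
print "old: @{[ link_candidates($old, $text) ]}\n";
print "new: @{[ link_candidates($new, $text) ]}\n";
```

With the old regex only DataCenterEnergyEfficiency qualifies; with the new one TUI3004B qualifies as well, because it directly follows the opening parenthesis.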

-- PeterThoeny - 02 Apr 2007

If I recall correctly, the agreement on a release meeting was that the benefits of the spec change outweigh its drawbacks (which were known and discussed then).

First of all, the occurrence of an occasional unwanted link is mostly harmless, and easy to fix. I've found that today's marketing speech, having created so many CamelCase words, is a much greater source of "unwanted link annoyance" than "words" with digits in them.

Let's take the notorious I18N as an example:

  • Before the spec change, you'd have to write [[I18N]] if you wanted to link to an existing topic, after the spec change a simple I18N is fine.
  • Before the spec change, you could write I18N and avoid pointing to a nonexistent topic; after the spec change, !I18N is required.

From its very nature, I18N is close to REST or Y2K: It is a TLA, or a FLA to be precise. Note what are links in the previous sentence, and what aren't! But I18N unfortunately was a non-acronym before the spec change, and after the spec change it continues to be a non-acronym in TWiki's acronym detecting sense.

Autolinking to acronyms is, in my opinion, a good idea, because it can help to understand topics written by geeks who don't bother to explain their abbreviations (which is quite ok if all the readers are geeks, too). But it only helps if there are acronym topics, and in my TWikis I create them whenever I see fit. And apparently Richard Donkin has done exactly that for I18N.

Things like PERL5LIB are rare, and are best written as 'code'. Again, they are not harmful for a reader even if they produce an unwanted link.

In hindsight it may be that "better" spec changes would have been possible, which have not been considered at all.

-- HaraldJoerg - 02 Apr 2007

Just for the record: the spec change was processed through the formal release process (EdinburghReleaseMeeting2006x07x18) and no one voted against it. The main concern back then was unit test cases.

I personally did the implementation because I was convinced that the benefits were greater than the trouble. I was the main driver for this feature.

Knowing the actual effect it had, I would have the opposite opinion now, because it turned out to cause more trouble than I had expected. For example, Motorola kit numbers (which look like XYZ1234A) now become unwanted wiki links that we have to escape with !

But now that the feature has been released and used I do not want to revert it. I do want to learn from the experience, though. The lesson is that you have to be very critical of spec changes that affect existing topics, and especially of changes that affect "normal" text. You can introduce a %VERYSPECIALVARIABLE{"oddvalue" funnyfoo="sillyvalue"}% and get away with it with no problems. But changing specs so that normal business text (including what looks like part numbers, software variables, function names etc.) is interpreted differently is something you really have to avoid.

If this were 1999 and I were part of defining how TML should work, I would have avoided the WikiWord and only accepted links inside [[]].

But TWiki is what TWiki is today and that is fine. But the lesson learned for me is not to extend the wiki word definition further without testing it on actual installations. And this is also why I have been talking against the underscore wikiwords. Syntax within [[]] I have no problem extending.

-- KennethLavrsen - 02 Apr 2007

Topic revision: r19 - 2007-04-02 - KennethLavrsen
 