SEARCH with TQL (TWiki Query Language)
SEARCH currently supports the types keyword, word, and regex. Keyword and word searches are simple to use; regex is powerful, but you need to be a rocket scientist to understand it. In addition, regex search assumes a flat file backend. We need a TWiki Query Language so that we can easily create complex queries à la SQL. TQL is enabled with a type="query" parameter to SEARCH.
Requirements
TQL could be aligned with the syntax of the FormattedSearch format, %IF{}%, and potentially DBCacheContrib.
The initial implementation is "just" a query of form fields. The syntax should be intuitive and extensible. That is, let's plan a syntax that can later also support queries to:
- search in topic title
- search in text body
- search subset of text body, such as:
  - search in H2 headings only
  - search in bullets only
  - search in second column of table #3 only
- query form name
- query form fields
- query of other meta data, such as:
  - parent topic
  - file attachments
Example query in plain English: Show me all topics where:
- form field "Firstname" is "Emma"
- and form field "Lastname" is "Peel"
- and form field "City" starts with "San"
- and form field "Country" is not "Costa Rica"
- and row "2007" of table named "membership" is "paid"
Spec of TQL
Spec of TQL is TBD.
Draft proposal 1:
%SEARCH{"Firstname='Emma' AND Lastname='Peel' AND City~'San .*' AND Country!='Costa Rica'"}%
Draft proposal 2:
%SEARCH{"$formfield(Firstname)='Emma' AND $formfield(Lastname)='Peel' AND $formfield(City)~'San .*' AND $formfield(Country)!='Costa Rica'"}%
Draft proposal 2 is aligned with the FormattedSearch syntax, i.e. it "does not surprise users".
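To make the semantics of Draft proposal 1 concrete, here is a minimal Python sketch that evaluates a flat, AND-only query against one topic's form fields. It is an illustration only, not the Search.pm implementation: the `matches` helper and the dict-of-fields representation are hypothetical, OR, NOT and parentheses are not handled, and `~` is assumed to be a regular expression that must match the whole field value.

```python
import re

# One term of the flat form: Field='value', Field!='value' or Field~'regex'.
TERM = re.compile(r"^\s*(\w+)\s*(!=|=|~)\s*'([^']*)'\s*$")

def matches(query, fields):
    """Return True if every AND-ed term of `query` holds for `fields`.

    Hypothetical helper: splits naively on " AND ", so quoted values
    containing " AND " are not supported.
    """
    for term in query.split(" AND "):
        parsed = TERM.match(term)
        if parsed is None:
            raise ValueError("cannot parse term: " + term)
        name, op, wanted = parsed.groups()
        actual = fields.get(name, "")
        if op == "=":
            ok = actual == wanted
        elif op == "!=":
            ok = actual != wanted
        else:
            # "~": assumed here to be a regex that must match the whole value
            ok = re.fullmatch(wanted, actual) is not None
        if not ok:
            return False
    return True

QUERY = ("Firstname='Emma' AND Lastname='Peel' "
         "AND City~'San .*' AND Country!='Costa Rica'")
emma = {"Firstname": "Emma", "Lastname": "Peel",
        "City": "San Jose", "Country": "USA"}
print(matches(QUERY, emma))
print(matches(QUERY, dict(emma, Country="Costa Rica")))
```

The first call reports a match and the second does not, because the Country term excludes 'Costa Rica'.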
Implementation
The scope of the initial implementation is "just" a query language for form fields. Draft proposal 1 was implemented on 08 May 2007.
--
Contributors: CrawfordCurrie,
PeterThoeny - 08 May 2007
Discussion
This is a follow-up of SimpleFieldQueriesInMETASEARCH. I moved the content below, dated 08 May 2007, from there to here.
--
PeterThoeny - 08 May 2007
I would go so far as to not extend METASEARCH, but to offer all meta search features in SEARCH, and to deprecate (but never remove) METASEARCH.
Adding meta query features to SEARCH is not difficult. It simply means to transform the meta query syntax we agree on to a regular expression search (containing ANDed tokens if needed). So, the only change is in the front-end of search.
For example, this search:
"$formfield(Firstname, Emma) +$formfield(Lastname, Peel) +$formfield(City, San Jose) -$formfield(Country, Costa Rica)" type="keyword"
would result in this regular expression search:
"META\:FIELD.*\"Firstname\".*value=\"Emma\";META\:FIELD.*\"Lastname\".*value=\"Peel\";META\:FIELD.*\"City\".*value=\"San Jose\";!META\:FIELD.*\"Country\".*value=\"Costa Rica\""
--
PeterThoeny - 08 May 2007
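A minimal Python sketch of the front-end translation described in the comment above: keyword-style $formfield tokens go in, and ";"-separated (ANDed) regular-expression tokens come out, with a leading "-" turned into "!" for exclusion. The `to_regex_search` function is hypothetical and the real SEARCH front-end may escape characters differently (the example above also escapes ":" and the quotes); this only shows the shape of the transformation.

```python
import re

# Matches [+|-]$formfield(Name, Value); name and value may contain spaces.
TOKEN = re.compile(r"([+-]?)\$formfield\(\s*([^,()]+?)\s*,\s*([^()]*?)\s*\)")

def to_regex_search(query):
    """Translate $formfield tokens into one ';'-separated regex search string."""
    parts = []
    for sign, name, value in TOKEN.findall(query):
        # Escape name and value so they are matched literally.
        pattern = 'META:FIELD.*"%s".*value="%s"' % (re.escape(name),
                                                   re.escape(value))
        # "-" means "must not match"; "+" and no sign both mean "must match".
        parts.append(("!" if sign == "-" else "") + pattern)
    return ";".join(parts)

query = ("$formfield(Firstname, Emma) +$formfield(Lastname, Peel) "
         "+$formfield(City, San Jose) -$formfield(Country, Costa Rica)")
for token in to_regex_search(query).split(";"):
    print(token)
```

As the reply below points out, this approach assumes the meta-data is stored inline in the topic text and only gives exact matches.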
I'm not going to use regular expression searches for this for several reasons:
- Mixing RE and form field searches doesn't allow enough operators; you only have exact match
- Your suggested implementation implicitly assumes meta-data inline in the topic, which is not a valid assumption
- It's much simpler to use code taken from DBCacheContrib and do a proper SQL-style query
As you say, it's fairly easy to extend Search.pm to do this, and that's what I have done. I have used type="query", which lets you write (for example) %SEARCH{"TOPICPARENT.name='MeetingForm' AND (Date<'1 May 2007' OR TOPICINFO.author='TWikiGuest')"}%. I will be checking in shortly.
Our standard example:
%SEARCH{"Firstname='Emma' AND Lastname='Peel' AND City~'San .*' AND Country!='Costa Rica'"}%
--
CrawfordCurrie - 08 May 2007
I am not hung up on translating to regular expressions; this is just an implementation detail. It is a low-hanging-fruit way to cope with the current implementation. With regex you can do fuzzy matches. At a later point, the search implementation can be changed once we have a DB backend.
On query syntax, I think we need a clear spec and some time to digest this. Specifically, we should strive to make the syntax intuitive and extensible. That is, plan a syntax that later can support also query of:
- search in topic title
- search in text body
- search subset of text body, such as:
  - search in H2 headings only
  - search in bullets only
  - search in second column of table #3 only
- query form name
- query form fields
- query of other meta data, such as:
  - parent topic
  - file attachments
Example query in plain English: Show me all topics where:
- form field "Firstname" is "Emma"
- and form field "Lastname" is "Peel"
- and form field "City" starts with "San"
- and form field "Country" is not "Costa Rica"
- and row "2007" of table named "membership" is "paid"
This does not mean we should implement this now, but we should plan the syntax to allow it to be extended in a way to support above queries.
--
PeterThoeny - 08 May 2007
How about aligning the syntax with the existing %IF{}% syntax?
--
PeterThoeny - 08 May 2007
I just wanted to add that I am very excited about this feature. It strengthens the TWiki project as a wiki application platform, and paves the way for a scalable storage backend. A TWiki Query Language is also a noteworthy feature for TWiki Release 4.2.
--
PeterThoeny - 09 May 2007
To read about the existing (MAIN) implementation of query searching, see
http://develop.twiki.org/~twiki4/cgi-bin/view/TWiki/QuerySearch
Peter, by aligning the syntax, do you mean using $ for accessing TWiki variables? I considered that, but decided against it because $ is an operator in %IF and has a loaded meaning, viz. it accesses TWikiVariables and URLPARAMS, neither of which applies in a query. Otherwise the operator set is the same, except maybe the RE operator; I will check. Any suggestions for a better cross-topic ref syntax (currently Web.TopicName:Field) are welcome; Michael uses @Web.TopicName:Field in DBCachePlugin, but that breaks the parser quite a bit.
--
CrawfordCurrie - 09 May 2007
BTW I understand what you are driving at w.r.t. $formfield, but it would:
- surprise the heck out of any user coming from DBCachePlugin or FormQueryPlugin
  - That represents a small number of users. Both plugins combined represent 1/3rd of a percent of the TWiki downloads (340 vs. 10,000/month). Better to focus on an intuitive and extensible syntax, and not on the existing syntax of these plugins. -- PTh
- be quite alien to anyone used to SQL or any other query language I know of.
- be unreadable; readability is really important, and we shouldn't sacrifice it.
  - Debatable. I find using unqualified names for form fields ambiguous and confusing. -- PTh
- make it easy to confuse the query language with formatting strings, which are something quite different.
  - Fair point. Although it means there is confusion if the syntax is not aligned: "Now which way was it, do I have to type parent in the search parameter and $parent() in the format parameter, or was it the other way around?" -- PTh
Keep $formfield as a convention for formatting, along with $nop and $percnt and all other such conventions.
I have aligned with the IF syntax (they now share a common parser).
Note that this search is going to be pretty inefficient until someone implements a way to cache meta objects (which is what the DBCacheContrib basically does).
--
CrawfordCurrie - 09 May 2007
I find that Crawford's implementation scores the lowest NerdoMeter points (low is good). I like that the syntax is simple and yet flexible (I can AND, OR and NOT, and use parentheses). The $formfield syntax should be used only for the format in the search to avoid confusing people.
--
KennethLavrsen - 09 May 2007
I think we all agree on AND, OR, NOT and grouping with parentheses. The question is whether we should align the syntax with something TWiki users are already familiar with. If not, should this be aligned with SQL, or something new (like the unfamiliar syntax of DBCachePlugin or FormQueryPlugin; are they aligned)? Millions of people are familiar with the existing TWiki syntax, even more people with SQL...
Regardless of these questions, the current proposal to use form fields without indication that they are form fields is ambiguous and confusing. (It is clear in SQL since you only think in rows, columns and relations, not text with some structure.) The ambiguity stems from the fact that it is unclear that Firstname='Emma' refers to a form field name; it could be a search in body text, a bullet, table or something else. In contrast, $formfield(Firstname)='Emma' is clear and unambiguous. Same for formfield(Firstname)='Emma', or even formfield('Firstname')='Emma'.
Let's compare items that are (potentially) shared by TQL and FormattedSearch. TQL syntax is based on draft proposal 1, but with unambiguous formfields:
| Item | TQL Syntax | TQL Example | Formatted Search Syntax |
| Topic name | topic | topic~'FAQ' | $topic |
| Parent topic | parent | parent='MeetingMinutes' | $parent |
| Date | date | date > '2007-05-01' | $date |
| WikiName | wikiname | wikiname='TWikiGuest' | $wikiname |
| Form name | formname | formname='ExpenseForm' | $formname |
| Form field, named | formfield(name) | formfield(Firstname)='Emma' | $formfield(name) |
| Text body | text | text~'FAQ' | $text |
| Attachment, named | attachment(name) | attachment(photo.jpg).size < '100K' | TBD |
"Hmm, which way was it? Do I have to type $parent in the search parameter and parent in the format parameter, or was it the other way around?"
--
PeterThoeny - 10 May 2007
I think for now it is better not to implement the attachment queries. The spec needs to be thought through. People might want to run queries such as:
- test if there is an attachment called photo.jpg or photo.png
- of all jpg images, return me the name of the latest one.
- show me a list of all attachments that have 'FAQ' in the comment field.
This results in a list of attachments per topic, and requires some formatting action on each attachment. That is, format="" needs to be enhanced as well.
--
PeterThoeny - 10 May 2007
Other questions:
- Besides exact matches (= and !=), should we allow regex matches or wildcard matches, or both? Regex is powerful but geekish, wildcards are less flexible but widely understood.
- Are regex queries possible once we have a DB backend?
- How much should we align the syntax with SQL? Plan for exact subset?
- Should this be aligned with ContentAccessSyntax?
--
PeterThoeny - 10 May 2007
I like Crawford's syntax too.
If I had to choose an alternative that allows extension to any kind of query over any datum in the topic, I would prefer something like form.FirstName='Emma', or better still (to drop the assumption of a single form per topic) FormName.FirstName='Emma'.
This is the alternative I'm proposing, using the same items as Peter's table:
| Item | TQL Syntax | TQL Example |
| Topic name | topic.title | topic.title~'FAQ' |
| Parent topic | topic.parent | topic.parent='MeetingMinutes' |
| Date | date | topic.date > '2007-05-01' |
| Form name | topic.formname | topic.formname='ExpenseForm' |
| Form field, named | topic.form.name or form.name | topic.form.Firstname='Emma' or form.Firstname='Emma' |
| Form field, named by formname | topic.formname.name or formname.name | topic.MyForm.Firstname='Emma' or MyForm.Firstname='Emma' |
| WikiName | (I don't really understand this item...) | |
| Text body | topic.text or text | topic.text~'FAQ' or text~'FAQ' |
| Attachment, named | topic.attachment(name) | topic.attachment(photo.jpg).size < '100K' |
--
RafaelAlvarez - 10 May 2007
DBCachePlugin and FormQueryPlugin both use the DBCacheContrib engine, so yes, their syntaxes are aligned. I want to avoid alienating existing users of these plugins (there are quite a few) by too radical a shift in syntax.
Peter asked:
- Besides exact matches (= and !=), should we allow regex matches or wildcard matches, or both? Regex is powerful but geekish, wildcards are less flexible but widely understood.
  - Yes. type=regex is a fundamental search mode, so regex searches can't be all that scary. The current implementation has the ~ (match) operator. CC
- Are regex queries possible once we have a DB backend?
  - Yes, though they may be rather inefficient in some DBs that only implement the strict CONTAINS. CC
- How much should we align the syntax with SQL? Plan for exact subset?
  - Really this alignment is limited to the contents of the WHERE clause. I have aimed for as much compatibility as I deemed "sensible", i.e. not trying to implement full SQL syntax, but ensuring that a db query can be trivially transformed into SQL. CC
- Should this be aligned with ContentAccessSyntax?
  - That syntax is dead in the water, which is part of the reason I proposed this in the first place. I felt that by gaining experience with this sort of lookup we can learn what is right.
There have been several mentions of SQL, but none of XPath, which makes me think most people haven't heard of it. XPath is recommended as part of the JCR (Java content repository) specification, as is SQL. I chose to go the SQL-like route here for two reasons: (1) I had existing code from DBCacheContrib that implements it and (2) I find XPath expressions horrendously unfriendly. Some aspects of the ContentAccessSyntax are driven from XPath, and are equally unfriendly, but it's hard to think of another way to do it.
Anyway, I have been revisiting some of the design decisions in the code so far, ironing out some difficult concepts that gave us grief in DBCacheContrib, and will check in a revised version shortly. This uses a syntax much closer to what Rafael describes above, and much more consistent with DBCacheContrib.
Later: I checked in, but just the doc; the implementation is still the previous syntax. See http://develop.twiki.org/~twiki4/cgi-bin/view/TWiki/QuerySearch (comments invited).
--
CrawfordCurrie - 10 May 2007
The doc is all updated, and unless there is some feedback it will go as it is implemented. It's running in the Bugs web; visit Bugs:AllOutStandingItemsQuery to experience a real live query!
Setting this as ready for release meeting.
--
CrawfordCurrie - 13 May 2007
If no one objects then I will change this to Consensus Reached tomorrow.
It seems all are happy with how the spec ended up.
--
KennethLavrsen - 19 May 2007
It is a good start. I like the operators a lot: =, !=, ~, <, >, >=, <=, lc(x), uc(x), NOT, AND, OR, ()
I do not want to hold up progress, but my primary concern raised on 10 May 2007 has not been addressed:
The current proposal to use form fields without indication that they are form fields is ambiguous and confusing. (It is clear in SQL since you only think in rows, columns and relations; not text with some structure.) The ambiguity stems from the fact that it is unclear that Firstname='Emma' refers to a form field name; it could be a search in body text, a bullet, table or something else.
There are additional items I would like to see addressed (some were raised previously).
Suggestions:
1. To avoid confusion, align field specifiers with existing terminology of FormattedSearch:
   - formname (instead of form)
   - formfield (instead of name)
   - parent.topic (instead of parent.name)
   - The names used reflect the names in %META. If a user looks into the source of a topic (and many do) they need to be able to make the connection easily, and not have to fight with an arbitrary rename mapping. CC
     - In general I like to avoid exposing implementation details to users. Better to align naming of documented stuff. -- PTh
2. To avoid confusion, always require field specifiers, e.g. disallow a field specifier that implicitly points to a named formfield. Think TOM (topic object model), not rows & columns.
   - formfield.Headgear ~ 'Bowler Hat' (instead of Headgear ~ 'Bowler Hat')
   - Disagree. It makes queries unreadable, and makes them incompatible with DBCachePlugin and FormQueryPlugin. The shortcut syntax is unambiguous, and makes queries much easier to read. CC
     - This shortcut is ambiguous and can hurt down the road when we want to extend the ContentAccessSyntax. This shortcut is aligned with SQL, but TWiki is more of a TOM than a relational database. -- PTh
3. There is only one value for a named formfield, e.g. no need to add name, title, value qualifiers.
   - formfield.Lastname (instead of PersonForm[?name='Lastname'].value)
   - formfield.'Last name' (instead of PersonForm[?name='Last name'].value)
   - At a later point when we support multiple forms per topic:
     - formname.PersonForm.formfield.Lastname (instead of PersonForm[?name='Lastname'].value)
   - Lastname already means PersonForm[?name='Lastname'].value; but you are arguing against this in (2)! The syntax is already designed for multi-form specifiers by using the form name as the context specifier (i.e. PersonForm.Lastname). The code is already there to do this, I just didn't document it because it would be confusing while we only have one form. CC
     - I think you missed the point. The formfield has name, title, value parameters. There is only a need to query the value of a named formfield. That is, formfield.Lastname.name and formfield.Lastname.value are overkill; you only need formfield.Lastname to get at the one interesting item (the value). So, this is an attempt to simplify things and make the TQL more concise. -- PTh
4. Avoid X[?query] syntax if possible. It raises the NerdoMeter factor almost to the level of regular expressions; let's avoid it if possible. This needs more thinking, but here is a first try:
   - formname = 'HistoryForm' AND formfield.Age > 2 (instead of HistoryForm[?name='Age'].value>2)
   - attachments.'purdey.gif' (instead of attachments[?name='purdey.gif'])
   - attachments.'purdey.gif'.comment ~ 'Weekly report' (instead of attachments[?name='purdey.gif'].comment ~ 'Weekly report')
   - I agree that the [? syntax is not the most readable. See my remarks below. CC
5. "Think TOM" (topic object model). At a later time we can add:
   - table[1] -- first table
   - table.ToDo -- table named "ToDo"
   - table.ToDo[1][3] -- 3rd row in first column of table named "ToDo"
   - h2[7] -- seventh H2 heading
   - I designed the syntax (way back in 2003 in the FormQueryPlugin) to support the TOM concept - indeed, this is where the concept comes from. You may have seen that the FormQueryPlugin already supports access to embedded tables using this syntax. BTW this is another reason to keep the syntax as context-free as possible, as context sensitivities always end up making extensions much harder (as seen with the Attrs syntax). CC
6. Plan for CAS (ContentAccessSyntax). That is, once we have CAS we should be able to use the exact same TQL syntax. An intuitive way is to have a prefix followed by TQL:
   - For SEARCH: "parent.formfield.Headgear ~ 'Bowler Hat'"
   - For CAS: TQL:parent.formfield.Headgear returns the value of formfield named "Headgear" of parent topic. (Similar to Proposal 5 of CAS)
   - For CAS, equivalent: TQL:"parent.formfield.Headgear" (needed if there are spaces in the query)
   - For CAS, variable syntax: %TQL{"parent.formfield.Headgear"}%
   - You are teaching your grandmother to suck eggs. CC
     - ?? -- PTh
--
PeterThoeny - 20 May 2007
As we learnt with the FormQueryPlugin, there is a balancing act between a clean, context-free syntax, which risks being nerdy, and shortcuts (such as those described in (4)), which are complex to support and increase the number of things a user has to learn. I considered doing something like you describe, and even coded up a couple of experiments (in the FQP). However I concluded that it imposes the same expectations on users as the [? syntax. Ultimately every user of queries ends up having to select on the basis of something more complex than the name. This is especially prevalent with attachments and tables. For example, attachments[? name ~ 'gif$' && size > 1024] is a typical meme.
Right now I feel that the syntax strikes just about the right balance, and based on careful iterative design and use during the DBCache experience we know it works, is extensible to TOM, and is accessible to end users. As XPath shows (IMHO), extending the range of shortcuts confuses the syntax and risks making it much harder to extend, as well as making it less usable. For this reason I would prefer not to tinker with it.
I'd much rather focus your thoughts on how to support more difficult TOM concepts, such as sectioning the topic. For example, if I have this topic:
---+ Heading
| Fleem | Barge | Clump |
| Rass | Tank | Libbit |
---++ Subheading
| Snot | Crag | Himble |
then is that represented in the TOM as:
- HEADING 'Heading'
- TABLES [ [ 'Fleem', 'Barge', 'Clump' ], [ ...
- 'Subheading'
or as
- HEADINGS [ 'Heading' ] => [ 'Subheading' ...
- TABLES [ [ [ 'Fleem', 'Barge', 'Clump' ], [ ..., [ [ 'Snot', ...
or both? Or should we require tables to be typed (as is the case in FormQueryPlugin)? How do we section paragraphs? Is an explicit <P> to be treated the same as a blank line? Should we be deconstructing HTML? It's not as obvious as it seems; you really have to try it to find out.
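As a way of trying it, here is a minimal Python sketch that builds both views of the example topic at once: a list of sections (each heading with the tables that follow it) and a flat list of all tables in document order. The `parse` function and the dict layout are hypothetical, paragraphs and HTML are ignored, and subheadings are recorded by level rather than nested under their parent.

```python
import re

TOPIC = """---+ Heading
| Fleem | Barge | Clump |
| Rass | Tank | Libbit |
---++ Subheading
| Snot | Crag | Himble |"""

def parse(text):
    """Return (sections, tables) for TWiki-style headings and table rows."""
    sections, tables = [], []
    current_table = None
    for line in text.splitlines():
        heading = re.match(r"---(\++)\s*(.*)", line)
        if heading:
            # A heading starts a new section and ends any open table.
            sections.append({"level": len(heading.group(1)),
                             "title": heading.group(2),
                             "tables": []})
            current_table = None
        elif line.startswith("|"):
            row = [cell.strip() for cell in line.strip().strip("|").split("|")]
            if current_table is None:
                # First row of a new table: register it in both views.
                current_table = []
                tables.append(current_table)
                if sections:
                    sections[-1]["tables"].append(current_table)
            current_table.append(row)
        else:
            current_table = None
    return sections, tables

sections, tables = parse(TOPIC)
for section in sections:
    print(section["level"], section["title"], len(section["tables"]))
print(len(tables))
```

Both views share the same row lists, so a query could address a table either through its section or by its position in the topic.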
On that note there is existing experience from three different sources on this topic:
- The FormQueryPlugin, which extracts Tables
- Sven's sectional editing experiments
- The existing core 'lift-out' parser, which is halfway to building a TOM already
We have to build experience with the TOM before firtling with the query language.
--
CrawfordCurrie - 20 May 2007
To be flexible, probably both. But that needs to be designed very carefully.
On [?...] syntax, the linear first.second.third.etc is much easier to grok than a nerdy first.second[?but.something.else].more. Fast forward some time, and think of a WYSIWYG editor that has a wizard or an autocomplete to compose a TQL in a SEARCH or a CAS. It is much more intuitive to drill down a hierarchy (with autocomplete) than to do the [?exception]. It is also easier to implement.
--
PeterThoeny - 21 May 2007
I have no problem fast-forwarding; but the risk when you fast forward is that you skip something important, and I don't want to do that. It's too easy to add features, much harder to remove them, so I'd rather keep to a well structured, consistent syntax now. We can add shortcuts and featurettes later, as and when the users clamour for them.
--
CrawfordCurrie - 21 May 2007
We spent a lot of time discussing this in the TWiki release meeting last night. See FreetownReleaseMeeting2007x05x21. There were valid concerns expressed about the complexity of the syntax. This is important enough that I'm going to burn some time trying to explain again, point by point. Please read this and try to understand before proposing any more syntax changes.
I'll start by laying out the basic problem.
The problem the syntax has to deal with is that of addressing data that is stored within a meta-data object. Meta-data is organised using two separate collection concepts - hashes (unordered sets indexed by name) and arrays (ordered sets indexed by number). Arrays in meta-data are also indexed by strings, by virtue of indexing on the name field embedded in the hashes stored in the arrays. While this almost-but-not-quite-an-associative-array is a valuable concept in %META, it doesn't scale to any other kind of array (e.g. arrays of paragraphs within a topic, arrays of rows in tables). So we need a syntax which allows arrays to be indexed by any field that may be stored in an array element, but also by an integer index. And we have to do this in a user-friendly, accessible way. This is what the syntax I have designed tries to do.
OK, to the first question: why can't we use the syntax used in formatting expressions?
Well, we could. Formatted search provides three syntax items that support recovery of meta-data from a topic ($parent, $formname and $formfield), and this syntax could be used in recovering meta values. However this approach has a number of major problems:
- It only addresses a subset of the fields in meta. There is no equivalent syntax for addressing attachments, topicinfo etc. New syntax would have to be invented to address these items, and then users would have a reasonable expectation that this invented syntax would be usable in formatted search results, which it isn't.
- The syntax is specifically designed to support recognition within a block of plain text - for example, the $formfield(pet) sat on the $formfield(furniture). As such it is heavily decorated with escapes: the $ and the brackets. When translated to the context of a query expression these escapes are not only unnecessary, they are also ugly and awkward to type, and the extra syntax is pointless too. Why should I have to write $formfield(Age) > 25 AND ( $formfield(Disease)='Schizophrenia' OR $formfield(Disease)='Paranoia' ) when all I want to say is Age > 25 AND ( Disease='Schizophrenia' OR Disease='Paranoia' )?
- The $formfield syntax only supports recovery of the value field in a form entry. There is no way to recover title - or any other field we might add, such as type or author.
So yes, we could use the $formfield syntax, but it isn't a good idea. It's designed for use in formatted text, not queries, and the restrictions it implies are just storing up trouble for the future. BTW I said in the meeting that the formatted text syntax was not well designed; I need to clarify that I meant it is not well designed for this purpose. I have no issues with it in the context of formatting search results.
- Not sure why you bring this up again. Maybe it got lost in the noise, but I retracted the $something() syntax on 10 May 2007 (see above). -- PTh
So how about the geekiness of the [? syntax for array searches?
Yep, it's geeky, no question about it. It can be made somewhat less geeky by use of some plain English ([where instead of [?, which is a good idea) but the fact remains that the query language has to have some way of expressing a search over the arrays stored in the meta. Otherwise queries are totally emasculated, to the point of being useless. Today this really only impacts form fields and attachments, which are both stored in field-indexed arrays, but in future other data will be stored this way (tables, paragraphs etc). If we don't have a mechanism to index these collections, we will be cutting TOM off at the knees before it even gets started. It is not sufficient to only support indexing of a single field name within the array - an approach which is implicit in a syntax such as attachments.'logo.gif'.size - because the assumption that the field exists in all arrays is invalid. Adding a "dot" syntax that implicitly indexes the name is a bad idea because:
- it increases the size of the syntax, makes it context sensitive (probably requiring a total rewrite of the parser), and increases the amount users have to learn; they will still have to use the [where syntax where the dot syntax doesn't work,
- it increases the risk of confusion with existing topic reference syntax,
- it is really unfriendly where complex expressions have to be built,
- we tried it in FQP; it was just too confusing for people, who couldn't understand why arrays and hashes - fundamentally different concepts - had to use the same syntax. It remains as a legacy problem there, and I don't want to import the problem into the core.
So yes, it's geeky, but we need to focus on ways to make it less geeky without destroying it, and not just add arbitrary complexity on top in the mistaken belief that it makes it easier to use.
The [? syntax was chosen because of the obvious relationship to array index syntax - which by universal convention uses square brackets.
OK, what about the use of shortcuts for field references?
I'm still not totally sure what the objection is here, but currently Age is a shortcut for field[where name='Age'].value. The argument is that formfield.Age is a better syntax. Presumably the thinking is that formfield. is an implicit lookup of the name field in the records stored in the formfield array. This is in itself a shortcut; in what way is it a better shortcut? It's more typing, and it immediately creates a problem when multiple forms are attached to a topic. At the moment the Age syntax naturally evolves into PersonForm.Age when multiple forms are attached (in fact this syntax is already supported) whereas formfield.Age requires additional syntax to disambiguate it - unless you converge with the current syntax. Sorry, it just doesn't make sense to me.
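To make the shortcut expansion concrete, here is a minimal Python sketch of how a bare field reference could be resolved against a topic's meta-data. The `meta` dict layout and the `field_value` helper are hypothetical and only mirror the idea that Age stands for the value of the field entry whose name is 'Age', with PersonForm.Age as the form-qualified variant.

```python
def field_value(meta, ref):
    """Expand `Age` or `PersonForm.Age` to the value of the field named Age.

    Returns None if the field is missing or the form qualifier does not
    match the form attached to the topic.
    """
    form, _, name = ref.rpartition(".")
    if form and form != meta["form"]["name"]:
        return None  # the reference names a form this topic does not carry
    for field in meta["fields"]:
        # Equivalent to the long form: field[where name='Age'].value
        if field["name"] == name:
            return field["value"]
    return None

meta = {
    "form": {"name": "PersonForm"},
    "fields": [
        {"name": "Lastname", "title": "Last name", "value": "Peel"},
        {"name": "Age", "title": "Age", "value": "37"},
    ],
}
print(field_value(meta, "Age"))
print(field_value(meta, "PersonForm.Age"))
print(field_value(meta, "OtherForm.Age"))
```

The first two calls return the same value; the third returns None because the topic does not carry a form called OtherForm.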
I know I sound like a cracked record, but the reality is that I didn't arrive at this syntax by chance; it is the result of actual use of queries in a real environment, by real users (and not programmers, as I am constantly accused), and a great deal of hard work. I do not believe that any alternate syntax proposal has any validity whatsoever until it is actually proven to work.
A final point raised was on the support for regular expressions. The assembled team at the release meeting felt that general regular expressions are too geeky (despite their being fully supported in type="regex" SEARCHes) and that the ~ operator should be removed or simplified to a SQL LIKE operator. This has obvious advantages when mapping to SQL queries, so I think we should do it. Because the % signs used in SQL LIKE conflict with the TWiki variable syntax, I propose to use the * wildcard in place of them.
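A minimal Python sketch of what a LIKE-style ~ operator with * wildcards could look like, both for matching in memory and for mapping onto a SQL LIKE pattern. The function names are hypothetical, and the backslash escaping of literal % and _ assumes the target database is told to use backslash as its LIKE escape character.

```python
import re

def like(value, pattern):
    """True if `value` matches `pattern`, where * stands for any text."""
    # Escape each literal piece, then join the pieces with "match anything".
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.fullmatch(regex, value, re.DOTALL) is not None

def to_sql_like(pattern):
    """Map a * wildcard pattern onto a SQL LIKE pattern."""
    # Protect characters that are special to LIKE before mapping * to %.
    escaped = (pattern.replace("\\", "\\\\")
                      .replace("%", "\\%")
                      .replace("_", "\\_"))
    return escaped.replace("*", "%")

print(like("San Jose", "San*"))
print(like("logo.gif", "*.gif"))
print(to_sql_like("San*"))
```

Because every non-wildcard character is matched literally, a pattern such as *.gif does not match logoXgif, which keeps the operator much less geeky than a full regular expression.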
--
CrawfordCurrie - 22 May 2007
Thanks for the clarification.
--
ArthurClemens - 22 May 2007
MichaelDaum made a valuable observation yesterday; he pointed out that META isn't just limited to TOPICINFO, FIELD etc - arbitrary META can be added, and we have already seen the example of META:PREFERENCE. This raises a complexity I hadn't considered; that the top level keywords (form, info etc) cannot be keywords - because there are no equivalent keywords for META:PREFERENCE etc.
This problem isn't easy to solve. You can't just treat any top level name as a META entry type, because then that conflicts with the field name shortcut. We need syntax to either escape field names, or escape META field names. We discussed a range of ideas, but of these I think the best is to use META: explicitly in the name, making it less likely to conflict with form field names. We can retain the existing aliases under the existing rules - there is no conflict there. OK, that's a bit obscure, sorry. But what it means is that if you want to reference a META:PREFERENCE field in a query you have to say this: "META:PREFERENCE[where name='Blah'].value". The : is not an operator, it's part of the name.
I propose to change the operator used for references to / instead of : to avoid any ambiguity.
--
CrawfordCurrie - 23 May 2007
Lynnwood (another experienced DBCache user) did his usual trick of asking a really obvious question - "why do we need to say WHERE?" - that made me think. The reality is that X[1] is semantically equivalent to X[WHERE index=1] - there is no need for a special array index syntax if we assume the existence of an index pseudo-attribute on all hashes stored in arrays. That means that the plain squab suffices as a syntax for all array accesses - they are all associative arrays indexed by queries. We can even quietly ignore the index attribute for now, and simply use [ wherever we used [WHERE, e.g. attachments[name ~ '*.gif' AND size > 1024], attachments[index=3], fields[name ~ 'Accountant*' AND value ~ "*ArthurAnderson*"]. When we come to use the syntax for content access, we might use =# instead of index, or simply imply it.
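A minimal Python sketch of this "every array access is a query" idea: each hash in an array is shown to the predicate with an extra index pseudo-attribute, so positional access and attribute filtering go through the same mechanism. The `select` helper is hypothetical, and whether index counts from 0 or 1 is not settled above; 1-based is assumed here.

```python
def select(items, predicate):
    """Return the entries of `items` for which `predicate` is true.

    Each entry is presented to the predicate with an added `index`
    pseudo-attribute (1-based, an assumption of this sketch).
    """
    return [item
            for position, item in enumerate(items, start=1)
            if predicate(dict(item, index=position))]

attachments = [
    {"name": "purdey.gif", "size": 2048},
    {"name": "logo.gif", "size": 512},
    {"name": "notes.txt", "size": 4096},
]

# attachments[name ~ '*.gif' AND size > 1024]
big_gifs = select(attachments,
                  lambda a: a["name"].endswith(".gif") and a["size"] > 1024)
# attachments[index=3]
third = select(attachments, lambda a: a["index"] == 3)

print([a["name"] for a in big_gifs])
print([a["name"] for a in third])
```

A plain X[3] then needs no special case; it is just shorthand for the predicate index=3.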
--
CrawfordCurrie - 26 May 2007
I actually thought the [something was necessary for the syntax. Having neither [? nor [WHERE but instead plain [ is fine.
I still do not like that I have to put .value, but I can live with it because there is the shortcut syntax, which is very simple, scores much lower on the NerdoMeter, and is what ordinary mortal people will normally use when just making a simple search for form field values.
Does anyone still have objections? Or can we compromise on the syntax this proposal has migrated to now and call it a consensus decision?
--
KennethLavrsen - 26 May 2007
I applaud Crawford for incorporating the feedback of Michael and Lynnwood, but I have not seen my concerns addressed. I will post more detailed feedback later today.
--
PeterThoeny - 28 May 2007
Thanks Crawford for explaining your design decisions, the motives are very clear. It is nice to see the syntax evolve based on feedback from Michael and Lynnwood.
Let us take a step back and look at the options from 10,000 feet above the ground:
1. Use existing
type="regex" SEARCH:
-
Powerful: Very flexible and powerful, you can do almost anything.
-
Ubiquitous: Millions of programmers are familiar with RegularExpressions.
-
Nerdy: You need to be a programmer to understand and use it.
-
Queries are data store dependent
2. Align TQL closely with
SQL:
-
Ubiquitous: Millions of DB admins are immediately familiar with the syntax.
-
Speed: Good performance once we have a DB backend since the queries can be passed through to the DB engine.
-
Speed: Any extension based on regex that goes beyond the SQL LIKE operator results in a speed equivalent to the current regex search since all records need to be scanned.
-
SQL is a query language designed for accessing data organised in tables. TWiki data is not organised in tables, it is organised in a tree. For an example of how difficult it is to use SQL to access a tree-structured DB, look at systems such as ClearQuest, where the SQL required to perform even simple queries is monumental. The syntax I have designed is closely aligned with the SQL WHERE clause already. CC
3. Align TQL closely with
DOM:
first.second[3].third.etc AND some.more
-
Ubiquitous: Millions of web developers are familiar with it. TWiki application developers are most likely to be familiar with DOM, more so than SQL.
-
To be defined: Syntax not clear yet on how to do more powerful queries similar to regex and FQP.
- "DOM" is not a query language, it's a document object model. Many different syntaxes are used for accessing the DOM, depending on the language being used to program using the model. The most common query language used for accessing the DOM is XPath, which is War and Peace to my proposed Janet and John. - CC
Now compare this with a "rolling our own" solution, as done with the current implementation:
4. Enhanced
FormQueryPlugin (
FQP) syntax:
-
Powerful: Very flexible and powerful, supports even queries within queries.
-
Users of FQP are already familiar with syntax (although that represents an estimate of 0.3% of TWiki installs (340 vs 10,000)).
-
Unfamiliar: New syntax to learn for 99.7% of TWiki users. Unlike regex, SQL and DOM, no books to learn from.
-
Nerdy: You need to be a programmer, except for the most basic queries.
I have already pointed out above that "DOM" is not a query language, so reference books do you no good. Every book I have seen on SQL, and every reference I have seen on the web, has to be double and even triple checked, because no two implementations of SQL behave exactly the same. Further, the syntax I have designed is already well aligned with SQL WHERE syntax, and is further powerful enough to act as a general purpose TWiki content access syntax, not just in queries but in general. XPath is the only "standard" contender that can claim the same. CC
Detailed proposal will follow later today.
--
PeterThoeny - 28 May 2007
I am reading and re-reading, but I cannot find where the syntax now stands. Crawford, could you give us a short list of the most current syntax?
--
ArthurClemens - 28 May 2007
I think the latest doc as currently implemented is at
http://develop.twiki.org/~twiki4/cgi-bin/view/TWiki/QuerySearch
--
PeterThoeny - 28 May 2007
No; that server is not being automatically updated. The latest syntax is only available in that topic in subversion.
Here it is.
| X | refers to the field named X. For example, info, META:TOPICMOVED or attachments. |
| X.Y | refers to the subfield Y of the field named X. For example info.date, moved.by, META:TOPICPARENT.name |
| X[query] | refers to all the elements of the array X that match query - for example, attachments[size>1024]. query can be as complex as you like; for example, DocumentForm[name!='Summary' AND value='top secret']. |
| X/Y | refers to the field specified by Y from the topic specified by the value of X. X must evaluate to a topic name, and Y to a query to be applied to that topic. For example, parent.name/(form.name='ExampleForm') will evaluate to true if (1) the topic has a parent, (2) the parent topic has the main form type ExampleForm. |
Examples:
-
attachments[name='purdey.gif' AND author='%WIKINAME%'] - true if there is an attachment called purdey.gif on the topic, attached by me
-
(Firstname='Emma' OR Firstname='John') AND Lastname='Peel' - true for 'Emma Peel' and 'John Peel' but not 'Robert Peel' or 'Emma Thompson'
-
HistoryForm.Age > 2 - true if the topic has a HistoryForm, and the form has a field called Age with a value > 2
-
META:PREFERENCE[name='FaveColour' AND value='Tangerine'] - true if the topic has the given preference setting and value
-
Person/(HeadGear~'*Bowler*' AND attachments[name~'*hat.gif' AND date < d2n('2007-01-01')]) - true if the form attached to the topic has a field called Person that has a value that is the name of a topic, and that topic contains the form ClothesForm, with a field called Headgear, and the value of that field contains the string 'Bowler', and the topic also has at least one attachment that has a name matching *hat.gif and a date before 1st Jan 2007. (Phew!)
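The X/Y reference in the last example can be pictured with a small Python sketch. This is only an illustration of the idea, not the real query engine: the `topics` store, the `follow` helper and the topic names are hypothetical, and `fnmatchcase` stands in for the `~` wildcard match.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory store: topic name -> form field values.
topics = {
    "SteedProfile": {"Person": "JohnSteed"},
    "JohnSteed": {"HeadGear": "Black Bowler hat"},
    "EmmaProfile": {"Person": "EmmaPeel"},
    "EmmaPeel": {"HeadGear": "None"},
}


def follow(topic_name, field, query):
    """Evaluate `field/(query)` for the topic called `topic_name`.

    The value of `field` is taken as the name of another topic, and
    `query` (a callable taking that topic's fields) is applied to it.
    A missing field or a missing target topic yields False.
    """
    target_name = topics.get(topic_name, {}).get(field)
    target = topics.get(target_name)
    if target is None:
        return False
    return bool(query(target))


def wears_bowler(fields):
    # Stands in for HeadGear~'*Bowler*'
    return fnmatchcase(fields.get("HeadGear", ""), "*Bowler*")
```

Here `follow("SteedProfile", "Person", wears_bowler)` corresponds to `Person/(HeadGear~'*Bowler*')` evaluated on the SteedProfile topic.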
There has been a suggestion that I am dragging my feet over this syntax. As well as some
comments above I will point out a few more things:
- Wherever alternatives have been presented above, I have responded promptly above, usually on the same day
- I have taken into account feedback from Peter (alignment with %IF, shortcuts, regex issues), Arthur ([WHERE), Michael (access to non-standard meta-data) and Lynnwood (simplify array access syntax) and redesigned and reimplemented the syntax to address their specific concerns on a continuing basis. I believe that actions speak louder than words.
- I have seen no evidence that anyone has actually attempted to use this feature to assess it. So far feedback seems to be based on opinion rather than experience.
- The documentation I have written is rich with examples; I know it can be improved, but I'm trusting to your abilities to understand the full power of the implementation.
The perception of my dragging my feet may come from the fact that I have been very carefully assessing every proposal for change. Changes have to be assessed on the basis that they don't overcomplicate the syntax (creating burdens that will punish us later, such as context sensitivity), and they don't compromise the ability of the language to be used for general content access in a TOM.
Finally, Peter, I have great difficulty from the above perceiving exactly
what your concerns are. You have made a lot of alternative syntax proposals, but the only specific I can actually find is that the syntax isn't too nerdy. Is that a fair summary of your concerns? If so, then alternative syntax just doesn't cut it; you have to consider how to realign the underlying data model - e.g. to a table structure for
SQL access. Personally when I went down that route I decided it was too confusing for the end user, but perhaps you can see something I missed.
--
CrawfordCurrie - 28 May 2007
Here is an updated proposal that closely aligns itself on a
DOM, e.g. can be extended later to be our TOM. It is aligned as much as possible with Crawford's proposal, but avoids its complexity. It also allows us to extend and use the TOM in a flexible way, such as an auto-complete feature in a souped up
WYSIWYG editor.
Here is a quick primer on the
JavaScript DOM, preparing us for the TQL that works on the TOM.
- JavaScript DOM level 1 examples:
-
document.forms[1].elements[4] - access the fifth element of the second form.
-
document.address.zipcode - equivalent to above, assuming the second form is called "address" and the fifth element has name "zipcode".
-
document.forms.length - number of forms in document.
-
document.forms[1].length - number of elements in the second form.
- Note: DOM level 1 has a limitation where you cannot distinguish tags at the same level that are named the same.
- JavaScript DOM level 2 examples:
-
document.forms['noteForm'].total.value - get the value of the field named 'total' in form named 'noteForm'
-
document.getElementsByTagName('p').item(4) - get the fifth paragraph
-
document.getElementsByTagName('p')[4] - (equivalent to above)
-
document.getElementById('example1').style.color = 'blue' - get an element by ID and set the color to blue
-
document.getChildNodes().item(0).getChildNodes().item(1) - get first child node, and within that the second child node
-
document.childNodes.item(0).childNodes.item(1) - (equivalent to above)
-
document.firstChild.firstChild.nextSibling - (equivalent to above)
Spec of TWiki Query Language (TQL):
- the basic format is a
topic.x.y.z construct, which represents the path through a hierarchy of items.
- the top of the hierarchy is the topic.
- an item can be a:
- named field specifier (such as
form)
- named item by context (such as
FeatureProposalForm)
- named items can be placed in single quotes if they contain spaces (such as
'Postal Code').
- some items are arrays, such as the list of attachments or form fields.
-
X[N] will get the Nth element of an array field. N starts at 0. For example, attachments[4] will get the fifth attachment in the list.
-
X['name'] will get the named element of an array field.
- the TOM is based on MVC (model-view-controller), e.g. you can have multiple views into the data. Assuming a future multiple-forms-per-topic spec,
forms[0].formfields[2], form.City and EmployeeForm.City may all point to the same form field.
- multiple views are only allowed for queries, not for document construction (later phase of TOM)
- TOM hierarchy:
-
topic - top of hierarchy
-
name - name of topic
-
parent - TOPICPARENT
-
info - TOPICINFO
-
author
-
date
-
format
-
version
-
text - main body of topic
- Note: TOM below this point TBD.
-
form - FORM - the main form of the topic
-
name - equivalent to $formname() in formatted search
-
formfields - array of elements
-
length - number of form fields in the form
- each element has:
-
name - name of form field, such as 'City'
-
title - title of form field, such as 'C i t y'
-
value - value of form field, such as 'San Francisco'
-
attachments - FILEATTACHMENT - array of elements
-
length - number of attachments
- each element has:
-
name
-
attr
-
comment
-
path
-
size
-
user
-
rev
-
date
-
comment
-
moved - TOPICMOVED
-
meta - user defined meta data
-
example - META:EXAMPLE
-
name - META:EXAMPLE{ name="..." }
-
value - META:EXAMPLE{ value="..." }
-
etc - META:EXAMPLE{ etc="..." }
- shortcuts:
- in TQL, the leading
topic. can be omitted, e.g. parent.name is identical to topic.parent.name
- for array items, if there are no more qualifiers specified, the "value" parameter is returned, e.g.
form.City is identical to topic.form.City.value
- constructs such as
topic.x.y.z can be combined with operators:
-
= - Left-hand side (LHS) exactly matches the value on the Right-hand side (RHS). Numbers and strings can be compared.
-
!= - Inverse of =.
-
~ - Regular expression match. Use perl regular expressions.
-
< - LHS is less than RHS. If both sides are numbers, the order is numeric. Otherwise it is alphabetic (applies to all comparison operators)
-
> - greater than
-
>= - greater than or equal
-
<= - less than or equal
-
lc(x) - Converts x to lower case. Use for caseless comparisons.
-
uc(x) - Converts x to UPPER CASE. Use for caseless comparisons.
-
NOT - Invert the result of the subquery
-
AND - Combine two subqueries
-
OR - Combine two subqueries
-
() - Bracketed subquery
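The ordering rule stated for the comparison operators (numeric if both sides are numbers, alphabetic otherwise) could be sketched like this in Python. The helper name `less_than` is made up, and treating "is a number" as "parses as a float" is an assumption, not part of the spec.

```python
def less_than(lhs, rhs):
    """Compare two operands the way the '<' operator is described above.

    If both sides parse as numbers the comparison is numeric;
    otherwise both sides are compared as strings.
    """
    try:
        return float(lhs) < float(rhs)
    except (TypeError, ValueError):
        return str(lhs) < str(rhs)
```

The practical consequence is that `'9' < '10'` holds (numeric), while `'9' < '10a'` does not (alphabetic, since one side is not a number).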
Examples:
-
topic.form.formfields[0].name - name of first form field
-
topic.form.formfields['City'].value - value of form field named "City"
-
topic.form.City.value - (same as above)
-
form.City - (same as above)
-
topic.AccountForm.City.value - starting at the top, find an element named "AccountForm" (e.g. form name), and get the value of element named "City" in it (e.g. form field)
-
topic.parent.form.City.value - in parent topic, value of form field named "City"
-
topic.form.Person.value.topic.ClothesForm.Headgear ~ 'Bowler Hat' - true if the form attached to the topic has a field called Person that has a value that is the name of a topic, and that topic contains the form ClothesForm, with a field called Headgear, which has a value that contains the string 'Bowler Hat'
-
(topic.PersonForm.Firstname.value='Emma' OR topic.PersonForm.Firstname.value='John') AND topic.PersonForm.Lastname.value='Peel' - true for 'Emma Peel' and 'John Peel' but not 'Robert Peel' or 'Emma Thompson'
-
(form.Firstname='Emma' OR form.Firstname='John') AND form.Lastname='Peel' - shortcut form of the previous query
-
HistoryForm.Age.value>2 - true if the topic has a HistoryForm, and the form has a field called Age with a value > 2
-
attachments['purdey.gif'] - true if there is an attachment called purdey.gif on the topic
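As a rough illustration of how the path shortcuts above could resolve, here is a Python sketch over a nested dict. It is a sketch under stated assumptions only: the `resolve` helper and the sample topic are hypothetical, bracket subscripts such as formfields[0] or formfields['City'] are not handled, and named-form lookup (AccountForm.City) is left out.

```python
def resolve(topic, path):
    """Resolve a dotted path such as 'form.City' against a nested dict.

    Two shortcuts are applied:
    - a leading 'topic.' is optional;
    - if the path ends on a form field, its 'value' is returned.
    Form fields are looked up by name inside form['formfields'].
    """
    parts = path.split(".")
    if parts[0] == "topic":
        parts = parts[1:]
    node = topic
    for part in parts:
        if isinstance(node, dict) and part in node:
            node = node[part]
        elif isinstance(node, dict) and "formfields" in node:
            # Fall back to a form field with this name.
            found = [f for f in node["formfields"] if f["name"] == part]
            if not found:
                raise KeyError(path)
            node = found[0]
        else:
            raise KeyError(path)
    # Implied '.value' when the path stops at a form field.
    if isinstance(node, dict) and "name" in node and "value" in node:
        node = node["value"]
    return node


# Made-up topic data, for illustration only.
topic = {
    "name": "EmmaPeel",
    "parent": {"name": "Avengers"},
    "form": {
        "name": "PersonForm",
        "formfields": [
            {"name": "Firstname", "title": "First name", "value": "Emma"},
            {"name": "City", "title": "City", "value": "San Francisco"},
        ],
    },
}
```

In this sketch `form.City`, `topic.form.City` and `topic.form.City.value` all yield the same value, which is the "multiple views" behaviour described in the spec.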
It took me some time to come up with this analysis and proposal. I believe it paves the way to have a solid TOM at a later release. I would like to have one and the same syntax for a SEARCH with TQL and for the
ContentAccessSyntax, e.g.
I'd like to avoid the need to create yet another syntax once we implement the CAS. I hope this updated proposal gets some consideration.
Not all questions are solved though with this proposal:
- For a more crisp syntax, should level 1 DOM be abandoned? That is, disallow implicit
topic.AccountForm.City.value? Possibly require an explicit topic.forms['AccountForm'].formfields['City']?
- For a more crisp syntax, should the shortcut to omit the
topic. prefix be abandoned? E.g. always require a topic.something construct?
--
PeterThoeny - 29 May 2007
Well, I guess this discussion isn't leading anywhere unless something of it is actually implemented. As far as I understand it, Crawford has implemented his proposal already, so he is most qualified to define it.
--
FranzJosefGigler - 29 May 2007
This is one of the single most important features we add to TWiki in a long time. Actually since 4.0.0.
It must be a feature which is defined with community consensus and with all aspects accounted for. Both the simple use cases and the advanced.
And once the troll is out of the box - we are stuck with it - so we'd better get it right.
You do not have to implement the code to get the syntax defined. The fact that code has been implemented should not influence the decision of something this important.
And if you really look at the two proposals then they are not really that far apart, which is also why I have been pushing hard on both Crawford and Peter. Crawford for not seriously taking Peter's proposal into account and for not being present at the release meeting, and Peter for being too slow to follow up (waiting a full week is not promoting a quick resolution).
It must be possible to get a consensus on this one when - in reality - the two views are not that far apart. We all agree on the feature and its function. The only open issue is the syntax.
An additional view is - there has to be a simple and easy to use shortcut syntax. Normal people know nothing about
SQL and nothing about
DOMs and TOMs. It is voodoo talk to them and the geeky syntax proposed is totally away from what people can cope with. So this is why I keep on insisting that Crawford's original shortcut syntax (or something similar) for simple formfield value queries must be present. Peter suggests a syntax that I also find more logical in the advanced use case but it still leaves too much
NerdoMeter score for the simple case. To normal users a syntax that requires some dot OO-type syntax is very foreign. Normal people do not even know simple
HTML so forget the idea that they know object oriented syntax or
SQL syntax.
But our syntax ALSO needs to cater for the advanced use cases where geeks that are programmers can implement advanced TWiki applications. And here I will not choose sides but just urge Peter and Crawford to get to a good compromise. I would really like to avoid a vote on this one and get Peter and Crawford to agree. Maybe meet on IRC informally in the very near future and discuss things through.
And the already implemented code should not influence the decision.
I have asked two of my super users to review the proposals above and give input also so that it is not just my little single opinion that is heard all the time.
Finally - where are the opinions of the 3 other customer advocates here?
--
KennethLavrsen - 29 May 2007
OK, I have studied Peter's proposal above, and here's what is different:
- He wants implicit associative array indexing. Thus
attachments['fred'] is synonymous with attachments[name='fred']
- I have no objection to this approach, apart from the obvious fact that it is a shortcut and more complex to implement.
- He wants the object model used in queries (and by implication content access) to use a different object model to that encoded in TWiki topics at present; viz, moving formfields under the form hash.
- This is how the DBCacheContrib has always worked (indeed the proposed object model is almost identical to DBCacheContrib's). The reason I stopped doing it this way is the complexity inherent in having to search a TWiki::Meta object in a way for which it is not designed.
- I would like to hear from others on this point, because it significantly complicates the code, as well as breaking the link between what is stored in meta-data, and how it is referenced.
- He has re-introduced regular expressions, which I removed in the interests of compatibility with DBI. I suspect that was not intentional.
- He has dropped the d2n operator. I suspect this was an error of omission, rather than deliberate, as dates cannot be queried without it, as forms carry no type information.
- He wants to equivalence the
., / and [-with-atom operators, and replace them all with dots (trust me on this, Peter, that is the implication of your proposal).
- The DBCacheContrib works this way. The reason I dropped it is it forces the syntax to be context-sensitive; it means you cannot decouple it from the context it is evaluated in. My goal was a context-free syntax, so that queries could be parsed and cached without needing any context information. This significantly improves evaluation performance, and simplifies query transformation e.g. to SQL.
It's actually rather ironic that I started from the
DBCacheContrib syntax, and moved towards a syntax that is more representative of the current design of TWiki topics, precisely because I anticipated resistance to anything coming from the DBCacheContrib/FormQueryPlugin/DBCachePlugin stable. And now here I am being told that my original design was more user-friendly. Perhaps we should just dump this work and ship the
DBCachePlugin instead.
BTW a warning; the
DOM is not a particularly good document model, and we need to be careful not to use it as a beacon. TWiki requires a far more flexible model. This is because of the interleaving of structural elements inherent in TWiki topics - a problem the
DOM doesn't have to deal with, because it is a strict hierarchy.
--
CrawfordCurrie - 29 May 2007

(depressed and demotivated)
A question: How do I test that a topic has an attached form of name "MyForm"?
Besides that, and the use of
/ to denote indirection to another topic (but I can't think of a better alternative besides . and ::), I find Crawford's proposed syntax easier to work with, as it has fewer rules, fewer tokens and is more concise.
The fewer rules you need to remember, the less you need to type and the less you need to parse to understand an expression, the more productive you become, and the less time it takes to get there.
--
RafaelAlvarez - 29 May 2007
I have the impression that opinions on the 'geeky factor' are solely based on gut feelings, so I am very interested in the responses of Kenneth's users. Not to forget that Crawford has also been working with real users.
I do have a number of years experience with Javascript and ActionScript (both ECMA). It is not great for queries. That is why xpath has now been integrated in ActionScript 3.
DOM syntax looks easier for single values, but it often gets very lengthy to the point of illegibility - each path has to be repeated. That is why Peter suggests to simplify
(topic.PersonForm.Firstname.value='Emma' OR topic.PersonForm.Firstname.value='John') AND topic.PersonForm.Lastname.value='Peel'
to
(form.Firstname='Emma' OR form.Firstname='John') AND form.Lastname='Peel'
which is essentially Crawford's
(Firstname='Emma' OR Firstname='John') AND Lastname='Peel'
More important, queries often are more complex because you want to get a
range of values. With
DOM syntax you will need to loop and store temporary variables to get that.
For example if I want all uploaded attachments from this week (note that I want to have the attachments, not just TRUE).
As I understand from the suggested shortcut syntax for
name, is it right that
DocumentForm[name!='Summary' AND value='top secret']
would be shortened to
DocumentForm['Summary' AND value='top secret']
?
--
ArthurClemens - 29 May 2007
I am sorry, but I am retracting all my proposals. My proposals (which I modified a number of times based on feedback) were well intended, with stated goals. Instead of a constructive dialog to iron out a solid spec I see "he wants", "I anticipated resistance to anything coming from..." etc. I do not "want" things my way, I am trying to help define a solid spec that works for all of us in the
TWikiCommunity. Community spirit is more important to me, hence I am retracting all my proposals on this topic. (Even if it means that we might miss details that will come after us later on, such as limitations in
WYSIWYG implementation, or one TQL for SEARCH and a similar but not identical one for
ContentAccessSyntax.)
I am refraining from further feedback, please consider Crawford's syntax a "consensus reached" proposal once other members give the OK.
--
PeterThoeny - 29 May 2007
Rafael; you just say "MyForm". The reference to the form returns an array of the fields, which is interpreted as a "true" value in the context of a query. Similarly, if you want to know if there is
any form attached to the topic, then simply saying "form" should suffice.
Peter, many thanks for all your input on this topic; you are the only person to put forward really constructive alternatives, and I have unashamedly worked your input into the syntax and semantics. The query engine was designed from day one as a general purpose content-access engine, so I am comfortable with its use in that context.
--
CrawfordCurrie - 30 May 2007
Hmmm... While I normally don't comment on topics such as this one (I don't feel qualified), I'm a bit disappointed to see that Peter's proposal has been withdrawn, and feel that I'm a bit too late with this commentary. This is an interesting topic, and as Kenneth said earlier,
This is one of the single most important features we add to TWiki in a long time. Actually since 4.0.0. It must be a feature which is defined with community consensus and with all aspects accounted for. Both the simple use cases and the advanced.
And to be fair, I haven't witnessed this much static since Meredith left the project, coming up on a year ago. Thankfully, things have been very collaborative and rolling along smoothly since then !!
So what that means, to me, is that the only reason why this kind of sensitive debate could enter into our happy world is because the subject being discussed truly is important. And in that case, the debate is good. It is healthy, and we should not be disappointed or upset if talking things through is difficult or if it takes a long time. Again, as Kenneth said earlier,
once the troll is out of the box - we are stuck with it - so we'd better get it right.
So anyway, now for my comment which appears to be just a tad too late (but I hope it is still useful): While I can't speak to the intricacies of the code, I do see the value in the syntax that Peter has proposed. Surely, it may land higher on the
NerdOMeter, but sometimes you have to do that in order to accomplish the underlying architecture that is needed to build a
framework.
Having read a very thick book on Javascript, intended for programmers, a very long time ago, I could immediately identify with the TOM hierarchy that Peter proposed. And I could see its usefulness & validity. The only strike against it: difficult for end users.
So I'd like to ask the best question to ask when choosing between difficult choices: is it possible to have
BOTH ? Can we somehow get both the benefits of Crawford's simple syntax
and Peter's well thought-out TOM ?
Remember, we're building for the long term here. And once the troll is out of the box - we are stuck with it - so we do want to get it right.
--
KeithHelfrich - 01 Jun 2007
I am now, by implication, and without any evidence, being told that my design is not well thought out, and being painted as the "bad guy" in this discussion; something I find extremely insulting, and cannot leave unanswered.
I have tried hard throughout the discussion above to quickly address each and every one of the feedbacks, and integrated several of them into the design, as I noted above. The final work is not mine alone, it is a product of all the people who contributed here. But I
did contribute the bulk of the design, and
all of the implementation.
Let me repeat, again. My proposal was designed
from day one to be a
ContentAccessSyntax in context of a
TopicObjectModel - for goodness sakes, I
invented the first full
TopicObjectModel, the
DBCacheContrib. As I noted above, Peter (unconsciously, I am sure) recycled much of my own previous implementations back at me, and (again unconsciously, I am sure) re-made some of the same mistakes I had made in that first pass. Not his fault - it's extremely difficult to think through all the aspects of this problem in isolation from an implementation. That's the main reason I implemented the first proposal so fast - so people could gain
experience with it.
Of course there is a tremendous danger in defining a
ContentAccessSyntax in isolation, without also considering the
TopicObjectModel. I have tried very hard to map onto the
existing data model, in the interests of maintaining reverse compatibility, while taking into account all the experience gained from DBCacheContrib and the various attempts at sectional editing. Most recently my own
EditRowPlugin has a generic object model for tables.
Please, let us move this discussion on. I would be
delighted to receive constructive feedback on the type="query" implementation, based on experience - indeed, it is very much in your interests to find and point out any holes
before it goes into production. We have a grace period before 4.2, during which we can revert my code at any point. At the same time, let us bring the
TopicObjectModel back into focus, and concentrate our energies into execution in that domain.
--
CrawfordCurrie - 01 Jun 2007
Crawford. Look at the bright side.
You have suggested and implemented the
most important feature since 4.0.0. That is a clear-cut compliment. Just a month ago some were concerned whether we had enough meat for a 4.2.0. Now we have!
Keith: I would strongly advise against implementing two parallel syntaxes. Peter's and Crawford's (the advanced case) syntaxes are not as far apart as this discussion may indicate. Having two parallel syntaxes will
- confuse the hell out of users
- make it a nightmare to maintain the code in future
- make it a nightmare to get proper test coverage
Some like the mother and some the daughter. Now that Peter has withdrawn his proposal I encourage people to carefully study the documentation of the new feature that Crawford has checked in and which has improved a lot since the first proposal.
Since the bugs web is currently not updating itself from
SVN and not all of you are developers with an
SVN checkout - you can see the current
SVN version of the new documentation that describes this new feature at
http://merlin.lavrsen.dk/twiki/bin/view/TWiki/QuerySearch
. To try and converge towards a final and agreed solution please propose any additional ideas as modifications/improvements to this.
You are also welcome to register at http://merlin.lavrsen.dk/twiki/bin/view/Main/WebHome
and play all you want in any web you want. Then you can play with the feature. There are plenty of topics in the Motion and PWC webs that you can use for searches. This server is there for testing and all the webs inside are available for playing. It updates at half past every hour, when for a few minutes you will see an error message. Then just wait two minutes and try again.
--
KennethLavrsen - 01 Jun 2007
I have now tested the type="query" search in practice.
And I went for the simple use case first. I like the simple syntax. Much more logical than doing the same with the regex meta hack we have been used to.
The test I am doing is replacing some formatted searches where I only search for one value in one field.
But the new search has a severe performance problem. I have not timed it very accurately. Just counting the seconds. My meta regex searches take around 7 seconds. The TQL search takes 12 seconds. Almost double. That needs to get optimized. There is no good reason why such a simple search should be half the speed.
Take a look for yourself.
I tested with the crond stopped. I repeated tests at least 10 times for each page. Back and forth. There is no doubt. The new search has a performance problem which we need to resolve before we can release. Opened a bug report on this
Bugs:Item4178
.
Let me know if I can help with analysis or debug.
--
KennethLavrsen - 01 Jun 2007
"There is no good reason why such a simple search should be half the speed" - actually, there is. I spent all the time I had planned to spend on optimisation answering alternative syntax proposals, as it was clear that getting the syntax right was a greater issue than performance of a (potentially broken) implementation. At the moment the implementation is based on a linear, brute-force search through all the topics. I did some experiments with using RE's to accelerate it, but decided that RE optimisation wasn't practical in the time I had available.
There are three possible approaches I have been considering:
- "Winnowing" the set of topics that have to be brute-force searched by extracting and applying static RE's derived from the query
- A DBCache based search engine (raises the requirements on the install, but up to 10X faster than an RE search)
- A DBI store, which can be searched using SQL
I'd be delighted if someone was interested in implementing one of these alternatives, so I could address another.
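For the first of these, a minimal Python sketch of what "winnowing" might look like for the simplest case, a single field='value' test. This is an illustration only, not the implementation: the helper names are made up, and it assumes form fields are stored in the raw topic text as one-line %META:FIELD{name="..." ... value="..."}% entries.

```python
import re


def winnow_pattern(field, value):
    """Build a static regex that any topic matching field='value' must contain.

    Assumes each form field is stored on one line of the topic text as
    %META:FIELD{name="City" ... value="Paris"}%.
    """
    return re.compile(
        r'^%META:FIELD\{name="' + re.escape(field) + r'"[^\n]*\bvalue="'
        + re.escape(value) + r'"',
        re.MULTILINE,
    )


def search(raw_topics, field, value, full_match):
    """Return the names of topics that satisfy the query.

    raw_topics: dict of topic name -> raw topic text.
    full_match: the expensive, exact evaluator; it is only called on
    topics that survive the cheap regex pass.
    """
    pattern = winnow_pattern(field, value)
    candidates = [n for n, text in raw_topics.items() if pattern.search(text)]
    return [n for n in candidates if full_match(raw_topics[n])]
```

The regex pass only has to be a necessary condition: it may let through topics that the full evaluator later rejects, but it must never discard a topic that would match.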
--
CrawfordCurrie - 02 Jun 2007
I am one of the SEARCH "super users" that
KennethLavrsen asked to look at this discussion and provide some feed-back to - he said it was a 10 minute review thing - it took me 30 minutes just to read the discussion... Thanks, pal!
Even though I do advanced things with SEARCH I find that the most important aspect is to make the non-advanced aspects of SEARCH as accessible as possible. This makes me rather indifferent to the advanced aspects discussed above - in most cases in real-life very little is required to solve the needs so in my view the short-cut syntax plus basic operations should be the focus. In my naïve world that can be solved with either of the advanced models behind it.
Having said that I think that the full-blown TOM that Peter describes scares me - it might very well be what should be the advanced side of this, but then it must be carefully hidden using appropriate short-cut notation otherwise the new-comers to SEARCH will barf over the complexity and most likely have a hard time getting a query to work since there would be so many places to misspell a name that debugging a SEARCH would become a nightmare.
I have two examples that I have banged my head against numerous times and as far as my superficial reading of the discussion the first one should be come easy whereas the second one is a bit more unclear to me.
- Excluding topics from a search should be easier. Every so often I run across a case where I would like to have all topics with a specific form attached, but excluding the ones where one of the fields holds just one of the many possible values.
- Nested searches should be easier to control. I think that the FormQueryPlugin has some ability to store a search result and use that result set as the starting point for the nested search - this is something that I need over and over again, not for the re-use of the top search but to make debugging and decomposition of searches comprehensible. (Could be my math-heavy background that thinks in set theory, but that kind of narrowing works for my brain.) Related is the case where you do a top level search and only want something displayed if there is a hit on the nested level - this is not easy for me to control with FormattedSearch today, especially when it comes to results you want to put into a table.
A final thing: becoming more database like when it comes to searching is a good thing in the many cases where you have added some structure in terms of forms to support something, but there are other types of structure that should be easier to get access to as well.
My example for this is a problem that I have faced in a number of incarnations (could be due to low search skills), but I will only describe the most recent one: I wanted to send an email to all the members of a TWiki group. I know that there is core functionality to do this that can be wrapped in a plugin, in which Kenneth and I discovered a bug on Friday 1 Jun 2007, but that is somewhat beside the point I am trying to make...
The point is that the names of the members of a group are listed in a structured manner after the = sign, namely separated by commas. My naïve approach to this was to search for all names after the equal sign, but since you can only get one hit per line in the topic it is impossible to get the names captured one by one.
That sort of structure is used in many places and it is a tremendous pain not to be able to use it in a straightforward manner.
(BTW: my solution to the above problem was to search through all users, then see if their name was on the group topic I was interested in, and then extract their mail address from their topic. This is slow. And counter-intuitive!)
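For illustration only, the comma-separated structure described above is easy to split once the whole line is available, which is exactly what a one-hit-per-line search cannot do. The sketch below works on raw topic text outside of SEARCH; it assumes the member list sits on a bullet line of the form "   * Set GROUP = ...", and the function name group_members is hypothetical.

```python
import re

# Assumption: the group topic holds its members on a bullet line such as
#    * Set GROUP = Main.EmmaPeel, Main.JohnSteed
GROUP_LINE_RE = re.compile(
    r'^[ \t]+\*[ \t]+Set[ \t]+GROUP[ \t]*=[ \t]*(.*)$', re.MULTILINE
)


def group_members(topic_text):
    """Return every name listed after the = sign, one list entry per name."""
    match = GROUP_LINE_RE.search(topic_text)
    if not match:
        return []
    # Split on commas and drop surrounding whitespace and empty entries.
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


sample = (
    "---+ Demo Group\n"
    "   * Set GROUP = Main.EmmaPeel, JohnSteed ,Main.TaraKing\n"
    "   * Set ALLOWTOPICCHANGE = Main.DemoGroup\n"
)
members = group_members(sample)
```

Each extracted name could then be used to look up the mail address on the corresponding user topic, instead of scanning all users.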
Just some thoughts from the trenches - do not know if they are of use in the current debate.
Final remark: I was not too happy after reading the discussion, since I found it to stray from the problem and focus on persons instead - no one mentioned, no one forgotten. Try to avoid that so that you do not end up destroying one of the most powerful tools I have ever come across.
--
TorbenHoffmann - 03 Jun 2007
Thank you Torben for your feedback. Good points. I stated that I would no longer comment on this topic, but since you are new to this I will. I removed the overly complex construct from my latest proposal. The only reason it was there was to make it compatible with Crawford's complex syntax (my personal preference is KISS and extensible.)
One more thing: We still do not have a spec on this topic. At the beginning of this topic we have "Spec of TQL is TBD." The only way to have a meaningful spec discussion is to be able to work on one, collaboratively. A lot of the confusion could be avoided if the spec discussion happens in Codev with the latest spec proposal always clearly visible & updated on top. I have seen the same issue in other proposals: No clearly documented spec, but ever changing code that is supposed to be the spec. What can we do to improve this?
--
PeterThoeny - 04 Jun 2007
Anyone wishing to review the spec of the implemented query language may do so at http://merlin.lavrsen.dk/twiki/bin/view/TWiki/QuerySearch or by reading that same topic in any other MAIN checkout. I avoided duplicating the same documentation here for fear of them getting out of step.
Torben, thanks for the comments. Just to pick up on a couple of things:
- I have tried to make sure the query parser is as generous as possible with reasons why a parse failed, but it's tricky to provide info on why a search doesn't match what you expected.
- I have tools to debug that, but it's not clear how to present that information. Any ideas would be welcome.
- What would be of real value would be a SEARCH wizard - a UI component that helped you generate SEARCHes (and queries, of course) interactively.
- Nested searches should be easier to control - yes. You can nest queries at the moment by embedding a %SEARCH in another %SEARCH, but as you are probably aware there are some limitations with that (the syntax of nested searches is hard work, and the nesting results in two searches with no easy way to optimise). One idea is to name searches, such that they can be used nested in later queries.
- One idea for your "users" problem is to use the FilterPlugin to post-process search results and process lists in the search results.
- You can exclude topics from searches using the excludetopic parameter to %SEARCH. Using a query it's very simple to do the exclusion you describe above ("I would like to have all topics with a specific form attached to it, but excluding the ones where one of the fields holds just one of the many possible values"): %SEARCH{"form.name='MyForm' AND NOT TheFieldName='The Value To Exclude'" type="query"}%
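The idea of naming searches so that later queries can start from an earlier result set, mentioned in the list above, could look roughly like this. It is a sketch under stated assumptions, not a proposal for the actual SEARCH syntax: topics are modelled as plain dicts of form fields, and SearchRegistry, run and within are hypothetical names.

```python
class SearchRegistry:
    """Keeps named result sets so a later search can narrow an earlier one."""

    def __init__(self, topics):
        self._topics = topics  # topic name -> dict of form fields
        self._named = {}       # search name -> set of matching topic names

    def run(self, name, predicate, within=None):
        """Run predicate over all topics, or only over a named earlier result."""
        base = self._named[within] if within else self._topics.keys()
        hits = {t for t in base if predicate(self._topics[t])}
        self._named[name] = hits  # remember the result under its name
        return hits


topics = {
    "ItemOne": {"form": "MyForm", "Status": "Open"},
    "ItemTwo": {"form": "MyForm", "Status": "Closed"},
    "OtherTopic": {"form": "OtherForm", "Status": "Open"},
}

registry = SearchRegistry(topics)
# Top-level search: all topics with a specific form attached.
with_form = registry.run("with_form", lambda f: f.get("form") == "MyForm")
# Nested search: start from the named result and exclude one field value.
still_open = registry.run(
    "still_open", lambda f: f.get("Status") != "Closed", within="with_form"
)
```

Because each step has a name, an intermediate result such as with_form can be inspected on its own, which is the kind of decomposition that helps when debugging a nested search.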
--
CrawfordCurrie - 04 Jun 2007
Does d2n work with time intervals (as from TimeSpecifications)?
--
ArthurClemens - 04 Jun 2007
Good question... TBH, I'm not sure. It just feeds the string to parseTime, so if that function supports intervals, then so does d2n.
--
CrawfordCurrie - 04 Jun 2007
My feedback in FreetownReleaseMeeting2007x06x04:
- Disambiguate: Do not support shortcut Firstname='Emma'; require a qualifier form.Firstname='Emma' so that auto-complete and a wizard can be built.
- KISS: Remove the internal META:KEY and use only key, e.g. instead of supporting META:TOPICINFO.date and info.date, support only info.date
- KISS: Support query of any custom meta, such as %META:DANCE{ name="Tango" ...}% with a meta.dance.name='Tango' syntax
- KISS, beta feedback: Mark the X[query] and X/Y syntax as experimental during beta, with a note that they might get removed in the 4.2 release. Undocument or remove the feature if too complex or if too limiting for future ContentAccessSyntax.
--
PeterThoeny - 04 Jun 2007
My feedback short:
- The shortcut Firstname='Emma' is essential to having a non-geek search for normal people, and I am willing to pay the price for the possible extra work that it can create for a search builder. Many others agree with this view.
- The internal key feature enables searching for meta that is not known. Let us keep it. BUT change the documentation so that the human names are the normal case, and then at the end document the generic feature with examples of both standard meta and special meta. Clean doc work, but it makes a heck of a difference to the users.
The decisions we agreed at FreetownReleaseMeeting2007x06x04:
- Present the current syntax documented, but with the challenged syntax marked as experimental and subject to change. At the beta we can question the users and listen to alternative suggestions.
- Make a customer focused decision based on beta testing feedback. Customers should hopefully carry great weight in shaping our opinions and final decision.
- Give Crawford peace to focus on performance of the new search the next month.
We have a plan. We have consensus.
--
KennethLavrsen - 05 Jun 2007
This feature has been realized in TWiki 4.2.
I have started a cookbook topic in QuerySearchPatternCookbook.
--
ArthurClemens - 05 Feb 2008