[xquery-talk] Regular Expression search

Fri Dec 16 11:00:26 PST 2005

Hi Jason,

That sounds precisely like what we do. Probably just the obvious way to
do it, I guess ;-)

Do you actually have plans to implement the XQuery FTS spec? I wonder if
that will actually take off and be used by, uh, users.

Martin

> Hey John,
> 
> MarkLogic actually uses indexes for wildcard queries.  For example, the 
> original poster's questions about finding things starting with 
> "MyNameIs" could be solved efficiently using a query like this:
> 
> //(subTagA|subTagB)[starts-with(., "MyNameIs")]
> 
> That should execute efficiently against a large data set if the 
> character indexes are enabled.  If the poster instead wanted any word 
> token to start with that sequence of characters (rather than the element 
> itself), he could use the MarkLogic function cts:contains() and the * 
> wildcard:
> 
> //(subTagA|subTagB)[cts:contains(., "MyNameIs*")]
> 
> The cts:* functions operate on tokens rather than simple character 
> sequences, providing search engine style features.  You can see the 
> difference in the previously discussed query to find the token "Name". 
> Using standard XQuery you write this:
> 
> //*[contains(., "Name")]
> 
> But this matches "xName" and "Nameste".  When I search for "foo" I don't 
> want to find "food"!  Using cts:contains() you match just word tokens:
> 
> //*[cts:contains(., "Name")]
> 
> The tokens are broken at index time according to language rules, and you 
> have the option at query time to specify stemming rules (should Names 
> and Naming match?), case sensitivity (is "name" ok?), thesaurus (what 
> about "nom de plume"?), and so on.
> 
> It's fun stuff.  I wrote about this in longer form at:
> http://idealliance.org/proceedings/xtech05/papers/02-04-01/
> 
> -jh-
>
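
The substring-versus-token distinction in the quoted message can be illustrated outside MarkLogic with a small Python sketch. This is only an approximation of the semantics, not how cts:contains() or its indexes work: the helper names are hypothetical, and splitting on runs of word characters with a regular expression stands in for the language-aware tokenization done at index time.

```python
import re


def substring_match(text, needle):
    # Rough analogue of XQuery contains(): plain character-sequence search.
    return needle in text


def tokens(text):
    # Assumption: a token is a run of word characters. Real tokenizers
    # apply language rules; this is a deliberately crude stand-in.
    return re.findall(r"\w+", text)


def token_match(text, needle, case_sensitive=True):
    # Rough analogue of a token search: the needle must equal a whole token.
    toks = tokens(text)
    if not case_sensitive:
        needle = needle.lower()
        toks = [t.lower() for t in toks]
    return needle in toks


def token_prefix_match(text, prefix):
    # Rough analogue of a trailing-* wildcard: some token starts with prefix.
    return any(t.startswith(prefix) for t in tokens(text))


if __name__ == "__main__":
    for sample in ["xName", "Nameste", "My Name is Bob"]:
        print(sample, substring_match(sample, "Name"), token_match(sample, "Name"))
```

With these definitions, the substring search accepts "xName" and "Nameste" while the token search accepts only text in which "Name" appears as a whole word. The case_sensitive flag mimics one of the query-time options mentioned in the quoted message; stemming and thesaurus expansion are not modelled.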