[xquery-talk] Regular Expression search
Martin Probst
martin at x-hive.com
Fri Dec 16 11:00:26 PST 2005
Hi Jason,
that sounds precisely like what we do. Probably just the obvious way to
do it, I guess ;-)
Do you actually have plans to implement the XQuery FTS spec? I wonder if
that will actually take off and be used by, uh, users.
Martin
> Hey John,
>
> MarkLogic actually uses indexes for wildcard queries. For example, the
> original poster's questions about finding things starting with
> "MyNameIs" could be solved efficiently using a query like this:
>
> //(subTagA|subTagB)[starts-with(., "MyNameIs")]
>
> That should execute efficiently against a large data set if the
> character indexes are enabled. If the poster instead wanted any word
> token to start with that sequence of characters (rather than the element
> itself), he could use the MarkLogic function cts:contains() and the *
> wildcard:
>
> //(subTagA|subTagB)[cts:contains(., "MyNameIs*")]
>
> The cts:* functions operate on tokens rather than simple character
> sequences, providing search engine style features. You can see the
> difference in the previously discussed query to find the token "Name".
> Using standard XQuery you write this:
>
> //*[contains(., "Name")]
>
> But this matches "xName" and "Nameste". When I search for "foo" I don't
> want to find "food"! Using cts:contains() you match just word tokens:
>
> //*[cts:contains(., "Name")]
>
> The tokens are broken at index time according to language rules, and you
> have the option at query time to specify stemming rules (should Names
> and Naming match?), case sensitivity (is "name" ok?), thesaurus (what
> about "nom de plume"?), and so on.
>
> It's fun stuff. I wrote about this in longer form at:
> http://idealliance.org/proceedings/xtech05/papers/02-04-01/
>
> -jh-
> _______________________________________________
> talk at xquery.com
> http://xquery.com/mailman/listinfo/talk
>
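
The substring-versus-token distinction in the quoted message can be illustrated with a small Python sketch. This is only an approximation for illustration, not how MarkLogic or cts:contains() is implemented: the helper names substring_match and token_match are hypothetical, tokens are simply runs of word characters (real tokenization follows language rules), only a trailing * wildcard is handled, and stemming and thesaurus lookups are left out.

```python
import re


def substring_match(text: str, term: str) -> bool:
    """Rough analogue of XQuery contains(): plain character-sequence search."""
    return term in text


def token_match(text: str, pattern: str, case_sensitive: bool = True) -> bool:
    """Hypothetical token-based match, loosely modelled on the behaviour
    described for cts:contains(). Assumes tokens are runs of word characters
    and supports only a trailing '*' as a prefix wildcard."""
    if not case_sensitive:
        text, pattern = text.lower(), pattern.lower()
    # Split the text into word tokens (a crude stand-in for index-time tokenization).
    tokens = re.findall(r"\w+", text)
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        # Wildcard: any token may start with the given prefix.
        return any(tok.startswith(prefix) for tok in tokens)
    # No wildcard: the pattern must equal a whole token.
    return pattern in tokens


if __name__ == "__main__":
    for sample in ["xName", "Nameste", "my Name is"]:
        print(sample, substring_match(sample, "Name"), token_match(sample, "Name"))
```

With these definitions the substring search accepts "xName" and "Nameste", while the token search accepts only text in which "Name" appears as a whole word; a pattern such as "MyNameIs*" matches any token beginning with that prefix.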