[xquery-talk] Regular Expression search

Martin Probst martin at x-hive.com
Fri Dec 16 11:05:37 PST 2005


> > Are you sure? It's probably possible for simple cases (e.g. "Foo|Bar"),
> > but for general regular expressions? How would you do that?
> 
> Sleepycat's Berkeley DB XML has what it calls a "substring" index which 
> is currently used to optimise fn:contains(), fn:starts-with() and 
> fn:ends-with(). This works by splitting the content down into sequential 
> three character segments, ie:
> 
> "abccccb" is split into "abc", "bcc", "ccc", "ccb"
> 
> This type of index could be used to optimise regular expressions. If you 
> define a regular expression to match the string above, it might look 
> like this:
> 
> "abc+b"
> 
> From this regular expression, you can see that the keys you need to 
> look up in the container are:
> 
> "abc" & ("bcb" | ("bcc" & "ccb") | ("bcc" & "ccc" & "ccb"))

Interesting. Though of course the real general case for Regular
Expressions is probably just out of reach.
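
To make the quoted idea concrete, here is a minimal sketch in Python. It is
not Berkeley DB XML's implementation or API: the helper names (trigrams,
build_index), the sample documents and the hand-written key lookup are all
assumptions for illustration. It builds a three-character index, evaluates
the key expression quoted above for "abc+b", and then runs the real regular
expression over the surviving candidates only.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of sequential three-character segments of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Map each trigram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index


# Hypothetical sample content; document 3 contains the right keys
# but does not actually match the regular expression.
docs = {
    1: "abccccb",
    2: "xxabcbxx",
    3: "abcxxbcb",
    4: "no match here",
}
index = build_index(docs)


def ids(gram):
    """Look up one key without adding it to the index."""
    return index.get(gram, set())


# "abc" & ("bcb" | ("bcc" & "ccb") | ("bcc" & "ccc" & "ccb")),
# written out by hand for the expression "abc+b".
candidates = ids("abc") & (
    ids("bcb")
    | (ids("bcc") & ids("ccb"))
    | (ids("bcc") & ids("ccc") & ids("ccb"))
)

# The index only narrows the search, so each candidate is re-checked.
pattern = re.compile(r"abc+b")
matches = sorted(d for d in candidates if pattern.search(docs[d]))
```

In this sketch the index rules out document 4 without reading it, while
document 3 survives the key lookup and is only rejected by the final regular
expression check. An expression that contains no fixed three-character run
yields no keys at all, which is one way the general case stays out of reach.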

> > Apart from that, if you need regular expressions to search your XML,
> > there's probably a major problem with your XML design ;-)
> 
> Search and querying are very different. Search is basically for 
> document-centric XML (like XHTML), whereas querying is for data-centric 
> XML (like invoices, etc). If you're using regular expressions for 
> data-centric XML, then I'd say you have a design flaw - but not if you 
> are using them for document-centric XML.

Yes, that's right. But if you're "searching" in the document-centric
sense, then what you want is probably a full text index operating on
tokens (as Jason posted) rather than a regular expression specification
of the content. I tend to think of regular expressions as structured
search for unstructured content ;-)
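
For comparison, a token-based index can be sketched along the same lines.
This is only an illustration in Python, not what Jason posted and not any
product's full text engine: tokenize, build_token_index, search_all and the
sample pages are hypothetical names and data.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lower-case word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_token_index(pages):
    """Map each token to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for token in tokenize(text):
            index[token].add(page_id)
    return index


def search_all(index, query):
    """Return the ids of pages that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample content.
pages = {
    1: "Regular expressions are structured search",
    2: "A full text index operates on tokens",
    3: "Search the full text",
}
token_index = build_token_index(pages)

full_text_hits = search_all(token_index, "full text")
search_hits = search_all(token_index, "Search")
partial_hits = search_all(token_index, "express")
```

Here "full text" finds pages 2 and 3, while "express" finds nothing because
the index only knows whole tokens such as "expressions". That is the
trade-off: the lookup is cheap and natural for document-style searching, but
it cannot describe the shape of the content the way a regular expression can.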

Martin


