[xquery-talk] contains(), matches(), highlight-matches()

Michael Sokolov msokolov at safaribooksonline.com
Sat Jul 6 20:47:18 PDT 2013


On 7/6/13 10:58 AM, Joe Wicentowski wrote:
>    Speaking of implementation-specific features, eXist-db has
> match highlighting, but it's limited to results of index-augmented
> queries (i.e., you have to already have applied a full text, range, or
> ngram index on a qname in the given collection for the highlighting to
> work).  It performs very well thanks to the indexes, but requires
> forethought and/or understanding of indexing that isn't really within
> reach of the student who only understands contains() and matches().  I
> recall MarkLogic has a cts:highlights() function, but again,
> implementation-specific.
>

Nice post, Joe!

It's good to have a pure XQuery highlighting function.  Highlighting 
is how you "explain" to users why their search matched a given 
result, so you always need some way to do it.

I've implemented a highlighting function in Lux (see 
http://luxdb.org/apidocs/lux/functions/Highlight.html) that provides 
highlighting for Lucene query matches, as a Java extension function.  I 
considered providing a functional replacement capability as you have, 
but working strictly in the context of XQuery 1.0 makes that impossible, 
and I decided not to try extending the language for this purpose (as 
MarkLogic did: their highlight method is not strictly a "function", 
since it evaluates special variables in its replacement argument in the 
context of the highlighting operation rather than in the enclosing scope).

From a performance standpoint, early termination is a very useful 
optimization, especially for highlighting, which can often dominate the 
running time of search result pages.  I've found that highlighting large 
documents with potentially many matches can be very slow and unnecessary 
when it's typical to show only the first such match to a user in a 
snippet.  I wonder if you've considered some mechanism for truncating 
processing for large results?  I haven't, but it's something I'm 
thinking about.
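
To make the early-termination idea concrete, here is a minimal sketch. It 
is written in Python rather than XQuery purely for illustration, and it is 
not how Lux, eXist-db, or MarkLogic implement highlighting. The names 
highlight, wrap, and max_matches are hypothetical. It assumes a plain 
case-insensitive substring match and stops scanning once enough matches 
have been wrapped:

```python
import re
from typing import Callable


def highlight(
    text: str,
    term: str,
    wrap: Callable[[str], str] = lambda s: "<mark>" + s + "</mark>",
    max_matches: int = 1,
) -> str:
    """Wrap at most max_matches occurrences of term in text.

    Hypothetical helper: 'wrap' stands in for a functional replacement
    argument, and 'max_matches' is the early-termination cap.
    """
    if max_matches <= 0 or not term:
        return text

    pattern = re.compile(re.escape(term), re.IGNORECASE)
    pieces = []
    pos = 0
    count = 0
    # finditer is lazy, so breaking out of the loop stops the scan:
    # the rest of a large document is never searched.
    for match in pattern.finditer(text):
        pieces.append(text[pos:match.start()])
        pieces.append(wrap(match.group(0)))
        pos = match.end()
        count += 1
        if count >= max_matches:
            break
    # Everything after the last wrapped match is copied through untouched.
    pieces.append(text[pos:])
    return "".join(pieces)
```

With the default cap of 1, only the first match is wrapped, which is all a 
one-snippet result page would need; raising max_matches trades running 
time for more complete highlighting.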

Cheers

-Mike Sokolov

