[xquery-talk] contains(), matches(), highlight-matches()
Michael Sokolov
msokolov at safaribooksonline.com
Sat Jul 6 20:47:18 PDT 2013
On 7/6/13 10:58 AM, Joe Wicentowski wrote:
> Speaking of implementation-specific features, eXist-db has
> match highlighting, but it's limited to results of index-augmented
> queries (i.e., you have to already have applied a full text, range, or
> ngram index on a qname in the given collection for the highlighting to
> work). It performs very well thanks to the indexes, but requires
> forethought and/or understanding of indexing that isn't really within
> reach of the student who only understands contains() and matches(). I
> recall MarkLogic has a cts:highlight() function, but again,
> implementation-specific.
>
Nice post, Joe!
It's good to have a pure XQuery highlighting function. You always need
to be able to use highlighting to "explain" to users why their search
matched some result.
I've implemented a highlighting function in Lux (see
http://luxdb.org/apidocs/lux/functions/Highlight.html) that provides
highlighting for Lucene query matches, as a Java extension function. I
considered providing a functional replacement capability as you have,
but working strictly in the context of XQuery 1.0 makes that impossible,
and I decided not to try extending the language for this purpose (as
MarkLogic did: their highlight method is not strictly a "function" - it
evaluates special variables in its replacement argument in the context
of the highlighting operation rather than in the enclosing scope).
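To make "functional replacement" concrete, here is a minimal sketch of how it could look once function items are available, i.e. in XQuery 3.0 rather than 1.0. The name local:highlight and its signature are hypothetical, chosen for illustration; this is neither Joe's function nor the Lux API. It relies only on the standard fn:analyze-string function.

```xquery
xquery version "3.0";

(: Hypothetical sketch: split $text on $pattern and pass each matched
   substring to a caller-supplied function. Non-matching text is
   returned unchanged as text nodes. :)
declare function local:highlight(
  $text as xs:string,
  $pattern as xs:string,
  $replace as function(xs:string) as node()*
) as node()* {
  for $part in analyze-string($text, $pattern)/*
  return
    if ($part instance of element(fn:match))
    then $replace(string($part))
    else text { string($part) }
};

(: Usage: wrap each match in a <b> element. :)
local:highlight("the quick brown fox", "quick|fox",
  function($m as xs:string) as node()* { <b>{$m}</b> })
```

Because the replacement is an ordinary function item, it is evaluated in the caller's scope, so no special variables are needed. XQuery 1.0 has no function items, which is why this approach is not available there.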
From a performance standpoint, early termination is a very useful
optimization, especially for highlighting, which can often dominate the
running time of search result pages. I've found that highlighting large
documents with potentially many matches can be very slow, and much of
that work is wasted when a snippet typically shows the user only the
first match. I wonder if you've considered some mechanism for truncating
processing for large results? I haven't, but it's something I'm
thinking about.
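One possible shape for such truncation in pure XQuery, offered only as a sketch: restrict highlighting to the first text node that matches and build the snippet from that node alone. The function name local:first-snippet and the snippet element are hypothetical. Whether the processor actually stops scanning after the first hit depends on how lazily it evaluates the [1] predicate, so any speedup is an assumption to verify against a given implementation.

```xquery
xquery version "3.0";

(: Hypothetical sketch: build a snippet from the first matching text
   node only, instead of highlighting the whole document. :)
declare function local:first-snippet(
  $doc as node(),
  $pattern as xs:string
) as element(snippet)? {
  (: The [1] predicate allows a lazy processor to stop after the first
     matching text node; this behavior is implementation-dependent. :)
  let $hit := ($doc//text()[matches(., $pattern)])[1]
  return
    if (empty($hit)) then ()
    else
      <snippet>{
        (: Highlight matches within this one text node only. :)
        for $part in analyze-string($hit, $pattern)/*
        return
          if ($part instance of element(fn:match))
          then <b>{string($part)}</b>
          else text { string($part) }
      }</snippet>
};
```

This highlights every match inside the chosen text node but never touches the rest of the document, so the cost is bounded by the position of the first hit rather than by the total number of matches.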
Cheers
-Mike Sokolov