[xquery-talk] contains(), matches(), highlight-matches()

Joe Wicentowski joewiz at gmail.com
Sat Jul 6 07:58:36 PDT 2013


Hi Christian,

> this is a nice idea for using higher-order functions! A minor remark:
> In the last example in the blog article, it may be advantageous to
> remove the predicate (otherwise, the highlighted texts will be lost as
> soon as the predicate has been processed).

Thank you for pointing this out.  That's what happens when you write a
blog post during the wee hours of the morning!  I've fixed the example
in the blog article.

In the process of fixing the example I realized a subtle limitation of
the highlight-matches() function.  Whereas the boolean matches()
function filters results - returning book elements whose title
children meet the matches condition in the predicate - the
highlight-matches() function creates in-memory copies of the title
elements and returns those, not the parent book element.  Returning
the parent book element with highlighted title children would require
more code, and it would definitely be a larger operation.  Still, for
the purposes of the example, I'll keep the short version (for now).
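
To make the difference concrete, here is a minimal sketch.  The sample
data, the $pattern and $mark variables, and the body of
local:highlight-matches() are all assumptions written for this
message, not a copy of the blog's code.  The first query filters with
matches() and returns the original book elements, the second returns
only in-memory copies of the titles, and the third shows the "more
code" variant that rebuilds each book around its highlighted title.

```xquery
xquery version "3.0";

(: Hypothetical sample data, for illustration only. :)
declare variable $books :=
  <books>
    <book><title>XQuery Basics</title><author>A. Author</author></book>
    <book><title>Learning XSLT</title><author>B. Writer</author></book>
  </books>;

(: Assumed search pattern and highlight callback. :)
declare variable $pattern := "XQuery";
declare variable $mark :=
  function($s as xs:string) as element() { <mark>{ $s }</mark> };

(: Sketch of a highlight function: copies the nodes it is given and
   passes every regex match in their text to $highlight.  The pattern
   must not match the empty string, or analyze-string() raises an
   error. :)
declare function local:highlight-matches(
  $nodes as node()*,
  $pattern as xs:string,
  $highlight as function(xs:string) as item()*
) as item()* {
  for $node in $nodes
  return
    typeswitch ($node)
      case element() return
        element { node-name($node) } {
          $node/@*,
          local:highlight-matches($node/node(), $pattern, $highlight)
        }
      case text() return
        for $segment in analyze-string($node, $pattern)/*
        return
          if ($segment instance of element(fn:match))
          then $highlight(string($segment))
          else text { string($segment) }
      default return $node
};

(
  (: 1. matches() as a filter: the original book elements come back. :)
  <filtered>{ $books/book[matches(title, $pattern)] }</filtered>,

  (: 2. highlight-matches(): only copies of the titles come back. :)
  <copies>{
    local:highlight-matches(
      $books/book/title[matches(., $pattern)], $pattern, $mark)
  }</copies>,

  (: 3. The longer variant: rebuild each matching book, highlighting
        its title and copying the other children unchanged. :)
  <rebuilt>{
    for $book in $books/book[matches(title, $pattern)]
    return
      element { node-name($book) } {
        $book/@*,
        for $child in $book/node()
        return
          if ($child instance of element(title))
          then local:highlight-matches($child, $pattern, $mark)
          else $child
      }
  }</rebuilt>
)
```

In the third query the author element survives next to the highlighted
title, which is what the short version cannot give you.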

> A little excursion: It may come as no surprise that one of the most
> requested features for XQuery Full Text 3.0 is the possibility to
> highlight matches [1], and I believe it would be a good idea to
> introduce higher-order functions to make highlighting more flexible.

The demand definitely makes sense to me, and it's great to hear this
may be part of the final 3.0 spec.  At the same time, the
sophistication of the Full Text facility imposes a pretty hefty set of
requirements on implementations.  So I'm glad there's a way to achieve
basic highlighting with just "plain old" XQuery 3.0 - even if it's
limited to regex matching and doesn't have the full set of linguistic
features at its disposal.

> Qizx also provides such a highlighting function with an optional
> function argument (look for "word-function" [2]). The syntax differs
> a little, as it was developed before HOF were finalized. BaseX also
> has a function for highlighting results, but it accepts no function
> as an argument [3].

Very cool.  Speaking of implementation-specific features, eXist-db has
match highlighting, but it's limited to results of index-augmented
queries (i.e., you have to already have applied a full text, range, or
ngram index on a qname in the given collection for the highlighting to
work).  It performs very well thanks to the indexes, but requires
forethought and/or understanding of indexing that isn't really within
reach of the student who only understands contains() and matches().  I
recall MarkLogic has a cts:highlight() function, but again, that's
implementation-specific.

>> I'm curious: Was a function like this impossible to write in pure
>> XQuery before 3.0's support for fn:analyze-string() and higher order
>> functions?
>
> I can’t think of any straightforward solution without analyze-string()
> (we could write custom string match functions in XQuery, using
> recursive functions and all kinds of XQuery string functions, but the
> resulting solution may get too slow for large text corpora).
> Higher-order functions are probably not mandatory, but they clearly
> help to make the solution more elegant.

This is helpful to know.  Thank you so much for your feedback.
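
For anyone following along, here is a small sketch of the part that
analyze-string() contributes.  The input string, the pattern, and the
$wrap function are made up for illustration.  analyze-string() splits
the input into fn:match and fn:non-match elements, and the function
item decides what to do with each match, which is where the
higher-order piece comes in.

```xquery
xquery version "3.0";

(: Hypothetical highlight callback; any function(xs:string) would do. :)
let $wrap := function($s as xs:string) as element() {
  <mark>{ $s }</mark>
}
(: The children of the result are fn:match and fn:non-match elements,
   in document order. :)
let $segments := analyze-string("The quick brown fox", "qu\w+")/*
return
  <p>{
    for $segment in $segments
    return
      if ($segment instance of element(fn:match))
      then $wrap(string($segment))
      else text { string($segment) }
  }</p>
```

Without higher-order functions the same thing still works if the
<mark> constructor is written directly in the "then" branch, which
fits your point that they are a convenience rather than a requirement.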

Joe


