From joewiz at gmail.com Fri Jul 5 22:17:01 2013
From: joewiz at gmail.com (Joe Wicentowski)
Date: Sat, 6 Jul 2013 01:17:01 -0400
Subject: [xquery-talk] contains(), matches(), highlight-matches()

Hi all,

Today I posted some code [1] with a function for highlighting regex matches in an XML node. I'd appreciate any comments or improvements to the code (via response here, pull requests, or whatever method is best for you).

I also wrote an accompanying post [2] for relative beginners to XQuery. While seasoned programmers probably come up with this function on the first day they learn XQuery, I think it's beyond most programmers who, like me when I began, are learning XQuery as their first language. Yet I think the function is so basic that in my post I ventured to call it XQuery's "missing third function", the first two being contains() and matches().

I'm curious: Was a function like this impossible to write in pure XQuery before 3.0's support for fn:analyze-string() and higher-order functions? Surely, if it had been possible, it would've been in the FunctX library.

Thanks in advance for your insights,
Joe (@joewiz on twitter)

[1] https://gist.github.com/joewiz/5937897
[2] http://joewiz.tumblr.com/post/54729725793/xquerys-missing-third-function

From christian.gruen at gmail.com Sat Jul 6 04:04:39 2013
From: christian.gruen at gmail.com (Christian Grün)
Date: Sat, 6 Jul 2013 13:04:39 +0200
Subject: [xquery-talk] contains(), matches(), highlight-matches()

Hi Joe,

this is a nice idea for using higher-order functions! A minor remark: In the last example in the blog article, it may be advantageous to remove the predicate (otherwise, the highlighted texts will be lost as soon as the predicate has been processed).
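For readers coming from other languages, here is a rough Python analogue of the highlight-matches() idea discussed in this thread. It is not the XQuery code in Joe's gist; it is a sketch assuming Python's xml.etree.ElementTree and re modules (Python's regex dialect differs from XPath's), and the names highlight_matches and wrap_in_hi are hypothetical. The caller-supplied wrap function plays the role of the higher-order function argument, and the result is a highlighted copy rather than a modified input, similar to the in-memory copies described later in the thread.

```python
import copy
import re
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: `wrap` receives the matched substring and returns
# the element that should replace it (the "highlight" function argument).
Wrapper = Callable[[str], ET.Element]


def _split(text: str, regex: "re.Pattern[str]",
           wrap: Wrapper) -> Tuple[Optional[str], List[ET.Element]]:
    """Split `text` into leading plain text plus a list of wrapper elements.
    Text between and after matches is stored in each wrapper's .tail."""
    lead: Optional[str] = None
    pieces: List[ET.Element] = []
    pos = 0
    for m in regex.finditer(text):
        if m.start() == m.end():
            continue  # skip zero-length matches (e.g. from r"x*")
        gap = text[pos:m.start()] or None
        if pieces:
            pieces[-1].tail = gap
        else:
            lead = gap
        pieces.append(wrap(m.group(0)))
        pos = m.end()
    rest = text[pos:] or None
    if pieces:
        pieces[-1].tail = rest
    else:
        lead = rest
    return lead, pieces


def _highlight_in_place(elem: ET.Element, regex: "re.Pattern[str]",
                        wrap: Wrapper) -> None:
    originals = list(elem)  # children that existed before highlighting
    lead, new_children = _split(elem.text or "", regex, wrap)
    elem.text = lead
    for child in originals:
        # Recurse into original children only, never into new wrappers.
        _highlight_in_place(child, regex, wrap)
        tail_lead, tail_pieces = _split(child.tail or "", regex, wrap)
        child.tail = tail_lead
        new_children.append(child)
        new_children.extend(tail_pieces)
    elem[:] = new_children


def highlight_matches(node: ET.Element, pattern: str,
                      wrap: Wrapper) -> ET.Element:
    """Return a highlighted deep copy of `node`; the input is not modified."""
    result = copy.deepcopy(node)
    _highlight_in_place(result, re.compile(pattern), wrap)
    return result


def wrap_in_hi(matched: str) -> ET.Element:
    """Example wrapper (hypothetical): put the match in an <hi> element."""
    hi = ET.Element("hi")
    hi.text = matched
    return hi


if __name__ == "__main__":
    title = ET.fromstring("<title>Green <i>Eggs</i> and Ham</title>")
    out = highlight_matches(title, r"Eggs|Ham", wrap_in_hi)
    print(ET.tostring(out, encoding="unicode"))
```

Because this sketch matches one text node at a time, a phrase split across markup (for example `Gr<b>een</b>`) will not be found. Passing a different wrapper, such as one that builds a link or a span with a class attribute, changes the markup without touching the traversal.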
A little excursion: It may come as no surprise that one of the most requested features for XQuery Full Text 3.0 is the possibility to highlight matches [1], and I believe it would be a good idea to introduce higher-order functions to make highlighting more flexible.

Qizx also provides such a highlighting function with an optional function argument (look for "word-function" [2]). The syntax differ a little it was developed before HOF were finalized. BaseX also has a function for highlighting results, but it allows uses no function as argument [3]. Obviously, both extensions are based on XQuery Full Text and are implementation-defined (and I guess there are various others). One advantage of building on Full Text is its support for linguistic features, such as tokenization and the normalization of diacritics.

> I'm curious: Was a function like this impossible to write in pure
> XQuery before 3.0's support for fn:analyze-string() and higher order
> functions?

I can't think of any straightforward solution without analyze-string() (we could write custom string match functions in XQuery, using recursive functions and all kinds of XQuery string functions, but the resulting solution may get too slow for large text corpora). Higher-order functions are probably not mandatory, but they clearly help to make the solution more elegant.

Thanks,
Christian

[1] http://www.w3.org/TR/xpath-full-text-30-requirements-use-cases/#d3e329
[2] http://www.axyana.com/qizxopen/_distrib/docs/manual/fulltext_extensions.html#d0e2303
[3] http://docs.basex.org/wiki/Full-Text_Module#ft:mark

_______________________________________________
talk at x-query.com
http://x-query.com/mailman/listinfo/talk

From joewiz at gmail.com Sat Jul 6 07:58:36 2013
From: joewiz at gmail.com (Joe Wicentowski)
Date: Sat, 6 Jul 2013 10:58:36 -0400
Subject: [xquery-talk] contains(), matches(), highlight-matches()

Hi Christian,

> this is a nice idea for using higher-order functions! A minor remark:
> In the last example in the blog article, it may be advantageous to
> remove the predicate (otherwise, the highlighted texts will be lost as
> soon as the predicate has been processed).

Thank you for pointing this out. That's what happens when you write a blog post during the wee hours of the morning! I've fixed the example in the blog article.

In the process of fixing the example I realized a subtle limitation of the highlight-matches() function. Whereas the boolean matches() function filters results (returning book elements whose title children meet the matches condition in the predicate), the highlight-matches() function creates in-memory copies of the title elements and returns those, not the parent book element.
Returning the parent book element with highlighted title children would require more code, and it would be a heavier operation. Still, for the purposes of the example, I'll keep the short version (for now).

> A little excursion: It may come as no surprise that one of the most
> requested features for XQuery Full Text 3.0 is the possibility to
> highlight matches [1], and I believe it would be a good idea to
> introduce higher-order functions to make highlighting more flexible.

The demand definitely makes sense to me, and it's great to hear this may be part of the final 3.0 spec. At the same time, the sophistication of the Full Text facility imposes a pretty hefty set of requirements on implementations. So I'm glad there's a way to achieve basic highlighting with just "plain old" XQuery 3.0, even if it's limited to regex matching and lacks the linguistic features of Full Text.

> Qizx also provides such a highlighting function with an optional
> function argument (look for "word-function" [2]). The syntax differ a
> little it was developed before HOF were finalized. BaseX also has a
> function for highlighting results, but it allows uses no function as
> argument [3].

Very cool. Speaking of implementation-specific features, eXist-db has match highlighting, but it's limited to the results of index-augmented queries (i.e., you must already have defined a full-text, range, or ngram index on a QName in the given collection for the highlighting to work). It performs very well thanks to the indexes, but it requires forethought and an understanding of indexing that isn't really within reach of a student who only understands contains() and matches(). I recall MarkLogic has a cts:highlight() function, but again, that is implementation-specific.

>> I'm curious: Was a function like this impossible to write in pure
>> XQuery before 3.0's support for fn:analyze-string() and higher order
>> functions?
>
> I can't think of any straightforward solution without analyze-string()
> (we could write custom string match functions in XQuery, using
> recursive functions and all kinds of XQuery string functions, but the
> resulting solution may get too slow for large text corpora).
> Higher-order functions are probably not mandatory, but they clearly
> help to make the solution more elegant.

This is helpful to know. Thank you so much for your feedback.

Joe

From christian.gruen at gmail.com Sat Jul 6 09:55:41 2013
From: christian.gruen at gmail.com (Christian Grün)
Date: Sat, 6 Jul 2013 18:55:41 +0200
Subject: [xquery-talk] contains(), matches(), highlight-matches()

>> function argument (look for "word-function" [2]). The syntax differ a
>> little it was developed before HOF were finalized. BaseX also has a
>> function for highlighting results, but it allows uses no function as
>> argument [3].

Well, sorry for my broken weekend English. Thanks for giving us some insight into the extensions provided by eXist-db and MarkLogic!

Have a nice day,
Christian

From mike at saxonica.com Sat Jul 6 15:01:24 2013
From: mike at saxonica.com (Michael Kay)
Date: Sat, 6 Jul 2013 23:01:24 +0100
Subject: [xquery-talk] contains(), matches(), highlight-matches()
Message-ID: <19812778-66D6-4D49-AD0B-9923BC20F0B2@saxonica.com>

Saxon's attempt at introducing analyze-string into XQuery (predating the XPath 3.0 fn:analyze-string) used higher-order functions directly:

http://www.saxonica.com/documentation/index.html#!functions/saxon/analyze-string

This in a sense is a much more direct equivalent of XSLT's xsl:analyze-string instruction than the version we ended up with in XPath 3.0. But I think the approach used by the XPath 3.0 fn:analyze-string, generating XML markup that represents the regex match structure, works well for many use cases.
In particular, it can readily capture the subgroups matched by the regex, which the saxon:analyze-string() design handles only in a very clumsy way.

Michael Kay
Saxonica

From liam at w3.org Sat Jul 6 16:42:00 2013
From: liam at w3.org (Liam R E Quin)
Date: Sat, 06 Jul 2013 19:42:00 -0400
Subject: [xquery-talk] Xpath 3.0, XQuery 3.0 test suite released
Message-ID: <1373154120.13861.51.camel@slave.barefootcomputing.com>

The XQuery and XSLT Working Groups have released version 1.0 of the test suite for XQuery 3.0, XPath 3.0, Functions and Operators 3.0, and supporting documents.
If you are working on an XQuery 3 or XPath 3 implementation, we'd love to hear from you: please run the tests and send us the results, so that we know for sure the specs can be implemented.

Note: The test suite is a living document; individual issues can be reported via Bugzilla, and tests are updated as and when problems are fixed.

You can find it at:
http://dev.w3.org/2011/QT3-test-suite/
(scroll down - or read it all! - to get to the release information)

Thanks!

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/

From msokolov at safaribooksonline.com Sat Jul 6 20:47:18 2013
From: msokolov at safaribooksonline.com (Michael Sokolov)
Date: Sat, 06 Jul 2013 23:47:18 -0400
Subject: [xquery-talk] contains(), matches(), highlight-matches()
Message-ID: <51D8E4C6.6030303@safaribooksonline.com>

On 7/6/13 10:58 AM, Joe Wicentowski wrote:
> Speaking of implementation-specific features, eXist-db has
> match highlighting, but it's limited to results of index-augmented
> queries (i.e., you have to already have applied a full text, range, or
> ngram index on a qname in the given collection for the highlighting to
> work). It performs very well thanks to the indexes, but requires
> forethought and/or understanding of indexing that isn't really within
> reach of the student who only understands contains() and matches(). I
> recall MarkLogic has a cts:highlight() function, but again,
> implementation-specific.
>
Nice post, Joe! It's good to have a pure XQuery highlighting function. Highlighting is what lets you "explain" to users why their search matched a given result.

I've implemented a highlighting function in Lux (see http://luxdb.org/apidocs/lux/functions/Highlight.html) that provides highlighting for Lucene query matches, as a Java extension function.
I considered providing a functional replacement capability as you have, but working strictly in the context of XQuery 1.0 makes that impossible, and I decided not to try extending the language for this purpose (as MarkLogic did: their highlight method is not strictly a "function"; it evaluates special variables in its replacement argument in the context of the highlighting operation rather than in the enclosing scope).

From a performance standpoint, early termination is a very useful optimization, especially for highlighting, which can often dominate the running time of search result pages. I've found that highlighting large documents with potentially many matches can be very slow, and the work is wasted when it's typical to show only the first match to a user in a snippet. I wonder if you've considered some mechanism for truncating processing for large results? I haven't built one yet, but it's something I'm thinking about.

Cheers
-Mike Sokolov

From geert.josten at dayon.nl Tue Jul 23 07:10:42 2013
From: geert.josten at dayon.nl (Geert Josten)
Date: Tue, 23 Jul 2013 16:10:42 +0200
Subject: [xquery-talk] [ANN] Call for Abstracts for XML Amsterdam 2013 is open!

Call for Abstracts

XML Amsterdam announces the call for abstracts for the 2013 conference, to be held on October 23rd in De Bazel in Amsterdam. XML Amsterdam is a conference for XML developers worldwide who are looking for the latest and best in development based on XML and related standards.

Our keynote speaker this year will be Norm Walsh from MarkLogic. Norm has chaired numerous W3C and OASIS committees throughout his career. He's also an author and noted speaker, and will be one of the judges on our Program Committee.

Submission

Abstracts should be two paragraphs about the intended topic, submitted via email to program at xmlamsterdam.com. Submissions can be in simple text form in the body of the email, or as an attachment or link if preferred.
Topics must follow the theme of the conference, and should fall into one of the categories listed below. Please also provide your full name, your current title, your current company (with link), a short bio, a personal photo, and a link to a personal site (Twitter, Facebook, blog, ...).

Deadline

The deadline for submissions is Monday, August 26th. Decisions will be made by the Program Committee on September 9th.

Theme

This year's conference theme is: 15-year anniversary of XML

Possible topics include:

 - Archiving
 - Big Data/XML
 - Searching
 - XML Processing
 - (Open) Linked Data
 - Semantic Web
 - XML Validation
 - XML metadata
 - History of XML
 - Future of XML
 - Or anything else (somehow) related to XML.

On behalf of XML Amsterdam 2013 committee
info at xmlamsterdam.com
http://xmlamsterdam.com/

XML Amsterdam is a sister event of XML Prague and XML London