[xquery-talk] XQuery simulation of XSLT 2.0 grouping

David Sewell dsewell at virginia.edu
Sat Oct 22 13:36:07 PDT 2005


Early in his XSLT 2.0 Programmer's Reference, Michael Kay illustrates the
power of XSLT 2.0 with the brief code required to produce a word
frequency list, sorted in descending order of frequency, for all the
words in a document (using Shakespeare's "Othello" in XML as an
example). This is the template that does the work:

  <xsl:template match="/">
    <wordcount>
      <xsl:for-each-group group-by="." select="
            for $w in tokenize(string(.), '\W+') return lower-case($w)">
        <xsl:sort select="count(current-group())" order="descending"/>
        <word word="{current-grouping-key()}" frequency="{count(current-group())}"/>
      </xsl:for-each-group>
    </wordcount>
  </xsl:template>

The following XQuery produces identical output:

  (: Lower-cased tokens of the whole play, in document order. :)
  declare variable $corpus :=
      for $w in tokenize(doc("othello.xml"), '\W+') return lower-case($w);
  (: Each distinct word once. :)
  declare variable $wordList := distinct-values($corpus);
  <wordcount> {
      for $w in $wordList
      (: Scans the entire corpus once per distinct word. :)
      let $freq := count($corpus[. eq $w])
      order by $freq descending
      return <word word="{$w}" frequency="{$freq}"/>
  }</wordcount>

However, on my system the XSLT version takes 1.93 seconds to execute
using Saxon 8.5.1, while the XQuery takes 210 seconds. I realize that
XQuery 1.0 lacks the grouping facilities of XSLT 2.0, but I still have
a couple of questions:

1. Am I overlooking a more efficient way of writing the query?

2. If not, should one assume that this type of XQuery code will
   have to rely on implementation-dependent optimization, or
   possibly on extension functions?
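
My guess at where the time goes, sketched in Python rather than XQuery
so that the two strategies can sit side by side. The function names are
hypothetical, Python's \W only roughly matches the XPath one, and I am
only assuming that a grouping construct such as xsl:for-each-group can
be implemented with a single hashed pass; I have not checked how Saxon
actually does it.

```python
import re
from collections import Counter


def tokens(text):
    # Roughly tokenize(string(.), '\W+') followed by lower-case().
    return [w.lower() for w in re.split(r"\W+", text)]


def freq_by_rescanning(corpus):
    # Mirrors the XQuery above: one full pass over the corpus
    # for every distinct word.
    counts = {}
    for w in dict.fromkeys(corpus):  # distinct values, first-seen order
        counts[w] = sum(1 for t in corpus if t == w)
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


def freq_by_grouping(corpus):
    # Assumed grouping strategy: a single pass that tallies
    # each word in a hash table.
    counts = Counter(corpus)
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


sample = tokens("the cat and the dog and the bird")
assert freq_by_rescanning(sample) == freq_by_grouping(sample)
```

If that guess is right, the cost of the rescanning version grows with
the number of words times the number of distinct words, while the
grouping version grows with the number of words alone.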

David

-- 
David Sewell, Editorial and Technical Manager
Electronic Imprint, The University of Virginia Press
PO Box 400318, Charlottesville, VA 22904-4318 USA
Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
Email: dsewell at virginia.edu   Tel: +1 434 924 9973
Web: http://www.ei.virginia.edu/

