[xquery-talk] XQuery simulation of XSLT 2.0 grouping
David Sewell
dsewell at virginia.edu
Sat Oct 22 13:36:07 PDT 2005
Early in his XSLT 2.0 Programmer's Reference, Michael Kay presents an
example of the power of XSLT 2.0 by giving the brief code required to
produce a word frequency list, sorted in descending order of frequency,
for all the words in a document (using Shakespeare's "Othello" in XML as
an example). This is the template that does the work:
    <xsl:template match="/">
      <wordcount>
        <xsl:for-each-group group-by="." select="
            for $w in tokenize(string(.), '\W+') return lower-case($w)">
          <xsl:sort select="count(current-group())" order="descending"/>
          <word word="{current-grouping-key()}" frequency="{count(current-group())}"/>
        </xsl:for-each-group>
      </wordcount>
    </xsl:template>
The following XQuery produces identical output:
    declare variable $corpus :=
      for $w in tokenize(doc("othello.xml"), '\W+') return lower-case($w);
    declare variable $wordList := distinct-values($corpus);
    <wordcount> {
      for $w in $wordList
      let $freq := count($corpus[. eq $w])
      order by $freq descending
      return <word word="{$w}" frequency="{$freq}"/>
    }</wordcount>
However, on my system the XSLT version takes 1.93 seconds to execute
using Saxon 8.51, while the XQuery takes 210 seconds. I realize that
XQuery 1.0 does not contain the grouping facilities of XSLT 2.0, but
I still have a couple of questions:
1. Am I overlooking a more efficient way of writing the query?
2. If not, should one assume that this type of XQuery code has to
rely on implementation-dependent optimization, possibly through
extension functions?
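
One plausible reading of the timing gap, sketched in Python rather than XQuery so that the cost is explicit. This is only a guess at the cause, not a statement about how Saxon evaluates either version. It assumes the predicate count($corpus[. eq $w]) is evaluated by rescanning the whole corpus once per distinct word, and that a grouping construct can instead build its groups in a single hashed pass. The function names below are hypothetical and exist only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Split on runs of non-word characters and lower-case each token,
    roughly like the tokenize()/lower-case() step in the post."""
    return [w.lower() for w in re.split(r"\W+", text)]


def count_by_rescan(corpus):
    """Mirror of the query as written: for each distinct word,
    scan the entire corpus again to count its occurrences."""
    distinct = list(dict.fromkeys(corpus))  # distinct values, first-seen order
    freqs = [(w, sum(1 for t in corpus if t == w)) for w in distinct]
    # sorted() is stable, so ties keep first-seen order
    return sorted(freqs, key=lambda pair: pair[1], reverse=True)


def count_by_grouping(corpus):
    """Assumed grouping strategy: one pass over the corpus,
    accumulating counts in a hash table keyed by word."""
    counts = Counter(corpus)  # insertion order is first-seen order
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    words = tokenize("The cat and the hat and the end")
    print(count_by_rescan(words))
    print(count_by_grouping(words))
```

Both functions return the same list. For a corpus of n tokens with d distinct words, the first makes about n * d comparisons and the second about n hash lookups. If the assumption holds, that difference alone could account for a large gap on a text the size of a play.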
David
--
David Sewell, Editorial and Technical Manager
Electronic Imprint, The University of Virginia Press
PO Box 400318, Charlottesville, VA 22904-4318 USA
Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
Email: dsewell at virginia.edu Tel: +1 434 924 9973
Web: http://www.ei.virginia.edu/