[xquery-talk] Release of the GCX XQuery Engine

Michael Kay mike at saxonica.com
Sat Feb 3 18:34:10 PST 2007

> About the comparison on:
> http://www.infosys.uni-sb.de/projects/streams/gcx/benchs.php
> Aside from the numbers, this paragraph can be found:
> "As the GCX engine does not support the full XQuery standard, 
> queries were adapted accordingly [...] since GCX does not 
> support attribute access, attributes in the XML stream have 
> been rewritten to subelements"
> What made you think that comparing your implementation to the 
> others was reasonable?
> I'm asking because I see it as comparing apples against 
> oranges. Some of the other products run the queries as is and 
> they implement XQuery, which is quite different from not 
> implementing XQuery and rewriting queries to one's liking.

For an academic project it seems an entirely reasonable thing to do, so long
as you are honest. Academics receive research funding in order to develop
and exploit new ideas, which if successful can be of widespread benefit to
industry and users. To evaluate the effectiveness of this approach (which is
indeed novel, as far as I can see), it seems entirely reasonable to compare
the performance achieved for the subset of queries that it can handle
against the industrial state-of-the-art. If you're doing a research project,
it's entirely legitimate to eliminate complications such as
attributes versus elements, which one can reasonably expect to have no bearing on
the conclusions of the research.
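The attribute-to-subelement rewrite quoted above is the kind of mechanical
transformation that could in principle be automated. As a minimal sketch,
assuming each attribute name="value" simply becomes a child element
<name>value</name> placed before the existing children (the convention
actually used for the benchmarks is not stated), it might look like this in
Python; the function name attributes_to_subelements is hypothetical:

```python
import xml.etree.ElementTree as ET


def attributes_to_subelements(elem: ET.Element) -> None:
    """Recursively replace every attribute with a leading child element.

    Hypothetical convention: name="value" becomes <name>value</name>,
    inserted before the element's original children, in attribute order.
    """
    # Recurse over a copy of the original children first, so the
    # elements created below are never revisited.
    for child in list(elem):
        attributes_to_subelements(child)
    # Turn each attribute into a child element at the front, keeping order.
    for i, (name, value) in enumerate(elem.attrib.items()):
        sub = ET.Element(name)
        sub.text = value
        elem.insert(i, sub)
    # The attributes now live on as subelements, so drop the originals.
    elem.attrib.clear()
```

For example, <item id="i1" featured="yes"><name>Pen</name></item> becomes
<item><id>i1</id><featured>yes</featured><name>Pen</name></item>. A real
rewrite would also have to handle clashes between attribute names and
existing child names, and turn each @name step in the queries into a name
child step.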

They have also been honest in showing that the technique has limitations
when evaluating joins such as Q8, though it can still help to raise the
barrier imposed by memory usage. Not all academic projects are as frank.
(Incidentally, the figures they publish for Saxon are for the open source
version, which has no join optimizer: in fact all the processors they
measured for Q8 show O(n^2) performance. Saxon-SA runs this query on 10MB of
data in 47ms, with O(n log n) scalability.)
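To illustrate the difference in scaling, here is a hypothetical sketch of a
value-based equijoin that counts, for each person id, how many buyer
references point at it. It is not how Saxon-SA's join optimizer actually
works; it only contrasts a naive nested loop (O(n*m) comparisons) with a
sort-then-binary-search strategy (O((n + m) log m)). The function names and
data are illustrative:

```python
from bisect import bisect_left, bisect_right


def nested_loop_join(persons, buyers):
    """Naive join: compare every person id with every buyer reference, O(n*m)."""
    return {p: sum(1 for b in buyers if b == p) for p in persons}


def sorted_join(persons, buyers):
    """Sort the buyer references once, then binary-search per person.

    Sorting costs O(m log m) and each lookup O(log m), so the whole join
    is O((n + m) log m) instead of quadratic.
    """
    ordered = sorted(buyers)
    # The matches for p form a contiguous run [left, right) in the sorted list.
    return {p: bisect_right(ordered, p) - bisect_left(ordered, p) for p in persons}
```

With persons = ["p1", "p2", "p3"] and buyers = ["p2", "p1", "p2"], both
functions return {"p1": 1, "p2": 2, "p3": 0}; only the amount of work grows
differently as the inputs get larger. A hash-based join would achieve similar
gains.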

The important test to apply when assessing whether this technique is
interesting is (a) whether it is possible to determine efficiently the
subset of queries to which it is applicable, and (b) whether the query
rewrites that were done manually on this project could be automated. If the
answer to these questions is yes, then the project provides a valuable
optimization technique that could usefully be incorporated into mainstream
products.

As far as I can see, what they have done is a useful development that takes
further the idea of document projection put forward by Marian and Simeon,
which is already being exploited in industrial products such as DataDirect's
(though not yet in Saxon, regrettably). This work also, incidentally, is
likely to convince any sceptics that a language with restricted expressive
power compared to XSLT is capable of being optimized to a much greater
extent than XSLT is, which helps users in understanding the trade-off
between the two languages.
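For readers unfamiliar with document projection, the core idea is to discard
every part of the input that the query cannot touch. A minimal sketch,
assuming the query's needs have already been reduced to simple absolute
child-axis paths (real projection, as described by Marian and Simeon, derives
the paths from the query automatically, handles descendant steps, and prunes
while parsing so discarded nodes are never built), might be the following;
the function name project is hypothetical:

```python
import xml.etree.ElementTree as ET


def project(root: ET.Element, paths: list[str]) -> ET.Element:
    """Remove elements that lie on no projection path (simplified model).

    An element is kept if its root-to-element path is a prefix of some
    projection path (needed for navigation) or extends one (it lies inside
    a subtree the query may read in full).
    """
    projections = [tuple(p.strip("/").split("/")) for p in paths]

    def keep(path: tuple) -> bool:
        # One of the two paths must be a prefix of the other.
        return any(path[:len(p)] == p or p[:len(path)] == path
                   for p in projections)

    def prune(elem: ET.Element, path: tuple) -> None:
        # Iterate over a copy because children may be removed.
        for child in list(elem):
            child_path = path + (child.tag,)
            if keep(child_path):
                prune(child, child_path)
            else:
                elem.remove(child)

    prune(root, (root.tag,))
    return root
```

Projecting <site><people><person><name>Ann</name><email>a@x</email></person></people><regions><item/></regions></site>
onto the single path /site/people/person/name leaves only
<site><people><person><name>Ann</name></person></people></site>, which is
where the memory savings come from.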

Michael Kay
