[xquery-talk] Release of the GCX XQuery Engine

Stefanie Scherzinger scherzinger at infosys.uni-sb.de
Sun Feb 4 23:15:30 PST 2007


Hi Frans,

thanks for your interest in our XQuery engine, and thanks for your feedback.

> What made you think that comparing your implementation to the others was
> reasonable?

Actually, it is not so easy to get hold of reference implementations,
and we had to make do with what is publicly available. GCX has two
main characteristics: it is an in-memory XQuery engine, and it is geared
towards streaming XQuery evaluation.

The FluXQuery engine is the most natural choice for a reference,
because it is also a main-memory XQuery engine geared towards XML
stream processing, and it implements a very similar XQuery fragment.
There are other streaming research prototypes (e.g. XSM), but their
authors typically have not released them yet.

The other in-memory engines (QizX, Galax, and Saxon) implement more
XQuery features (or even the full language), but they are not geared
towards stream processing. Still, at least their principal architecture
is comparable.

Finally, we chose MonetDB out of pure curiosity about how we would
perform in comparison. As ours is a streaming engine, comparing it
against a secondary-storage implementation, which can exploit index
structures and the like in ways a streaming engine cannot, is actually
unfair to us.

Unfortunately, there are no other streaming XQuery implementations
available to compare against. However, if you know of any suitable
implementations, I would very much appreciate it if you could point us
to them.


> I'm asking because I see it as comparing apples against oranges. Some of the
> other products run the queries as is and they implement XQuery, which is
> quite different from not implementing XQuery and rewriting queries to one's
> liking.

I hope there is no misunderstanding: of course, we first rewrote the
queries as shown on the website, and then ran the same queries on the
same data on each engine.

> Measuring memory usage with top is, as far as I know, generally advised
> against. See:
>
> http://ktown.kde.org/~seli/memory/analysis.html

Thanks for the link. However, when GCX needs only a little more than
1 MB of main memory for the same query where others require over a
hundred MB, then I think a point has been made.
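Since the thread touches on how memory should be measured, here is one alternative to sampling `top`: ask the kernel for the peak resident set size of a finished child process. This is a minimal sketch assuming a Unix-like system with Python's `resource` module; `peak_rss_kib` is a hypothetical helper, and the command passed to it stands in for whichever engine invocation is being measured. RSS still counts shared libraries, so some caveats from the linked article remain, but unlike periodic sampling it cannot miss a short-lived peak.

```python
import resource
import subprocess
import sys


def peak_rss_kib(cmd):
    """Run cmd to completion and return a peak resident set size in KiB.

    Assumes a Unix-like OS. RUSAGE_CHILDREN reports the maximum over *all*
    terminated children waited for so far, so use a fresh measuring
    process per engine if you compare several of them.
    """
    subprocess.run(cmd, check=True)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is in kilobytes on Linux but in bytes on macOS.
    scale = 1024 if sys.platform == "darwin" else 1
    return usage.ru_maxrss // scale


if __name__ == "__main__":
    # Stand-in workload: a child that holds roughly 50 MiB of bytes.
    print(peak_rss_kib([sys.executable, "-c", "x = b'x' * (50 * 1024 * 1024)"]))
```

The stand-in workload above should report at least 51200 KiB, since the child keeps a 50 MiB bytes object resident on top of the interpreter itself.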

> So the short story to why you get such a low memory foot prints is that you
> don't load more of the document than is needed, as told by static
> analysis ("roles")?


There are two key approaches. First, document projection: we try to
load only what may be needed for query evaluation. This, of course,
has been done before (e.g. the Galax people experimented with it,
too). Second, and this is new, the garbage collector removes loaded
data as soon as it is no longer needed. This is done continuously,
and for many queries it works out very nicely, so that only a small
subset of the input is kept in main memory at any moment during
query evaluation.

If you are interested in the internals, you may want to check out
the paper on GCX:
http://www.infosys.uni-sb.de/publications/INFOSYS-TR-2006-13.pdf

Ciao,
Steffi

