[xquery-talk] Release of the GCX XQuery Engine

Mon Feb 5 13:11:25 PST 2007

> As I see it, there are two kinds of "streaming" implementations:
> - pull-based: expressions are evaluated at-need (lazily) when and if
> their results are needed; or
> - push-based: expressions are evaluated eagerly, but sub-parts of the
> results are "pushed" to a "consumer" as they are generated, while
> avoiding creating of complete "reified" sequences, if possible.
> ...
> My impression from your web-page is that GCX is "push-based", so it
> should be comparable to Qexo and Saxon.

GCX evaluates queries lazily, and we regard it as pull-based, in the
sense that the query evaluator is the driving force: it executes the
query strictly sequentially on the buffered data until it has to
"block", either because a new node is required (e.g. when a variable
is bound to the next node in its for-loop) or because a
signOff-statement is encountered.

In both cases, a request is issued to the buffer manager. If data is
required that is not resident in the buffer, the buffer manager keeps
requesting the next token from the input stream until either the data
is available in the buffer or it has become evident that the data does
not exist in the input (e.g. because the input has been exhausted).
Receiving a signOff-statement triggers garbage collection. The reading
of tokens from the input is coupled with projection and role
assignment.

For many queries, this "interleaving" works out nicely and we avoid
buffering the complete input document at once. Instead, we can process
the input incrementally.
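
The interplay between the query evaluator and the buffer manager
described above can be illustrated with a minimal Python sketch. This
is not GCX code: the names BufferManager, next_node, sign_off and
evaluate are hypothetical, tokens stand in for whole nodes, and
projection and role assignment are left out. It only shows the control
flow, in which the evaluator drives, the buffer manager reads input on
demand, and a sign-off frees buffered data.

```python
from collections import deque


class BufferManager:
    """Hypothetical buffer that reads input tokens only on demand."""

    def __init__(self, tokens):
        self._input = iter(tokens)  # input stream, consumed lazily
        self._buffer = deque()      # nodes currently resident in the buffer
        self.tokens_read = 0        # how many tokens were pulled from the input
        self.peak_size = 0          # largest number of nodes buffered at once

    def next_node(self):
        """Return the next buffered node, reading from the input if needed.

        Returns None once the input has been exhausted.
        """
        if not self._buffer:
            token = next(self._input, None)
            if token is None:
                return None
            self.tokens_read += 1
            self._buffer.append(token)
            self.peak_size = max(self.peak_size, len(self._buffer))
        return self._buffer[0]

    def sign_off(self, node):
        """Drop a node the query no longer needs (stand-in for garbage collection)."""
        self._buffer.remove(node)


def evaluate(buffer_manager, predicate):
    """Pull-based loop, roughly: for $x in input where predicate($x) return $x."""
    while True:
        node = buffer_manager.next_node()  # evaluator "blocks" on the buffer manager
        if node is None:                   # input exhausted, nothing more to bind
            break
        if predicate(node):
            yield node
        buffer_manager.sign_off(node)      # loop body is done with this node
```

For example, with `bm = BufferManager(["a", "bb", "c", "dd"])`, calling
`list(evaluate(bm, lambda n: len(n) == 2))` consumes all four tokens,
but `bm.peak_size` stays at 1 because each node is signed off before
the next one is requested.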

One reason why this works well for us is that our simple prototype
does not yet support let-statements, so all variables in the query
only refer to nodes from the input document. It would be interesting
to extend garbage collection so that it also works for the results
computed by let-expressions.

Steffi
