[xquery-talk] General comparisons of speed of xquery vs. xslt

Wed Apr 28 21:36:34 PDT 2004

It's true that dataflow analysis is difficult if the stylesheet makes heavy
use of template rules, though there are processors that attempt it. For
example, from conversations with Jacek Ambrosiak I think Gregor does a lot of
this kind of thing, and I suspect that DataPower's XSLT engine does as well,
though it's a commercial product with no details published.

Of course I could defend my thesis on one level by saying that the
stylesheet author doesn't have to use template rules. Almost every XQuery
construct has a direct equivalent in XSLT (there are some trivial
exceptions, for example the capabilities of "order by" are slightly
different) and if the XSLT author confines himself to the subset of XSLT
that is available in XQuery, then it's obviously true that an XSLT processor
can apply exactly the same optimizations.

However, I think most of the optimizations that are generally useful can be
achieved in XSLT without imposing this constraint. For example, the vast
majority of cases where path expressions don't need to be sorted can be
detected statically without great difficulty. Also, one shouldn't
underestimate the potential of optimizations where decisions are deferred
until run-time, which means they can be made with the benefit of instance
information that's not available statically. 
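
As a rough illustration of the static case, here is a minimal Python sketch of a conservative check for whether a path expression's result needs a final sort into document order. The function name `needs_sort`, the two states, and the rule set are assumptions made for this example, not a description of any particular processor. The path is modelled as a list of XPath axis names evaluated from a single context node, and anything the rules do not recognise is treated as needing a sort.

```python
# Conservative static check: can a path's result be delivered without a
# final sort into document order? The rule set is an illustrative
# assumption and deliberately errs towards "sort needed".

FLAT = "flat"      # ordered, duplicate-free, no node contains another
NESTED = "nested"  # ordered, duplicate-free, but nodes may contain others


def needs_sort(axes):
    """axes: list of XPath axis names, evaluated from one context node."""
    state = FLAT
    for axis in axes:
        if axis == "self":
            continue  # self never changes order or containment
        if axis == "attribute":
            state = FLAT  # attributes of ordered elements stay ordered
        elif axis == "child" and state == FLAT:
            state = FLAT  # children of disjoint subtrees stay ordered
        elif axis in ("descendant", "descendant-or-self") and state == FLAT:
            state = NESTED  # still ordered, but results may now nest
        else:
            return True  # reverse axes, or stepping down from nested nodes
    return False


print(needs_sort(["child", "child"]))               # a/b
print(needs_sort(["child", "descendant"]))          # a/descendant::b
print(needs_sort(["descendant-or-self", "child"]))  # unrewritten //a
```

Under these rules a chain of child steps, optionally ending in one descendant step, is reported as already sorted, while a child step applied after a descendant step is flagged, because a naive nested-loop evaluation can then emit nodes out of document order.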

XSLT's "2 language" architecture does make global optimization a bit
difficult, because the tendency is to engineer the two language
implementations as separate components each of which is handled
independently. That's not a stopper, however, and I think the higher
performance engines depart from this model. And in some ways it makes life
easier, because the run-time architecture for the two languages can be
significantly different.

Streaming in XSLT 1.0 was actually easier than in 2.0; some of the new
facilities we have introduced (which mimic the way XQuery specifies tree
construction) make it more difficult. In 1.0 XPath evaluation typically used
a "pull" stream while XSLT instruction evaluation used a "push" pipeline,
and this worked very well. In XQuery the natural technique is always to
pull, which means you end up doing more tree construction and copying than
is necessary. The optimizer has to do a lot of work to prevent this. The
non-composability of XSLT (XSLT instructions can't be nested inside XPath
expressions) gives it a great advantage here, which still exists in 2.0,
although it's not so clean if people make heavy use of function calls rather
than templates.
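
The pull/push split can be sketched in a few lines of Python. This is only an analogy under stated assumptions: `select_items`, `SerializingReceiver` and `emit_list` are hypothetical names invented here, the generator stands in for XPath evaluation (pull), and the event calls stand in for XSLT instruction evaluation (push). The point it shows is that the push side writes straight to the serializer, so no intermediate result tree is built and copied.

```python
from xml.sax.saxutils import escape


# Pull side: the consumer asks for the next selected item on demand.
def select_items(records, predicate):
    for record in records:
        if predicate(record):
            yield record


# Push side: instructions send events to a receiver. This receiver
# serializes immediately, so no intermediate result tree is built.
class SerializingReceiver:
    def __init__(self):
        self.parts = []

    def start_element(self, name):
        self.parts.append(f"<{name}>")

    def text(self, value):
        self.parts.append(escape(value))

    def end_element(self, name):
        self.parts.append(f"</{name}>")

    def result(self):
        return "".join(self.parts)


def emit_list(items, receiver):
    # Plays the role of a template body: pulls input, pushes output.
    receiver.start_element("ul")
    for item in items:
        receiver.start_element("li")
        receiver.text(item)
        receiver.end_element("li")
    receiver.end_element("ul")


prices = ["5", "12", "30"]
out = SerializingReceiver()
emit_list(select_items(prices, lambda p: int(p) > 10), out)
print(out.result())  # <ul><li>12</li><li>30</li></ul>
```

In a pull-only design, `emit_list` would instead have to return a constructed `ul` node for its caller to consume, which is where the extra tree construction and copying comes from.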

Generally I think that if there is a speed difference between processors
(for example MSXML4's XSLT processor is often reported to be three or four
times faster than the XSLT processor in .NET) then the difference seems to
apply fairly uniformly across a wide range of stylesheets. This suggests to
me that the speed difference is not primarily due to the smartness of the
optimizer, which would give very variable ratios for different stylesheets.
Rather, it has a lot to do with the general tuning of the code and in
particular the efficiency of memory management. For many single-document
transformations the performance is dominated by source document parsing,
tree construction, sorting, and serialization (and stylesheet compilation
if you count that in) and no amount of optimization is going to make a big
impact on that.
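
One way to check where the time goes for a given workload is to time the phases separately. The following Python sketch uses the standard library's `xml.etree.ElementTree` as a stand-in processor; the `timed` and `transform` helpers, the synthetic document and the trivial copy "transformation" are all assumptions made for illustration, and the numbers it prints say nothing about any real XSLT or XQuery engine.

```python
import time
import xml.etree.ElementTree as ET


def timed(label, timings, func, *args):
    # Run func(*args) and record its wall-clock time under label.
    start = time.perf_counter()
    result = func(*args)
    timings[label] = time.perf_counter() - start
    return result


def transform(root):
    # Stand-in "transformation": copy every <item> under a new root.
    out = ET.Element("out")
    for item in root.iter("item"):
        ET.SubElement(out, "item").text = item.text
    return out


# Synthetic source document (hypothetical data, 50000 small elements).
source = "<doc>" + "".join(f"<item>{i}</item>" for i in range(50000)) + "</doc>"

timings = {}
root = timed("parse and build tree", timings, ET.fromstring, source)
result = timed("transform", timings, transform, root)
text = timed("serialize", timings, ET.tostring, result)

for label, seconds in timings.items():
    print(f"{label}: {seconds * 1000:.1f} ms")
```

If parsing and serialization account for most of the total, a smarter optimizer for the middle phase cannot change the overall figure by much.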

Michael Kay

  _____  

From: Daniela Florescu [mailto:danielaf at bea.com] 
Sent: 28 April 2004 18:16
To: Michael Kay
Cc: 'Edward Gillespie'; talk at xquery.com
Subject: Re: [xquery-talk] General comparisons of speed of xquery vs. xslt

(b) with in-memory transformations there is no intrinsic reason why XQuery
should be faster than XSLT

Michael,

I think I disagree with this statement. This might be due to the fact that
I understand XQuery much better than I understand XSLT, but here is
my rationale anyway.

Most XQuery code rewriting rules that we apply in the BEA implementation
require serious dataflow analysis (i.e. how the data flows through
expressions, where the data is coming from and where it is going), similar
in spirit to what all modern compilers do.

Trivial examples of code rewriting rules that require dataflow analysis are
eliminating unnecessary sorts and duplicate elimination, transforming
backwards navigation into forward navigation, introducing parallelism and
asynchronicity, etc., but there are many, many others.

Moreover, we are building a streaming XQuery engine. Of course, not all
queries can be executed in a purely streaming fashion. We use the same
dataflow analysis to detect and minimize the need for materialization,
which is essential for query performance.
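
To make the idea of detecting materialization concrete, here is a deliberately simplified Python sketch. It is not the BEA analysis: the tuple-based expression form, the function names, and the rule "a let-bound variable read more than once must be buffered" are assumptions chosen for illustration, and the sketch ignores loops and variable shadowing, both of which a real analysis would have to handle.

```python
from collections import Counter

# Tiny expression trees as nested tuples (hypothetical form):
#   ("var", name)
#   ("call", function_name, arg, ...)
#   ("let", name, value_expr, body_expr)


def walk(expr, reads, bound):
    # Record every variable read and every let-bound name in expr.
    kind = expr[0]
    if kind == "var":
        reads[expr[1]] += 1
    elif kind == "call":
        for arg in expr[2:]:
            walk(arg, reads, bound)
    elif kind == "let":
        bound.add(expr[1])
        walk(expr[2], reads, bound)
        walk(expr[3], reads, bound)


def must_materialize(expr):
    """Names of let-bound variables that are read more than once."""
    reads, bound = Counter(), set()
    walk(expr, reads, bound)
    return {name for name in bound if reads[name] > 1}


# let $x := doc() return count($x)
single_use = ("let", "x", ("call", "doc"),
              ("call", "count", ("var", "x")))

# let $x := doc() return div(sum($x), count($x))
double_use = ("let", "x", ("call", "doc"),
              ("call", "div",
               ("call", "sum", ("var", "x")),
               ("call", "count", ("var", "x"))))

print(must_materialize(single_use))  # set()
print(must_materialize(double_use))  # {'x'}
```

In the first query the value bound to `x` can flow straight into its single consumer; in the second it is consumed twice, so a streaming engine would have to buffer it or rewrite the query.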

Now it seems to me that this dataflow analysis is easier to do in XQuery
(through expressions) than in XSLT (through templates). Knowing XSLT much
better than I do, what is your take on this? Are there any XSLT
implementations that do dataflow analysis for optimization?

Best regards,
Dana

P.S. A while ago we wrote a paper describing our streaming XQuery
implementation.
Daniela Florescu, Chris Hillery, Donald Kossmann, Paul Lucas, Fabio
Riccardi, Till Westmann, Michael J. Carey, Arvind Sundararajan, Geetika
Agrawal:
The BEA/XQRL Streaming XQuery Processor. VLDB 2003: 997-1008
http://www-dbs.informatik.uni-heidelberg.de/publications/index.shtml
A better version will appear in VLDB Journal soon.

And by the way, in this paper we did compare our XQuery implementation with
an XSLT implementation. While doing so, we did translate XMark in XQuery.
If there is some demand, we can spend some time polishing those queries and
publish them in an open forum somewhere.
