[xquery-talk] performance gain due to using xquery
Michael Kay
mhk at mhk.me.uk
Fri May 26 10:04:55 PDT 2006
Some further input on this: I did some more tests to try and discover the
reason for some of the discrepancies in the measurements, and in fact I
think the big speed-up was achieved not by changing the query to remove the
where clause, but by using Saxon-SA, which has a more advanced optimizer. Or
it's possible that it was a combination of the two.
Michael Kay
http://www.saxonica.com/
> -----Original Message-----
> From: Michael Kay [mailto:mhk at mhk.me.uk]
> Sent: 25 May 2006 15:14
> To: 'fatma helmy'; 'talk at xquery.com'
> Subject: RE: [xquery-talk] performance gain due to using xquery
>
> I tried this on the 100K and 1M versions of the XMark
> database and got run-times of 20s and 270s respectively.
>
> Your "where" clause seems to have the effect of eliminating
> paths of length 1. So I changed the query to use "for $n in
> $j/*//*" which avoids collecting paths of length 1 and this
> removes the need for the "where" clause. This reduced the run
> times dramatically, to 0.7 and 6.6 seconds respectively.
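Here is a small sketch of why starting from $j/*//* makes the "where" clause unnecessary. It uses Python's standard xml.etree.ElementTree module, not Saxon. It assumes documents without namespaces, so an element's tag is its local name, and the helper names are hypothetical. Both functions return the same paths. The second one never builds the length-1 path of the root element, so there is nothing to filter out afterwards.

```python
import xml.etree.ElementTree as ET

def element_paths(elem, prefix=""):
    """Yield a slash-separated path for elem and for every descendant element."""
    path = f"{prefix}/{elem.tag}" if prefix else elem.tag
    yield path
    for child in elem:
        yield from element_paths(child, path)

def paths_with_filter(root):
    # Build every path, then drop length-1 paths (the job of the "where" clause).
    return [p for p in element_paths(root) if len(p.split("/")) > 1]

def paths_below_root(root):
    # Start from the root element's children, like $j/*//*:
    # the length-1 path is simply never produced.
    return [p for child in root for p in element_paths(child, root.tag)]
```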
>
> Your function local:pathOfNode() puts a "/" at the start of
> each path, which you then carefully remove whenever you use
> the path. I changed the function to avoid adding the "/".
>
> You're using two different expressions to get the path to a
> node: a recursive function for element nodes,
> string-join(ancestor-or-self) for text nodes. The string-join
> appears to be faster. I changed both to use this: run time
> for the 1M file is now 5.49 seconds.
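The two ways of computing a path can also be compared outside XQuery. This Python sketch again uses ElementTree and assumes no namespaces. ElementTree elements have no parent axis, so the sketch first builds a parent dictionary (a hypothetical helper). It then computes a path in two ways: recursively, as local:pathOfNode() does, or by collecting the ancestor-or-self names and joining them once, as string-join() does. Neither version adds a leading "/". The sketch only shows that the two results agree. It says nothing about how Saxon evaluates either form.

```python
import xml.etree.ElementTree as ET

def parent_map(root):
    # ElementTree has no ".." axis, so record each element's parent up front.
    return {child: parent for parent in root.iter() for child in parent}

def path_recursive(node, parents):
    # Mirrors the recursive local:pathOfNode(), without the leading "/".
    parent = parents.get(node)
    if parent is None:
        return node.tag
    return f"{path_recursive(parent, parents)}/{node.tag}"

def path_join(node, parents):
    # Mirrors string-join(ancestor-or-self::*/local-name(), '/'):
    # collect names from the node upwards, then join them root-first.
    names = []
    while node is not None:
        names.append(node.tag)
        node = parents.get(node)
    return "/".join(reversed(names))
```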
>
> I tried moving the string-join() call out of the user-defined
> function and putting it inline in place of the function call.
> Surprisingly, this increased the execution time to 11
> seconds. Putting the call for text nodes inline is fine, but
> not the call for element nodes. This has me completely
> baffled at the moment: it's something I need to examine more
> closely. It shows how important it is when tuning to try
> different things and make measurements to see which performs best.
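The advice to measure rather than guess applies in any language. Below is a minimal Python sketch of a best-of-N timing harness built on the standard timeit module. The compare() helper and the two candidate functions are hypothetical. Taking the minimum of several repeats is one common way to reduce noise. It is not the method behind the timings quoted in this thread.

```python
import timeit

def compare(candidates, number=100, repeat=5):
    """Return, per candidate, the best-of-`repeat` time in seconds for `number` calls."""
    return {
        name: min(timeit.repeat(fn, number=number, repeat=repeat))
        for name, fn in candidates.items()
    }

if __name__ == "__main__":
    # Two hypothetical ways of building the same path string.
    names = ["site", "regions", "africa", "item"]

    def joined():
        return "/".join(names)

    def concatenated():
        path = names[0]
        for n in names[1:]:
            path = path + "/" + n
        return path

    for name, seconds in compare({"join": joined, "concat": concatenated}).items():
        print(f"{name}: {seconds:.6f} s")
```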
>
> Your logic here:
>
> {for $val in distinct-values( $leafs)
> let $kval := normalize-space($val)
> return <value-per-path value='{$kval}'
> count='{count($leafs[. eq $kval ])}'/>}
>
> looks faulty, because you're comparing the normalized value
> of a text node with the unnormalized value. I changed it to
> remove the normalize-space(). Runtime for the 1M file is now
> 4.87 seconds.
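The mismatch is easy to reproduce with plain strings. In this Python sketch, normalize_space() approximates XPath normalize-space(). Note that str.split() treats a few more characters as whitespace than XML does. The leaf values are made up. The faulty version labels each group with the normalized value but counts the raw values equal to it, so a value with surrounding whitespace is reported with a count of 0. The fixed version follows the change described above and compares raw values with raw values.

```python
def normalize_space(s):
    # Approximation of XPath normalize-space(): trim and collapse whitespace runs.
    return " ".join(s.split())

def distinct(values):
    # distinct-values() analogue; keeps first-seen order (XQuery guarantees no order).
    return list(dict.fromkeys(values))

def faulty_counts(leafs):
    # Original logic: normalized key compared against unnormalized leaf values.
    return [(normalize_space(v), sum(1 for leaf in leafs if leaf == normalize_space(v)))
            for v in distinct(leafs)]

def fixed_counts(leafs):
    # Corrected logic: no normalization, so each value is compared with itself.
    return [(v, sum(1 for leaf in leafs if leaf == v)) for v in distinct(leafs)]

leafs = [" Yes ", "No", "No"]
print(faulty_counts(leafs))  # [('Yes', 0), ('No', 2)]
print(fixed_counts(leafs))   # [(' Yes ', 1), ('No', 2)]
```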
>
> I haven't tried writing an XSLT equivalent. I suspect the
> performance will not be very different - though there's
> always room for surprises.
>
> I suspect there's still plenty of room for further tuning on
> this query, because it still contains a fair amount of
> redundancy. My current version of the query is:
>
> declare function local:pathOfNode($node) {
>   string-join($node/ancestor-or-self::*/local-name(), '/')
> };
>
> let $j := .
>
> let $paths := for $n in $j/*//* return local:pathOfNode($n)
>
> for $p in distinct-values($paths)
>
> let $papa := replace($p, '/[^/]*$', '')
> let $leafs := $j//text()[normalize-space()]
>     [string-join(../ancestor::*/local-name(), '/') eq $p]
>
> return
>   <STATISTICS>
>     <PATH> {string($p)} </PATH>
>     <RATIO> {
>       let $c := count($paths[. = $papa])
>       return string(round(count($paths[. = $p])
>                           div (if ($c = 0) then 1 else $c) * 100))
>     }</RATIO>
>     {for $val in distinct-values($leafs)
>      return <value-per-path value='{$val}'
>                             count='{count($leafs[. eq $val])}'/>}
>   </STATISTICS>
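One possible way to remove the redundancy mentioned above is sketched below in Python rather than XQuery. As written, each count($paths[...]) filters the whole $paths sequence, and $leafs filters every text node, once per distinct path (unless the optimizer does something cleverer). The sketch instead walks the tree once. It counts each element path in a Counter and groups non-blank text values under the path of their parent element, which is the grouping described by the $leafs predicate in the original query. The function name, the no-namespace assumption and the test document are hypothetical. This illustrates the idea and is not the query above translated line by line. It makes no claim about Saxon's performance.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

def path_statistics(xml_text):
    """Hypothetical single-pass version of the PATH / RATIO / value-per-path output."""
    root = ET.fromstring(xml_text)
    path_counts = Counter()               # element path -> number of occurrences
    leaf_values = defaultdict(Counter)    # element path -> Counter of its non-blank text

    def walk(elem, path):
        path_counts[path] += 1
        # Text nodes directly inside elem: its leading text plus each child's tail.
        for text in [elem.text] + [child.tail for child in elem]:
            if text and text.strip():
                leaf_values[path][text] += 1
        for child in elem:
            walk(child, f"{path}/{child.tag}")

    # Start below the root element, like $j/*//*, so no length-1 path is counted.
    for child in root:
        walk(child, f"{root.tag}/{child.tag}")

    stats = {}
    for path, n in path_counts.items():
        parent = path.rsplit("/", 1)[0]
        # Same guard as (if ($c = 0) then 1 else $c) in the query.
        denominator = path_counts.get(parent, 0) or 1
        # floor(x + 0.5) matches XQuery round() for positive x (ties round up).
        ratio = math.floor(n / denominator * 100 + 0.5)
        stats[path] = {"ratio": ratio, "values": dict(leaf_values[path])}
    return stats
```

Every count becomes a dictionary lookup, so the work grows roughly with the size of the document. It no longer grows with the number of distinct paths multiplied by the number of nodes.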
>
> Michael Kay
> http://www.saxonica.com/
>
>
> > -----Original Message-----
> > From: talk-bounces at xquery.com
> > [mailto:talk-bounces at xquery.com] On Behalf Of fatma helmy
> > Sent: 24 May 2006 18:21
> > To: talk at xquery.com
> > Subject: [xquery-talk] performance gain due to using xquery
> >
> > Dear all,
> > Thanks to Michael Kay's comments, my XQuery has been improved. I
> > ran it on Saxon with a 12 MB file and it took 14 minutes to
> > finish. This is my optimized query:
> >
> > declare function local:pathOfNode($node) {
> >   if (empty($node/..)) then ""
> >   else concat(local:pathOfNode($node/..), "/", local-name($node))
> > };
> >
> > let $j := doc("try.XML")
> >
> > let $paths := for $n in $j//* return local:pathOfNode($n)
> >
> > for $p in distinct-values($paths)
> >
> > let $papa := replace($p, '/[^/]*$', '')
> > let $leafs := $j//text()[normalize-space()]
> >     [string-join(ancestor-or-self::element()/name(), '/')
> >      eq substring-after(string($p), "/")]
> >
> > where count(tokenize(substring-after(string($p), "/"), "/")) > 1
> > return
> >   <STATISTICS>
> >     <PATH> {string($p)} </PATH>
> >     <RATIO> {string(round(count($paths[. = $p])
> >              div count($paths[. = $papa]) * 100))}
> >     </RATIO>
> >     {for $val in distinct-values($leafs)
> >      let $kval := normalize-space($val)
> >      return <value-per-path value='{$kval}'
> >                             count='{count($leafs[. eq $kval])}'/>}
> >   </STATISTICS>
> >
> > Now I have the following questions:
> > If I implemented the same function in XSLT, or through an API from
> > Java or .NET, would I get a bigger performance gain than by running
> > it on an XQuery engine?
> > If XQuery turned out to be the best, is that due to the features of
> > XQuery in general, or is it due to Saxon?
> >
> _______________________________________________
> > talk at xquery.com
> > http://xquery.com/mailman/listinfo/talk