[xquery-talk] performance gain due to using xquery

Michael Kay mhk at mhk.me.uk
Fri May 26 10:04:55 PDT 2006


Some further input on this: I did some more tests to try and discover the
reason for some of the discrepancies in the measurements, and in fact I
think the big speed-up was achieved not by changing the query to remove the
where clause, but by using Saxon-SA, which has a more advanced optimizer. Or
it's possible that it was a combination of the two.

Michael Kay
http://www.saxonica.com/ 

> -----Original Message-----
> From: Michael Kay [mailto:mhk at mhk.me.uk] 
> Sent: 25 May 2006 15:14
> To: 'fatma helmy'; 'talk at xquery.com'
> Subject: RE: [xquery-talk] performance gain due to using xquery
> 
> I tried this on the 100K and 1M versions of the XMark 
> database and got run-times of 20s and 270s respectively. 
> 
> Your "where" clause seems to have the effect of eliminating 
> paths of length 1. So I changed the query to use "for $n in 
> $j/*//*" which avoids collecting paths of length 1 and this 
> removes the need for the "where" clause. This reduced the run 
> times dramatically, to 0.7 and 6.6 seconds respectively. 
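> 
> As a minimal sketch of why the two forms select the same paths, 
> here is a self-contained comparison on a made-up inline document. 
> The element names are illustrative, not XMark's, and the filter 
> is applied per node rather than per distinct path as in your 
> query. I would expect it to return true: 
> 
> ```xquery
> declare function local:pathOfNode($node) {
>   string-join($node/ancestor-or-self::*/local-name(), '/')
> };
> (: hypothetical sample data, not taken from XMark :)
> let $j := document { <site><regions><item/><item/></regions><people/></site> }
> (: original style: visit every element, then discard one-step paths :)
> let $filtered :=
>   for $n in $j//*
>   let $p := local:pathOfNode($n)
>   where count(tokenize($p, '/')) > 1
>   return $p
> (: revised style: never visit the outermost element at all :)
> let $direct := for $n in $j/*//* return local:pathOfNode($n)
> return deep-equal($filtered, $direct)
> ```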
> 
> Your function local:pathOfNode() puts a "/" at the start of 
> each path, which you then carefully remove whenever you use 
> the path. I changed the function to avoid adding the "/".
> 
> You're using two different expressions to get the path to a 
> node: a recursive function for element nodes, 
> string-join(ancestor-or-self) for text nodes. The string-join 
> appears to be faster. I changed both to use this: run time 
> for the 1M file is now 5.49 seconds.
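> 
> To make the difference between the two path expressions concrete, 
> here is a small sketch. The function names local:pathRecursive 
> and local:pathJoined and the sample document are invented for 
> this illustration. It should yield "/a/b/c" from the recursive 
> form and "a/b/c" from the string-join form: 
> 
> ```xquery
> (: recursive form, as in the original query: produces a leading "/" :)
> declare function local:pathRecursive($node as node()) as xs:string {
>   if (empty($node/..)) then ""
>   else concat(local:pathRecursive($node/..), "/", local-name($node))
> };
> (: single-expression form: one axis step, no leading "/" :)
> declare function local:pathJoined($node as node()) as xs:string {
>   string-join($node/ancestor-or-self::*/local-name(), '/')
> };
> (: hypothetical sample data :)
> let $doc := document { <a><b><c>text</c></b></a> }
> let $c := $doc/a/b/c
> return (local:pathRecursive($c), local:pathJoined($c))
> ```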
> 
> I tried moving the string-join() call out of the user-defined 
> function and putting it inline in place of the function call. 
> Surprisingly, this increased the execution time to 11 
> seconds. Putting the call for text nodes inline is fine, but 
> not the call for element nodes. This has me completely 
> baffled at the moment: it's something I need to examine more 
> closely. It shows how important it is when tuning to try 
> different things and make measurements to see which performs best.
> 
> Your logic here:
> 
> {for $val in  distinct-values( $leafs)
> let $kval := normalize-space($val)
> return <value-per-path value='{$kval}' 
> count='{count($leafs[. eq  $kval ])}'/>}
> 
> looks faulty, because you're comparing the normalized value 
> of a text node with the unnormalized value. I changed it to 
> remove the normalize-space(). Runtime for the 1M file is now 
> 4.87 seconds.
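> 
> A small sketch of the mismatch, using two made-up parentless text 
> nodes in place of $leafs (the attribute names "mismatched" and 
> "raw" are only for this illustration). For the value with 
> surrounding spaces, the comparison against the normalized value 
> should count 0 nodes, while the comparison against the raw value 
> counts 1: 
> 
> ```xquery
> (: two parentless text nodes standing in for $leafs; values are made up :)
> let $leafs := (text { "  red " }, text { "blue" })
> return
>   <counts>{
>     for $val in distinct-values($leafs)
>     let $kval := normalize-space($val)
>     return
>       <value-per-path value="{$val}"
>         mismatched="{count($leafs[. eq $kval])}"
>         raw="{count($leafs[. eq $val])}"/>
>   }</counts>
> ```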
> 
> I haven't tried writing an XSLT equivalent. I suspect the 
> performance will not be very different - though there's 
> always room for surprises.
> 
> I suspect there's still room for further tuning on this 
> query, since quite a bit of redundancy remains. My current 
> version of the query is:
> 
> declare function local:pathOfNode($node) { 
> string-join($node/ancestor-or-self::*/local-name(), '/') }; 
> let $j:= . 
> 
> let $paths := for $n in $j/*//* return local:pathOfNode($n)
> 
> for $p in distinct-values($paths) 
>  
> let $papa:= replace($p,'/[^/]*$','')
> let $leafs :=$j//text()[normalize-space()] 
> [string-join(../ancestor::*/local-name(), '/') eq $p ] 
> 
> return
> <STATISTICS>
>   <PATH> {string($p)} </PATH>
>   <RATIO> {let $c := count($paths[.=$papa]) return
>            string( round( count($paths[.=$p]) div (if ($c=0) then 1 else $c) * 100 ) )}</RATIO>
>   {for $val in distinct-values($leafs)
>    return <value-per-path value='{$val}' count='{count($leafs[. eq $val])}'/>}
> 
> </STATISTICS> 
> 
> Michael Kay
> http://www.saxonica.com/
> 
> 
> > -----Original Message-----
> > From: talk-bounces at xquery.com
> > [mailto:talk-bounces at xquery.com] On Behalf Of fatma helmy
> > Sent: 24 May 2006 18:21
> > To: talk at xquery.com
> > Subject: [xquery-talk] performance gain due to using xquery
> > 
> > Dear all
> > Thanks to the comments of Michael Kay, my XQuery is improved. I ran
> > it on Saxon; for a file of size 12 MB, it took 14 minutes to finish.
> > This is my optimized query:
> > 
> > declare function local:pathOfNode($node)
> > {if(empty($node/..)) then "" else
> > concat(local:pathOfNode($node/..), "/", local-name($node))};
> > let $j:= doc("try.XML")
> > 
> > let $paths := for $n in $j//* return
> > local:pathOfNode($n)
> > 
> > for $p in distinct-values($paths)
> >  
> > let $papa:= replace($p,'/[^/]*$','')
> > let $leafs :=$j//text()[normalize-space()]
> > [string-join(ancestor-or-self::element()/name(),'/')
> > eq substring-after(string($p),"/") ]
> > 
> > where count(tokenize(substring-after(string($p), "/"),"/")) >1
> > return
> > <STATISTICS>
> > <PATH> {string($p)} </PATH>
> > <RATIO> {string( round( count($paths[.=$p]) div
> > count($paths[.=$papa]) * 100 ) )}
> > </RATIO>
> > {for $val in  distinct-values( $leafs)
> > let $kval := normalize-space($val)
> > return <value-per-path value='{$kval}'
> > count='{count($leafs[. eq  $kval ])}'/>}
> > </STATISTICS>
> > 
> > Now I have the following questions:
> > If I implemented the same function using XSLT, or using an API from
> > Java or .NET, would I get better performance than executing it on an
> > XQuery engine?
> > If XQuery is the best, is that due to the features of XQuery in
> > general, or is it due to Saxon?
> > 
