[xquery-talk] RE:performance gain due to xquery

fatma helmy fatmahelmy2000 at yahoo.com
Mon May 29 19:24:58 PDT 2006


I ran this optimized query using Saxon on a 12 MB file.
It took 30 minutes to finish, whereas the old query took
only 16 minutes.
I searched for XMark to download so that I could run my
query on it, but I can't work out what the XMark program
actually does. Is it an optimization suite? I downloaded
it, but there seems to be no way to run my XQuery through
its interface.

--- Michael Kay <mhk at mhk.me.uk> wrote:

> I tried this on the 100K and 1M versions of the
> XMark database and got
> run-times of 20s and 270s respectively. 
> 
> Your "where" clause seems to have the effect of
> eliminating paths of length
> 1. So I changed the query to use "for $n in $j/*//*"
> which avoids collecting
> paths of length 1 and this removes the need for the
> "where" clause. This
> reduced the run times dramatically, to 0.7 and 6.6
> seconds respectively. 
> 
> Your function local:pathOfNode() puts a "/" at the
> start of each path, which
> you then carefully remove whenever you use the path.
> I changed the function
> to avoid adding the "/".
> 
> You're using two different expressions to get the
> path to a node: a
> recursive function for element nodes,
> string-join(ancestor-or-self) for text
> nodes. The string-join appears to be faster. I
> changed both to use this: run
> time for the 1M file is now 5.49 seconds.
> 
> I tried moving the string-join() call out of the
> user-defined function and
> putting it inline in place of the function call.
> Surprisingly, this
> increased the execution time to 11 seconds. Putting
> the call for text nodes
> inline is fine, but not the call for element nodes.
> This has me completely
> baffled at the moment: it's something I need to
> examine more closely. It
> shows how important it is when tuning to try
> different things and make
> measurements to see which performs best.
> 
> Your logic here:
> 
> {for $val in  distinct-values( $leafs)
> let $kval := normalize-space($val)
> return <value-per-path value='{$kval}' 
> count='{count($leafs[. eq  $kval ])}'/>}
> 
> looks faulty, because you're comparing the
> normalized value of a text node
> with the unnormalized value. I changed it to remove
> the normalize-space().
> Runtime for the 1M file is now 4.87 seconds.
> 
> I haven't tried writing an XSLT equivalent. I
> suspect the performance will
> not be very different - though there's always room
> for surprises.
> 
> I suspect there's still quite a bit of room for
> further tuning on this
> query: there's still quite a bit of redundancy. My
> current version of the
> query is:
> 
> declare function local:pathOfNode($node)
> {
>   string-join($node/ancestor-or-self::*/local-name(), '/')
> };
> let $j := .
> 
> let $paths := for $n in $j/*//* return local:pathOfNode($n)
> 
> for $p in distinct-values($paths)
> 
> let $papa := replace($p, '/[^/]*$', '')
> let $leafs := $j//text()[normalize-space()]
>                 [string-join(../ancestor::*/local-name(), '/') eq $p]
> 
> return
> <STATISTICS>
>   <PATH> {string($p)} </PATH>
>   <RATIO> {let $c := count($paths[. = $papa]) return
>            string(round(count($paths[. = $p]) div (if ($c = 0) then 1 else $c) * 100))}</RATIO>
>   {for $val in distinct-values($leafs) return
>      <value-per-path value='{$val}' count='{count($leafs[. eq $val])}'/>}
> </STATISTICS>
> 
> Michael Kay
> http://www.saxonica.com/
> 
> 
> > -----Original Message-----
> > From: talk-bounces at xquery.com
> > [mailto:talk-bounces at xquery.com] On Behalf Of fatma helmy
> > Sent: 24 May 2006 18:21
> > To: talk at xquery.com
> > Subject: [xquery-talk] performance gain due to using xquery
> > 
> > Dear all,
> > Thanks to Michael Kay's comments, my XQuery has been
> > improved. I ran it on Saxon against a 12 MB file and it
> > took 14 minutes to finish. This is my optimized query:
> > 
> > declare function local:pathOfNode($node)
> > {if (empty($node/..)) then "" else
> >   concat(local:pathOfNode($node/..), "/", local-name($node))};
> > let $j := doc("try.XML")
> > 
> > let $paths := for $n in $j//* return local:pathOfNode($n)
> > 
> > for $p in distinct-values($paths)
> > 
> > let $papa := replace($p, '/[^/]*$', '')
> > let $leafs := $j//text()[normalize-space()]
> >   [string-join(ancestor-or-self::element()/name(), '/')
> >      eq substring-after(string($p), "/")]
> > 
> > where count(tokenize(substring-after(string($p), "/"), "/")) > 1
> > return
> > <STATISTICS>
> >   <PATH> {string($p)} </PATH>
> >   <RATIO> {string(round(count($paths[. = $p]) div count($paths[. = $papa]) * 100))}</RATIO>
> >   {for $val in distinct-values($leafs)
> >    let $kval := normalize-space($val)
> >    return <value-per-path value='{$kval}'
> >      count='{count($leafs[. eq $kval])}'/>}
> > </STATISTICS>
> > 
> > Now I have the following questions:
> > If I implemented the same function in XSLT, or through an
> > API from Java or .NET, would I get a greater performance
> > gain than by executing it on an XQuery engine?
> > If XQuery turns out to be the best, is that due to its
> > features as XQuery in general, or is it due to Saxon?
> > 
> > 
> > 
> > 
> > __________________________________________________
> > Do You Yahoo!?
> > Tired of spam?  Yahoo! Mail has the best spam
> protection 
> > around http://mail.yahoo.com 
> > _______________________________________________
> > talk at xquery.com
> > http://xquery.com/mailman/listinfo/talk
> 
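
Michael Kay's reply mentions changing local:pathOfNode() so that it no longer puts a "/" at the start of each path, before he switched both paths to string-join(). That intermediate recursive version is not shown in the thread. A minimal sketch of it, assuming the node lives in a tree rooted at a document node (as with doc("try.XML")), and using a small made-up sample document, might look like this:

```xquery
(: Sketch only: recursive path function that, unlike the original,
   does not emit a leading "/". Assumes a document-rooted tree. :)
declare function local:pathOfNode($node as node()) as xs:string
{
  (: When $node has no grandparent (e.g. it is the root element, whose
     parent is the document node), the path is just its local name. :)
  if (empty($node/../..)) then local-name($node)
  else concat(local:pathOfNode($node/..), "/", local-name($node))
};

(: Hypothetical sample document, for illustration only. :)
let $doc := document { <site><people><person/></people></site> }
return local:pathOfNode(($doc//person)[1])
```

Run as a standalone main module, this should return "site/people/person". Kay reports that the string-join(ancestor-or-self::*) form measured faster in Saxon, which is why his final query uses that instead of recursion.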

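The mismatch Kay points out in the value-per-path loop (comparing the normalized value with the unnormalized text nodes) can be seen on a tiny hypothetical input: two text nodes that differ only in surrounding whitespace. The sample values below are invented for illustration; the two loops mirror the original logic and Kay's revision.

```xquery
(: Sketch only: hypothetical leaf text nodes differing only in whitespace. :)
let $leafs := (text { "  red " }, text { "red" })
return
<comparison>
  <original>{
    (: Original loop: the normalized value is compared with raw text nodes. :)
    for $val in distinct-values($leafs)
    let $kval := normalize-space($val)
    return <value-per-path value='{$kval}' count='{count($leafs[. eq $kval])}'/>
  }</original>
  <revised>{
    (: Kay's revision: raw values are compared with raw values. :)
    for $val in distinct-values($leafs)
    return <value-per-path value='{$val}' count='{count($leafs[. eq $val])}'/>
  }</revised>
</comparison>
```

The original branch should yield two value-per-path elements that both claim value "red" with count "1", even though both text nodes normalize to "red"; the revised branch reports each distinct raw value once with a count that matches it.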