[xquery-talk] RE: your path-counting query

fatma helmy fatmahelmy2000 at yahoo.com
Mon Jun 5 02:58:48 PDT 2006


I am trying to use XQuery to compute statistics
over an XML document.
For example, given:
<book id="1">
<author> joes </author>
<title> x </title>
</book>
<book id="2">
<author> joes </author>
<title >y </title>
</book>

The summary output should be:
<path> 
/book
<attribute> id <value> 1 <count>1 </count></value>
<value> 2 <count>1 </count></value>
</attribute>
</path>

<path> 
/book/author
<value> joes <count> 2 </count> </value>
</path>

<path>
/book/title
<value> x <count> 1 </count> </value>
<value> y <count> 1 </count> </value>
</path>

I intended to show that computing this summary inside
a native XML database engine would be faster than
computing it from inside a Java application. As with a
stored procedure in a database, running the query on
the engine and sending only the result to the
application should be faster than opening the file
from the client program and loading it into memory.

--- Michael Kay <mike at saxonica.com> wrote:

> I basically took this as far as I could without
> knowing what your query was
> actually trying to achieve. There seemed to be lots
> of coding quirks whose
> purpose I didn't understand.
> 
> One thing I did discover was that Saxon-SA gives
> much better results on this
> than Saxon-B.
> 
> Please don't take the problem off-list.
> 
> Michael Kay
> http://www.saxonica.com/
> 
> > -----Original Message-----
> > From: fatma helmy [mailto:fatmahelmy2000 at yahoo.com]
> > Sent: 02 June 2006 22:09
> > To: mike at saxonica.com
> > Subject: 
> > 
> > Dear Michael
> > thanks to your comment my query was optimized to
> > 
> > declare function local:pathOfNode($node) {
> >   string-join($node/ancestor-or-self::*/local-name(), '/')
> > };
> > 
> > let $j := .
> > let $paths := for $n in $j/*//* return local:pathOfNode($n)
> > for $p in distinct-values($paths)
> > let $papa := replace($p, '/[^/]*$', '')
> > let $leafs := $j//text()[normalize-space()]
> >                 [string-join(../ancestor::*/local-name(), '/') eq $p]
> > return
> >   <STATISTICS>
> >     <PATH> {string($p)} </PATH>
> >     <RATIO> {let $c := count($paths[.=$papa]) return
> >       string(round(count($paths[.=$p]) div
> >                    (if ($c=0) then 1 else $c) * 100))}</RATIO>
> >     {for $val in distinct-values($leafs) return
> >       <value-per-path value='{$val}'
> >                       count='{count($leafs[. eq $val])}'/>}
> >   </STATISTICS>
> > 
> > I downloaded XQuest, which uses the XMark database. I
> > wonder how it could get the result in seconds on a
> > 100m file, since I did not get any result when
> > executing my query on a 12m file.
> > 
> > What I did:
> > I imported my file into a library using XQuest.
> > Then I executed the query.
> > Is there any missing step?