[xquery-talk] TLC XQuery timings and XMark size factors

Wolfgang Hoschek wolfgang.hoschek at mac.com
Wed Jan 17 11:09:41 PST 2007


On Jan 17, 2007, at 10:26 AM, James A. Robinson wrote:

>
> [This may be considered off topic, my apologies if it is. It's related
> to XQuery by way of a paper and the XMark test, but since it's not a
> "How do you do Y in XQuery" I'm unsure.]
>
> Hi folks,
>
> I came across an interesting looking paper last night,
>
>   "Tree Logical Classes for Efficient Evaluation of XQuery"
>   http://www.eecs.umich.edu/db/timber/files/tlc.pdf
>   (165.43 KB)
>
> The math is over my head, but I was curious about the results they
> write about regarding their algorithm when applied to XMark data sets.
> I'd not looked at XMark until now, though I've read about it on some
> blogs (Dr. Kay sometimes writes about his tests of Saxon against XMark
> data).
>
> Downloading the xmlgen program from http://monetdb.cwi.nl/xml/, I'm a
> bit confused about the numbers listed in the paper, and I was wondering
> if someone who has used xmlgen could explain something to me:  The
> authors say they tested 'size factors from 0.1 (approx. 67MB combined
> data plus indexes space) up to factor 5 (3.5GB combined data plus
> indexes space)', and I'm wondering if anyone who has read (or cares to
> read) that paper can tell me if they understand how those sizes were
> reached?
>
> The sizes I'm seeing from xmlgen don't seem to map to the same sizes
> the authors list.  A size factor of 0.1 comes out to just under 12MB of
> data.  Looking at http://monetdb.cwi.nl/xml/faq.txt, I was simply running
>
>   xmlgen -f 0.1 -o xmark-0.1.xml
>
> Adding pretty formatting only adds another couple of megabytes to the
> size.  I'm curious to try and generate similar sets of data to see if I
> can run tests against a couple of platforms available to me, but this
> first examination makes me wonder if there is something missing from
> the equation which I don't know about.
>

12 MB sounds about right. Here are the file sizes and scale factors I'm
getting:

-rw-r--r--    1 hoschek  hoschek    1161615 Nov 17  2005 auction-0.01.xml
-rw-r--r--    1 hoschek  hoschek   11669705 Nov 17  2005 auction-0.1.xml
-rw-r--r--    1 hoschek  hoschek   58005732 Nov 17  2005 auction-0.5.xml
-rw-r--r--    1 hoschek  hoschek  116517075 Nov 22  2005 auction-1.0.xml

"approx. 67MB combined data plus indexes space" might indicate the
storage consumed when the data is stored in some kind of indexed XML
database with an alternative storage format, rather than in a plain XML
file.
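
As a rough check on that guess, here is a minimal Python sketch (not taken from the paper or from xmlgen) that does the arithmetic on the file sizes listed above. It assumes "MB" and "GB" in the paper mean 10^6 and 10^9 bytes, and it extrapolates the raw size at factor 5 linearly from the factor 1.0 file, since no factor 5 file appears in the listing. The names raw_sizes, bytes_per_unit_factor and overhead_ratio are made up for illustration.

```python
# File sizes in bytes, copied from the listing above (scale factor -> size).
raw_sizes = {
    0.01: 1161615,
    0.1: 11669705,
    0.5: 58005732,
    1.0: 116517075,
}

# Assumption: the paper's MB/GB are decimal units, not 2**20 / 2**30.
MB = 10**6
GB = 10**9


def bytes_per_unit_factor(sizes):
    """Return size / factor per file; near-constant values mean linear scaling."""
    return {factor: size / factor for factor, size in sizes.items()}


def overhead_ratio(reported_bytes, raw_bytes):
    """How many times larger the reported storage is than the raw XML file."""
    return reported_bytes / raw_bytes


per_unit = bytes_per_unit_factor(raw_sizes)
for factor, value in sorted(per_unit.items()):
    print(f"factor {factor}: {value / MB:.1f} MB per unit of scale factor")

# Assumption: raw size at factor 5 is 5x the factor 1.0 file (linear scaling).
raw_factor_5 = 5 * raw_sizes[1.0]

# Figures quoted from the paper: 67MB at factor 0.1, 3.5GB at factor 5.
ratio_small = overhead_ratio(67 * MB, raw_sizes[0.1])
ratio_large = overhead_ratio(3.5 * GB, raw_factor_5)
print(f"factor 0.1: reported / raw = {ratio_small:.2f}")
print(f"factor 5:   reported / raw = {ratio_large:.2f}")
```

If those assumptions hold, the raw files scale close to linearly at roughly 116 MB per unit of scale factor, and both figures quoted from the paper come out at roughly six times the raw XML size. A similar ratio at both ends of the range would be consistent with the paper reporting data plus indexes in a database rather than the size of the plain XML file.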

Wolfgang.

