[xquery-talk] TLC XQuery timings and XMark size factors
Wolfgang Hoschek
wolfgang.hoschek at mac.com
Wed Jan 17 11:09:41 PST 2007
On Jan 17, 2007, at 10:26 AM, James A. Robinson wrote:
>
> [This may be considered off topic, my apologies if it is. It's related
> to XQuery by way of a paper and the XMark test, but since it's not a
> "How do you do Y in XQuery" I'm unsure.]
>
> Hi folks,
>
> I came across an interesting looking paper last night,
>
> "Tree Logical Classes for Efficient Evaluation of XQuery"
> http://www.eecs.umich.edu/db/timber/files/tlc.pdf
> (165.43 KB)
>
> The math is over my head, but I was curious about the results they
> write about regarding their algorithm when applied to XMark data sets.
> I'd not looked at XMark until now, though I've read about it on some
> blogs (Dr. Kay sometimes writes about his tests of Saxon against XMark
> data).
>
> Downloading the xmlgen program from http://monetdb.cwi.nl/xml/, I'm a
> bit confused about the numbers listed in the paper, and I was wondering
> if someone who has used xmlgen could explain something to me: The
> authors say they tested 'size factors from 0.1 (approx. 67MB combined
> data plus indexes space) up to factor 5 (3.5GB combined data plus
> indexes space)', and I'm wondering if anyone who has read (or cares to
> read) that paper can tell me if they understand how those sizes were
> reached?
>
> The sizes I'm seeing from xmlgen don't seem to map to the same sizes the
> authors list. A size factor of 0.1 comes out to just under 12MB of data.
> Looking at http://monetdb.cwi.nl/xml/faq.txt, I was simply running
>
> xmlgen -f 0.1 -o xmark-0.1.xml
>
> Adding pretty formatting only adds another couple of megabytes to the
> size. I'm curious to try and generate similar sets of data to see if I
> can run tests against a couple of platforms available to me, but this
> first examination makes me wonder if there is something missing from
> the equation which I don't know about.
>
12 MB sounds about right. Here are the file sizes and scale factors
I'm getting:

-rw-r--r-- 1 hoschek hoschek   1161615 Nov 17 2005 auction-0.01.xml
-rw-r--r-- 1 hoschek hoschek  11669705 Nov 17 2005 auction-0.1.xml
-rw-r--r-- 1 hoschek hoschek  58005732 Nov 17 2005 auction-0.5.xml
-rw-r--r-- 1 hoschek hoschek 116517075 Nov 22 2005 auction-1.0.xml

"approx. 67MB combined data plus indexes space" might indicate the
storage consumed when storing the data in some kind of indexed XML
database with alternative storage format, rather than in a plain XML
file.
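
As a rough check of that guess, the arithmetic can be sketched in a few
lines of Python. The byte counts are copied from the listing above; the
function names are made up for illustration, and the extrapolation to
factor 5 assumes xmlgen output keeps growing linearly with the scale
factor, which these four files suggest but do not prove.

```python
# File sizes in bytes from the listing above, keyed by xmlgen scale factor.
sizes = {0.01: 1161615, 0.1: 11669705, 0.5: 58005732, 1.0: 116517075}


def bytes_per_unit_factor(sizes):
    """Return size / factor for each file; near-equal values mean linear growth."""
    return {factor: size / factor for factor, size in sizes.items()}


def estimate_size(factor, sizes):
    """Extrapolate a plain XML size, ASSUMING linear growth with the factor.

    The largest measured file is used as the reference point.
    """
    reference = max(sizes)
    return sizes[reference] / reference * factor


ratios = bytes_per_unit_factor(sizes)
for factor, ratio in sorted(ratios.items()):
    print(f"factor {factor}: {ratio / 1e6:.1f} MB per unit of scale factor")

estimate_5 = estimate_size(5, sizes)
print(f"estimated plain XML at factor 5: {estimate_5 / 1e6:.0f} MB")

# Sizes reported in the paper ("combined data plus indexes space"),
# read here as decimal MB/GB, which is itself an assumption.
paper = {0.1: 67e6, 5: 3.5e9}
print(f"paper / plain XML at 0.1: {paper[0.1] / sizes[0.1]:.1f}x")
print(f"paper / plain XML at 5:   {paper[5] / estimate_5:.1f}x")
```

Each file works out to about 116 MB per unit of scale factor, so under
that assumption factor 5 would be somewhat under 600 MB of plain XML.
The paper's figures are then roughly six times the plain file size at
both ends of the range, which would fit a storage format with indexes
rather than a different generator setting.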
Wolfgang.