[xquery-talk] TLC XQuery timings and XMark size factors

James A. Robinson jim.robinson at stanford.edu
Wed Jan 17 10:26:45 PST 2007


[This may be considered off topic, my apologies if it is. It's related
to XQuery by way of a paper and the XMark test, but since it's not a
"How do you do Y in XQuery" I'm unsure.]

Hi folks,

I came across an interesting-looking paper last night:

  "Tree Logical Classes for Efficient Evaluation of XQuery"
  http://www.eecs.umich.edu/db/timber/files/tlc.pdf
  (165.43 KB)

The math is over my head, but I was curious about the results they
report for their algorithm when applied to XMark data sets.
I'd not looked at XMark until now, though I've read about it on some blogs
(Dr. Kay sometimes writes about his tests of Saxon against XMark data).

Having downloaded the xmlgen program from http://monetdb.cwi.nl/xml/,
I'm a bit confused about the numbers listed in the paper, and I was
wondering if someone who has used xmlgen could explain something to me.
The authors say they tested 'size factors from 0.1 (approx. 67MB
combined data plus indexes space) up to factor 5 (3.5GB combined data
plus indexes space)'.  Can anyone who has read (or cares to read) that
paper tell me how those sizes were reached?

The sizes I'm seeing from xmlgen don't seem to match the sizes the
authors list.  A size factor of 0.1 comes out to just under 12MB of data.
Following http://monetdb.cwi.nl/xml/faq.txt, I was simply running

  xmlgen -f 0.1 -o xmark-0.1.xml
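
To repeat that measurement at both ends of the range the paper quotes,
a small wrapper along these lines might help.  It is only a sketch: it
assumes xmlgen is on the PATH, uses only the -f and -o options shown
above, and the helper names and the xmark-<factor>.xml naming are made
up for illustration.  Note that factor 5 will produce a large file.

```python
import os
import shutil
import subprocess

# Endpoints of the size-factor range quoted from the paper.
FACTORS = (0.1, 5)


def xmlgen_command(factor):
    """Build the invocation shown above: xmlgen -f <factor> -o <file>."""
    # Hypothetical naming scheme, following the xmark-0.1.xml example.
    return ["xmlgen", "-f", str(factor), "-o", f"xmark-{factor}.xml"]


def generate_and_measure(factor):
    """Run xmlgen and return the generated file's size in MB (2**20 bytes)."""
    cmd = xmlgen_command(factor)
    subprocess.run(cmd, check=True)  # raises if xmlgen exits non-zero
    return os.path.getsize(cmd[-1]) / (1024 * 1024)


if __name__ == "__main__":
    # Assumption: xmlgen is installed and on PATH; otherwise do nothing.
    if shutil.which("xmlgen") is None:
        print("xmlgen not found on PATH; nothing generated")
    else:
        for factor in FACTORS:
            print(f"factor {factor}: {generate_and_measure(factor):.1f} MB")
```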

Adding pretty formatting only adds another couple of megabytes to the size.
I'm curious to try and generate similar sets of data to see if I can
run tests against a couple of platforms available to me, but this first
examination makes me wonder if there is something missing from the
equation which I don't know about.
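
As a rough sanity check on the figures quoted above, the arithmetic can
be sketched as follows.  It assumes sizes scale linearly with the size
factor, takes 3.5GB as 3500MB, and rounds "just under 12MB" up to 12MB;
the function name is made up for illustration.

```python
# Figures quoted in this message; all sizes in megabytes.
PAPER_MB = {0.1: 67.0, 5.0: 3500.0}  # "combined data plus indexes space"
XMLGEN_MB_AT_0_1 = 12.0              # "just under 12MB", rounded up


def mb_per_unit_factor(size_mb, factor):
    """Size normalised to a size factor of 1.0, assuming linear scaling."""
    return size_mb / factor


for factor, size_mb in sorted(PAPER_MB.items()):
    per_unit = mb_per_unit_factor(size_mb, factor)
    print(f"paper, factor {factor}: {per_unit:.0f} MB per unit factor")

ratio = PAPER_MB[0.1] / XMLGEN_MB_AT_0_1
print(f"paper size / xmlgen size at factor 0.1: {ratio:.1f}x")
```

The paper's two figures work out to roughly the same size per unit of
size factor (about 670MB and 700MB), so they look internally consistent,
and both sit between five and six times the raw xmlgen output.  Since
the paper describes its numbers as combined data plus index space, the
gap could be index and storage overhead, though nothing here confirms
that.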

Jim

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
James A. Robinson                       jim.robinson at stanford.edu
Stanford University HighWire Press      http://highwire.stanford.edu/
+1 650 7237294 (Work)                   +1 650 7259335 (Fax)

