[xquery-talk] Re: The State of Native XML databases

Mon Aug 20 15:39:20 PDT 2007

On 8/20/07, Michael Kay <mike at saxonica.com> wrote:
> > What do you guys use?  Again, I'm not saying that XML Schema
> > should be used, I was more looking into schema as a general
> > term, we need something to define the storage and constraints, right?
>
> Actually, you don't. I happen to think that it's often very beneficial to
> have a schema (or several...), but there's absolutely no inherent
> requirement to do so. It's quite possible to store a random collection of
> well-formed XML documents and query them.

Agreed, it might be useful in some scenarios.  But how is that different
from just having a root node and inserting child nodes that represent
these documents?  I guess I just don't get the benefit of collections.
If I want to store a bunch of messages, I can store them in one file with
a <messages> root node that is effectively synonymous with a collection.
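
To make the comparison concrete, here is a minimal Python sketch using only
the standard library.  The sample messages and the function names are made
up for illustration; it simply runs the same lookup against a set of
separate documents and against one document with a <messages> root.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: three small message documents.
DOCS = [
    '<message id="1"><to>alice</to><body>hello</body></message>',
    '<message id="2"><to>bob</to><body>hi</body></message>',
    '<message id="3"><to>alice</to><body>bye</body></message>',
]

def query_collection(docs, recipient):
    """Treat each string as its own document and query them one by one."""
    ids = []
    for text in docs:
        root = ET.fromstring(text)
        if root.findtext("to") == recipient:
            ids.append(root.get("id"))
    return ids

def query_single_document(docs, recipient):
    """Wrap the same messages under one <messages> root and query that."""
    wrapper = ET.fromstring("<messages>" + "".join(docs) + "</messages>")
    return [
        m.get("id")
        for m in wrapper.findall("message")
        if m.findtext("to") == recipient
    ]
```

For this data both functions return the same ids, which is the sense in
which the two layouts look alike as far as querying goes.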

>
> Moreover, the notion of a schema is orthogonal to the notion of a collection
> of documents. A product may associate one with the other, but it is by no
> means essential.

Agreed.

>
> > Our biggest issue with Oracle and XMLDB was the fact that
> > there is absolutely no transactional integrity outside of a
> > collection entry.
> > Each write to a particular collection entry requires a lock
> > of the document stored.
>
> That's nothing to do with transactional integrity. The granularity of
> locking doesn't affect integrity, it only affects throughput.

Yeah, sorry, I misspoke.  I meant that achieving both integrity and
concurrency is nearly impossible with coarse-grained (document-level)
locks, short of shredding the document.
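
A toy Python sketch of the difference, purely as an illustration: the Store
class and can_write_concurrently are hypothetical names, and this is not a
model of how Oracle XML DB or any other product implements locking.  It
only shows that with one lock per document a second writer has to wait even
when it touches a different entry, while per-entry locks let it proceed.

```python
import threading

class Store:
    """Toy in-memory store; `granular` picks per-entry or whole-document locks."""

    def __init__(self, entry_ids, granular):
        self._doc_lock = threading.Lock()
        self._entry_locks = {e: threading.Lock() for e in entry_ids}
        self._granular = granular

    def lock_for(self, entry_id):
        # Coarse mode hands every writer the same document-wide lock.
        return self._entry_locks[entry_id] if self._granular else self._doc_lock

def can_write_concurrently(store, first, second):
    """Hold the lock for `first`, then try the lock for `second` without waiting."""
    with store.lock_for(first):
        other = store.lock_for(second)
        acquired = other.acquire(blocking=False)
        if acquired:
            other.release()
        return acquired
```

Either mode keeps individual writes consistent; what changes is whether
writers to unrelated entries can make progress at the same time.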

>
> It's true of course that a system that doesn't offer locking at a finer
> granularity than the document level is unsuitable for the kind of "one big
> document" database design you have chosen. So you either have to change your
> design, or choose a different product. The fact that the product was
> designed for a different usage scenario from yours doesn't make the product
> bad, it just makes it unsuitable for your chosen approach.

Well, I don't agree.  You can architect any application in a
horrendous way to work around the limitations of the product you are
using, but that's not ideal.  If all RDBMSs locked at the table level,
you could argue that a different architectural approach would make up
for that limitation too: I could store each entry in a different
table.  That's just not right, especially if a solution exists in
other products.  We didn't necessarily choose this approach.  Our
industry standardized on a schema, CDISC ODM, that encapsulates all
information pertaining to a clinical trial, and such a document can be
rather large.  There are many more domains dealing with the same
architectural concerns, like it or not.  Shredding the document into
snippets to achieve the scalability we desire would mean redefining the
schema and/or imposing other limitations on various other
functionality.  For example, we'd have to enforce at the application
level various constraints that a schema would otherwise give us out of
the box.

>
> Michael Kay
> http://www.saxonica.com/
>
>