[xquery-talk] Finding a XML-Database to fit our needs
greg.burd at oracle.com
Sun Dec 16 11:31:21 PST 2007
I’m biased because I am the product manager for Berkeley DB XML here at Oracle, but that said I think we’re a perfect match for your requirements. :)
We’ve tested document containers of that size and have customers pushing Berkeley DB XML in production today. I’m confident that we can meet your need for concurrent access: we support access from multiple processes and multiple threads (at the same time) with full transactional (ACID) protection of your document data. We also support replication of databases for highly available systems (failover, fault tolerance, five-nines operation) and for read scalability. Systems using Berkeley DB XML have been shown to be near zero-admin in deployment, because traditional administrative tasks can be integrated into the program itself using our programmatic API.
We support the latest XQuery and XQuery Update standards. You can create any number of indices, and we provide a statistical, cost-based query optimization engine for fast access to whole documents or parts of them. We manage deadlock detection and resolution and support MVCC to improve concurrent access in heavy read/write applications. You can even store, as well as index, key/value pairs of metadata associated with each document. Validation is optional; it can differ from document to document and be enforced per document.
Also of interest, our XQuery and XQuery Update layer is called XQilla (http://xqilla.sourceforge.net) and has recently been released under the Apache 2.0 license. Berkeley DB XML itself (the database and indexing portions of the product that use XQilla and Berkeley DB) is also open source, dual-licensed under the Sleepycat License. Please consider us in your evaluations.
Also, please contact me about joining our Early Access program. We have a version in the late stages of development with fixes, features, etc. that will impact your evaluation.
Here are some links to review during your evaluation:
Product documentation is located here:
I suggest you start by reading this getting started guide:
Downloads for the released versions of the Berkeley DB XML products are here:
Please join the OTN and ask specific questions on our Berkeley DB XML Forum (or, if you’re uncomfortable with that, you can email me directly). Our developers monitor the forum actively, and since most questions are of general interest we prefer to answer them in this public location so that everyone can benefit.
OTN Forum for Berkeley DB XML discussions:
Again, contact me about our ALPHA program so we can sign you up.
Regards,
Gregory Burd greg.burd at oracle.com
Product Manager, Berkeley DB/JE/XML Oracle Corporation
From: talk-bounces at x-query.com [mailto:talk-bounces at x-query.com] On Behalf Of Johan Mörén
Sent: Saturday, December 15, 2007 10:30 AM
To: talk at xquery.com
Subject: [xquery-talk] Finding a XML-Database to fit our needs
I'm new to the list and work for a company in Stockholm, Sweden.
We are currently evaluating a move from storing our data in an RDBMS (Oracle 10g) to storing it as native XML. The reason for doing this is that all communication with the persistence layer is done via SOAP, and we believe we can save a lot of effort and time if we persist our data in the same format in which we communicate it to the outside world.
We are looking for a solution that can handle approximately 16 000 000 documents ranging from 50 to 200 KB in size. About 5k to 20k documents will be updated daily. The documents are all derived from the same base type and are described by a common schema. There are 5 sub-types that could be split into different collections where the largest, in terms of number of documents, would be about 8-9 million in size.
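As a rough sketch of the scale these figures imply, the following back-of-the-envelope calculation uses only the numbers quoted above. It assumes decimal units (1 KB = 1,000 bytes) and ignores index, page-fill, and storage overhead, none of which are specified in the message; the function and variable names are illustrative only.

```python
# Back-of-the-envelope sizing from the figures quoted in the message.
# Assumptions (not from the message): 1 KB = 1,000 bytes, and no
# allowance for indexes, page fill, or other storage overhead.

DOC_COUNT = 16_000_000           # approximate number of documents
DOC_SIZE_KB = (50, 200)          # smallest and largest document size
DAILY_UPDATES = (5_000, 20_000)  # fewest and most documents updated per day


def total_size_tb(doc_count, size_kb):
    """Raw document volume in terabytes (decimal units)."""
    return doc_count * size_kb * 1_000 / 1e12


def daily_churn_percent(updates, doc_count):
    """Share of the whole corpus touched by updates in one day."""
    return 100 * updates / doc_count


low_tb = total_size_tb(DOC_COUNT, DOC_SIZE_KB[0])
high_tb = total_size_tb(DOC_COUNT, DOC_SIZE_KB[1])
low_churn = daily_churn_percent(DAILY_UPDATES[0], DOC_COUNT)
high_churn = daily_churn_percent(DAILY_UPDATES[1], DOC_COUNT)

print(f"Raw document volume: {low_tb:.1f} to {high_tb:.1f} TB")
print(f"Daily update churn:  {low_churn:.3f}% to {high_churn:.3f}% of documents")
```

Under these assumptions the raw document volume falls between roughly 0.8 and 3.2 TB, while daily updates touch well under 1% of the documents, so the workload is read-heavy relative to its size.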
Practically all documents describe relationships to other documents, both of their own type and of the other types, so it must be possible to navigate these relationships in queries.
The documents are very data-centric, containing very little free text, but some fields will need to be backed by a free-text index for querying. Since operators will work online with the data, query times will need to be reasonably fast for queries that are not too complex.
Apart from the above, the database should support:
* Concurrent inserts and updates.
* XQuery 1.0 support.
* Any fragmentation of the documents (to handle the size) should be transparently handled by the database.
* Both commercial and open source alternatives are of interest.
Any input, experiences and pointers on where to look would be very much appreciated.
"You can't always write a chord ugly enough to say what you want to say,
so sometimes you have to rely on a giraffe filled with whipped cream." - Frank Zappa