[xquery-talk] question about XQUERY C++ API
Ken North
kennorth at sbcglobal.net
Thu Jan 12 12:55:52 PST 2006
Natalia Kory wrote:
>> Since we have to support Oracle, Sybase and MS SQL we can not use db vendor
specific XML functionality. Each document can be up to 18KB and there could be
100,000 of the documents in the database.
... We plan to load XML documents into memory one by one and then use XQUERY to
filter out documents that do not satisfy filter criteria. Can someone help us
with the following questions:
1. Is there a C++ API we can use for XQUERY processing?
2. Any idea what performance will be like?
3. Is there a better approach to our problem?
Natalia,
Without knowing your specific application requirements, here's an approach to
consider.
First, determine where performance is most important -- fast insertions or fast
information retrieval (queries). If you're looking for the best query
performance, you don't want to retrieve 100,000 documents on a "one by one"
basis.
Assuming you're looking for fast queries, you'll want to use indexed searches
instead of "one by one" retrieval. Even if you're storing each document in a
column, you can create additional tables to speed up SQL queries. One approach
is to create a frequency distribution from your document collection to analyze
which tags are used most often. If you're doing demand-deposit accounting, for
example, account number will be an important tag that merits an index.
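The tag frequency analysis could be sketched roughly as follows. This is only a
minimal illustration in Python, assuming the documents are available as strings;
the function name tag_frequencies and the sample documents are made up for the
example and are not part of any product API.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def tag_frequencies(documents):
    """Count how many documents contain each tag at least once."""
    counts = Counter()
    for text in documents:
        root = ET.fromstring(text)
        # A set ensures each tag is counted once per document.
        counts.update({elem.tag for elem in root.iter()})
    return counts


# Hypothetical sample documents, for illustration only.
sample_docs = [
    "<account><account_number>1001</account_number>"
    "<balance>25000</balance></account>",
    "<account><account_number>1002</account_number>"
    "<balance>500</balance><note>new</note></account>",
]

frequencies = tag_frequencies(sample_docs)
for tag, count in frequencies.most_common():
    print(tag, count)
```

Tags that appear in nearly every document and that show up in your filter
criteria are the likely candidates for an indexed side table.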
As for a multi-platform solution, Oracle and Microsoft SQL Server 2005 implement
the SQL/XML functions and XML type columns of the SQL:2003 standard. Sybase 15
implements the SQL/XML functions, but not the column type.
One approach I've used to work around disparate types and behavior on several
SQL platforms was to implement the same user-defined type and functions across
all of the target platforms. You can implement a type and behavior by writing
Java or C++ classes that are embedded in the database. After you install the
classes in the database, you can use the types and scalar functions in SQL
queries.
You could implement, for example, an xml_doc_column type that you can use in
CREATE TABLE, INSERT, UPDATE and SELECT statements and stored procedures. You
can then create indexes as needed to deliver query performance.
Since the data resides in SQL databases, you should filter first using SQL
queries and deliver a result set (document collection) to your XQuery
application. For example, if your query is "accounts > $10,000, then process
some tag", use SQL to subset the document collection first. Give the XQuery
application only the accounts > $10,000 subset rather than processing 100,000
records "one by one".
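Here is a rough sketch of that "filter with SQL first" idea. It rests on several
assumptions: Python's built-in sqlite3 module stands in for Oracle, Sybase or
SQL Server; the table names (documents, account_index), the balance and owner
tags, and the insert_document helper are all hypothetical; and ElementTree's
findtext stands in for the XQuery step.

```python
import sqlite3
import xml.etree.ElementTree as ET

# sqlite3 is a stand-in for the real SQL platform (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, body TEXT NOT NULL);
    CREATE TABLE account_index (
        doc_id  INTEGER NOT NULL REFERENCES documents(doc_id),
        balance NUMERIC NOT NULL
    );
    CREATE INDEX idx_account_balance ON account_index (balance);
""")


def insert_document(doc_id, body):
    """Store the document and copy its balance into the indexed side table."""
    balance = float(ET.fromstring(body).findtext("balance"))
    conn.execute("INSERT INTO documents VALUES (?, ?)", (doc_id, body))
    conn.execute("INSERT INTO account_index VALUES (?, ?)", (doc_id, balance))


# Hypothetical sample data.
insert_document(1, "<account><owner>Ann</owner><balance>25000</balance></account>")
insert_document(2, "<account><owner>Bob</owner><balance>500</balance></account>")
insert_document(3, "<account><owner>Cy</owner><balance>10500</balance></account>")

# Step 1: let the database narrow the collection using the indexed column.
rows = conn.execute(
    "SELECT d.doc_id, d.body FROM documents d "
    "JOIN account_index a ON a.doc_id = d.doc_id "
    "WHERE a.balance > ? ORDER BY d.doc_id",
    (10000,),
).fetchall()

# Step 2: run the XML processing only on the subset that survived the filter.
owners = [ET.fromstring(body).findtext("owner") for _, body in rows]
print(owners)
```

The point of the side table is that the XML parser never sees the documents the
WHERE clause already excluded, so the per-document cost is paid only for the
subset.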
======== Ken North ===========
www.WebServicesSummit.com
www.SQLSummit.com
www.GridSummit.com