[xquery-talk] Questions about SQL Server 2005 and XML
fcohen at rainingdata.com
Sun Jan 22 10:22:02 PST 2006
I recently ran across Microsoft's white paper on XML Support In SQL
Server 2005. The paper is found at:
It is a good paper with lots of detail. Below are some thoughts I had
and questions from reading the paper. I'm hoping this spurs some
discussion about the persistence layer below XQuery.
1) The MS approach to XML is to store XML in a relational db with
extensions to more natively understand XML constructs. For instance,
the data is stored in a relation table column of field type XML with
a lossy technique. SQL Server stores element tags in a binary form,
it does not retain the order of the attributes, and it limits the
depth of the hierarchy of an element.
2) To get good performance, one creates a "primary XML index" and
then one may create secondary index types on the data in that field:
1) PATH for evaluating where clauses, 2) PROPERTY for property value
lookups, and 3) VALUE for doing wild card queries. You create each of
these manually. XML indexes are B+Trees.
3) I wonder if Microsoft has tested SQL Server 2005 with STAR BOD
schemas? These are typical of UBL-based schemas and use Schema
includes extensively. Many XML parsers I have tested will not
support the complexity in the BODs.
4) SQL Server 2005 supports SOAP over HTTP endpoints to query,
insert, delete data. The paper does not say which SOAP encoding
style: Doc-lit, RPC encoded, RPC-lit.
5) The paper says "Data Exchange and Workflow" are key drivers for MS
to support XML, including "vertical domains such as financial and geo-
spatial data." GIS defines geospatial field types for XML documents.
Does SQL Server support GIS field types in XML documents?
One of the CHECK examples in the paper recommends using a constraint
for a "flower shop that must be within 50 miles of its business
location." May I express a GIS function for this constraint?
6) SQL Server 2005 takes string, file, and FOR XML as inputs. What
about taking a stream?
7) The section on Storage Representation makes it seem like if I am
not expert at creating indexes and using typed XML that SQL Server
2005 will give me slow performance. Am I reading that correctly?
8) Stored XML is "limited to 2 GB per instance." Is that 2 GB per XML
document stored in one XML field? Is that for all the XML stored in
one field? That seems really small to me and makes me wonder why
there is a limit at all?
Same question on the "depth of XML heirarchy is limited to 128
levels". Is that 128 levels within each document in an XML field?
9) SQL Server 2005 does not store attribute order in an XML field. In
SOA and workflow patterns the document that moves from one service or
operator to the next usually depends on the canonicalization of the
data where the fidelity to the original XML document is maintained.
10) I am pretty tired of waiting for the XQuery 1.0 spec to be final.
Come on already!#@^% Most of the XQuery databases already implement
full text search (Raining Data TigerLogic included.) SQL Server 2005
supports full text search of element contents. Is the fn:contains()
schema the way MS would like to see full text in a post XQuery 1.0 spec?
11) The paper covers FOR XML EXPLICIT and includes "FOR XML
EXPLICIT ... may not perform as well for deep heirarchies." Why is that?
Frank Cohen, Raining Data, http://www.RainingData.com, phone: 408 236
http://www.xquerynow.com for free XML, XQuery and native XML database
techniques and solutions.
More information about the talk