[xquery-talk] Doing some Pattern Frequency Distribution

Fri Jun 9 10:14:07 PDT 2006

Martin Probst wrote:
> This might actually qualify as a distinction between native and not so 
> native XML databases. In a real, native XML database it should not make 
> much of a difference if your document is 3 GB or your document 
> collection is 3 GB but individual documents 3 MB, as the elements of 
> processing are the XML nodes themselves.

That depends, obviously, on your definition of "real, native XML 
database", but, while the statement is true for X-Hive/DB, I tend not to 
agree in general.

For instance, Apache Xindice is certainly native XML (though I would not 
call it a database, as it lacks transaction support), but it manipulates 
whole documents in memory, so it does have problems with large documents. 
On the other hand, a system that maps XML documents to relational tables 
may not suffer from this problem. Indeed, it would be processing the XML 
nodes as you say, only in the form of one or more relational table rows.
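
As a rough illustration of that last point, here is a minimal sketch in 
Python of shredding a document into one table row per element while 
streaming, so the whole document never has to sit in memory. The table 
layout (node: id, parent, name, text) and the function name shred are 
assumptions made up for this example; they do not describe how any 
particular product stores XML, and attributes and mixed content are 
ignored.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_stream, conn):
    """Store each element of the XML stream as one row in a 'node' table.

    Hypothetical schema for illustration only: one row per element,
    linked to its parent row by id.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        " id INTEGER PRIMARY KEY, parent INTEGER, name TEXT, text TEXT)"
    )
    stack = []    # ids of the currently open ancestor elements
    next_id = 0
    for event, elem in ET.iterparse(xml_stream, events=("start", "end")):
        if event == "start":
            next_id += 1
            parent = stack[-1] if stack else None
            conn.execute(
                "INSERT INTO node (id, parent, name) VALUES (?, ?, ?)",
                (next_id, parent, elem.tag),
            )
            stack.append(next_id)
        else:
            # Leading text is only guaranteed to be complete at the end event.
            node_id = stack.pop()
            text = (elem.text or "").strip()
            conn.execute(
                "UPDATE node SET text = ? WHERE id = ?", (text, node_id)
            )
            elem.clear()  # release the element's text and children once stored
    conn.commit()


# Small in-memory example; a real document would be read from a file.
conn = sqlite3.connect(":memory:")
doc = b"<lib><book><title>A</title></book><book><title>B</title></book></lib>"
shred(io.BytesIO(doc), conn)
titles = [
    row[0]
    for row in conn.execute("SELECT text FROM node WHERE name = 'title' ORDER BY id")
]
```

A query then touches only the rows it needs, so whether those rows came 
from one 3 GB document or a thousand 3 MB documents matters much less.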

Bas de Bakker