[xquery-talk] Doing some Pattern Frequency Distribution

Martin Probst martin at x-hive.com
Thu Jun 8 21:53:03 PDT 2006


Hi,

> I have a requirement to get Count, Blank Count, Max Length, Frequency
> Distribution and Pattern Frequency Distribution on some of the
> elements in an XML which can go up to a size of 5GB.

I would recommend going with XQuery and a real XML database for those
sizes. I'm not much of an expert on XSLT processors, but I doubt
you'll get good results with XSLT on datasets of 5 GB. XML databases
typically do not need to hold the full XML tree in memory, which
makes many operations on huge files possible.
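
For illustration only, the statistics asked for above could be sketched
in XQuery 1.0 roughly as follows. Everything specific here is an
assumption: the document name "data.xml", the element name "field", and
the reading of "pattern" as the value with lowercase letters mapped to
"a", uppercase letters to "A" and digits to "9". It is not taken from
any particular database product.

```xquery
(: Hypothetical pattern function: lowercase -> a, uppercase -> A, digit -> 9.
   All other characters are left unchanged. :)
declare function local:pattern($s as xs:string) as xs:string {
  replace(replace(replace($s, "[a-z]", "a"), "[A-Z]", "A"), "[0-9]", "9")
};

(: "data.xml" and "field" are placeholder names, not from the original question. :)
let $strings  := for $v in doc("data.xml")//field return string($v)
(: One pattern string per value, in the same order as $strings. :)
let $patterns := for $s in $strings return local:pattern($s)
return
  <stats>
    <count>{ count($strings) }</count>
    <blank-count>{ count($strings[normalize-space(.) = ""]) }</blank-count>
    <max-length>{ max(for $s in $strings return string-length($s)) }</max-length>
    <frequency>{
      for $d in distinct-values($strings)
      order by $d
      return <value text="{ $d }" count="{ count($strings[. = $d]) }"/>
    }</frequency>
    <pattern-frequency>{
      for $p in distinct-values($patterns)
      order by $p
      return <pattern text="{ $p }" count="{ count($patterns[. = $p]) }"/>
    }</pattern-frequency>
  </stats>
```

Written this way, every distinct value triggers another pass over all
values, so on inputs of several GB the running time will depend heavily
on how well the engine optimizes or indexes the query.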

> With my initial reading on
> XSLT and XQuery I felt XQuery is the best candidate for this. As you
> suggested using XSLT for "Pattern Frequency Distribution (PFD)" I need
> to change the whole solution from XQuery to XSLT.

See my other email about solving the pattern frequency distribution in
XQuery.

Martin

