[xquery-talk] [ANN] nux-1.4 release

Wed Nov 30 08:40:14 PST 2005

Nux is an open-source Java toolkit making efficient and powerful XML  
processing easy. Improvements and additions in this 1.4 release focus  
on scalability, reliability and ease of use, maintaining API  
compatibility with prior releases.

A detailed changelog is here: http://dsd.lbl.gov/nux/changelog.html
Downloads are here: http://dsd.lbl.gov/nux-download/releases/

Summary:

XQuery and XOM
--------------

     •    Upgraded to xom-1.1-final (with compatible performance  
patches). xom-1.0.x and xom-1.1.x continue to work fine, albeit less
efficiently.
     •    Upgraded to saxonb-8.6.1, implementing the XQuery W3C Candidate
Recommendation of 3 November 2005 (Saxon 8.6, 8.5, 8.4 and 8.3
continue to work fine).
     •    saxon8-xom.jar is no longer needed as its contents are
directly compiled into nux.jar, improving simplicity and reliability.
     •    Constructing a new compiled XQuery object is now about 20  
times faster.
     •    Added driver for official W3C XQuery Test Suite (XQTS).  
Contains some 8500 test cases.

XML Streaming and Bnux Binary XML Streaming
-------------------------------------------

     •    Added Streaming Serialization of Very Large Documents in  
the nux.xom.io package. Using memory consumption close to zero, the  
new StreamingSerializer enables writing arbitrarily large XML  
documents onto a destination, such as an OutputStream, for standard
textual XML as well as for bnux binary XML (and StAX).
     •    Added streaming bnux deserialization for handling  
arbitrarily large input documents; uses an InputStream and an
application-provided NodeFactory, just like a XOM Builder does.
     •    Added bnux serialization to an OutputStream.
     •    To enable true streaming, a serialized bnux document now  
consists internally of one or more independent pages, each at most 64  
KB large. Each page is a tokenized byte array containing a portion of  
the XML document, in document order. Once a page has been read or
written, the related (heavy) state can be discarded, freeing memory. No
more than one page needs to be held in memory at any given time. For  
very large documents this reduces memory consumption, increases  
throughput and reduces latency. For small to medium-sized documents
it makes next to no difference.
     •    Slightly more compact bnux data format (version number has  
changed).
     •    Improved performance on reuse of BinaryXMLCodec instances  
(recommended).
     •    bnux serialization and deserialization are now roughly 3
times faster when using documents containing namespaces, closely  
matching performance for documents without namespaces.
     •    Added streaming conversion of standard textual XML to and
from binary format, enabling conversion of arbitrarily large
documents. The corresponding fire-bnux command-line conversion tool
now works in fully streaming mode, too.
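
To illustrate the paging idea described above (independent pages of at
most 64 KB, only one page in memory at a time), here is a minimal
sketch in plain Java. It is not the bnux data format and does not use
the Nux API: the class name PagedStreamSketch, the method names and the
4-byte length prefix per page are all assumptions made up for this
example. It only shows why memory use stays bounded by the page size
regardless of document size.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical illustration only; NOT the bnux format or the Nux API. */
final class PagedStreamSketch {

    /** Upper bound for a single page (64 KB, as in the announcement). */
    static final int MAX_PAGE_SIZE = 64 * 1024;

    /**
     * Copies 'in' to 'out' as a sequence of length-prefixed pages.
     * Only one page buffer is ever allocated; it is reused for each page.
     * Returns the number of pages written.
     */
    static int writePages(InputStream in, OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        byte[] page = new byte[MAX_PAGE_SIZE];
        int pages = 0;
        int n;
        // readNBytes fills the buffer unless the stream ends first (Java 9+)
        while ((n = in.readNBytes(page, 0, page.length)) > 0) {
            data.writeInt(n);       // assumed page header: payload length
            data.write(page, 0, n); // page payload
            pages++;
        }
        data.flush();
        return pages;
    }

    /**
     * Reads pages back one at a time and forwards each payload to 'sink'.
     * Returns the total number of payload bytes read.
     */
    static long readPages(InputStream in, OutputStream sink) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] page = new byte[MAX_PAGE_SIZE];
        long total = 0;
        while (true) {
            int n;
            try {
                n = data.readInt();
            } catch (EOFException endOfStream) {
                break; // no further page header: done
            }
            if (n <= 0 || n > MAX_PAGE_SIZE) {
                throw new IOException("corrupt page length: " + n);
            }
            data.readFully(page, 0, n);
            sink.write(page, 0, n); // after this, the page content can be forgotten
            total += n;
        }
        return total;
    }
}
```

In this sketch a 150,000 byte input is written as three pages, and the
reader never holds more than 64 KB of payload at once.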

Other
-----

     •    Added AnalyzerUtil.getMostFrequentTerms(). Returns  
(frequency:text) pairs for the top N distinct terms (aka words),  
sorted descending by frequency (and ascending by term, if tied).
     •    Removed deprecated methods XOMUtil.toByteArray() and
XOMUtil.toString(). Both remain available, having been moved into
class FileUtil.
     •    Added more test document collections in samples directory.
     •    Added package nux.xom.sandbox, a playground for kicking  
around various ideas and prototypes without any API compatibility  
guarantees. Code quality varies from sketchy to reliable, but is  
generally not nearly as well designed and tested as the remainder of  
Nux. In the future some of these classes may (or may not) graduate  
into stable packages.
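
As a rough illustration of the ordering that
AnalyzerUtil.getMostFrequentTerms() is described as producing above
(descending by frequency, ascending by term on ties, reported as
frequency:text pairs), here is a small sketch in plain Java. It does
not call Nux and is not its implementation: the class TopTermsSketch,
the method signature and the naive lowercase/split tokenization are
assumptions chosen for this example only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration only; NOT the Nux AnalyzerUtil implementation. */
final class TopTermsSketch {

    /** Returns up to 'limit' strings of the form "frequency:term". */
    static List<String> mostFrequentTerms(String text, int limit) {
        // Count occurrences; tokenization here is a naive assumption
        Map<String, Integer> counts = new HashMap<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                counts.merge(term, 1, Integer::sum);
            }
        }

        // Sort descending by frequency, then ascending by term if tied
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byFrequency = Integer.compare(b.getValue(), a.getValue());
            return byFrequency != 0 ? byFrequency : a.getKey().compareTo(b.getKey());
        });

        // Keep the top N entries and format them as frequency:term
        int n = Math.max(0, Math.min(limit, entries.size()));
        List<String> result = new ArrayList<>(n);
        for (Map.Entry<String, Integer> entry : entries.subList(0, n)) {
            result.add(entry.getValue() + ":" + entry.getKey());
        }
        return result;
    }
}
```

For the text "the cat and the dog and the bird" and a limit of 4, this
sketch yields 3:the, 2:and, 1:bird, 1:cat.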

Enjoy,
Wolfgang.