[xquery-talk] [ANN] nux-1.4 release
wolfgang.hoschek at mac.com
Wed Nov 30 08:40:14 PST 2005
Nux is an open-source Java toolkit making efficient and powerful XML
processing easy. Improvements and additions in this 1.4 release focus
on scalability, reliability and ease of use, maintaining API
compatibility with prior releases.
A detailed changelog is here: http://dsd.lbl.gov/nux/changelog.html
Downloads are here: http://dsd.lbl.gov/nux-download/releases/
XQuery and XOM
• Upgraded to xom-1.1-final (with compatible performance
patches). xom-1.0.x and xom-1.1.x continue to work fine, albeit less
• Upgraded to saxonb-8.6.1, implementing XQuery W3C Candidate
Recommendation, 3 November 2005 (Saxon 8.6, 8.5, 8.4, 8.3 still
continue to work fine).
• saxon8-xom.jar is no longer needed, as its contents are now
compiled directly into nux.jar, improving simplicity and reliability.
• Constructing a new compiled XQuery object is now about 20
• Added driver for official W3C XQuery Test Suite (XQTS).
Contains some 8500 test cases.
XML Streaming and Bnux Binary XML Streaming
• Added Streaming Serialization of Very Large Documents in
the nux.xom.io package. Using memory consumption close to zero, the
new StreamingSerializer enables writing arbitrarily large XML
documents onto a destination, such as an OutputStream, both for
standard textual XML as well as bnux binary XML (and STAX).
• Added streaming bnux deserialization for handling
arbitrarily large input documents; it uses an InputStream and an
application-provided NodeFactory, just like a XOM Builder does.
• Added bnux serialization to an OutputStream.
• To enable true streaming, a serialized bnux document now
consists internally of one or more independent pages, each at most 64
KB large. Each page is a tokenized byte array containing a portion of
the XML document, in document order. Once a page has been read or
written, related (heavy) state can be discarded, freeing memory. No
more than one page needs to be held in memory at any given time. For
very large documents this reduces memory consumption, increases
throughput and reduces latency. For small to medium sized documents
it makes next to no difference.
• Slightly more compact bnux data format (version number has
• Improved performance on reuse of BinaryXMLCodec instances
• bnux serialization and deserialization are now roughly 3
times faster for documents containing namespaces, closely
matching performance for documents without namespaces.
• Added Streaming conversion of standard textual XML to and
from binary format, enabling conversion of arbitrarily large
documents. The corresponding fire-bnux command line conversion tool
now works in fully streaming mode, too.
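
To illustrate the paging idea described in the bnux items above, here is a minimal sketch in plain Java. It is not the actual bnux wire format or the Nux API: the class name PagedStreamSketch, the 4-byte length prefix and the zero-length end marker are all assumptions made for illustration. Only the 64 KB page limit and the "at most one page in memory" property come from the announcement.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

/** Hypothetical length-prefixed page framing; NOT the real bnux format. */
public class PagedStreamSketch {

    /** Page size limit named in the announcement (64 KB). */
    public static final int MAX_PAGE_SIZE = 64 * 1024;

    /** Copies 'in' to 'out' as a sequence of pages, buffering at most one page. */
    public static int writePages(InputStream in, OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        byte[] page = new byte[MAX_PAGE_SIZE]; // the only large buffer, reused per page
        int pages = 0;
        int n;
        while ((n = in.readNBytes(page, 0, page.length)) > 0) {
            data.writeInt(n);       // assumed framing: 4-byte page length
            data.write(page, 0, n); // page payload
            pages++;
        }
        data.writeInt(0);           // assumed end marker: zero-length page
        data.flush();
        return pages;
    }

    /** Reads pages written by writePages, again holding one page at a time. */
    public static long readPages(InputStream in, OutputStream out) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] page = new byte[MAX_PAGE_SIZE];
        long total = 0;
        int n;
        while ((n = data.readInt()) != 0) {
            if (n < 0 || n > MAX_PAGE_SIZE) {
                throw new IOException("corrupt page length: " + n);
            }
            data.readFully(page, 0, n);
            out.write(page, 0, n);  // page state can be dropped after this point
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = new byte[150_000]; // stand-in for a serialized document
        for (int i = 0; i < doc.length; i++) doc[i] = (byte) i;

        ByteArrayOutputStream paged = new ByteArrayOutputStream();
        int pages = writePages(new ByteArrayInputStream(doc), paged);

        ByteArrayOutputStream restored = new ByteArrayOutputStream();
        long bytes = readPages(new ByteArrayInputStream(paged.toByteArray()), restored);

        System.out.println(pages + " pages, " + bytes + " bytes, round trip ok: "
                + Arrays.equals(doc, restored.toByteArray()));
    }
}
```

In this sketch memory use is bounded by the page buffer rather than by document size, which is the property the announcement attributes to the paged format.
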
• Added AnalyzerUtil.getMostFrequentTerms(). Returns
(frequency:text) pairs for the top N distinct terms (aka words),
sorted descending by frequency (and ascending by term, if tied).
• Removed deprecated methods XOMUtil.toByteArray() and
XOMUtil.toString(). The methods remain available but have been moved
into class FileUtil.
• Added more test document collections in samples directory.
• Added package nux.xom.sandbox, a playground for kicking
around various ideas and prototypes without any API compatibility
guarantees. Code quality varies from sketchy to reliable, but is
generally not nearly as well designed and tested as the remainder of
Nux. In the future some of these classes may (or may not) graduate
into stable packages.
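
The ordering described for AnalyzerUtil.getMostFrequentTerms() above (descending by frequency, ascending by term on ties) can be sketched in plain Java as follows. This is not the Nux implementation and does not reproduce its signature: the class TopTermsSketch, the method mostFrequentTerms and the simple lower-casing regex tokenizer are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in illustrating the documented sort order. */
public class TopTermsSketch {

    /** Returns up to 'limit' "frequency:term" strings for the most frequent terms. */
    public static List<String> mostFrequentTerms(String text, int limit) {
        // Count terms; splitting on non-word characters is an assumed tokenizer.
        Map<String, Integer> counts = new HashMap<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                counts.merge(term, 1, Integer::sum);
            }
        }

        // Sort descending by frequency, then ascending by term when tied.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byFrequency = Integer.compare(b.getValue(), a.getValue());
            return byFrequency != 0 ? byFrequency : a.getKey().compareTo(b.getKey());
        });

        // Keep the top N and format each as frequency:term.
        int n = Math.max(0, Math.min(limit, entries.size()));
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries.subList(0, n)) {
            result.add(e.getValue() + ":" + e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "the quick fox and the lazy dog and the cat";
        System.out.println(mostFrequentTerms(text, 3));
    }
}
```

The tie-break on the term keeps the output deterministic when several terms share the same count.
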