[xquery-talk] [ANN] Rumble 1.5.1 "Southern Live Oak"
gfourny at inf.ethz.ch
Tue Apr 7 05:16:45 PDT 2020
I am happy to announce the newest release of Rumble, the engine running JSONiq on Spark. JSONiq is XQuery's little brother that natively supports JSON-like data.
The 1.5.1 release contains many bug fixes and stability improvements, as well as:
- A growing list of input formats: now JSON (structured or semi-structured), Parquet, text, CSV, SVM, ROOT, and more on the way.
- Unified support for seamlessly reading from and writing to the local file system, HDFS, S3, etc. (the CLI arguments --query-path and --output-path, as well as paths passed to input functions, support any file system as long as the environment has the classes needed for the desired schemes).
- Many new builtin functions (XPath & XQuery 3.0 functions) are supported, so our coverage of the standard continues to increase.
- Many more functions that used to force a materialization are now pushed down and executed in parallel (tail(), head(), etc.).
- Navigation expressions are now faster if the data is highly structured (i.e., they automagically leverage Spark's dataframes, for example if the data was read from Parquet or CSV), but of course continue to work efficiently if the data is heterogeneous (semi-structured JSON). The user doesn't see the difference in JSONiq (data independence).
- More extensively tested on clusters such as Amazon EMR reading from and writing to S3.
- Compatibility with the latest Spark versions (2.4.x).
- And more hidden gems under development, to be announced later.
The release is free and open source (it is an 8 MB jar that you can simply wget over to your laptop or cluster with Spark installed, ready to use).
Many thanks to all our contributors, many of whom are students working on their projects or theses.