[xquery-talk] [xml-dev] Mistakes made in the design of XQuery 3.1
dflorescu at me.com
Sat May 30 11:12:23 PDT 2015
I will give you a somewhat short answer to that question; we can expand on it if necessary.
True in general for every query language
1. In general I think a “query” language should be functional and compositional.
(besides having a high expressive power, it has other advantages like being able to do data flow analysis
through the function invocations — which is necessary for almost EVERYTHING inside a database,
from logging to index detection to optimizability)
SQL got only a “little” bit of composability, and only after decades of existence… for my taste it’s really clunky.
I never remember what I can compose with what — and never understand why not !??
2. In general a query language should have a FLWOR-like expression (call it what you like: SELECT-FROM-WHERE,
whatever). In mathematics (or theoretical computer science) it is called a monoid comprehension.
The more powerful the FLWOR-like expression, the greater the expressive power of the query language.
I think XQuery has the best designed expression of this kind in existence. It’s compositional, elegant, symmetrical.
The semantics is very well defined, and … if you look closely… NOWHERE IN THE DEFINITION OF THE SEMANTICS
OF THE FLWOR EXPRESSION DOES THE WORD XML OCCUR.
The FLWOR expression of XQuery is a construct that has NOTHING to do with XML !!
3. In general, a query language is not a query language unless it is optimizable (that is, a compiler must be able to
derive from one syntax many different ways of evaluating the result, some being more optimal than others).
I think during the design of XQuery a HUGE deal of attention was paid to the optimizability of the language. The implementations
show that we didn’t get it wrong. Most implementations have decent response times (even though, alas, there is no standard benchmark).
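To make point 2 concrete, here is a rough sketch in Python (not XQuery) of a FLWOR-shaped pipeline over plain objects. The `Order` type, its fields, and the `big_spenders` function are made up for illustration only; the point is just that the for/let/where/order by/return steps need no XML at all, and that the result composes with further expressions.

```python
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical record type, invented for this illustration.
    customer: str
    price: int
    quantity: int


def big_spenders(orders, threshold):
    """FLWOR-shaped pipeline over plain objects.

    Roughly corresponds to:
      for $o in orders
      let $total := price * quantity
      where $total > threshold
      order by $total descending
      return (customer, total)
    """
    bindings = ((o, o.price * o.quantity) for o in orders)          # for + let
    kept = (b for b in bindings if b[1] > threshold)                # where
    ordered = sorted(kept, key=lambda b: b[1], reverse=True)        # order by
    return [(o.customer, total) for o, total in ordered]            # return


orders = [Order("ann", 10, 3), Order("bob", 5, 1), Order("cy", 20, 2)]
result = big_spenders(orders, 10)

# Compositional: the output of one expression is the input of another.
grand_total = sum(total for _, total in result)
```

Each stage consumes a sequence of bindings and produces a sequence of bindings, which is what lets an optimizer reorder or fuse the stages.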
Things that are absolutely necessary for semi-structured query languages
1. Switches of all kinds
In schema-less data, we never know what we’ll find. We HAVE to be able to branch based on what we find in the data while searching.
2. Functions and recursive functions
While dealing with nested, pre-aggregated data, we need to navigate structures of unknown depth, and for this we really need functions
and recursive functions.
3. Error management: try/catch
Even with the best precautions, while dealing with data of unknown structure, we can ALWAYS find unexpected stuff: an integer when we
thought we would get a string, and we try to cast it to a date. Hence, an error.
Imagine you have already processed 1 million rows, and suddenly one of those rows is “bad”. In the absence of a try/catch the whole result will be an exception
and your hard work on a million rows is lost.
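The three needs above can be sketched together, again in Python rather than XQuery and under made-up assumptions: the function names `find_values` and `parse_dates` and the `when` field are hypothetical. The sketch branches on what it finds, recurses to unknown depth, and catches the cast error per value so that one bad row does not throw away the rest.

```python
from datetime import date


def find_values(node, key):
    """Recursively collect every value stored under `key`, at any depth."""
    if isinstance(node, dict):                  # branch: object-like data
        found = []
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(find_values(v, key))   # recurse into the value
        return found
    if isinstance(node, list):                  # branch: array-like data
        found = []
        for item in node:
            found.extend(find_values(item, key))
        return found
    return []                                   # scalar: nothing to descend into


def parse_dates(rows, key="when"):
    """Cast every `key` value to a date; record bad values and keep going."""
    good, bad = [], []
    for index, row in enumerate(rows):
        for raw in find_values(row, key):
            try:
                good.append(date.fromisoformat(raw))
            except (TypeError, ValueError):
                # e.g. an integer where a string was expected
                bad.append((index, raw))
    return good, bad


rows = [
    {"when": "2015-05-30"},
    {"event": {"when": 42}},
    {"items": [{"when": "2015-06-01"}, {"when": "oops"}]},
]
good, bad = parse_dates(rows)
```

Without the `try`/`except`, the integer in the second row would abort the whole run and the dates already parsed would be lost.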
Now those three things: switches of all kinds, recursive functions and error management change the good old query processing DRAMATICALLY.
(Why is that? Because they make the famous data flow analysis I just talked about difficult, and that analysis is a necessity in any query optimizer.)
Relational database optimizers don’t deal with such difficulties, because they didn’t have them.
That’s a short answer to why XQuery is better for processing semi-structured data than any other alternative out there.
Note: REMARK THAT IN THIS WHOLE EMAIL I NEVER PRONOUNCED THE WORD XML.
XQuery, despite its very unfortunate name, is NOT an XML query language (ONLY).
Its basic principles have nothing to do with XML.
That’s why in JSONiq it was trivial to take out the “/” navigation, replace it with “.” navigation, replace the node constructors with object
constructors, and here we go.
> On May 30, 2015, at 10:45 AM, Ihe Onwuka <ihe.onwuka at gmail.com> wrote:
> On Sat, May 30, 2015 at 12:42 PM, daniela florescu <dflorescu at me.com> wrote:
>> Great. Well JSONiq is my tool of choice for dealing with JSON.
>> Can you provide any insights to the graph database landscape. A conversation on xml-dev with Peter Hunsberger has persuaded me that a dual representation (XML for serving data and graph for running algorithms) is probably what I need but I prefer to have some basis for evaluation before jumping in with any particular product.
> Unfortunately I cannot help you here. I tried to keep up with the graph languages, but lost it at some
> point. Things are moving too fast.
> And unlike NoSQL query languages, where the situation is really pathetic, the graph languages and their implementations
> are not that bad, I think. So there are reasonable choices...
> Ok back to these NoSql offerings I have a question.
> You are always telling these vendors that their products are not databases, they are data stores. Now from that and the limited amount I have read (it is difficult to continue reading beyond the point where you know this is not a product you want to use) it sounds like these products are little more than glorified VSAM files.
> I get why SQL is not an appropriate basis for a language for these data stores but I don't yet see how we make the leap from XQuery being a language for particular types of semi-structured data to it being the basis of a query language suitable for semi-structured data in principle, unless you are talking in very broad terms about the features such a query language should have and what principles should guide its design.
> P.S. 20 years ago I wrote a paper about a graph language (we thought at that time that semi-structured data would necessarily be a graph…)
> which kind of influenced the design of SPARQL.
> http://www.en.pms.ifi.lmu.de/publications/projektarbeiten/Felix.Weigel/xmlindex/material/fernandez97strudel.pdf
> But of course, this is no help to you, other than intellectual curiosity :-)
> People who are not intellectually curious end up writing blogs about why you should not do what they did. Then again they also get the kudos of being invited to give talks about their cock-ups to other people who are not intellectually curious but want to learn from the bitter experience of others.
> But yes, I will try and give it a read.