[xquery-talk] the sad state of query languages for semi-structured data in the NoSQL industry

Fri May 29 06:12:27 PDT 2015

On Thu, May 28, 2015 at 5:20 PM, daniela florescu <dflorescu at me.com> wrote:

> The NoSQL industry is extremely successful, used everywhere, and
> considered by many the child prodigy of the database industry.
>
>
I could have sworn it is the unacknowledged, hip but bastard grandchild of
the network and hierarchical databases of the '60s and '70s... so correct
me where I am wrong in what follows.

>
> They are proud of themselves because they satisfy user needs; namely, they
> store data:
> (a) which is not in first normal form (i.e. nested, pre-aggregated)
> (b) without a schema
>
> …to the practical benefit of:
> (a) the application getting the data out of the database exactly as the
> application needs it, not altered through a normalization phase.
>

Which can give you blazing-fast performance IFF (if and only if) you query
along the same path the data is nested on. To take an example from my movie
project: we stored movie reviews by critic. You pull up a page for a critic
and get all the movie reviews he has ever written. Then the client suddenly
turns around (as mine did) and says, "I want to pull up a movie and get all
the reviews the different critics have written." That query isn't going to
be fast, and if you are not working with a proper query language it might
not be straightforward to write either. So not only do you not get a free
lunch on performance, you might end up with a double whammy.
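To make the access-path problem concrete, here is a minimal Python sketch. The critics, movies and star ratings are invented for illustration, and a plain dict stands in for a document store that nests reviews under each critic. Reading by critic is one key lookup; reading by movie has to walk every critic's nested list.

```python
# Hypothetical review data, nested by critic: the shape the critic page wants.
reviews_by_critic = {
    "critic_a": [{"movie": "Alien", "stars": 4}, {"movie": "Heat", "stars": 5}],
    "critic_b": [{"movie": "Heat", "stars": 3}],
}

def reviews_for_critic(critic):
    # Follows the stored nesting: a single key lookup.
    return reviews_by_critic.get(critic, [])

def reviews_for_movie(movie):
    # Cuts against the nesting: every critic's review list must be scanned.
    return [
        {"critic": critic, **review}
        for critic, reviews in reviews_by_critic.items()
        for review in reviews
        if review["movie"] == movie
    ]
```

Calling `reviews_for_movie("Heat")` touches every critic even though only two reviews match; with millions of critics that full scan is the "not going to be fast" part.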

In that sense nothing has changed from the databases of the '60s and '70s.

Enter relational, and you had (after normalisation) a database design that
was neutral to the queries that were to be run, as there was no nesting. In
addition you got a proper relational query language and something very
important: query optimisation (in theory at least) for free.
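For contrast, a minimal sketch of the normalised version using Python's built-in sqlite3 module; the table, column names and rows are hypothetical. Both questions become a WHERE clause on the same flat relation, and with an index on each column the query planner should be able to serve either one without scanning the whole table.

```python
import sqlite3

# One flat relation: no nesting, so no access path is privileged.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE review (critic TEXT, movie TEXT, stars INTEGER)")
conn.executemany(
    "INSERT INTO review VALUES (?, ?, ?)",
    [("critic_a", "Alien", 4), ("critic_a", "Heat", 5), ("critic_b", "Heat", 3)],
)
# An index per column; the optimiser decides which one a given query uses.
conn.execute("CREATE INDEX review_by_critic ON review (critic)")
conn.execute("CREATE INDEX review_by_movie ON review (movie)")

def reviews_for_critic(critic):
    return conn.execute(
        "SELECT movie, stars FROM review WHERE critic = ? ORDER BY movie",
        (critic,),
    ).fetchall()

def reviews_for_movie(movie):
    return conn.execute(
        "SELECT critic, stars FROM review WHERE movie = ? ORDER BY critic",
        (movie,),
    ).fetchall()
```

The client's new question needs no redesign: it is the same kind of query as the old one, just filtered on a different column.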

> (b) the lack of a fixed schema helps with data flexibility… things change
> extremely quickly inside an application these days (fields being added,
> deleted, changed, etc.)
>
>
How much data independence does that afford you?

>
> So far so good, and I think up to this point they are all right.
>
> [[ One may think that this looks a little bit like … XML, but hey, they
> don’t like XML. Fine.]]
>
> The problem comes when they try to QUERY this data.
>
>
> The NoSQL industry is re-inventing the wheel from scratch, and in a very
> chaotic and ad-hoc manner.
>
> Just look at the sad state of affairs in terms of query languages and
> their semantics.
>
> <snipped/>
>
> ==============
>
> Now I can spot several mistakes here:
>
> 1. None of those query languages has a clearly designed, mathematical data
> model. In the absence of such a data model, one that describes the input,
> the output and the intermediate results of a query, how can we define a
> clean semantics?
>
> 2. All of them have hacky semantics, the “let’s run it and we’ll see what
> the result is” kind of thing. The semantics in most corner cases (and by
> definition, semi-structured data is ONLY corner cases) is not defined.
>
> 3. Some try to piggyback on SQL semantics, ignoring the fact that SQL
> was designed to work on relations, and JSON (or nested data in general)
> has nothing to do with relations. SQL semantics cannot be “ported” just
> because we reuse the same keywords.
>
>
A big reason why people in Analytics who know what they are talking about
are keen to use SQL is because you get query optimisation for free.

> 4. None has attempted to define a type system (even a basic one for atomic
> types like dates, and arithmetic on them) or a schema language.
>
> Now maybe it’s clear why I am so sad that the XQuery community, instead of
> trying to help the younger and naive NoSQL community, which still believes
> that SQL is “good enough” and that using the SELECT-FROM-WHERE keywords is
> the magic bullet for defining the semantics of any kind of query language,
> is still looking at its own navel and marveling, like the well-known
> CEO: "we can handle flexible data"!!!
>
> Just compare those languages I listed above with the work that has been
> done in the past 16 years in XQuery, and the correctness and the complexity
> of the result, versus the hacky solutions above.
>
> P.S. And yes, that work from XQuery was used 100% in the design of JSONiq,
> which was designed with a dual goal in mind:
> (a) reuse 100% of the experience of designing and implementing XQuery, and
> (b) provide a query language that is syntactically and semantically
> acceptable to the JSON community.
>
>
It's called JavaScript. Also known as Python.

> Whether we succeeded or not is another story, but I am not aware of any
> other solution that even comes CLOSE to that goal.
>
>
They don't share that goal.