[xquery-talk] worries about the future of JSON query processing

Wed Jun 24 11:27:49 PDT 2015

Dear Ihe,

You seem very concerned that the half-baked SQL-based languages for querying JSON will
prevail, and that years of experience in building XQuery will be lost.

I don’t share your fears and concerns, and here is why.

1. There WILL be language “miscarriages” in the industry: MongoDB’s, N1QL, others?
==========================================================================

Those are the half-baked ones, with no semantics, built bottom-up and more by example than
by specification.

Those will NOT survive, for plenty of reasons, even if millions of dollars are spent on marketing
them.

Nature usually takes care of those, eventually.

- no basic theoretical foundations and no specification, so no researcher will touch them with a stick, and no professor will
touch them with a stick for teaching either
- no serious optimization behind them, because there is no serious specification
- nobody in those respective teams who knows what they are DOING (!!!!!!). I know the people
who build those; they have NO clue what they are doing
- they compete AGAINST each other, and no company will support the OTHER company’s language
- they are built bottom-up, one operator at a time, without a global picture… so eventually they’ll become
so complex that nobody will be able to use them
- none of those languages will have enough market share for tools (like Tableau) to be interested
in integrating them

Take N1QL, for example. Which database is likely to support it? Couchbase, because they made it.

Oracle, SQL Server and DB2 will NOT support it. I can bet a large amount of money on that. People working in those companies
KNOW what they are talking about in terms of query languages and query processing, and they will NOT accept those half-baked attempts.

HP’s Vertica and others of its kind are likely to follow the Oracle, SQL Server and DB2 bunch, and not Couchbase.

Who else?
  - MySQL: possible, but I REALLY doubt it, because I think Oracle would like to have the same language on both sides (big Oracle and MySQL)
  - Postgres: possible (and that would be a concern, indeed..)

Among the NoSQL databases, none other than Couchbase will support it. Mongo, Cassandra, Riak: niet. None of them will support it, because
they have no interest in giving Couchbase the upper hand.

MarkLogic!? The jury is still out on that. I’ve seen them doing enough silly things recently (I mean…. JavaScript on the SERVER side!???)
so who knows, they might be silly enough to do that. They are in an awkward position right now.

Elasticsearch!? Maybe, but those guys could not write a query processor if their lives depended on it. It’s not in their DNA…

Overall…. nothing to worry about.

2. Scientific exercises like SQL++ (UC San Diego), AQL (UC Irvine)
=========================================================

Unlike the previous category, those are likely to be well specified, with clean semantics and some aesthetics
to them.

Those are very good for research, and researchers will use them for a long time to teach and to prove all kinds of interesting stuff.

Unfortunately, very likely they will not make it into the industry, for the following reasons:
- researchers have no clue how to talk to product people
- PhD students could not write industrial-strength code if their lives depended on it (and it’s not their job
anyway)
- their expressive power is too limited to be usable in practice

But they will be popular (on paper).

3. Extensions of REAL SQL with JSON support. This will be EXACTLY like SQL/XML, but for JSON.
=============================================================================

This one will be proposed by a consortium of Oracle + SQL Server + IBM, and implemented by all three, plus eventually
all the other SQL-based databases (Vertica, etc).

I see it coming. It’s coming… any day now…. :-)

But things will move slowly there, as Oracle produces a new egg (aka a new release) every 5 years, so… where is the rush!? :-)

This one WILL be used, and useful for many applications: the cases where the JSON is simple and the “query” is simple. And there
is no shame in it…. there are MANY of those simple cases.

As soon as you get out of the simple JSON/simple query territory, you are out of luck with this approach, though
(as was the case with XML and SQL/XML).

4. Imperative programming languages with sequence comprehension designed for JSON
=====================================================================

Take JavaScript, for example, and add a FLWOR expression to it, with some well-designed path expressions
à la XPath.

The result could be really nice (as nice as JavaScript can be..), and usable in many applications.

As the number of JavaScript developers is large, this will be widely used.

It could run both server side and client side. (Optimization on the server side for large data sets would be a little difficult, I
would say, though…)
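To make the idea of “sequence comprehension inside an imperative language” concrete, here is a minimal sketch. It uses Python’s built-in comprehensions as a stand-in for the hypothetical JavaScript extension described above; the sample data, the field names and the big_lines function are all made up for illustration, and the comments only indicate which FLWOR clause each line roughly corresponds to.

```python
import json

# Hypothetical sample data; the field names are invented for this sketch.
ORDERS_JSON = """
[
  {"customer": "ann", "items": [{"sku": "a1", "price": 12.5, "qty": 2}]},
  {"customer": "bob", "items": [{"sku": "b7", "price": 3.0, "qty": 1},
                                {"sku": "a1", "price": 12.5, "qty": 4}]},
  {"customer": "cid", "items": []}
]
"""


def big_lines(orders, min_total):
    """A FLWOR-shaped query written as a comprehension over parsed JSON."""
    rows = [
        {"customer": order["customer"], "sku": item["sku"], "total": total}  # return
        for order in orders                          # for: iterate over orders
        for item in order["items"]                   # for: nested path step into items
        for total in [item["price"] * item["qty"]]   # let: bind a computed value
        if total >= min_total                        # where
    ]
    # order by total, descending
    return sorted(rows, key=lambda row: row["total"], reverse=True)


result = big_lines(json.loads(ORDERS_JSON), 10)
```

Because the whole query is one declarative expression, an engine could in principle reorder or push down the filter; a hand-written loop with side effects gives it much less room, which is the optimization difficulty mentioned above.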

5. XQuery-Based extensions for JSON (e.g. JSONiq)
==========================================

This approach is for the hard core: capable of dealing with complex JSON and complex processing, and
able to be optimized for large datasets.

Those will be the ones capable of dealing with “serious” JSON, aka some JSON that has basic types other
than numbers and strings….. like a date, maybe!??? :-)

Those will be the ones capable of dealing with JSON with schemas and types.
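The point about dates can be illustrated with a small sketch, assuming nothing more than Python’s standard json module. Plain JSON has no date type, so a date goes out as a string and comes back as a string; some typing layer has to restore it. The SCHEMA mapping and the apply_schema helper below are hypothetical and only stand in for a real schema or type system.

```python
import json
from datetime import date

record = {"name": "release", "shipped": date(2015, 6, 24)}

# Plain JSON has no date type, so the encoder needs help (default=) ...
text = json.dumps(record, default=lambda d: d.isoformat())

# ... and after parsing, the value is just an ordinary string.
untyped = json.loads(text)

# Hypothetical hand-written "schema": field name -> constructor for its type.
SCHEMA = {"shipped": date.fromisoformat}


def apply_schema(obj, schema):
    """Re-type the fields named in the schema; leave the others untouched."""
    return {key: schema[key](value) if key in schema else value
            for key, value in obj.items()}


typed = apply_schema(untyped, SCHEMA)
```

Only the typed version supports date arithmetic and comparison against other dates; the untyped one is limited to string operations.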

=====================================================================

If you look carefully, only (3), (4) and (5) actually have a chance of success in industry. And all three will probably be used
in practice. They’ll probably share the market for a long time.

But I would like to point something out: one of the “bugs” of XQuery now becomes a “feature”: the complexity.

Once you have an XQuery/JSONiq processor, you can support almost EVERYTHING ELSE!! It’s just a matter of putting another
parser with a slightly different syntax in front of it. Everything else is a subset!!!!
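The architecture behind that claim (several surface syntaxes, one shared processor) can be sketched in a few lines. This is a toy under loud assumptions: the two mini-syntaxes, the Plan structure and all function names are invented for illustration, and each “parser” is a single regular expression that handles exactly one query shape. It shows the layering only; it is not evidence for the subset claim itself.

```python
import re
from typing import NamedTuple


class Plan(NamedTuple):
    """Toy 'core' representation: filter one collection, project one field."""
    source: str
    where_field: str
    threshold: float
    return_field: str


# Two hypothetical surface syntaxes, both invented for this sketch.
SQLISH = re.compile(r"SELECT (\w+) FROM (\w+) WHERE (\w+) > (\d+)$")
FLWORISH = re.compile(
    r"for \$(\w+) in (\w+) where \$\1\.(\w+) > (\d+) return \$\1\.(\w+)$"
)


def parse_sqlish(text):
    ret, src, field, num = SQLISH.match(text).groups()
    return Plan(src, field, float(num), ret)


def parse_flworish(text):
    _var, src, field, num, ret = FLWORISH.match(text).groups()
    return Plan(src, field, float(num), ret)


def execute(plan, collections):
    """The single shared back end; it never sees the surface syntax."""
    return [doc[plan.return_field]
            for doc in collections[plan.source]
            if doc[plan.where_field] > plan.threshold]


DATA = {"people": [{"name": "ann", "age": 41}, {"name": "bob", "age": 27}]}
q1 = parse_sqlish("SELECT name FROM people WHERE age > 30")
q2 = parse_flworish("for $p in people where $p.age > 30 return $p.name")
```

Both front ends produce the same Plan, so whatever optimization and execution work went into the back end is reused unchanged.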

All the existing XQuery processors, built with lots of work and sweat over the past 10 years, are rich and mature enough
now to evolve in various directions…. depending on where the market goes.

…. and this, MUCH faster than any OTHER query processor for JSON.

My belief is that a query processor is like a good French wine…. it needs time to mature, and there is nothing anybody can
do to speed up the process.

Well, the fact that the existing XQuery processors have ALREADY spent 10+ years maturing means that they now
have a 10+ year head start over the other query processors that are starting from scratch.

This is a MAJOR advantage.

I hope that XQuery implementors will take advantage of it !!!! :-)

Best regards
Dana