<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Another message that I sent this morning, and it didn't make it through... until now.<div class=""><br class=""></div><div class="">Thanks, MarkLogic, for opening up the blockade.</div><div class=""><br class=""></div><div class="">I guess the MarkLogic lawyers needed a little bit of time to scratch their heads about what to do... (and BTW,</div><div class="">silencing me isn't a solution... I lived in a communist country for 22 years... they've tried that... didn't work)</div><div class=""><br class=""></div><div class="">But the following message is a serious discussion about the state of affairs in the query language universe for NoSQL</div><div class="">databases.</div><div class=""><br class=""></div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On May 28, 2015, at 2:20 PM, daniela florescu <<a href="mailto:dflorescu@me.com" class="">dflorescu@me.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">The NoSQL industry is extremely successful, used everywhere, and considered by many the child prodigy of the database industry.<div class=""><br class=""></div><div class=""><br class=""></div><div class="">They are proud of themselves because they satisfy user needs, aka: they store data:</div><div class=""><div class="">(a) which is not in 1st normal form (aka nested, pre-aggregated)</div><div class="">(b) without schema</div><div class=""><br class=""></div><div class="">…to the practical benefit of:</div><div class="">(a) the application getting the data out of the database exactly as the application needs it, and not </div><div 
class="">altered through a normalization phase.</div><div class="">(b) the lack of a fixed schema helps with data flexibility… things change extremely quickly inside an application</div><div class="">these days (fields being added, deleted, changed, etc.)</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">So far so good, and I think up to this point they are all right.</div><div class=""><br class=""></div><div class="">[[ One may think that this looks a little bit like … XML, but hey, they don’t like XML. Fine.]]</div><div class=""><br class=""></div><div class="">The problems come when they try to QUERY this data.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">The NoSQL industry is re-inventing the wheel from scratch, and in a very chaotic and ad-hoc manner.</div><div class=""><br class=""></div><div class="">Just look at the sad state of affairs in terms of query languages and their semantics.</div><div class=""><br class=""></div><div class="">I am just looking at the ones that claim they can store nested and schema-less data (JSON-like or XML-like).</div><div class=""><br class=""></div><div class="">(1) MongoDB</div><div class=""><a href="http://docs.mongodb.org/manual/tutorial/query-documents/" class="">http://docs.mongodb.org/manual/tutorial/query-documents/</a></div><div class=""><br class=""></div><div class="">Note: pure JSON. Couldn’t find a simple sort, for example. Etc. Etc.</div><div class=""><br class=""></div><div class="">(2) Cassandra/DataStax</div><div class=""><a href="http://www.datastax.com/wp-content/uploads/2013/03/cql_3_ref_card.pdf" class="">http://www.datastax.com/wp-content/uploads/2013/03/cql_3_ref_card.pdf</a></div><div class=""><br class=""></div><div class="">Note: not even an OR, or a NOT. 
And what does it mean to sort on schema-less data?</div><div class=""><br class=""></div><div class="">(3) Spark/Databricks</div><div class=""><a href="https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html" class="">https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html</a></div><div class=""><br class=""></div><div class="">Note: sounds more like an import/export facility… but they call it a JSON query language.</div><div class=""><br class=""></div><div class="">(4) Elasticsearch</div><div class=""><a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html" class="">https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html</a></div><div class=""><br class=""></div><div class="">Note: very sophisticated full-text search, but no structured search of any serious kind. Just some simple aggregates (sum, etc.)</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">(5) MuleSoft</div><div class=""><strong class="" style="color: rgb(50, 48, 49); font-family: openSans, Arial, sans-serif; font-size: 16px; line-height: 24px; box-sizing: inherit;"><a href="https://www.mulesoft.com/press-center/new-release-june-2015?utm_source=linkedin&sthash.axJqiSBn.mjjo" class="">https://www.mulesoft.com/press-center/new-release-june-2015?utm_source=linkedin&sthash.axJqiSBn.mjjo</a></strong></div><div class=""><br class=""></div><div class="">Note: not only do they seem to have their own JSON query language, but even their own XML query language. 
Couldn’t find more details.</div><div class=""><br class=""></div><div class="">(6) Hive</div><div class=""><a href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF" class="">https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF</a></div><div class=""><br class=""></div><div class="">Note: multiple languages (XPath, some JSON, some SQL, glued together somewhat chaotically)</div><div class=""><br class=""></div><div class="">I could fill tons of pages with YET-ANOTHER-LANGUAGE-LIKE-THIS. </div><div class=""><br class=""></div><div class="">(7) MarkLogic</div><div class=""><br class=""></div><div class=""><a href="https://docs.marklogic.com/8.0/guide/app-dev/json" class="">https://docs.marklogic.com/8.0/guide/app-dev/json</a></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">==============</div><div class=""><br class=""></div><div class="">Now I can spot several mistakes here:</div><div class=""><br class=""></div><div class="">1. None of those query languages has a clearly designed, mathematical data model. In the absence of such a data model, one that describes the input, the output</div><div class="">and the intermediate results of a query, how can we define clean semantics?</div><div class=""><br class=""></div><div class="">2. All of them have hacky semantics: a “let’s run it and we’ll see what the result is” kind of thing. The semantics in most corner cases (and by definition</div><div class="">semi-structured data is ONLY corner cases) is not defined.</div><div class=""><br class=""></div><div class="">3. Some try to piggyback on SQL semantics, ignoring the fact that SQL was designed to work on relations, and JSON (or in general, nested data) </div><div class="">has nothing to do with relations. SQL semantics cannot be “ported” just because we reuse the same keywords.</div><div class=""><br class=""></div><div class="">4. 
None attempted to define a type system (even a basic one for atomic types like dates, and arithmetic on them) or a schema language.</div><div class=""><br class=""></div><div class="">==============</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">Now maybe it’s clear why I am so sad that the XQuery community, instead of trying to help the younger and naive NoSQL community (which still believes that</div><div class="">SQL is “good enough” and that using the SELECT-FROM-WHERE keywords is the magic bullet for defining the semantics of any kind of query language),</div><div class=""> is still looking at its own navel and marveling, like the well-known CEO: "we can handle flexible data"!!!</div><div class=""><br class=""></div><div class="">Just compare those languages I listed above with the work that has been done in the past 16 years in XQuery, and the correctness and the complexity of the result</div><div class="">vs. the hacky solutions above.</div><div class=""><br class=""></div><div class="">P.S. 
And yes, that work from XQuery was used 100% in the design of JSONiq, which was designed with a dual goal in mind:</div><div class="">(a) reuse 100% of the experience from the design and implementation of XQuery, and</div><div class="">(b) provide a query language that is syntactically and semantically acceptable for the JSON community.</div><div class=""><br class=""></div><div class="">Whether we succeeded or not is another story, but I am not aware of any other solution that even comes CLOSE to that goal.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">Best regards,</div><div class="">Dana</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div></div></div></div></blockquote></div><br class=""></div></body></html>