From dflorescu at me.com Thu Apr 23 17:24:45 2015
From: dflorescu at me.com (daniela florescu)
Date: Thu, 23 Apr 2015 17:24:45 -0700
Subject: [xquery-talk] MarkLogic using JSONiq for processing JSON ?
Message-ID: <01C44C03-4AE1-4963-8EAF-95383D9F85DF@me.com>

I heard that MarkLogic will be using JSONiq for processing JSON.
http://www.jsoniq.org

Sounds like wonderful news to me.

Hope it's true.

Best
Dana

From benito at benibela.de Mon Apr 27 01:03:13 2015
From: benito at benibela.de (Benito van der Zander)
Date: Mon, 27 Apr 2015 10:03:13 +0200
Subject: [xquery-talk] Necessary whitespace
Message-ID: <553DED41.8080103@benibela.de>

Hi,

I just noticed that in XQuery (unlike in Pascal)

3div 1

is not a valid expression.

Now I am wondering, which of these are valid expressions:

3!(10---.)
12!(.div 3)
12!(12 div.)
1<2
12 div-3
3!(12 div-.)

Benito

From christian.gruen at gmail.com Mon Apr 27 01:12:52 2015
From: christian.gruen at gmail.com (Christian Grün)
Date: Mon, 27 Apr 2015 10:12:52 +0200
Subject: [xquery-talk] Necessary whitespace
In-Reply-To: <553DED41.8080103@benibela.de>
References: <553DED41.8080103@benibela.de>
Message-ID:

Hi Benito,

These ones are valid:

> 3!(10---.)
> 12!(.div 3)

...and these ones are not:

> 12!(12 div.)
> 1<2
> 12 div-3
> 3!(12 div-.)

It would take some time to elaborate all the reasons for that (I would
surely need to look it up as well), but "12 div-3" is maybe easy to
explain: div-3 is also a valid name test and, thus, path expression.

Cheers,
Christian

From g at 28.io Mon Apr 27 02:18:28 2015
From: g at 28.io (Ghislain Fourny)
Date: Mon, 27 Apr 2015 11:18:28 +0200
Subject: [xquery-talk] Necessary whitespace
In-Reply-To:
References: <553DED41.8080103@benibela.de>
Message-ID:

Hi,

I agree with Christian on the parses/doesn't parse classification.

My understanding is as follows: 3 and div are non-delimiting terminal
symbols, and hence must be separated by a whitespace.
This is specified here:

http://www.w3.org/TR/xquery-30/#id-terminal-delimitation

12!(12 div.) doesn't parse because the . after a QName requires a
whitespace (. and - are listed as exceptions in the above link). The
same applies to div-.

1<2 doesn't parse because << would be recognized as a token.
1<<2 parses though.

I hope it helps!

Kind regards,
Ghislain

On Mon, Apr 27, 2015 at 10:12 AM, Christian Grün wrote:
> Hi Benito,
>
> These ones are valid:
>
>> 3!(10---.)
>> 12!(.div 3)
>
> ...and these ones are not:
>
>> 12!(12 div.)
>> 1<2
>> 12 div-3
>> 3!(12 div-.)
>
> It would take some time to elaborate all the reasons for that (I would
> surely need to look it up as well), but "12 div-3" is maybe easy to
> explain: div-3 is also a valid name test and, thus, path expression.
>
> Cheers,
> Christian
> _______________________________________________
> talk at x-query.com
> http://x-query.com/mailman/listinfo/talk

From leo.studer at varioweb.ch Mon Apr 27 02:49:08 2015
From: leo.studer at varioweb.ch (Leo Studer)
Date: Mon, 27 Apr 2015 11:49:08 +0200
Subject: [xquery-talk] Serialization question
In-Reply-To:
References: <553DED41.8080103@benibela.de>
Message-ID:

Hello

I use Oxygen with Saxon enterprise edition 9.6.05.
My output method is "text".

The following statement does its job correctly

for $c in doc("factbook.xml")//country order by $c/@name return $c/@name/string()

However I do not understand why

for $c in doc("factbook.xml")//country order by $c/@name return $c/@name

returns an empty sequence.
What especially intrigues me is that ordering with $c/@name works fine (no conversion to string) and the output as text not.

Any hints?

Thanks in advance
Leo

From mike at saxonica.com Mon Apr 27 02:53:09 2015
From: mike at saxonica.com (Michael Kay)
Date: Mon, 27 Apr 2015 10:53:09 +0100
Subject: [xquery-talk] Necessary whitespace
In-Reply-To:
References: <553DED41.8080103@benibela.de>
Message-ID:

Agreed.

To confuse matters, though, I see that we still have the problematic
statement in A.2 "When tokenizing, the longest possible match that is
consistent with the EBNF is used."

This to my mind has always suggested the idea that the tokenization is
sensitive to the grammatical context. And in some cases it is; you don't
want to go looking for QNames or IntegerLiterals when you're in
DirElementContent, just because a QName or IntegerLiteral is longer than
a Char.

However, it could also be read as meaning that given "12 div3",
tokenizing "div3" as one token is not consistent with the EBNF (it
doesn't lead to a valid parse), so it should be tokenized as two tokens.
I don't think that has ever been the intent, and I guess section A.2.2 on
delimiting and non-delimiting terminals was added to eliminate this
interpretation.

Michael Kay
Saxonica
mike at saxonica.com
+44 (0) 118 946 5893

On 27 Apr 2015, at 10:18, Ghislain Fourny wrote:

> Hi,
>
> I agree with Christian on the parses/doesn't parse classification.
>
> My understanding is as follows: 3 and div are non-delimiting terminal
> symbols, and hence must be separated by a whitespace.
>
> This is specified here:
>
> http://www.w3.org/TR/xquery-30/#id-terminal-delimitation
>
> 12!(12 div.) doesn't parse because the . after a QName requires a
> whitespace (. and - are listed as exceptions in the above link). The
> same applies to div-.
>
> 1<2 doesn't parse because << would be recognized as a token.
> 1<<2 parses though.
>
> I hope it helps!
>
> Kind regards,
> Ghislain
>
> On Mon, Apr 27, 2015 at 10:12 AM, Christian Grün wrote:
>> Hi Benito,
>>
>> These ones are valid:
>>
>>> 3!(10---.)
>>> 12!(.div 3)
>>
>> ...and these ones are not:
>>
>>> 12!(12 div.)
>>> 1<2
>>> 12 div-3
>>> 3!(12 div-.)
>>
>> It would take some time to elaborate all the reasons for that (I would
>> surely need to look it up as well), but "12 div-3" is maybe easy to
>> explain: div-3 is also a valid name test and, thus, path expression.
>>
>> Cheers,
>> Christian
>> _______________________________________________
>> talk at x-query.com
>> http://x-query.com/mailman/listinfo/talk
>
> _______________________________________________
> talk at x-query.com
> http://x-query.com/mailman/listinfo/talk

From mike at saxonica.com Mon Apr 27 03:30:40 2015
From: mike at saxonica.com (Michael Kay)
Date: Mon, 27 Apr 2015 11:30:40 +0100
Subject: [xquery-talk] Serialization question
In-Reply-To:
References: <553DED41.8080103@benibela.de>
Message-ID: <7379CA9A-80B9-45E2-AA81-9FB856BCB098@saxonica.com>

At first sight I would have expected an error. It appears to fall foul of rule 7 in

http://www.w3.org/TR/xslt-xquery-serialization-31/#serdm

It is a serialization error [err:SENR0001] if an item in S6 is an attribute node, a namespace node or a function.

And indeed, Saxon reports:

SENR0001: Cannot serialize a free-standing attribute node

So I think Oxygen is suppressing this error somehow.

The question then becomes, why does the spec do that? I think the answer is that XQuery picked up the serialization spec from XSLT, and XSLT never generates free-standing attribute nodes in its result, so the problem didn't arise there.

XQ 3.1 introduces the serialization method "adaptive" which is designed to display something, without failure, regardless what you throw at it. For attributes, it shows name="value" not just the value, which is what you appear to want.

Michael Kay
Saxonica
mike at saxonica.com
+44 (0) 118 946 5893

On 27 Apr 2015, at 10:49, Leo Studer wrote:

> Hello
>
> I use Oxygen with Saxon enterprise edition 9.6.05.
> My output method is "text".
>
> The following statement does its job correctly
>
> for $c in doc("factbook.xml")//country order by $c/@name return $c/@name/string()
>
> However I do not understand why
>
> for $c in doc("factbook.xml")//country order by $c/@name return $c/@name
>
> returns an empty sequence.
> What especially intrigues me is that ordering with $c/@name works fine (no conversion to string) and the output as text not.
>
> Any hints?
>
> Thanks in advance
> Leo
> _______________________________________________
> talk at x-query.com
> http://x-query.com/mailman/listinfo/talk

From jmdyck at ibiblio.org Mon Apr 27 08:44:59 2015
From: jmdyck at ibiblio.org (Michael Dyck)
Date: Mon, 27 Apr 2015 11:44:59 -0400
Subject: [xquery-talk] Necessary whitespace
In-Reply-To:
References: <553DED41.8080103@benibela.de>
Message-ID: <553E597B.3090909@ibiblio.org>

On 15-04-27 05:53 AM, Michael Kay wrote:
> Agreed.
>
> To confuse matters, though, I see that we still have the problematic
> statement in A.2 "When tokenizing, the longest possible match that is
> consistent with the EBNF is used."

In the CR period for XQuery 3.0, we changed that sentence from
"valid in the current context"
to
"consistent with the EBNF"
(See meeting 541.)

> This to my mind has always suggested the idea that the tokenization is
> sensitive to the grammatical context. And in some cases it is; you don't
> want to go looking for QNames or IntegerLiterals when you're in
> DirElementContent, just because a QName or IntegerLiteral is longer than
> a Char.

Right.

> However, it could also be read as meaning that given "12 div3",
> tokenizing "div3" as one token is not consistent with the EBNF (it
> doesn't lead to a valid parse),

Yes, I believe that's how that sentence is supposed to be read. There are
no possible continuations of "12 div3" that conform to the EBNF, but there
*are* continuations of "12 div" that conform to the EBNF. So, when the
tokenizer is positioned just before the 'd', "div" is the longest possible
match (LPM) that is consistent with the EBNF, so the next token is "div".

> so it should be tokenized as two tokens.

Well, that's less clear, but I think it's one valid interpretation.

> I don't think that has ever been the intent, and I guess section A.2.2 on
> delimiting and non-delimiting terminals was added to eliminate this
> interpretation.

I don't think there's a problem with saying it's tokenized as two tokens.
Just because a text can be tokenized doesn't mean it's free of syntax
errors. And section A.2.2 gives just one of the many requirements that a
sequence of tokens must satisfy in order to be error-free. (Specifically,
"div" and "3" are adjacent non-delimiting terminal symbols, and so must be
separated by Whitespace and/or Comments.)

So, in that view, A.2.2 wasn't added to modify the interpretation of the
LPM rule, it was added to flag some of the cases that the LPM rule "lets
through".

-Michael

From dflorescu at me.com Tue Apr 28 10:15:40 2015
From: dflorescu at me.com (daniela florescu)
Date: Tue, 28 Apr 2015 10:15:40 -0700
Subject: [xquery-talk] MarkLogic using JSONiq for processing JSON ?
In-Reply-To: <01C44C03-4AE1-4963-8EAF-95383D9F85DF@me.com>
References: <01C44C03-4AE1-4963-8EAF-95383D9F85DF@me.com>
Message-ID:

Dear Kurt (Cagle)

on LinkedIn, to the same question below, you answered me:

"As I said, first glance says it's close, but I'm not completely up to date on the JSONIQ spec. I know when I talked to a key developer of mutual acquaintance, he indicated that he followed JSONIQ, but that was still while it was under development."

Kurt, do you now have a better idea about the technical differences between the two JSON query languages: JSONiq, designed and supported by Zorba, and the one supported by MarkLogic? Or does someone else?

What is the technical rationale for making the two languages different? Any strong technical reasons?

If there are no strong technical reasons, and the two are different just for the sake of being different, that's very sad.

Relational databases survived for 30 years because those guys were brilliant business people and understood the power of a standard/common language and common APIs for all vendors. It strengthens the (entire) community to the point that, even 30 years later, it is almost impossible to get SQL out of their hands....

It's very unfortunate that the NoSQL community, and especially MarkLogic who considers themselves the "leaders" in this market, don't get that simple fact....and they had to twist JSONiq here and there in order to avoid admitting they use the language designed by the Zorba community and avoid calling it JSONiq....

Such a lack of vision is sad.

But I digress. I am still curious if someone compiled a list of technical differences.

Thanks, best regards
Dana

> On Apr 23, 2015, at 5:24 PM, daniela florescu wrote:
>
> I heard that MarkLogic will be using JSONiq for processing JSON.
> http://www.jsoniq.org
>
> Sounds like wonderful news to me.
>
> Hope it's true.
>
> Best
> Dana
>
>

From benito at benibela.de Tue Apr 28 13:33:17 2015
From: benito at benibela.de (Benito van der Zander)
Date: Tue, 28 Apr 2015 22:33:17 +0200
Subject: [xquery-talk] Necessary whitespace
In-Reply-To: <553E597B.3090909@ibiblio.org>
References: <553DED41.8080103@benibela.de> <553E597B.3090909@ibiblio.org>
Message-ID: <553FEE8D.7070900@benibela.de>

Hi Michael,

>
> I don't think there's a problem with saying it's tokenized as two
> tokens. Just because a text can be tokenized doesn't mean it's free of
> syntax errors. And section A.2.2 gives just one of the many
> requirements that a sequence of tokens must satisfy in order to be
> error-free. (Specifically, "div" and "3" are adjacent non-delimiting
> terminal symbols, and so must be separated by Whitespace and/or
> Comments.)

What if it parses it in
12!(12 div.)
as two tokens?
"."
is a terminal symbol, and "div" is not a NCName there, just part of a
MultiplicativeExpr.

Or in
1<2
as "<" and "2"

"<<" is longer, but not consistent.

Cheers,
Benito

On 04/27/2015 05:44 PM, Michael Dyck wrote:
> On 15-04-27 05:53 AM, Michael Kay wrote:
>> Agreed.
>>
>> To confuse matters, though, I see that we still have the problematic
>> statement in A.2 "When tokenizing, the longest possible match that is
>> consistent with the EBNF is used."
>
> In the CR period for XQuery 3.0, we changed that sentence from
> "valid in the current context"
> to
> "consistent with the EBNF"
> (See meeting 541.)
>
>> This to my mind has always suggested the idea that the tokenization is
>> sensitive to the grammatical context. And in some cases it is; you don't
>> want to go looking for QNames or IntegerLiterals when you're in
>> DirElementContent, just because a QName or IntegerLiteral is longer than
>> a Char.
>
> Right.
>
>> However, it could also be read as meaning that given "12 div3",
>> tokenizing "div3" as one token is not consistent with the EBNF (it
>> doesn't lead to a valid parse),
>
> Yes, I believe that's how that sentence is supposed to be read. There
> are no possible continuations of "12 div3" that conform to the EBNF,
> but there *are* continuations of "12 div" that conform to the EBNF.
> So, when the tokenizer is positioned just before the 'd', "div" is the
> longest possible match (LPM) that is consistent with the EBNF, so the
> next token is "div".
>
>> so it should be tokenized as two tokens.
>
> Well, that's less clear, but I think it's one valid interpretation.
>
>> I don't think that has ever been the intent, and I guess section
>> A.2.2 on
>> delimiting and non-delimiting terminals was added to eliminate this
>> interpretation.
>
> I don't think there's a problem with saying it's tokenized as two
> tokens. Just because a text can be tokenized doesn't mean it's free of
> syntax errors.
And section A.2.2 gives just one of the many
> requirements that a sequence of tokens must satisfy in order to be
> error-free. (Specifically, "div" and "3" are adjacent non-delimiting
> terminal symbols, and so must be separated by Whitespace and/or
> Comments.)
>
> So, in that view, A.2.2 wasn't added to modify the interpretation of
> the LPM rule, it was added to flag some of the cases that the LPM rule
> "lets through".
>
> -Michael
> _______________________________________________
> talk at x-query.com
> http://x-query.com/mailman/listinfo/talk
>

From jmdyck at ibiblio.org Tue Apr 28 14:17:05 2015
From: jmdyck at ibiblio.org (Michael Dyck)
Date: Tue, 28 Apr 2015 17:17:05 -0400
Subject: [xquery-talk] Necessary whitespace
In-Reply-To: <553FEE8D.7070900@benibela.de>
References: <553DED41.8080103@benibela.de> <553E597B.3090909@ibiblio.org> <553FEE8D.7070900@benibela.de>
Message-ID: <553FF8D1.5040201@ibiblio.org>

On 15-04-28 04:33 PM, Benito van der Zander wrote:
> Hi Michael,
>
>> I don't think there's a problem with saying it's tokenized as two tokens.
>> Just because a text can be tokenized doesn't mean it's free of syntax
>> errors. And section A.2.2 gives just one of the many requirements that a
>> sequence of tokens must satisfy in order to be error-free. (Specifically,
>> "div" and "3" are adjacent non-delimiting terminal symbols, and so must be
>> separated by Whitespace and/or Comments.)
>
> What if it parses it in
> 12!(12 div.)
> as two tokens?
> "." is a terminal symbol, and "div" is not a NCName there, just part of a
> MultiplicativeExpr.

As pointed out by Ghislain yesterday, the last paragraph of A.2.2 applies:
if a QName or NCName is followed by a "." or "-", the two tokens must be
separated by whitespace and/or Comments.

> Or in
> 1<2
> as "<" and "2"
>
> "<<" is longer, but not consistent.

"<<" is longer than "<", and there are continuations of "1<<" that conform
to the EBNF, so the LPM rule compels the tokenizer to pick "<<", which
leads to raising an error at ">". Ghislain also said this yesterday.

It's unclear what you mean by "consistent". If you mean that having the
tokenizer pick "<<" is not consistent with parsing the string as:
1 < 2
then, yes, that's quite true.

-Michael

From jmdyck at ibiblio.org Tue Apr 28 14:22:01 2015
From: jmdyck at ibiblio.org (Michael Dyck)
Date: Tue, 28 Apr 2015 17:22:01 -0400
Subject: [xquery-talk] Necessary whitespace
In-Reply-To: <553FF8D1.5040201@ibiblio.org>
References: <553DED41.8080103@benibela.de> <553E597B.3090909@ibiblio.org> <553FEE8D.7070900@benibela.de> <553FF8D1.5040201@ibiblio.org>
Message-ID: <553FF9F9.4020005@ibiblio.org>

On 15-04-28 05:17 PM, Michael Dyck wrote:
> On 15-04-28 04:33 PM, Benito van der Zander wrote:
>> Hi Michael,
>>
>> What if it parses it in
>> 12!(12 div.)
>> as two tokens?
>> "." is a terminal symbol, and "div" is not a NCName there, just part of a
>> MultiplicativeExpr.
>
> As pointed out by Ghislain yesterday, the last paragraph of A.2.2 applies:
> if a QName or NCName is followed by a "." or "-", the two tokens must be
> separated by whitespace and/or Comments.

Oh, sorry, right, you're saying it's not an NCName. Hm, that might be a spec
bug then.

-Michael

From benito at benibela.de Tue Apr 28 15:26:57 2015
From: benito at benibela.de (Benito van der Zander)
Date: Wed, 29 Apr 2015 00:26:57 +0200
Subject: [xquery-talk] Necessary whitespace
In-Reply-To: <553FF9F9.4020005@ibiblio.org>
References: <553DED41.8080103@benibela.de> <553E597B.3090909@ibiblio.org> <553FEE8D.7070900@benibela.de> <553FF8D1.5040201@ibiblio.org> <553FF9F9.4020005@ibiblio.org>
Message-ID: <55400931.1030403@benibela.de>

Hi Michael,

>
> On 04/28/2015 11:22 PM, Michael Dyck wrote:
> > On 15-04-28 05:17 PM, Michael Dyck wrote:
> >> On 15-04-28 04:33 PM, Benito van der Zander wrote:
> >>> Hi Michael,
> >>>
> >>> What if it parses it in
> >>> 12!(12 div.)
> >>> as two tokens?
> >>> "." is a terminal symbol, and "div" is not a NCName there, just part of a
> >>> MultiplicativeExpr.
> >>
> >> As pointed out by Ghislain yesterday, the last paragraph of A.2.2 applies:
> >> if a QName or NCName is followed by a "." or "-", the two tokens must be
> >> separated by whitespace and/or Comments.
> >
> > Oh, sorry, right, you're saying it's not an NCName. Hm, that might
> > be a spec bug then.
>

Yes

>
>> Or in
>> 1<2
>> as "<" and "2"
>>
>> "<<" is longer, but not consistent.
>
>
> "<<" is longer than "<", and there are continuations of "1<<" that
> conform to the EBNF, so the LPM rule compels the tokenizer to pick
> "<<", which leads to raising an error at ">". Ghislain also said this
> yesterday.
>
> It's unclear what you mean by "consistent". If you mean that having
> the tokenizer pick "<<" is not consistent with parsing the string as:

Perhaps getting a consistent parsing tree?

Theoretically a parser could parse it right-to-left and see 2 before <

Cheers,
Benito

On 04/28/2015 11:22 PM, Michael Dyck wrote:
> On 15-04-28 05:17 PM, Michael Dyck wrote:
>> On 15-04-28 04:33 PM, Benito van der Zander wrote:
>>> Hi Michael,
>>>
>>> What if it parses it in
>>> 12!(12 div.)
>>> as two tokens?
>>> "."
is a terminal symbol, and "div" is not a NCName there, just part of a
>>> MultiplicativeExpr.
>>
>> As pointed out by Ghislain yesterday, the last paragraph of A.2.2
>> applies:
>> if a QName or NCName is followed by a "." or "-", the two tokens must be
>> separated by whitespace and/or Comments.
>
> Oh, sorry, right, you're saying it's not an NCName. Hm, that might be
> a spec bug then.
>
> -Michael
>
> _______________________________________________
> talk at x-query.com
> http://x-query.com/mailman/listinfo/talk
>

From jmdyck at ibiblio.org Tue Apr 28 15:51:51 2015
From: jmdyck at ibiblio.org (Michael Dyck)
Date: Tue, 28 Apr 2015 18:51:51 -0400
Subject: [xquery-talk] Necessary whitespace
In-Reply-To: <55400931.1030403@benibela.de>
References: <553DED41.8080103@benibela.de> <553E597B.3090909@ibiblio.org> <553FEE8D.7070900@benibela.de> <553FF8D1.5040201@ibiblio.org> <553FF9F9.4020005@ibiblio.org> <55400931.1030403@benibela.de>
Message-ID: <55400F07.3090103@ibiblio.org>

On 15-04-28 06:26 PM, Benito van der Zander wrote:
>>
>>> Or in
>>> 1<2
>>> as "<" and "2"
>>>
>>> "<<" is longer, but not consistent.
>>
>>
>> "<<" is longer than "<", and there are continuations of "1<<" that
>> conform to the EBNF, so the LPM rule compels the tokenizer to pick "<<",
>> which leads to raising an error at ">". Ghislain also said this yesterday.
>>
>> It's unclear what you mean by "consistent". If you mean that having the
>> tokenizer pick "<<" is not consistent with parsing the string as:
>
> Perhaps getting a consistent parsing tree?

Well, again, if you're saying that having the tokenizer pick "<<" does not
result in a "consistent parsing tree", that's quite true, because it doesn't
result in *any* parse tree.

> Theoretically a parser could parse it right-to-left and see 2 before <

Theoretically, yes. But the thinking behind the LPM rule was presumably a
left-to-right tokenization.

-Michael
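
The two rules debated in this thread (longest possible match, and the A.2.2
requirement that adjacent non-delimiting terminals be separated) can be
illustrated with a rough Python sketch. Everything in it is an assumption
made for illustration: the terminal set is a tiny hypothetical subset, the
name pattern only approximates NCName, and the match is purely lexical, so
the "consistent with the EBNF" qualifier discussed above is deliberately not
modelled. It is not how any XQuery processor is known to work.

```python
import re

# Hypothetical, heavily simplified terminal set (an assumption, not the XQuery grammar).
# Each entry: (kind, pattern, is_delimiting).
TERMINALS = [
    ("integer", re.compile(r"[0-9]+"), False),
    # Rough stand-in for NCName: '-' and '.' may appear after the first character.
    ("name", re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*"), False),
    ("symbol", re.compile(r"<<|<|!|\(|\)|-|\."), True),
]
WHITESPACE = re.compile(r"\s+")


def tokenize(text):
    """Left-to-right tokenization that always takes the longest lexical match."""
    tokens = []
    pos = 0
    separated = True  # True if whitespace (or start of input) precedes the next token
    while pos < len(text):
        ws = WHITESPACE.match(text, pos)
        if ws:
            pos = ws.end()
            separated = True
            continue
        best = None
        for kind, pattern, delimiting in TERMINALS:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (kind, m, delimiting)
        if best is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, m, delimiting = best
        tokens.append((kind, m.group(), delimiting, separated))
        pos = m.end()
        separated = False
    return tokens


def check_delimitation(tokens):
    """Reject two adjacent non-delimiting terminals with nothing between them."""
    for prev, cur in zip(tokens, tokens[1:]):
        if not prev[2] and not cur[2] and not cur[3]:
            raise SyntaxError(f"whitespace required between {prev[1]!r} and {cur[1]!r}")


def lexemes(text):
    """Convenience helper: just the matched strings."""
    return [tok[1] for tok in tokenize(text)]


if __name__ == "__main__":
    for sample in ["1<<2", "12 div-3", "3!(10---.)", "3div 1"]:
        print(sample, "->", lexemes(sample))
```

In this sketch "12 div-3" yields a single name token "div-3", and "3div 1"
tokenizes without complaint but is then rejected by check_delimitation,
which mirrors the point that a text being tokenizable does not make it free
of syntax errors.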