[xquery-talk] Necessary whitespace

Michael Dyck jmdyck at ibiblio.org
Tue Apr 28 14:17:05 PDT 2015


On 15-04-28 04:33 PM, Benito van der Zander wrote:
> Hi Michael,
>
>> I don't think there's a problem with saying it's tokenized as two tokens.
>> Just because a text can be tokenized doesn't mean it's free of syntax
>> errors. And section A.2.2 gives just one of the many requirements that a
>> sequence of tokens must satisfy in order to be error-free. (Specifically,
>> "div" and "3" are adjacent non-delimiting terminal symbols, and so must be
>> separated by Whitespace and/or Comments.)
>
> What if it parses it in
> 12!(12 div.)
> as two tokens?
> "." is a terminal symbol, and "div" is not an NCName there, just part of a
> MultiplicativeExpr.

As pointed out by Ghislain yesterday, the last paragraph of A.2.2 applies: 
if a QName or NCName is followed by a "." or "-", the two tokens must be 
separated by whitespace and/or Comments.
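
As a rough illustration of that requirement (not taken from the spec), the check can be sketched in Python. The names TOKEN and missing_separators are hypothetical, the name pattern is a simplified ASCII-only stand-in for NCName, and Comments are not recognized as separators here; only whitespace is.

```python
import re

# Assumption: a tiny, ASCII-only token set for illustration. The name pattern
# deliberately excludes "." and "-" so they always surface as separate tokens.
TOKEN = re.compile(r"\s+|[A-Za-z_][A-Za-z0-9_]*|\d+|[.\-!()]")

def missing_separators(text):
    """Return (name, symbol) pairs where a name is directly followed by '.' or '-'."""
    problems = []
    prev = None  # previous token; reset to None whenever whitespace intervenes
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tok = m.group()
        if tok.isspace():
            prev = None
        else:
            prev_is_name = prev is not None and (prev[0].isalpha() or prev[0] == "_")
            if tok in (".", "-") and prev_is_name:
                problems.append((prev, tok))
            prev = tok
        pos = m.end()
    return problems

print(missing_separators("12!(12 div.)"))   # [('div', '.')]
print(missing_separators("12!(12 div .)"))  # []
```

Under these assumptions, "div." is flagged because nothing separates the name from the ".", while "div ." passes this particular check.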

> Or in
> 1<<a>2</a>
> as "<" and "<a>2</a>"
>
> "<<" is longer, but not consistent.

"<<" is longer than "<", and there are continuations of "1<<" that conform
to the EBNF, so the LMP rule compels the tokenizer to pick "<<", which leads
to raising an error at ">". Ghislain also said this yesterday.
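
A minimal Python sketch of the greedy choice, assuming a heavily reduced terminal set (the names TOKEN_RE and tokenize are hypothetical). It only models "prefer the longer symbol" and does not check consistency with the EBNF, which the real rule also requires.

```python
import re

# Assumption: a handful of terminals only. Python tries alternatives in order,
# so longer symbols are listed before their prefixes ("<<" and "</" before "<").
TOKEN_RE = re.compile(r"\s+|<<|</|<|>|\d+|[A-Za-z_][A-Za-z0-9_]*")

def tokenize(text):
    """Split text into tokens, always taking the longest listed symbol."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if not m.group().isspace():  # whitespace separates tokens but is dropped
            tokens.append(m.group())
        pos = m.end()
    return tokens

print(tokenize("1<<a>2</a>"))
# ['1', '<<', 'a', '>', '2', '</', 'a', '>']
print(tokenize("1 < <a>2</a>"))
# ['1', '<', '<', 'a', '>', '2', '</', 'a', '>']
```

In the first call the two "<" characters are consumed as a single "<<" token, so a parser would next see "a" and then an unexpected ">". With whitespace between them, the two "<" stay separate.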

It's unclear what you mean by "consistent". If you mean that having the 
tokenizer pick "<<" is not consistent with parsing the string as:
    1 < <a>2</a>
then, yes, that's quite true.

-Michael


