[xquery-talk] XQuery Update Facility and unwanted whitespace

Michael Kay mike at saxonica.com
Sat Mar 25 05:14:13 PDT 2017


Whitespace in certain places isn't reported by the XML parser to the XQuery processor, so there is no way the XQuery processor can preserve it. Examples are whitespace between the XML declaration and the first element node, and whitespace within a start or end tag.

Other things that aren't reported by the parser (and therefore can't be retained) include the choice of single-vs-double quotes around attribute values, entity references, CDATA section boundaries, redundant namespace declarations, and the order of attributes within a start tag.
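As a rough illustration (using Python's standard-library ElementTree parser rather than an XQuery processor, and two made-up documents), the sketch below parses two serializations that differ only in details of this kind. The names DOC_A, DOC_B and summarize are hypothetical. What the application receives is the same in both cases, so the original spelling cannot be reproduced afterwards.

```python
import xml.etree.ElementTree as ET

# Two hypothetical serializations of the same content. They differ in:
# blank lines after the XML declaration, whitespace inside the tags,
# attribute order, quote style, and CDATA versus an entity reference.
DOC_A = """<?xml version="1.0"?>
<item id="1" kind='a'><![CDATA[x < y]]></item>"""

DOC_B = """<?xml version="1.0"?>


<item   kind="a"   id="1"  >x &lt; y</item  >"""


def summarize(xml_text):
    """Return what the application sees after parsing: tag, attributes, text."""
    root = ET.fromstring(xml_text)
    return (root.tag, dict(root.attrib), root.text)


summary_a = summarize(DOC_A)
summary_b = summarize(DOC_B)

# The parser reports identical information for both inputs.
print(summary_a == summary_b)
```

Since both inputs collapse to the same parsed data, no serializer working from that data can know which of the two spellings it started from.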

Using textual diff tools on XML documents isn't really a viable strategy: you need to do the diff in a way that is XML-aware. One way is to canonicalize the two documents and compare their canonical forms. Canonicalization takes a view very similar to that of XDM (the XQuery and XPath Data Model), though not 100% identical, as to what's significant in an XML document and what isn't.
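A minimal sketch of the canonicalize-and-compare approach, assuming Python 3.8 or later, whose standard library provides xml.etree.ElementTree.canonicalize (an implementation of C14N 2.0). This is just one possible tool, not necessarily the one you would use alongside an XQuery processor; the helper same_xml and the sample documents are hypothetical.

```python
from xml.etree.ElementTree import canonicalize


def same_xml(xml_a, xml_b):
    """Compare two XML documents by their canonical (C14N 2.0) forms."""
    return canonicalize(xml_data=xml_a) == canonicalize(xml_data=xml_b)


# Differs from DOC_2 only in attribute order, quote style, a CDATA section,
# and a redundant namespace declaration on the inner element.
DOC_1 = (
    '<p:r xmlns:p="urn:x">'
    '<p:e b="2" a=\'1\' xmlns:p="urn:x"><![CDATA[1 < 2]]></p:e>'
    '</p:r>'
)
DOC_2 = '<p:r xmlns:p="urn:x"><p:e a="1" b="2">1 &lt; 2</p:e></p:r>'

# A genuine content change: attribute b has a different value.
DOC_3 = '<p:r xmlns:p="urn:x"><p:e a="1" b="3">1 &lt; 2</p:e></p:r>'

equal_despite_spelling = same_xml(DOC_1, DOC_2)
equal_despite_change = same_xml(DOC_1, DOC_3)

print(equal_despite_spelling, equal_despite_change)
```

If a readable diff is wanted rather than a yes/no answer, the two canonical strings can be fed to an ordinary textual diff tool, since the insignificant differences have already been normalized away.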

Michael Kay
Saxonica

