[xquery-talk] XQuery and id()/idref(); Controlling the children of nodes in the result sequence

Michael Kay mike at saxonica.com
Wed Apr 23 12:08:08 PDT 2008


I'm afraid I haven't tried to understand all the details of your query, but
I'll try to answer the questions anyway.

> - Is there any way to suppress the output of the namespaces 
> in each element? 

You can get rid of unused namespaces using "declare copy-namespaces
no-preserve, inherit;" in the query prolog (XQuery 1.0 requires both the
preserve mode and the inherit mode in this declaration). For namespaces that
are needed in the output, the way to ensure they are declared only once, at
the top level, is to declare them explicitly when you create the outermost
element, for example:

<resultset xmlns:a="http://www.example.org/a"
           xmlns:b="http://www.example.org/b"
           xmlns:c="http://www.example.org/c">
  ...
</resultset>
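A minimal sketch combining both suggestions with the query from the original message. It assumes the a, b and c namespace URIs used there and an input file named instance.xml. With no-preserve, each copied element keeps only the namespaces its own name and attributes use (so the xsi and default-namespace declarations are dropped), and with inherit it picks up the declarations made on resultset, so the serializer has no reason to repeat them.

```xquery
declare namespace a = "http://www.example.org/a";

(: Copied nodes keep only the namespaces their names and attributes
   actually use, and inherit the rest from their new parent. :)
declare copy-namespaces no-preserve, inherit;

(: Declare every namespace needed in the output once, at the top. :)
<resultset xmlns:a="http://www.example.org/a"
           xmlns:b="http://www.example.org/b"
           xmlns:c="http://www.example.org/c">{
  let $d := doc('instance.xml')
  for $s in $d/a:collection/a:entry/a:spans/a:span
  return
    <result span="{$s/@xml:id}" start="{$s/@start}" end="{$s/@end}">{
      $d/a:collection/a:entry/a:data//*[@a:span = $s/@xml:id]
    }</result>
}</resultset>
```

This only tidies the namespace declarations; the selection itself is unchanged, so b:para is still output twice for seg1.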

> - The third question I'd like to ask concerns the use of the 
> fn:idref function in XQuery. 

The most likely cause of this problem is that your input document wasn't
validated against the schema. With Saxon, make sure you use -val:strict on
the command line, and protect yourself against the error by using type
declarations in the query:

declare variable $input as document-node(schema-element(a:collection))
  := doc('instance.xml');

Another gotcha with idref() is that it selects the attribute node, not the
containing element.
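A sketch of how the idref() version of the query might look, under stated assumptions: a schema-aware processor (Saxon-SA, run with -val:strict), an a.xsd that declares a:collection as a global element, and schemas that give the a:span attribute the type xs:IDREF. Because idref() returns the referring attribute nodes, the path steps up with ".." to reach the annotation elements.

```xquery
(: Assumes a.xsd declares a:collection globally and that the schemas
   type the a:span attribute as xs:IDREF. Run with -val:strict so that
   doc() returns a validated, type-annotated document. :)
import schema namespace a = "http://www.example.org/a" at "a.xsd";

(: Fails with a type error, rather than silently returning nothing,
   if the document was not validated. :)
declare variable $input as document-node(schema-element(a:collection))
  := doc('instance.xml');

<resultset>{
  for $s in $input/a:collection/a:entry/a:spans/a:span
  let $id := string($s/@xml:id)
  return
    <result span="{$id}">{
      (: idref() returns the IDREF attribute nodes whose value is $id;
         '..' steps up to the elements that carry them :)
      idref($id, $input)/..
    }</result>
}</resultset>
```

This selects the same elements as the path expression in the original query, so it does not by itself stop b:para from appearing twice.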

> - The biggest issue is that the b:para element is output 
> twice: 

I'm not really sure what you want here, but your path expression

$d/a:collection/a:entry/a:data//*[@a:span = $s/@xml:id]

selects both the b:text element and its b:para child, and therefore both are
copied (each with a full subtree) to the output. If you don't want to copy
the b:para element in its own right, then don't select it.
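One way to act on this, sketched under assumptions: select only the outermost element for each span (skip any element with an ancestor tied to the same span), and rebuild that element keeping only the children that share its span, so that c:word elements are no longer copied inside c:sentence. The helper local:copy-matching is a hypothetical name introduced for illustration, and the sketch assumes that elements sharing a span are nested directly inside one another, as b:para is inside b:text.

```xquery
declare namespace a = "http://www.example.org/a";

(: Hypothetical helper: rebuild $e, keeping its attributes and only those
   child elements whose a:span equals $span, recursively. Children tied
   to other spans, comments and text nodes are dropped. :)
declare function local:copy-matching($e as element(), $span as xs:string)
  as element()
{
  element { node-name($e) } {
    $e/@*,
    for $child in $e/*[@a:span = $span]
    return local:copy-matching($child, $span)
  }
};

let $d := doc('instance.xml')
return
  <resultset>{
    for $s in $d/a:collection/a:entry/a:spans/a:span
    let $id := string($s/@xml:id)
    return
      <result span="{$id}" start="{$s/@start}" end="{$s/@end}">{
        (: outermost matches only: skip any element that has an
           ancestor tied to the same span :)
        for $e in $d/a:collection/a:entry/a:data//*[@a:span = $id]
                                                   [not(ancestor::*[@a:span = $id])]
        return local:copy-matching($e, $id)
      }</result>
  }</resultset>
```

For the sample document this yields b:text (with its b:para child) once under seg1, an empty c:sentence under seg2, and each c:word only under its own span.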

Michael Kay
http://www.saxonica.com/



> -----Original Message-----
> From: talk-bounces at x-query.com 
> [mailto:talk-bounces at x-query.com] On Behalf Of Maik Stührenberg
> Sent: 23 April 2008 10:24
> To: talk at x-query.com
> Subject: [xquery-talk] XQuery and id()/idref(); Controlling 
> the children of nodes in the result sequence
> 
> Hello,
> 
> I'm new to the list and tried to find the answer to my 
> questions in several places (including the list archive). 
> So I apologize if I haven't searched thoroughly enough and 
> the answer has been given already.
> 
> Here's my problem:
> 
> We use a standoff annotation format for storing multiple 
> annotated text files. The text is segmented by a:span 
> elements, which delimit the annotated stretches of text by 
> means of start and end positions (see example below).
> The annotations are stored separately as children of the 
> a:data element. In principle, anything is allowed underneath 
> a:data (in the underlying XSD 'a.xsd', a:data is a wrapper 
> for elements from other namespaces); however, there won't be 
> any text nodes, only elements containing other elements or 
> empty elements. So I have no prior knowledge of the hierarchy 
> of the children of a:data.
> The connection between an annotation and the annotated text 
> is stored in the a:span attributes (declared as xs:IDREF in 
> the XSD).
> 
> <a:collection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>    xsi:schemaLocation="http://www.example.org/a a.xsd"
>    xmlns="http://www.example.org/a"
>    xmlns:a="http://www.example.org/a">
>    <a:entry xml:id="c1" type="text">
>      <a:spans>
>        <a:span xml:id="seg1" start="0" end="20"/>
>        <a:span xml:id="seg2" start="0" end="20"/>
>        <a:span xml:id="to1" start="0" end="4"/>
>        <a:span xml:id="to2" start="5" end="8"/>
>      </a:spans>
>      <a:data xmlns:b="http://www.example.org/b"
>        xsi:schemaLocation="http://www.example.org/b b.xsd">
>        <b:text a:span="seg1">
>          <b:para a:span="seg1"/>
>        </b:text>
>      </a:data>
>      <a:data xmlns:c="http://www.example.org/c"
>        xsi:schemaLocation="http://www.example.org/c c.xsd">
>        <c:sentence id="w35" a:span="seg2">
>          <c:word a:span="to1" id="w36"/>
>          <c:word a:span="to2" id="w37"/>
>          <!-- ... -->
>        </c:sentence>
>      </a:data>
>    </a:entry>
> </a:collection>
> 
> When I try to gather all annotations that correspond to a 
> specific a:span element with the following XQuery, I receive 
> the output below.
> 
> declare namespace a="http://www.example.org/a";
> declare namespace b="http://www.example.org/b";
> declare namespace c="http://www.example.org/c";
> element resultset {
>   let $d := doc('instance.xml')
>   for $s in $d/a:collection/a:entry/a:spans/a:span
>   return
>    <result span="{$s/@xml:id}" start="{$s/@start}" end="{$s/@end}">
>      { $d/a:collection/a:entry/a:data//*[@a:span = $s/@xml:id] }
>    </result>
> }
> 
> <resultset>
>    <result start="0" end="20" span="seg1">
>      <b:text xmlns:b="http://www.example.org/b"
>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>         xmlns="http://www.example.org/a"
>         xmlns:a="http://www.example.org/a"
>         a:span="seg1">
>        <b:para a:span="seg1"/>
>      </b:text>
>      <b:para xmlns:b="http://www.example.org/b"
>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>         xmlns="http://www.example.org/a"
>         xmlns:a="http://www.example.org/a"
>         a:span="seg1"/>
>    </result>
>    <result start="0" end="20" span="seg2">
>      <c:sentence xmlns:c="http://www.example.org/c"
>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>         xmlns="http://www.example.org/a"
>         xmlns:a="http://www.example.org/a"
>         id="w35"
>         a:span="seg2">
>        <c:word a:span="to1" id="w36"/>
>        <c:word a:span="to2" id="w37"/>
>        <!-- ... -->
>      </c:sentence>
>    </result>
>    <result start="0" end="4" span="to1">
>      <c:word xmlns:c="http://www.example.org/c"
>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>         xmlns="http://www.example.org/a"
>         xmlns:a="http://www.example.org/a" a:span="to1"
>         id="w36"/>
>    </result>
>    <result start="5" end="8" span="to2">
>      <c:word xmlns:c="http://www.example.org/c"
>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>         xmlns="http://www.example.org/a"
>         xmlns:a="http://www.example.org/a" a:span="to2"
>         id="w37"/>
>    </result>
> </resultset>
> 
> Several things are not perfect here:
> - Is there any way to suppress the output of the namespaces 
> in each element? Or to be more specific: what do I have to 
> change to output all namespaces once (and only once) in the 
> resultset element?
> 
> - The biggest issue is that the b:para element is output 
> twice: as child element of the b:text element (which is quite 
> fine) and alone. The same problem appears when looking at the 
> c:word elements: they should not be included as children of 
> the c:sentence element because they are related to different 
> spans, but only as children of the respective result element.
> 
> - The third question I'd like to ask concerns the use of the 
> fn:idref function in XQuery. My first examples of the query 
> used idref() to select all those nodes underneath a:data that 
> are related to a certain span -- but I didn't manage to get 
> any output although all XSD files are available (I use 
> Saxon-SA 9). What has to be changed in the XQuery to use the 
> idref function?
> 
> Again I apologize for asking three questions in my first post 
> to the list.
> 
> Kind regards,
> 
> Maik Stührenberg
> 
> 
> _______________________________________________
> talk at x-query.com
> http://x-query.com/mailman/listinfo/talk



