[xquery-talk] XQuery and id()/idref(); Controlling the children of nodes in the result sequence

Maik Stührenberg maik.stuehrenberg at uni-bielefeld.de
Wed Apr 23 12:23:41 PDT 2008


Hello,

I'm new to the list and have tried to find answers to my questions in
several places (including the list archive). I apologize if I haven't
searched thoroughly enough and the answer has already been given.

Here's my problem:

We use a standoff annotation format for storing multiple annotated text 
files. The text files are used for defining a:span elements which 
delimit the textual information annotated by means of start and end 
positions (see example below).
The annotations are stored separately as children of the a:data element.
In principle, anything is allowed underneath the a:data element (in the
underlying XSD 'a.xsd', a:data is a wrapper for elements from a
different namespace). However, there won't be any text nodes, only
elements containing other elements or empty elements. So I have no
advance knowledge of the hierarchy of the children of a:data.
The connection between an annotation and the annotated text is made by
the a:span attribute (which is declared as xs:IDREF in the XSD).

<a:collection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://www.example.org/a a.xsd"
   xmlns="http://www.example.org/a" xmlns:a="http://www.example.org/a">
   <a:entry xml:id="c1" type="text">
     <a:spans>
       <a:span xml:id="seg1" start="0" end="20"/>
       <a:span xml:id="seg2" start="0" end="20"/>
       <a:span xml:id="to1" start="0" end="4"/>
       <a:span xml:id="to2" start="5" end="8"/>
     </a:spans>
     <a:data xmlns:b="http://www.example.org/b"
       xsi:schemaLocation="http://www.example.org/b b.xsd">
       <b:text a:span="seg1">
         <b:para a:span="seg1"/>
       </b:text>
     </a:data>
     <a:data xmlns:c="http://www.example.org/c"
       xsi:schemaLocation="http://www.example.org/c c.xsd">
       <c:sentence id="w35" a:span="seg2">
         <c:word a:span="to1" id="w36"/>
         <c:word a:span="to2" id="w37"/>
         <!-- ... -->
       </c:sentence>
     </a:data>
   </a:entry>
</a:collection>

When I use the following XQuery to collect all annotations that
correspond to a specific a:span element, I get the output below.

declare namespace a="http://www.example.org/a";
declare namespace b="http://www.example.org/b";
declare namespace c="http://www.example.org/c";
element resultset
{
let $d := doc('instance.xml')
for $s in $d/a:collection/a:entry/a:spans/a:span
return
   <result span="{$s/@xml:id}" start="{$s/@start}" end="{$s/@end}">
     { $d/a:collection/a:entry/a:data//*[@a:span = $s/@xml:id] }
   </result>
}

<resultset>
   <result start="0" end="20" span="seg1">
     <b:text xmlns:b="http://www.example.org/b"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.example.org/a"
        xmlns:a="http://www.example.org/a"
        a:span="seg1">
       <b:para a:span="seg1"/>
     </b:text>
     <b:para xmlns:b="http://www.example.org/b"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.example.org/a"
        xmlns:a="http://www.example.org/a"
        a:span="seg1"/>
   </result>
   <result start="0" end="20" span="seg2">
     <c:sentence xmlns:c="http://www.example.org/c"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.example.org/a"
        xmlns:a="http://www.example.org/a"
        id="w35"
        a:span="seg2">
       <c:word a:span="to1" id="w36"/>
       <c:word a:span="to2" id="w37"/>
       <!-- ... -->
      </c:sentence>
    </result>
   <result start="0" end="4" span="to1">
     <c:word xmlns:c="http://www.example.org/c"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.example.org/a"
        xmlns:a="http://www.example.org/a" a:span="to1"
        id="w36"/>
   </result>
   <result start="5" end="8" span="to2">
     <c:word xmlns:c="http://www.example.org/c"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.example.org/a"
        xmlns:a="http://www.example.org/a" a:span="to2"
        id="w37"/>
   </result>
</resultset>

Several things are not perfect here:
- Is there any way to suppress the output of the namespaces in each 
element? Or to be more specific: what do I have to change to output all 
namespaces once (and only once) in the resultset element?

- The biggest issue is that the b:para element is output twice: once as
a child of the b:text element (which is fine) and once on its own. A
related problem appears with the c:word elements: they should not be
included as children of the c:sentence element, because they refer to
different spans; they should only appear as children of their own
result element.

- The third question I'd like to ask concerns the use of the fn:idref 
function in XQuery. My first examples of the query used idref() to 
select all those nodes underneath a:data that are related to a certain 
span -- but I didn't manage to get any output although all XSD files are 
available (I use Saxon-SA 9). What has to be changed in the XQuery to 
use the idref function?
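
For the first point, one possible direction is the copy-namespaces
declaration in the query prolog. The following is only a sketch and has
not been run against Saxon-SA 9. It assumes that it is acceptable to
declare the a, b and c prefixes on a direct resultset constructor. With
no-preserve, a copied element keeps only the namespace bindings used by
its own name and its attribute names; with inherit, it can take those
bindings from the enclosing resultset element, so the serializer should
not need to repeat them.

```xquery
declare namespace a="http://www.example.org/a";

(: drop unused bindings on copied nodes; let them inherit from the parent :)
declare copy-namespaces no-preserve, inherit;

<resultset xmlns:a="http://www.example.org/a"
           xmlns:b="http://www.example.org/b"
           xmlns:c="http://www.example.org/c">
{
  let $d := doc('instance.xml')
  for $s in $d/a:collection/a:entry/a:spans/a:span
  return
    <result span="{$s/@xml:id}" start="{$s/@start}" end="{$s/@end}">
      { $d/a:collection/a:entry/a:data//*[@a:span = $s/@xml:id] }
    </result>
}
</resultset>
```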
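
For the second point, a recursive copy might be an option. The sketch
below is untested. The function local:prune is a hypothetical helper,
not part of any library. It rebuilds an element with its attributes and
keeps only those child elements whose a:span value equals the current
span. The main query selects only the top-most matching elements, so an
element such as b:para that is nested inside another match is not
output a second time. Note the simplifying assumption that comments and
children without a matching a:span attribute are dropped.

```xquery
declare namespace a="http://www.example.org/a";

(: Hypothetical helper: copy $e and its attributes, keeping only the
   child elements that refer to the same span :)
declare function local:prune($e as element(), $id as xs:string)
  as element()
{
  element { node-name($e) } {
    $e/@*,
    for $c in $e/*[@a:span = $id]
    return local:prune($c, $id)
  }
};

element resultset
{
  let $d := doc('instance.xml')
  for $s in $d/a:collection/a:entry/a:spans/a:span
  let $id := string($s/@xml:id)
  return
    <result span="{$id}" start="{$s/@start}" end="{$s/@end}">
      {
        (: top-most matches only, so nested matches are not repeated :)
        for $e in $d/a:collection/a:entry/a:data//*[@a:span = $id]
                    [not(ancestor::*[@a:span = $id])]
        return local:prune($e, $id)
      }
    </result>
}
```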
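
For the third point, two properties of fn:idref may be relevant. It
returns the referring nodes themselves, which here would be the a:span
attribute nodes, so a parent step is needed to reach the annotation
elements. It also only finds attributes that carry an xs:IDREF or
xs:IDREFS type annotation, which means the instance must actually have
been schema-validated when it was loaded. The sketch below is untested
and assumes that validation has taken place. How to request it depends
on how the processor is invoked and is not shown here.

```xquery
declare namespace a="http://www.example.org/a";

element resultset
{
  (: assumption: instance.xml was schema-validated on load, so the
     a:span attributes are typed as xs:IDREF :)
  let $d := doc('instance.xml')
  for $s in $d/a:collection/a:entry/a:spans/a:span
  return
    <result span="{$s/@xml:id}" start="{$s/@start}" end="{$s/@end}">
      {
        (: idref() yields the a:span attributes; ".." moves to the
           elements that own them :)
        idref($s/@xml:id, $d)/..
      }
    </result>
}
```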

Again I apologize for asking three questions in my first post to the list.

Kind regards,

Maik Stührenberg



