[xquery-talk] Omission in near() when used in mixed content
Stefan Majewski
stefan.majewski at univie.ac.at
Wed Jan 21 14:03:26 PST 2009
Dear all,
we are currently seeing problems with near() when the search words span
element boundaries. We have a fulltext index with content="mixed"
defined for the collection. We know that the index as such works, because
near() works as expected for a single word, even when that word is split
by element tags. However, when searching for a sequence of several
words, the search fails if at least one of the words is split by an element.
Assume the following xql:
---
declare namespace tei = "http://www.tei-c.org/ns/1.0";
let $q := "mixed test"
return //tei:u[near(. , $q)]
---
and this sample document:
---
<?xml version="1.0" encoding="utf-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<!-- snipped header -->
<text>
<body>
<div>
<u xml:id="u1"> this the first mi<seg type="overlap">xed test
</seg> </u>
<u xml:id="u2"> this the second mi<anchor/>xed test </u>
<u xml:id="u3"> this is the third <seg type="overlap"> mixed
</seg> test </u>
<u xml:id="u4"> this is last <seg type="overlap"> mixed test
</seg> </u>
</div>
</body>
</text>
</TEI>
---
The following searches yield very different results, even though imho they
should return the same elements:
1) $q="mixed" returns tei:u with id u1,u2,u3,u4
2) $q="mixed test" only returns tei:u with id u3,u4
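For comparison, here is a minimal sketch of an unindexed check that uses only
the standard functions contains() and normalize-space() on the string value of
each tei:u. It is not equivalent to near(): it does a plain, case-sensitive
substring match and does not use the fulltext index, so I would assume it to be
slow on larger collections. On the sample document I would expect it to select
all four tei:u elements, since the string value of each one contains
"mixed test" once whitespace is normalized.

```xquery
declare namespace tei = "http://www.tei-c.org/ns/1.0";
(: same query string as in search 2 :)
let $q := "mixed test"
(: match against the whitespace-normalized string value of each tei:u,
   which concatenates all descendant text and so ignores element boundaries :)
return //tei:u[contains(normalize-space(.), $q)]
```

This only shows what result I would expect from near(); it is not meant as a
replacement for an indexed search.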
Does anybody see different behaviour? I might have misinterpreted
something in the docs, so that my assumption that the second search
should return the same four tei:u elements is wrong, or there
could be a bug in near() or in the fulltext index causing this issue.
Either way, I would be very glad to get some hints on how I could
work around this issue, as I am currently implementing searches over highly
segmented texts.
cheers,
Stefan