[xquery-talk] Function and Query Evaluation with No XML Tags Error

Kevin Grover kevin at kevingrover.net
Thu Feb 28 22:25:33 PST 2008


On Thu, Feb 28, 2008 at 9:06 PM, Kevin Grover <kevin at kevingrover.net> wrote:
> On Thu, Feb 28, 2008 at 6:31 PM, Wei, Alice J. <ajwei at indiana.edu> wrote:
>  > On Thu, Feb 28, 2008 at 8:06 AM, Wei, Alice J. <ajwei at indiana.edu> wrote:
>  >  > Hi, Kevin:
>  >  >
>  >  >    I was referring to the fact that the functions inside the definition seem to be written differently, even though they attempt to achieve the same functionality as queries that do not use user-defined functions.
>  >  >
>  >  >    As for the code, I still get duplicates. I am not sure whether that is because the structure you provided differs from what I am doing here.
>  >  >
>  >  >
>  >  > declare function local:unique-nodes-by-value($seq as element()*) as element()*
>  >  >  {
>  >  >   for $r in $seq[not(string(.)=string((child::*)[1]))]
>  >
>  >  You used 'child::' instead of 'preceding-sibling::' (from my example).
>  >   That's why.
>  >
>  >  >    return $r
>  >  >
>  >  >  };
>  >  >
>  >
>  >    Well, apparently I have just changed my code to
>  >
>  >
>  >  declare function local:unique-nodes-by-value($seq as element()*) as element()*
>  >  {
>  >   for $r in $seq[not(string(.)=string((preceding-sibling::*)[1]))]
>
>  I think this test is broken if the data (i.e. the sequence being tested)
>  is not sorted.  I need to check that string(.) is not equal to ANY
>  of the preceding siblings (not just the first one).  I don't know yet
>  how to do this (and I can't look at it now...)
>
>
>  >   order by $r
>  >    return $r
>  >
>  >  };
>  >  for $r in local:unique-nodes-by-value(collection("xmldb:exist://db/cbml")//ad/head[contains(upper-case(.), 'STAMP')])
>  >
>  >  let $para := $r/ancestor::ad//(div|p)
>  >   let $head := $r/ancestor::ad/child::head
>  >   let $note := $r/ancestor::ad/note
>  >  return <ad>{$head}
>  >             {$para}
>  >             {$note}
>  >        </ad>
>  >
>  >  Like you said, this was what was provided in your example, but my final output still had 44 items, when the count after using distinct-values should be:
>  >
>  >  declare function local:count-distinct-values($seq as xs:anyAtomicType*)
>  >      as element(statistics) {
>  >   <statistics>{count(distinct-values($seq))}</statistics>
>  >  };
>  >  local:count-distinct-values(collection("xmldb:exist://db/cbml")//ad/head[contains(upper-case(.),'STAMPS')])
>  >
>  >  Output: <statistics>5</statistics>
>  >  Should I use something else?
>  >
>
>  I don't know.  I can't 'see' the problem without looking at the data.
>  And I don't have time right now.
>
>  Have you tried just displaying the results of various steps? For
>  example, you can save the results of running
>
>
>  collection("xmldb:exist://db/cbml")//ad/head[contains(upper-case(.),'STAMPS')]
>
>  to a file and play with that (i.e. run queries against it and see
>  what happens in intermediate steps, e.g. the results of subqueries).
>  You may have some logic problems like before (e.g. getting a string
>  where you expect an element, or something).
>
>  >
>  >
>  >  >  Snippet of output:
>  >  >
>  >  >  <ad>
>  >  >  <head>Stamp Collecting Outfit</head>
>  >  >  <p>Packet of world stamps, 9 Triangles, 2 Diamonds, animals, insects, flowers, ships, etc. Plus packet of hinges, perf. guage. Only 25c. Plus, stamps for your examination from our approval service which can be cancelled anytime. Buy what you want or none and return those not wanted in 10 days.
>  >  >  <address>
>  >  >  <addressLine>L.W. Brown.</addressLine>
>  >  >  <addressLine>Dent. C</addressLine>
>  >  >  <addressLine>Marion. Mich. 49665</addressLine>
>  >  >  </address>
>  >  >  </p>
>  >  >  </ad>
>  >  >  <ad>
>  >  >  <head>Stamp Collecting Outfit</head>
>  >  >  <p>Packet of world stamps, 9 Triangles, 2 Diamonds, animals, insects, flowers, ships, etc. Plus packet of hinges, perf. guage. Only 25c. Plus, stamps for your examination from our approval service which can be cancelled anytime. Buy what you want or none and return those not wanted in 10 days.
>  >  >  <address>
>  >  >  <addressLine>L.W. Brown.</addressLine>
>  >  >  <addressLine>Dent. C</addressLine>
>  >  >  <addressLine>Marion. Mich. 49665</addressLine>
>  >  >  </address>
>  >  >  </p>
>  >  >  </ad>
>  >  >
>  >  >  XML:
>  >  >
>  >  >  <ad>
>  >  >  <head>Stamp Collecting Outfit</head>
>  >  >  <p>Packet of world stamps, 9 Triangles, 2 Diamonds, animals, insects, flowers, ships, etc. Plus packet of hinges, perf. guage. Only 25c. Plus, stamps for your examination from our approval service which can be cancelled anytime. Buy what you want or none and return those not wanted in 10 days.
>  >  >  <address>
>  >  >  <addressLine>L.W. Brown.</addressLine>
>  >  >  <addressLine>Dent. C</addressLine>
>  >  >  <addressLine>Marion. Mich. 49665</addressLine>
>  >  >  </address>
>  >  >  </p>
>  >  >  </ad>
>  >  >  <ad>
>  >  >  <head type ="main">1c</head>
>  >  >  <head type ="sub">Thousands of Beautiful Stamps</head>
>  >  >  <p>1c each and up&why pay more when you can get the best for less. Write today for approvals. PENNY STAMP Service.
>  >  >  <address>
>  >  >  <addressLine>P.O. Box 898,</addressLine>
>  >  >  <addressLine>Mariposa California 95338</addressLine>
>  >  >  </address>
>  >  >  </p>
>  >  >  </ad>
>  >  >  <ad>
>  >  >  <head>Stamp Collecting Outfit</head>
>  >  >  <p>Packet of world stamps, 9 Triangles, 2 Diamonds, animals, insects, flowers, ships, etc. Plus packet of hinges, perf. guage. Only 25c. Plus, stamps for your examination from our approval service which can be cancelled anytime. Buy what you want or none and return those not wanted in 10 days.
>  >  >  <address>
>  >  >  <addressLine>L.W. Brown.</addressLine>
>  >  >  <addressLine>Dent. C</addressLine>
>  >  >  <addressLine>Marion. Mich. 49665</addressLine>
>  >  >  </address>
>  >  >  </p>
>  >  >  </ad>
>  >  >  ======================================================
>  >  >
>  >  > Alice Wei
>  >  >  MIS 2008
>  >  >  School of Library and Information Science
>  >  >  Indiana University Bloomington
>  >  >  ajwei at indiana.edu
>  >  >  ________________________________________
>  >  >  From: kogrover at gmail.com [kogrover at gmail.com] On Behalf Of Kevin Grover [kevin at kevingrover.net]
>  >  >  Sent: Wednesday, February 27, 2008 6:40 PM
>  >  >
>  >  > To: Wei, Alice J.
>  >  >  Cc: talk at x-query.com
>  >  >  Subject: Re: [xquery-talk] Function and Query Evaluation with No XML Tags Error
>  >  >
>  >  >
>  >  >
>  >  > On Wed, Feb 27, 2008 at 2:41 AM, Wei, Alice J. <ajwei at indiana.edu> wrote:
>  >  >  Hi, Kevin:
>  >  >
>  >  >   Thanks, this does help with what I am trying to work on now. To a certain extent, I have the feeling that the syntax of XQuery functions and XQuery user-defined functions is somewhat different.
>  >  >
>  >  >  I'm not sure I understand this statement.  It seems that you're implying that the built-in functions and user functions behave differently?  Or are used differently?  I don't think this is the case.  I could, for example, write my own function that behaved like the built-in distinct-values function.
>  >  >
>  >  >
>  >  >   Your code does point out quite a few different options and the variety of errors I could be getting when I am not doing them correctly. It does do the trick.
>  >  >
>  >  >  Thanks for your help.
>  >  >
>  >  >
>  >  >  You're welcome.  I'm glad it helped.  I've learned a great deal about XQuery myself in the last couple of days (by trying to help someone else).
>  >  >
>  >  >  - Kevin
>  >  >
>  >  >
>  >
>

I figured out the function.  This one appears to actually do what I
want (regardless of whether the data is sorted or not):

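declare function local:unique-nodes-by-value($seq as element()*) as element()*
{
  (: keep a node only if its string value differs from the string value
     of every preceding sibling element :)
  $seq[not(string(.)=preceding-sibling::*/string(.))]
};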

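The key was where I applied the string() function on the right-hand
side: preceding-sibling::*/string(.) yields the string value of every
preceding sibling, not just the first one.  I basically wanted "where
str1 NOT IN str_list", but did not yet know the syntax.  After looking
at the specs, I came up with the above: the general comparison = is
true if ANY item on the left equals ANY item on the right, so
not(... = ...) means "equal to none of them".  There may be better
ways to do it.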
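A minimal XQuery 1.0 sketch (not from the original thread) of the comparison behavior the function relies on. The values '34' and ('12', '34', '56') are made up for illustration. It also shows a common trap: != is existential as well, so it does not mean "NOT IN".

```xquery
(: General comparisons (=, !=) are existential: they are true
   if ANY pair of items, one from each side, satisfies the test. :)
let $str  := '34'
let $list := ('12', '34', '56')
return (
  $str = $list,       (: true: '34' matches one item in the list :)
  not($str = $list),  (: false: this is the "NOT IN" test :)
  $str != $list       (: also true, because '34' differs from '12' :)
)
```

Run as a main module, this should serialize as "true false true". Only the not(... = ...) form expresses "not equal to any of them".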

I used this xquery to test the function:

---------------- start: t.xq
declare variable $data := <data>
  <a><b>1</b><c>2</c></a>
  <a><b>3</b><c>4</c></a>
  <a><b>5</b><c>6</c></a>
  <a><b>1</b><c>2</c></a>
  <a><b>3</b><c>4</c></a>
  <a><b>5</b><c>6</c></a>
  <a><b>5</b><c>6</c></a>
  <a><b>5</b><c>6</c></a>
</data>;

declare variable $s := $data//a;

declare function local:unique-nodes-by-value($seq as element()*) as element()*
{
  $seq[not(string(.)=preceding-sibling::*/string(.))]
};

'&#10;',
'Strings: ',
for $v in $s return string($v)
,'&#10;',
'Distinct Values: ',
distinct-values($s)
,'&#10;',
'local:unique-nodes-by-value: ',
local:unique-nodes-by-value($s)
---------------- end: t.xq

And it generated this output:

<?xml version="1.0" encoding="UTF-8"?>
 Strings:  12 34 56 12 34 56 56 56
 Distinct Values:  12 34 56
 local:unique-nodes-by-value: <a>
   <b>1</b>
   <c>2</c>
</a>
<a>
   <b>3</b>
   <c>4</c>
</a>
<a>
   <b>5</b>
   <c>6</c>
</a>

Which is what I expected.
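One caveat, offered as an observation rather than something settled in this thread: preceding-sibling::* walks the document tree, not the sequence passed in as $seq. The test above works because all the <a> elements share one parent, <data>. When the nodes live under different parents, as the head elements in Alice's query do (each sits inside its own ad), the predicate only compares a head with siblings in the same ad, so duplicates in different ads are not detected. This may be related to the duplicates Alice still sees, but that is an inference. The sketch below assumes XQuery 1.0; local:unique-nodes-by-position and the sample data are hypothetical, made up for illustration.

```xquery
(: Kevin's sibling-based function, unchanged. :)
declare function local:unique-nodes-by-value($seq as element()*) as element()*
{
  $seq[not(string(.)=preceding-sibling::*/string(.))]
};

(: Hypothetical variant: compares each node with the items that come
   before it in $seq itself, so tree position does not matter. :)
declare function local:unique-nodes-by-position($seq as element()*) as element()*
{
  for $node at $i in $seq
  let $earlier := for $prev in subsequence($seq, 1, $i - 1) return string($prev)
  where not(string($node) = $earlier)
  return $node
};

(: Made-up data: each <head> is the only child of its own <ad>,
   so no head has any preceding siblings. :)
let $ads :=
  <ads>
    <ad><head>Stamp Collecting Outfit</head></ad>
    <ad><head>Thousands of Beautiful Stamps</head></ad>
    <ad><head>Stamp Collecting Outfit</head></ad>
  </ads>
let $heads := $ads/ad/head
return (
  count(local:unique-nodes-by-value($heads)),    (: 3: the duplicate survives :)
  count(local:unique-nodes-by-position($heads))  (: 2: the duplicate is dropped :)
)
```

This should output "3 2". The position-based variant keeps the first node for each distinct string value in sequence order; like the sibling version, it compares each node against all earlier ones, so it may be slow on very large inputs.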

As for your count problem, I dropped your data in and ran it, and it
gave this (NOTE: I had to escape an unescaped & in the middle element
before it was really legal XML):

<ad>
   <head>Stamp Collecting Outfit</head>
   <p>Packet of world stamps, 9 Triangles, 2 Diamonds, animals, insects, flowers, ships, etc. Plus packet of hinges, perf. guage. Only 25c. Plus, stamps for your examination from our approval service which can be cancelled anytime. Buy what you want or none and return those not wanted in 10 days.
<address>
         <addressLine>L.W. Brown.</addressLine>
         <addressLine>Dent. C</addressLine>
         <addressLine>Marion. Mich. 49665</addressLine>
      </address>
   </p>
</ad>
<ad>
   <head type="main">1c</head>
   <head type="sub">Thousands of Beautiful Stamps</head>
   <p>1c each and up&amp;why pay more when you can get the best for less. Write today for approvals. PENNY STAMP Service.
<address>
         <addressLine>P.O. Box 898,</addressLine>
         <addressLine>Mariposa California 95338</addressLine>
      </address>
   </p>
</ad>


- Kevin

