[xquery-talk] Find count of a string in an xml file

Michael Kay mike at saxonica.com
Fri Jun 6 09:57:57 PDT 2008


Your XML isn't well-formed - displaySubject uses "Day" as an attribute with
no attribute name. Assuming you meant <display subject="Day">, the answer
would be
 
 count((//display/@subject | //alt)[. = $word])
 
where the variable $word is initialized to the word you are looking for.
 
That looks for $word as the whole of the element or attribute value. If
you're interested in matching substrings of the value, it would be
 
 count((//display/@subject | //alt)[contains(., $word)])
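
To make the difference between the two expressions concrete, here is a
minimal sketch run against a small inline sample. The sample data and the
$doc variable are assumptions made up for illustration; they are not taken
from the original question, and the sample uses the corrected
<display subject="..."> form.

```xquery
(: Hypothetical sample data, not from the original post :)
let $doc :=
  <subjects>
    <subject>
      <display subject="Day">
        <alt>Day</alt>
        <alt>Daytime</alt>
      </display>
    </subject>
    <subject>
      <display subject="Night">
        <alt>Day</alt>
      </display>
    </subject>
  </subjects>
let $word := "Day"
return (
  (: whole-value match: counts the attribute "Day" and the two <alt>Day</alt> :)
  count(($doc//display/@subject | $doc//alt)[. = $word]),
  (: substring match: additionally counts <alt>Daytime</alt> :)
  count(($doc//display/@subject | $doc//alt)[contains(., $word)])
)
```

With this sample the whole-value count is 3 and the substring count is 4,
because "Daytime" contains "Day" but is not equal to it.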
 
> I want to find all the strings that occur more than 50 times in the
> document.

Again, this depends on whether you are looking for strings that make up
the whole of an element or attribute value, or for substrings. If the
latter, you need to define how they are delimited (e.g. on word boundaries).
The naive solution to this is something like this:
 
(: Tokenize text nodes and attributes; //* would count nested text repeatedly :)
let $allWords := for $i in (//text(), //@*) return tokenize($i, '\W+')
for $word in distinct-values($allWords)
where count($allWords[. = $word]) gt 50
return $word
 
But this could be horrendously inefficient unless your XQuery engine has a
rather clever optimizer, because it rescans the full word list once for
every distinct word. There's an XSLT 2.0 solution on page 19 of my XSLT
2.0 Programmer's Reference (4th edition) that makes use of built-in grouping
facilities in XSLT, and is likely to run much faster.
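
For comparison, the same grouping idea can be sketched in XQuery itself,
assuming a processor that supports the XQuery 3.0 "group by" clause. This
is not the XSLT solution from the book, and it will not run in the XQuery
subset that SQL Server 2005 provides. The sample data, the $doc and
$threshold variables, and the lowered threshold are all assumptions made
for illustration.

```xquery
xquery version "3.0";

(: Hypothetical sample data, not from the original post :)
declare variable $doc :=
  <subjects>
    <display subject="Day"><alt>Day shift</alt><alt>Night shift</alt></display>
    <display subject="Night"><alt>Day</alt></display>
  </subjects>;

(: Threshold lowered from 50 so that the tiny sample yields a result :)
declare variable $threshold := 2;

(: Split every text node and attribute value into words, dropping empty tokens :)
for $w in ($doc//text(), $doc//@*) ! tokenize(., '\W+')[. ne '']
(: After grouping, $w holds all occurrences of the current $word :)
group by $word := $w
where count($w) gt $threshold
order by $word
return concat($word, ': ', count($w))
```

The two variables are declared in the prolog rather than with "let" in the
same FLWOR expression, because "group by" rebinds every other variable in
the tuple stream to a sequence with one item per grouped tuple, which would
make the "gt" comparison against $threshold fail. With this sample, only
"Day" occurs more than twice (three times), so it is the only word
reported.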
 
Michael Kay
http://www.saxonica.com/


  _____  

From: talk-bounces at x-query.com [mailto:talk-bounces at x-query.com] On Behalf
Of Mudita Nain
Sent: 04 June 2008 17:21
To: talk at x-query.com
Subject: [xquery-talk] Find count of a string in an xml file


Hi all,
 
I am using SQL Server 2005. I have a table with an xml column into which I
have loaded an XML file. I want to write an XQuery that finds the number of
occurrences of a string in the document within certain tags.
 
The structure of the document is as follows:
 
<subject>
<displaySubject = "Day">
<alt> </alt>
<alt> </alt>
<alt> </alt>
<alt> </alt>
</displaySubject>
</subject>
<subject>
<displaySubject>
<alt> </alt>
<alt> </alt>
<alt> </alt>
<alt> </alt>
</displaySubject>
</subject>
 
 
So, I want to find how many times the string "Day" occurs, whether in
displaySubject or alt, anywhere in the document.
Also, the "Day" string is not known. I want to find all the strings that
occur more than 50 times in the document.
 
I hope I am clear.
I would appreciate any help from you.
 
Thanks
Mudita

 
 

