[xquery-talk] Count a specific word in a document
Michael Strasser
M.Strasser at gpo.com
Wed Jun 13 16:02:19 PDT 2007
Ron
Thanks, your simpler version works fine (with eXist):
let $lords := $words[. = 'Lord']
return count($lords)
Michael Strasser
Ronald Bourret wrote:
> Try:
>
> for $elijah in doc("/db/mjs/ElijahLibretto.xhtml")/html
> let $elijah-para := $elijah//td/p[i/text() = 'Elijah']
> let $txt := string-join($elijah-para/text(), " ")
> let $words := tokenize($txt,"(\s|[,.!:;]|[n][b][s][p][;])+")
> let $lord_tokens := (for $word in $words
> where $word = 'Lord'
> return $word)
> return fn:count($lord_tokens)
>
> I assume the following would also work:
>
> let $lord_tokens := $words[. = 'Lord']
>
> -- Ron
>
> Michael Strasser wrote:
>
>> I am learning XQuery and have set myself a little task that currently
>> I can't manage.
>>
>> I have an XHTML document with the complete text of Mendelssohn's
>> oratorio "Elijah" and wanted to use XQuery to count the number of
>> times the character of Elijah sings the word "Lord". I was inspired
>> by Jonathan Robie's blog post last year about word counts of DocBook
>> documents. (I copied his tokenize() example without fully
>> understanding it yet.)
>>
>> I have isolated Elijah's speeches and converted the words to a
>> sequence of string tokens:
>>
>> for $elijah in doc("/db/mjs/ElijahLibretto.xhtml")/html
>> let $elijah-para := $elijah//td/p[i/text() = 'Elijah']
>> let $txt := string-join($elijah-para/text(), " ")
>> let $words := tokenize($txt,"(\s|[,.!:;]|[n][b][s][p][;])+")
>>
>> I can't figure out how to count the number of string tokens that are
>> 'Lord'. I can get them with:
>>
>> for $word in $words
>> return $word[$word = 'Lord']
>>
>> but I can't seem to get the count of them.
>>
>> Thanks in advance for any help.
>>
>>
>> Michael Strasser
>> Brisbane Australia
>
>
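The thread contains three ways to count the tokens. This sketch compares them in one self-contained query. The sketch is not from the thread and makes two changes so it runs in any XQuery 1.0 processor without the eXist database:

- A literal sample string, made up for illustration and not taken from the libretto, replaces the doc("/db/mjs/ElijahLibretto.xhtml") lookup.
- The "nbsp;" alternative is dropped from the regex because the sample contains no such text.

The third form is Michael's original for/return expression wrapped in count(). It also gives the right answer, because a word that fails the predicate yields an empty sequence, and empty sequences vanish when the results are flattened.

```xquery
(: Sample text standing in for the extracted Elijah paragraphs (illustrative only). :)
let $txt := "Lord, hear me! O Lord, my God; the Lord is Lord."
(: Split on runs of whitespace or punctuation, as in the thread's tokenize() call. :)
let $words := tokenize($txt, "(\s|[,.!:;])+")
return (
  (: 1. Ron's predicate filter :)
  count($words[. = 'Lord']),
  (: 2. Ron's FLWOR with a where clause :)
  count(for $w in $words where $w = 'Lord' return $w),
  (: 3. Michael's original expression, wrapped in count() :)
  count(for $w in $words return $w[$w = 'Lord'])
)
```

With this sample, each form counts 4, so the query returns the sequence 4 4 4. The trailing full stop makes tokenize() produce a final empty-string token, but that token does not affect the count.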