[xquery-talk] Count a specific word in a document
Ronald Bourret
rpbourret at rpbourret.com
Tue Jun 12 22:41:02 PDT 2007
Try:

for $elijah in doc("/db/mjs/ElijahLibretto.xhtml")/html
let $elijah-para := $elijah//td/p[i/text() = 'Elijah']
let $txt := string-join($elijah-para/text(), " ")
let $words := tokenize($txt,"(\s|[,.!:;]|[n][b][s][p][;])+")
let $lord_tokens := (for $word in $words
                     where $word = 'Lord'
                     return $word)
return fn:count($lord_tokens)

I assume the following would also work:
let $lord_tokens := $words[. = 'Lord']
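
For what it's worth, here is a minimal self-contained sketch of that predicate form, with the count applied directly to the filtered sequence. The sample string is made up for illustration and stands in for the libretto text, and the separator pattern is a simplified version of the one above, so treat it as an assumption-laden example rather than a tested answer against the real document.

```xquery
(: Hypothetical sample text, standing in for the joined paragraph text :)
let $txt := "O Lord, hear me! Lord God of Abraham: the Lord is God."
(: Split on runs of whitespace and basic punctuation :)
let $words := tokenize($txt, "(\s|[,.!:;])+")
(: Keep only the tokens equal to 'Lord', then count them :)
return count($words[. = 'Lord'])
```

The point is that count() wraps the whole filtered sequence rather than sitting inside the loop. Note that the comparison is case-sensitive, so a token such as 'LORD' would not be counted.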
-- Ron
Michael Strasser wrote:
> I am learning XQuery and have set myself a little task that currently I
> can't manage.
>
> I have an XHTML document with the complete text of Mendelssohn's
> oratorio "Elijah" and wanted to use XQuery to count the number of times
> the character of Elijah sings the word "Lord". I was inspired by
> Jonathan Robie's blog post last year about word counts of DocBook
> documents. (I copied his tokenize() example without fully understanding
> it yet.)
>
> I have isolated Elijah's speeches and converted the words to a sequence
> of string tokens:
>
> for $elijah in doc("/db/mjs/ElijahLibretto.xhtml")/html
> let $elijah-para := $elijah//td/p[i/text() = 'Elijah']
> let $txt := string-join($elijah-para/text(), " ")
> let $words := tokenize($txt,"(\s|[,.!:;]|[n][b][s][p][;])+")
>
> I can't figure out how to count the number of string tokens that are
> 'Lord'. I can get them with:
>
> for $word in $words
> return $word[$word = 'Lord']
>
> but I can't seem to get the count of them.
>
> Thanks in advance for any help.
>
>
> Michael Strasser
> Brisbane Australia