[xquery-talk] Count a specific word in a document

Ronald Bourret rpbourret at rpbourret.com
Tue Jun 12 22:41:02 PDT 2007


Try:

  for $elijah in doc("/db/mjs/ElijahLibretto.xhtml")/html
  let $elijah-para := $elijah//td/p[i/text() = 'Elijah']
  let $txt := string-join($elijah-para/text(), " ")
  let $words := tokenize($txt,"(\s|[,.!:;]|[n][b][s][p][;])+")
  let $lord_tokens := (for $word in $words
                       where $word = 'Lord'
                       return $word)
  return fn:count($lord_tokens)

I assume the following predicate would also work, in place of the for/where/return expression:

  let $lord_tokens := $words[. = 'Lord']
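
Put together with count(), the whole query can be written more compactly. This is only a sketch and is untested here; it assumes the same document path and markup as the query above, and a single html root element:

  let $doc := doc("/db/mjs/ElijahLibretto.xhtml")/html
  let $txt := string-join($doc//td/p[i/text() = 'Elijah']/text(), " ")
  let $words := tokenize($txt, "(\s|[,.!:;]|[n][b][s][p][;])+")
  (: keep only the tokens equal to 'Lord', then count them :)
  return count($words[. = 'Lord'])

With the default codepoint collation the comparison is case-sensitive, so 'LORD' or 'lord' would not be counted.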

-- Ron

Michael Strasser wrote:

> I am learning XQuery and have set myself a little task that currently I 
> can't manage.
> 
> I have an XHTML document with the complete text of Mendelssohn's 
> oratorio "Elijah" and wanted to use XQuery to count the number of times 
> the character of Elijah sings the word "Lord". I was inspired by 
> Jonathan Robie's blog post last year about word counts of DocBook 
> documents. (I copied his tokenize() example without fully understanding 
> it yet.)
> 
> I have isolated Elijah's speeches and converted the words to a sequence 
> of string tokens:
> 
>  for $elijah in doc("/db/mjs/ElijahLibretto.xhtml")/html
>  let $elijah-para := $elijah//td/p[i/text() = 'Elijah']
>  let $txt := string-join($elijah-para/text(), " ")
>  let $words := tokenize($txt,"(\s|[,.!:;]|[n][b][s][p][;])+")
> 
> I can't figure out how to count the number of string tokens that are 
> 'Lord'. I can get them with:
> 
>  for $word in $words
>  return $word[$word = 'Lord']
> 
> but I can't seem to get the count of them.
> 
> Thanks in advance for any help.
> 
> 
> Michael Strasser
> Brisbane Australia


