[xquery-talk] Count a specific word in a document

Michael Strasser M.Strasser at gpo.com
Wed Jun 13 15:11:11 PDT 2007


I am learning XQuery and have set myself a little task that I can't 
currently manage.

I have an XHTML document with the complete text of Mendelssohn's 
oratorio "Elijah" and want to use XQuery to count the number of times 
the character of Elijah sings the word "Lord". I was inspired by 
Jonathan Robie's blog post last year about word counts of DocBook 
documents. (I copied his tokenize() example without fully understanding 
it yet.)

I have isolated Elijah's speeches and converted the words to a sequence 
of string tokens:

  for $elijah in doc("/db/mjs/ElijahLibretto.xhtml")/html
  let $elijah-para := $elijah//td/p[i/text() = 'Elijah']
  let $txt := string-join($elijah-para/text(), " ")
  let $words := tokenize($txt,"(\s|[,.!:;]|[n][b][s][p][;])+")

I can't figure out how to count the number of string tokens that are 
'Lord'. I can get them with:

  for $word in $words
  return $word[$word = 'Lord']

but I can't seem to get the count of them.

Thanks in advance for any help.


Michael Strasser
Brisbane Australia

