[xquery-talk] Count a specific word in a document

Michael Strasser M.Strasser at gpo.com
Wed Jun 13 16:02:19 PDT 2007


Ron


Thanks, your simpler version works fine (with eXist):

  let $lords := $words[. = 'Lord']
  return count($lords)
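
For anyone who wants to try this without the libretto to hand, here is a minimal self-contained sketch of the same predicate-based count. The sample text in $txt is made up for illustration, and the tokenize() pattern is a simplified version of the one quoted below (whitespace and basic punctuation only).

```xquery
xquery version "1.0";

(: Hypothetical sample text standing in for Elijah's joined paragraphs. :)
let $txt := "Lord, hear me! The Lord is God: O Lord, answer."
(: Split on runs of whitespace and basic punctuation. :)
let $words := tokenize($txt, "(\s|[,.!:;])+")
(: The predicate keeps only the tokens that are exactly 'Lord'. :)
let $lords := $words[. = 'Lord']
return count($lords)
```

The comparison is case-sensitive, so 'lord' or 'LORD' would not be counted. The key point is that count() is applied to the whole filtered sequence rather than inside a loop over individual words.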


Michael Strasser


Ronald Bourret wrote:
> Try:
>
>  for $elijah in doc("/db/mjs/ElijahLibretto.xhtml")/html
>  let $elijah-para := $elijah//td/p[i/text() = 'Elijah']
>  let $txt := string-join($elijah-para/text(), " ")
>  let $words := tokenize($txt,"(\s|[,.!:;]|[n][b][s][p][;])+")
>  let $lord_tokens := (for $word in $words
>                       where $word = 'Lord'
>                       return $word)
>  return fn:count($lord_tokens)
>
> I assume the following would also work:
>
>  let $lord_tokens := $words[. = 'Lord']
>
> -- Ron
>
> Michael Strasser wrote:
>
>> I am learning XQuery and have set myself a little task that currently 
>> I can't manage.
>>
>> I have an XHTML document with the complete text of Mendelssohn's 
>> oratorio "Elijah" and wanted to use XQuery to count the number of 
>> times the character of Elijah sings the word "Lord". I was inspired 
>> by Jonathan Robie's blog post last year about word counts of DocBook 
>> documents. (I copied his tokenize() example without fully 
>> understanding it yet.)
>>
>> I have isolated Elijah's speeches and converted the words to a 
>> sequence of string tokens:
>>
>>  for $elijah in doc("/db/mjs/ElijahLibretto.xhtml")/html
>>  let $elijah-para := $elijah//td/p[i/text() = 'Elijah']
>>  let $txt := string-join($elijah-para/text(), " ")
>>  let $words := tokenize($txt,"(\s|[,.!:;]|[n][b][s][p][;])+")
>>
>> I can't figure out how to count the number of string tokens that are 
>> 'Lord'. I can get them with:
>>
>>  for $word in $words
>>  return $word[$word = 'Lord']
>>
>> but I can't seem to get the count of them.
>>
>> Thanks in advance for any help.
>>
>>
>> Michael Strasser
>> Brisbane Australia
>
>

