[xquery-talk] XQuery and \w, \W in regex (Saxon 8)

David Sewell dsewell at virginia.edu
Wed Nov 16 12:22:23 PST 2005


Given this code:

  let $string1 := '"quoted"'
  let $string2 := "“quoted”"
  return
  ( replace($string1, "\W", ""),
    replace($string2, "\W", "")
  )

Saxon 8.6b returns

  quoted
  "quoted"

(where the quotation marks in the second line are the Unicode curly
quotation marks U+201C and U+201D, unchanged from the input).

Is this a bug in the regex handling? U+201C and U+201D are punctuation
(general categories Pi and Pf), so \W should match them, no? (Likewise
the single curly quotes, U+2018 and U+2019; I haven't tried other
characters in the General Punctuation block.)
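
One way to narrow this down, as a minimal sketch: XML Schema defines \w
as every character except those in [\p{P}\p{Z}\p{C}], so each curly
quote can be tested against \p{P} and \W separately. The characters are
built from their code points here, so the encoding of the query file
cannot be a factor. The output format is only illustrative.

```xquery
(: Sketch: compare \p{P} and \W for each curly quotation mark.
   Code points: U+201C, U+201D, U+2018, U+2019. :)
for $cp in (8220, 8221, 8216, 8217)
let $ch := codepoints-to-string($cp)
return
  concat($cp, ": ",
         "\p{P}=", matches($ch, "\p{P}"), " ",
         "\W=",    matches($ch, "\W"))
```

If \p{P} matches a character but \W does not, the two escapes disagree
with the Schema definition, which would point at the regex
implementation rather than at the input.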

-- 
David Sewell, Editorial and Technical Manager
Electronic Imprint, The University of Virginia Press
PO Box 400318, Charlottesville, VA 22904-4318 USA
Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
Email: dsewell at virginia.edu   Tel: +1 434 924 9973
Web: http://www.ei.virginia.edu/
