[xquery-talk] Convert diacritics to low-ascii

Tue Jun 21 05:58:15 PDT 2011

Thanks!

I now see that I just didn't look closely enough. The XQuery engine was returning "abcde&#x0308;f"; I hadn't noticed the 'e' in front of the character reference.

:)

-----Original message-----
From: Martin Honnen [mailto:Martin.Honnen at gmx.de]
Sent: Tuesday, June 21, 2011 14:55
To: Geert Josten
Subject: Re: [xquery-talk] Convert diacritics to low-ascii

Geert Josten wrote:

> Works just fine in XQuery too, but I have to admit that it looks a bit
> funny to me. Replace something with nothing and still end up with all
> the characters? Can anyone explain what this \p{M} is matching? The
> Unicode spec isn't making it much clearer to me. :-P

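Well, when you do e.g.
   normalize-unicode('äé', 'NFD')
you get a string with four characters: 'a', ' ̈' (U+0308), 'e', and
' ́' (U+0301). And
   replace(normalize-unicode('abcdëf', 'NFD'), '[\p{M}]', '')
removes such combining marks, which is what \p{M} (the Unicode category
"Mark") matches; in the four-character string above those are the second
and fourth characters.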
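
To make the decomposition visible, here is a minimal sketch that lists the codepoints of the decomposed string before stripping the marks. It assumes the engine supports the 'NFD' normalization form (implementations are not required to), and the variable name $decomposed is only illustrative:

```xquery
(: Decompose 'ë' into the base letter 'e' plus U+0308 COMBINING DIAERESIS :)
let $decomposed := normalize-unicode('abcdëf', 'NFD')
return (
  (: One integer per character, so the separate combining mark shows up :)
  string-to-codepoints($decomposed),
  (: Delete every character in the Unicode "Mark" category :)
  replace($decomposed, '\p{M}', '')
)
```

On such an engine this should return the codepoints 97 98 99 100 101 776 102 followed by the string 'abcdef'; 776 is U+0308, the only character in the input that \p{M} matches.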

-- 

	Martin Honnen --- MVP Data Platform Development
	http://msmvps.com/blogs/martin_honnen/