[xquery-talk] Convert diacritics to low-ascii

Andrew Welch andrew.j.welch at gmail.com
Tue Jun 21 05:53:21 PDT 2011


On 21 June 2011 13:33, Geert Josten <geert.josten at daidalos.nl> wrote:
> Thanx Andy!
>
> Works just fine in XQuery too. But have to admit that it looks a bit funny to me. Replace something with nothing and still end up with all characters? Can anyone explain what this \p{M} is matching? Unicode spec isn't making it much clearer to me.. :-P
>

Taken from:

http://www.regular-expressions.info/unicode.html

"\p{M} or \p{Mark}: a character intended to be combined with another
character (e.g. accents, umlauts, enclosing boxes, etc.)."
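
A rough sketch of what counts as a "mark", written in Python only for illustration. Python's standard `re` module has no `\p{M}` class, so this sketch tests the Unicode general category directly. `\p{M}` covers the categories Mn, Mc and Me. The helper name `is_mark` is hypothetical.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    r"""True if ch is in general category M (Mn, Mc or Me), i.e. what \p{M} matches."""
    return unicodedata.category(ch).startswith("M")


# A plain letter, two combining accents and a combining enclosing box.
samples = ["e", "\u0301", "\u0308", "\u20de"]

for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"category={unicodedata.category(ch)}, mark={is_mark(ch)}")
```

The letter e is not a mark. The acute (U+0301) and the diaeresis/umlaut (U+0308) are non-spacing marks (Mn). The enclosing square (U+20DE) is an enclosing mark (Me), which is the "enclosing boxes" case in the quote.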

When the string is normalized with NFD or NFKD, each diacritic is
represented by a separate combining character that follows the base
letter. For example, e acute becomes the letter e followed by the
combining acute accent, so you can just remove that character and be
left with the e. (In my earlier post you can see that the acute is
Unicode character 769, i.e. U+0301 COMBINING ACUTE ACCENT.)
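
The same decompose-then-delete idea as a small Python sketch, for illustration only. In XQuery the equivalent is presumably something like replace(normalize-unicode($s, 'NFD'), '\p{M}', ''), where $s is a hypothetical input variable. The Python function name `strip_diacritics` is also hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    # Decompose precomposed letters, e.g. U+00E9 -> U+0065 U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every code point in general category M (what \p{M} matches),
    # leaving the base letters behind.
    return "".join(ch for ch in decomposed
                   if not unicodedata.category(ch).startswith("M"))


word = "caf\u00e9"
print([ord(ch) for ch in unicodedata.normalize("NFD", word)])  # [99, 97, 102, 101, 769]
print(strip_diacritics(word))  # cafe
```

This only removes diacritics that decompose into a base letter plus a mark. Letters such as ø or ß have no decomposition and come through unchanged, so the result is not guaranteed to be pure low ASCII.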



-- 
Andrew Welch
http://andrewjwelch.com

