I would like to write a regular expression that will match all accented forms of a particular character in text encoded using some Unicode encoding, without explicitly listing all such forms in a character class.
So, for example, if I wanted to match any accented version of a, [aàáâãäå] is not enough, since it only covers the forms of a that exist in ISO-8859-1, and it would be nice to match accented forms that are not in that set. What would be acceptable is something like \p{Base_Character: a}, were such a thing defined in Unicode. Is there something that does this?
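To make the intended semantics concrete, here is a minimal Python sketch. It only illustrates what "base character a" would mean, using canonical decomposition (NFD) from the standard `unicodedata` module; it is not a regex-only solution. The helper name `base_character` and the sample characters are assumptions made for illustration.

```python
import re
import unicodedata

def base_character(ch: str) -> str:
    """Return the first code point of the canonical decomposition (NFD) of ch.

    Hypothetical helper: for a precomposed accented letter this is the
    unaccented letter the accents are attached to.
    """
    return unicodedata.normalize("NFD", ch)[0]

# The explicit class from the question, [aàáâãäå], written with escapes
# so the code points are unambiguous.
narrow = re.compile("[a\u00e0\u00e1\u00e2\u00e3\u00e4\u00e5]")

# U+00E0 and U+00E5 are in ISO-8859-1; U+0101, U+0103, U+01CE are not.
samples = ["\u00e0", "\u00e5", "\u0101", "\u0103", "\u01ce", "b"]

for ch in samples:
    in_class = bool(narrow.fullmatch(ch))
    has_base_a = base_character(ch) == "a"
    print(f"U+{ord(ch):04X} in explicit class: {in_class}, base is 'a': {has_base_a}")
```

The explicit class misses the forms outside ISO-8859-1, while the decomposition-based check accepts them, which is the behavior a property like \p{Base_Character: a} would need to provide.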
Edit: I cannot ASCIIfy the string first, because it is in a database to which I do not have direct access. In fact, I do not have access to all levels of the code. The only input I can give is a regular expression.