Remove accents from a string except for "ñ"

I have the following code example:

var inputString = "ñaáme";
inputString = inputString.Replace('ñ', '\u00F1');
var normalizedString = inputString.Normalize(NormalizationForm.FormD);
var result = Regex.Replace(normalizedString, @"[^ñÑa-zA-Z0-9\s]*", string.Empty);
return result.Replace('\u00F1', 'ñ'); // naame :(

I need to normalize the text without removing the "ñ" characters.

I followed this example. But this is for Java, and it did not work for me.

I want the result to be: "ñaame".

1 answer

You can match any Unicode letter other than ñ and the ASCII letters (which do not need normalization) with the regular expression (?i)[\p{L}-[ña-z]]+ and normalize only those matches. Then remove any combining marks from the string.

Use:

using System;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

var inputString = "ñaáme";
var result = string.Concat(Regex.Replace(inputString, @"(?i)[\p{L}-[ña-z]]+", m => 
        m.Value.Normalize(NormalizationForm.FormD)
    )
    .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));
Console.Write(result); // ñaame

See the C# demo.
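The same approach can be wrapped in a reusable method. This is only a sketch: the class name AccentStripper and the method name RemoveAccentsExceptEnye are hypothetical, and the initial FormC step is an assumption added here (it is not part of the answer above) so that an input where ñ arrives already decomposed as n + U+0303 is recomposed before the pattern runs. The pattern writes ñ as \u00F1 only to avoid depending on the source file encoding.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class AccentStripper
{
    // Runs of letters other than ñ/Ñ (\u00F1) and the ASCII letters.
    private static readonly Regex LettersToNormalize =
        new Regex(@"(?i)[\p{L}-[\u00F1a-z]]+");

    public static string RemoveAccentsExceptEnye(string input)
    {
        // Assumption (not in the answer): compose first, so "n" + U+0303 becomes "ñ".
        string composed = input.Normalize(NormalizationForm.FormC);

        // Decompose only the matched runs; ñ is never matched, so it stays intact.
        string decomposed = LettersToNormalize.Replace(
            composed, m => m.Value.Normalize(NormalizationForm.FormD));

        // Drop the combining marks that the decomposition produced.
        return string.Concat(decomposed.Where(
            c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));
    }
}
```

With this sketch, AccentStripper.RemoveAccentsExceptEnye("ñaáme") returns "ñaame". Note that the final filter removes every non-spacing mark left in the string, not only those created by the replacement.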

Pattern details

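  • (?i) - case-insensitive modifier
  • [ - start of the outer character class
    • \p{L} - any Unicode letter
    • -[ - except (start of the subtracted character class)
      • ña-z - ñ and the ASCII letters
    • ] - end of the subtracted character class
  • ]+ - end of the outer character class, matched 1 or more times.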
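To see the character class subtraction on its own, the pattern can be anchored and applied to single characters. This is a minimal sketch; the class name SubtractionDemo and the method name NeedsNormalization are hypothetical, and ñ is again written as \u00F1 to sidestep source file encoding.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative only.
public static class SubtractionDemo
{
    // The same class as in the answer, anchored so it must cover the whole input.
    private static readonly Regex SingleLetter =
        new Regex(@"(?i)^[\p{L}-[\u00F1a-z]]$");

    // True when the character is a letter that is neither ñ/Ñ nor an ASCII letter.
    public static bool NeedsNormalization(char c)
    {
        return SingleLetter.IsMatch(c.ToString());
    }
}
```

Here NeedsNormalization('á') is true, while 'ñ', 'a' and '1' give false: the first two are subtracted from the class, and the digit is not a letter at all.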

Source: https://habr.com/ru/post/1689824/

