C# regular expressions for character equivalence classes

How can I search a string in C# with Regex while ignoring accents?

For example, in Notepad++, when searching Ancient Greek text with regular expressions, [[=α=]] matches α, ἀ, ἁ, ᾶ, ὰ, ά, ᾳ, and so on.

As far as I know, Notepad++ uses PCRE-style (Perl-compatible) syntax. How can I do this in C#? Is there an equivalent syntax?

Edit

I have already tried string normalization, but it doesn't work for Greek. For example, "ᾶ".Normalize(NormalizationForm.FormC) returns ᾶ. It seems that normalization removes accents only from combining characters, and ᾶ is a separate character in Unicode!
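Printing the code points shows what happens. FormC is the composed form, so it leaves a precomposed letter such as ᾶ (U+1FB6) unchanged; FormD is the decomposed form and splits it into α (U+03B1) followed by U+0342 COMBINING GREEK PERISPOMENI, which is a NonSpacingMark and can therefore be filtered out. The helper below is a minimal sketch for checking this; NormalizationDemo and CodePoints are hypothetical names, and it assumes the input contains only BMP characters (true for the Greek and Greek Extended blocks).

```csharp
using System.Text;

public static class NormalizationDemo
{
    // Hypothetical helper: formats each UTF-16 char of s as "U+XXXX",
    // separated by spaces. Assumes BMP-only input (no surrogate pairs).
    public static string CodePoints(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }
}
```

For example, CodePoints("ᾶ".Normalize(NormalizationForm.FormC)) gives "U+1FB6", while CodePoints("ᾶ".Normalize(NormalizationForm.FormD)) gives "U+03B1 U+0342".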

2 answers

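Use System.String.Normalize with NormalizationForm.FormD to split each accented letter into a base letter plus combining marks, then remove the marks.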

using System;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        // Lowercase alpha with various breathings, accents, iota subscript and length marks.
        string rawInput = "ἀἁἂἃἄἅἆἇὰάᾀᾁᾂᾃᾄᾅᾆᾇᾰᾱᾲᾳᾴᾶᾷ";
        Console.WriteLine(rawInput);

        // Strip the diacritics first, then search for the plain base letter.
        string normalizedInput = Utility.RemoveDiacritics(rawInput);
        string pattern = "α+";

        var result = Regex.Matches(normalizedInput, pattern);
        if (result.Count > 0)
            Console.WriteLine(result[0].Value);
    }
}

public static class Utility
{
    // FormD splits each precomposed letter (e.g. ᾶ) into a base letter plus
    // combining marks; the marks (category NonSpacingMark) are dropped and
    // FormC recomposes whatever is left.
    public static string RemoveDiacritics(this string str)
    {
        if (str == null) return null;
        var chars =
            from c in str.Normalize(NormalizationForm.FormD)
            where CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark
            select c;

        return new string(chars.ToArray()).Normalize(NormalizationForm.FormC);
    }
}

:

ἀἁἂἃἄἅἆἇὰάᾀᾁᾂᾃᾄᾅᾆᾇᾰᾱᾲᾳᾴᾶᾷᾶ
αααααααααααααααααααααααααα
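The program prints plain α's because the match is found in the stripped copy, so the original accented letters are lost. If you need the matching substrings of the original text (for example to highlight ἀ or ᾄ), one possible approach is to strip the marks text element by text element while remembering where each remaining character came from. This is a minimal sketch, assuming the pattern itself is written without accents and the input is well-formed UTF-16; AccentInsensitiveSearch, Find and StripMarks are hypothetical names, not an existing API. Unlike RemoveDiacritics it skips the final FormC step, which is harmless for Greek because the base letters are single characters.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class AccentInsensitiveSearch
{
    // Hypothetical helper: removes NonSpacingMark chars text element by text
    // element and records, for each remaining char, the [Start, End) range of
    // the text element it came from in the original string.
    static string StripMarks(string text, List<(int Start, int End)> map)
    {
        var sb = new StringBuilder(text.Length);
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(text);
        while (e.MoveNext())
        {
            int start = e.ElementIndex;
            string element = e.GetTextElement();
            foreach (char c in element.Normalize(NormalizationForm.FormD))
            {
                if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                {
                    sb.Append(c);
                    map.Add((start, start + element.Length));
                }
            }
        }
        return sb.ToString();
    }

    // Hypothetical helper: runs an accent-free pattern against the stripped
    // text and returns the corresponding substrings of the ORIGINAL text.
    public static List<string> Find(string text, string pattern)
    {
        var map = new List<(int Start, int End)>();
        string stripped = StripMarks(text, map);
        var results = new List<string>();
        foreach (Match m in Regex.Matches(stripped, pattern))
        {
            if (m.Length == 0) continue; // empty matches have no source range
            int start = map[m.Index].Start;
            int end = map[m.Index + m.Length - 1].End;
            results.Add(text.Substring(start, end - start));
        }
        return results;
    }
}
```

For example, Find("ἀᾶβ ἄδω", "α+") returns "ἀᾶ" and "ἄ", i.e. the original accented characters rather than their stripped forms.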

An equivalent implementation without LINQ:

static string RemoveDiacritics(string text)
{
    // Decompose so that accents become separate combining characters.
    var normalizedString = text.Normalize(NormalizationForm.FormD);
    var stringBuilder = new StringBuilder(normalizedString.Length);
    foreach (var c in normalizedString)
    {
        // Accents, breathings and the iota subscript are all NonSpacingMark.
        var unicodeCategory = CharUnicodeInfo.GetUnicodeCategory(c);
        if (unicodeCategory != UnicodeCategory.NonSpacingMark)
        {
            stringBuilder.Append(c);
        }
    }
    // Recompose whatever is left.
    return stringBuilder.ToString().Normalize(NormalizationForm.FormC);
}
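Normalizing changes the text being searched. To stay closer to Notepad++'s [[=α=]], another option is to leave the input alone and expand the base letter into an explicit character class. .NET regular expressions have no POSIX equivalence-class syntax, so the sketch below builds one by hand: it scans the Greek and Coptic block (U+0370 to U+03FF) and the Greek Extended block (U+1F00 to U+1FFF) and keeps every character whose canonical decomposition (FormD) starts with the requested letter. EquivalenceClass.Build is a hypothetical helper; it is case-sensitive, assumes the argument is a single Greek letter, and only approximates the locale-defined equivalence classes of POSIX.

```csharp
using System.Globalization;
using System.Text;

public static class EquivalenceClass
{
    // Hypothetical helper: returns a character class such as "[αάἀἁ...]" that
    // approximates the POSIX equivalence class [[=baseLetter=]] for Greek text.
    // Assumes baseLetter is a letter, so it needs no escaping inside [...].
    public static string Build(char baseLetter)
    {
        var sb = new StringBuilder("[");
        sb.Append(baseLetter);

        // Greek and Coptic block, then the Greek Extended (polytonic) block.
        foreach (var (first, last) in new[] { (0x0370, 0x03FF), (0x1F00, 0x1FFF) })
        {
            for (int cp = first; cp <= last; cp++)
            {
                char c = (char)cp;
                if (c == baseLetter ||
                    CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.OtherNotAssigned)
                    continue;

                // FormD puts the base letter first, followed by combining marks.
                string decomposed = c.ToString().Normalize(NormalizationForm.FormD);
                if (decomposed[0] == baseLetter)
                    sb.Append(c);
            }
        }
        return sb.Append(']').ToString();
    }
}
```

The result can be combined with the rest of a pattern, e.g. Regex.Matches(text, EquivalenceClass.Build('α') + "+"), and the matches then refer directly to the original text. Text that is already in decomposed form (α followed by separate combining marks) would match only the base letter, so normalizing the input to FormC first may still be worthwhile.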


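PS: As another option, there is PCRE.NET, Lucas Trzesniewski's .NET wrapper for PCRE, which supports POSIX character classes. Note, however, that PCRE itself reports an error for POSIX equivalence classes such as [=α=], so it would not reproduce the Notepad++ behavior directly.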


Source: https://habr.com/ru/post/1695882/

