Distinct on a list of strings ignoring whitespace, diacritics, and case

Given the following list of strings:

string[] Itens = new string[] { "hi", " hi   ", "HI", "hí", " Hî", "hi hi", " hí hí ", "olá", "OLÁ", " olá   ", "", "ola", "hola", " holà    ", "aaaa", "áâàa", " aâàa     ", "áaàa", "áâaa ", "aaaa ", "áâaa", "áâaa", };

The result of the Distinct operation should be:

hi, hi hi, olá, , hola, aaaa

The C# Distinct operation available for IEnumerable takes an IEqualityComparer as a parameter, so we can customize the comparison.

The following implementation gets the job done:

class LengthHash : IEqualityComparer<string>
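{
    // Assumption: the original snippet does not show where Culture is defined;
    // the invariant culture is used here.
    private static readonly CultureInfo Culture = CultureInfo.InvariantCulture;
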
    public bool Equals(string x, string y)
    {
        if (x == null || y == null) return x == y;

        var xt = x.Trim();
        var yt = y.Trim();

        return xt.Length == yt.Length && Culture.CompareInfo.IndexOf(xt, yt, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase) >= 0;
    }

    public int GetHashCode(string obj) => obj?.Trim().Length ?? 1;
}

Distinct calls Equals only for items whose GetHashCode values match. If the hash codes differ, Equals is never executed, so it's important to have a good GetHashCode implementation.
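To illustrate the point about GetHashCode gating Equals, here is a minimal sketch. CountingComparer and HashContractDemo are hypothetical names that are not part of the original code; the comparer only counts how often Distinct ends up calling Equals for a given hash function.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: case-insensitive Equals, hash supplied by the caller.
class CountingComparer : IEqualityComparer<string>
{
    private readonly Func<string, int> _hash;
    public int EqualsCalls { get; private set; }

    public CountingComparer(Func<string, int> hash) => _hash = hash;

    public bool Equals(string x, string y)
    {
        EqualsCalls++;
        return string.Equals(x, y, StringComparison.OrdinalIgnoreCase);
    }

    public int GetHashCode(string obj) => _hash(obj);
}

static class HashContractDemo
{
    // Runs Distinct and reports how many items survived and how often Equals ran.
    public static (int Count, int EqualsCalls) Run(string[] items, Func<string, int> hash)
    {
        var comparer = new CountingComparer(hash);
        int count = items.Distinct(comparer).Count();
        return (count, comparer.EqualsCalls);
    }

    static void Main()
    {
        var items = new[] { "hi", "HI", "ola" };

        // Case-sensitive hash (first character): "hi" and "HI" hash differently,
        // so Equals is never consulted and both items survive.
        Console.WriteLine(Run(items, s => s[0]));

        // Constant hash: every candidate goes through Equals, so "HI" is dropped.
        Console.WriteLine(Run(items, s => 1));
    }
}
```

The first call shows a hash that is inconsistent with Equals silently breaking the deduplication; the second shows the IgnoreHash idea, which is correct but pushes all the work onto Equals.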

I also tried two other approaches to GetHashCode.

IgnoreHash

public int GetHashCode(string obj) => 1;

NormalizedHash

public int GetHashCode(string obj) => obj?.Trim().Normalize().ToUpperInvariant().GetHashCode() ?? 1;
// Note: this approach doesn't produce the same output.
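One likely reason NormalizedHash differs: Normalize() with no argument uses form C, which keeps accented letters composed, so "hí" and "hi" still hash differently. The sketch below swaps in a different technique (decompose with form D, then drop combining marks) so the hash ignores diacritics. FoldedHash, Fold and Hash are hypothetical names, and this folding is only assumed to agree with the IgnoreNonSpace | IgnoreCase comparison for inputs like the ones in the question, not for every possible string.

```csharp
using System;
using System.Globalization;
using System.Text;

static class FoldedHash
{
    // Hypothetical helper: trims, decomposes (FormD), drops combining marks, upper-cases.
    public static string Fold(string s)
    {
        if (s == null) return null;

        string decomposed = s.Trim().Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Combining accents become separate NonSpacingMark chars after FormD.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString().ToUpperInvariant();
    }

    // Hash of the folded text; null maps to 1, as in the original snippets.
    public static int Hash(string obj) => Fold(obj)?.GetHashCode() ?? 1;

    static void Main()
    {
        Console.WriteLine(Fold(" holà    "));
        Console.WriteLine(Fold("hola"));
        Console.WriteLine(Hash(" Hî") == Hash("hi"));
    }
}
```

Hash could replace the body of GetHashCode in the comparer above; Equals would stay as it is.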

Besides using a custom IEqualityComparer, I also tried trimming the items before calling Distinct with StringComparer.InvariantCultureIgnoreCase, but that produces the same result as the Normalize and ToUpperInvariant version.
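The trim-first approach described above might look roughly like this. TrimThenDistinct and Run are hypothetical names, and the input is a shortened version of the list from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TrimThenDistinct
{
    // Trims every item, then deduplicates ignoring case only.
    public static List<string> Run(IEnumerable<string> items) =>
        items.Select(s => s.Trim())
             .Distinct(StringComparer.InvariantCultureIgnoreCase)
             .ToList();

    static void Main()
    {
        var result = Run(new[] { "hi", " hi   ", "HI", "hí", " Hî" });
        Console.WriteLine(string.Join(", ", result));
    }
}
```

Because StringComparer.InvariantCultureIgnoreCase ignores case but not diacritics, "hí" and "Hî" remain separate entries, which is why this approach returns more items than wanted.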

Benchmark of Distinct with each approach, including StringComparer.InvariantCultureIgnoreCase:

                              Method |       Mean |    StdErr |    StdDev |     Median |
------------------------------------ |----------- |---------- |---------- |----------- |
                          RunDefault |  2.2224 us | 0.0242 us | 0.2391 us |  2.1414 us |
                     RunHashAsLength |  6.0765 us | 0.0515 us | 0.1857 us |  6.1235 us |
                       RunIgnoreHash |  6.4078 us | 0.0640 us | 0.6140 us |  6.1982 us |
                   RunNormalizedHash | 14.5941 us | 0.0742 us | 0.3556 us | 14.4983 us |
 RunTrimAndCompareWithStringComparer | 14.4935 us | 0.0213 us | 0.0768 us | 14.5352 us |

Output (item count, approach, and resulting items):

21 Default: hi,  hi   , HI, hí,  Hî, hi hi,  hí hí , olá, OLÁ,  olá   , , ola, hola,  holà    , aaaa, áâàa,  aâàa     , áaàa, áâaa , aaaa , áâaa
6 HashAsLength: hi, hi hi, olá, , hola, aaaa
6 IgnoreHash: hi, hi hi, olá, , hola, aaaa
15 NormalizedHash: hi, hí,  Hî, hi hi,  hí hí , olá, , ola, hola,  holà    , aaaa, áâàa,  aâàa     , áaàa, áâaa
15 RunTrimAndCompareWithStringComparer: hi, hí, Hî, hi hi, hí hí, olá, , ola, hola, holà, aaaa, áâàa, aâàa, áaàa, áâaa

https://gist.github.com/Flash3001/d50a6b43bba7bc61e3d85734e40dbed9

Question: is there a better way to implement GetHashCode and Equals for the IEqualityComparer?


You can use CompareInfo, which provides both Compare and GetHashCode overloads that accept CompareOptions:

class StringEqualityComparer : IEqualityComparer<string>
{
    private CultureInfo _cultureInfo;
    private CompareOptions _options;
    private bool _trim;

    public StringEqualityComparer(CultureInfo cultureInfo,
        CompareOptions options, bool trim)
    {
        _cultureInfo = cultureInfo;
        _options = options;
        _trim = trim;
    }

    public bool Equals(string x, string y)
    {
        if (_trim) { x = x?.Trim(); y = y?.Trim(); }
        return _cultureInfo.CompareInfo.Compare(x, y, _options) == 0;
    }

    public int GetHashCode(string obj)
    {
        if (_trim) obj = obj?.Trim();
        return _cultureInfo.CompareInfo.GetHashCode(obj, _options);
    }
}

Usage:

var comparer = new StringEqualityComparer(CultureInfo.InvariantCulture,
    CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase, true);
var items = new string[] { "hi", " hi   ", "HI", "hí", " Hî", "hi hi", " hí hí ",
    "olá", "OLÁ", " olá   ", "", "ola", "hola", " holà    ", "aaaa", "áâàa",
    " aâàa     ", "áaàa", "áâaa ", "aaaa ", "áâaa", "áâaa", };
Console.WriteLine($"Distinct: {String.Join(", ", items.Distinct(comparer))}");

Output:

Distinct: hi, hi hi, olá, , hola, aaaa
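As a quick sanity check of why this works, the sketch below confirms that CompareInfo.Compare and CompareInfo.GetHashCode agree when given the same CompareOptions. CompareInfoCheck and EqualAndSameHash are hypothetical names; the check assumes a runtime with full globalization support (not invariant-globalization mode) and a framework version that has the GetHashCode(string, CompareOptions) overload.

```csharp
using System;
using System.Globalization;

static class CompareInfoCheck
{
    static readonly CompareInfo Info = CultureInfo.InvariantCulture.CompareInfo;
    const CompareOptions Options = CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase;

    // True when the two strings compare equal AND produce the same hash under Options.
    public static bool EqualAndSameHash(string a, string b) =>
        Info.Compare(a, b, Options) == 0 &&
        Info.GetHashCode(a, Options) == Info.GetHashCode(b, Options);

    static void Main()
    {
        Console.WriteLine(EqualAndSameHash("olá", "OLA"));   // differ by case and accent only
        Console.WriteLine(EqualAndSameHash("áâàa", "aaaa")); // differ by accents only
        Console.WriteLine(EqualAndSameHash("hola", "ola"));  // different base letters
    }
}
```

The first two pairs differ only in case or diacritics, so they are expected to be treated as equal with matching hashes; the last pair differs in its base letters and should not be.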


Source: https://habr.com/ru/post/1673553/
