Space in a .NET string returned by string.Format does not match the space declared in source code - multiple representations?

The string returned by string.Format seems to use some odd encoding. The spaces in the formatted string are represented by different byte values than the spaces in strings declared in the source code.

The following test case demonstrates the problem:

[Test]
public void FormatSize_Regression() 
{
  string size1023 = FileHelper.FormatSize(1023);
  Assert.AreEqual("1 023 Bytes", size1023);
}

Fails:

    String lengths are both 11. Strings differ at index 1.
    Expected: "1 023 Bytes"
    But was: "1 023 Bytes"
    ------------ ^

FormatSize Method:

public static string FormatSize(long size) 
{
  if (size < 1024)
     return string.Format("{0:N0} Bytes", size);
  else if (size < 1024 * 1024)
     return string.Format("{0:N2} KB", (double)((double)size / 1024));
  else
     return string.Format("{0:N2} MB", (double)((double)size / (1024 * 1024)));
}

From the VS Immediate Window, with a breakpoint set on the Assert line:

size1023
"1 023 Bytes"

System.Text.Encoding.UTF8.GetBytes(size1023)
{byte[12]}
    [0]: 49
    [1]: 194 <--------- space is 194/160 here? The Unicode bytes suggest the space should be 160. What is the 194 then?
    [2]: 160
    [3]: 48
    [4]: 50
    [5]: 51
    [6]: 32
    [7]: 66
    [8]: 121
    [9]: 116
    [10]: 101
    [11]: 115
System.Text.Encoding.UTF8.GetBytes("1 023 Bytes")
{byte[11]}
    [0]: 49
    [1]: 32  <--------- space is 32 here
    [2]: 48
    [3]: 50
    [4]: 51
    [5]: 32
    [6]: 66
    [7]: 121
    [8]: 116
    [9]: 101
    [10]: 115

System.Text.Encoding.Unicode.GetBytes(size1023)
{byte[22]}
    [0]: 49
    [1]: 0
    [2]: 160 <----------- 160,0 here
    [3]: 0
    [4]: 48
    [5]: 0
    [6]: 50
    [7]: 0
    [8]: 51
    [9]: 0
    [10]: 32
    [11]: 0
    [12]: 66
    [13]: 0
    [14]: 121
    [15]: 0
    [16]: 116
    [17]: 0
    [18]: 101
    [19]: 0
    [20]: 115
    [21]: 0
System.Text.Encoding.Unicode.GetBytes("1 023 Bytes")
{byte[22]}
    [0]: 49
    [1]: 0
    [2]: 32 <----------- 32,0 here
    [3]: 0
    [4]: 48
    [5]: 0
    [6]: 50
    [7]: 0
    [8]: 51
    [9]: 0
    [10]: 32
    [11]: 0
    [12]: 66
    [13]: 0
    [14]: 121
    [15]: 0
    [16]: 116
    [17]: 0
    [18]: 101
    [19]: 0
    [20]: 115
    [21]: 0

Question: How is this possible?


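My guess is that the current culture uses a non-breaking space, U+00A0, as the thousands separator for numbers. That stops the number from being broken across lines. Instead of this:

The size of the file is 1
023 bytes.

the reader sees this:

The size of the file is
1 023 bytes.

Many cultures would format the value as "1,023" instead. Do you actually want FormatSize to use the current culture? If so, the unit test needs to take that into account. Helper methods along these lines can pin the culture for the duration of a test: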

internal static void WithInvariantCulture(Action action)
{
    WithCulture(CultureInfo.InvariantCulture, action);
}

internal static void WithCulture(CultureInfo culture, Action action)
{
    CultureInfo original = Thread.CurrentThread.CurrentCulture;
    try
    {
        Thread.CurrentThread.CurrentCulture = culture;
        action();
    }
    finally
    {
        Thread.CurrentThread.CurrentCulture = original;
    }            
}

A test can then be wrapped like this:

WithInvariantCulture(() =>
{
    // Body of test
});

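Alternatively, if the test should check the exact string produced under the current culture, the expected value can spell out the non-breaking space: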

Assert.AreEqual("1\u00A0023 Bytes", size1023);
+12

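Unicode code point 160 is not encoded in UTF-8 as the single byte 160. In UTF-8 it takes two bytes: 194 + 160.

In fact, every Unicode code point above 127 takes more than one byte in UTF-8.

As for the actual problem, the CultureInfo in use has a non-breaking space (160) as its thousands separator rather than a regular space (32), so the two strings really are different.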
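
The two-byte encoding can be checked directly. The following is a minimal sketch, not part of the original answer; the class and method names (NbspBytes, Utf8BytesOf) are made up for illustration. Code point 160 is 0xA0, which UTF-8 splits across the lead byte 0xC2 (194) and the continuation byte 0xA0 (160).

```csharp
using System;
using System.Text;

public static class NbspBytes
{
    // Hypothetical helper: lists the UTF-8 bytes of one character as decimal values.
    public static string Utf8BytesOf(char c)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(new[] { c });
        return string.Join(", ", bytes);
    }

    public static void Main()
    {
        Console.WriteLine(Utf8BytesOf(' '));       // 32
        Console.WriteLine(Utf8BytesOf('\u00A0'));  // 194, 160
        Console.WriteLine((int)'\u00A0');          // 160 (the code point itself)
    }
}
```

The string holds a single character with value 160; the 194 only appears once that character is serialized as UTF-8.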

+4

194, 160 is the UTF-8 encoding of code point 160: the non-breaking space, &nbsp; in HTML.

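For the unit test, the expected string depends on the CultureInfo in effect, so the test either has to fix the CultureInfo it runs under or build its expected value from that CultureInfo.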

+2

Perhaps, in the Assert.Equal, build the expected string using CultureInfo.CurrentCulture.NumberFormat.NumberGroupSeparator?
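
One way to read this suggestion is sketched below. It is an illustration only: the names SeparatorAwareExpectation, ExpectedFor1023 and CultureWithNbspSeparator are hypothetical, and the sketch assumes the culture groups digits in threes. To keep the result reproducible on any machine, it uses a clone of the invariant culture with the group separator set to U+00A0 as a stand-in for the culture in the question; in a real test the culture would be CultureInfo.CurrentCulture.

```csharp
using System;
using System.Globalization;

public static class SeparatorAwareExpectation
{
    // Hypothetical helper: the text "{0:N0} Bytes" is expected to give for 1023,
    // assuming the culture groups digits in threes.
    public static string ExpectedFor1023(CultureInfo culture)
    {
        return "1" + culture.NumberFormat.NumberGroupSeparator + "023 Bytes";
    }

    // Stand-in for a culture like the one in the question: invariant rules,
    // except that the group separator is a non-breaking space (U+00A0).
    public static CultureInfo CultureWithNbspSeparator()
    {
        var culture = (CultureInfo)CultureInfo.InvariantCulture.Clone();
        culture.NumberFormat.NumberGroupSeparator = "\u00A0";
        return culture;
    }

    public static void Main()
    {
        CultureInfo culture = CultureWithNbspSeparator();
        string actual = string.Format(culture, "{0:N0} Bytes", 1023L);

        Console.WriteLine(actual == ExpectedFor1023(culture)); // True
        Console.WriteLine(actual == "1 023 Bytes");            // False: that literal holds U+0020
    }
}
```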

+2

160 is the non-breaking space, which makes sense because you do not want a number to be split across lines. But 194... Oh yes, UTF-8 encodes it as two bytes.

+1

First of all, all strings in .NET are Unicode, so looking at the UTF-8 bytes does not help here. Secondly, when comparing strings you should specify culture information, and when calling string.Format you should pass an IFormatProvider. That way you decide which characters these functions use.
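
As a rough sketch of that advice, FormatSize could take the format provider as a parameter. The overload below is hypothetical (the FileHelper in the question takes only the size), and the class name FileHelperSketch is made up for illustration.

```csharp
using System;
using System.Globalization;

public static class FileHelperSketch
{
    // Hypothetical overload: same thresholds as the FormatSize in the question,
    // but the caller decides which culture supplies the separators.
    public static string FormatSize(long size, IFormatProvider provider)
    {
        if (size < 1024)
            return string.Format(provider, "{0:N0} Bytes", size);
        if (size < 1024 * 1024)
            return string.Format(provider, "{0:N2} KB", size / 1024.0);
        return string.Format(provider, "{0:N2} MB", size / (1024.0 * 1024.0));
    }

    public static void Main()
    {
        // The invariant culture groups with "," and uses "." for decimals.
        Console.WriteLine(FormatSize(1023, CultureInfo.InvariantCulture)); // 1,023 Bytes
        Console.WriteLine(FormatSize(1536, CultureInfo.InvariantCulture)); // 1.50 KB
    }
}
```

A test that passes CultureInfo.InvariantCulture gets the same output on every machine, while production code can still pass CultureInfo.CurrentCulture for user-facing text.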

0

Source: https://habr.com/ru/post/1748033/

