StartsWith change in Windows Server 2012

Edit: Initially, I thought this was related to .NET Framework 4.5. It turns out that it also applies to .NET Framework 4.0.

Something has changed in how strings are handled on Windows Server 2012, and I'm trying to understand it better. The behavior of StartsWith seems to have changed. The issue is reproducible with both .NET Framework 4.0 and 4.5.

With .NET Framework 4.5 on Windows 7, the program below prints "False, t". On Windows Server 2012, it prints "True, t".

    using System;
    using System.Text;

    internal class Program
    {
        private static void Main(string[] args)
        {
            // Decode the UTF-8 preamble (EF BB BF) into a one-character string (U+FEFF).
            string byteOrderMark = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
            Console.WriteLine("test".StartsWith(byteOrderMark));
            Console.WriteLine("test"[0]);
        }
    }

In other words, on Windows Server 2012 StartsWith(byteOrderMark) returns true regardless of the contents of the string. If you have code that tries to strip the byte order mark using the following approach, it works fine on Windows 7 but prints "est" on Windows Server 2012.

    using System;
    using System.Text;

    internal class Program
    {
        private static void Main(string[] args)
        {
            string byteOrderMark = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
            string someString = "Test";
            // Culture-sensitive StartsWith: this is the call whose result differs between systems.
            if (someString.StartsWith(byteOrderMark))
                someString = someString.Substring(1);
            Console.WriteLine("{0}", someString);
            Console.ReadKey();
        }
    }

I understand that you have already done something wrong if you have byte order marks in a string, but we integrate with legacy code that has this. I know that I can solve this specific problem by doing something like the following, but I want to understand the problem better.

 someString = someString.Trim(byteOrderMark[0]); 

Hans Passant suggested using the UTF8Encoding constructor, which lets me explicitly tell it to emit the UTF-8 identifier. I tried this, but it gives the same result. The code below produces different output on Windows 7 and Windows Server 2012. On Windows 7, it prints "Result: False". On Windows Server 2012, it prints "Result: True".

    private static void Main(string[] args)
    {
        var encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
        string byteOrderMark = encoding.GetString(encoding.GetPreamble());
        Console.WriteLine("Result: " + "Hello".StartsWith(byteOrderMark));
        Console.ReadKey();
    }

I also tried the following variant, which prints False, False, False on Windows 7 but True, True, False on Windows Server 2012. This confirms that the difference is related to the implementation of StartsWith on Windows Server 2012.

    private static void Main(string[] args)
    {
        var encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
        string byteOrderMark = encoding.GetString(encoding.GetPreamble());
        Console.WriteLine("Hello".StartsWith(byteOrderMark));
        Console.WriteLine("Hello".StartsWith('\ufeff'.ToString()));
        Console.WriteLine("Hello"[0] == '\ufeff');
        Console.ReadKey();
    }
string c# unicode
Oct 21 '13 at 13:02
1 answer

It turns out I can reproduce this by running the test program on Windows 8.1. It is in the same "family" as Server 2012.

The most likely source of the problem is that the culture-sensitive comparison rules have changed. They can be, erm, flaky and may produce strange results for these kinds of characters. The BOM is a zero-width space. Reasoning about this requires the same mental gymnastics as understanding why "abc".StartsWith("") returns true :)
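To make the analogy concrete, here is a minimal sketch that puts the empty-prefix case next to the BOM case. The class name ComparisonDemo is hypothetical and only used for illustration. No expected output is given for the culture-sensitive BOM checks, because the result depends on the operating system and runtime version (the question reports True on Windows Server 2012 and False on Windows 7).

```csharp
using System;

public static class ComparisonDemo
{
    public static void Main()
    {
        // U+FEFF is what the decoded UTF-8 preamble turns into.
        const string bom = "\uFEFF";

        // An empty prefix always matches.
        Console.WriteLine("abc".StartsWith("")); // True

        // Culture-sensitive comparisons may treat a zero-width character
        // as ignorable, which makes the prefix behave like an empty string.
        // The result is platform dependent, so none is claimed here.
        Console.WriteLine("Hello".StartsWith(bom, StringComparison.CurrentCulture));
        Console.WriteLine("Hello".StartsWith(bom, StringComparison.InvariantCulture));
    }
}
```

If the last two lines print True on a given machine, that machine's collation rules are ignoring U+FEFF, which is the behavior described in the question.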

You need to solve your problem using StringComparison.Ordinal. This produced False, False, False:

    private static void Main(string[] args)
    {
        var encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
        string byteOrderMark = encoding.GetString(encoding.GetPreamble());
        Console.WriteLine("Hello".StartsWith(byteOrderMark, StringComparison.Ordinal));
        Console.WriteLine("Hello".StartsWith("\ufeff", StringComparison.Ordinal));
        Console.WriteLine("Hello"[0] == '\ufeff');
        Console.ReadKey();
    }
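For the BOM-stripping case from the question, a small helper that avoids culture rules entirely might look like the sketch below. BomHelper and StripLeadingBom are hypothetical names, not part of the .NET Framework. The helper assumes that at most one leading U+FEFF needs to be removed, and it compares the first char directly, so it does not depend on any StringComparison setting.

```csharp
using System;

public static class BomHelper
{
    // U+FEFF: the character produced by decoding the UTF-8 preamble.
    private const char ByteOrderMark = '\uFEFF';

    // Removes a single leading BOM character, if present.
    // A direct char comparison is used, so no culture rules are involved.
    public static string StripLeadingBom(string value)
    {
        if (string.IsNullOrEmpty(value))
            return value;

        return value[0] == ByteOrderMark ? value.Substring(1) : value;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(BomHelper.StripLeadingBom("Test"));       // prints "Test"
        Console.WriteLine(BomHelper.StripLeadingBom("\uFEFFTest")); // prints "Test"
    }
}
```

Unlike the Trim call in the question, this only touches the start of the string and removes at most one character.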
Oct 21 '13 at


