Edit: Initially, I thought this was related to .NET Framework 4.5. It turns out that it also applies to .NET Framework 4.0.
There is a change in how strings are handled on Windows Server 2012 that I'm trying to understand better. The behavior of StartsWith seems to have changed. The issue is reproducible with both .NET Framework 4.0 and 4.5.
With .NET Framework 4.5 on Windows 7, the program below prints "False, t". On Windows Server 2012, it prints "True, t".

    using System;
    using System.Text;

    internal class Program
    {
        private static void Main(string[] args)
        {
            string byteOrderMark = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
            Console.WriteLine("test".StartsWith(byteOrderMark));
            Console.WriteLine("test"[0]);
        }
    }

In other words, StartsWith(byteOrderMark) returns true regardless of the contents of the string. If you have code that tries to strip the byte order mark using the approach below, it works fine on Windows 7 but prints "est" on Windows Server 2012.

    using System;
    using System.Text;

    internal class Program
    {
        private static void Main(string[] args)
        {
            string byteOrderMark = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
            string someString = "Test";
            if (someString.StartsWith(byteOrderMark))
                someString = someString.Substring(1);
            Console.WriteLine("{0}", someString);
            Console.ReadKey();
        }
    }

I understand that you have already done something wrong if you have byte order marks in your strings, but we integrate with legacy code that has this. I know that I can solve this specific problem by doing something like the line below, but I want to understand the problem better.
someString = someString.Trim(byteOrderMark[0]);
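A more targeted variant of that workaround could look like the sketch below. It assumes the difference comes from culture-sensitive comparison: StartsWith(string) compares using the current culture, and such comparisons may treat U+FEFF as an ignorable character. Passing StringComparison.Ordinal compares the raw char values instead. BomHelper and StripBom are hypothetical names introduced for illustration, not part of the legacy code.

```csharp
using System;
using System.Text;

internal static class BomHelper
{
    // Hypothetical helper: removes one leading U+FEFF, if present.
    public static string StripBom(string value)
    {
        string byteOrderMark = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());

        // Ordinal comparison looks at UTF-16 code units only, so it does not
        // depend on the culture or on the operating system's collation rules.
        if (value.StartsWith(byteOrderMark, StringComparison.Ordinal))
        {
            return value.Substring(byteOrderMark.Length);
        }

        return value;
    }
}

internal class Program
{
    private static void Main(string[] args)
    {
        Console.WriteLine(BomHelper.StripBom("Test"));        // prints "Test"
        Console.WriteLine(BomHelper.StripBom("\ufeffTest"));  // prints "Test"
    }
}
```

Unlike Trim, this only touches a byte order mark at the start of the string and leaves the rest of the string alone.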
Hans Passant suggested using the UTF8Encoding constructor, which lets me explicitly tell it to emit the UTF-8 identifier. I tried this, but it gives the same result. The code below produces different output on Windows 7 and Windows Server 2012. On Windows 7, it prints "Result: False". On Windows Server 2012, it prints "Result: True".

    private static void Main(string[] args)
    {
        var encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
        string byteOrderMark = encoding.GetString(encoding.GetPreamble());
        Console.WriteLine("Result: " + "Hello".StartsWith(byteOrderMark));
        Console.ReadKey();
    }

I also tried the following variant, which prints False, False, False on Windows 7 but True, True, False on Windows Server 2012, which confirms that it is related to the implementation of StartsWith on Windows Server 2012.

    private static void Main(string[] args)
    {
        var encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
        string byteOrderMark = encoding.GetString(encoding.GetPreamble());
        Console.WriteLine("Hello".StartsWith(byteOrderMark));
        Console.WriteLine("Hello".StartsWith('\ufeff'.ToString()));
        Console.WriteLine("Hello"[0] == '\ufeff');
        Console.ReadKey();
    }
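
One way to narrow this down further might be to pass the comparison type explicitly, since StartsWith(string) without a StringComparison argument uses the current culture. The sketch below is a diagnostic under that assumption, not a confirmed explanation. Only the ordinal result is known in advance; the two culture-sensitive results are the ones worth comparing between machines.

```csharp
using System;

internal class Program
{
    private static void Main(string[] args)
    {
        const string bom = "\ufeff";

        // Culture-sensitive comparisons: the result may differ between OS versions.
        Console.WriteLine("Hello".StartsWith(bom, StringComparison.CurrentCulture));
        Console.WriteLine("Hello".StartsWith(bom, StringComparison.InvariantCulture));

        // Ordinal comparison: compares char values only, so 'H' never matches U+FEFF.
        Console.WriteLine("Hello".StartsWith(bom, StringComparison.Ordinal)); // prints "False"
    }
}
```

If the first two lines flip between machines while the last stays False, that would point at the culture-sensitive comparison rather than at the encoding or the string contents.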