StreamWriter and UTF-8 byte order marks

I have a problem with StreamWriter and byte order marks (BOMs). The documentation seems to indicate that the Encoding.UTF8 encoding has byte order marks enabled, but when files are written, some have the mark while others don't.

I create the StreamWriter as follows:

this.Writer = new StreamWriter( this.Stream , System.Text.Encoding.UTF8 ); 

Any ideas on what might happen would be appreciated.

+43
c# file-encodings
Mar 10 2018-11-21T00:
8 answers

As someone already pointed out, calling the StreamWriter constructor without an encoding argument does the trick. However, if you want to be explicit, try the following:

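    // StreamWriter has no (string, Encoding) overload; the path overload
    // that takes an encoding also takes an append flag.
    using (var sw = new StreamWriter("text.txt", false, new UTF8Encoding(false)))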

The key is to construct a new UTF8Encoding(false) instead of using Encoding.UTF8. The constructor argument controls whether the BOM is written or not.

This has the same effect as calling the StreamWriter constructor without an encoding argument; internally it does the same thing.
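
A minimal sketch to check this, assuming a MemoryStream as a stand-in for the file. The names BomDemo and WriteWith are made up for illustration and are not part of the original answer:

```csharp
using System;
using System.IO;
using System.Text;

static class BomDemo
{
    // Writes "a" through a StreamWriter and returns the raw bytes that reached the stream.
    // Passing null selects the StreamWriter(Stream) constructor (no encoding argument).
    public static byte[] WriteWith(Encoding encoding)
    {
        var stream = new MemoryStream();
        using (var writer = encoding == null
            ? new StreamWriter(stream)
            : new StreamWriter(stream, encoding))
        {
            writer.Write('a');
        }
        // ToArray still works after the writer has closed the MemoryStream.
        return stream.ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(WriteWith(null)));
        Console.WriteLine(BitConverter.ToString(WriteWith(new UTF8Encoding(false))));
        Console.WriteLine(BitConverter.ToString(WriteWith(Encoding.UTF8)));
    }
}
```

The first two lines should show only the byte for "a" (61), while the third should start with the BOM bytes EF-BB-BF.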

+61
Jul 25 '12 at 17:14

The only time I have seen the constructor not add the UTF-8 BOM is when the stream is not at position 0 when you call it. For example, in the code below, the BOM is not written:

    using (var s = File.Create("test2.txt"))
    {
        s.WriteByte(32);
        using (var sw = new StreamWriter(s, Encoding.UTF8))
        {
            sw.WriteLine("hello, world");
        }
    }

As others have said, if you use the StreamWriter(stream) constructor without specifying an encoding, you will not see the BOM.
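
The position effect can also be checked in memory. This is a sketch under the assumption that a MemoryStream behaves like the file stream above in this respect (both are seekable); PositionDemo and WriteAfterPrefix are hypothetical names:

```csharp
using System;
using System.IO;
using System.Text;

static class PositionDemo
{
    // Returns the bytes produced when a StreamWriter using Encoding.UTF8 is created
    // on a seekable stream that already holds prefixLength space bytes.
    public static byte[] WriteAfterPrefix(int prefixLength)
    {
        var stream = new MemoryStream();
        for (int i = 0; i < prefixLength; i++)
        {
            stream.WriteByte(32);
        }
        using (var writer = new StreamWriter(stream, Encoding.UTF8))
        {
            writer.Write('a');
        }
        return stream.ToArray();
    }

    public static void Main()
    {
        // Stream at position 0 when the writer is created: BOM expected.
        Console.WriteLine(BitConverter.ToString(WriteAfterPrefix(0)));
        // Stream at position 1 when the writer is created: no BOM expected.
        Console.WriteLine(BitConverter.ToString(WriteAfterPrefix(1)));
    }
}
```

With an empty stream the output should begin with EF-BB-BF; with one byte already written it should be just 20-61.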

+11
Mar 10 '11 at 9:45 a.m.

The problem is that you are using the static UTF8 property in the Encoding class.

When the GetPreamble method is called on the Encoding instance returned by the UTF8 property, it returns the byte order mark (a byte array of three bytes), which is written to the stream before any other content (assuming a new stream).

You can avoid this by creating an instance of the UTF8Encoding class, for example:

    // As before.
    this.Writer = new StreamWriter(this.Stream,
        // Create the encoding yourself; the parameterless constructor
        // prevents the BOM from being written.
        new System.Text.UTF8Encoding());

According to the documentation for the parameterless constructor (emphasis mine):

This constructor creates an instance that does not provide a Unicode byte order mark and does not throw an exception when an incorrect encoding is detected.

This means that the GetPreamble call will return an empty array, and therefore no BOM will be written to the underlying stream.
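
A short sketch that compares the preamble lengths directly. PreambleDemo and PreambleLength are illustrative names, not part of the original answer:

```csharp
using System;
using System.Text;

static class PreambleDemo
{
    // Number of bytes the encoding asks a writer to emit before any content.
    public static int PreambleLength(Encoding encoding)
    {
        return encoding.GetPreamble().Length;
    }

    public static void Main()
    {
        Console.WriteLine(PreambleLength(Encoding.UTF8));           // static property
        Console.WriteLine(PreambleLength(new UTF8Encoding()));      // parameterless constructor
        Console.WriteLine(PreambleLength(new UTF8Encoding(false))); // BOM explicitly off
        Console.WriteLine(PreambleLength(new UTF8Encoding(true)));  // BOM explicitly on
    }
}
```

The static property and UTF8Encoding(true) should report a three-byte preamble, and the other two an empty one.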

+11
Mar 23 '13 at 5:27

My answer builds on HelloSam's, which contains all the necessary information. However, I believe what the OP is asking is how to make sure the BOM is written to the file.

So instead of passing false to the UTF8Encoding constructor, you need to pass true.

    // The path overload that takes an encoding also takes an append flag.
    using (var sw = new StreamWriter("text.txt", false, new UTF8Encoding(true)))

Try the code below, open the resulting files in a hex editor, and see which one contains the BOM and which does not.

    using System.IO;
    using System.Text;

    class Program
    {
        static void Main(string[] args)
        {
            // File written without a BOM.
            const string nobomtxt = "nobom.txt";
            File.Delete(nobomtxt);
            using (Stream stream = File.OpenWrite(nobomtxt))
            using (var writer = new StreamWriter(stream, new UTF8Encoding(false)))
            {
                writer.WriteLine("Hello");
            }

            // File written with a BOM.
            const string bomtxt = "bom.txt";
            File.Delete(bomtxt);
            using (Stream stream = File.OpenWrite(bomtxt))
            using (var writer = new StreamWriter(stream, new UTF8Encoding(true)))
            {
                writer.WriteLine("Hello");
            }
        }
    }
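
If a hex editor is not at hand, the same check can be done in code. This is only a sketch: BomCheck and HasUtf8Bom are hypothetical names, and Main assumes nobom.txt and bom.txt were already created by the program above in the current directory:

```csharp
using System;
using System.IO;
using System.Text;

static class BomCheck
{
    // True when the file starts with the UTF-8 BOM bytes EF BB BF.
    public static bool HasUtf8Bom(string path)
    {
        byte[] bom = new UTF8Encoding(true).GetPreamble();
        byte[] head = new byte[bom.Length];
        int read = 0;
        using (var stream = File.OpenRead(path))
        {
            // Read may return fewer bytes than requested, so loop until full or end of file.
            int n;
            while (read < head.Length &&
                   (n = stream.Read(head, read, head.Length - read)) > 0)
            {
                read += n;
            }
        }
        if (read < bom.Length)
        {
            return false; // file is shorter than a BOM
        }
        for (int i = 0; i < bom.Length; i++)
        {
            if (head[i] != bom[i])
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(HasUtf8Bom("nobom.txt"));
        Console.WriteLine(HasUtf8Bom("bom.txt"));
    }
}
```
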
+8
Mar 19 '14 at 19:31

Do you use the same StreamWriter constructor for each file? The documentation states:

To create a StreamWriter using UTF-8 encoding and a BOM, consider using a constructor that specifies the encoding, such as StreamWriter(String, Boolean, Encoding).

I was in a similar situation a while ago. In the end, I used the Stream.Write method instead of StreamWriter and wrote the result of Encoding.GetPreamble() before writing Encoding.GetBytes(stringToWrite).

+5
Mar 10 '11 at 21:40

It seems that if the file already exists and does not contain a BOM, it will not contain one after being overwritten. In other words, StreamWriter appears to preserve the BOM (or its absence) when overwriting the file.

+2
Jun 23 '11 at 1:59

I found this answer useful (thanks @Philipp Grathwohl and @Nik), but in my case I use a FileStream to do the job, so the code that writes the BOM looks like this:

    using (FileStream vStream = File.Create(pfilePath))
    {
        // Creates the UTF-8 encoding with parameter "encoderShouldEmitUTF8Identifier" set to true
        Encoding vUTF8Encoding = new UTF8Encoding(true);
        // Gets the preamble in order to attach the BOM
        var vPreambleByte = vUTF8Encoding.GetPreamble();
        // Writes the preamble first
        vStream.Write(vPreambleByte, 0, vPreambleByte.Length);
        // Gets the bytes from text
        byte[] vByteData = vUTF8Encoding.GetBytes(pTextToSaveToFile);
        vStream.Write(vByteData, 0, vByteData.Length);
        vStream.Close();
    }
+2
Dec 02 '14 at 14:28

Could you show a situation where it does not produce the BOM? The only case I can find where the preamble is missing is when nothing was ever written to the writer (Jim Michel seems to have found another case, which is logical and most likely your problem; see his answer).

My test code is:

    var stream = new MemoryStream();
    using (var writer = new StreamWriter(stream, System.Text.Encoding.UTF8))
    {
        writer.Write('a');
    }
    Console.WriteLine(stream.ToArray()
        .Select(b => b.ToString("X2"))
        .Aggregate((i, a) => i + " " + a));
0
Mar 10 '11 at 9:45 a.m.