Preserving text file encoding (ASCII, UTF-8, UTF-16)

I have a simple text file processing tool written in C#; the skeleton looks like this:

using (StreamReader reader = new StreamReader(absFileName, true)) // auto detect encoding
using (StreamWriter writer = new StreamWriter(tmpFileName, false, reader.CurrentEncoding)) // open writer with the same encoding as reader
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // do something with line
        writer.WriteLine(line);
    }
}

Most of the files it runs on are ASCII, with a UTF-16 file here and there. I want to preserve the encoding: the newly created file should have the same encoding as the file being read, so I open the StreamWriter with the reader's CurrentEncoding.

My problem is that some of the UTF-16 files do not have a preamble, and right after the StreamReader is opened its CurrentEncoding is set to UTF-8, which leads to the writer being opened in UTF-8 mode. When debugging, I see that the reader changes its CurrentEncoding property to UTF-16 after the first call to ReadLine, but by then the writer is already open.

I can come up with several workarounds (opening the writer later, or going through the source file twice, the first pass only to detect the encoding), but I thought I'd ask the experts first. Please note that I am not interested in the code pages of ASCII files; I only care about distinguishing the ASCII / UTF-8 / UTF-16 encodings.

1 answer

I would try calling reader.Peek() before opening the writer. That should be enough in your case, I think.
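A minimal sketch of that suggestion, assuming the skeleton from the question. The class and method names (EncodingPreservingCopy, CopyPreservingEncoding) are hypothetical and exist only for illustration. The point is the ordering: Peek() forces the reader to fill its buffer, which is when encoding detection runs, so CurrentEncoding is read only after that call and before the writer is constructed.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of the original post.
static class EncodingPreservingCopy
{
    // Copies sourcePath to targetPath line by line and returns the encoding
    // that the reader reported after its first buffer fill.
    public static Encoding CopyPreservingEncoding(string sourcePath, string targetPath)
    {
        using (var reader = new StreamReader(sourcePath, true)) // auto detect encoding
        {
            // Peek() does not consume a character, but it makes the reader
            // load its first buffer, which is when detection takes place.
            reader.Peek();
            Encoding detected = reader.CurrentEncoding;

            // Only now open the writer, using the detected encoding.
            using (var writer = new StreamWriter(targetPath, false, detected))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    // do something with line
                    writer.WriteLine(line);
                }
            }

            return detected;
        }
    }
}
```

Keep in mind that the auto-detection requested by the `true` argument is documented as looking at byte order marks, so whether this helps for files with no preamble at all is something to verify on your own data.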


Source: https://habr.com/ru/post/1301807/

