I have a simple text file processing tool written in C#. The skeleton looks like this:
    using (StreamReader reader = new StreamReader(absFileName, true)) // auto detect encoding
    using (StreamWriter writer = new StreamWriter(tmpFileName, false, reader.CurrentEncoding)) // open writer with the same encoding as reader
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // do something with line
            writer.WriteLine(line);
        }
    }
Most of the files it runs on are ASCII, with a UTF-16 file here and there. I want to preserve the encoding: the newly created file should have the same encoding as the file that was read, so I open the StreamWriter with the reader's CurrentEncoding.
My problem is that some of the UTF-16 files lack a preamble (byte order mark), and right after the StreamReader is opened its CurrentEncoding is set to UTF-8, which leads to the writer being opened in UTF-8 mode. When debugging, I can see that the reader changes its CurrentEncoding property to UTF-16 after the first call to ReadLine, but by then the writer is already open.
I can come up with several workarounds (opening the writer later, or opening the source file twice, the first time only to detect the encoding), but I thought I'd ask the experts first. Please note that I am not concerned with the code pages of ASCII files; I only care about the ASCII / UTF-8 / UTF-16 encodings.
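For reference, the "open the writer later" workaround could be sketched roughly as follows. The class name EncodingPreservingCopy and the method Process are hypothetical names used only for illustration. The sketch assumes that calling Peek before creating the writer makes the reader fill its buffer and run its detection without consuming any characters. Whether that detection succeeds for the files without a preamble is not verified here.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
static class EncodingPreservingCopy
{
    // Copies srcPath to dstPath line by line and returns the encoding
    // that the reader reported after its first buffer fill.
    public static Encoding Process(string srcPath, string dstPath)
    {
        using (StreamReader reader = new StreamReader(srcPath, true)) // auto detect encoding
        {
            // Assumption: Peek triggers the first buffer read, which is when
            // CurrentEncoding is updated; it does not advance the reader.
            reader.Peek();
            Encoding encoding = reader.CurrentEncoding;

            // The writer is created only now, with the detected encoding.
            using (StreamWriter writer = new StreamWriter(dstPath, false, encoding))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    // do something with line
                    writer.WriteLine(line);
                }
            }

            return encoding;
        }
    }
}
```

This keeps a single pass over the source file. The cost is that the two using statements can no longer be stacked, because the writer depends on a value that is only available after the reader has touched the stream.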