How to use ReadAllText with unknown file encoding

I'm reading a file with ReadAllText:

    String[] values = File.ReadAllText(@"c:\c\file.txt").Split(';');
    int i = 0;
    foreach (String s in values)
    {
        System.Console.WriteLine("output: {0} {1} ", i, s);
        i++;
    }

With some files I get the wrong characters (for ÖÜÄÀ ...). They come out as "?" because of an encoding problem:

    output: 0 TEST
    output: 1 A??O?

One solution would be to pass the encoding to ReadAllText, something like ReadAllText(@"c:\c\file.txt", Encoding.UTF8), which could fix the problem. But what if I still get "?" in the output? What if I don't know the encoding of the file? And what if each file has a different encoding? What would be the best way to do this in C#? Thank you.

+6
2 answers

The only way to do this reliably is to look for a byte order mark (BOM) at the start of the text file. (The mark primarily indicates the byte order, i.e. the endianness, of the character encoding, but it also identifies the encoding itself, e.g. UTF-8, UTF-16, UTF-32.) Unfortunately, this method only works for Unicode-based encodings, and only when the file actually starts with a BOM; for anything older you need much less reliable methods.
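To make the byte order marks concrete, here is a small sketch that prints the preamble bytes .NET associates with each Unicode encoding. The helper name PreambleHex is made up for illustration, and the snippet assumes C# 9 or later so it can run as top-level statements.

```csharp
using System;
using System.Text;

// Hypothetical helper: format an encoding's byte order mark as hex.
static string PreambleHex(Encoding encoding)
{
    // GetPreamble returns the BOM bytes this encoding instance would write
    return BitConverter.ToString(encoding.GetPreamble());
}

Console.WriteLine("UTF-8:     " + PreambleHex(Encoding.UTF8));                 // EF-BB-BF
Console.WriteLine("UTF-16 LE: " + PreambleHex(Encoding.Unicode));              // FF-FE
Console.WriteLine("UTF-16 BE: " + PreambleHex(Encoding.BigEndianUnicode));     // FE-FF
Console.WriteLine("UTF-32 LE: " + PreambleHex(Encoding.UTF32));                // FF-FE-00-00
Console.WriteLine("UTF-32 BE: " + PreambleHex(new UTF32Encoding(true, true))); // 00-00-FE-FF
```

Note that the UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark, so a hand-written check has to test the longer mark first.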

The StreamReader type can detect byte order marks to determine the encoding. You just need to pass true for the detectEncodingFromByteOrderMarks parameter:

 new System.IO.StreamReader("path", true) 

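You can then check the value of streamReader.CurrentEncoding to determine the encoding used by the file. Note that detection only happens on the first read, so check the property after reading. Note also that if no byte order mark exists, CurrentEncoding stays at the encoding the reader started with, which is UTF-8 for this constructor rather than Encoding.Default.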
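A minimal sketch of that approach, assuming C# 9 or later (top-level statements). The helper name ReadWithBomDetection and the temporary test file are made up for illustration; only StreamReader, CurrentEncoding and the File methods come from the framework.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: read a whole file and report the encoding StreamReader chose.
static string ReadWithBomDetection(string filePath, out Encoding detected)
{
    using (var reader = new StreamReader(filePath, true))
    {
        string text = reader.ReadToEnd();   // BOM detection runs on the first read
        detected = reader.CurrentEncoding;  // so query the property afterwards
        return text;
    }
}

// Usage: write a UTF-16 file (File.WriteAllText emits the FF FE mark), then read it back.
string tempPath = Path.GetTempFileName();
File.WriteAllText(tempPath, "TEST;\u00C4\u00D6\u00DC", Encoding.Unicode); // "TEST;ÄÖÜ"
string content = ReadWithBomDetection(tempPath, out Encoding usedEncoding);
Console.WriteLine("{0}: {1}", usedEncoding.WebName, content);
File.Delete(tempPath);
```

For a file without a byte order mark this still decodes as UTF-8, so legacy single-byte files would need a different strategy.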

See also the CodeProject solution for detecting a file's encoding.

+6

First you need to check the encoding of the file. Try this:

    System.Text.Encoding enc = null;
    // The stream is left open so it can be read after detection; dispose it when done.
    System.IO.FileStream file = new System.IO.FileStream(filePath,
        System.IO.FileMode.Open, System.IO.FileAccess.Read, System.IO.FileShare.Read);
    if (file.CanSeek)
    {
        byte[] bom = new byte[4]; // Get the byte-order mark, if there is one
        int n = file.Read(bom, 0, 4); // n may be less than 4 for very short files

        // The 4-byte utf-32 marks are tested before the 2-byte utf-16 marks,
        // because the utf-32 little-endian mark also starts with FF FE.
        if (n >= 3 && bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf)
            enc = System.Text.Encoding.UTF8;                 // utf-8
        else if (n >= 4 && bom[0] == 0xff && bom[1] == 0xfe && bom[2] == 0 && bom[3] == 0)
            enc = System.Text.Encoding.UTF32;                // utf-32 (ucs-4), little-endian
        else if (n >= 4 && bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff)
            enc = new System.Text.UTF32Encoding(true, true); // utf-32 (ucs-4), big-endian
        else if (n >= 2 && bom[0] == 0xff && bom[1] == 0xfe)
            enc = System.Text.Encoding.Unicode;              // utf-16 (ucs-2), little-endian
        else if (n >= 2 && bom[0] == 0xfe && bom[1] == 0xff)
            enc = System.Text.Encoding.BigEndianUnicode;     // utf-16 (ucs-2), big-endian
        else
            enc = System.Text.Encoding.ASCII;                // no mark; bytes above 0x7F decode as "?"

        // Now reposition the file cursor back to the start of the file
        file.Seek(0, System.IO.SeekOrigin.Begin);
    }
    else
    {
        // The file cannot be randomly accessed, so you need to decide what to set the default to
        // based on the data provided. If you're expecting data from a lot of older applications,
        // default your encoding to Encoding.ASCII. If you're expecting data from a lot of newer
        // applications, default your encoding to Encoding.Unicode. Binary files are single-byte
        // based, so use Encoding.ASCII for them, even though you'll probably never need the
        // encoding then, since the Encoding classes are really meant to get strings from the
        // byte array that is the file.
        enc = System.Text.Encoding.ASCII;
    }
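The snippet above falls back to Encoding.ASCII when no byte order mark is found, which is exactly the case that produces "?" for characters such as Ä or Ö. One of the less reliable heuristics for such files is to try strict UTF-8 first and fall back to a legacy single-byte encoding only when the bytes are not valid UTF-8. The sketch below assumes C# 9 or later (top-level statements); DecodeWithFallback is a made-up helper, and ISO-8859-1 as the fallback is an assumption that may not match your files.

```csharp
using System;
using System.Text;

// Hypothetical helper: decode as strict UTF-8, otherwise use the given fallback encoding.
static string DecodeWithFallback(byte[] bytes, Encoding fallback)
{
    // Second argument (throwOnInvalidBytes) makes invalid sequences throw
    // instead of being silently replaced.
    var strictUtf8 = new UTF8Encoding(false, true);
    try
    {
        return strictUtf8.GetString(bytes);
    }
    catch (DecoderFallbackException)
    {
        // Not valid UTF-8, so treat the bytes as the assumed legacy encoding.
        return fallback.GetString(bytes);
    }
}

// Assumption: BOM-less non-UTF-8 files are ISO-8859-1 (Latin-1).
Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
byte[] legacyBytes = { 0x41, 0xC4, 0xD6 }; // "AÄÖ" in Latin-1, not valid UTF-8
byte[] utf8Bytes = { 0x41, 0xC3, 0x84 };   // "AÄ" in UTF-8
string fromLegacy = DecodeWithFallback(legacyBytes, latin1);
string fromUtf8 = DecodeWithFallback(utf8Bytes, latin1);
Console.WriteLine(fromLegacy);
Console.WriteLine(fromUtf8);
```

For a real file you would pass File.ReadAllBytes(path) as the first argument. The heuristic cannot tell one legacy code page from another, so the fallback has to be chosen from what you know about where the files come from.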
+5

Source: https://habr.com/ru/post/916658/

