Read/Output UTF8 in C#

I'm trying to do something that, in my opinion, should be fairly simple, but I have spent far too much time on it and tried several different approaches, all to no avail.

Basically, I have a huge list of names that contain "special" UTF8-encoded characters.

My ultimate goal is to read each name and then make an HTTP request using that name in the URL as a GET variable.

My first goal was to read one name from a file and write it to standard output, to confirm that I could read and write UTF8 correctly before building the strings and making all the HTTP requests.

The test1.txt file I made contains only this content:

Ownage

Then I used this C# code to read the file. I set the encoding of both the StreamReader and Console.OutputEncoding to UTF8.

    static void Main(string[] args)
    {
        Console.OutputEncoding = System.Text.Encoding.UTF8;
        using (StreamReader reader = new StreamReader("test1.txt", System.Text.Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                Console.WriteLine(line);
            }
        }
        Console.ReadLine();
    }

To my surprise, the console output does not match what I expected, which is the original contents of the file.

How can I be sure that the strings I am going to build for the HTTP requests will be correct if I cannot even perform a simple task like reading and writing UTF8 strings?

+6
3 answers

Your program is fine (assuming the input file is actually UTF-8). If you debug your program and use the "Watch" window to look at the strings (the line variable), you will find that they are correct. So you can be sure that you will send the correct HTTP requests (or whatever else you do with the strings).
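If you would rather check this without the debugger, one option is to print the numeric value of each character instead of the character itself, so the result does not depend on the console font. This is only a minimal sketch: the Utf8Check class and the CodeUnits helper are names I made up for illustration, and test1.txt is the file from the question.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static class Utf8Check
{
    // Formats each UTF-16 code unit of a string as U+XXXX so the content
    // can be inspected without depending on how the console renders glyphs.
    public static string CodeUnits(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    static void Main(string[] args)
    {
        // Defaults to the file name used in the question.
        string path = args.Length > 0 ? args[0] : "test1.txt";
        if (!File.Exists(path))
        {
            Console.WriteLine("File not found: " + path);
            return;
        }

        foreach (string line in File.ReadLines(path, Encoding.UTF8))
        {
            Console.WriteLine(CodeUnits(line));
        }
    }
}
```

If the values printed match the characters you expect (for example U+00E9 for é), the string was read correctly and only the console display is off.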

What you see is a bug in the Windows console.

Fortunately, this only affects bitmap fonts. If you change the console window to use a TrueType font, e.g. Consolas or Lucida Console, the problem goes away.


You can set this for all future windows using the "Defaults" menu item.
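Once the strings are known to be correct, the remaining step from the question is putting each name into the URL as a GET variable. A rough sketch of that, assuming the server expects UTF-8 percent-encoding: the RequestUrl class, the example.com address and the "name" parameter are placeholders I invented, while Uri.EscapeDataString is the standard framework method.

```csharp
using System;

static class RequestUrl
{
    // Builds a GET URL with the name percent-encoded as UTF-8.
    // The base URL and the "name" parameter are hypothetical placeholders.
    public static string Build(string name)
    {
        return "http://example.com/lookup?name=" + Uri.EscapeDataString(name);
    }

    static void Main()
    {
        Console.WriteLine(Build("Zoë"));
        // prints http://example.com/lookup?name=Zo%C3%AB
    }
}
```

Escaping only the value, rather than the whole URL, keeps the separators such as ? and = intact.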

+6

See Reading Unicode from the Console

If you are using .NET 4, you will need to use

    Console.InputEncoding = Encoding.Unicode;
    Console.OutputEncoding = Encoding.Unicode;

and make sure you use Lucida Console as the console font.

If you are using .NET 3.5, you are probably out of luck.

To read lines from a file efficiently, I would probably use:

    foreach (var line in File.ReadAllLines(path, Encoding.UTF8))
    {
        // do stuff
    }
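Since the question mentions a huge list, note that File.ReadAllLines loads the whole file into an array first. On .NET 4 and later, File.ReadLines enumerates the lines lazily instead. A small sketch of that variant, where the LazyLines class and the CountNames helper are made-up names used only to show the loop:

```csharp
using System;
using System.IO;
using System.Text;

static class LazyLines
{
    // Counts non-empty lines, reading the file one line at a time
    // instead of loading every line into an array up front.
    public static int CountNames(string path)
    {
        int count = 0;
        foreach (string line in File.ReadLines(path, Encoding.UTF8))
        {
            if (line.Length > 0)
            {
                count++;
            }
        }
        return count;
    }

    static void Main(string[] args)
    {
        // Defaults to the file name used in the question.
        string path = args.Length > 0 ? args[0] : "test1.txt";
        if (!File.Exists(path))
        {
            Console.WriteLine("File not found: " + path);
            return;
        }
        Console.WriteLine(CountNames(path));
    }
}
```

In the real program the body of the loop would build and send the request for each name rather than count it.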
+3

To read all the characters you mention, you should use the default encoding, for example:

    new StreamReader(@"E:\database.txt", System.Text.Encoding.Default)
+1

Source: https://habr.com/ru/post/910092/

