UTF-8 compression and encoding

Can someone tell me why I am losing information with this process? Some UTF-8 characters are not decoded, for example "Biography":"\u003clink type=... or Steve Blunt \u0026 Marty Kelley, but others are: "Name":"朱敬

    // Create a Base64 string containing the gzip-compressed data
    string bar;
    using (MemoryStream ms = new MemoryStream())
    {
        using (GZipStream gzip = new GZipStream(ms, CompressionMode.Compress))
        using (StreamWriter writer = new StreamWriter(gzip, System.Text.Encoding.UTF8))
        {
            writer.Write(s);
        }
        ms.Flush();
        bar = Convert.ToBase64String(ms.ToArray());
    }

    // Reading it
    string foo;
    byte[] itemData = Convert.FromBase64String(bar);
    using (MemoryStream src = new MemoryStream(itemData))
    using (GZipStream gzs = new GZipStream(src, CompressionMode.Decompress))
    using (MemoryStream dest = new MemoryStream(itemData.Length * 2))
    {
        gzs.CopyTo(dest);
        foo = Encoding.UTF8.GetString(dest.ToArray());
    }
    Console.WriteLine(foo);
2 answers

This may be because you write the string with StreamWriter but read it back with CopyTo() and Encoding.GetString(), so the two sides do not handle the bytes in the same way.

What happens if you try this?

    // Reading it
    string foo;
    byte[] itemData = Convert.FromBase64String(bar);
    using (MemoryStream src = new MemoryStream(itemData))
    using (GZipStream gzs = new GZipStream(src, CompressionMode.Decompress))
    using (StreamReader reader = new StreamReader(gzs, Encoding.UTF8))
    {
        // ReadToEnd() rather than ReadLine(), so multi-line input is not cut off at the first line break
        foo = reader.ReadToEnd();
    }
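One concrete difference between the two read paths can be sketched as follows. A StreamWriter constructed with Encoding.UTF8 emits a byte order mark (BOM) at the start of the stream; Encoding.UTF8.GetString() keeps that mark as a leading U+FEFF character, while a StreamReader skips it. The class and method names below (BomDemo, WriteWithStreamWriter and so on) are made up for this illustration, and gzip is left out to keep the sketch short. This shows only the writer/reader mismatch; it is not necessarily the cause of the escaped sequences in the question.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Writes the string the same way the question does (minus gzip):
    // StreamWriter + Encoding.UTF8, which prepends the UTF-8 BOM (EF BB BF).
    public static byte[] WriteWithStreamWriter(string s)
    {
        using (var ms = new MemoryStream())
        {
            using (var writer = new StreamWriter(ms, Encoding.UTF8))
            {
                writer.Write(s);
            }
            // ToArray() is still valid after the writer has closed the stream.
            return ms.ToArray();
        }
    }

    // Decodes the raw bytes directly; the BOM survives as U+FEFF.
    public static string ReadWithGetString(byte[] data)
    {
        return Encoding.UTF8.GetString(data);
    }

    // Decodes through a StreamReader, which recognises and skips the BOM.
    public static string ReadWithStreamReader(byte[] data)
    {
        using (var reader = new StreamReader(new MemoryStream(data), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        byte[] data = WriteWithStreamWriter("朱敬");
        string viaGetString = ReadWithGetString(data);
        string viaReader = ReadWithStreamReader(data);

        Console.WriteLine(viaGetString.Length);            // 3: BOM + two characters
        Console.WriteLine(viaReader.Length);               // 2
        Console.WriteLine(viaGetString[0] == '\uFEFF');    // True
    }
}
```

If the writer side has to stay as it is, reading with a StreamReader (or writing with new UTF8Encoding(false)) keeps both sides symmetric.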

Although I think you should use BinaryReader and BinaryWriter:

    string s = "Biography:\u003clink type...";
    string bar;
    using (MemoryStream ms = new MemoryStream())
    {
        using (GZipStream gzip = new GZipStream(ms, CompressionMode.Compress))
        using (var writer = new BinaryWriter(gzip, Encoding.UTF8))
        {
            writer.Write(s);
        }
        ms.Flush();
        bar = Convert.ToBase64String(ms.ToArray());
    }

    // Reading it
    string foo;
    byte[] itemData = Convert.FromBase64String(bar);
    using (MemoryStream src = new MemoryStream(itemData))
    using (GZipStream gzs = new GZipStream(src, CompressionMode.Decompress))
    using (var reader = new BinaryReader(gzs, Encoding.UTF8))
    {
        foo = reader.ReadString();
    }
    Console.WriteLine(foo);

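The problem was that the characters were already escaped in the original string: sequences such as \u003c and \u0026 were literal text in the input, so the compression round trip returned them unchanged and nothing was actually lost.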
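A minimal sketch of that conclusion, assuming the input is JSON text and that System.Text.Json is available (.NET Core 3.0 or later). The names EscapeRoundTrip, Pack and Unpack are invented for this example and are not from the original post. The round trip returns the text byte for byte, and the \u0026 sequence only turns into & once the text is parsed as JSON.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Text.Json;

public static class EscapeRoundTrip
{
    // Gzip-compresses the UTF-8 bytes of s and returns them as Base64.
    public static string Pack(string s)
    {
        using (var ms = new MemoryStream())
        {
            using (var gzip = new GZipStream(ms, CompressionMode.Compress))
            {
                byte[] bytes = Encoding.UTF8.GetBytes(s);
                gzip.Write(bytes, 0, bytes.Length);
            }
            return Convert.ToBase64String(ms.ToArray());
        }
    }

    // Reverses Pack: Base64 -> gzip -> UTF-8 string.
    public static string Unpack(string base64)
    {
        using (var src = new MemoryStream(Convert.FromBase64String(base64)))
        using (var gzs = new GZipStream(src, CompressionMode.Decompress))
        using (var dest = new MemoryStream())
        {
            gzs.CopyTo(dest);
            return Encoding.UTF8.GetString(dest.ToArray());
        }
    }

    public static void Main()
    {
        // Verbatim string: the backslash-u sequence is six literal characters,
        // as it would be in JSON text received from a service.
        string json = @"{""Name"":""Steve Blunt \u0026 Marty Kelley""}";

        string restored = Unpack(Pack(json));
        Console.WriteLine(restored == json);   // True: the round trip is lossless

        // The escape is resolved by a JSON parser, not by the UTF-8 decoder.
        using (JsonDocument doc = JsonDocument.Parse(restored))
        {
            Console.WriteLine(doc.RootElement.GetProperty("Name").GetString());
            // Steve Blunt & Marty Kelley
        }
    }
}
```

Pack here encodes with Encoding.UTF8.GetBytes() rather than a StreamWriter, so no byte order mark is written and the restored string compares equal to the input.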

Ps: Credit goes to rik for this answer :)

Edit: I also had a problem with StreamReader, as matthew-watson pointed out.


Source: https://habr.com/ru/post/969945/

