C# string compression and Python decompression

I'm trying to compress a large string in a C# (.NET 4) client program and send it to a server (Django, Python 2.7) with a PUT request. Ideally I want to use the standard library on both ends, so I'm trying gzip.

My C# code:

    public static string Compress(string s)
    {
        var bytes = Encoding.Unicode.GetBytes(s);
        using (var msi = new MemoryStream(bytes))
        using (var mso = new MemoryStream())
        {
            using (var gs = new GZipStream(mso, CompressionMode.Compress))
            {
                msi.CopyTo(gs);
            }
            return Convert.ToBase64String(mso.ToArray());
        }
    }

Python code:

    s = base64.standard_b64decode(request)
    buff = cStringIO.StringIO(s)
    with gzip.GzipFile(fileobj=buff) as gz:
        decompressed_data = gz.read()

It almost works, but the output is {▯ "▯c▯h▯a▯n▯g▯e▯d▯" ▯} when it should be {"changed"}, i.e. every other character is something strange. If I take every other character with decompressed_data[::2] it works, but that's a bit of a hack, and obviously something else is wrong.

I'm also wondering whether I need to base64-encode it at all for a PUT request. Is that only necessary for POST?

2 answers

I think the main problem is that C# strings are UTF-16, and Encoding.Unicode writes them out as UTF-16 bytes. That would produce exactly the symptom you describe. As with any encoding problem, a little luck may be needed, but I think you can solve it with:

 decompressed_data = gz.read().decode('utf-16') 

After that, decompressed_data should be a unicode object, and you can treat it as such from there on.
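To see where the stray characters come from, here is a small sketch that mimics the C# side in Python. It assumes Encoding.Unicode.GetBytes produces UTF-16 little-endian bytes without a byte-order mark; the sample payload is the one from the question, and the variable names are made up for illustration.

```python
from __future__ import print_function

# Assumption: these are the bytes Encoding.Unicode.GetBytes would produce
# for the sample string (UTF-16 little-endian, no byte-order mark).
raw = u'{"changed"}'.encode('utf-16-le')

# Each ASCII character is followed by a zero byte.
print(repr(raw[:6]))

# The [::2] workaround drops the zero bytes, but it only holds up
# while the text is pure ASCII.
hacked = raw[::2]

# Decoding handles any character, not just ASCII.
text = raw.decode('utf-16-le')
print(text)
```

As far as I know, Python's plain 'utf-16' codec falls back to the machine's native byte order when there is no byte-order mark, so naming 'utf-16-le' explicitly is the safer choice if the server might not be little-endian.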

UPDATE: This worked for me:

C#

    static void Main(string[] args)
    {
        FileStream f = new FileStream("test", FileMode.CreateNew);
        using (StreamWriter w = new StreamWriter(f))
        {
            w.Write(Compress("hello"));
        }
    }

    public static string Compress(string s)
    {
        var bytes = Encoding.Unicode.GetBytes(s);
        using (var msi = new MemoryStream(bytes))
        using (var mso = new MemoryStream())
        {
            using (var gs = new GZipStream(mso, CompressionMode.Compress))
            {
                msi.CopyTo(gs);
            }
            return Convert.ToBase64String(mso.ToArray());
        }
    }

Python

    import base64
    import cStringIO
    import gzip

    f = open('test', 'rb')
    s = base64.standard_b64decode(f.read())
    buff = cStringIO.StringIO(s)
    with gzip.GzipFile(fileobj=buff) as gz:
        decompressed_data = gz.read()
    print decompressed_data.decode('utf-16')

Without decode('utf-16'), the console output has a NUL byte after each character (it may show up as blanks or not at all):

 >>>hello 

With it, the output is correct:

 >>>hello 

Good luck, hope this helps!


It almost works, but the output is {▯ "▯c▯h▯a▯n▯g▯e▯d▯" ▯} when it should be {"changed"}

This is because you are using Encoding.Unicode to convert a string to bytes to begin with.

If you can tell Python which encoding to use, do that. Otherwise, you need to use an encoding on the C# side that matches what Python expects.

If you can specify it on both sides, I would suggest UTF-8 rather than UTF-16. Even though you're compressing, it won't hurt to make the data half the size (in many cases) to start with :)
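As a rough illustration of the UTF-8 suggestion, here is a sketch of the round trip in Python. The compress_like_client function is a hypothetical stand-in for the C# Compress method, not its actual code; on the C# side the corresponding change would be swapping Encoding.Unicode for Encoding.UTF8. It uses io.BytesIO so it runs on both Python 2.7 and Python 3.

```python
from __future__ import print_function
import base64
import gzip
import io


def compress_like_client(text, encoding):
    """Hypothetical stand-in for the C# Compress method: encode, gzip, base64."""
    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode='wb') as gz:
        gz.write(text.encode(encoding))
    return base64.standard_b64encode(out.getvalue())


def decompress_on_server(payload, encoding):
    """Reverse of the above: base64-decode, gunzip, decode to text."""
    buff = io.BytesIO(base64.standard_b64decode(payload))
    with gzip.GzipFile(fileobj=buff) as gz:
        return gz.read().decode(encoding)


sample = u'{"changed"}'

# Size of the data before compression, in bytes, for each encoding.
utf8_len = len(sample.encode('utf-8'))
utf16_len = len(sample.encode('utf-16-le'))
print(utf8_len, utf16_len)

# Round trip with the same encoding named on both sides.
payload = compress_like_client(sample, 'utf-8')
restored = decompress_on_server(payload, 'utf-8')
print(restored)
```

The point is that both ends name the same encoding explicitly; for mostly-ASCII text such as JSON, UTF-8 input is half the size of UTF-16 before gzip even starts.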

I am also somewhat suspicious of this line:

 buff = cStringIO.StringIO(s) 

s really isn't text data: it's compressed binary data, and should be treated as such. It may be fine; it's just worth checking whether there is a better way.
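For what it's worth, in Python 2 a str is a byte string, so cStringIO.StringIO(s) does hold binary data here. If you want the binary intent to be explicit, two alternatives are sketched below; the compressed variable is a made-up stand-in for the base64-decoded request body, and the code runs on both Python 2.7 and Python 3.

```python
from __future__ import print_function
import gzip
import io
import zlib

# Build some gzip data to stand in for the base64-decoded request body.
out = io.BytesIO()
with gzip.GzipFile(fileobj=out, mode='wb') as gz:
    gz.write(u'{"changed"}'.encode('utf-8'))
compressed = out.getvalue()

# Option 1: a bytes-only buffer makes it clear this is binary data.
with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as gz:
    via_gzip = gz.read()

# Option 2: no file-like wrapper at all. Passing 16 + zlib.MAX_WBITS
# tells zlib to expect a gzip header and trailer.
via_zlib = zlib.decompress(compressed, 16 + zlib.MAX_WBITS)

print(via_gzip == via_zlib)
```

Either way the result is still bytes, so it needs the same decode step as before to become text.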


Source: https://habr.com/ru/post/1490896/

