Python Struct Unpack
I have this little problem that has been bugging me for the last hour or so.
    import struct
    string = b'-'
    t = struct.pack(">h%ds" % len(string), len(string), string)
    print(t)

The result of this pack is b'\x00\x01-'.
The problem I am facing is that I can't figure out how to unpack the result b'\x00\x01-' so that I get back just '-'. I know I can just strip the junk off the front, but it gets a little complicated. I tried to simplify it here. Hope someone can help me. :)
Normally you would not use struct.pack to put the length header and the value together. Instead, you would just do struct.pack(">h", len(data)), send that on its own (for example, in a network protocol), and then send the data. There is no need to create a new buffer.
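A minimal sketch of that send-the-header-then-the-data pattern, assuming a stream socket. Here `socket.socketpair()` stands in for a real connection, and the helper names `send_prefixed`, `recv_exact` and `recv_prefixed` are hypothetical, not part of the struct or socket modules. `recv_exact` loops because `recv()` may return fewer bytes than requested.

```python
import socket
import struct


def send_prefixed(sock, data):
    # Hypothetical helper: send a 2-byte big-endian signed length header, then the payload.
    sock.sendall(struct.pack(">h", len(data)))
    sock.sendall(data)


def recv_exact(sock, n):
    # Hypothetical helper: recv() may return fewer bytes than asked for, so keep reading.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed after %d of %d bytes" % (len(buf), n))
        buf += chunk
    return buf


def recv_prefixed(sock):
    # Read the fixed-size header first, then exactly as many bytes as it announces.
    (length,) = struct.unpack(">h", recv_exact(sock, 2))
    return recv_exact(sock, length)


# Demo over a local socket pair (a stand-in for a real network connection).
a, b = socket.socketpair()
send_prefixed(a, b"-")
print(recv_prefixed(b))  # b'-'
a.close()
b.close()
```

The receiver never needs to know the length in advance: it always reads a fixed 2-byte header, and that header tells it how much more to read.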
In your case, you can simply do:
    dataLength, = struct.unpack(">h", t[:2])
    data = t[2:2+dataLength]

But, as I said, if you have a socket-based application, for example:
    header = receive(2)  # receive() stands for whatever reads exactly n bytes from your connection
    dataLength, = struct.unpack(">h", header)
    data = receive(dataLength)

    import struct
    string = b'-'
    fmt = ">h%ds" % len(string)

Here you pack both the length and the string:
    t = struct.pack(fmt, len(string), string)
    print(repr(t))  # b'\x00\x01-'

Therefore, when you unpack, you should expect to get two values back, i.e. the length and the string:
    length, string2 = struct.unpack(fmt, t)
    print(repr(string2))  # b'-'

In general, if you do not know how the string was packed, there is no reliable way to recover the data. You just have to guess!
If you know the data consists of the length of the string followed by the string itself, you can try trial and error:
    import struct

    string = b'-'
    fmt = ">h%ds" % len(string)
    t = struct.pack(fmt, len(string), string)
    print(repr(t))

    for endian in ('>', '<'):
        for fmt, size in (('b', 1), ('B', 1), ('h', 2), ('H', 2), ('i', 4), ('I', 4),
                          ('l', 4), ('L', 4), ('q', 8), ('Q', 8)):
            fmt = endian + fmt
            try:
                # Try to read a length header of this type from the front of t.
                length, = struct.unpack(fmt, t[:size])
            except struct.error:
                pass
            else:
                # The rest of t must then be exactly `length` bytes.
                fmt = fmt + '{0}s'.format(length)
                try:
                    length, string2 = struct.unpack(fmt, t)
                except struct.error:
                    pass
                else:
                    print(fmt, length, string2)
    # >h1s 1 b'-'
    # >H1s 1 b'-'

Perhaps it is possible to construct an ambiguous string t that has several valid unpackings, leading to different values of string2. I'm not sure.
Suppose data is a large chunk of bytes and you have successfully parsed the first posn bytes. The documentation for this chunk says that the next element is a string of bytes preceded by a 16-bit signed (unlikely, but you said the format is h) big-endian integer. Here's what to do:
    nbytes, = struct.unpack('>h', data[posn:posn+2])
    posn += 2
    the_string = data[posn:posn+nbytes]
    posn += nbytes

And now you are ready for the next element.
Note: if your code only needs to run on Python 2.5 or later, you can use struct.unpack_from(), which reads at a given offset, instead of slicing data first.
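A minimal sketch of the same walk using `unpack_from()` (here via a precompiled `struct.Struct`), which reads at an offset without slicing first. It assumes, as above, that every element is a big-endian signed 16-bit length followed by that many bytes; the function name `read_prefixed_strings` and the sample buffer are made up for illustration.

```python
import struct

HEADER = struct.Struct(">h")  # 2-byte big-endian signed length prefix


def read_prefixed_strings(data):
    # Hypothetical helper: parse back-to-back length-prefixed byte strings.
    items = []
    posn = 0
    while posn < len(data):
        # Raises struct.error if fewer than 2 bytes remain at posn.
        (nbytes,) = HEADER.unpack_from(data, posn)
        posn += HEADER.size
        if nbytes < 0 or posn + nbytes > len(data):
            raise ValueError("bad length %d at offset %d" % (nbytes, posn - HEADER.size))
        items.append(data[posn:posn + nbytes])
        posn += nbytes
    return items


buf = struct.pack(">h1s", 1, b"-") + struct.pack(">h5s", 5, b"hello")
print(read_prefixed_strings(buf))  # [b'-', b'hello']
```

The explicit length check matters because slicing past the end of a bytes object silently returns a shorter result instead of raising.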
How exactly do you unpack?
    >>> string = b'-'
    >>> format = '>h%ds' % len(string)
    >>> format
    '>h1s'
    >>> struct.calcsize(format)
    3

For unpack(fmt, string), len(string) must equal struct.calcsize(fmt). Thus, with this format the unpacked result cannot be just '-'.
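A quick check of that rule, as a small sketch: `struct.unpack()` rejects a buffer whose length differs from `struct.calcsize()` of the format, so with '>h1s' you have to pass all three bytes and take the length along with the string.

```python
import struct

fmt = ">h1s"
print(struct.calcsize(fmt))  # 3: 2 bytes for 'h' plus 1 byte for '1s'

try:
    struct.unpack(fmt, b"-")  # only 1 byte, but the format needs 3
except struct.error as exc:
    print("struct.error:", exc)

print(struct.unpack(fmt, b"\x00\x01-"))  # (1, b'-')
```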
But:
    >>> t = b'\x00\x01-'
    >>> length, data = struct.unpack(format, t)
    >>> length, data
    (1, b'-')

Now you can use data.