Unpacking a structure ending in ASCIIZ

I am trying to use struct.unpack() to split a data record that ends with an ASCII string.

The record (this seems to be a TomTom ov2 record) has this format (stored little-endian):

  • 1 byte
  • 4-byte int for the total record size (including this field)
  • 4-byte int
  • 4-byte int
  • variable-length string, null-terminated

unpack() requires the length of the string to be included in the format you pass it. I can use the second field and the known size of the rest of the record (13 bytes) to get the string length:

    str_len = struct.unpack("<xi", record[:5])[0] - 13
    fmt = "<biii{0}s".format(str_len)

and then proceed with the full unpacking, but since the string is null-terminated, I really wish unpack() would do this for me. It would also be nice to have this in case I run into a structure that does not include its own size.
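For reference, the manual two-step approach described above can be sketched end to end as follows. The sample record is made up for illustration, and the trailing "x" pad byte (used here to drop the NUL from the result) is one possible choice, not part of the snippet above.

```python
import struct

# Hypothetical sample record in the layout described above:
# 1 byte, int32 total size, int32, int32, NUL-terminated string.
name = b"Down by the river"
record = struct.pack("<biii", 2, 13 + len(name) + 1, 4886138, 229297) + name + b"\x00"

# Step 1: read the size field (skipping the leading byte) and subtract the
# 13 bytes of fixed-width fields to get the string length, NUL included.
str_len = struct.unpack("<xi", record[:5])[0] - 13

# Step 2: build the full format; "x" consumes the trailing NUL as a pad byte.
fmt = "<biii{0}sx".format(str_len - 1)
fields = struct.unpack(fmt, record)
print(fields)
```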

How can I do this?

2 answers

A record that does not include its own size is actually fairly easy to handle, since struct.calcsize() will tell you the length a format expects. You can use that and the actual length of the data to build a new format string for unpack() that includes the correct string length.
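As a small illustration of that arithmetic, here is a sketch with a made-up record that has no size field (two 4-byte ints followed by a null-terminated string). The record layout and variable names are assumptions for the example only.

```python
import struct

# Hypothetical record with no size field: two int32 values, then a
# NUL-terminated string.
dat = struct.pack("<ii", 7, 9) + b"abc\x00"

fixed_fmt = "<ii"
fixed_len = struct.calcsize(fixed_fmt)  # bytes taken by the fixed-width fields
str_len = len(dat) - fixed_len          # whatever is left is the string plus NUL

# Append an explicit string length to the format and unpack as usual.
fmt = "{0}{1}s".format(fixed_fmt, str_len)
result = struct.unpack(fmt, dat)
print(result)
```

Note that the string comes back with its terminating NUL still attached here; replacing the last byte of the string field with an "x" pad byte is one way to drop it.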

The function below is just a wrapper for unpack() that allows a new format character in the last position, which drops the terminating NUL:

    import struct

    def unpack_with_final_asciiz(fmt, dat):
        """
        Unpack binary data, handling a null-terminated string at the end
        (and only at the end) automatically.

        The first argument, fmt, is a struct.unpack() format string with the
        following modifications:
        If fmt's last character is 'z', the returned string will drop the NUL.
        If it is 's' with no length, the string including NUL will be returned.
        If it is 's' with a length, behavior is identical to normal unpack().
        """
        # Just pass on if no special behavior is required
        if fmt[-1] not in ('z', 's') or (fmt[-1] == 's' and fmt[-2].isdigit()):
            return struct.unpack(fmt, dat)

        # Use format string to get size of contained string and rest of record
        non_str_len = struct.calcsize(fmt[:-1])
        str_len = len(dat) - non_str_len

        # Set up new format string
        # If passed 'z', treat terminating NUL as a "pad byte"
        if fmt[-1] == 'z':
            str_fmt = "{0}sx".format(str_len - 1)
        else:
            str_fmt = "{0}s".format(str_len)
        new_fmt = fmt[:-1] + str_fmt

        return struct.unpack(new_fmt, dat)

    >>> dat = b'\x02\x1e\x00\x00\x00z\x8eJ\x00\xb1\x7f\x03\x00Down by the river\x00'
    >>> unpack_with_final_asciiz("<biiiz", dat)
    (2, 30, 4886138, 229297, b'Down by the river')

I wrote two new functions that can be used as drop-in replacements for the standard pack() and unpack(). Both support a "z" character for packing and unpacking an ASCIIZ string. There are no restrictions on where or how many times "z" appears in the format string:

    import struct

    def unpack(format, buffer):
        # Replace each 'z' in turn with an explicit '<n>sx' (string plus pad byte)
        while True:
            pos = format.find('z')
            if pos < 0:
                break
            asciiz_start = struct.calcsize(format[:pos])
            asciiz_len = buffer[asciiz_start:].find('\0')
            format = '%s%dsx%s' % (format[:pos], asciiz_len, format[pos+1:])
        return struct.unpack(format, buffer)

    def pack(format, *args):
        new_format = ''
        arg_number = 0
        for c in format:
            if c == 'z':
                # Reserve one extra byte so pack() pads the string with a NUL
                new_format += '%ds' % (len(args[arg_number]) + 1)
                arg_number += 1
            else:
                new_format += c
                if c in 'cbB?hHiIlLqQfdspP':
                    arg_number += 1
        return struct.pack(new_format, *args)

Here is an example of how to use them (a Python 2 session):

    >>> from struct_z import pack, unpack
    >>> line = pack('<izizi', 1, 'Hello', 2, ' world!', 3)
    >>> print line.encode('hex')
    0100000048656c6c6f000200000020776f726c64210003000000
    >>> print unpack('<izizi', line)
    (1, 'Hello', 2, ' world!', 3)
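The session above relies on Python 2 string semantics (print statement, str.encode('hex')). On Python 3 the buffer is a bytes object, so the NUL search has to use a bytes pattern. The following is a minimal sketch of the same unpacking idea under that assumption; the name unpack_z and the ValueError on a missing terminator are illustrative additions, not part of the answer's code, and it assumes a format without native alignment (such as one starting with "<").

```python
import struct

def unpack_z(fmt, buffer):
    """Like struct.unpack(), but 'z' matches a NUL-terminated byte string."""
    while True:
        pos = fmt.find("z")
        if pos < 0:
            break
        # Offset of this string = size of everything before the 'z'
        start = struct.calcsize(fmt[:pos])
        # Search for the terminator from that offset (bytes pattern on Python 3)
        length = buffer.find(b"\x00", start) - start
        if length < 0:
            raise ValueError("no NUL terminator found for 'z' field")
        # Substitute an explicit-length string plus a pad byte for the NUL
        fmt = "{0}{1}sx{2}".format(fmt[:pos], length, fmt[pos + 1:])
    return struct.unpack(fmt, buffer)

# Same bytes as the hex dump shown in the session above
line = bytes.fromhex("0100000048656c6c6f000200000020776f726c64210003000000")
result = unpack_z("<izizi", line)
print(result)
```

The strings come back as bytes objects; decode them (for example with .decode('ascii')) if text is needed.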

Source: https://habr.com/ru/post/922304/

