Read little endian 16-bit unsigned integer

I am trying to parse terminfo files, which are a binary format. You can read about the storage format yourself and confirm the problem I am facing.

The manual says:

The header section begins the file. This section contains six short integers in the format described below. These integers are

(1) magic number (octal 0432);

...

...

Short integers are stored in two 8-bit bytes. The first byte contains the least significant 8 bits of the value, and the second byte contains the most significant 8 bits. (Thus, the value represented is 256 * second + first.) The value -1 is represented by the two bytes 0377, 0377; other negative values are illegal. This value generally means that the corresponding capability is missing from this terminal. Machines where this does not correspond to the hardware must read the integers as two bytes and compute the little-endian value.


  • The first problem in parsing this kind of input is that it fixes the size at 8 bits, so plain old char cannot be used, since it is not guaranteed to be exactly 8 bits. So I looked at the fixed width integer types, but again faced the dilemma of choosing between int8_t and uint8_t, for which it is clearly stated: "only provided if the implementation directly supports the type". So what should I choose so that the type is portable enough?

  • The second problem is that the C++ standard library has no buffer.readInt16LE() method that can read 16 bits of data in little-endian format. So how should I go about implementing this functionality, again in a portable and safe way?

I have already tried reading it with the char data type, but it definitely produces garbage on my machine. The correct values can be seen with the infocmp command, for example $ infocmp xterm.


    #include <fstream>
    #include <iostream>
    #include <vector>

    int main() {
        std::ifstream db("/usr/share/terminfo/g/gnome",
                         std::ios::binary | std::ios::ate);
        std::vector<unsigned char> buffer;
        if (db) {
            auto size = db.tellg();
            buffer.resize(size);
            db.seekg(0, std::ios::beg);
            db.read(reinterpret_cast<char*>(&buffer.front()), size);
        }
        std::cout << "\n";
    }

 $1 = std::vector of length 3069, capacity 3069 = {26 '\032', 1 '\001', 21 '\025', 0 '\000', 38 '&', 0 '\000', 16 '\020', 0 '\000', 157 '\235', 1 '\001', 193 '\301', 4 '\004', 103 'g', 110 'n', 111 'o', 109 'm', 101 'e', 124 '|', 71 'G', 78 'N', 79 'O', 77 'M', 69 'E', 32 ' ', 84 'T', 101 'e', 114 'r', 109 'm', 105 'i', 110 'n', 97 'a', 108 'l', 0 '\000', 0 '\000', 1 '\001', 0 '\000', 0 '\000', 1 '\001', 0 '\000', 0 '\000', 0 '\000', 0 '\000', 0 '\000', 0 '\000', 0 '\000', 0 '\000', 1 '\001', 1 '\001', 0 '\000', .... .... 
1 answer

The first problem in parsing this kind of input is that it fixes the size at 8 bits, so plain old char cannot be used, since it is not guaranteed to be exactly 8 bits.

Any integer type of at least 8 bits is fine. While char is not guaranteed to be exactly 8 bits, it is required to be at least 8 bits, so as far as size is concerned there is no problem, except that in some cases you may need to mask off the high bits if they exist. However, char may not be unsigned, and you do not want the octets to be interpreted as signed values, so use unsigned char instead.

The second problem is that the C++ standard library has no buffer.readInt16LE() method that can read 16 bits of data in little-endian format. So how should I go about implementing this functionality, again in a portable and safe way?

Read one octet at a time into an unsigned char. Assign the first octet to a variable that is large enough to represent at least 16 bits. Shift the second octet left by 8 bits and combine it into the variable with a compound bitwise-or assignment.

Or better yet, do not reimplement this yourself; use an existing third-party library.

I have already tried reading it with the char data type, but it definitely produces garbage on my machine.

Then your attempt had a bug. Nothing inherent in char causes garbage output. I recommend using a debugger to track the problem down.


Source: https://habr.com/ru/post/1261850/