Storing a single decimal digit

I have a problem involving a large number of small integers (actually decimal digits). What is a space-efficient way to store such data?

Is it possible to use std::bitset<4> to store a decimal digit?

+5
3 answers

Depending on how space-efficient the storage has to be and how fast lookups need to be, I see two possibilities:

  • Since a vector of std::bitset<4> is (as far as I know) stored unpacked (each bitset occupies a whole memory word, 32 or 64 bits), you should at least use a packed representation, for example a 64-bit word that stores 16 digits:

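     // store (only valid if the slot has not been written before):
     block |= uint64_t(digit) << (4 * index);
     // load:
     digit = (block >> (4 * index)) & 0xF;
     // reset (the mask must be 64 bits wide, otherwise the shift overflows for index >= 8):
     block &= ~(uint64_t(0xF) << (4 * index));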

A vector of these 64-bit words (uint64_t) together with a few accessor methods should be easy to implement.

  • If your space requirements are even more stringent, you can, for example, pack 3 digits into 10 bits (3 digits give 1000 combinations, which fits into 2^10 = 1024) using division and modulo, which will be much slower. In addition, aligning such groups with 64-bit words is considerably more complicated, so I would recommend this only if you need the final 16 percent improvement: the best you can get this way is about 3.3 bits per digit.
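Both options above can be sketched roughly as follows. This is only an illustration, not the answerer's code: the names PackedDigits, pack3 and unpack3 are made up here, and the sketch assumes that digits are addressed by a flat index and that each group of 3 digits is kept in its own 16-bit value rather than being aligned into 64-bit words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: 16 decimal digits per 64-bit word, 4 bits each.
class PackedDigits {
    std::vector<std::uint64_t> blocks_;
    std::size_t size_;
public:
    explicit PackedDigits(std::size_t n)
        : blocks_((n + 15) / 16, 0), size_(n) {}

    std::size_t size() const { return size_; }

    // load: shift the wanted nibble down and mask it out
    std::uint8_t get(std::size_t i) const {
        assert(i < size_);
        const unsigned shift = static_cast<unsigned>(4 * (i % 16));
        return static_cast<std::uint8_t>((blocks_[i / 16] >> shift) & 0xF);
    }

    // reset the slot first, then store, so overwriting is safe
    void set(std::size_t i, std::uint8_t digit) {
        assert(i < size_ && digit <= 9);
        const unsigned shift = static_cast<unsigned>(4 * (i % 16));
        std::uint64_t& block = blocks_[i / 16];
        block &= ~(std::uint64_t{0xF} << shift);
        block |= std::uint64_t{digit} << shift;
    }
};

// Second option: 3 digits in 10 bits (values 0..999), using division and modulo.
inline std::uint16_t pack3(unsigned d0, unsigned d1, unsigned d2) {
    assert(d0 <= 9 && d1 <= 9 && d2 <= 9);
    return static_cast<std::uint16_t>(d0 + 10 * d1 + 100 * d2);
}

// Extract digit k (0 = least significant) from a packed group.
inline unsigned unpack3(std::uint16_t packed, unsigned k) {
    assert(k < 3);
    static const unsigned divisor[3] = {1, 10, 100};
    return (packed / divisor[k]) % 10;
}
```

The nibble version needs only shifts and masks per access, while the 10-bit version pays for a division and a modulo on every read, which is the trade-off described above.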
+3

If you want a very compact way, then no, using bitset<4> is a bad idea, because bitset<4> will use at least one byte instead of 4 bits.

I would recommend using std::vector<std::uint32_t>

You can store multiple digits in uint32_t. Two common ways:

  • Use 4 bits for each digit and use bit operations. This way you can store 8 digits in 4 bytes. Here set / get operations are pretty fast. Efficiency: 4 bits / digit
  • Use base 10 encoding. The maximum value of uint32_t is 256^4 - 1 (4,294,967,295), which is enough to store 9 digits in 4 bytes. Efficiency: 3.55 bits / digit. Here, if you need to set / get all 9 digits at once, it is almost as fast as the previous version (since division by the constant 10 will be optimized by a good compiler, no actual division will be performed by the CPU). If you need random access, set / get will be slower than in the previous version (you can speed it up with libdivide).

If you use uint64_t instead of uint32_t, you can store 16 digits with the first method (still 4 bits / digit) and 19 digits with the second method: about 3.37 bits / digit, which is pretty close to the theoretical minimum of log2(10) ≈ 3.3219 bits / digit.
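A minimal sketch of the base 10 encoding with random access, assuming 19 digits per uint64_t. The helper names pow10u, getDigit and setDigit are invented for this illustration, and the power of ten is recomputed on every call for brevity (a lookup table would be the obvious refinement).

```cpp
#include <cassert>
#include <cstdint>

constexpr int kDigitsPerWord = 19;  // 10^19 - 1 still fits into 64 bits

// Returns 10^k for k in [0, 18].
inline std::uint64_t pow10u(int k) {
    std::uint64_t p = 1;
    while (k-- > 0) p *= 10;
    return p;
}

// Read digit k (0 = least significant) of a base-10 packed word.
inline unsigned getDigit(std::uint64_t word, int k) {
    assert(k >= 0 && k < kDigitsPerWord);
    return static_cast<unsigned>((word / pow10u(k)) % 10);
}

// Return a copy of word with digit k replaced by digit.
inline std::uint64_t setDigit(std::uint64_t word, int k, unsigned digit) {
    assert(k >= 0 && k < kDigitsPerWord && digit <= 9);
    const std::uint64_t p = pow10u(k);
    const std::uint64_t old = (word / p) % 10;
    return word - old * p + digit * p;
}
```

Because the divisor here depends on k at run time, the compiler cannot replace the division with a multiplication as it does for a constant 10, which is why random access is slower than with the 4-bit layout.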

+3

Is it possible to use std::bitset<4> to store a decimal digit?

Yes, in principle that is a good idea. This is a well-known optimization called BCD (binary-coded decimal).

(actually decimal digits). What is a space-efficient way to store such data?

You can compress the representation of a decimal digit by using one nibble (half) of the occupied byte. Arithmetic may also be optimized compared with ASCII representations of digits or the like.

std::bitset<4> will not help with compressing the data, though.
A std::bitset<4> will still occupy at least a full byte.

An alternative data structure that I could think of is a bitfield

 #include <cstdint>

 // Maybe: #pragma pack(push, 1)
 struct TwoBCDDecimalDigits {
     uint8_t digit1 : 4;  // first digit, 4 bits
     uint8_t digit2 : 4;  // second digit, 4 bits
 };
 // Maybe: #pragma pack(pop)
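As a rough usage sketch, a vector of such bit-field structs can be addressed digit by digit. The helpers bcdGet and bcdSet are hypothetical names introduced only for this example. The layout and size of bit-fields are implementation-defined, so the expectation that the struct occupies one byte should be checked on your compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct TwoBCDDecimalDigits {
    std::uint8_t digit1 : 4;
    std::uint8_t digit2 : 4;
};

// Hypothetical helper: read digit i, where even indices map to digit1
// and odd indices map to digit2 of element i / 2.
inline unsigned bcdGet(const std::vector<TwoBCDDecimalDigits>& v, std::size_t i) {
    assert(i / 2 < v.size());
    return (i % 2 == 0) ? v[i / 2].digit1 : v[i / 2].digit2;
}

// Hypothetical helper: write digit i using the same index mapping.
inline void bcdSet(std::vector<TwoBCDDecimalDigits>& v, std::size_t i, unsigned d) {
    assert(i / 2 < v.size() && d <= 9);
    if (i % 2 == 0) {
        v[i / 2].digit1 = static_cast<std::uint8_t>(d);
    } else {
        v[i / 2].digit2 = static_cast<std::uint8_t>(d);
    }
}
```

With this mapping, a std::vector<TwoBCDDecimalDigits> of n elements holds 2n digits, and the compiler generates the shifts and masks for you.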

There is even a library to convert this format to a normalized numerical format supported in your target CPU architecture:


Another way I can think of is to write your own class:

 #include <cstdint>
 #include <vector>

 class BCDEncodedNumber {
     enum class Sign_t : char {
         plus = '+',
         minus = '-'
     };
     std::vector<uint8_t> doubleDigitsArray;  // two BCD digits per byte
 public:
     BCDEncodedNumber() = default;
     BCDEncodedNumber(int num) {
         AddDigits(num); // Implements math operation + against the
                         // current BCD representation stored in
                         // doubleDigitsArray.
     }
 private:
     void AddDigits(int num);  // declaration only; the definition is left open
 };
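The conversion such a class needs internally might look like the following sketch. It is not the answerer's AddDigits: the free functions toPackedBCD and fromPackedBCD are hypothetical, they handle only non-negative values, and they assume the least significant digit pair is stored first.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode a non-negative value as packed BCD, two digits per byte,
// least significant pair first (an assumption made for this sketch).
inline std::vector<std::uint8_t> toPackedBCD(std::uint64_t value) {
    std::vector<std::uint8_t> bytes;
    do {
        const std::uint8_t low = static_cast<std::uint8_t>(value % 10);
        value /= 10;
        const std::uint8_t high = static_cast<std::uint8_t>(value % 10);
        value /= 10;
        bytes.push_back(static_cast<std::uint8_t>((high << 4) | low));
    } while (value != 0);
    return bytes;
}

// Decode the same layout back into an integer.
inline std::uint64_t fromPackedBCD(const std::vector<std::uint8_t>& bytes) {
    std::uint64_t value = 0;
    for (auto it = bytes.rbegin(); it != bytes.rend(); ++it) {
        assert((*it >> 4) <= 9 && (*it & 0xF) <= 9);  // both nibbles must be digits
        value = value * 100 + (*it >> 4) * 10 + (*it & 0xF);
    }
    return value;
}
```

For example, 1234 is stored as the two bytes 0x34 and 0x12 in this layout.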
+1

Source: https://habr.com/ru/post/1269631/

