Binary String in C ++ Hex

Question

Binary String in C ++ Hex

When changing the binary string to hex, I could only do this to a certain size, based on the answers I found. But I want to change the MASSIVE binary strings to their full hexadecimal copy in a more efficient way than this, which is the only way I came across that does it completely:

for(size_t i = 0; i < (binarySubVec.size() - 1); i++){ string binToHex, tmp = "0000"; for (size_t j = 0; j < binaryVecStr[i].size(); j += 4){ tmp = binaryVecStr[i].substr(j, 4); if (!tmp.compare("0000")) binToHex += "0"; else if (!tmp.compare("0001")) binToHex += "1"; else if (!tmp.compare("0010")) binToHex += "2"; else if (!tmp.compare("0011")) binToHex += "3"; else if (!tmp.compare("0100")) binToHex += "4"; else if (!tmp.compare("0101")) binToHex += "5"; else if (!tmp.compare("0110")) binToHex += "6"; else if (!tmp.compare("0111")) binToHex += "7"; else if (!tmp.compare("1000")) binToHex += "8"; else if (!tmp.compare("1001")) binToHex += "9"; else if (!tmp.compare("1010")) binToHex += "A"; else if (!tmp.compare("1011")) binToHex += "B"; else if (!tmp.compare("1100")) binToHex += "C"; else if (!tmp.compare("1101")) binToHex += "D"; else if (!tmp.compare("1110")) binToHex += "E"; else if (!tmp.compare("1111")) binToHex += "F"; else continue; } hexOStr << binToHex; hexOStr << " "; }

Its complete and absolute, but slow.

Is there an easier way to do this?

+6

c ++ binary hex

fakeaccount Mar 23 '15 at 12:22

source share

9 answers

UPDATE2 See here for my perfect hash solution . This decision will have my preference because

It compiles much faster
It has a more predictable runtime (there is zero allocation, since all data is static)

EDIT . Benchmarking has now shown that an ideal hash solution will be approximately 340 x faster than the Spirit approach. Look here:

UPDATE

Added Trie-based solution .

The lookup table uses the built-in implementation of Trie for Boost Spirit for quick searches.

Of course replace out , for example. vector back_inserter or ostreambuf_iterator<char> to your string stream if you prefer. Right now, it does not even select 4 characters (although, of course, the lookup table is allocated once).

You can also trivially replace input iterators with whatever input range you have, without changing the line of the rest of the code.

Live on coliru

 #include <iostream> #include <boost/spirit/include/qi.hpp> #include <boost/spirit/include/phoenix.hpp> namespace qi = boost::spirit::qi; int main() { std::ostreambuf_iterator<char> out(std::cout); qi::symbols<char, char> lookup_table; lookup_table.add ("0000", '0') ("0001", '1') ("0010", '2') ("0011", '3') ("0100", '4') ("0101", '5') ("0110", '6') ("0111", '7') ("1000", '8') ("1001", '9') ("1010", 'a') ("1011", 'b') ("1100", 'c') ("1101", 'd') ("1110", 'e') ("1111", 'f') ; boost::spirit::istream_iterator bof(std::cin), eof; if (qi::parse(bof, eof, +lookup_table [ *boost::phoenix::ref(out) = qi::_1 ])) return 0; else return 255; }

Old, basic answer:

Reading input from a stream and outputting one character every 4 bytes.

Here's the gist:

 char nibble[4]; while (std::cin.read(nibble, 4)) { std::cout << "0123456789abcdef"[ (nibble[0]!='0')*8 + (nibble[1]!='0')*4 + (nibble[2]!='0')*2 + (nibble[3]!='0')*1 ]; }

You can do the conversion of the lookup table really. Do not use the map, as it is based on a tree, and ends up with many pointers. However, boost::flat_map might be fine.

+3

sehe Mar 23 '15 at 12:34

source share

Here's how I would do it:

Find the smallest positive integer n such that these integers have different residues modulo n :
0x30303030 0x30303031 0x30303130 0x30303131 0x30313030 0x30313031 0x30313130 0x30313131 0x31303030 0x31303031 0x31303130 0x31303131 0x31313030 0x31313031 0x3131313030

These are ASCII representations of "0000", "0001", etc. I put them in order, considering that your machine is large; if it is low in intensity, the representation, for example, “0001” will be 0x31303030, not 0x30303031. You only need to do this once. n will not be very large - I expect it to be less than 100.

Create a char HexChar[n] table char HexChar[n] with HexChar[0x30303030 % n] = '0', HexChar[0x30303031 % n] = '1' , etc. (or HexChar[0x31303030 % n] = '1' , etc., if your machine is unimportant).

Now the conversion is lightning fast (I take sizeof (int) = 4 ):

 unsigned int const* s = binaryVecStr[a].c_str(); for (size_t i = 0; i < binaryVecStr[a].size(); i += 4, s++) hexOStr << HexChar[*s % n];

+3

Tonyk Mar 23 '15 at 13:55

source share

I have such a strange feeling that I must miss something important in this matter. At first glance, it looks like this should work:

 template <class RanIt, class OutIt> void make_hex(RanIt b, RanIt e, OutIt o) { static const char rets[] = "0123456789ABCDEF"; if ((eb) %4 != 0) throw std::runtime_error("Length must be a multiple of 4"); while (b != e) { int index = ((*(b + 0) - '0') << 3) | ((*(b + 1) - '0') << 2) | ((*(b + 2) - '0') << 1) | ((*(b + 3) - '0') << 0); *o++ = rets[index]; b += 4; } }

At least because of this, it seems that it should be about as fast as anything you want - it seems to me that it is approaching the minimum processing at each input needed to exit.

To maximize speed, it minimizes error checking at the input (and possibly lower) of the minimum minimum. You could of course assure that each character in the input was “0” or “1” before depending on the result of subtraction. Alternatively, you can easily use (*(b + 0) != '0') << 3 to treat 0 as 0 and everything else as 1 . Similarly, you can use: (*(b + 0) == '1') << 3 to get 1 processed as 1, and everything else as 0.

The code avoids the dependencies between the 4 calculations needed to calculate each index value, so it should be possible for the smart compiler to run these calculations in parallel.

Since it only works with iterators, it avoids creating additional copies of the input, such as (for example), it can do almost anything with substr (especially with the implementation of std::string , which does not include short string optimization).

In any case, using this will look something like this:

 int main() { char input[] = "0000000100100011010001010110011110001001101010111100110111101111"; make_hex(input, input+64, std::ostream_iterator<char>(std::cout)); }

Since it uses iterators, it can just as easily (just for one obvious example) take input from istreambuf_iterator to process data directly from a file. This is rarely the fastest way to do this - you usually get the best speed using istream::read to read a large chunk, and ostream::write to write a large chunk at the same time. This would not affect the actual conversion code - you would just pass pointers to the input and output buffers, and they would use them as iterators.

+3

Jerry Coffin Mar 23 '15 at 18:41

source share

This seems to work.

 std::vector<char> binaryVecStr = { '0', '0', '0', '1', '1', '1', '1', '0' }; string binToHex; binToHex.reserve(binaryVecStr.size()/4); for (uint32_t * ptr = reinterpret_cast<uint32_t *>(binaryVecStr.data()); ptr < reinterpret_cast<uint32_t *>(binaryVecStr.data()) + binaryVecStr.size() / 4; ++ptr) { switch (*ptr) { case 0x30303030: binToHex += "0"; break; case 0x31303030: binToHex += "1"; break; case 0x30313030: binToHex += "2"; break; case 0x31313030: binToHex += "3"; break; case 0x30303130: binToHex += "4"; break; case 0x31303130: binToHex += "5"; break; case 0x30313130: binToHex += "6"; break; case 0x31313130: binToHex += "7"; break; case 0x30303031: binToHex += "8"; break; case 0x31303031: binToHex += "9"; break; case 0x30313031: binToHex += "A"; break; case 0x31313031: binToHex += "B"; break; case 0x30303131: binToHex += "C"; break; case 0x31303131: binToHex += "D"; break; case 0x30313131: binToHex += "E"; break; case 0x31313131: binToHex += "F"; break; default: // invalid input binToHex += "?"; } } std::cout << binToHex;

Beware, uses a few assumptions:

1) char has 8 bits (does not apply to all platforms)

2) it requires a little endian (this means that it works at least on x86, x86_64)

BinaryVecStr is supposed to be std :: vector, but will also work with string. It is assumed that binaryVecStr.size() % 4 == 0

+2

Paladin Mar 23 '15 at 12:58

source share

something like this is possible

 #include <iostream> #include <string> #include <iomanip> #include <sstream> int main() { std::cout << std::hex << std::stoll("100110110010100100111101010001001101100101010110000101111111111",NULL, 2) << std::endl; std::stringstream ss; ss << std::hex << std::stoll("100110110010100100111101010001001101100101010110000101111111111", NULL, 2); std::cout << ss.str() << std::endl; return 0; }

+2

user4703182 Mar 23 '15 at 13:00

source share

This is the fastest I could come up with:

 #include <iostream> int main(int argc, char** argv) { char buffer[4096]; while (std::cin.read(buffer, sizeof(buffer)), std::cin.gcount() > 0) { size_t got = std::cin.gcount(); char* out = buffer; for (const char* it = buffer; it < buffer + got; it += 4) { unsigned long r; r = it[3]; r += it[2] * 2; r += it[1] * 4; r += it[0] * 8; *out++ = "0123456789abcdef"[r - 15*'0']; } std::cout.write(buffer, got / 4); } }

This is faster than anything else on this subject, according to @sehe benchmarks.

+2

orlp Mar 24 '15 at 1:57

source share

You can try the binary decision tree:

 string binToHex; for (size_t i = 0; i < binaryVecStr[a].size(); i += 4) { string tmp = binaryVecStr[a].substr(i, 4); if (tmp[0] == '0') { if (tmp[1] == '0') { if (tmp[2] == '0') { if tmp[3] == '0') { binToHex += "0"; } else { binToHex += "1"; } } else { if tmp[3] == '0') { binToHex += "2"; } else { binToHex += "3"; } } } else { if (tmp[2] == '0') { if tmp[3] == '0') { binToHex += "4"; } else { binToHex += "5"; } } else { if tmp[3] == '0') { binToHex += "6"; } else { binToHex += "7"; } } } } else { if (tmp[1] == '0') { if (tmp[2] == '0') { if tmp[3] == '0') { binToHex += "8"; } else { binToHex += "9"; } } else { if tmp[3] == '0') { binToHex += "A"; } else { binToHex += "B"; } } } else { if (tmp[2] == '0') { if tmp[3] == '0') { binToHex += "C"; } else { binToHex += "D"; } } else { if tmp[3] == '0') { binToHex += "E"; } else { binToHex += "F"; } } } } } hexOStr << binToHex;

You can also consider a more compact representation of the same decision tree, for example

 string binToHex; for (size_t i = 0; i < binaryVecStr[a].size(); i += 4) { string tmp = binaryVecStr[a].substr(i, 4); binToHex += (tmp[0] == '0' ? (tmp[1] == '0' ? (tmp[2] == '0' ? (tmp[3] == '0' ? "0" : "1") : (tmp[3] == '0' ? "2" : "3")) : (tmp[2] == '0' ? (tmp[3] == '0' ? "4" : "5") : (tmp[3] == '0' ? "6" : "7"))) : (tmp[1] == '0' ? (tmp[2] == '0' ? (tmp[3] == '0' ? "8" : "9") : (tmp[3] == '0' ? "A" : "B")) : (tmp[2] == '0' ? (tmp[3] == '0' ? "C" : "D") : (tmp[3] == '0' ? "E" : "F")))); } hexOStr << binToHex;

Update: At the heart of ASCII-integer solutions:

 unsigned int nibble = static_cast<unsigned int*>(buffer); nibble &= 0x01010101; // 0x31313131 --> 0x01010101 nibble |= (nibble >> 15); // 0x01010101 --> 0x01010303 nibble |= (nibble >> 6); // 0x01010303 --> 0x0105070C char* hexDigit = hexDigitTable[nibble & 15];

The contents of hexDigitTable (such as char[16] ) will depend on whether you are on a machine with small or large tones.

+1

David K Mar 23 '15 at 12:44

source share

As for the easy way to do this, I think it's pretty neat:

 std::string bintxt_2_hextxt(const std::string &bin) { std::stringstream reader(bin); std::stringstream result; while (reader) { std::bitset<8> digit; reader >> digit; result << std::hex << digit.to_ulong(); } return result.str(); }

I do not know where your data should be read from, so I used std::string as input; but if it is from a text file or data stream, there should not be a headache to change the reader as std::ifstream .

Caution! I do not know what can happen if the characters of the stream are not divisible by 8, and I also did not check the performance of this code.

Living example

+1

Paula_plus_plus Mar 23 '15 at 15:45

source share

sehe · Accepted Answer · 2015-03-23T16:07:48+0000

UPDATE Added comparison and tests at the end

Here is another example based on an ideal hash. The ideal hash was generated using gperf (as described here: Is it possible to match a string with int faster than using hashmap? ).

I optimized even more by moving the local statics function to the side and marking hexdigit() and hash() as constexpr . This eliminates unnecessary initialization overhead and gives the compiler a full room for optimizing /

I do not expect everything to be much faster than that.

You can try to read, for example. 1024 blocks at a time, if possible, and give the compiler the ability to vectorize operations using the AVX / SSE instruction sets. (I did not check the generated code to see if this would happen.)

Full sample code for translating std::cin to std::cout in streaming mode:

 #include <iostream> int main() { char buffer[4096]; while (std::cin.read(buffer, sizeof(buffer)), std::cin.gcount()) { size_t got = std::cin.gcount(); char* out = buffer; for (auto it = buffer; it < buffer+got; it += 4) *out++ = Perfect_Hash::hexchar(it); std::cout.write(buffer, got/4); } }

Here's the Perfect_Hash class, slightly edited and extended with hexchar search. Note that it checks the input in DEBUG strings using assert :

Live on coliru

 #include <array> #include <algorithm> #include <cassert> class Perfect_Hash { /* C++ code produced by gperf version 3.0.4 */ /* Command-line: gperf -L C++ -7 -C -E -m 100 table */ /* Computed positions: -k'1-4' */ /* maximum key range = 16, duplicates = 0 */ private: static constexpr unsigned char asso_values[] = { 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 15, 7, 3, 1, 0, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27}; template <typename It> static constexpr unsigned int hash(It str) { return asso_values[(unsigned char)str[3] + 2] + asso_values[(unsigned char)str[2] + 1] + asso_values[(unsigned char)str[1] + 3] + asso_values[(unsigned char)str[0]]; } static constexpr char hex_lut[] = "???????????fbead9c873625140"; public: #ifdef DEBUG template <typename It> static char hexchar(It binary_nibble) { assert(Perfect_Hash::validate(binary_nibble)); // for DEBUG only return hex_lut[hash(binary_nibble)]; // no validation! } #else template <typename It> static constexpr char hexchar(It binary_nibble) { return hex_lut[hash(binary_nibble)]; // no validation! } #endif template <typename It> static bool validate(It str) { static constexpr std::array<char, 4> vocab[] = { {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'1', '1', '1', '1'}}, {{'1', '0', '1', '1'}}, {{'1', '1', '1', '0'}}, {{'1', '0', '1', '0'}}, {{'1', '1', '0', '1'}}, {{'1', '0', '0', '1'}}, {{'1', '1', '0', '0'}}, {{'1', '0', '0', '0'}}, {{'0', '1', '1', '1'}}, {{'0', '0', '1', '1'}}, {{'0', '1', '1', '0'}}, {{'0', '0', '1', '0'}}, {{'0', '1', '0', '1'}}, {{'0', '0', '0', '1'}}, {{'0', '1', '0', '0'}}, {{'0', '0', '0', '0'}}, }; int key = hash(str); if (key <= 26 && key >= 0) return std::equal(str, str+4, vocab[key].begin()); else return false; } }; constexpr unsigned char Perfect_Hash::asso_values[]; constexpr char Perfect_Hash::hex_lut[]; #include <iostream> int main() { char buffer[4096]; while (std::cin.read(buffer, sizeof(buffer)), std::cin.gcount()) { size_t got = std::cin.gcount(); char* out = buffer; for (auto it = buffer; it < buffer+got; it += 4) *out++ = Perfect_Hash::hexchar(it); std::cout.write(buffer, got/4); } }

Demo output, for example. od -A none -to /dev/urandom | tr -cd '01' | dd bs=1 count=4096 | ./test

46f58339d660b8151c91bddf82b4096

Landmarks

I came up with three different approaches:

naive.cpp (no hacks, no libraries) ; Live demo at Godbolt
spirit.cpp (Trie) ; ~~Expand~~ on pastebin
and this answer: perfect.cpp hash ; Live demo at Godbolt

To make some comparisons, I

compiled them all with the same compiler (GCC 4.9) and flags ( -O3 -march=native -g0 -DNDEBUG )
optimized input / output, so it is not readable by four characters / writes single characters
created a large input file (1 gigabyte)

Here are the results:

Surprisingly, the naive approach from the first answer works pretty well
The spirit is really terrible here; it sets 3.4 MB / s so that the whole file takes 294 seconds (!!!). We left it with graphs.
The average throughput is ~ 720 MB / s for naive.cpp and ~ 1.14 GB / s for perfect.cpp
This makes the ideal hash approach about 50% faster than the naive approach.

* Summary I would say that the naive approach was good enough when I published it on a whim 10 hours ago. If you really want high bandwidth, the perfect hash is a good start, but consider manually applying a SIMD solution.

Binary String in C ++ Hex

I do not expect everything to be much faster than that.

Landmarks

Reading input from a stream and outputting one character every 4 bytes.

More articles: