UPDATE Added comparison and tests at the end
Here is another example based on an ideal hash. The ideal hash was generated using gperf (as described here: Is it possible to match a string with int faster than using hashmap? ).
I optimized even more by moving the local statics function to the side and marking hexdigit() and hash() as constexpr . This eliminates unnecessary initialization overhead and gives the compiler a full room for optimizing /
I do not expect everything to be much faster than that.
You can try to read, for example. 1024 blocks at a time, if possible, and give the compiler the ability to vectorize operations using the AVX / SSE instruction sets. (I did not check the generated code to see if this would happen.)
Full sample code for translating std::cin to std::cout in streaming mode:
#include <iostream> int main() { char buffer[4096]; while (std::cin.read(buffer, sizeof(buffer)), std::cin.gcount()) { size_t got = std::cin.gcount(); char* out = buffer; for (auto it = buffer; it < buffer+got; it += 4) *out++ = Perfect_Hash::hexchar(it); std::cout.write(buffer, got/4); } }
Here's the Perfect_Hash class, slightly edited and extended with hexchar search. Note that it checks the input in DEBUG strings using assert :
Live on coliru
#include <array> #include <algorithm> #include <cassert> class Perfect_Hash { /* C++ code produced by gperf version 3.0.4 */ /* Command-line: gperf -L C++ -7 -C -E -m 100 table */ /* Computed positions: -k'1-4' */ /* maximum key range = 16, duplicates = 0 */ private: static constexpr unsigned char asso_values[] = { 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 15, 7, 3, 1, 0, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27}; template <typename It> static constexpr unsigned int hash(It str) { return asso_values[(unsigned char)str[3] + 2] + asso_values[(unsigned char)str[2] + 1] + asso_values[(unsigned char)str[1] + 3] + asso_values[(unsigned char)str[0]]; } static constexpr char hex_lut[] = "???????????fbead9c873625140"; public: #ifdef DEBUG template <typename It> static char hexchar(It binary_nibble) { assert(Perfect_Hash::validate(binary_nibble)); // for DEBUG only return hex_lut[hash(binary_nibble)]; // no validation! } #else template <typename It> static constexpr char hexchar(It binary_nibble) { return hex_lut[hash(binary_nibble)]; // no validation! } #endif template <typename It> static bool validate(It str) { static constexpr std::array<char, 4> vocab[] = { {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'1', '1', '1', '1'}}, {{'1', '0', '1', '1'}}, {{'1', '1', '1', '0'}}, {{'1', '0', '1', '0'}}, {{'1', '1', '0', '1'}}, {{'1', '0', '0', '1'}}, {{'1', '1', '0', '0'}}, {{'1', '0', '0', '0'}}, {{'0', '1', '1', '1'}}, {{'0', '0', '1', '1'}}, {{'0', '1', '1', '0'}}, {{'0', '0', '1', '0'}}, {{'0', '1', '0', '1'}}, {{'0', '0', '0', '1'}}, {{'0', '1', '0', '0'}}, {{'0', '0', '0', '0'}}, }; int key = hash(str); if (key <= 26 && key >= 0) return std::equal(str, str+4, vocab[key].begin()); else return false; } }; constexpr unsigned char Perfect_Hash::asso_values[]; constexpr char Perfect_Hash::hex_lut[]; #include <iostream> int main() { char buffer[4096]; while (std::cin.read(buffer, sizeof(buffer)), std::cin.gcount()) { size_t got = std::cin.gcount(); char* out = buffer; for (auto it = buffer; it < buffer+got; it += 4) *out++ = Perfect_Hash::hexchar(it); std::cout.write(buffer, got/4); } }
Demo output, for example. od -A none -to /dev/urandom | tr -cd '01' | dd bs=1 count=4096 | ./test
46f58339d660b8151c91bddf82b4096
Landmarks
I came up with three different approaches:
To make some comparisons, I
- compiled them all with the same compiler (GCC 4.9) and flags (
-O3 -march=native -g0 -DNDEBUG ) - optimized input / output, so it is not readable by four characters / writes single characters
- created a large input file (1 gigabyte)
Here are the results:

- Surprisingly, the
naive approach from the first answer works pretty well - The spirit is really terrible here; it sets 3.4 MB / s so that the whole file takes 294 seconds (!!!). We left it with graphs.
- The average throughput is ~ 720 MB / s for naive.cpp and ~ 1.14 GB / s for perfect.cpp
- This makes the ideal hash approach about 50% faster than the naive approach.
* Summary I would say that the naive approach was good enough when I published it on a whim 10 hours ago. If you really want high bandwidth, the perfect hash is a good start, but consider manually applying a SIMD solution.