(Edit: I have added a new answer that achieves a 95% reduction in wall time.)
I put together a minimal working example (MWE) that illustrates the problem you are trying to solve; that is something you should always do when asking a question.
I then got rid of the unsigned long long int material and replaced it with uint64_t from the cstdint library. That guarantees we are working with a known data size, since unsigned long long int can mean almost anything depending on which computer/compiler you use.
The resulting MWE looks like this:
    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <deque>
    #include <functional>
    #include <iostream>
    #include <limits>
    #include <random>
    #include <unordered_map>
    #include <vector>

    typedef std::unordered_map<uint64_t, char> table_t;

    const int TEST_TABLE_SIZE = 10000000;

    void Save(const table_t &map){
      std::cout<<"Save. ";
      const auto start = std::chrono::steady_clock::now();

      FILE *f = fopen("/z/map", "wb");
      for(auto iter=map.begin(); iter!=map.end(); iter++){
        fwrite(&(iter->first), 8, 1, f);
        fwrite(&(iter->second), 1, 1, f);
      }
      fclose(f);

      const auto end = std::chrono::steady_clock::now();
      std::cout<<"Save time = "<< std::chrono::duration<double, std::milli> (end-start).count() << " ms" << std::endl;
    }

    //Take advantage of the limited range of values to save time
    void SaveLookup(const table_t &map){
      std::cout<<"SaveLookup. ";
      const auto start = std::chrono::steady_clock::now();

      //Create a lookup table
      std::vector< std::deque<uint64_t> > lookup(256);
      for(auto &kv: map)
        lookup.at(kv.second+128).emplace_back(kv.first);

      //Save lookup table header
      FILE *f = fopen("/z/map", "wb");
      for(const auto &row: lookup){
        const uint32_t rowsize = row.size();
        fwrite(&rowsize, 4, 1, f);
      }

      //Save values
      for(const auto &row: lookup)
      for(const auto &val: row)
        fwrite(&val, 8, 1, f);

      fclose(f);

      const auto end = std::chrono::steady_clock::now();
      std::cout<<"Save time = "<< std::chrono::duration<double, std::milli> (end-start).count() << " ms" << std::endl;
    }

    //Take advantage of the limited range of values and contiguous memory to
    //save time
    void SaveLookupVector(const table_t &map){
      std::cout<<"SaveLookupVector. ";
      const auto start = std::chrono::steady_clock::now();

      //Create a lookup table
      std::vector< std::vector<uint64_t> > lookup(256);
      for(auto &kv: map)
        lookup.at(kv.second+128).emplace_back(kv.first);

      //Save lookup table header
      FILE *f = fopen("/z/map", "wb");
      for(const auto &row: lookup){
        const uint32_t rowsize = row.size();
        fwrite(&rowsize, 4, 1, f);
      }

      //Save values
      for(const auto &row: lookup)
        fwrite(row.data(), 8, row.size(), f);

      fclose(f);

      const auto end = std::chrono::steady_clock::now();
      std::cout<<"Save time = "<< std::chrono::duration<double, std::milli> (end-start).count() << " ms" << std::endl;
    }

    void Load(table_t &map){
      std::cout<<"Load. ";
      const auto start = std::chrono::steady_clock::now();

      FILE *f = fopen("/z/map", "rb");

      uint64_t key;
      char val;
      while(fread(&key, 8, 1, f)){
        fread(&val, 1, 1, f);
        map[key] = val;
      }
      fclose(f);

      const auto end = std::chrono::steady_clock::now();
      std::cout<<"Load time = "<< std::chrono::duration<double, std::milli> (end-start).count() << " ms" << std::endl;
    }

    void Load2(table_t &map){
      std::cout<<"Load with Reserve. ";
      map.reserve(TEST_TABLE_SIZE+TEST_TABLE_SIZE/8);

      const auto start = std::chrono::steady_clock::now();

      FILE *f = fopen("/z/map", "rb");

      uint64_t key;
      char val;
      while(fread(&key, 8, 1, f)){
        fread(&val, 1, 1, f);
        map[key] = val;
      }
      fclose(f);

      const auto end = std::chrono::steady_clock::now();
      std::cout<<"Load time = "<< std::chrono::duration<double, std::milli> (end-start).count() << " ms" << std::endl;
    }

    //Take advantage of the limited range of values to save time
    void LoadLookup(table_t &map){
      std::cout<<"LoadLookup. ";
      map.reserve(TEST_TABLE_SIZE+TEST_TABLE_SIZE/8);

      const auto start = std::chrono::steady_clock::now();

      FILE *f = fopen("/z/map", "rb");

      //Read the header
      std::vector<uint32_t> inpsizes(256);
      for(int i=0;i<256;i++)
        fread(&inpsizes[i], 4, 1, f);

      uint64_t key;
      for(int i=0;i<256;i++){
        const char val = i-128;
        for(int v=0;v<inpsizes.at(i);v++){
          fread(&key, 8, 1, f);
          map[key] = val;
        }
      }
      fclose(f);

      const auto end = std::chrono::steady_clock::now();
      std::cout<<"Load time = "<< std::chrono::duration<double, std::milli> (end-start).count() << " ms" << std::endl;
    }

    //Take advantage of the limited range of values and contiguous memory to save time
    void LoadLookupVector(table_t &map){
      std::cout<<"LoadLookupVector. ";
      map.reserve(TEST_TABLE_SIZE+TEST_TABLE_SIZE/8);

      const auto start = std::chrono::steady_clock::now();

      FILE *f = fopen("/z/map", "rb");

      //Read the header
      std::vector<uint32_t> inpsizes(256);
      for(int i=0;i<256;i++)
        fread(&inpsizes[i], 4, 1, f);

      for(int i=0;i<256;i++){
        const char val = i-128;
        std::vector<uint64_t> keys(inpsizes[i]);
        fread(keys.data(), 8, inpsizes[i], f);
        for(const auto &key: keys)
          map[key] = val;
      }
      fclose(f);

      const auto end = std::chrono::steady_clock::now();
      std::cout<<"Load time = "<< std::chrono::duration<double, std::milli> (end-start).count() << " ms" << std::endl;
    }

    int main(){
      //Perfectly horrendous way of seeding a PRNG, but we'll do it here for brevity
      auto generator = std::mt19937(12345); //Combination of my luggage

      //Generate values within the specified closed intervals
      auto key_rand = std::bind(std::uniform_int_distribution<uint64_t>(0,std::numeric_limits<uint64_t>::max()), generator);
      auto val_rand = std::bind(std::uniform_int_distribution<int>(std::numeric_limits<char>::lowest(),std::numeric_limits<char>::max()), generator);

      std::cout<<"Generating test data..."<<std::endl;

      //Generate a test table
      table_t map;
      for(int i=0;i<TEST_TABLE_SIZE;i++)
        map[key_rand()] = (char)val_rand(); //Low chance of collisions, so we get quite close to the desired size

      Save(map);
      { table_t map2; Load (map2); }
      { table_t map2; Load2(map2); }

      SaveLookup(map);
      SaveLookupVector(map);
      { table_t map2; LoadLookup      (map2); }
      { table_t map2; LoadLookupVector(map2); }
    }
With the test dataset I use, this gives a write time of 1982 ms and a read time (using the original code from the question) of 7467 ms. Reading seemed to be the bigger bottleneck, so I created a new Load2 function that reserves space in the unordered_map before reading. This cut the read time to 4700 ms (a 37% saving).
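The change in Load2 boils down to a single reserve() call made before any insertions. Here is a minimal sketch of just that part, reusing the table_t type from the MWE (the function name and the expected_entries parameter are mine, for illustration only):

    #include <cstddef>
    #include <cstdint>
    #include <unordered_map>

    typedef std::unordered_map<uint64_t, char> table_t;

    //Minimal sketch of the Load2 idea: reserve bucket space before inserting so
    //the table does not repeatedly rehash as it grows. `expected_entries` is a
    //caller-supplied hint; the name is not part of the original code.
    void ReserveBeforeLoading(table_t &map, std::size_t expected_entries){
      //Same +1/8 headroom as Load2 uses in the MWE above
      map.reserve(expected_entries + expected_entries/8);
      //...then run the usual fread loop and map[key] = val insertions...
    }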
Change 1
Now, I note that the values in your unordered_map are chars, so they can only take on a small, fixed number of distinct values (at most 256). That means I can easily convert the unordered_map into a lookup-table-like structure in RAM. That is, instead of having:
    123123 1
    234234 0
    345345 1
    237872 1
I can change the data to look like this:
    0 234234
    1 123123 345345 237872
What is the advantage of this? It means I no longer need to write the value of each entry to disk, which saves 1 byte per entry. Since each entry consists of 8 bytes of key and 1 byte of value, dropping the value byte removes 1 of every 9 bytes, so this should give roughly an 11% saving in both read and write time, minus the cost of regrouping the data in memory (which I expect to be small, since it all happens in RAM).
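To make the arithmetic concrete, here is a small self-contained sketch comparing the sizes of the two on-disk layouts for the 10-million-entry test table used in the MWE:

    #include <cstdint>
    #include <iostream>

    int main(){
      const uint64_t entries       = 10000000;           //TEST_TABLE_SIZE from the MWE
      const uint64_t naive_bytes   = entries*(8+1);      //8-byte key + 1-byte value per entry
      const uint64_t grouped_bytes = 256*4 + entries*8;  //256 uint32_t row counts, then keys only
      std::cout << "naive:   " << naive_bytes   << " bytes\n";
      std::cout << "grouped: " << grouped_bytes << " bytes\n";
      std::cout << "saving:  " << 100.0*(naive_bytes - grouped_bytes)/naive_bytes << " %\n";
    }

This prints a saving of about 11%, matching the estimate above.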
Finally, once the regrouping above has been done, if the machine has plenty of spare memory, I can pack everything into std::vector and read/write that contiguous data to and from disk.
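The gain from contiguity comes from replacing one fwrite per key with a single fwrite per row. A minimal sketch of the two patterns, lifted from SaveLookup and SaveLookupVector above (the helper names are mine):

    #include <cstdint>
    #include <cstdio>
    #include <deque>
    #include <vector>

    //One fwrite per key, as SaveLookup does with its std::deque rows
    void WriteRowPerElement(FILE *f, const std::deque<uint64_t> &row){
      for(const auto &key: row)
        fwrite(&key, 8, 1, f);
    }

    //One fwrite for the whole row, as SaveLookupVector does with its std::vector rows
    void WriteRowContiguous(FILE *f, const std::vector<uint64_t> &row){
      fwrite(row.data(), 8, row.size(), f);
    }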
Doing all of this gives the following timings:
    Save. Save time = 1836.52 ms
    Load. Load time = 7114.93 ms
    Load with Reserve. Load time = 4277.58 ms
    SaveLookup. Save time = 1688.73 ms
    SaveLookupVector. Save time = 1394.95 ms
    LoadLookup. Load time = 3927.3 ms
    LoadLookupVector. Load time = 3739.37 ms
Note that going from Save to SaveLookup gives an 8% speed-up, and going from Load with Reserve to LoadLookup also gives an 8% speed-up, which is right in line with the theory above!
Using contiguous memory as well brings the total speed-up to 24% over the original save time and 47% over the original load time.