C++ slow read / seek

In my program I read in a file (for now only a test file with 200,000 data points; later there will be millions). At the moment I do this:

    for (int i = 0; i < n; i++) {
        fid.seekg(4, ios_base::cur);
        fid.read((char*) &x[i], 8);
        fid.seekg(8, ios_base::cur);
        fid.read((char*) &y[i], 8);
        fid.seekg(8, ios_base::cur);
        fid.read((char*) &z[i], 8);
        fid.read((char*) &d[i], 8);
        d[i] = (d[i] - p) / p;
        z[i] *= cc;
    }

Here n is the number of points to be read.

Afterwards I write them out again using:

    for (int i = 0; i < n; i++) {
        fid.write((char*) &d[i], 8);
        fid.write((char*) &z[i], 8);
        temp = (d[i] + 1) * p;
        fid.write((char*) &temp, 8);
    }

Here the writing is faster than the reading (time measured using clock_t).

My question: have I made some pretty stupid mistake in the reading code, or is this to be expected?

I am using Win XP with a magnetic disk.

Yours, magu_

+4
3 answers

You call seekg too often. I see that you use it to skip bytes, but you can instead read the whole record into a buffer and then skip the bytes within the buffer:

    char buffer[52];
    for (int i = 0; i < n; i++) {
        fid.read(buffer, sizeof(buffer));
        memcpy(&x[i], &buffer[4], sizeof(x[i]));
        memcpy(&y[i], &buffer[20], sizeof(y[i]));
        // etc
    }

Alternatively, you can define a structure that represents the data in your file:

    #pragma pack(push, 1)
    struct Item {
        char dummy1[4]; // skip 4 bytes
        __int64 x;
        char dummy2[8]; // skip 8 bytes
        __int64 y;
        char dummy3[8]; // skip 8 bytes
        __int64 z;
        __int64 d;
    };
    #pragma pack(pop)

Then declare an array of these structures and read all the data at once:

    Item* items = new Item[n];
    // read() expects a char*, so the array pointer has to be cast
    fid.read((char*) items, n * sizeof(Item)); // reading all the data at once is much faster

(Note: I do not know the types of x, y, z and d, so I assume __int64 here.)
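As a rough, self-contained sketch of the struct approach above: the 52-byte layout is taken from the seek/read sequence in the question, but the field type `double` is an assumption (the question divides by `p`, which suggests floating point), and the names `Record`, `read_records` and `normalize` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// One record as it is assumed to sit on disk (52 bytes, no padding).
// double is an assumption; the question does not state the field types.
#pragma pack(push, 1)
struct Record {
    char   pad1[4];  // 4 skipped bytes
    double x;
    char   pad2[8];  // 8 skipped bytes
    double y;
    char   pad3[8];  // 8 skipped bytes
    double z;
    double d;
};
#pragma pack(pop)
static_assert(sizeof(Record) == 52, "Record must match the 52-byte file layout");

// Reads n records with a single read() call.
// Returns an empty vector if the file cannot be opened or is too short.
std::vector<Record> read_records(const std::string& path, std::size_t n) {
    std::vector<Record> records(n);
    std::ifstream fid(path, std::ios::binary);
    if (!fid) return {};
    const std::size_t bytes = n * sizeof(Record);
    fid.read(reinterpret_cast<char*>(records.data()),
             static_cast<std::streamsize>(bytes));
    if (static_cast<std::size_t>(fid.gcount()) != bytes) return {};
    return records;
}

// Applies the question's post-processing: d = (d - p) / p and z *= cc.
void normalize(std::vector<Record>& records, double p, double cc) {
    assert(p != 0.0);  // p is used as a divisor
    for (Record& r : records) {
        r.d = (r.d - p) / p;
        r.z *= cc;
    }
}
```

The `static_assert` is there so that a compiler which ignores `#pragma pack`, or a platform with a different `double` size, fails at compile time rather than silently misreading the file.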

+13

Personally, I would at least do this:

    for (int i = 0; i < n; i++) {
        char dummy[8];
        fid.read(dummy, 4);
        fid.read((char*) &x[i], 8);
        fid.read(dummy, 8);
        fid.read((char*) &y[i], 8);
        fid.read(dummy, 8);
        fid.read((char*) &z[i], 8);
        fid.read((char*) &d[i], 8);
        d[i] = (d[i] - p) / p;
        z[i] *= cc;
    }

Using a struct, or reading larger amounts of data at a time (say, adding a second layer that reads 4 KB at a time) and then using a couple of functions that "skip" and "extract" the different fields, would be a bit more work but probably much faster.
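One way the "second layer" idea might look, as a sketch only: `ChunkReader`, `Point` and `read_points` are hypothetical names, the 4 KB chunk size is the one suggested above, and the fields are assumed to be `double` (the question does not say).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <istream>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper: pulls data from a stream in 4 KB chunks and hands
// out fields from the in-memory chunk.
class ChunkReader {
public:
    explicit ChunkReader(std::istream& in) : in_(in) {}

    // Discards n bytes; returns false if the stream ends first.
    bool skip(std::size_t n) {
        while (n > 0) {
            if (pos_ == len_ && !refill()) return false;
            const std::size_t step = std::min(n, len_ - pos_);
            pos_ += step;
            n -= step;
        }
        return true;
    }

    // Copies sizeof(T) bytes into out; returns false if the stream ends first.
    template <typename T>
    bool extract(T& out) {
        static_assert(std::is_trivially_copyable<T>::value, "raw byte copy only");
        char* dst = reinterpret_cast<char*>(&out);
        std::size_t need = sizeof(T);
        while (need > 0) {
            if (pos_ == len_ && !refill()) return false;
            const std::size_t step = std::min(need, len_ - pos_);
            std::memcpy(dst, buf_ + pos_, step);
            pos_ += step;
            dst += step;
            need -= step;
        }
        return true;
    }

private:
    // Loads the next chunk; returns false once no more bytes are available.
    bool refill() {
        assert(pos_ == len_);  // only refill when the chunk is used up
        in_.read(buf_, sizeof buf_);
        len_ = static_cast<std::size_t>(in_.gcount());
        pos_ = 0;
        return len_ > 0;
    }

    std::istream& in_;
    char buf_[4096];
    std::size_t len_ = 0;
    std::size_t pos_ = 0;
};

struct Point { double x, y, z, d; };

// Reads every complete 52-byte record from the file at `path`.
std::vector<Point> read_points(const std::string& path) {
    std::vector<Point> points;
    std::ifstream fid(path, std::ios::binary);
    ChunkReader r(fid);
    Point p;
    while (r.skip(4) && r.extract(p.x) && r.skip(8) && r.extract(p.y) &&
           r.skip(8) && r.extract(p.z) && r.extract(p.d)) {
        points.push_back(p);
    }
    return points;
}
```

Both helpers loop because a field or a skipped gap can straddle a chunk boundary; 52-byte records do not divide 4096 evenly, so this happens regularly.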

Another option is to use mmap on Linux or MapViewOfFile on Windows. This reduces the overhead of reading the file somewhat, since fewer copies are needed to transfer the data to the application.
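A minimal POSIX sketch of the memory-mapping option, for illustration only. It covers the mmap side; on Windows the same idea would go through CreateFileMapping and MapViewOfFile, which is not shown here. `MappedFile` and `load_double` are hypothetical names, and the values are assumed to be 8-byte doubles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only mapping of a whole file; unmapped again in the destructor.
// If anything fails, data() stays null and size() stays 0.
class MappedFile {
public:
    explicit MappedFile(const char* path) {
        const int fd = open(path, O_RDONLY);
        if (fd < 0) return;
        struct stat st;
        if (fstat(fd, &st) == 0 && st.st_size > 0) {
            const std::size_t len = static_cast<std::size_t>(st.st_size);
            void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = static_cast<const char*>(p);
                size_ = len;
            }
        }
        close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~MappedFile() {
        if (data_) munmap(const_cast<char*>(data_), size_);
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};

// Copies the double stored at byte offset `off` of the mapping.
// memcpy is used because the offsets in this format are not 8-byte aligned.
double load_double(const MappedFile& f, std::size_t off) {
    assert(off + sizeof(double) <= f.size());
    double v;
    std::memcpy(&v, f.data() + off, sizeof v);
    return v;
}
```

With the record layout from the question, the x value of record `i` would be at offset `52 * i + 4`, y at `52 * i + 20`, z at `52 * i + 36` and d at `52 * i + 44`.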

Edit: I should add "make sure you take comparative measurements", and if your application is meant to run on many machines, make sure you measure on several machines with different drives, processors, and memory. You really do not want to tune the code so that it runs 50% faster on your computer but 25% slower on another machine.
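For the comparative measurements, a small timing helper along these lines may be enough. This is a sketch; `best_time_ms` is a hypothetical name, and it uses `std::chrono::steady_clock` rather than the `clock_t` approach mentioned in the question.

```cpp
#include <cassert>
#include <chrono>
#include <limits>

// Runs f `repeats` times and returns the fastest wall-clock time
// of a single run, in milliseconds.
template <typename F>
double best_time_ms(F&& f, int repeats) {
    assert(repeats > 0);
    double best = std::numeric_limits<double>::max();
    for (int i = 0; i < repeats; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        f();
        const auto t1 = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (ms < best) best = ms;
    }
    return best;
}
```

Each variant of the read loop would be wrapped in a lambda and passed to the helper with the same file and the same `repeats`. Keep in mind that after the first run the file is usually served from the operating system's cache, so repeated runs mostly compare the per-call overhead of the code rather than the speed of the disk.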

+4

The assert() statements are the most important part of this code: if your platform ever changes and the width of your native types changes with it, the assertions will fail. Instead of seeking, I would read into a dummy buffer. The p* variables make the code easier to read, IMO.

    assert(sizeof x[0] == 8);
    assert(sizeof y[0] == 8);
    assert(sizeof z[0] == 8);
    assert(sizeof d[0] == 8);
    for (int i = 0; i < n; i++) {
        char unused[8];
        char * px = (char *) &x[i];
        char * py = (char *) &y[i];
        char * pz = (char *) &z[i];
        char * pd = (char *) &d[i];
        fid.read(unused, 4);
        fid.read(px, 8);
        fid.read(unused, 8);
        fid.read(py, 8);
        fid.read(unused, 8);
        fid.read(pz, 8);
        fid.read(pd, 8);
        d[i] = (d[i] - p) / p;
        z[i] *= cc;
    }
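With a C++11 compiler the same size check could also be done at compile time with `static_assert`. A short sketch, assuming the values are doubles; `read_field` is a hypothetical helper, not part of the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>  // only needed to try the function on an in-memory stream

// Checked when the program is compiled rather than when it runs.
static_assert(sizeof(double) == 8, "the file format stores 8-byte values");

// Discards `pad` bytes into a dummy buffer, then reads one 8-byte value.
// Returns false if the stream ran out of data.
bool read_field(std::istream& in, std::size_t pad, double& out) {
    char unused[8];
    assert(pad <= sizeof unused);  // the format never skips more than 8 bytes
    in.read(unused, static_cast<std::streamsize>(pad));
    in.read(reinterpret_cast<char*>(&out), sizeof out);
    return static_cast<bool>(in);
}
```

One record would then be read as `read_field(fid, 4, x[i])`, `read_field(fid, 8, y[i])`, `read_field(fid, 8, z[i])` and `read_field(fid, 0, d[i])`.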
+1

Source: https://habr.com/ru/post/1483164/

