As other answers have said, it may be a good idea to build a map of the file. The way I would do it (in pseudo-code) is:

    let offset be an unsigned 64-bit int = 0
    for each line in the file:
        read the line
        write offset to a binary file (as 8 raw bytes rather than as chars)
        offset += length of the line in bytes (including its newline)

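As a rough sketch of that loop in C++, something like the following could work. The names build_line_map and write_u64_be are made up for illustration. Writing each offset most significant byte first is an assumption on my part (it keeps the map independent of the machine's byte order), and so is the convention that lines end in '\n' and that the newline counts as part of the line.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>

// Writes `value` as 8 bytes, most significant byte first (big-endian).
// Assumption: a fixed byte order, so the map does not depend on the host CPU.
void write_u64_be(std::ostream& out, std::uint64_t value) {
    unsigned char bytes[8];
    for (int i = 7; i >= 0; --i) {
        bytes[i] = static_cast<unsigned char>(value & 0xFF);
        value >>= 8;
    }
    out.write(reinterpret_cast<const char*>(bytes), 8);
}

// Hypothetical helper: scans the data file once and writes one 8-byte
// offset per line to the map file. Returns the number of lines indexed.
std::uint64_t build_line_map(const std::string& data_path,
                             const std::string& map_path) {
    // Opening the map truncates it, so it must not be the data file itself.
    assert(data_path != map_path);

    // Binary mode: offsets must count the bytes exactly as stored on disk.
    std::ifstream data(data_path, std::ios::binary);
    std::ofstream map(map_path, std::ios::binary | std::ios::trunc);
    if (!data || !map) {
        throw std::runtime_error("could not open data or map file");
    }

    std::uint64_t offset = 0;  // byte offset (0-based) where the current line starts
    std::uint64_t lines = 0;
    std::string line;
    while (std::getline(data, line)) {
        write_u64_be(map, offset);
        ++lines;
        // getline drops the '\n'; count it unless the file ended before one was found.
        offset += line.size() + (data.eof() ? 0 : 1);
    }

    map.flush();
    if (!map) {
        throw std::runtime_error("writing the map file failed");
    }
    return lines;
}
```

For a data file containing hello\nworld\n this writes two entries, 0 and 6, so the map file is 16 bytes long.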
Now you have a "Map" file, which is a list of 64-bit ints (one for each line in the file). To read the map, you simply calculate where the record for the line is located on the map:

    offset = desired_line_number * 8 // where line number starts at 0
    offset2 = (desired_line_number + 1) * 8
    data_position1 = load bytes [offset through offset + 7] as a 64-bit int from map
    data_position2 = load bytes [offset2 through offset2 + 7] as a 64-bit int from map
    data = load bytes [data_position1 through data_position2 - 1] as a string from data

The idea is that you read through the data file once, record the byte offset at which each line begins, and store those offsets in sequence in a binary file using an integer type of a fixed size. The map file should therefore have a size of number_of_lines * sizeof(integer_type_used). Then you just need to seek into the map file, calculating the position where the offset for your line was saved, and read that offset as well as the offset of the next line. Those two numbers give you the byte range where your data is located.
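The lookup could be sketched in C++ roughly like this. The names read_entry and read_line are hypothetical, and the sketch assumes each map entry is 8 bytes with the most significant byte first. It also assumes that when the map has no entry after the requested line, the line simply runs to the end of the data file; that fallback is my addition, not part of the pseudo-code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>

// Reads map entry number `index` (8 bytes, big-endian) into `value`.
// Returns false if the map does not contain a complete entry there.
bool read_entry(std::ifstream& map, std::uint64_t index, std::uint64_t& value) {
    unsigned char bytes[8];
    map.clear();  // drop any eof/fail state left by an earlier read
    map.seekg(static_cast<std::streamoff>(index * 8));
    map.read(reinterpret_cast<char*>(bytes), 8);
    if (map.gcount() != 8) {
        return false;
    }
    value = 0;
    for (unsigned char b : bytes) {
        value = (value << 8) | b;
    }
    return true;
}

// Hypothetical helper: returns line `line_number` (0-based) of the data
// file, including its trailing newline if it has one.
std::string read_line(const std::string& data_path,
                      const std::string& map_path,
                      std::uint64_t line_number) {
    std::ifstream data(data_path, std::ios::binary);
    std::ifstream map(map_path, std::ios::binary);
    if (!data || !map) {
        throw std::runtime_error("could not open data or map file");
    }

    std::uint64_t begin = 0;
    std::uint64_t end = 0;
    if (!read_entry(map, line_number, begin)) {
        throw std::out_of_range("no such line in the map");
    }
    if (!read_entry(map, line_number + 1, end)) {
        // Assumption: no next entry means this is the last line,
        // so it extends to the end of the data file.
        data.seekg(0, std::ios::end);
        end = static_cast<std::uint64_t>(data.tellg());
    }
    assert(begin <= end);  // offsets in a well-formed map never decrease

    std::string line(static_cast<std::size_t>(end - begin), '\0');
    data.seekg(static_cast<std::streamoff>(begin));
    data.read(&line[0], static_cast<std::streamsize>(line.size()));
    line.resize(static_cast<std::size_t>(data.gcount()));
    return line;
}
```

Only two 8-byte reads from the map and one read from the data file are needed per lookup, no matter how large the data file is.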
Example:
Data:

    hello\n
    world\n
    (\n newline at end of file)

Create a map.
Map (each bracketed [number] stands for one 8-byte entry in the map file):

    [0][6][12]
    // or in binary
    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000110
    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00001100

Now say I need line 2:

    line offset = (2 - 1) * 8 // offset is 8

Since offsets are counted from 0, that is the 9th byte of the map file. The entry therefore occupies bytes 8-15 (the 9th through 16th bytes), which contain:

    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000110 // or as decimal 6

So now we know that line 2 starts at offset 6 in our data file (counting from 0; it is the 7th byte, because hello\n takes up the first 6 bytes).

Then we perform the same process to get the starting offset of the next line, which is 12.

Finally, we read bytes 6-11 of the data file (counting from 0; the 7th through 12th bytes), save them as a string, and get world\n.
C++ implementation:
#include <iostream>