The file does not fit into memory: it is over 100 GB. I want to access specific lines by line number, without reading line by line until I reach the one I need.
I read http://docstore.mik.ua/orelly/perl/cookbook/ch08_09.htm
I built an index using the subroutines below, but line retrieval only works up to a point. Once the line number gets very large, the same line is returned no matter which line I ask for. It seems to work for line numbers from 1 to approximately 350,000:
sub build_index {
    my $data_file = shift;
    my $index_file = shift;
    my $offset = 0;
    while (<$data_file>) {
        print $index_file pack("N", $offset);
        $offset = tell($data_file);
    }
}
sub line_with_index {
    my $data_file = shift;
    my $index_file = shift;
    my $line_number = shift;
    my $size;
    my $i_offset;
    my $entry;
    my $d_offset;
    $size = length(pack("N", 0));
    $i_offset = $size * ($line_number-1);
    seek($index_file, $i_offset, 0) or return;
    read($index_file, $entry, $size);
    $d_offset = unpack("N", $entry);
    seek($data_file, $d_offset, 0);
    return scalar(<$data_file>);
}
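One thing that may be relevant to the code above: pack("N", ...) stores an unsigned 32-bit value, so it cannot represent byte offsets of 4 GiB or more, and a 100 GB file has many lines past that point. Whether this is the actual cause of the repeated line is an assumption, not something confirmed here. The following is a minimal sketch of the same two subroutines with 8-byte index entries. The names build_index64 and line_with_index64 are hypothetical, and the sketch assumes a perl (5.10 or later) built with 64-bit integer support, which the "Q>" template requires, and filehandles opened in binary mode.

```perl
use strict;
use warnings;

# Assumption: this perl has 64-bit integer support; "Q" fails without it.
# "Q>" is an unsigned 64-bit value in big-endian byte order.
my $ENTRY_SIZE = length pack("Q>", 0);    # 8 bytes per index entry

# Write one 64-bit byte offset per line of the data file.
# (Hypothetical name; same logic as build_index, wider entries.)
sub build_index64 {
    my ($data_fh, $index_fh) = @_;
    my $offset = 0;
    while (<$data_fh>) {
        print {$index_fh} pack("Q>", $offset);
        $offset = tell($data_fh);
    }
}

# Return line $line_number (1-based), or undef if it is out of range.
sub line_with_index64 {
    my ($data_fh, $index_fh, $line_number) = @_;
    my $i_offset = $ENTRY_SIZE * ($line_number - 1);
    seek($index_fh, $i_offset, 0) or return;
    my $got = read($index_fh, my $entry, $ENTRY_SIZE);
    # A short read means the requested line is not in the index.
    return unless defined $got && $got == $ENTRY_SIZE;
    my $d_offset = unpack("Q>", $entry);
    seek($data_fh, $d_offset, 0) or return;
    return scalar <$data_fh>;
}
```

An index written with the 4-byte "N" entries would have to be rebuilt before using the 8-byte reader, since the two layouts are not compatible.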
I also looked at DB_File and its DB_RECNO access method. Would tie be an option here?