Reading a specific line by line number in a very large file

The file is over 100 GB and does not fit into memory. I want to access specific lines by line number, without reading line by line until I reach the one I need.

I read http://docstore.mik.ua/orelly/perl/cookbook/ch08_09.htm

I built the index using the subroutines below, and line lookup works up to a point. Once the line number gets very large, the same line is returned no matter which line I ask for. Lookups seem to work for line numbers from 1 to roughly 350,000.

    # usage: build_index(*DATA_HANDLE, *INDEX_HANDLE)
    sub build_index {
        my $data_file  = shift;
        my $index_file = shift;
        my $offset     = 0;

        while (<$data_file>) {
            print $index_file pack("N", $offset);
            $offset = tell($data_file);
        }
    }

    # usage: line_with_index(*DATA_HANDLE, *INDEX_HANDLE, $LINE_NUMBER)
    # returns line or undef if LINE_NUMBER was out of range
    sub line_with_index {
        my $data_file   = shift;
        my $index_file  = shift;
        my $line_number = shift;

        my $size;               # size of an index entry
        my $i_offset;           # offset into the index of the entry
        my $entry;              # index entry
        my $d_offset;           # offset into the data file

        $size = length(pack("N", 0));
        $i_offset = $size * ($line_number-1);
        seek($index_file, $i_offset, 0) or return;
        read($index_file, $entry, $size);
        $d_offset = unpack("N", $entry);
        seek($data_file, $d_offset, 0);
        return scalar(<$data_file>);
    }

I also tried DB_File with DB_RECNO. What does "tie" actually do there?


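pack N creates a 32-bit integer. The largest 32-bit unsigned value is 4,294,967,295 (about 4 GB), so it cannot hold offsets into a file that is 100 GB in size.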
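To make the failure concrete, here is a minimal check. The 5 GB offset is a hypothetical value chosen for illustration, not one taken from the question. Because N always stores exactly 4 bytes, an offset beyond the 32-bit range cannot survive a round trip through the index:

```perl
use strict;
use warnings;

# Hypothetical offset for illustration only: 5 GB, beyond the 32-bit range.
my $offset = 5 * 1024**3;

my $entry = pack("N", $offset);      # "N" = 32-bit unsigned, big-endian
my $back  = unpack("N", $entry);

printf "entry size: %d bytes\n", length $entry;
printf "stored %s, read back %s\n", $offset, $back;
print "offset was corrupted\n" if $back != $offset;
```

The exact value read back depends on the Perl build, but it can never equal the original offset, which is why lookups past a certain line all land in the wrong place.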

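Some builds of Perl use 64-bit integers. On those, you can use j.

Some builds use 32-bit integers. On those, tell returns a floating-point number, which lets you index files of up to 8,388,608 GB without loss. On those, you should use F.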
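The 8,388,608 GB figure can be checked with a short sketch. It assumes that Perl's floating-point type (NV) is a C double with a 53-bit mantissa, which is the usual case but is an assumption here, not something stated in the thread:

```perl
use strict;
use warnings;

# Integers up to 2**53 are exactly representable in an IEEE double.
my $max_exact_bytes = 2**53;
my $gib             = 2**30;

printf "%d GB\n", $max_exact_bytes / $gib;   # 2**53 / 2**30 = 2**23

# An offset just below that limit survives a round trip through "F".
my $offset = 2**53 - 1;
my $back   = unpack("F", pack("F", $offset));
print "F round trip ok\n" if $back == $offset;
```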

Portable code would look like this:

    use Config qw( %Config );
    my $off_t = $Config{lseeksize} > $Config{ivsize} ? 'F' : 'j';

    ...
    print $index_file pack($off_t, $offset);
    ...

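Note: I am assuming the index file is only used by the same Perl that built it (or at least by one with the same integer size, seek size, and byte order). Let me know if that assumption does not hold for you.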
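Putting the pieces together, a possible end-to-end sketch could look like the following. It reuses the build_index and line_with_index names from the question and the template choice from this answer. The five-line temporary file, the anonymous index file, and the added range checks are illustrative assumptions standing in for the real 100 GB data:

```perl
use strict;
use warnings;
use Config qw( %Config );
use File::Temp qw( tempfile );

# Template choice taken from the answer above.
my $off_t = $Config{lseeksize} > $Config{ivsize} ? 'F' : 'j';
my $size  = length pack($off_t, 0);   # bytes per index entry

sub build_index {
    my ($data_file, $index_file) = @_;
    my $offset = 0;
    while (<$data_file>) {
        print {$index_file} pack($off_t, $offset);
        $offset = tell($data_file);
    }
}

sub line_with_index {
    my ($data_file, $index_file, $line_number) = @_;
    return if $line_number < 1;
    seek($index_file, $size * ($line_number - 1), 0) or return;
    my $got = read($index_file, my $entry, $size);
    return unless defined $got && $got == $size;   # past the end of the index
    seek($data_file, unpack($off_t, $entry), 0) or return;
    return scalar <$data_file>;
}

# Tiny stand-in for the real file (hypothetical sample data).
my ($data_fh, $data_name) = tempfile(UNLINK => 1);
print {$data_fh} "line $_\n" for 1 .. 5;
close $data_fh;

open my $data,  '<',  $data_name or die "open data: $!";
open my $index, '+>', undef      or die "open index: $!";   # anonymous temp file
binmode $index;
build_index($data, $index);

print line_with_index($data, $index, 3);   # prints "line 3"
```

The only change that matters for the original problem is the pack/unpack template. The rest of the lookup logic is the same as in the question, with a check added so that a short read from the index returns nothing instead of seeking to a bogus offset.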


Source: https://habr.com/ru/post/1536795/

