Very fast lookups in Perl: is it possible to reload a hash?

I have about 100 million lines, for example:

    A : value of A
    B : value of B
    |
    |
    |
    Z : value of Z

up to 100 million unique entries.

Currently, every time I run my program, I load the entire file into a hash, which takes some time. At runtime I need to look up the value of A, B, etc., given the key A, B, etc.

I am wondering whether I can build the hash once and save it as a binary data structure, or index the file instead. What would be possible in Perl with the least programming?

Thanks! -Abhi

+4
3 answers

I suggest an on-disk key/value database. Thanks to Perl's tie function, such databases can be used exactly like regular in-memory hashes. For a very large hash they will be faster than building a Perl hash from the file on every run, and they save to and load from disk automatically.

BerkeleyDB is an old favorite:

    use BerkeleyDB;

    # Make %db an on-disk database stored in database.dbm. Create the file if needed.
    tie my %db, 'BerkeleyDB::Hash',
        -Filename => "database.dbm",
        -Flags    => DB_CREATE
        or die "Couldn't tie database: $BerkeleyDB::Error";

    $db{foo} = 1;            # set value
    print $db{foo}, "\n";    # get value

    # Iterate over all entries
    for my $key (keys %db) {
        print "$key -> $db{$key}\n";
    }

    %db = ();                # wipe

Changes to the database are saved to disk automatically and persist across multiple runs of your script.

Check perldoc BerkeleyDB for the full list of options, but the most important are:

    # Increase the memory cache for the database (improves performance), e.g. 640 MB
    tie my %db, 'BerkeleyDB::Hash',
        -Filename  => $filename,
        -Cachesize => 640*1024*1024;

    # Or: open the database in read-only mode
    tie my %db, 'BerkeleyDB::Hash',
        -Filename => $filename,
        -Flags    => DB_RDONLY;
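For the "build it once, reuse it later" workflow the question asks about, here is a minimal sketch. To keep it runnable without installing anything, it uses the core AnyDBM_File module instead of BerkeleyDB; that swap is my assumption, and for 100 million entries you would want to tie to 'BerkeleyDB::Hash' as above, since some of the DBM backends AnyDBM_File may pick have record-size limits. The function names build_index and open_index and the "key : value" line format are illustrative.

```perl
use strict;
use warnings;
use Fcntl;          # O_RDWR, O_CREAT, O_RDONLY
use AnyDBM_File;    # core; uses whichever DBM backend this Perl was built with

# Parse "key : value" lines from a text file into an on-disk DBM file.
# Run this once; later runs only need open_index().
sub build_index {
    my ($text_file, $dbm_file) = @_;

    tie my %db, 'AnyDBM_File', $dbm_file, O_RDWR | O_CREAT, 0644
        or die "Cannot create $dbm_file: $!";

    open my $in, '<', $text_file or die "Cannot open $text_file: $!";
    my $count = 0;
    while (my $line = <$in>) {
        chomp $line;
        # Split on the first colon only, so values may contain colons.
        my ($key, $value) = split /\s*:\s*/, $line, 2;
        next unless defined $value;    # skip lines without a colon
        $db{$key} = $value;
        $count++;
    }
    close $in;

    untie %db;    # release the database so everything is written out
    return $count;
}

# Reopen an existing index read-only; returns a reference to the tied hash.
sub open_index {
    my ($dbm_file) = @_;
    tie my %db, 'AnyDBM_File', $dbm_file, O_RDONLY, 0644
        or die "Cannot open $dbm_file: $!";
    return \%db;
}
```

Call build_index('data.txt', 'data.idx') once, then in every later run do my $db = open_index('data.idx') and read $db->{A} as with a normal hash reference.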

A more complex, but much faster, database library is Tokyo Cabinet, and of course there are many other options (this is Perl, after all...).

+9

Take a look at Storable - it should do what you need and is extremely easy to use:

    use Storable;

    store \%table, 'file';          # write the hash to disk once
    $hashref = retrieve('file');    # load it back in later runs

This only helps if your program is actually CPU-bound, of course. Because your data structure is very simple, you may be able to parse it faster than it can be read from disk. In that case, Storable will not help you.
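If parsing does turn out to be the bottleneck, the store/retrieve pair can be wrapped into a simple cache. The sketch below is only an illustration: load_table is a made-up name, the "key : value" line format is taken from the question, and the freshness check based on file modification times is my own assumption about when the cache should be rebuilt.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

# Return a hashref for $text_file. Use $cache_file when it is at least as
# new as the text file; otherwise parse the text and refresh the cache.
sub load_table {
    my ($text_file, $cache_file) = @_;

    # -M is the file's age in days, so a smaller value means a newer file.
    if (-e $cache_file && (-M $cache_file) <= (-M $text_file)) {
        return retrieve($cache_file);
    }

    my %table;
    open my $in, '<', $text_file or die "Cannot open $text_file: $!";
    while (my $line = <$in>) {
        chomp $line;
        # Split on the first colon only, so values may contain colons.
        my ($key, $value) = split /\s*:\s*/, $line, 2;
        $table{$key} = $value if defined $value;
    }
    close $in;

    # nstore writes in network byte order, so the cache is portable.
    nstore(\%table, $cache_file);
    return \%table;
}
```

The first call to load_table('data.txt', 'data.sto') parses the text and writes the cache; later calls read the cache until data.txt changes. Note that the whole hash still ends up in memory, unlike the on-disk database approach.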

+7

I recommend using Tie::File, since it is included in the Perl core and does not load your entire data structure into memory; instead it accesses individual records from disk as needed.
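Tie::File exposes the file as an array of lines, not as a hash, so looking up a value by key needs one more step. One possibility, sketched below, is a binary search. This assumes the file is sorted by key in plain string order, which the question does not state; open_lines and lookup_sorted are illustrative names. Also keep in mind that Tie::File records an offset for every line it has read, so on 100 million lines the first pass over the file is not free.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);
use Tie::File;

# Tie the file read-only and return a reference to the array of lines.
sub open_lines {
    my ($file) = @_;
    tie my @lines, 'Tie::File', $file, mode => O_RDONLY
        or die "Cannot tie $file: $!";
    return \@lines;
}

# Binary-search "key : value" lines that are sorted by key (string order).
# Returns the value, or undef if the key is not present.
sub lookup_sorted {
    my ($lines, $wanted) = @_;
    my ($lo, $hi) = (0, $#$lines);

    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        # Split on the first colon only, so values may contain colons.
        my ($key, $value) = split /\s*:\s*/, $lines->[$mid], 2;
        my $cmp = $wanted cmp $key;
        if    ($cmp == 0) { return $value }
        elsif ($cmp < 0)  { $hi = $mid - 1 }
        else              { $lo = $mid + 1 }
    }
    return undef;
}
```

Tie the file once with my $lines = open_lines('data.sorted.txt') and reuse $lines for every lookup_sorted($lines, 'A') call, so the line offsets are only discovered once per run.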

+1

Source: https://habr.com/ru/post/1397868/

