I'm having problems using BerkeleyDB. I have several instances of the same code pointing to the same DB file repository. Everything works fine for anywhere from 5 to 32 hours, and then a deadlock suddenly occurs. The command prompts hang right before a db_get or db_put call, or a call to create a cursor. So I'm asking what the proper way to handle these calls is. Here is my general layout:
Here's how I create the environment and databases:
    use BerkeleyDB;

    my $env = BerkeleyDB::Env->new(
        -Home  => "$dbFolder\\",
        -Flags => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL,
    ) or die "cannot open environment: $BerkeleyDB::Error\n";

    my $unsortedHash = BerkeleyDB::Hash->new(
        -Filename => "$dbFolder/Unsorted.db",
        -Flags    => DB_CREATE,
        -Env      => $env,
    ) or die "couldn't create: $!, $BerkeleyDB::Error.\n";
One instance of this code goes out to the site and saves the URLs to be analyzed by another instance (I have a flag set so that every database is locked whenever any one of them is locked):
    $lk = $unsortedHash->cds_lock();
    while (@urlsToAdd) {
        my $currUrl = shift @urlsToAdd;
        $unsortedHash->db_put($currUrl, '0');
    }
    $lk->cds_unlock();
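The environment code at the top doesn't show the flag mentioned above that makes every database lock together. If that flag is Berkeley DB's DB_CDB_ALLDB (my assumption; the post doesn't name it), it would be passed through the -SetFlags option so that it is applied before the environment is opened. A minimal sketch; the helper name open_cds_env is hypothetical:

```perl
use strict;
use warnings;
use BerkeleyDB;

# Hypothetical helper: open (or create) a Concurrent Data Store environment
# in $home where CDS locking is environment-wide instead of per database.
sub open_cds_env {
    my ($home) = @_;
    my $env = BerkeleyDB::Env->new(
        -Home     => $home,
        -Flags    => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL,
        # Assumption: DB_CDB_ALLDB is the "lock every database" flag meant here.
        # -SetFlags calls DB_ENV->set_flags before the environment is opened.
        -SetFlags => DB_CDB_ALLDB,
    ) or die "cannot open environment: $BerkeleyDB::Error\n";
    return $env;
}
```

As I understand the Berkeley DB documentation, every environment handle opened on the same folder needs the same DB_CDB_ALLDB setting (or it can go in a DB_CONFIG file) for the behavior to be consistent across processes.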
Periodically, it checks whether Unsorted holds a certain number of elements:
    $refer = $unsortedHash->db_stat();
    $elements = $refer->{'hash_ndata'};
Before adding an element to any database, it first checks all the databases to see if this element is present:
    if ($unsortedHash->db_get($search, $value) == 0) {
        $value = "1:$value";
    } elsif ($badHash->db_get($search, $value) == 0) {
        $value = "2:$value";
    ....
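The chain of db_get checks can also be written as one loop over the databases in priority order. This is a sketch, not the post's code: lookup_in_order and the list of [tag, handle] pairs are hypothetical, and the tags '1' and '2' only mirror the prefixes used in the snippet.

```perl
use strict;
use warnings;
use BerkeleyDB;

# Hypothetical helper: look $key up in each database in priority order and
# return "<tag>:<value>" for the first database that has it, or undef if none.
# @tables is a list of [$tag, $db] pairs,
# e.g. (['1', $unsortedHash], ['2', $badHash]).
sub lookup_in_order {
    my ($key, @tables) = @_;
    for my $entry (@tables) {
        my ($tag, $db) = @$entry;
        my $value;
        # db_get returns 0 on success and fills in $value.
        return "$tag:$value" if $db->db_get($key, $value) == 0;
    }
    return;    # undef in scalar context: not found anywhere
}
```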
The following code comes after that, and many instances of it run in parallel. First, it gets the next element in Unsorted that doesn't have the busy value '1'. Then it sets that element's value to busy ('1'), does something with it, and finally moves the record entirely to another database (it is deleted from Unsorted and saved in the other database):
    my $pageUrl = '';
    my $busy = '1';
    my $curs;
    my $lk = $unsortedHash->cds_lock();
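Only the start of that block is shown. For the first two steps described (find the next record whose value isn't '1', then mark it '1'), one way to do it under Concurrent Data Store is a single cursor opened with DB_WRITECURSOR that is closed before any other write is made. This is a hypothetical sketch, not the post's code; claim_next is an invented name.

```perl
use strict;
use warnings;
use BerkeleyDB;

# Hypothetical helper: find the first record in $db whose value is not the
# busy marker '1', set it to '1', and return its key (undef if none is free).
# Assumes $db belongs to an environment opened with DB_INIT_CDB.
sub claim_next {
    my ($db) = @_;
    # Under CDS, a cursor that will write must be opened with DB_WRITECURSOR.
    my $cursor = $db->db_cursor(DB_WRITECURSOR)
        or die "cannot open write cursor: $BerkeleyDB::Error\n";
    my ($key, $value, $claimed, $status) = ('', '', undef, 0);
    # DB_NEXT on a fresh cursor starts at the first record.
    while ($cursor->c_get($key, $value, DB_NEXT) == 0) {
        next if $value eq '1';    # already busy, skip it
        $status  = $cursor->c_put($key, '1', DB_CURRENT);
        $claimed = $key if $status == 0;
        last;
    }
    # Close the cursor before doing any other write; an open cursor combined
    # with a separate write in the same process can self-deadlock under CDS.
    $cursor->c_close();
    die "c_put failed: $status\n" if $status != 0;
    return $claimed;
}
```

Moving the claimed record to another database would then happen afterwards, under the same kind of lock used elsewhere in the post.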
Everywhere else, whenever I call db_put or db_del on ANY database, the call is wrapped in a lock like this:
    print "\n\nBad.\n\n";
    $lk = $badHash->cds_lock();
    $badHash->db_put($pageUrl, '0');
    $unsortedHash->db_del($pageUrl);
    $lk->cds_unlock();
    $lk = undef;
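Because the cds_lock()/cds_unlock() pair is repeated around every write, one option is a small wrapper that always releases the lock. With the pattern in the post, if anything between the two calls dies and the error is caught further up, the explicit cds_unlock() line is skipped. A hedged sketch; with_cds_lock is a hypothetical name, not part of the BerkeleyDB module:

```perl
use strict;
use warnings;
use BerkeleyDB;

# Hypothetical helper: run $code while holding a CDS write lock on $db and
# release the lock afterwards, whether $code returns normally or dies.
sub with_cds_lock {
    my ($db, $code) = @_;
    my $lk  = $db->cds_lock();
    my $ok  = eval { $code->(); 1 };
    my $err = $@;           # save the error before anything can reset $@
    $lk->cds_unlock();      # reached on both the success and the error path
    die $err unless $ok;    # re-throw the original error after unlocking
    return;
}

# Usage, mirroring the snippet above:
#   with_cds_lock($badHash, sub {
#       $badHash->db_put($pageUrl, '0');
#       $unsortedHash->db_del($pageUrl);
#   });
```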
However, my db_get calls run freely without any locking, because I don't think reads require locking.
I've gone over this code a million times and the algorithm is tight, so I'm wondering whether I'm implementing some part of it incorrectly: using the wrong locks, etc. Or is there a better way to prevent deadlocks (or at least diagnose them) with BerkeleyDB and Strawberry Perl?
UPDATE: To be more specific, the problem occurs on a Windows Server 2003 machine (1.5 GB RAM; I'm not sure whether that matters). The whole setup runs fine on my Windows 7 machine (4 GB RAM). I have also started printing lock statistics as follows:
Adding this option when creating the environment:
    -MsgFile => "$dbFolder/lockData.txt"
And then calling this every 60 seconds:
    my $status = $env->lock_stat_print();
    print "Status:$status:\n";
The status always comes back as 0, which means success. Here is the latest report:
    29          Last allocated locker ID
    0x7fffffff  Current maximum unused locker ID
    5           Number of lock modes
    1000        Maximum number of locks possible
    1000        Maximum number of lockers possible
    1000        Maximum number of lock objects possible
    40          Number of lock object partitions
    24          Number of current locks
    42          Maximum number of locks at any one time
    5           Maximum number of locks in any one bucket
    0           Maximum number of locks stolen by for an empty partition
    0           Maximum number of locks stolen for any one partition
    29          Number of current lockers
    29          Maximum number of lockers at any one time
    6           Number of current lock objects
    13          Maximum number of lock objects at any one time
    1           Maximum number of lock objects in any one bucket
    0           Maximum number of objects stolen by for an empty partition
    0           Maximum number of objects stolen for any one partition
    3121958     Total number of locks requested
    3121926     Total number of locks released
    0           Total number of locks upgraded
    24          Total number of locks downgraded
    9310        Lock requests not available due to conflicts, for which we waited
    0           Lock requests not available due to conflicts, for which we did not wait
    8           Number of deadlocks
    1000000     Lock timeout value
    0           Number of locks that have timed out
    1000000     Transaction timeout value
    0           Number of transactions that have timed out
    792KB       The size of the lock region
    59          The number of partition locks that required waiting (0%)
    46          The maximum number of times any partition lock was waited for (0%)
    0           The number of object queue operations that required waiting (0%)
    27          The number of locker allocations that required waiting (0%)
    0           The number of region locks that required waiting (0%)
    1           Maximum hash bucket length
What worries me is this:
    8           Number of deadlocks
How did these deadlocks happen, and how were they resolved (all parts of the code are still running)? What exactly counts as a deadlock in this case?
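One way to see when that counter moves, and whether its jumps line up with the hangs, is to scan the dumps that lock_stat_print() writes to lockData.txt every 60 seconds. A pure-Perl sketch, assuming the file has one statistic per line, a number followed by its description; deadlock_counts and the command-line usage are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: read lock_stat_print() output (e.g. the -MsgFile
# lockData.txt) from $fh and return every "Number of deadlocks" value found,
# in file order, so consecutive dumps can be compared.
sub deadlock_counts {
    my ($fh) = @_;
    my @counts;
    while (my $line = <$fh>) {
        # Assumed layout: "<number><whitespace>Number of deadlocks"
        push @counts, $1 if $line =~ /^\s*(\d+)\s+Number of deadlocks\s*$/;
    }
    return @counts;
}

# Run as: perl deadlocks.pl path/to/lockData.txt
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "cannot open $ARGV[0]: $!\n";
    my @counts = deadlock_counts($fh);
    close $fh;
    for my $i (1 .. $#counts) {
        # Report each dump where the counter went up since the previous one.
        printf "dump %d: deadlocks %d -> %d\n", $i + 1, $counts[$i - 1], $counts[$i]
            if $counts[$i] > $counts[$i - 1];
    }
}
```

Matching the description text rather than a fixed line position keeps the scan working even if other statistics are added to or removed from the dump.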