What is the correct way to access BerkeleyDB with Perl?

I'm having problems using BerkeleyDB. I have several instances of the same code pointing at the same set of database files, and everything runs fine for 5-32 hours, then a deadlock suddenly occurs. Each process's console output stops right before a db_get or db_put call, or a call to create a cursor. So I'm asking what the proper way to handle these calls is. Here is my general layout:

Here is how I create the environment and the databases:

    my $env = BerkeleyDB::Env->new(
        -Home  => "$dbFolder\\",
        -Flags => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL,
    ) or die "cannot open environment: $BerkeleyDB::Error\n";

    my $unsortedHash = BerkeleyDB::Hash->new(
        -Filename => "$dbFolder/Unsorted.db",
        -Flags    => DB_CREATE,
        -Env      => $env,
    ) or die "couldn't create: $!, $BerkeleyDB::Error.\n";

One instance of this code gets launched, crawls a site, and saves URLs for analysis by the other instances (the CDB flag is set so that each database is locked while it is being written to):

    $lk = $unsortedHash->cds_lock();
    while (@urlsToAdd) {
        my $currUrl = shift @urlsToAdd;
        $unsortedHash->db_put($currUrl, '0');
    }
    $lk->cds_unlock();

It periodically checks to see if a certain number of elements are in Unsorted:

    $refer    = $unsortedHash->db_stat();
    $elements = $refer->{'hash_ndata'};

Before adding an element to any database, it first checks all the databases to see if this element is present:

    if ($unsortedHash->db_get($search, $value) == 0){
        $value = "1:$value";
    } elsif ($badHash->db_get($search, $value) == 0){
        $value = "2:$value";
    ....

The following code comes next, and many instances of it run in parallel. First it fetches the next element in Unsorted whose value is not the busy flag '1', then sets that value to busy ('1'), then does something with the element, and finally moves the record to another database (it is deleted from Unsorted and saved elsewhere):

    my $pageUrl = '';
    my $busy = '1';
    my $curs;
    my $lk = $unsortedHash->cds_lock();    # lock, change status to 1, unlock

    ########## GET AN ELEMENT FROM THE UNSORTED HASH #######
    while (1) {
        $busy = '1';
        $curs = $unsortedHash->db_cursor();
        while ($busy) {
            $curs->c_get($pageUrl, $busy, DB_NEXT);
            print "$pageUrl:$busy:\n";
            if ($pageUrl eq ''){
                $busy = 0;
            }
        }
        $curs->c_close();
        $curs = undef;

        if ($pageUrl eq ''){
            print "Database empty. Sleeping...\n";
            $lk->cds_unlock();
            sleep(30);
            $lk = $unsortedHash->cds_lock();
        } else {
            last;
        }
    }

    ####### MAKE THE ELEMENT 'BUSY' AND DOWNLOAD IT
    $unsortedHash->db_put($pageUrl, '1');
    $lk->cds_unlock();
    $lk = undef;

And anywhere else I call db_put or db_del on ANY database, the call is wrapped with a lock like this:

 print "\n\nBad.\n\n"; $lk = $badHash->cds_lock(); $badHash->db_put($pageUrl, '0'); $unsortedHash->db_del($pageUrl); $lk->cds_unlock(); $lk = undef; 

However, my db_get calls run freely without any locking, because I assumed that reading does not require a lock.

I've gone over this code a million times and the algorithm is tight. So I'm wondering whether I'm implementing some part of this incorrectly (the wrong kind of locking, etc.), or whether there is a better way to prevent, or at least diagnose, a deadlock with BerkeleyDB and Strawberry Perl?

UPDATE: To be more specific, the problem occurs on a Windows Server 2003 machine (1.5 GB RAM, not sure if that matters). The same setup runs fine on my Windows 7 machine (4 GB RAM). I also started printing lock statistics as follows:

I added this flag when creating the environment:

 -MsgFile => "$dbFolder/lockData.txt" 

And then I call this every 60 seconds:

 my $status = $env->lock_stat_print(); print "Status:$status:\n"; 

The returned status is always 0, i.e. success. Here is the latest report:

    29          Last allocated locker ID
    0x7fffffff  Current maximum unused locker ID
    5           Number of lock modes
    1000        Maximum number of locks possible
    1000        Maximum number of lockers possible
    1000        Maximum number of lock objects possible
    40          Number of lock object partitions
    24          Number of current locks
    42          Maximum number of locks at any one time
    5           Maximum number of locks in any one bucket
    0           Maximum number of locks stolen by for an empty partition
    0           Maximum number of locks stolen for any one partition
    29          Number of current lockers
    29          Maximum number of lockers at any one time
    6           Number of current lock objects
    13          Maximum number of lock objects at any one time
    1           Maximum number of lock objects in any one bucket
    0           Maximum number of objects stolen by for an empty partition
    0           Maximum number of objects stolen for any one partition
    3121958     Total number of locks requested
    3121926     Total number of locks released
    0           Total number of locks upgraded
    24          Total number of locks downgraded
    9310        Lock requests not available due to conflicts, for which we waited
    0           Lock requests not available due to conflicts, for which we did not wait
    8           Number of deadlocks
    1000000     Lock timeout value
    0           Number of locks that have timed out
    1000000     Transaction timeout value
    0           Number of transactions that have timed out
    792KB       The size of the lock region
    59          The number of partition locks that required waiting (0%)
    46          The maximum number of times any partition lock was waited for (0%)
    0           The number of object queue operations that required waiting (0%)
    27          The number of locker allocations that required waiting (0%)
    0           The number of region locks that required waiting (0%)
    1           Maximum hash bucket length

What worries me is this line:

 8 Number of deadlocks 

How did these deadlocks happen, and how were they resolved (all parts of the code are still running)? And what exactly counts as a deadlock in this case?

+6
3 answers

In short, you need to run deadlock detection. I see two ways to do this. First, you can use the db_deadlock utility. Second, and perhaps more conveniently, you can specify the -LockDetect flag when opening your environment, a flag that is not very prominently documented in the Perl docs for BerkeleyDB.pm.

Both options work fine for me with version 4.5.20. (Which version are you on, by the way?)

Now for the details.

Specifying the -LockDetect flag is straightforward. There are several detection policies to choose from; I picked DB_LOCK_DEFAULT and it seems to work fine. With more insight into what is going on you could certainly get fancier.
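
For illustration, here is a minimal sketch of opening the environment with automatic deadlock detection turned on, reusing the $dbFolder variable and flags from the question (other policies such as DB_LOCK_OLDEST or DB_LOCK_YOUNGEST also exist):

    use BerkeleyDB;

    # Open the CDS environment with built-in deadlock detection.
    # DB_LOCK_DEFAULT lets the library pick its default victim-selection policy.
    my $env = BerkeleyDB::Env->new(
        -Home       => "$dbFolder\\",
        -Flags      => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL,
        -LockDetect => DB_LOCK_DEFAULT,
    ) or die "cannot open environment: $BerkeleyDB::Error\n";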

Running the db_deadlock utility can be done as follows:

    db_deadlock -h your/env/dir -v -t 3    # run as daemon, check every 3 seconds
    db_deadlock -h your/env/dir -v         # run once

Here is a quote from the db_deadlock documentation:

This utility should be run as a background daemon, or the underlying Berkeley DB deadlock detection interfaces should be called in some other way, whenever there are multiple threads or processes accessing a database and at least one of them is modifying it.

I convinced myself that both methods work by repeatedly running a test with two writer scripts and one reader, which would deadlock within a couple of seconds while rapidly inserting new records into the database (100 per second) or cursoring over all the keys in the database.
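
To give an idea of what such a test looks like, here is a rough sketch of a writer (not the original test code; the environment directory and database name are made up):

    use BerkeleyDB;

    # Hypothetical stress writer: insert keys as fast as possible so that
    # deadlocks against a concurrent cursor-holding reader show up quickly.
    my $env = BerkeleyDB::Env->new(
        -Home       => 'your/env/dir',
        -Flags      => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL,
        -LockDetect => DB_LOCK_DEFAULT,    # or omit this and run db_deadlock alongside
    ) or die "env: $BerkeleyDB::Error\n";

    my $db = BerkeleyDB::Hash->new(
        -Filename => 'stress.db',
        -Flags    => DB_CREATE,
        -Env      => $env,
    ) or die "db: $BerkeleyDB::Error\n";

    for my $i (1 .. 100_000) {
        my $lk = $db->cds_lock();          # CDS write lock, as in the question
        $db->db_put("key$i", '0');
        $lk->cds_unlock();
    }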

The flag method seems to resolve deadlocks so quickly that they never became noticeable in my tests.

On the other hand, running the db_deadlock utility with verbose output alongside the scripts is instructive: you can see them block and then continue after the offending locks have been broken, especially in combination with the db_stat utility:

    db_stat -Cl    # Locks grouped by lockers
    db_stat -Co    # Locks grouped by object
    db_stat -Cp    # need_dd = 1 ?
    db_stat -CA    # all of the above plus more

I don't have enough experience to explain all the details, but you can see that certain entries show up in deadlocked situations and not in others. Also see the Concurrent Data Store locking conventions section (what is IWRITE?) in the Berkeley DB Programmer's Reference Guide.

You ask how these deadlocks occurred. I can't say for sure, but I can see that they do happen with concurrent access. You also ask how they were resolved. I have no idea; in my test scripts, the deadlocked scripts simply froze. Maybe something in your setup ran deadlock detection without you knowing it?

For completeness: an application can also simply hang because a process did not close its resources before exiting. This can happen, for example, if you Ctrl-C a process and there is no cleanup handler to close its handles. But that does not seem to be your problem.
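
A minimal guard against that could look like the following sketch (reusing the handle names from the question; undef'ing the BerkeleyDB objects closes them via their destructors, cursors before databases, databases before the environment):

    # Release the BerkeleyDB handles even if the process is interrupted.
    my $cleanup = sub {
        undef $curs         if $curs;          # close any open cursor first
        undef $unsortedHash if $unsortedHash;  # then the databases
        undef $badHash      if $badHash;
        undef $env          if $env;           # and the environment last
    };
    $SIG{INT} = sub { $cleanup->(); exit 1; }; # Ctrl-C
    END { $cleanup->() if $cleanup; }          # normal exit or die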

If this does become your problem, you should read the section Handling failure in Data Store and Concurrent Data Store applications in the Reference Guide.

CDS and DS have no concept of recovery. Since CDS and DS do not support transactions and do not maintain recovery logs, they cannot run recovery. If a database gets corrupted in DS or CDS, it has to be deleted and recreated. (Taken more or less verbatim from The Berkeley DB Book by Himanshu Yadava.)

Finally, there are video tutorials on the Oracle site, including one on using CDS by Margo Seltzer.

+3

However, my db_get calls run freely without any locking, because I assumed that reading does not require a lock.

This assumption is incorrect. As http://pybsddb.sourceforge.net/ref/lock/page.html explains, BerkeleyDB has to take read locks internally, because otherwise you could get undefined behavior when a reader reads data that is being modified out from under it. Reads can therefore easily be part of a deadlock.

This is especially true with cursors. A read cursor keeps everything it has read locked until the cursor is closed. See http://pybsddb.sourceforge.net/ref/lock/am_conv.html for the ways you can get into a deadlock (in fact, you can even deadlock against yourself).
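
In practice that means keeping cursor lifetimes as short as possible, for example along these lines (a sketch reusing the question's $unsortedHash; process_url() is a hypothetical stand-in for the real work):

    # Find one non-busy key under the cursor, then close the cursor
    # *before* doing any slow work, so its read locks are released promptly.
    my ($url, $status, $found) = ('', '', 0);
    my $curs = $unsortedHash->db_cursor();
    while ($curs->c_get($url, $status, DB_NEXT) == 0) {
        if ($status ne '1') { $found = 1; last; }   # not marked busy
    }
    $curs->c_close();
    undef $curs;

    process_url($url) if $found;   # the slow part now runs with no cursor held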

+4

Although this is not a BerkeleyDB-level solution, you could do your locking another way, for example with Win32::Mutex, which uses native Windows mutexes. Here is a very simple example:

    #!perl -w
    use strict;
    use warnings;
    use Win32::Mutex;    # from Win32::IPC

    my $mutex = Win32::Mutex->new(0, 'MyAppBerkeleyLock');

    for (1..10) {
        $mutex->wait(10*1000) or die "Failed to lock mutex $!";
        print "$$ has lock\n";
        sleep(rand(7));
        $mutex->release();
    }
+1

Source: https://habr.com/ru/post/886454/

