Postgres 9.3: ShareLock problem with a simple INSERT

Update: potential solution below

I have a large corpus of configuration files consisting of key/value pairs that I am trying to insert into a database. Many keys and values are repeated across configuration files, so I store the data in 3 tables: one for all unique keys, one for all unique values, and one listing the key/value pairs for each file.

The problem: I use several parallel processes (and therefore connections) to add the raw data to the database. Unfortunately, I get a lot of deadlocks when trying to add rows to the keys and values tables. I have tried several different ways of inserting the data (shown below), but I always end up with a "deadlock detected" error:

  TransactionRollbackError: deadlock detected
  DETAIL: Process 26755 waits for ShareLock on transaction 689456; blocked by process 26754.
          Process 26754 waits for ShareLock on transaction 689467; blocked by process 26755.

I was wondering if anyone could shed some light on what might be causing these deadlocks, and possibly point me towards a way of fixing the issue. Looking at the SQL statements I'm using (listed below), I don't really see why there should be any mutual dependency at all.

Thank you for reading!

Example configuration file:

  example_key this_is_the_value
  other_example other_value
  third_example yet_another_value

Table Definitions:

  CREATE TABLE keys (
      id SERIAL PRIMARY KEY,
      hash UUID UNIQUE NOT NULL,
      key TEXT);
  CREATE TABLE values (
      id SERIAL PRIMARY KEY,
      hash UUID UNIQUE NOT NULL,
      key TEXT);
  CREATE TABLE keyvalue_pairs (
      id SERIAL PRIMARY KEY,
      file_id INTEGER REFERENCES filenames,
      key_id INTEGER REFERENCES keys,
      value_id INTEGER REFERENCES values);

SQL statements:

I initially tried to use this statement to avoid any exceptions:

  WITH s AS (
      SELECT id, hash, key FROM keys
      WHERE hash = 'hash_value'
  ), i AS (
      INSERT INTO keys (hash, key)
      SELECT 'hash_value', 'key_value'
      WHERE NOT EXISTS (SELECT 1 FROM s)
      RETURNING id, hash, key
  )
  SELECT id, hash, key FROM i
  UNION ALL
  SELECT id, hash, key FROM s;
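For completeness, here is how that statement might be issued from psycopg2 with bind parameters instead of literals (a sketch; the function name and cursor are my assumptions, the SQL is the statement above):

  def get_or_create_key(cursor, hash_val, key_value):
      # Same SELECT-or-INSERT CTE as above, with the literals replaced
      # by named placeholders so psycopg2 handles quoting.
      cursor.execute(
          '''
          WITH s AS (
              SELECT id, hash, key FROM keys
              WHERE hash = %(hash)s
          ), i AS (
              INSERT INTO keys (hash, key)
              SELECT %(hash)s, %(key)s
              WHERE NOT EXISTS (SELECT 1 FROM s)
              RETURNING id, hash, key
          )
          SELECT id, hash, key FROM i
          UNION ALL
          SELECT id, hash, key FROM s;
          ''',
          {'hash': hash_val, 'key': key_value}
      )
      return cursor.fetchone()[0]  # id of the existing or newly inserted row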

But even something as simple as this statement causes deadlocks:

  INSERT INTO keys (hash, key) VALUES ('hash_value', 'key_value') RETURNING id; 
  • In both cases, if I get an exception because the inserted hash value is not unique, I use savepoints to roll back the change and a separate statement to just select the id I'm after.
  • I'm using hashes for the unique field because some of the keys and values are too long to be indexed directly.

Full python code example (using psycopg2) with savepoints:

  key_value = 'this_key'
  hash_val = generate_uuid(key_value)
  try:
      cursor.execute(
          '''
          SAVEPOINT duplicate_hash_savepoint;
          INSERT INTO keys (hash, key)
              VALUES (%s, %s)
              RETURNING id;
          ''',
          (hash_val, key_value)
      )
      result = cursor.fetchone()[0]
      cursor.execute('''RELEASE SAVEPOINT duplicate_hash_savepoint''')
      return result
  except psycopg2.IntegrityError:
      cursor.execute(
          '''ROLLBACK TO SAVEPOINT duplicate_hash_savepoint;'''
      )
      # TODO: Should ensure that values match and this isn't just
      # a hash collision
      cursor.execute(
          '''SELECT id FROM keys WHERE hash = %s LIMIT 1;''',
          (hash_val,)
      )
      return cursor.fetchone()[0]
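The generate_uuid helper isn't shown in the question. A plausible stand-in, assuming it simply derives a deterministic UUID from the text (my assumption, not the author's code):

  import uuid

  def generate_uuid(text):
      # Hypothetical implementation: hash the text into a stable UUID so
      # identical keys/values always map to the same unique hash column value.
      return str(uuid.uuid5(uuid.NAMESPACE_OID, text))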

Update: I believe I found the hint I needed on another StackExchange site:

In particular:

UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the command start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the second updater can proceed with updating the originally found row. If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row.

While I'm still not entirely sure where the mutual dependency is, it seems that processing large numbers of key/value pairs without committing could lead to something like this. Sure enough, if I commit after adding each individual configuration file, the deadlocks do not occur.
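A minimal sketch of that workaround, committing once per configuration file; the parsing follows the example file format above, and insert_pair is a hypothetical name for the savepoint routine shown earlier:

  def load_corpus(conn, config_paths):
      cursor = conn.cursor()
      for path in config_paths:
          # Example format above: one "key value" pair per line.
          with open(path) as f:
              for line in f:
                  key, _, value = line.strip().partition(' ')
                  insert_pair(cursor, key, value)  # savepoint routine above
          conn.commit()  # end the transaction per file: every index lock
                         # taken by these INSERTs is released here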

1 answer

It looks like you are in this situation:

  • The table to INSERT into has a primary key (or a unique index of any sort).
  • Several INSERTs into this table are performed within one transaction (as opposed to committing immediately after each one).
  • The rows to insert come in random order (with regard to the primary key).
  • The rows are inserted in concurrent transactions.

This situation creates the following opportunity for deadlock:

Assume there are two sessions, each of which has started a transaction.

  • Session #1: insert a row with PK 'A'
  • Session #2: insert a row with PK 'B'
  • Session #1: try to insert a row with PK 'B' => session #1 waits for session #2 to commit or roll back
  • Session #2: try to insert a row with PK 'A' => session #2 waits for session #1 to commit or roll back

Shortly thereafter, the deadlock detector notices that both sessions are now waiting for each other, and terminates one of them with a fatal deadlock detected error.
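This interleaving is easy to reproduce against the keys table from the question with two psycopg2 connections; a sketch (the DSN is an assumption, and the 32-hex-digit literals are just convenient UUID values):

  import threading
  import psycopg2

  DSN = 'dbname=test'  # hypothetical connection string

  conn1, conn2 = psycopg2.connect(DSN), psycopg2.connect(DSN)
  cur1, cur2 = conn1.cursor(), conn2.cursor()

  # Steps 1 and 2: each open transaction inserts its own row.
  cur1.execute("INSERT INTO keys (hash, key) VALUES (%s, 'A')", ('a' * 32,))
  cur2.execute("INSERT INTO keys (hash, key) VALUES (%s, 'B')", ('b' * 32,))

  # Steps 3 and 4 must run concurrently, since each INSERT blocks on the
  # other session until the deadlock detector aborts one of them.
  def step(cur, hash_val, key):
      try:
          cur.execute('INSERT INTO keys (hash, key) VALUES (%s, %s)',
                      (hash_val, key))
      except psycopg2.extensions.TransactionRollbackError as e:
          print('got:', e)  # "deadlock detected" lands on one session

  t1 = threading.Thread(target=step, args=(cur1, 'b' * 32, 'B'))
  t2 = threading.Thread(target=step, args=(cur2, 'a' * 32, 'A'))
  t1.start(); t2.start(); t1.join(); t2.join()
  conn1.rollback(); conn2.rollback()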

If you are in this scenario, the simplest solution is to COMMIT after each new record is inserted, before attempting to insert the next row into the table.
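In terms of the question's code, a sketch of that approach: each insert becomes its own transaction, which also makes the savepoints unnecessary (the function name is mine):

  import psycopg2

  def insert_key_committed(conn, hash_val, key_value):
      # Commit-per-row variant of the question's routine: the unique-index
      # lock taken by the INSERT is released at conn.commit(), so two
      # loader processes never hold locks on two different rows at once.
      cur = conn.cursor()
      try:
          cur.execute(
              'INSERT INTO keys (hash, key) VALUES (%s, %s) RETURNING id;',
              (hash_val, key_value)
          )
          row_id = cur.fetchone()[0]
          conn.commit()
          return row_id
      except psycopg2.IntegrityError:
          conn.rollback()  # duplicate hash: fetch the existing id instead
          cur.execute('SELECT id FROM keys WHERE hash = %s LIMIT 1;',
                      (hash_val,))
          return cur.fetchone()[0]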


Source: https://habr.com/ru/post/982096/

