Redshift: serializable isolation error (1023) despite LOCK

I perform several ETL batch operations in parallel in my Redshift cluster.

My pipeline does the following:

Do a bunch of work on a temporary staging table. At the end, write to the final table (permanent and shared across all processes) by doing:

BEGIN; LOCK table X; DELETE FROM X USING stage_table... INSERT INTO X ... END; 

However, when I have several processes in parallel, some fail:

ERROR: 1023 DETAIL: Serializable isolation violation on table - 142443, transactions forming the cycle are: 388224, 388226 (pid:32012)

(where 142443 is table X)

When I run the processes one at a time, everything works like a charm. I have used LOCK successfully in other processes (and verified that it worked as intended), so I am puzzled here. Any help appreciated!

2 answers

This is expected. Redshift runs transactions at the SERIALIZABLE isolation level, as the AWS documentation states:

Note: READ UNCOMMITTED, READ COMMITTED, and REPEATABLE READ have no operational impact and map to SERIALIZABLE in Amazon Redshift.

Specifically, this means that if you run concurrent transactions that are not serializable (that is, no serial ordering of them would produce the same results), you will get a serializable isolation violation error.
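
As a rough illustration of a non-serializable pair, here is a sketch using two hypothetical tables, tab1 and tab2 (neither is from the question). Each session reads the table the other one writes, so neither "session 1 first" nor "session 2 first" explains what both sessions saw. The statements are listed in the order they are issued.

```sql
-- Session 1
BEGIN;
SELECT * FROM tab1;            -- session 1 reads tab1

-- Session 2
BEGIN;
SELECT * FROM tab2;            -- session 2 reads tab2
INSERT INTO tab1 VALUES (1);   -- session 2 writes the table session 1 already read
END;

-- Session 1
INSERT INTO tab2 VALUES (1);   -- session 1 writes the table session 2 already read;
                               -- Redshift is expected to abort session 1 with error 1023
END;
```

Assuming this interleaving, the two transactions form the kind of cycle that the error message reports.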

By the way, Redshift gives you tools to figure out which queries conflict with each other. Using the transaction IDs from the error message above, you can run a query like this:

 select query, trim(querytxt) as sqlquery from stl_query where xid = 388224; 

Here 388224 is one of the two transaction IDs that form the cycle (the other is 388226).
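
A slightly broader sketch of the same lookup, covering both transaction IDs from the error at once and then checking the STL_TR_CONFLICT system table for conflicts logged against table X. The IDs and the table ID are the ones from the question's error message; substitute your own.

```sql
-- Every statement logged for the two transactions named in the error, in time order
select xid, query, starttime, trim(querytxt) as sqlquery
from stl_query
where xid in (388224, 388226)
order by starttime;

-- Conflicts recorded against table X (table ID 142443 in the error)
select xact_id, process_id, xact_start_ts, abort_time
from stl_tr_conflict
where table_id = 142443
order by xact_start_ts;
```

Reading the first result in time order should show which statement of one transaction touched the table before the other transaction committed.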


Lock your stage_table as well, since the rows affected by the DELETE depend on the contents of stage_table.
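
A minimal sketch of what that might look like, assuming stage_table can be locked in the same LOCK statement as X. The join column id and the SELECT * insert are hypothetical placeholders; the question elides the real DELETE and INSERT, so substitute your own.

```sql
BEGIN;
-- Take both locks up front, before any statement reads either table
LOCK TABLE X, stage_table;

-- Hypothetical join column "id"; replace with the pipeline's actual statements
DELETE FROM X USING stage_table WHERE X.id = stage_table.id;
INSERT INTO X SELECT * FROM stage_table;
END;
```

The idea is that LOCK is the first statement in the transaction, so nothing reads X or stage_table before the locks are held.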


Source: https://habr.com/ru/post/987291/

