Postgres freezes on a large table when trying to update multiple rows with a CASE statement

I am trying to update multiple lines using postgres, I am using this code:

UPDATE foobar SET column_a = CASE WHEN column_b = '123' THEN 1 WHEN column_b = '345' THEN 2 END; 

If I create a new table, it works flawlessly, but when I run this on a large table with 8 million rows, it hangs endlessly. I tried first in Admineer (web interface) as well as in console.

However, this works just fine:

 UPDATE foobar SET column_a=1 WHERE column_b='123'; 

I hesitate to implement this approach in my code, since I will have several thousand updates at once, and I would prefer to put them in one operator. Any ideas as to why the first example hangs postgres and the second will work fine? I just double checked, and I don't have the rules applicable to the table.

+4
source share
2 answers

The problem was ..

Statement:

 CASE WHEN column_b = '123' THEN 1 WHEN column_b = '345' THEN 2 END; 

.. just for:

 CASE WHEN column_b = '123' THEN 1 WHEN column_b = '345' THEN 2 ELSE NULL END 

Meaning, without a WHERE , your UPDATE not just β€œtrying”, it actually updates every row in the table, most of them are NULL .
Perhaps the NOT NULL on the column prevented data loss ...

The best decision:

I will have several thousand updates at once, and I would prefer to put them in one operator.

Much faster (and shorter) for large sets:

 UPDATE foobar f SET column_a = val.a FROM ( VALUES (123, 1) ,(345, 2) ) val(b, a) WHERE f.column_b = val.b 

Joining a set is superior to iterating through a long list of CASE branches for each row. The difference will grow rapidly with longer listings.

Also, be sure to indicate the index on column_b anyway.

You can replace the VALUES expression with any table, view, or subquery, giving the appropriate rows.

Note:
I assume that column_a and column_b are of type integer . In this case, single quotes around '123' in your question have never been helpful. You are better off using a numeric literal instead of a string literal. (Even though it also works with string literals.)

A string literal of type '123' defaults to unknown .
A numeric literal, such as 123 , defaults to integer - or bigint / numeric if the number is too large.

If you were dealing with non-standard data types, you will have to explicitly specify. It will look like this:

 ... FROM ( VALUES ('123'::sometype, '1'::sometype) -- first row defines row type ,('345', '2') ) val(b, a) ... 
+5
source

I ask this question in case anyone encounters this problem.

This request was the culprit:

 UPDATE foobar SET column_a = CASE WHEN column_b = '123' THEN 1 WHEN column_b = '345' THEN 2 END; 

The problem is that it does not have a WHERE statement, so it tries to update all rows. With large databases this can be a problem, in my case it is just timed. As soon as I added a where expression there, he fixed the problem.

Here's the solution:

 UPDATE foobar SET column_a = CASE WHEN column_b = '123' THEN 1 WHEN column_b = '345' THEN 2 END WHERE column_b IN ('123','345') 
+2
source

Source: https://habr.com/ru/post/1502144/


All Articles