Shuffling the values of a column in PostgreSQL

In a table with more than 100k rows, how can I efficiently shuffle the values of a specific column?

Table definition:

CREATE TABLE person
(
  id integer NOT NULL,
  first_name character varying,
  last_name character varying,
 CONSTRAINT person_pkey PRIMARY KEY (id)
)

To anonymize the data, I have to shuffle the values of the "first_name" column in place (I am not allowed to create a new table).

My attempt:

with
first_names as (
select row_number() over (order by random()),
       first_name as new_first_name
from person
),
ids as (
select row_number() over (order by random()), 
       id as ref_id
from person
)
update person
set first_name = new_first_name
from first_names, ids
where id = ref_id;

It takes a few hours.

Is there an efficient way to do this?

+4
2 answers

The problem with Postgres is that each update means a delete plus an insert.

  • You can run EXPLAIN ANALYZE on a SELECT instead of the UPDATE to find out how the CTEs themselves perform.
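As a rough sketch of that check, the statement below turns the question's UPDATE into a SELECT and asks only for the plan. Plain EXPLAIN plans a statement without executing it, so it returns quickly even when the query itself would not. The select list (ref_id, new_first_name) is chosen here for illustration; the CTEs and the unjoined FROM list are copied from the question.

```sql
-- Plain EXPLAIN only plans the statement; nothing is executed or modified.
explain
with
first_names as (
    select row_number() over (order by random()) as rn,
           first_name as new_first_name
    from person
),
ids as (
    select row_number() over (order by random()) as rn,
           id as ref_id
    from person
)
select ref_id, new_first_name
from first_names, ids;   -- same unjoined FROM list as in the question
```

The estimated row count on the top plan node shows how many row combinations the two CTEs produce together. With no join condition between them, that estimate is expected to be close to the square of the table's row count.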

Another option, where it is allowed, is to recreate the table:

CREATE TABLE new_table AS 
     SELECT * ....


DROP TABLE old_table;

ALTER TABLE new_table RENAME TO old_table;

-- Recreate the indexes and constraints on the renamed table


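EDIT (credit to a_horse_with_no_name): the two CTEs in your query are never joined to each other, so every row of one is combined with every row of the other. Join them on the row number: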

with
first_names as (
    select row_number() over (order by random()) rn,
           first_name as new_first_name
    from person
),
ids as (
    select row_number() over (order by random()) rn, 
           id as ref_id
    from person
)
update person
set first_name = new_first_name
from first_names
join ids
  on first_names.rn = ids.rn
where id = ref_id;

Then check the result with ANALYZE / EXPLAIN.
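One way to time the joined UPDATE without keeping its changes is to run it under EXPLAIN ANALYZE inside a transaction that is rolled back. This is only a sketch: EXPLAIN ANALYZE really executes the statement, so the rollback is what discards the shuffled values, and the BUFFERS option is an optional extra added here.

```sql
begin;

-- EXPLAIN ANALYZE executes the UPDATE and reports actual timings per plan node.
explain (analyze, buffers)
with
first_names as (
    select row_number() over (order by random()) as rn,
           first_name as new_first_name
    from person
),
ids as (
    select row_number() over (order by random()) as rn,
           id as ref_id
    from person
)
update person
set first_name = new_first_name
from first_names
join ids
  on first_names.rn = ids.rn
where id = ref_id;

rollback;  -- discard the changes made by the timed run
```

Once the timing looks acceptable, the same UPDATE can be run on its own and committed.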

+4

Timing: 5 (unit missing in the source), on 500 000 rows:

with names as (
  select id, first_name, last_name,
         lead(first_name) over w as first_1,
         lag(first_name) over w as first_2
  from person
  window w as (order by random())
)
update person
  set first_name = coalesce(first_1, first_2)
from names 
where person.id = names.id;
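To see what the lead()/lag() pair assigns, here is a small sketch on four literal values. The values and the column alias v are made up for illustration, and ORDER BY v replaces ORDER BY random() so that the result is reproducible.

```sql
-- Four literal values stand in for person.first_name (illustrative only).
select v                         as old_value,
       coalesce(lead(v) over w,
                lag(v)  over w)  as new_value
from (values ('a'), ('b'), ('c'), ('d')) as t(v)
window w as (order by v)
order by v;

-- old_value | new_value
-- a         | b
-- b         | c
-- c         | d
-- d         | c    <- lead() is NULL on the last row, so lag() is used
```

Each row takes the value of the next row in the window order, and the last row falls back to the previous one. In this sketch 'c' is therefore assigned twice and 'a' is not assigned at all, so the query moves values between rows but is not a strict permutation of the column.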


SQLFiddle: http://sqlfiddle.com/#!15/15713/1


+2

Source: https://habr.com/ru/post/1614661/

