Postgres - Running a large jsonb column

We use the postgres jsonb type in one of our DB tables. The table structure is shown below:

CREATE TABLE T ( id UUID NOT NULL PRIMARY KEY, payload JSONB ); CREATE INDEX ON T USING gin (payload jsonb_path_ops); 

Payload is a complex json string. The following is one example:

 { "business": { "taxId": "626642071", "legalName": "Jon Deli", "phoneNumbers": [ { "phoneType": "Business", "telephoneNumber": "8384407555" }, { "phoneType": "Work", "telephoneNumber": "6032255248" } ], "addresses": [ { "city": "San Francisco", "state": "CA", "postalCode": "94101", "countryCode": "USA", "addressLine1": "123 Market St" } ] }, "stakeholders": [ { "person": { "taxId": "540646815", "firstName": "GdXFouh", "lastName": "IlUAcgCGz", "dateOfBirth": "1980-12-11", "emailAddress": " jywxsijgix@qaqmlz.com ", "phoneNumbers": [ { "phoneType": "Mobile", "telephoneNumber": "4901371573" } ], "addresses": [ { "city": "San Francisco", "state": "CA", "postalCode": "94101", "countryCode": "USA", "addressLine1": "123 Market St" } ] } } ] } 

Note that phoneNumbers , addresses and stakeholders are arrays, which means that there can be several elements in the array.

I am trying to insert a million rows into a table. Each Payload field is randomly generated. Initially, the testing program is very fast. But after inserting about 800,000 lines, it gets stuck every 1000 lines - insert 1000 lines, then the testing program hangs for 2 minutes, then it comes back and inserts another 1000 lines, ...

We suspect this is due to the huge number of jsonb index updates. Since the index for a single row is updated many fields. We just want to confirm that someone has encountered the same problem.


In fact, we do not need to index the entire Payload column. Only certain fields are required: business->taxId , business->phoneNumbers-> telephoneNumber , stakeholders->person->taxId and stakeholders->person->emailAddress .

I tried the following two indexes:

 CREATE INDEX ON T USING gin ((payload->'business'->'taxId') jsonb_path_ops); CREATE INDEX ON T USING gin ((payload ->'stakeholders'->'person'->'taxId') jsonb_path_ops); 

And run two statements:

 explain select * from T where payload->'business'->'taxId' @> '"123456789"'; (1) explain select * from T where payload->'stakeholders'->'person'->'taxId' @> '"123456789"'; (2) 

The first statement uses an index. But the second one does a full table scan, which is very slow. Therefore, we refer to the index of the entire Payload column.


Any suggestion is welcome.

By the way, we are using Postgres 9.5.4.

+5
source share
1 answer

Your request:

 select * from T where payload->'stakeholders'->'person'->'taxId' @> '"123456789"'; 

does not work. This is because β€œstakeholders” are arrays. Completed request:

 select * from T where payload->'stakeholders' @> '[{"person": {"taxId": "54"}}]'::jsonb 

But in this case, postgres can use the usage index for all interested parties.

  QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Bitmap Heap Scan on t (cost=1388.08..1425.90 rows=10 width=36) (actual time=1.959..1.959 rows=1 loops=1) Recheck Cond: ((payload -> 'stakeholders'::text) @> '[{"person": {"taxId": "54"}}]'::jsonb) Heap Blocks: exact=1 -> Bitmap Index Scan on t_expr_idx3 (cost=0.00..1388.08 rows=10 width=0) (actual time=1.946..1.946 rows=1 loops=1) Index Cond: ((payload -> 'stakeholders'::text) @> '[{"person": {"taxId": "54"}}]'::jsonb) Planning time: 0.071 ms Execution time: 1.978 ms 

To use a more specific index, I use a modified approach: How do you create a Postgresql JSONB array in an array index?

 CREATE OR REPLACE FUNCTION extract_taxids(a_json jsonb). RETURNS jsonb AS $BODY$ SELECT jsonb_agg(j) FROM (SELECT jsonb_array_elements(a_json->'stakeholders')->'person'->'taxId' AS j) AS j $BODY$ LANGUAGE sql IMMUTABLE; CREATE INDEX ON T USING gin (extract_taxids(payload)); 

And vuala:

 EXPLAIN ANALYZE select * from T where extract_taxids(payload) @> '["54"]'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on t (cost=12.08..52.38 rows=10 width=36) (actual time=0.101..0.102 rows=1 loops=1) Recheck Cond: (extract_taxids(payload) @> '["54"]'::jsonb) Heap Blocks: exact=1 -> Bitmap Index Scan on t_extract_taxids_idx (cost=0.00..12.07 rows=10 width=0) (actual time=0.008..0.008 rows=1 loops=1) Index Cond: (extract_taxids(payload) @> '["54"]'::jsonb) Planning time: 0.128 ms Execution time: 0.117 ms 
+3
source

Source: https://habr.com/ru/post/1263712/


All Articles