We use the postgres jsonb type in one of our DB tables. The table structure is shown below:
CREATE TABLE T ( id UUID NOT NULL PRIMARY KEY, payload JSONB ); CREATE INDEX ON T USING gin (payload jsonb_path_ops);
Payload is a complex json string. The following is one example:
{ "business": { "taxId": "626642071", "legalName": "Jon Deli", "phoneNumbers": [ { "phoneType": "Business", "telephoneNumber": "8384407555" }, { "phoneType": "Work", "telephoneNumber": "6032255248" } ], "addresses": [ { "city": "San Francisco", "state": "CA", "postalCode": "94101", "countryCode": "USA", "addressLine1": "123 Market St" } ] }, "stakeholders": [ { "person": { "taxId": "540646815", "firstName": "GdXFouh", "lastName": "IlUAcgCGz", "dateOfBirth": "1980-12-11", "emailAddress": " jywxsijgix@qaqmlz.com ", "phoneNumbers": [ { "phoneType": "Mobile", "telephoneNumber": "4901371573" } ], "addresses": [ { "city": "San Francisco", "state": "CA", "postalCode": "94101", "countryCode": "USA", "addressLine1": "123 Market St" } ] } } ] }
Note that phoneNumbers , addresses and stakeholders are arrays, which means that there can be several elements in the array.
I am trying to insert a million rows into a table. Each Payload field is randomly generated. Initially, the testing program is very fast. But after inserting about 800,000 lines, it gets stuck every 1000 lines - insert 1000 lines, then the testing program hangs for 2 minutes, then it comes back and inserts another 1000 lines, ...
We suspect this is due to the huge number of jsonb index updates. Since the index for a single row is updated many fields. We just want to confirm that someone has encountered the same problem.
In fact, we do not need to index the entire Payload column. Only certain fields are required: business->taxId , business->phoneNumbers-> telephoneNumber , stakeholders->person->taxId and stakeholders->person->emailAddress .
I tried the following two indexes:
CREATE INDEX ON T USING gin ((payload->'business'->'taxId') jsonb_path_ops); CREATE INDEX ON T USING gin ((payload ->'stakeholders'->'person'->'taxId') jsonb_path_ops);
And run two statements:
explain select * from T where payload->'business'->'taxId' @> '"123456789"'; (1) explain select * from T where payload->'stakeholders'->'person'->'taxId' @> '"123456789"'; (2)
The first statement uses an index. But the second one does a full table scan, which is very slow. Therefore, we refer to the index of the entire Payload column.
Any suggestion is welcome.
By the way, we are using Postgres 9.5.4.