PostgreSQL OFFSET behavior with a json column

Using PostgreSQL 9.4, we have a simple contacts table with (id text not null (as pk), blob json) to experiment with porting a CouchDB CRM database. Eventually we will split the data into more columns and model it more idiomatically for the RDBMS, but that is beside the point for now.
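
Concretely, the table is roughly this (reconstructed from the column list above, since the question does not give the exact DDL):

 -- sketch of the experimental schema described above
 CREATE TABLE couchcontacts (
     id   text NOT NULL PRIMARY KEY,
     blob json
 );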

There are about 100,000 rows.

I'm aware that hardcore Postgres experts advise against using OFFSET, but I can accept a small performance penalty (happy with anything under 100 ms):

SELECT id FROM couchcontacts OFFSET 10000 LIMIT 10 

As expected, this takes <10 ms.

 SELECT blob->>'firstName' FROM couchcontacts LIMIT 10 

Also takes <10 ms (presumably 10 JSON-decode operations on the blob column here).

 SELECT blob->>'firstName' FROM couchcontacts OFFSET 10000 LIMIT 10 

Takes over 10 seconds! Known inefficiencies of OFFSET aside, why does this apparently trigger 10,010 JSON-decode operations? Since the projection has no side effects, I don't understand why it can't be fast.

Is this a limitation of the json functionality being relatively new to Postgres, leaving it unable to determine that the ->> operator has no side effects?

Interestingly, rewriting the query as follows brings it back under 10 ms:

 SELECT blob->>'firstName' FROM couchcontacts WHERE id IN (SELECT id FROM couchcontacts OFFSET 10000 LIMIT 10) 

Is there a way to make OFFSET skip records without JSON-decoding them (i.e., without evaluating the SELECT projection for the skipped rows)?
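
(I realize the textbook alternative is keyset pagination, which skips no rows at all; a sketch is below, with $1 standing for the last id of the previous page. But I would still like to understand the OFFSET behavior.)

 -- keyset pagination sketch; $1 is the last id already shown
 SELECT id, blob->>'firstName'
 FROM couchcontacts
 WHERE id > $1
 ORDER BY id
 LIMIT 10;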

 "Limit (cost=1680.31..1681.99 rows=10 width=32) (actual time=12634.674..12634.842 rows=10 loops=1)" " -> Seq Scan on couchcontacts (cost=0.00..17186.53 rows=102282 width=32) (actual time=0.088..12629.401 rows=10010 loops=1)" "Planning time: 0.194 ms" "Execution time: 12634.895 ms" 
1 answer

I've done some testing and see similar behavior. These all show negligible performance differences:

  • select id ...
  • select indexed_field ...
  • select unindexed_field ...
  • select json_field ...
  • select * ...

This, however, shows a difference in performance:

  • select json_field->>'key' ...

When json_field is null, the performance impact is negligible. When it is empty, things degrade slightly. When it is populated, performance degrades noticeably. And when the field holds large documents, it degrades dramatically.
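
A minimal way to reproduce this, as a sketch (the table name, key, and payload size here are invented for illustration):

 -- scratch table with a synthetic json payload
 CREATE TABLE t (id serial PRIMARY KEY, j json);
 INSERT INTO t (j)
 SELECT json_build_object('key', repeat('x', 1000))
 FROM generate_series(1, 100000);

 -- negligible cost: no json operator in the projection
 EXPLAIN ANALYZE SELECT j FROM t OFFSET 10000 LIMIT 10;
 -- degrades with payload size: ->> is evaluated per visited row
 EXPLAIN ANALYZE SELECT j->>'key' FROM t OFFSET 10000 LIMIT 10;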

In other words, Postgres appears to deserialize the json data for every row it visits. (This is most likely a bug, and one that hits Rails developers hard, given how heavily they use json.)

FWIW, I noticed that rearranging the query to use a CTE works around the problem:

 with data as ( select * from table offset 10000 limit 10 ) select json_field->>'key' from data; 

(It may even yield a marginally better plan than the id IN (...) query you showed.)
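
Applied to the table from the question, the workaround would read:

 WITH data AS (
     SELECT * FROM couchcontacts OFFSET 10000 LIMIT 10
 )
 SELECT blob->>'firstName' FROM data;

In 9.4 a CTE acts as an optimization fence and is materialized, so the ->> projection only ever sees the 10 rows the CTE returns.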

