PostgreSQL slows down on a large table with arrays and lots of updates

I have a rather large table (20M records) that has a three-column index and an array column. The array column is updated daily (new values are appended) for all rows. There are also inserts, but far fewer than updates.

The data in the array are daily measurements corresponding to the three indexed keys, something like this: [[date_id_1, my_value_for_date_1], [date_id_2, my_value_for_date_2]]. It is used to plot these daily values. Say I want to visualize the value for the key (a, b, c) over time; I run SELECT values FROM t WHERE a = my_a AND b = my_b AND c = my_c and then use the values array to draw the graph.
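A simplified sketch of the table and the query (names and types are illustrative only; I use daily_values here because values is a reserved word and would need quoting):

 CREATE TABLE t (
     a            integer NOT NULL,
     b            integer NOT NULL,
     c            integer NOT NULL,
     daily_values integer[],   -- [[date_id_1, value_1], [date_id_2, value_2], ...]
     PRIMARY KEY (a, b, c)
 );

 -- the chart query for one key:
 SELECT daily_values FROM t WHERE a = 1 AND b = 2 AND c = 3;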

The performance of updates (which occur once per day) has deteriorated significantly over time.

Using PostgreSQL 8.3.8.

Can you give me any hints on where to look for a solution? It could be anything, from tuning some PostgreSQL parameters to moving to another database (I suspect a non-relational database would suit this particular table better, but I have little experience with them). A.

+8
performance optimization postgresql
Jun 23 '10 at 8:32
4 answers

I would look at the FILLFACTOR on the table. By default it is 100; you can lower it to 70 (for starters). After that, you need to run VACUUM FULL to rebuild the table.

 ALTER TABLE tablename SET (FILLFACTOR = 70);
 VACUUM FULL tablename;
 REINDEX TABLE tablename;

This gives UPDATE a chance to place the updated copy of a row on the same page as the original, which is more efficient than placing it on another page. Or, if your table is already somewhat fragmented from the large number of previous updates, it may already be sparse enough. Your database can now also do HOT updates, assuming the column being updated is not part of any index.
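If you want to check whether the daily update is actually going HOT after lowering the fillfactor, the statistics views can tell you (a quick look, assuming track_counts is enabled):

 -- compare total updates vs. HOT updates for the table
 SELECT relname, n_tup_upd, n_tup_hot_upd
 FROM pg_stat_user_tables
 WHERE relname = 'tablename';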

+16
Jun 23 '10 at 8:56

Not sure if arrays are the way to go here.

Why not store them in a separate table (one value plus the keys per row)? Then your daily bulk update becomes pure insert activity.
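As a rough sketch of what I mean (table and column names are made up):

 CREATE TABLE measurements (
     a       integer NOT NULL,
     b       integer NOT NULL,
     c       integer NOT NULL,
     date_id integer NOT NULL,
     value   double precision NOT NULL,
     PRIMARY KEY (a, b, c, date_id)
 );

 -- the daily load is then plain inserts:
 INSERT INTO measurements (a, b, c, date_id, value) VALUES (1, 2, 3, 20100623, 42.0);

 -- and the chart query becomes:
 SELECT date_id, value
 FROM measurements
 WHERE a = 1 AND b = 2 AND c = 3
 ORDER BY date_id;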

+2
Jun 23 '10 at 8:37

The problem is the updates. Change the schema from the array to individual rows per day and the performance problem will disappear.

You can still roll the rows up into arrays with some kind of cron job if something needs that format, but avoid the in-place updates.
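For example, assuming a per-row table like measurements(a, b, c, date_id, value) as suggested in another answer, a nightly job could rebuild a throwaway cache table of arrays instead of updating rows in place (ARRAY(subquery) works on 8.3; array_agg() only exists from 8.4 onwards; all names here are assumptions):

 -- rebuilt by cron, e.g. nightly; no in-place updates of the big table
 CREATE TABLE t_arrays AS
 SELECT k.a, k.b, k.c,
        ARRAY(SELECT m.value
              FROM measurements m
              WHERE m.a = k.a AND m.b = k.b AND m.c = k.c
              ORDER BY m.date_id) AS daily_values
 FROM (SELECT DISTINCT a, b, c FROM measurements) k;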

+2
Jun 23 '10 at 11:11

Well, a three-column index is nothing to worry about. It does not necessarily make things much slower. But that array column could really be a problem. You say you append values to the array column daily. By appending, do you mean appending values for all 20 million records in the table, or only for some of them?

The situation is not entirely clear to me, but I would suggest exploring ways to get rid of that column, for example by making it a separate table. That depends on your situation, though, and may not be an option. It may just be me, but I always feel "dirty" having such a column in one of my tables, and most of the time there is a better solution for the problem you are trying to solve with an array column. There are, of course, situations where such a column is valid, but at the moment I can't think of one. Certainly not in a table with 20 million records.

+1
Jun 23 '10 at 8:38


