Postgres aggregate performance

I've noticed some issues with simple aggregate performance in Postgres (8.3). If I have a large table (say, 200M rows) that is unique on (customer_id, order_id), then the query select customer_id, max(order_id) from larger_table group by customer_id is more than an order of magnitude slower than a simple Java/JDBC program that does the following:

1) Initialize an empty HashMap customerMap (this will map customer_id → maximum order_id)
2) Execute "select customer_id, order_id from larger_table" and get a streaming result set
3) Iterate over the result set and, for each row, do the following:

    long id = resultSet.getLong("customer_id");
    long order = resultSet.getLong("order_id");
    if (!customerMap.containsKey(id))
        customerMap.put(id, order);
    else
        customerMap.put(id, Math.max(order, customerMap.get(id)));
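For anyone who wants to try the client-side logic without a database, here is a minimal sketch of the same loop. It is not the original program: the JDBC result set is replaced by an in-memory array of {customer_id, order_id} pairs, the class and method names are made up for illustration, and Map.merge with Long::max assumes Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone version of the client-side aggregation.
final class MaxOrderPerCustomer {

    // Each row is {customer_id, order_id}, standing in for one ResultSet row.
    static Map<Long, Long> maxOrderPerCustomer(long[][] rows) {
        Map<Long, Long> customerMap = new HashMap<>();
        for (long[] row : rows) {
            // merge() stores the value if the key is absent,
            // otherwise keeps the larger of the old and new values.
            customerMap.merge(row[0], row[1], Long::max);
        }
        return customerMap;
    }

    public static void main(String[] args) {
        long[][] rows = { {1, 10}, {2, 5}, {1, 7}, {2, 9} };
        System.out.println(maxOrderPerCustomer(rows));
    }
}
```

This is a single pass that builds a hash table keyed by customer_id, which is the same idea as a hash aggregate inside the database.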

Is such a difference in performance expected? I wouldn't have thought so, since as I understand it this is very close to what happens internally. Is this evidence that something is wrong or misconfigured in the database?

1 answer

Your work_mem setting is probably too low. That is the first thing I would check; it bit me recently. The second most likely problem is that you are missing a foreign key index.

A longer explanation follows.

In general, there are a few questions to ask when database performance looks poor:

  • Are you using the latest version? Every release between 7.4 and 9.0 brought a significant performance improvement; if upgrading is possible, it is recommended.
  • Are you testing with realistic data? The PostgreSQL query planner will produce different plans for the same table with different data or different amounts of data. Make sure you always test with realistic data.
  • What is your PostgreSQL configuration? The work_mem parameter is set low out of the box. I have run into situations involving GROUP BY myself where the planner chose the wrong plan simply because it did not think it had enough working memory to sort the results.
  • Is your Java code running on the same machine as your database? If not, you may be seeing differences between the machines rather than differences between the approaches.
  • Are you missing an index? PostgreSQL does not automatically create indexes for foreign keys, only for primary keys. I have been bitten by this too, but if you search Google you can find a script that detects and adds missing foreign key indexes.

Without looking at the query plan, there is little point in second-guessing which execution strategy PostgreSQL has chosen for the given query.
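As a rough sketch of how the plan could be inspected from the same JDBC client, the helper below raises work_mem for the current session and then collects the output of EXPLAIN ANALYZE. The class and method names are hypothetical, the work_mem value is whatever the caller passes in (something like "256MB" would be an illustrative guess, not a recommendation), and an already open java.sql.Connection to the database is assumed. Note that EXPLAIN ANALYZE actually executes the query, so on a 200M-row table it will take as long as the query itself.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for comparing plans under different work_mem settings.
final class PlanCheck {

    // The aggregate from the question.
    static final String QUERY =
        "select customer_id, max(order_id) from larger_table group by customer_id";

    // Session-level override only; postgresql.conf is not changed.
    // The value is concatenated as-is, so pass only trusted input.
    static String setWorkMem(String value) {
        return "SET work_mem = '" + value + "'";
    }

    static String explainAnalyze(String query) {
        return "EXPLAIN ANALYZE " + query;
    }

    // Returns the plan as text, one line per list element.
    static List<String> planLines(Connection conn, String workMem, String query)
            throws SQLException {
        List<String> lines = new ArrayList<>();
        try (Statement st = conn.createStatement()) {
            st.execute(setWorkMem(workMem));
            // EXPLAIN returns a result set with a single text column.
            try (ResultSet rs = st.executeQuery(explainAnalyze(query))) {
                while (rs.next()) {
                    lines.add(rs.getString(1));
                }
            }
        }
        return lines;
    }
}
```

Calling PlanCheck.planLines(conn, value, PlanCheck.QUERY) with a small and then a larger work_mem value and comparing the output should show whether the planner switches between a sort-based and a hash-based aggregate.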


Source: https://habr.com/ru/post/1396551/

