Struggling with N+1 query optimization in Hibernate

Question

I am trying to fix an N+1 query problem in a project I am working on. I use Hibernate with the model shown below, and I want to query all the items belonging to a portfolio, including the two latest prices for each item (the price on a given date and the previous price).

[Image: data model diagram]

API Example:

List<Items> items = findItemsWithLatestTwoPrices(portfolio, latestPriceDate);

Currently, I use one query to retrieve all the items associated with the portfolio, and then iterate over these items, querying the latest two prices for each one (hence N+1).

I tried to express this in native SQL using a correlated subquery, but the performance was terrible. This, and the fact that new prices arrive every day (so the query keeps getting slower), made me think that I need a different model, but I'm struggling to come up with a model whose performance is reasonable and stays constant over time as the number of prices grows.

I was thinking of different solutions, including representing prices as linked lists or using some kind of tree, but I think there are better alternatives. Am I missing something? Has anyone working on a similar problem come up with a good solution?

I don’t care whether I use HQL or native SQL, as long as the performance is decent. I am also open to making changes to the model.

Thanks!

[Edit]

Since I have more than two years of price data, and there may be more than 1000 items per portfolio, fetching the entire object graph is probably not a good idea. I also need random access by date, so storing the two prices as fields on the item is unfortunately not an option.

+6

optimization data-structures orm hibernate query-optimization

ebaxt Jul 11 '11 at 21:28

4 answers

You should try to get the items and prices in one request. If you do this, you can iterate over items and their prices without making selections for each item. Then your n + 1 problem will disappear.

For example, you can use eager fetching (such as a fetch join) in your query or in the mapping of your association.
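A minimal sketch of the single-query idea, assuming hypothetical names throughout: the entity and property names in the HQL string (`Item`, `prices`, `portfolio`) and the `PriceRow` stand-in are illustrations, not taken from the question. Once all rows arrive in one result set, they can be grouped by item in memory without any further selects. Note that a plain fetch join loads every price of every item, so it only fits when the price history is small.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SingleQueryGrouping {

    // Hypothetical HQL; Item, prices and portfolio are assumed names.
    static final String FETCH_JOIN_HQL =
        "select distinct i from Item i join fetch i.prices where i.portfolio = :portfolio";

    // Plain stand-in for one (item, price) row of a single result set.
    static final class PriceRow {
        final long itemId;
        final LocalDate date;
        final BigDecimal amount;

        PriceRow(long itemId, LocalDate date, BigDecimal amount) {
            this.itemId = itemId;
            this.date = date;
            this.amount = amount;
        }
    }

    // Groups the rows by item id, preserving the order in which items first appear.
    static Map<Long, List<PriceRow>> groupByItem(List<PriceRow> rows) {
        Map<Long, List<PriceRow>> byItem = new LinkedHashMap<>();
        for (PriceRow row : rows) {
            byItem.computeIfAbsent(row.itemId, id -> new ArrayList<>()).add(row);
        }
        return byItem;
    }

    public static void main(String[] args) {
        List<PriceRow> rows = Arrays.asList(
            new PriceRow(1L, LocalDate.of(2011, 7, 11), new BigDecimal("10.50")),
            new PriceRow(1L, LocalDate.of(2011, 7, 10), new BigDecimal("10.20")),
            new PriceRow(2L, LocalDate.of(2011, 7, 11), new BigDecimal("99.00")));
        // Iterating the grouped map issues no additional queries.
        groupByItem(rows).forEach((itemId, prices) ->
            System.out.println(itemId + " has " + prices.size() + " price(s)"));
    }
}
```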

Regarding performance as the number of prices grows: perhaps you can store the two latest prices in one or two additional fields of your item class. Then you can always read those fields and lazily load the older prices from the collection only when you need them.

0

GeorgeG Jul 11 '11 at 10:08

You can try several options:

Since your prices are date-based, you could look into partitioning your data in the DB by month. This should help considerably, because the number of rows scanned when looking up prices drops significantly instead of covering all two years of prices. After that, try the SQL query. Also run EXPLAIN to make sure you are hitting the desired indexes, etc.
Have you considered caching (e.g. Memcache)? You can preload the current and previous prices of your items into the cache. Then you can fetch the portfolio and its items and look up the prices in the cache, which should be pretty fast.
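The caching idea can be sketched as follows. This is an in-process stand-in, not a Memcache client: the `LatestPriceCache` class and its methods are hypothetical and only illustrate keeping the two most recent prices per item available for lookup. It covers the "current and previous price" case only, not random access by date.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class LatestPriceCache {

    // Hypothetical value type for one cached price.
    static final class Price {
        final LocalDate date;
        final BigDecimal amount;

        Price(LocalDate date, BigDecimal amount) {
            this.date = date;
            this.amount = amount;
        }
    }

    private final Map<Long, List<Price>> latestByItem = new ConcurrentHashMap<>();

    // Records a price and keeps only the two most recent entries for the item.
    void put(long itemId, Price price) {
        latestByItem.compute(itemId, (id, current) -> {
            List<Price> updated = current == null ? new ArrayList<>() : new ArrayList<>(current);
            updated.add(price);
            updated.sort(Comparator.comparing((Price p) -> p.date).reversed());
            return updated.size() > 2 ? new ArrayList<>(updated.subList(0, 2)) : updated;
        });
    }

    // Returns the cached prices, newest first; empty if nothing was loaded for the item.
    List<Price> get(long itemId) {
        return latestByItem.getOrDefault(itemId, Collections.emptyList());
    }

    public static void main(String[] args) {
        LatestPriceCache cache = new LatestPriceCache();
        cache.put(7L, new Price(LocalDate.of(2011, 7, 9), new BigDecimal("1.00")));
        cache.put(7L, new Price(LocalDate.of(2011, 7, 11), new BigDecimal("1.20")));
        cache.put(7L, new Price(LocalDate.of(2011, 7, 10), new BigDecimal("1.10")));
        for (Price p : cache.get(7L)) {
            System.out.println(p.date + " " + p.amount);
        }
    }
}
```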

0

isobar Jul 13 '11 at 3:28

If you use PostgreSQL or Oracle, you can easily apply an analytic (window) function to the prices when you join them, keeping only the first two rows per item. As long as the column used in the ORDER BY is indexed, this should give good performance.
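A sketch of what such a query could look like, held as a Java string so it can be passed to a native query. The schema is an assumption: tables `item(id, portfolio_id)` and `price(item_id, price_date, amount)` are hypothetical names, and the two `?` placeholders stand for the portfolio id and the latest price date. The `latestTwoPerItem` method mirrors the same ranking logic in memory, purely to show what the window function computes.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LatestTwoPrices {

    // Hypothetical schema: item(id, portfolio_id), price(item_id, price_date, amount).
    static final String LATEST_TWO_SQL =
          "SELECT item_id, price_date, amount FROM ("
        + "  SELECT p.item_id, p.price_date, p.amount,"
        + "         ROW_NUMBER() OVER (PARTITION BY p.item_id ORDER BY p.price_date DESC) AS rn"
        + "  FROM price p JOIN item i ON i.id = p.item_id"
        + "  WHERE i.portfolio_id = ? AND p.price_date <= ?"
        + ") ranked WHERE rn <= 2 ORDER BY item_id, price_date DESC";

    // Plain stand-in for one row of the price table.
    static final class PriceRow {
        final long itemId;
        final LocalDate date;
        final BigDecimal amount;

        PriceRow(long itemId, LocalDate date, BigDecimal amount) {
            this.itemId = itemId;
            this.date = date;
            this.amount = amount;
        }
    }

    // In-memory mirror of the query: the two most recent prices per item on or before asOf.
    static Map<Long, List<PriceRow>> latestTwoPerItem(List<PriceRow> rows, LocalDate asOf) {
        List<PriceRow> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((PriceRow r) -> r.date).reversed()); // ORDER BY price_date DESC
        Map<Long, List<PriceRow>> ranked = new TreeMap<>();
        for (PriceRow row : sorted) {
            if (row.date.isAfter(asOf)) {
                continue; // price_date <= ?
            }
            List<PriceRow> top = ranked.computeIfAbsent(row.itemId, id -> new ArrayList<>()); // PARTITION BY item_id
            if (top.size() < 2) {
                top.add(row); // rn <= 2
            }
        }
        return ranked;
    }

    public static void main(String[] args) {
        List<PriceRow> rows = Arrays.asList(
            new PriceRow(1L, LocalDate.of(2011, 7, 8), new BigDecimal("10.00")),
            new PriceRow(1L, LocalDate.of(2011, 7, 11), new BigDecimal("10.50")),
            new PriceRow(1L, LocalDate.of(2011, 7, 9), new BigDecimal("10.20")));
        latestTwoPerItem(rows, LocalDate.of(2011, 7, 11)).forEach((itemId, prices) ->
            prices.forEach(p -> System.out.println(itemId + " " + p.date + " " + p.amount)));
    }
}
```

Because the date filter is a bind parameter, the same query serves random access by date as well as the "latest" case.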

PS: Next time, if you say you are willing to use native SQL, include the DB vendor and version.

0

TC1 Jul 26 '11 at 12:11

Anders s · Accepted Answer · 2011-07-14T07:09:23+0000

I am not sure I have understood all your problems, but as you have probably figured out, there is no easy solution to this with Hibernate. It comes down to your domain modeling. I think it is better to separate the normal case from the special cases. You can either model them in your regular domain model or use special views for the special cases.

To get the n latest prices, have you tried setting the batch size on the relation? Make it an ordered relation (latest first), and then set the batch size to about 10. Hibernate would then query in batches of 10, and with indexes on the foreign key and the order column it should perform well in most cases.

It also seems to me that you can keep additional relations alongside the full set. Do not be afraid to explicitly model important relations, such as “prices for the latest months”, even though this duplicates data and duplication in the database should be avoided in most cases.

For your date-based random access, it sounds like you are best served by a custom query rather than access through the domain model. If that is too slow, consider using the second-level cache, but I assume your access pattern will not benefit from it.
