Is it a good idea to store subtotal fields in the database?

I have a MySQL table that represents a list of orders and a related child table that represents the shipments associated with each order (some orders have more than one shipment, but most have only one).

Each shipment has a number of costs, for example:

  • ItemCost
  • ShippingCost
  • HandlingCost
  • TaxCost

There are many places in the application where I need to get consolidated order information, for example:

  • TotalItemCost
  • TotalShippingCost
  • TotalHandlingCost
  • TotalTaxCost
  • TotalCost
  • TotalPaid
  • TotalProfit

All of these fields depend on aggregated values from the corresponding shipment table. This information is used in other queries, reports, screens, etc., some of which need to return results quickly across tens of thousands of records.

As I see it, there are a few basic ways to do this:

  • Use a subquery to calculate these values from the shipment table whenever they are needed. This complicates every query that needs all or part of this information, and it is also slow.

  • Create a view that exposes the subqueries as simple fields. This keeps the queries that need them simple (see the sketch after this list).

  • Add these fields to the orders table itself. This gives me the performance I'm looking for, at the cost of duplicating the data and having to recalculate it whenever a shipment record changes.
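For illustration, a minimal sketch of what option #2 could look like, assuming hypothetical orders and shipments tables with the per-shipment cost columns listed above (all table and column names here are assumptions, not the actual schema):

  -- Hypothetical schema: orders(id, ...) and shipments(order_id, ItemCost,
  -- ShippingCost, HandlingCost, TaxCost); names are assumed for illustration.
  CREATE VIEW order_totals AS
  SELECT o.id                              AS order_id
       , SUM(s.ItemCost)                   AS TotalItemCost
       , SUM(s.ShippingCost)               AS TotalShippingCost
       , SUM(s.HandlingCost)               AS TotalHandlingCost
       , SUM(s.TaxCost)                    AS TotalTaxCost
       , SUM(s.ItemCost + s.ShippingCost
             + s.HandlingCost + s.TaxCost) AS TotalCost
  FROM orders o
  LEFT JOIN shipments s ON s.order_id = o.id
  GROUP BY o.id;

TotalPaid and TotalProfit would need additional joins (e.g. to a payments table); option #3 would instead store these totals as real columns on the orders table and keep them current with triggers, as the answers below discuss.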

One more thing: I use a business layer that provides functions for retrieving this data (e.g. GetOrders(filter)), and I don't always need the subtotals (or sometimes only need some of them), so generating the subqueries every time (even via a view) is probably a bad idea.

Are there any guidelines anyone can point me to that would help me decide which design is best here?

By the way, I ended up going with #3, mainly for performance and query simplicity.

Update:

Got a lot of great answers pretty quickly, thanks everyone. To give a little more background: one of the places this information is displayed is an admin console with a potentially very long list of orders, where TotalCost, TotalPaid, and TotalProfit need to be shown for each one.

+6
5 answers

There is absolutely nothing wrong with rolling up your statistics and storing them to improve application performance. Just keep in mind that you will probably need a set of triggers or scheduled jobs to keep the rollups in sync with the source data.
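As an illustration of the scheduled-job variant, a minimal sketch assuming denormalized Total* columns on an orders table and a shipments child table (all names are assumptions), refreshed once a day by a MySQL event:

  -- Nightly rollup refresh; requires SET GLOBAL event_scheduler = ON.
  CREATE EVENT refresh_order_rollups
  ON SCHEDULE EVERY 1 DAY
  DO
    UPDATE orders o
    JOIN (SELECT order_id
               , SUM(ItemCost)     AS item_cost
               , SUM(ShippingCost) AS shipping_cost
               , SUM(HandlingCost) AS handling_cost
               , SUM(TaxCost)      AS tax_cost
          FROM shipments
          GROUP BY order_id) s ON s.order_id = o.id
    SET o.TotalItemCost     = s.item_cost
      , o.TotalShippingCost = s.shipping_cost
      , o.TotalHandlingCost = s.handling_cost
      , o.TotalTaxCost      = s.tax_cost;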

+4

I would probably go with caching the subtotals in the database for maximum query performance, assuming you read much more often than you write. Create an update trigger that recalculates the subtotals when a shipment row changes.

I would only use a view to calculate them at SELECT time if the number of rows involved were usually quite small and access were fairly infrequent. Performance will be much better if you cache them.
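A minimal sketch of such an update trigger, using the same assumed orders/shipments schema as the sketches above (names are illustrative, not the actual tables):

  DELIMITER //
  CREATE TRIGGER shipments_after_update
  AFTER UPDATE ON shipments
  FOR EACH ROW
  BEGIN
    -- Recompute the cached totals for the affected order from scratch;
    -- simpler and less error-prone than applying deltas.
    UPDATE orders o
    SET o.TotalItemCost     = (SELECT IFNULL(SUM(ItemCost), 0)     FROM shipments WHERE order_id = NEW.order_id)
      , o.TotalShippingCost = (SELECT IFNULL(SUM(ShippingCost), 0) FROM shipments WHERE order_id = NEW.order_id)
      , o.TotalHandlingCost = (SELECT IFNULL(SUM(HandlingCost), 0) FROM shipments WHERE order_id = NEW.order_id)
      , o.TotalTaxCost      = (SELECT IFNULL(SUM(TaxCost), 0)      FROM shipments WHERE order_id = NEW.order_id)
    WHERE o.id = NEW.order_id;
    -- If a shipment can be moved between orders, refresh OLD.order_id as well.
  END//
  DELIMITER ;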

+3

Option 3 is the fastest
If and when you run into performance problems, and you cannot solve them any other way, option #3 is the way to go.

Use triggers to keep the totals up to date
You will need after-insert, after-update, and after-delete triggers so that the subtotals in the order table stay in sync with the underlying data.
Be especially careful with retrospective changes to prices and the like, as those require a full recalculation of all subtotals; you will end up with a number of triggers that do nothing most of the time. Usually, if a tax rate changes, it changes going forward, for orders you do not have yet. A sketch of such triggers is shown below.
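One hedged way to set this up (schema and names are assumptions) is to put the recalculation in a shared stored procedure and call it from each trigger; an after-update trigger would call the same procedure:

  DELIMITER //
  CREATE PROCEDURE recalc_order_totals(IN p_order_id INT)
  BEGIN
    UPDATE orders o
    SET o.TotalItemCost     = (SELECT IFNULL(SUM(ItemCost), 0)     FROM shipments WHERE order_id = p_order_id)
      , o.TotalShippingCost = (SELECT IFNULL(SUM(ShippingCost), 0) FROM shipments WHERE order_id = p_order_id)
      , o.TotalHandlingCost = (SELECT IFNULL(SUM(HandlingCost), 0) FROM shipments WHERE order_id = p_order_id)
      , o.TotalTaxCost      = (SELECT IFNULL(SUM(TaxCost), 0)      FROM shipments WHERE order_id = p_order_id)
    WHERE o.id = p_order_id;
  END//

  CREATE TRIGGER shipments_after_insert AFTER INSERT ON shipments
  FOR EACH ROW CALL recalc_order_totals(NEW.order_id)//

  CREATE TRIGGER shipments_after_delete AFTER DELETE ON shipments
  FOR EACH ROW CALL recalc_order_totals(OLD.order_id)//
  DELIMITER ;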

If the triggers take a lot of time, make sure that you do these updates after hours.

Run automatic checks periodically to ensure that the cached values are correct.
You can also keep the "golden" subquery around that calculates all the values and compares them against the stored values in the order table.
Run this query every night and report any anomalies, so you can see when the denormalized values get out of sync (a sketch of such a check follows).
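A minimal sketch of such a nightly check, again using assumed orders/shipments names consistent with the earlier sketches; it lists orders whose cached totals no longer match the recomputed values:

  SELECT o.id
       , o.TotalItemCost,     s.item_cost
       , o.TotalShippingCost, s.shipping_cost
       , o.TotalHandlingCost, s.handling_cost
       , o.TotalTaxCost,      s.tax_cost
  FROM orders o
  LEFT JOIN (SELECT order_id
                  , SUM(ItemCost)     AS item_cost
                  , SUM(ShippingCost) AS shipping_cost
                  , SUM(HandlingCost) AS handling_cost
                  , SUM(TaxCost)      AS tax_cost
             FROM shipments
             GROUP BY order_id) s ON s.order_id = o.id
  WHERE o.TotalItemCost     <> IFNULL(s.item_cost, 0)
     OR o.TotalShippingCost <> IFNULL(s.shipping_cost, 0)
     OR o.TotalHandlingCost <> IFNULL(s.handling_cost, 0)
     OR o.TotalTaxCost      <> IFNULL(s.tax_cost, 0);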

Do not invoice orders that have not passed the validation query
Add an extra date field to the order table called timeoflastsuccesfullvalidation and set it to null if validation has not been done.
Only invoice orders whose dateoflastsuccesfullvalidation is less than 24 hours old (see the sketch below).
Of course, you do not need to check orders that are already fully processed, only pending orders.
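For example (a hedged sketch; the validation-date column name follows the answer text, while the status column is an assumption):

  -- Orders eligible for invoicing: still pending, validated within the last 24 hours.
  SELECT o.id
  FROM orders o
  WHERE o.status = 'pending'   -- assumed status column
    AND o.dateoflastsuccesfullvalidation IS NOT NULL
    AND o.dateoflastsuccesfullvalidation >= NOW() - INTERVAL 24 HOUR;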

Option 1 may be fast enough
Regarding #1:

"It is also slow."

That depends a lot on how you query the DB.
You mention subqueries, but in the skeleton query below I don't see the need for many subqueries, so you have me a bit puzzled there.

 SELECT field1, field2, field3
      , oifield1, oifield2, oifield3
      , NettItemCost * (1 + taxrate)                        AS TotalItemCost
      , TotalShippingCost
      , TotalHandlingCost
      , NettItemCost * taxrate                              AS TotalTaxCost
      , (NettItemCost * (1 + taxrate))
        + TotalShippingCost + TotalHandlingCost             AS TotalCost
      , TotalPaid
      , somethingorother                                    AS TotalProfit
 FROM (
   SELECT o.field1, o.field2, o.field3
        , oi.field1 AS oifield1, oi.field2 AS oifield2, oi.field3 AS oifield3
        , SUM(c.productprice * oi.qty)                             AS NettItemCost
        , SUM(IFNULL(sc.shippingperkg, 0) * oi.qty * p.WeightInKg) AS TotalShippingCost
        , SUM(IFNULL(hc.handlingperwhatever, 0) * oi.qty)          AS TotalHandlingCost
        , t.taxrate                                                AS TaxRate
        , IFNULL(paid.amountpaid, 0)                               AS TotalPaid
   FROM orders o
   INNER JOIN orderitem oi    ON (oi.order_id = o.id)
   INNER JOIN products p      ON (p.id = oi.product_id)
   INNER JOIN prices c        ON (c.product_id = p.id
                                  AND o.orderdate BETWEEN c.validfrom AND c.validuntil)
   INNER JOIN taxes t         ON (p.tax_id = t.tax_id
                                  AND o.orderdate BETWEEN t.validfrom AND t.validuntil)
   LEFT JOIN shippingcosts sc ON (o.country = sc.country
                                  AND o.orderdate BETWEEN sc.validfrom AND sc.validuntil)
   LEFT JOIN handlingcost hc  ON (hc.id = oi.handlingcost_id
                                  AND o.orderdate BETWEEN hc.validfrom AND hc.validuntil)
   LEFT JOIN (SELECT pay.order_id, SUM(pay.payment) AS amountpaid
              FROM payment pay
              GROUP BY pay.order_id) paid ON (paid.order_id = o.id)
   WHERE o.id BETWEEN '1245' AND '1299'
   GROUP BY o.id DESC, oi.id DESC
 ) AS sub

Thinking about it, you would really need to break this query up into the parts that relate to each order and the parts that relate to each order item, but I'm too lazy to do that now.
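For illustration only, a rough sketch of the order-level half of that split, reusing the skeleton's table names and leaving out the tax/shipping/handling lookups for brevity (the item-level detail would be a separate query on orderitem joined back by order_id):

  SELECT o.id
       , SUM(c.productprice * oi.qty) AS NettItemCost
       , IFNULL(paid.amountpaid, 0)   AS TotalPaid
  FROM orders o
  INNER JOIN orderitem oi ON (oi.order_id = o.id)
  INNER JOIN prices c     ON (c.product_id = oi.product_id
                              AND o.orderdate BETWEEN c.validfrom AND c.validuntil)
  LEFT JOIN (SELECT order_id, SUM(payment) AS amountpaid
             FROM payment
             GROUP BY order_id) paid ON (paid.order_id = o.id)
  WHERE o.id BETWEEN '1245' AND '1299'
  GROUP BY o.id;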

Speed tips
Make sure you have indexes on all the fields involved in the join criteria.
Use MEMORY tables for small lookup tables, such as taxes and shippingcosts, and use a hash index on the id columns of those memory tables (see the sketch below).
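A hedged sketch using the taxes lookup table from the skeleton query above (the copy-into-memory approach and column names are assumptions; note that MEMORY tables are emptied on server restart and must be repopulated):

  -- Copy a small lookup table into a MEMORY table with a hash index on its id.
  CREATE TABLE taxes_mem ENGINE = MEMORY
    AS SELECT * FROM taxes;
  ALTER TABLE taxes_mem ADD INDEX idx_tax_id (tax_id) USING HASH;
  -- Join taxes_mem in place of taxes in the query above.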

+3

I would avoid #3 as much as possible, for several reasons:

  • It is hard to argue about performance without measuring. Imagine a user shopping and adding items to an order: every time an item is added, you would have to update the order record, which may not even be needed (some sites only display the total when you click the shopping cart and are ready to place the order).

  • Having a duplicated column is asking for bugs; you cannot expect every future developer/maintainer to know about the extra column. Triggers can help, but I think triggers should only be used as a last resort to patch over a poor database design.

  • For reporting purposes, a separate database schema can be used. A reporting database can be heavily denormalized for performance without complicating the main application.

  • I tend to put the actual subtotal-calculation logic in the application layer, because "subtotal" is an overloaded concept that depends on context: sometimes you need the raw subtotal, sometimes the subtotal after discounts are applied. You simply cannot keep adding columns to the order table for every scenario.

+2

It is a good idea; unfortunately, MySQL lacks a couple of features that would make this very simple: computed columns and indexed (materialized) views. Perhaps you can simulate them using triggers.

+1

Source: https://habr.com/ru/post/897952/

