I have a somewhat complicated set of tables for which I need to do some construction / optimization of SQL queries. Currently, most of the logic that is used to get the results we need is performed at the application level, which leads to terrible performance due to complete table crawls, etc. SQL is not my strong suit, so I thought I would go so that the SO crowd could see if anyone could reach out.
Infrastructure History:
- DB is MySQL5
- We access this data through Hibernate using Java
- The contents of most of these tables are relatively static, with the exception of the Salesperson-hourly-performance table, which contains a row for each hour of every day when the seller is active (for example, made or received a call) with the seller’s current performance account for the whole day. Given the number of sellers in the companies in question, this table can increase by 20K + rows per day.
Data objects
I created a simplified version of the table setup that includes the relevant data. The "real" tables contain about 20 companies, 300 branches, 20 thousand sellers and millions of sales performance reports.
CREATE TABLE `so_test`.`company` ( `id` int(10) unsigned NOT NULL auto_increment, `name` varchar(45) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=latin1; INSERT INTO company VALUES (7, 'CompanyXX'); CREATE TABLE `so_test`.`division` ( `id` int(10) unsigned NOT NULL auto_increment, `name` varchar(45) NOT NULL, `campanyId` int(10) unsigned NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=18 DEFAULT CHARSET=latin1; INSERT INTO division VALUES (17, 'APAC #1'); CREATE TABLE `so_test`.`salesperson` ( `id` int(10) unsigned NOT NULL auto_increment, `divisionId` int(10) unsigned NOT NULL, `name` varchar(45) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=213860 DEFAULT CHARSET=latin1; INSERT INTO salesperson VALUES (213859, 'bob jones'); CREATE TABLE `so_test`.`salesperson_hourly_performance` ( `id` int(10) unsigned NOT NULL auto_increment, `timestamp` DATETIME NOT NULL, `salesPersonId` int(10) unsigned NOT NULL, `callsInBound` int(10) unsigned NOT NULL, `callsOutBound` int(10) unsigned NOT NULL, `issuedOrders` int(10) unsigned NOT NULL, `salesRevenue` decimal(10,4) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=552395 DEFAULT CHARSET=latin1; INSERT INTO salesperson_hourly_performance VALUES (552394, '2009-05-03 22:00:00', 213859, 15, 17, 14, 10798.0478), (551254, '2009-05-03 21:00:00', 213859, 14, 16, 13, 9802.3620), (551115, '2009-05-03 20:00:00', 213859, 13, 14, 12, 9183.8250), (550072, '2009-05-03 19:00:00', 213859, 11, 13, 11, 8490.8678), (549613, '2009-05-03 18:00:00', 213859, 10, 11, 9, 7230.1125), (549389, '2009-05-03 17:00:00', 213859, 9, 10, 8, 6486.2173), (548861, '2009-05-03 16:00:00', 213859, 7, 9, 7, 5537.8553), (548059, '2009-05-03 15:00:00', 213859, 6, 8, 6, 4663.8469), (547466, '2009-05-03 14:00:00', 213859, 5, 7, 5, 4082.6388), (546729, '2009-05-03 13:00:00', 213859, 4, 6, 4, 3057.7368), (546611, '2009-05-03 12:00:00', 213859, 3, 5, 2, 1751.6135), (545642, '2009-05-03 11:00:00', 213859, 2, 4, 2, 1751.6135), (545558, '2009-05-03 10:00:00', 213859, 1, 3, 0, 0.0000), (545072, '2009-05-03 09:00:00', 213859, 1, 2, 0, 0.0000), (565071, '2009-05-04 13:00:00', 213859, 19, 17, 6, 4200.1710), (575070, '2009-05-06 14:00:00', 213859, 0, 2, 1, 120.0000);
Business Requirements:
- Configure a set of web-based user-panel user interfaces that provide a separate performance overview for companies, departments, and individual vendors.
- The user interface is much similar to one other, except for the data set: the company control panel combines all the data of all sellers in each of the component divisions and displays a row for each company, while the section toolbar for a particular company aggregates the data of each seller in this division and line per division.
User interfaces allow the user to select a date range for the report panel and sort by any of the columns. Displayed columns include:
(Company | Department | Sales person) Name, Total issued orders, Total sales revenue, Total incoming calls, Total outgoing outgoing calls.
My problem / request for SO:
The “inherited” approach (which was shameful, but sort of sorted-extremely acceptable when the output was for the daily magazine) was to programmatically iterate through performance data for each of the relevant objects (for example, each salesperson in a department in a company ), find the "last" for each of the specified days in the specified date range and summarize the data. However, given the massive dataset and the need to present this data live in the user interface, I need guidance / examples on how to create effective SQL queries against this dataset, which will allow pagination and sorting.
If there is any kind soul, please show me a reasonable request that receives the sum of each column of seller performance data for a certain date range (bearing in mind that for each day, the row used for the amount is the last in date for that day, for this seller).
A query that performs query # 1 across the entire range of sellers (for example, all sellers in a given company) with support for pagination and ordering in a specific column?
Hope I have included enough details to clarify what I ask ... please let me know if you need more information.
Many thanks to SO SQL gods!
UPDATE:
Missing keys from salesPerson → division and from division → company were added. In addition, the fixed timestamp data type must be DATETIME instead of VARCHAR.