How to efficiently use SQL to retrieve data at half hour intervals?

Question

Task: efficiently get the sum of subtotals for each half hour

I am using MySQL and I have a table of subtotals recorded at different times. I want the sum of these sales for each half-hour interval from 7 AM to midnight. My current solution (below) works, but takes 13 seconds to query about 150,000 records. I expect to have several million records in the future, and my current method is too slow.

How can I make this more efficient or, if possible, replace the PHP component with pure SQL? Also, would it be more efficient to use Unix timestamps instead of separate date and time columns?

Table Name - Receipts

    subtotal    date          time        sale_id
    ---------------------------------------------
    6           09/10/2011    07:20:33    1
    5           09/10/2011    07:28:22    2
    3           09/10/2011    07:40:00    3
    5           09/10/2011    08:05:00    4
    8           09/10/2011    08:44:00    5
    ...............
    10          09/10/2011    18:40:00    6
    5           09/10/2011    23:05:00    7

Desired Result

An array like this:

Half hour 1 (7:00 to 7:30) => sum of subtotals is 11
Half hour 2 (7:30 to 8:00) => sum of subtotals is 3
Half hour 3 (8:00 to 8:30) => sum of subtotals is 5
Half hour 4 (8:30 to 9:00) => sum of subtotals is 8

Current method

The current method uses a for loop that starts at 7 AM and advances in steps of 1800 seconds (half an hour). This results in about 34 database queries.

    for ($n = strtotime("07:00:00"), $e = strtotime("23:59:59"); $n <= $e; $n += 1800) {
        $timeA = date("H:i:s", $n);
        $timeB = date("H:i:s", $n + 1799);
        $query = $mySQL->query("SELECT SUM(subtotal) FROM Receipts WHERE time > '$timeA' AND time < '$timeB'");
        while ($row = $query->fetch_object()) {
            $sum[] = $row;
        }
    }

Current output

The output is just an array, where:

[0] represents 7:00 to 7:30
[1] represents 7:30 to 8:00
[33] represents 23:30 to 23:59:59

    array("0" => 10000, "1" => 20000, .............. "33" => 5000);

+6

performance sql php mysql

Pontus trade Aug 1 '12 at 21:35

7 answers

First, I would use a single DATETIME column, although separate DATE and TIME columns will also work.

You can do all the work in one go with a single query:

    SELECT date,
           HOUR(`time`) hour_num,
           IF(MINUTE(`time`) < 30, 0, 1) interval_num,
           MIN(`time`) interval_begin,
           MAX(`time`) interval_end,
           SUM(subtotal) sum_subtotal
    FROM receipts
    WHERE date = '2012-07-31'
    GROUP BY date, hour_num, interval_num;

+4

Justin swanhart Aug 1 '12 at 21:49

UPDATE:

Since you are not interested in any “missing” rows, I am also going to assume (possibly erroneously) that you are not concerned that the query may return rows for periods outside 7 AM to 12 AM. This query will return the result set you specified:

    SELECT (HOUR(r.time)-7)*2 + (MINUTE(r.time) DIV 30) AS i
         , SUM(r.subtotal) AS sum_subtotal
    FROM Receipts r
    GROUP BY i
    ORDER BY i

The query returns a period index (i) computed from an expression on the time column. For the best performance of this query, you probably want a covering index available, for example:

 ON Receipts(`time`,`subtotal`)

If you are going to include an equality predicate on the date column (which does not appear in your solution, but which does appear in the accepted answer), then it would be better to have that column as the leading column in the covering index:

 ON Receipts(`date`,`time`,`subtotal`)
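As a rough sketch, the two index fragments above could be written out as complete statements like this. The index names are invented for illustration, and you would normally create only the one that matches the query you actually run:

```sql
-- Index names are hypothetical; use whatever fits your naming scheme.

-- Covering index for the query that filters and groups on time only.
CREATE INDEX ix_receipts_time_subtotal
    ON Receipts (`time`, `subtotal`);

-- Covering index for the query with an equality predicate on date.
CREATE INDEX ix_receipts_date_time_subtotal
    ON Receipts (`date`, `time`, `subtotal`);
```

Running EXPLAIN on the query afterwards should show whether the index is used (look for "Using index" in the Extra column).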

If you do not want rows returned for periods before 7 AM, simply add HAVING i >= 0 to the query. (Rows for periods before 7 AM will generate a negative value for i.)

    SELECT (HOUR(r.time)-7)*2 + (MINUTE(r.time) DIV 30) AS i
         , SUM(r.subtotal) AS sum_subtotal
    FROM Receipts r
    GROUP BY i
    HAVING i >= 0
    ORDER BY i
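To see how the index expression maps times to periods, here is a small standalone check that needs no table. The sample times are mostly taken from the question's data; 06:15:00 is an extra value added only to show the negative case:

```sql
-- Applies the same period-index expression to literal sample times.
SELECT s.sample_time
     , (HOUR(s.sample_time) - 7) * 2 + (MINUTE(s.sample_time) DIV 30) AS i
FROM (
    SELECT '06:15:00' AS sample_time   -- (6-7)*2 + 0  = -2 (before 7 AM)
    UNION ALL SELECT '07:20:33'        -- (7-7)*2 + 0  =  0
    UNION ALL SELECT '07:40:00'        -- (7-7)*2 + 1  =  1
    UNION ALL SELECT '08:44:00'        -- (8-7)*2 + 1  =  3
    UNION ALL SELECT '23:05:00'        -- (23-7)*2 + 0 = 32
) s;
```

The hour contributes two periods per hour after 7 AM, and the minute contributes 0 or 1 depending on which half of the hour it falls in.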

PREVIOUSLY:

I assumed that you wanted a result set similar to the one you are getting now, but in one fell swoop. This query returns the same 34 rows you are currently retrieving, but with an additional column identifying the period (0 - 33). This is as close to your current solution as I could get:

    SELECT t.i
         , IFNULL(SUM(r.subtotal),0) AS sum_subtotal
    FROM (SELECT (d1.i + d2.i + d4.i + d8.i + d16.i + d32.i) AS i
               , ADDTIME('07:00:00',SEC_TO_TIME((d1.i+d2.i+d4.i+d8.i+d16.i+d32.i)*1800)) AS b_time
               , ADDTIME('07:30:00',SEC_TO_TIME((d1.i+d2.i+d4.i+d8.i+d16.i+d32.i)*1800)) AS e_time
          FROM (SELECT 0 i UNION ALL SELECT 1) d1
          CROSS JOIN (SELECT 0 i UNION ALL SELECT 2) d2
          CROSS JOIN (SELECT 0 i UNION ALL SELECT 4) d4
          CROSS JOIN (SELECT 0 i UNION ALL SELECT 8) d8
          CROSS JOIN (SELECT 0 i UNION ALL SELECT 16) d16
          CROSS JOIN (SELECT 0 i UNION ALL SELECT 32) d32
          HAVING i <= 33
         ) t
    LEFT JOIN Receipts r
      ON r.time >= t.b_time AND r.time < t.e_time
    GROUP BY t.i
    ORDER BY t.i

Some important notes:

It looks like your current solution may be missing rows from Receipts whenever the seconds are exactly “59” or “00”.

It also looks like you are not concerned with the date component; you get only one value across all dates. (Perhaps I misunderstood this.) If so, having separate DATE and TIME columns helps here, because you can reference the bare time column in your query.

It is easy to add a WHERE clause on the date column, for example to get the totals for just one day. Add the WHERE clause before the GROUP BY:

 WHERE r.date = '2011-09-10'

A covering index ON Receipts(time,subtotal) (if you do not already have one) can help performance. (If you include an equality predicate on the date column, as in the WHERE clause above, the most suitable covering index is probably ON Receipts(date,time,subtotal).)

I have assumed that the time column has a TIME data type. (If this is not the case, a slight adjustment to the query is probably required, in the inline view aliased as t, so that the data types of the derived b_time and e_time columns match the data type of the time column in Receipts.)

Some of the solutions proposed in other answers are not guaranteed to return 34 rows if there are no rows in Receipts for a given period. Missing rows may not be a problem for you, but they are a common issue with time series and time period data.

I assumed that you would prefer a guarantee of 34 rows returned. The query above returns a total of zero when no rows are found for a period. (Note that your current solution returns NULL in that case. I went ahead and wrapped the SUM aggregate in an IFNULL function, so that it returns 0 when SUM is NULL.)

So, the inline view aliased as t is an ugly mess, but it runs fast. What it does is generate 34 rows with distinct integer values from 0 to 33. At the same time, it derives a “start time” and an “end time” used to match each period against the time column in the Receipts table.

We avoid wrapping the time column from the Receipts table in any function and reference only the bare column. We also want to avoid any implicit conversion, so we want the data types of b_time and e_time to match that column. ADDTIME and SEC_TO_TIME both return TIME data types. (We cannot avoid performing the matching and GROUP BY operations.)

The "end time" value for this last period is returned as "24:00:00", and we verify that this is the right time to match by running this test:

 SELECT MAKETIME(23,59,59) < MAKETIME(24,0,0)

which is successful (returns 1), so we are good there.

The columns t.b_time and t.e_time can also be included in the result set, but they are not needed to build your array, and the query is most likely more efficient without them.

One final note: for optimal performance, it may be beneficial to load the inline view aliased as t into an actual table (a temporary table would be fine), and then reference that table in place of the inline view. The advantage is that you can create an index on that table.
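A minimal sketch of that last idea follows. The table name half_hour_periods and the index name are made up for illustration, and I am assuming the time column in Receipts is a TIME column. It materializes the same 34 periods and then joins against them:

```sql
-- Hypothetical table and index names; only Receipts, time and subtotal
-- come from the question.
CREATE TEMPORARY TABLE half_hour_periods (
    i      INT  NOT NULL PRIMARY KEY,
    b_time TIME NOT NULL,
    e_time TIME NOT NULL,
    KEY ix_period_times (b_time, e_time)
);

-- Generate the integers 0..63 from six one-bit tables, keep 0..33,
-- and derive the start and end time of each half-hour period.
INSERT INTO half_hour_periods (i, b_time, e_time)
SELECT n.i
     , ADDTIME('07:00:00', SEC_TO_TIME(n.i * 1800))
     , ADDTIME('07:30:00', SEC_TO_TIME(n.i * 1800))
FROM (
    SELECT d1.i + d2.i + d4.i + d8.i + d16.i + d32.i AS i
    FROM (SELECT 0 i UNION ALL SELECT 1) d1
    CROSS JOIN (SELECT 0 i UNION ALL SELECT 2) d2
    CROSS JOIN (SELECT 0 i UNION ALL SELECT 4) d4
    CROSS JOIN (SELECT 0 i UNION ALL SELECT 8) d8
    CROSS JOIN (SELECT 0 i UNION ALL SELECT 16) d16
    CROSS JOIN (SELECT 0 i UNION ALL SELECT 32) d32
) n
WHERE n.i <= 33;

-- Same result shape as the inline-view query: one row per period,
-- with 0 for periods that have no receipts.
SELECT p.i
     , IFNULL(SUM(r.subtotal), 0) AS sum_subtotal
FROM half_hour_periods p
LEFT JOIN Receipts r
  ON r.time >= p.b_time AND r.time < p.e_time
GROUP BY p.i
ORDER BY p.i;
```

Whether this actually beats the inline view depends on your data and MySQL version, so it is worth comparing the two with EXPLAIN before committing to either.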

+2

spencer7593 Aug 1 '12 at 23:16

One way to make it pure SQL is to use a lookup table. I don't know MySQL very well, so there may be many possible improvements to this code. All of my code is for MS SQL Server. I would do it something like this:

    /* Mock salesTable */
    Declare @SalesTable TABLE (SubTotal int, SaleDate datetime)
    Insert into @SalesTable (SubTotal, SaleDate) VALUES (1, '2012-08-01 12:00')
    Insert into @SalesTable (SubTotal, SaleDate) VALUES (2, '2012-08-01 12:10')
    Insert into @SalesTable (SubTotal, SaleDate) VALUES (3, '2012-08-01 12:15')
    Insert into @SalesTable (SubTotal, SaleDate) VALUES (4, '2012-08-01 12:30')
    Insert into @SalesTable (SubTotal, SaleDate) VALUES (5, '2012-08-01 12:35')
    Insert into @SalesTable (SubTotal, SaleDate) VALUES (6, '2012-08-01 13:00')
    Insert into @SalesTable (SubTotal, SaleDate) VALUES (7, '2012-08-01 14:00')

    /* input data */
    declare @From datetime, @To DateTime, @intervall int
    set @from = '2012-08-01'
    set @to = '2012-08-02'
    set @intervall = 30

    /* Create lookup table */
    DECLARE @lookup TABLE (StartTime datetime, EndTime datetime)
    DECLARE @tmpTime datetime
    SET @tmpTime = @from
    WHILE (@tmpTime <= @To)
    BEGIN
        INSERT INTO @lookup (StartTime, EndTime)
        VALUES (@tmpTime, dateAdd(mi, @intervall, @tmpTime))
        set @tmpTime = dateAdd(mi, @intervall, @tmpTime)
    END

    /* Get data */
    select l.StartTime, l.EndTime, sum(subTotal)
    from @SalesTable as SalesTable
    join @lookUp as l
      on SalesTable.SaleDate >= l.StartTime
     and SalesTable.SaleDate < l.EndTime
    group by l.StartTime, l.EndTime

0

Richard L Aug 1 '12 at 21:56

In my query, I assume a single DATETIME column named date. This gives you all the groups, numbered relative to the starting datetime you supply:

    SELECT ABS(FLOOR(TIMESTAMPDIFF(MINUTE, date, '2011-08-01 00:00:00') / 30)) AS `GROUPING`
         , SUM(subtotal) AS subtotals
    FROM Receipts
    GROUP BY ABS(FLOOR(TIMESTAMPDIFF(MINUTE, date, '2011-08-01 00:00:00') / 30))
    ORDER BY `GROUPING`

0

databyss Aug 1 '12 at 10:06

Always use the correct data types for your data. For your date/time columns, it is best to store them as timestamps (preferably in UTC). This matters in particular because some local times do not exist on some dates in some time zones (hence UTC). You will need an index on this column.

In addition, your time range will not give you what you want: you are missing anything that falls exactly on the hour, because you are using strict greater-than and less-than comparisons. Always define ranges as "lower bound inclusive, upper bound exclusive" (so time >= '07:00:00' AND time < '07:30:00' ). This is especially important for timestamps, which have additional fields to deal with.

Since MySQL does not have recursive queries (versions before 8.0 lack recursive common table expressions), you will need some additional tables to do this cleanly. I refer to them as "persistent" tables, but it is of course possible to define them inline.

You will need a calendar table. These are useful for a number of reasons, but here we want one to list dates. This allows us to show dates with subtotals of 0, if necessary. For the same reason, you will also want a table of time values in half-hour increments.
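As a rough sketch of what those two tables might look like: the table and column names (Calendar.calendar_date, Clock.clock_time) follow the query in this answer, while the column types, keys, and the way the rows are generated are my own assumptions.

```sql
-- Assumed definitions; only the table and column names come from the answer.
CREATE TABLE Calendar (
    calendar_date DATE NOT NULL PRIMARY KEY
);

CREATE TABLE Clock (
    clock_time TIME NOT NULL PRIMARY KEY
);

-- 48 half-hour start times: 00:00:00, 00:30:00, ..., 23:30:00.
-- The integers 0..63 are built from six one-bit tables, then cut to 0..47.
INSERT INTO Clock (clock_time)
SELECT SEC_TO_TIME(n.i * 1800)
FROM (
    SELECT d1.i + d2.i + d4.i + d8.i + d16.i + d32.i AS i
    FROM (SELECT 0 i UNION ALL SELECT 1) d1
    CROSS JOIN (SELECT 0 i UNION ALL SELECT 2) d2
    CROSS JOIN (SELECT 0 i UNION ALL SELECT 4) d4
    CROSS JOIN (SELECT 0 i UNION ALL SELECT 8) d8
    CROSS JOIN (SELECT 0 i UNION ALL SELECT 16) d16
    CROSS JOIN (SELECT 0 i UNION ALL SELECT 32) d32
) n
WHERE n.i < 48;

-- Example dates only; a real calendar table would hold every date of interest.
INSERT INTO Calendar (calendar_date)
VALUES ('2011-09-10'), ('2011-09-11');
```

Cross joining Calendar and Clock then yields one row per half hour per day, which is what lets empty periods show up with a subtotal of 0.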

With these tables, you can query your data as follows:

    SELECT division, COALESCE(SUM(subtotal), 0)
    FROM (SELECT TIMESTAMP(calendar_date, clock_time) as division
          FROM Calendar
          CROSS JOIN Clock
          WHERE calendar_date >= DATE('2011-09-10')
            AND calendar_date < DATE('2011-09-11')) as divisions
    LEFT JOIN Sales_Data
      ON occurredAt >= division
     AND occurredAt < division + INTERVAL 30 MINUTE
    GROUP BY division

(A working SQLFiddle example that uses a regular JOIN for brevity)

0

Clockwork-muse Aug 1 '12 at 22:32

I also found another solution and am posting it here for reference in case anyone comes across this. It groups rows into half-hour intervals.

    SELECT SUM(total), time, date
    FROM tableName
    GROUP BY (2*HOUR(time) + FLOOR(MINUTE(time)/30))

Link for more information http://www.artfulsoftware.com/infotree/queries.php#106

0

Pontus trade Aug 2 '12 at 16:39

drew010 · Accepted Answer · 2012-08-01T21:54:15+0000

You can also try this single query. It should return a result set with totals in 30-minute groups:

    SELECT date, MIN(time) as time, SUM(subtotal) as total
    FROM `Receipts`
    WHERE `date` = '2012-07-30'
    GROUP BY hour(time), floor(minute(time)/30)

To make this run efficiently, add a composite index on the date and time columns.

You should get back a result set like this:

    +---------------------+--------------------+
    | time                | total              |
    +---------------------+--------------------+
    | 2012-07-30 00:00:00 | 0.000000000        |
    | 2012-07-30 00:30:00 | 0.000000000        |
    | 2012-07-30 01:00:00 | 0.000000000        |
    | 2012-07-30 01:30:00 | 0.000000000        |
    | 2012-07-30 02:00:00 | 0.000000000        |
    | 2012-07-30 02:30:00 | 0.000000000        |
    | 2012-07-30 03:00:00 | 0.000000000        |
    | 2012-07-30 03:30:00 | 0.000000000        |
    | 2012-07-30 04:00:00 | 0.000000000        |
    | 2012-07-30 04:30:00 | 0.000000000        |
    | 2012-07-30 05:00:00 | 0.000000000        |
    | ...
    +---------------------+--------------------+
