MySQL: Can't select records from specific partitions?

I am working with MySQL 5.6. I created a table to store daily funds data. A year has at most 366 days, so I created 366 partitions on this table. The hash partitions are controlled by an integer column that stores a value from 1 to 366 for each record.

Report_Summary Table:

    CREATE TABLE `Report_Summary` (
      `PartitionsID` int(4) unsigned NOT NULL,
      `ReportTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
      `Amount` int(10) NOT NULL,
      UNIQUE KEY `UNIQUE` (`PartitionsID`,`ReportTime`),
      KEY `PartitionsID` (`PartitionsID`),
      KEY `ReportTime` (`ReportTime`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED
    /*!50100 PARTITION BY HASH (PartitionsID)
    PARTITIONS 366 */;

My current query:

    SELECT DATE(RS.ReportTime) AS ReportDate, SUM(RS.Amount) AS Total
    FROM Report_Summary RS
    WHERE RS.ReportTime >= '2014-12-26 00:00:00'
      AND RS.ReportTime <= '2014-12-30 23:59:59'
      AND RS.PartitionsID BETWEEN DAYOFYEAR('2014-12-26 00:00:00')
                              AND DAYOFYEAR('2014-12-30 23:59:59')
    GROUP BY ReportDate;

The query above works fine and reads only partitions p360 to p364. The problem appears when I pass fromDate = '2014-12-26' and toDate = '2015-01-01': the query no longer works, because the day of the year for 2015-01-01 is 1, so the BETWEEN condition fails.
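The wrap-around can be checked directly. This is a minimal sketch that only evaluates the two boundary expressions from the query; no table is needed, and the column aliases are illustrative.

```sql
-- DAYOFYEAR restarts at 1 on January 1, so the lower bound ends up
-- larger than the upper bound once the range crosses a year boundary.
SELECT DAYOFYEAR('2014-12-26 00:00:00') AS from_id,  -- 360
       DAYOFYEAR('2015-01-01 23:59:59') AS to_id;    -- 1

-- x BETWEEN low AND high means (x >= low AND x <= high),
-- so BETWEEN 360 AND 1 is false for every PartitionsID.
SELECT 360 BETWEEN 360 AND 1 AS matches_dec_26,      -- 0
       1   BETWEEN 360 AND 1 AS matches_jan_01;      -- 0
```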

I then tried passing the values in an IN list, and the query below works fine:

    SELECT DATE(RS.ReportTime) AS ReportDate, SUM(RS.Amount) AS Total
    FROM Report_Summary RS
    WHERE RS.ReportTime >= '2014-12-26 00:00:00'
      AND RS.ReportTime <= '2015-01-01 23:59:59'
      AND RS.PartitionsID IN (360,361,362,363,364,365,1)
    GROUP BY ReportDate;

To generate that list, I created a function that takes two dates and returns a comma-separated string of IDs:

    SELECT GenerateRange('2014-12-26 00:00:00', '2015-01-01 23:59:59');

It returns:

 '360,361,362,363,364,365,366,1' 

I then tried to use this function in my query, changing it as shown below:

    SELECT DATE(RS.ReportTime) AS ReportDate, SUM(RS.Amount) AS Total
    FROM Report_Summary RS
    WHERE RS.ReportTime >= '2014-12-26 00:00:00'
      AND RS.ReportTime <= '2015-01-01 23:59:59'
      AND FIND_IN_SET(RS.PartitionsID,
                      GenerateRange('2014-12-26 00:00:00', '2015-01-01 00:00:00'))
    GROUP BY ReportDate;

Then I checked the execution plan of this query using EXPLAIN PARTITIONS SELECT ... and found that my condition does not work: the query reads all partitions. I want it to read only the partitions for these dates, that is 360,361,362,363,364,365,366,1: p360 to p366 and p1.

Why is my query not working? If this is the wrong way to implement it, what is the right one? How can I achieve this?

I know I can implement this in application code, but I need to do it in a query.

Thanks...

+5
4 answers

I found a solution: I changed what I store in the PartitionsID column. Previously I stored DAYOFYEAR(ReportTime) in it. Now I store TO_DAYS(ReportTime) instead.

So my table structure is as follows:

    CREATE TABLE `Report_Summary` (
      `PartitionsID` int(10) unsigned NOT NULL,
      `ReportTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
      `Amount` int(10) NOT NULL,
      UNIQUE KEY `UNIQUE` (`PartitionsID`,`ReportTime`),
      KEY `PartitionsID` (`PartitionsID`),
      KEY `ReportTime` (`ReportTime`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED
    /*!50100 PARTITION BY HASH (PartitionsID)
    PARTITIONS 366 */;

    -- PartitionsID is computed with TO_DAYS() so that it always agrees with ReportTime.
    INSERT INTO `Report_Summary` (`PartitionsID`, `ReportTime`, `Amount`) VALUES
      (TO_DAYS('2014-12-26 11:46:12'), '2014-12-26 11:46:12', 100),
      (TO_DAYS('2014-12-27 11:46:23'), '2014-12-27 11:46:23', 50),
      (TO_DAYS('2014-12-28 11:46:37'), '2014-12-28 11:46:37', 44),
      (TO_DAYS('2014-12-29 11:46:49'), '2014-12-29 11:46:49', 15),
      (TO_DAYS('2014-12-30 11:46:59'), '2014-12-30 11:46:59', 56),
      (TO_DAYS('2014-12-31 11:47:22'), '2014-12-31 11:47:22', 68),
      (TO_DAYS('2015-01-01 11:47:35'), '2015-01-01 11:47:35', 76),
      (TO_DAYS('2015-01-02 11:47:43'), '2015-01-02 11:47:43', 88),
      (TO_DAYS('2015-01-03 11:47:59'), '2015-01-03 11:47:59', 77);

Check the SQL Fiddle demo.

My query:

    EXPLAIN PARTITIONS
    SELECT DATE(RS.ReportTime) AS ReportDate, SUM(RS.Amount) AS Total
    FROM Report_Summary RS
    WHERE RS.ReportTime >= '2014-12-26 00:00:00'
      AND RS.ReportTime <= '2015-01-01 23:59:59'
      AND RS.PartitionsID BETWEEN TO_DAYS('2014-12-26 00:00:00')
                              AND TO_DAYS('2015-01-01 23:59:59')
    GROUP BY ReportDate;

The query above scans only the partitions I need and also uses the correct index. So changing what is stored in the PartitionsID column gave me the solution I was looking for.

Thanks for all the answers, and many thanks to everyone for their time.
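To see why TO_DAYS() behaves better than DAYOFYEAR() here: it keeps increasing across a year boundary, and a HASH-partitioned table places each row in partition MOD(expr, number_of_partitions). The sketch below only evaluates those expressions for the dates used above; the column aliases are illustrative.

```sql
-- TO_DAYS never wraps, so the lower bound stays below the upper bound.
SELECT TO_DAYS('2014-12-26') AS from_id,
       TO_DAYS('2015-01-01') AS to_id,
       TO_DAYS('2015-01-01') - TO_DAYS('2014-12-26') AS day_span,  -- 6
       -- With PARTITION BY HASH (PartitionsID) PARTITIONS 366, the
       -- partition number is the ID modulo 366.
       MOD(TO_DAYS('2014-12-26'), 366) AS from_partition,
       MOD(TO_DAYS('2015-01-01'), 366) AS to_partition;
```

As far as I understand the MySQL documentation, the optimizer can prune a HASH-partitioned table on an integer BETWEEN range only when the range contains fewer values than there are partitions, which is the case for a 7-day range against 366 partitions.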

0

There are several options that I can think of.

  • Write a CASE expression that covers search ranges spanning more than one year.
  • Create a CalendarDays table and use it to get a distinct DayOfYear list for your IN clause.
  • A variation of option 1, but using UNION to search each range separately.

Option 1: Using CASE. This is ugly, but it seems to work. There is one scenario in which this option may search one additional partition, 366: when the query crosses a year boundary in a non-leap year. Also, I'm not sure the optimizer will like the OR in the RS.PartitionsID filter, but you can try it.

    SELECT DATE(RS.ReportTime) AS ReportDate, SUM(RS.Amount) AS Total
    FROM Report_Summary RS
    WHERE RS.ReportTime >= @startDate
      AND RS.ReportTime <= @endDate
      AND (
            RS.PartitionsID BETWEEN
                CASE
                    WHEN
                        -- more than one year, search all days
                        YEAR(@endDate) - YEAR(@startDate) > 1
                        -- one full year difference
                        OR YEAR(@endDate) - YEAR(@startDate) = 1
                           AND DAYOFYEAR(@startDate) <= DAYOFYEAR(@endDate)
                    THEN 1
                    ELSE DAYOFYEAR(@startDate)
                END
            AND
                CASE
                    WHEN
                        -- query spans the end of a year
                        YEAR(@endDate) - YEAR(@startDate) >= 1
                    THEN 366
                    ELSE DAYOFYEAR(@endDate)
                END
            -- additional condition to search the portion that falls in the next year
            OR RS.PartitionsID <=
                CASE
                    WHEN YEAR(@endDate) - YEAR(@startDate) > 1
                         OR DAYOFYEAR(@startDate) > DAYOFYEAR(@endDate)
                    THEN DAYOFYEAR(@endDate)
                    ELSE NULL
                END
          )
    GROUP BY ReportDate;

Option 2: Use the CalendarDays table. This option is much cleaner. The downside is that you will need to create a new CalendarDays table if you don't have one.

    SELECT DATE(RS.ReportTime) AS ReportDate, SUM(RS.Amount) AS Total
    FROM Report_Summary RS
    WHERE RS.ReportTime >= @startDate
      AND RS.ReportTime <= @endDate
      AND RS.PartitionsID IN (
            SELECT DISTINCT DAYOFYEAR(c.calDate)
            FROM calendarDays c
            WHERE c.calDate >= @startDate
              AND c.calDate <= @endDate
          )
    GROUP BY ReportDate;
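If no calendar table exists yet, one way to build it is sketched below. The names calendarDays and calDate match the query above; the digits helper table, the start date, and the 10,000-day span are assumptions chosen purely for illustration.

```sql
-- Hypothetical helper table: the digits 0-9, used to generate day offsets.
CREATE TABLE digits (n INT NOT NULL PRIMARY KEY);
INSERT INTO digits (n) VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

-- One row per calendar day.
CREATE TABLE calendarDays (calDate DATE NOT NULL PRIMARY KEY);

-- Four digit positions give the offsets 0..9999, which is
-- roughly 27 years of days counted from the assumed start date.
INSERT INTO calendarDays (calDate)
SELECT DATE('2010-01-01')
       + INTERVAL (d1.n + 10 * d2.n + 100 * d3.n + 1000 * d4.n) DAY
FROM digits d1
CROSS JOIN digits d2
CROSS JOIN digits d3
CROSS JOIN digits d4;
```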

EDIT: Option 3: Option 1, but using UNION ALL to search each range separately. The idea is that, since there is no OR in the statement, the optimizer will be able to apply partition pruning. Note: I usually do not work in MySQL, so my syntax may be slightly off, but this is the general idea.

    -- User variables need no DECLARE in MySQL; the dates are the ones from the question.
    SET @startDate = '2014-12-26 00:00:00',
        @endDate   = '2015-01-01 23:59:59';

    SET @rangeOneStart =
            CASE
                WHEN
                    -- more than one year, search all days
                    YEAR(@endDate) - YEAR(@startDate) > 1
                    -- one full year difference
                    OR YEAR(@endDate) - YEAR(@startDate) = 1
                       AND DAYOFYEAR(@startDate) <= DAYOFYEAR(@endDate)
                THEN 1
                ELSE DAYOFYEAR(@startDate)
            END,
        @rangeOneEnd =
            CASE
                WHEN
                    -- query spans the end of a year
                    YEAR(@endDate) - YEAR(@startDate) >= 1
                THEN 366
                ELSE DAYOFYEAR(@endDate)
            END,
        @rangeTwoStart = 1,
        @rangeTwoEnd =
            CASE
                WHEN YEAR(@endDate) - YEAR(@startDate) > 1
                     OR DAYOFYEAR(@startDate) > DAYOFYEAR(@endDate)
                THEN DAYOFYEAR(@endDate)
                ELSE NULL
            END;

    SELECT t.ReportDate, SUM(t.Amount) AS Total
    FROM (
        SELECT DATE(RS.ReportTime) AS ReportDate, RS.Amount
        FROM Report_Summary RS
        WHERE RS.PartitionsID BETWEEN @rangeOneStart AND @rangeOneEnd
          AND RS.ReportTime >= @startDate
          AND RS.ReportTime <= @endDate
        UNION ALL
        SELECT DATE(RS.ReportTime) AS ReportDate, RS.Amount
        FROM Report_Summary RS
        WHERE RS.PartitionsID BETWEEN @rangeTwoStart AND @rangeTwoEnd
          AND @rangeTwoEnd IS NOT NULL
          AND RS.ReportTime >= @startDate
          AND RS.ReportTime <= @endDate
    ) t
    GROUP BY t.ReportDate;
+1

To start solving this problem, you need a subquery that, given the date range, returns a result set consisting of all the DAYOFYEAR() values in that range.

Let's work up to it. First we need a query that can return a sequence of all integers from 0 to at least 366. Here is such a query. It returns a column named seq with the values 0-624.

    SELECT A.N + 5*(B.N + 5*(C.N + 5*(D.N))) AS seq
    FROM (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS A
    JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS B
    JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS C
    JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS D

(This is a simple cross join that generates all 5**4 = 625 combinations of the digits 0-4.)

Next, we use this to create a list of DAYOFYEAR() values. Let me use your start and end dates as an example. This query creates a result set containing the integers for the days of the year in that date range.

    SELECT DISTINCT DAYOFYEAR(first_day + INTERVAL seq DAY) doy
    FROM (
        SELECT DATE('2014-12-26 00:00:00') AS first_day,
               DATE('2015-01-01 23:59:59') AS last_day
    ) params
    JOIN (
        SELECT A.N + 5*(B.N + 5*(C.N + 5*(D.N))) AS seq
        FROM (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS A
        JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS B
        JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS C
        JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS D
    ) seq ON seq.seq <= TIMESTAMPDIFF(DAY, first_day, last_day)
    ORDER BY 1

I think you can convince yourself that this crude query works correctly for any reasonable date range of about a year and a half (625 days) or less. With longer ranges, leap years may trip you up.

Finally, you can use this query in your PartitionsID IN () clause. It will look like this.

    SELECT DATE(RS.ReportTime) AS ReportDate, SUM(RS.Amount) AS Total
    FROM Report_Summary RS
    WHERE RS.ReportTime >= '2014-12-26 00:00:00'
      AND RS.ReportTime <= '2015-01-01 23:59:59'
      AND RS.PartitionsID IN (
            SELECT DISTINCT DAYOFYEAR(first_day + INTERVAL seq DAY) doy
            FROM (
                SELECT DATE('2014-12-26 00:00:00') AS first_day,
                       DATE('2015-01-01 23:59:59') AS last_day
            ) params
            JOIN (
                SELECT A.N + 5*(B.N + 5*(C.N + 5*(D.N))) AS seq
                FROM (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS A
                JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS B
                JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS C
                JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS D
            ) seq ON seq.seq <= TIMESTAMPDIFF(DAY, first_day, last_day)
            ORDER BY 1
          )
    GROUP BY ReportDate;

That should do it for you.

If you use MariaDB 10+, there are built-in sequence tables with names like seq_0_to_624.

There is a write-up on this topic:

http://www.plumislandmedia.net/mysql/filling-missing-data-sequences-cardinal-integers/

0

Based on your SELECT, what you really need is a data warehousing technique called summary tables. With this, you summarize the data every day (or hour, or whatever) and store the subtotals in a much smaller table. The "report" then reads that table and adds up the subtotals. This is often 10 times faster than a brute-force scan of the raw data. More details: http://mysql.rjweb.org/doc.php/datawarehouse .

Doing this eliminates the need for PARTITIONing in either the raw data ("Fact table") or the summary table.
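A minimal sketch of such a summary table, assuming daily granularity. The table name Report_Daily and its columns are hypothetical; only Report_Summary, ReportTime and Amount come from the question.

```sql
-- Hypothetical summary table: one row of subtotals per day.
CREATE TABLE Report_Daily (
    ReportDate DATE   NOT NULL,
    Total      BIGINT NOT NULL,
    PRIMARY KEY (ReportDate)
) ENGINE=InnoDB;

-- Summarize one finished day of raw data (here 2014-12-26).
-- Re-running the statement overwrites that day's subtotal.
INSERT INTO Report_Daily (ReportDate, Total)
SELECT DATE(ReportTime), SUM(Amount)
FROM Report_Summary
WHERE ReportTime >= '2014-12-26 00:00:00'
  AND ReportTime <  '2014-12-27 00:00:00'
GROUP BY DATE(ReportTime)
ON DUPLICATE KEY UPDATE Total = VALUES(Total);

-- The report then reads the small table instead of the raw data.
SELECT ReportDate, Total
FROM Report_Daily
WHERE ReportDate BETWEEN '2014-12-26' AND '2015-01-01';
```

With one row per day, the year-boundary problem disappears from the report query, because it filters on a real date column and not on a day-of-year number.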

However, if you need to purge old data, then PARTITIONing may come in handy because of DROP PARTITION. For that, you would use BY RANGE (TO_DAYS(...)), not HASH.
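A rough sketch of what RANGE partitioning for purging could look like. The table name Report_Summary_Range, the monthly partition names and the boundaries are hypothetical. It also assumes a DATETIME column: as far as I know, MySQL does not accept TO_DAYS() on a TIMESTAMP column in a partitioning expression.

```sql
-- Hypothetical table, partitioned by month on the day number of ReportTime.
CREATE TABLE Report_Summary_Range (
    ReportTime DATETIME NOT NULL,
    Amount     INT      NOT NULL,
    KEY ReportTime (ReportTime)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(ReportTime)) (
    PARTITION p201412 VALUES LESS THAN (TO_DAYS('2015-01-01')),
    PARTITION p201501 VALUES LESS THAN (TO_DAYS('2015-02-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);

-- Purging December 2014 drops the whole partition at once
-- instead of deleting its rows one by one.
ALTER TABLE Report_Summary_Range DROP PARTITION p201412;

-- A plain range on ReportTime should be enough for pruning here;
-- no separate PartitionsID column is involved.
EXPLAIN PARTITIONS
SELECT DATE(ReportTime) AS ReportDate, SUM(Amount) AS Total
FROM Report_Summary_Range
WHERE ReportTime >= '2015-01-01 00:00:00'
  AND ReportTime <= '2015-01-31 23:59:59'
GROUP BY ReportDate;
```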

0

Source: https://habr.com/ru/post/1210214/

