Creating a Trendline from an SQL Dataset

The code below returns the number of allowed tickets and the number of open tickets for a period (period YYYY, WW), returning for a certain number of days. For example, if @NoOfDays is 7:

allowed | open | week | year | Period

56 | 30 | 13 | 2012 | 2012, 13

237 | 222 | 14 | 2012 | 2012, 14

“allowed” and “open” in charts (y) for the period (x). I would like to add another “trend” column, which will return a number that, when plotted over the period, will be the trend line (simple linear regression). I do want to use both sets of values ​​as one data source for the trend.

This is the code I have:

SELECT a.resolved, b.opened, a.weekClosed AS week, a.yearClosed AS year, CAST(a.yearClosed as varchar(5)) + ', ' + CAST(a.weekClosed as varchar(5)) AS period FROM (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS resolved, { fn WEEK(date_closed) } AS weekClosed, { fn YEAR(date_closed) } AS yearClosed FROM v_rpt_Service WHERE (date_closed >= DateAdd(Day, DateDiff(Day, 0, GetDate()) - @NoOfDays, 0)) GROUP BY { fn WEEK(date_closed) }, { fn YEAR(date_closed) }) AS a LEFT OUTER JOIN (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS opened, { fn WEEK(date_entered) } AS weekEntered, { fn YEAR(date_entered) } AS yearEntered FROM v_rpt_Service AS v_rpt_Service_1 WHERE (date_entered > = DateAdd(Day, DateDiff(Day, 0, GetDate()) - @NoOfDays, 0)) GROUP BY { fn WEEK(date_entered) }, { fn YEAR(date_entered) }) AS b ON a.weekClosed = b.weekEntered AND a.yearClosed = b.yearEntered ORDER BY year, week 

Edit:

According to serc.carleton.edu/files/mathyouneed/best_fit_line_dividing.pdf, it seems like I want to split the data in half and then calculate the average. Then I need to find the line of best fit and use the slope and y-intercept to calculate the values ​​needed to return in the trend using y = mx + b ?

I know that this is very possible in SQL, however the program into which I embed SQL has limitations on what I can do.

The red and blue dots are the numbers that I am returning now (open and resolved). I need to return a value for each period in the “trend” in order to create a purple line. (This image is hypothetical)

Hypothetical chart

+6
source share
3 answers

I get it. I split the data into several views and subqueries, essentially dividing the data in half. These are my formulas to get each value:

 *(each row is a week)* y1 = average of data first half y2 = average of data second half x1 = 1/4 of number of weeks x2 = 3/4 of number of weeks m = (y1-y2)/(x1-x2) b = y2 - (m * x2) trend = (m * row_number) + b 

And here is my (very dirty) SQL code:

 SELECT resolved_half1,resolved_half2,opened_half1,opened_half2, c.period, ((SUM (resolved_half1) OVER () + SUM(opened_half1) OVER ()) - (SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ())) / ((COUNT(resolved_half1) OVER () + COUNT(opened_half1) OVER ()) / 2) as y1, ((SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ()) / (COUNT(resolved_half2) OVER () + COUNT (opened_half2) OVER ())) as y2, ((COUNT(c.period) OVER ()) / 4) as x1, (((COUNT(c.period) OVER ()) / 4) * 3) as x2, ((CAST(((SUM (resolved_half1) OVER () + SUM(opened_half1) OVER ()) - (SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ())) / ((COUNT(resolved_half1) OVER () + COUNT(opened_half1) OVER ()) / 2) as float) - CAST(((SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ()) / (COUNT(resolved_half2) OVER () + COUNT (opened_half2) OVER ())) as float)) / (CAST(((COUNT(c.period) OVER ()) / 4) as float) - CAST( (((COUNT(c.period) OVER ()) / 4) * 3) as float))) as m, (CAST(((SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ()) / (COUNT(resolved_half2) OVER () + COUNT (opened_half2) OVER ())) as float) - (((CAST(((SUM (resolved_half1) OVER () + SUM(opened_half1) OVER ()) - (SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ())) / ((COUNT(resolved_half1) OVER () + COUNT(opened_half1) OVER ()) / 2) as float) - CAST(((SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ()) / (COUNT(resolved_half2) OVER () + COUNT (opened_half2) OVER ())) as float)) / (CAST(((COUNT(c.period) OVER ()) / 4) as float) - CAST( (((COUNT(c.period) OVER ()) / 4) * 3) as float))) * (((COUNT(c.period) OVER ()) / 4) * 3))) as b, ((((CAST(((SUM (resolved_half1) OVER () + SUM(opened_half1) OVER ()) - (SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ())) / ((COUNT(resolved_half1) OVER () + COUNT(opened_half1) OVER ()) / 2) as float) - CAST(((SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ()) / (COUNT(resolved_half2) OVER () + COUNT (opened_half2) OVER ())) as float)) / (CAST(((COUNT(c.period) OVER ()) / 4) as float) - CAST( (((COUNT(c.period) OVER ()) / 4) * 3) as float))) * (ROW_NUMBER() OVER(ORDER BY c.yearClosed,c.weekClosed))) + (CAST(((SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ()) / (COUNT(resolved_half2) OVER () + COUNT (opened_half2) OVER ())) as float) - (((CAST(((SUM (resolved_half1) OVER () + SUM(opened_half1) OVER ()) - (SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ())) / ((COUNT(resolved_half1) OVER () + COUNT(opened_half1) OVER ()) / 2) as float) - CAST(((SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ()) / (COUNT(resolved_half2) OVER () + COUNT (opened_half2) OVER ())) as float)) / (CAST(((COUNT(c.period) OVER ()) / 4) as float) - CAST( (((COUNT(c.period) OVER ()) / 4) * 3) as float))) * (((COUNT(c.period) OVER ()) / 4) * 3)))) as trend, ROW_NUMBER() OVER(ORDER BY c.yearClosed,c.weekClosed) as row FROM (SELECT *, CAST(yearClosed as varchar(5)) + ', ' + CAST(weekClosed as varchar(5)) AS period FROM (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS resolved_half1, { fn WEEK(date_closed) } AS weekClosed, { fn YEAR(date_closed) } AS yearClosed FROM v_rpt_Service WHERE (date_closed >= DateAdd(Day, DateDiff(Day, 0, GetDate()) - (180), 0)) GROUP BY { fn WEEK(date_closed) }, { fn YEAR(date_closed) }) AS a LEFT OUTER JOIN (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS opened_half1, { fn WEEK(date_entered) } AS weekEntered, { fn YEAR(date_entered) FROM v_rpt_Service AS v_rpt_Service_1 WHERE (date_entered > = DateAdd(Day, DateDiff(Day, 0, GetDate()) - (180), 0)) GROUP BY { fn WEEK(date_entered) }, { fn YEAR(date_entered) }) AS b ON a.weekClosed = b.weekEntered AND a.yearClosed = b.yearEntered) as c LEFT OUTER JOIN (SELECT *, CAST(yearClosed as varchar(5)) + ', ' + CAST(weekClosed as varchar(5)) AS period FROM (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS resolved_half2, { fn WEEK(date_closed) } AS weekClosed, { fn YEAR(date_closed) } AS yearClosed FROM v_rpt_Service WHERE (date_closed >= DateAdd(Day, DateDiff(Day, 0, GetDate()) - (180 / 2), 0)) GROUP BY { fn WEEK(date_closed) }, { fn YEAR(date_closed) }) AS d LEFT OUTER JOIN (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS opened_half2, { fn WEEK(date_entered) } AS weekEntered, { fn YEAR(date_entered)} AS yearEntered FROM v_rpt_Service AS v_rpt_Service_1 WHERE (date_entered > = DateAdd(Day, DateDiff(Day, 0, GetDate()) - (180 / 2), 0)) GROUP BY { fn WEEK(date_entered) }, { fn YEAR(date_entered) }) AS e ON d.weekClosed = e.weekEntered AND d.yearClosed = e.yearEntered ) as f ON c.yearClosed = f.yearClosed AND c.weekClosed = f.weekClosed AND c.weekEntered = f.weekEntered AND c.yearEntered = f.yearEntered AND c.period = f.period GROUP BY c.period, resolved_half1,resolved_half2,opened_half1,opened_half2,c.yearClosed,c.weekClosed ORDER BY row 

This code uses a hard coded value of 180 days. I still need to be able to use varibale to select the number of days (without getting a division error by 0), and the code really needs to be cleared. If someone can do these two things for me (I'm not the best in SQL), then generosity belongs to them.

Picture

Chart

+1
source

I was interested in the problem, and I found that the best way to get a complex request is to reformat it using my own style and conventions. I applied them to your decision, and the result is lower. I do not know if this will have any value for you ...

  • There were a few bits of code that I believe are not part of the MS T-SQL syntax, for example ({fn xxx } and WEEK(xxx) .
  • This code compiles, but I cannot run it, since I do not have a properly configured data table.
  • I made a lot of changes to the encoding that will require serious explanation, and I will skip most of this. Add a comment if you want to explain something.
  • I threw a lot of blanks. The difference between legible and illegible codes is just the perception and sensitivity of the observer, and you can hate my conventions.
  • Not sure what the end result should be (i.e. which columns are returned)

Some additional notes:

  • This request will not receive items entered for the week if no items were closed this week.
  • Weeks may be partial, for example. not all seven days may be present (adjust @Interval to always include full weeks, but what about odd intervals?)
  • Multiply count (*) by 1.0 to convert them to earlier floats (avoids reducing the number of lines and integers)
  • This was done by cte to allow earlier formulas to be replaced by characters in later formulas (at this point everything became much clearer).

So here is what I came up with:

 ;WITH cte as ( select c.period ,resolved_half1 ,resolved_half2 ,opened_half1 ,opened_half2 ,row = row_number() over(order by c.yearClosed, c.weekClosed) ,y1 = ((SUM(resolved_half1) + SUM(opened_half1)) - (SUM(resolved_half2) + SUM(opened_half2))) / ((count(resolved_half1) + count(opened_half1)) / 2) ,y2 = ((SUM(resolved_half2) + SUM(opened_half2)) / (count(resolved_half2) + COUNT (opened_half2))) ,x1 = ((count(c.period)) / 4) ,x2 = (((count(c.period)) / 4) * 3) from (select a.yearclosed ,a.weekClosed ,a.resolved_half1 ,b.yearEntered ,b.weekEntered ,b.opened_half1 ,cast(a.yearClosed as varchar(5)) + ', ' + cast(a.weekClosed as varchar(5)) period from (-- Number of items per week that closed within @Interval select count(distinct TicketNbr) * 1.0 resolved_half1 ,datepart(wk, date_closed) weekClosed ,year(date_closed) yearClosed from v_rpt_Service where date_closed >= @FullInterval group by datepart(wk, date_closed) ,year(date_closed) ) a left outer join (-- Number of items per week that were entered within @Interval select count(distinct TicketNbr) * 1.0 opened_half1 ,datepart(wk, date_entered) weekEntered ,year(date_entered) yearEntered from v_rpt_Service where date_entered >= @FullInterval group by datepart(wk, date_entered) ,year(date_entered) ) b on a.weekClosed = b.weekEntered and a.yearClosed = b.yearEntered) c left outer join (select d.yearclosed ,d.weekClosed ,d.resolved_half2 ,e.yearEntered ,e.weekEntered ,e.opened_half2 ,cast(yearClosed as varchar(5)) + ', ' + cast(weekClosed as varchar(5)) period from (select count(distinct TicketNbr) * 1.0 resolved_half2 ,datepart(wk, date_closed) weekClosed ,year(date_closed) yearClosed from v_rpt_Service where date_closed >= @HalfInterval group by datepart(wk, date_closed) ,year(date_closed) ) d left outer join (select count(distinct TicketNbr) * 1.0 opened_half2 ,datepart(wk, date_entered) weekEntered ,year(date_entered) yearEntered from v_rpt_Service where date_entered >= @HalfInterval group by datepart(wk, date_entered) ,year(date_entered) ) e on d.weekClosed = e.weekEntered and d.yearClosed = e.yearEntered ) f on c.period = f.period group by c.period ,resolved_half1 ,resolved_half2 ,opened_half1 ,opened_half2 ,c.yearClosed ,c.weekClosed ) SELECT row ,Period ,x1 ,y1 ,x2 ,y2 ,m = ((y1 - y2) / (x1 - x2)) ,b = (y2 - (((y1 - y2) / (x1 - x2)) * x2)) ,trend = ((((y1 - y2) / (x1 - x2)) * (row)) + (y2 - (((y1 - y2) / (x1 - x2)) * x2))) from cte order by row 

As an addition, all the subqueries “c” can be replaced with something like the following, and “f” with a slightly modified version. Better or worse performance depends on table size, indexing and other weightless.

 select datepart(wk, date_closed) weekClosed ,year(date_closed) yearClosed ,count (distinct case when date_closed >= @FullInterval then TicketNbr else null end) resolved_half1 ,count (distinct case when date_entered >= @FullInterval then TicketNbr else null end) opened_half1 from v_rpt_Service where date_closed >= @FullInterval or date_entered >= @FullInterval group by datepart(wk, date_closed) ,year(date_closed) 
+3
source

I believe that this will do the trick - if you do not post some actual sample data, and I will see if I can configure it to fix it:

 DECLARE @noOfDays INT SET @noofdays = 180 ;WITH tickets AS ( SELECT DISTINCT DATENAME(YEAR,date_closed) + RIGHT('000' + CAST(DATEPART(WEEK,date_closed) AS VARCHAR(5)),3) as Period ,ticket_nbr ,1 as ticket_type --resolved FROM v_rpt_Service WHERE (date_closed >= DateAdd(Day, DateDiff(Day, 0, GetDate()) - @NoOfDays, 0)) UNION ALL SELECT DISTINCT DATENAME(YEAR,date_closed) + RIGHT('000' + CAST(DATEPART(WEEK,date_closed) AS VARCHAR(5)),3) as Period ,ticket_nbr ,0 as ticket_type --opened FROM v_rpt_Service WHERE (date_entered > = DateAdd(Day, DateDiff(Day, 0, GetDate()) - @NoOfDays, 0)) ) ,tickets2 AS ( SELECT Period ,SUM(CASE WHEN ticket_type = 0 THEN 1 ELSE 0 END) as opened ,SUM(CASE WHEN ticket_type = 1 THEN 1 ELSE 0 END) as closed FROM tickets GROUP BY Period ) ,tickets3 AS ( SELECT Period ,row_number() OVER (ORDER BY period ASC) as row ,opened ,closed ,COUNT(period) OVER() as base ,SUM(opened) OVER () as [Sumopened] ,SUM(opened * opened) OVER () as [Sumopened^2] ,SUM(opened * closed) OVER () as [Sumopenedclosed] ,SUM(closed) OVER () as [Sumclosed] ,SUM(closed * closed) OVER () as [Sumclosed^2] ,SUM(opened * closed) OVER () * COUNT(period) OVER () AS [nSumopenedclosed] ,SUM(opened) OVER () * SUM(closed) OVER () AS [Sumopened*Sumclosed] ,SUM(opened * opened) OVER () * COUNT(period) OVER () AS [nSumopened^2] ,SUM(opened) OVER () * SUM(opened) OVER () as [Sumopened*Sumopened] FROM tickets2 ) --Formula for linear regression is Y = A + BX SELECT period ,opened ,closed ,((1.0 / base) * [Sumclosed]) - ([Sumopenedclosed] - ([Sumopened*Sumclosed] / base)) / ([Sumopened^2] - ([Sumopened*Sumopened] / base)) *((1.0 / base) * [Sumopened]) + row * ([Sumopenedclosed] - ([Sumopened*Sumclosed] / base)) / ([Sumopened^2] - ([Sumopened*Sumopened] / base)) AS trend_point ,((1.0 / base) * [Sumclosed]) - ([Sumopenedclosed] - ([Sumopened*Sumclosed] / base)) / ([Sumopened^2] - ([Sumopened*Sumopened] / base)) *((1.0 / base) * [Sumopened]) AS A ,([Sumopenedclosed] - ([Sumopened*Sumclosed] / base)) / ([Sumopened^2] - ([Sumopened*Sumopened] / base)) as B from tickets3 
0
source

Source: https://habr.com/ru/post/912622/


All Articles