Finding covariance using SQL

    dt          indx_nm1  indx_val1  indx_nm2  indx_val2
    2009-06-08  ABQI      1001.2     ACNACTR   300.05
    2009-06-09  ABQI      1002.12    ACNACTR   341.19
    2009-06-10  ABQI      1011.4     ACNACTR   382.93
    2009-06-11  ABQI      1015.43    ACNACTR   362.63

I have a table that looks like the one above (but with hundreds of rows dated from 2009 to 2013). Is it possible to calculate the covariance, that is, [(indx_val1 - avg(indx_val1)) * (indx_val2 - avg(indx_val2))] divided by the total number of rows, for each pair of indx_val1 and indx_val2 values (iterating over the whole table), and return a single value for cov(ABQI, ACNACTR)?

2 answers

Since you have aggregates operating over two different groupings, you will need two different queries. The main one groups by dt to get the row values per day. The other computes the AVG() and COUNT() aggregates over the entire rowset.

To use them together, you need to join them. But since there is no real relationship between the two queries, the join is a Cartesian product, so we use a CROSS JOIN. Effectively, that pairs every row of the main query with the single row returned by the aggregate query. You can then do the arithmetic in the SELECT list using values from both.

So, building on the query from your earlier question:

    SELECT
      indxs.*,
      ((indx_val2 - indx_val2_avg) * (indx_val1 - indx_val1_avg)) / total_rows AS cv
    FROM (
      SELECT
        dt,
        MAX(CASE WHEN indx_nm = 'ABQI' THEN indx_nm ELSE NULL END) AS indx_nm1,
        MAX(CASE WHEN indx_nm = 'ABQI' THEN indx_val ELSE NULL END) AS indx_val1,
        MAX(CASE WHEN indx_nm = 'ACNACTR' THEN indx_nm ELSE NULL END) AS indx_nm2,
        MAX(CASE WHEN indx_nm = 'ACNACTR' THEN indx_val ELSE NULL END) AS indx_val2
      FROM table1 a
      GROUP BY dt
    ) indxs
    CROSS JOIN (
      /* Join against a query returning the AVG() and COUNT() across all rows */
      SELECT
        'ABQI' AS indx_nm1_aname,
        AVG(CASE WHEN indx_nm = 'ABQI' THEN indx_val ELSE NULL END) AS indx_val1_avg,
        'ACNACTR' AS indx_nm2_aname,
        AVG(CASE WHEN indx_nm = 'ACNACTR' THEN indx_val ELSE NULL END) AS indx_val2_avg,
        COUNT(*) AS total_rows
      FROM table1 b
      WHERE indx_nm IN ('ABQI','ACNACTR')
      /* And it is a cartesian product */
    ) aggs
    WHERE
      indx_nm1 IS NOT NULL
      AND indx_nm2 IS NOT NULL
    ORDER BY dt

Here is a demo built on your previous one: http://sqlfiddle.com/#!6/2ec65/14
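
The query above returns one term per date, while the question asks for a single cov(ABQI, ACNACTR) value. A minimal sketch of one way to collapse it is to wrap the same pivot and CROSS JOIN in a SUM(). Several things here are assumptions rather than part of the original answer: Python's standard-library sqlite3 module is used only as a harness so the SQL can be run as-is, table1 is assumed to have the columns (dt, indx_nm, indx_val), the CTE names pairs and stats are made up for illustration, and the divisor is taken to be the number of dates that have both indexes.

```python
import sqlite3

# In-memory table with the four sample dates from the question,
# stored in the long (dt, indx_nm, indx_val) layout assumed for table1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (dt TEXT, indx_nm TEXT, indx_val REAL)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        ("2009-06-08", "ABQI", 1001.2), ("2009-06-08", "ACNACTR", 300.05),
        ("2009-06-09", "ABQI", 1002.12), ("2009-06-09", "ACNACTR", 341.19),
        ("2009-06-10", "ABQI", 1011.4), ("2009-06-10", "ACNACTR", 382.93),
        ("2009-06-11", "ABQI", 1015.43), ("2009-06-11", "ACNACTR", 362.63),
    ],
)

COVARIANCE_SQL = """
WITH pairs AS (
    -- One row per date, keeping only dates that have both indexes.
    SELECT dt,
           MAX(CASE WHEN indx_nm = 'ABQI'    THEN indx_val END) AS x,
           MAX(CASE WHEN indx_nm = 'ACNACTR' THEN indx_val END) AS y
    FROM table1
    WHERE indx_nm IN ('ABQI', 'ACNACTR')
    GROUP BY dt
    HAVING COUNT(DISTINCT indx_nm) = 2
),
stats AS (
    -- A single row holding the averages and the number of paired dates.
    SELECT AVG(x) AS x_avg, AVG(y) AS y_avg, COUNT(*) AS n
    FROM pairs
)
-- Cartesian product with the one-row stats, then sum the per-date terms.
SELECT SUM((x - x_avg) * (y - y_avg)) / MAX(n) AS cov
FROM pairs
CROSS JOIN stats
"""

cov = conn.execute(COVARIANCE_SQL).fetchone()[0]
print(cov)
```

Because the averages and the count are taken from the paired rows rather than from the raw table, a date that is missing one of the two indexes affects neither the means nor the divisor. This gives the population covariance (divide by n); for the sample covariance the divisor would be n - 1.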


Here is a scalar-valued function that computes the covariance of any two-column series passed in as XML.

To test it, create the function, then run the alpha test from the comment block at the top of the function body.

    CREATE FUNCTION [dbo].[Covariance](@XmlTwoValueSeries xml)
    RETURNS float
    AS
    BEGIN
        /*
        -- -----------
        -- ALPHA TEST
        -- -----------
        IF object_id('tempdb..#_201610101706') IS NOT NULL
            DROP TABLE #_201610101706

        SELECT * INTO #_201610101706
        FROM (
            SELECT * FROM (
                SELECT '2016-01' Period, 1.24 col0, 2.20 col1 UNION
                SELECT '2016-02' Period, 1.6 col0, 3.20 col1 UNION
                SELECT '2016-03' Period, 1.0 col0, 2.77 col1 UNION
                SELECT '2016-04' Period, 1.9 col0, 2.98 col1
            ) A
        ) A

        DECLARE @XmlTwoValueSeries xml
        SET @XmlTwoValueSeries = (
            SELECT col0, col1 FROM #_201610101706 FOR XML PATH('Output')
        )

        SELECT dbo.Covariance(@XmlTwoValueSeries) Covariance
        */

        DECLARE @returnvalue numeric(20,10)

        SET @returnvalue = (
            SELECT SUM((x - xAvg) * (y - yAvg)) / MAX(n) AS [COVAR(x,y)]
            FROM (
                -- 1E * ... forces float arithmetic; the window functions span all rows
                SELECT
                    1E * x AS x,
                    AVG(1E * x) OVER (PARTITION BY (SELECT NULL)) xAvg,
                    1E * y AS y,
                    AVG(1E * y) OVER (PARTITION BY (SELECT NULL)) yAvg,
                    COUNT(*) OVER (PARTITION BY (SELECT NULL)) n
                FROM (
                    -- Shred each <Output> element into an (x, y) pair
                    SELECT
                        e.c.value('(col0/text())[1]', 'float') x,
                        e.c.value('(col1/text())[1]', 'float') y
                    FROM @XmlTwoValueSeries.nodes('Output') e(c)
                ) A
            ) A
        )

        RETURN @returnvalue
    END
    GO
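
As a cross-check for the alpha test, the same population covariance (sum of products of deviations, divided by n) can be computed by hand on the four sample rows. This is only a sketch for comparison, not part of the function: the helper name population_covariance is made up for illustration, and Python is used simply because it runs without a database.

```python
# Alpha-test data copied from the comment block in the function body.
col0 = [1.24, 1.6, 1.0, 1.9]
col1 = [2.20, 3.20, 2.77, 2.98]


def population_covariance(xs, ys):
    """Return sum((x - mean_x) * (y - mean_y)) / n for paired series."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("series must be non-empty and of equal length")
    n = len(xs)
    x_avg = sum(xs) / n
    y_avg = sum(ys) / n
    return sum((x - x_avg) * (y - y_avg) for x, y in zip(xs, ys)) / n


expected = population_covariance(col0, col1)
print(round(expected, 10))
```

The value returned by dbo.Covariance for the alpha-test data should agree with this figure up to the rounding implied by numeric(20,10).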

Source: https://habr.com/ru/post/1494975/

