The average number of rows in which column = A in different rows in another column, grouped by the third column

Using SQL Server, I am trying to request some kind of averaged score from a table that I did not design, where basically I want a list grouped by one column to have the number of different values โ€‹โ€‹of another column that matches this criterion, as well as the number of rows, corresponding to another criterion (which I will use to create an averaged account or something else). It may not be difficult, but I have a bad day in theory theory, and any pointers would be greatly appreciated.

Here is a simplified and generalized scenario (diagram and sample data below). Say we have three columns:

  • objid (has a clustered index)
  • userid (no index, I could add it)
  • actiontype (no index, I can add it)

None of them are unique, and none of them can be null . We want to completely ignore any lines where actiontype is none . We want to know, for userid , how many actiontype = 'flag' lines are on average per object with which the user interacted.

So, if we have "ahmed", "joe" and "maria", and joe interacts with 3 objects and raises 5 flags, the number 5 / 3 = 1.6666 continuous; if "ahmed" interacted with 3 objects and did not raise flags, its number would be 0 ; if โ€œmariaโ€ interacted with 5 objects and raised 4 flags, its number would be 4 / 5 = 0.8 :

  + -------- + ------------------ +
 |  userid |  flags_per_object |
 + -------- + ------------------ +
 |  ahmed |  0 |
 |  joe |  1.66666667 |
 |  maria |  0.8 |
 + -------- + ------------------ + 

I will not be remotely surprised if it is closed as a duplicate, I simply cannot find it.

Here's a simplified table setup and data retrieval:

 create table tmp ( objid varchar(254) not null, userid varchar(254) not null, actiontype varchar(254) not null ) create clustered index tmp_objid on tmp(objid) insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'none') insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'none') insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'update') insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'close') insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'flag') insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'flag') insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'flag') insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'flag') insert into tmp (objid, userid, actiontype) values ('beta', 'joe', 'none') insert into tmp (objid, userid, actiontype) values ('beta', 'joe', 'none') insert into tmp (objid, userid, actiontype) values ('beta', 'joe', 'close') insert into tmp (objid, userid, actiontype) values ('beta', 'joe', 'flag') insert into tmp (objid, userid, actiontype) values ('gamma', 'joe', 'none') insert into tmp (objid, userid, actiontype) values ('delta', 'joe', 'update') insert into tmp (objid, userid, actiontype) values ('alpha', 'maria', 'update') insert into tmp (objid, userid, actiontype) values ('beta', 'maria', 'flag') insert into tmp (objid, userid, actiontype) values ('beta', 'maria', 'flag') insert into tmp (objid, userid, actiontype) values ('gamma', 'maria', 'flag') insert into tmp (objid, userid, actiontype) values ('gamma', 'maria', 'flag') insert into tmp (objid, userid, actiontype) values ('gamma', 'maria', 'update') insert into tmp (objid, userid, actiontype) values ('gamma', 'maria', 'close') insert into tmp (objid, userid, actiontype) values ('delta', 'maria', 'update') insert into tmp (objid, userid, actiontype) values ('epsilon', 'maria', 'update') insert into tmp (objid, userid, actiontype) values ('alpha', 'ahmed', 'none') insert into tmp (objid, userid, actiontype) values ('beta', 'ahmed', 'none') insert into tmp (objid, userid, actiontype) values ('gamma', 'ahmed', 'none') insert into tmp (objid, userid, actiontype) values ('gamma', 'ahmed', 'update') insert into tmp (objid, userid, actiontype) values ('delta', 'ahmed', 'update') insert into tmp (objid, userid, actiontype) values ('delta', 'ahmed', 'close') insert into tmp (objid, userid, actiontype) values ('epsilon', 'ahmed', 'update') insert into tmp (objid, userid, actiontype) values ('epsilon', 'ahmed', 'close') 
+6
source share
5 answers

Answer: It depends.

In my testing, my solution is the slowest of the group, no matter what test data I use. With real-life data, this is about half the speed of the fastest solution.

Mikael's solution is faster for the test data given in my question, and faster for a dataset with large but still small (our test system, about 2 thousand rows) in my real tables.

But the a1ex07 solution is faster for my full-sized real tables (our living system, about 700 thousand rows). There is not much distance between a1ex07 and Mikael, but a1ex07 definitely has an advantage.

In the end, I actually used the Mikael solution, because itโ€™s easier to conceptualize if you are not a L33t DB person (and the people doing the maintenance of this code, of which SQL is only a small part, will not be) and itโ€™s easier to adapt to various other scenarios.

So this is a meta-response from the wiki community that I will accept when time runs out, instead of accepting any of their excellent answers. If you find this helpful, vote for Michael's answer and a1ex07 answer , as I did.

+1
source

You can try the following:

 select t1.userid, CASE cnt2 WHEN 0 THEN 0 ELSE ISNULL(cast(cnt2 as float)/cnt1,0) END as num FROM ( select userid, COUNT(distinct(t1.objid)) as cnt2 from tmp as t1 where t1.actiontype <> 'none' group by t1.userid ) t1 LEFT JOIN ( SELECT t2.userid, COUNT(*) as cnt1 FROM tmp as t2 WHERE t2.actiontype='flag' GROUP BY t2.userid)b ON (b.userid = t1.userid) 

Although it looks uglier than your solution, it unexpectedly creates a better execution plan based on the test data you provided.

+5
source

(Answering my own question.)

I have something that works:

 select userid, cast(count(case when actiontype = 'flag' then 1 else null end) as float) / count(distinct(objid)) as flags_per_object from tmp where actiontype <> 'none' group by userid 

.... but I can't help but feel better there ...

+2
source
 ;with cte as ( select userid, sum(case actiontype when 'flag' then 1 else 0 end) as Flags from tmp where actiontype <> 'none' group by userid, [objid] ) select userid, cast(sum(Flags) as float)/count(*) as flags_per_object from cte group by userid 

As an auxiliary request instead of CTE

 select userid, cast(sum(Flags) as float)/count(*) as flags_per_object from (select userid, sum(case actiontype when 'flag' then 1 else 0 end) as Flags from tmp where actiontype <> 'none' group by userid, [objid]) as T group by userid 
+1
source
 SELECT userid, flags_per_object = COUNT(CASE actiontype WHEN 'flag' THEN objid END) * 1.0 / COUNT(DISTINCT objid) FROM tmp WHERE actiontype <> 'none' GROUP BY userid 
0
source

Source: https://habr.com/ru/post/889768/


All Articles