Count duplicate events after a period of time or based on a secondary column

I currently have a table of access logs similar to this:

    LogID  UserID  BuildingID  Date/Time
    ===========================================
    1      1       1           2013-01-01 10:00
    2      2       1           2013-01-01 10:00
    3      3       1           2013-01-01 10:30
    4      3       2           2013-01-01 11:00
    5      2       1           2013-01-01 11:00
    6      4       1           2013-01-01 11:30
    7      5       1           2013-01-01 11:30
    8      5       1           2013-01-01 11:31
    9      1       3           2013-01-01 12:00
    10     1       3           2013-01-01 12:03
    11     1       2           2013-01-01 12:05

What I need to do is create a query to count the number of duplicate user entries based on the following 2 conditions:

  • Time difference in excess of X minutes - X will be a parameter specified by the user
  • OR the entry is for a different building for the same user

For example, if I set a time difference of 5 minutes, then my results would be as follows:

    UserID  AccessCount
    ====================
    1       3    <-- +1 for timediff (ID 1,10), +1 for building (ID 11)
    2       2    <-- +1 for timediff (ID 2,5)
    3       2    <-- +1 for building (ID 3,4)
    4       1
    5       1    <-- duplicate ignored because DateDiff < 5min
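One way to pin down the rule is a small Python sketch that reproduces the expected counts from the sample data. It assumes one particular reading of the two conditions: a swipe is counted unless the same user's previous swipe in the same building happened X minutes ago or less. The names `LOGS` and `count_accesses` are hypothetical and exist only for this illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

# Sample rows from the question: (LogID, UserID, BuildingID, Date/Time).
LOGS = [
    (1, 1, 1, "2013-01-01 10:00"),
    (2, 2, 1, "2013-01-01 10:00"),
    (3, 3, 1, "2013-01-01 10:30"),
    (4, 3, 2, "2013-01-01 11:00"),
    (5, 2, 1, "2013-01-01 11:00"),
    (6, 4, 1, "2013-01-01 11:30"),
    (7, 5, 1, "2013-01-01 11:30"),
    (8, 5, 1, "2013-01-01 11:31"),
    (9, 1, 3, "2013-01-01 12:00"),
    (10, 1, 3, "2013-01-01 12:03"),
    (11, 1, 2, "2013-01-01 12:05"),
]


def count_accesses(logs, minutes):
    """Count swipes per user, skipping a swipe when the previous swipe by
    the same user in the same building is `minutes` old or less.
    (Assumed reading of the rule, not a confirmed specification.)"""
    window = timedelta(minutes=minutes)
    parsed = [
        (datetime.strptime(ts, "%Y-%m-%d %H:%M"), user, building)
        for _log_id, user, building, ts in logs
    ]
    last_seen = {}      # (user, building) -> time of the previous swipe
    counts = Counter()
    for ts, user, building in sorted(parsed):
        prev = last_seen.get((user, building))
        if prev is None or ts - prev > window:
            counts[user] += 1
        last_seen[(user, building)] = ts
    return dict(counts)


print(count_accesses(LOGS, 5))  # {1: 3, 2: 2, 3: 2, 4: 1, 5: 1}
```

With a 5-minute window this matches the table above: LogID 10 and LogID 8 are the only swipes dropped.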

Hope this makes sense.

To give some background, this concerns access to some of our buildings, and a business requirement has come up for some security analysis reports. In effect, we want to check for duplicate accesses during specified time periods (usually weekends), but we must take into account that some swipe points fail and require the user to swipe repeatedly. That is why I want to use the time difference: a swipe error generally means that the user swipes several times within a very short period.

Any help is greatly appreciated, thanks in advance!

2 answers

You can rephrase your logic by thinking about when you count a line and when you do not. You do not count a line when it falls within a certain period of the previous line for the same user in the same building.

I think this might be what you want:

    select userId, count(*) as AccessCount
    from (select LogID, UserID, BuildingID, dt,
                 -- previous swipe by the same user in the same building
                 lag(dt) over (partition by UserID, BuildingID order by dt) as prevdt
          from t
         ) t
    where dt > prevdt + TIMEDIFF or prevdt is NULL
    group by userId

In dialects that allow adding a number to a date (Oracle, or SQL Server's datetime type, for example), the number is interpreted as a number of days. So 5 minutes would be (5.0/60)/24.
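The day-fraction arithmetic can be checked with a short Python sketch. `minutes_to_days` is a hypothetical helper, and `timedelta(days=...)` merely stands in for the "number added to a date" behaviour described above.

```python
from datetime import datetime, timedelta


def minutes_to_days(minutes):
    """Convert minutes to a fraction of a day: (minutes / 60) / 24."""
    return (minutes / 60.0) / 24.0


time_diff = minutes_to_days(5)          # roughly 0.003472 days
prev = datetime(2013, 1, 1, 11, 30)     # an example previous swipe time
threshold = prev + timedelta(days=time_diff)

# A swipe is counted only if it is later than this threshold.
print(time_diff, threshold)
```

The threshold lands five minutes after the previous swipe, so the 11:31 swipe in the sample data falls inside the window.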

Your data has no example of this, but suppose you have three lines:

    LogID  UserID  BuildingID  Time
    1      1       1           11:30
    2      1       2           11:31
    3      1       1           11:32

Then line three will not be counted, because it falls within the time window of line 1, the previous line for the same user in the same building.
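A rough Python emulation of this three-line case shows why the building matters in the partition. `counted_ids` is a hypothetical helper that mimics "compare with the previous row in the same partition"; the partition is chosen through the `key` argument, and a 5-minute window is assumed.

```python
from datetime import datetime, timedelta

# The three rows from the example: (LogID, UserID, BuildingID, time).
ROWS = [
    (1, 1, 1, "11:30"),
    (2, 1, 2, "11:31"),
    (3, 1, 1, "11:32"),
]


def counted_ids(rows, minutes, key):
    """Return the LogIDs that are counted: rows with no previous row in
    their partition, or whose gap to that previous row exceeds `minutes`."""
    window = timedelta(minutes=minutes)
    prev_by_key = {}    # partition key -> time of the previous row
    kept = []
    for log_id, user, building, hhmm in sorted(rows, key=lambda r: r[3]):
        ts = datetime.strptime(hhmm, "%H:%M")
        k = key(user, building)
        prev = prev_by_key.get(k)
        if prev is None or ts - prev > window:
            kept.append(log_id)
        prev_by_key[k] = ts
    return kept


# Partition by user and building, as in the query above.
per_building = counted_ids(ROWS, 5, key=lambda user, building: (user, building))
# Partition by user only, for contrast.
per_user = counted_ids(ROWS, 5, key=lambda user, building: user)

print(per_building)  # [1, 2]
print(per_user)      # [1]
```

Partitioning by user and building counts lines 1 and 2 and skips line 3. Partitioning by user alone would also drop line 2, losing the visit to the second building.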


Here is one approach:

    declare @duplicateMinutes int = 5

    select UserID, AccessCount = count(1)
    from AccessLogs a
    where not exists
    (
      select 1
      from AccessLogs d
      where a.LogID < d.LogID -- add this to try and avoid duplicate times cancelling each other
        and a.UserID = d.UserID
        and a.BuildingID = d.BuildingID
        and a.SwipeTime >= dateadd(mi, -@duplicateMinutes, d.SwipeTime)
        and a.SwipeTime <= d.SwipeTime
    )
    group by UserID
    order by UserID

SQL Fiddle with a demo - gives the expected results for your data.
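For readers without access to the demo, the NOT EXISTS logic can be approximated in plain Python. This is only a sketch of the same idea, not the query itself: a swipe is dropped when a later swipe (higher LogID) by the same user in the same building follows it within the window, so the last swipe of each burst is the one counted. `LOGS` and `count_not_exists` are hypothetical names.

```python
from collections import Counter
from datetime import datetime, timedelta

# Sample rows from the question: (LogID, UserID, BuildingID, Date/Time).
LOGS = [
    (1, 1, 1, "2013-01-01 10:00"),
    (2, 2, 1, "2013-01-01 10:00"),
    (3, 3, 1, "2013-01-01 10:30"),
    (4, 3, 2, "2013-01-01 11:00"),
    (5, 2, 1, "2013-01-01 11:00"),
    (6, 4, 1, "2013-01-01 11:30"),
    (7, 5, 1, "2013-01-01 11:30"),
    (8, 5, 1, "2013-01-01 11:31"),
    (9, 1, 3, "2013-01-01 12:00"),
    (10, 1, 3, "2013-01-01 12:03"),
    (11, 1, 2, "2013-01-01 12:05"),
]


def count_not_exists(logs, minutes):
    """Count rows per user for which no later row (higher LogID) by the
    same user in the same building follows within `minutes`."""
    window = timedelta(minutes=minutes)
    rows = [
        (log_id, user, building, datetime.strptime(ts, "%Y-%m-%d %H:%M"))
        for log_id, user, building, ts in logs
    ]
    counts = Counter()
    for a_id, a_user, a_building, a_time in rows:
        superseded = any(
            a_id < d_id
            and a_user == d_user
            and a_building == d_building
            and d_time - window <= a_time <= d_time
            for d_id, d_user, d_building, d_time in rows
        )
        if not superseded:
            counts[a_user] += 1
    return dict(sorted(counts.items()))


print(count_not_exists(LOGS, 5))  # {1: 3, 2: 2, 3: 2, 4: 1, 5: 1}
```

On the sample data this drops LogID 9 and LogID 7 and arrives at the same per-user totals the question expects.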


Source: https://habr.com/ru/post/1483943/

