Here is a base R solution using aggregate():
setNames(aggregate(status ~ userid, mytbl[mytbl$status == "login_failed", ], function(x) length(x)), c("userid", "failed_logins"))
#   userid failed_logins
# 1    abc             5
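To try the aggregate() call without the original data, here is a minimal sketch. The contents of mytbl below are made up for illustration; only the column names userid and status come from the answer.

```r
# Hypothetical sample data; only the column names match the answer above
mytbl <- data.frame(
  userid = c("abc", "abc", "abc", "abd", "abd", "abe"),
  status = c("login_failed", "login_failed", "logged_in",
             "login_failed", "logged_out", "logged_in"),
  stringsAsFactors = FALSE
)

# Keep only the failed logins, then count rows per user
failed <- mytbl[mytbl$status == "login_failed", ]
res <- setNames(aggregate(status ~ userid, failed, length),
                c("userid", "failed_logins"))
res
#   userid failed_logins
# 1    abc             2
# 2    abd             1
```

Users without any failed login (here "abe") do not appear in the result, because their rows are removed before aggregate() runs.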
Update
Another useful function that comes to mind is ave(), which you can use as follows:
First, use ave() to add a new column to your dataset that holds a running counter for each action by each user. (Note: I had to make sure that the "userid" and "status" columns were of class character, not factor, to make this work for me.)
mytbl$status_seq <- ave(mytbl$status, mytbl$userid, mytbl$status, FUN = seq_along)
head(mytbl)
Second, use aggregate() as shown above on the subset that matches the condition you are interested in, and retrieve the maximum value.
aggregate(status_seq ~ userid, mytbl[mytbl$status == "login_failed", ], function(x) max(x))
#   userid status_seq
# 1    abc          5
aggregate(status_seq ~ userid, mytbl[mytbl$status == "logged_out", ], function(x) max(x))
#   userid status_seq
# 1   aabc          1
# 2   abbc          1
# 3   abdc          1
# 4   abuc          1
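One caveat with the two steps above: ave() returns a vector of the same type as its first argument, so with a character "status" column the counter is stored as text, and max() then compares strings rather than numbers. The sketch below uses made-up data with twelve failed logins to show the difference, and passes an integer vector (seq_along(mytbl$status)) as the first argument instead. That substitution is my suggestion, not part of the code above.

```r
# Hypothetical data: one user with twelve failed logins
mytbl <- data.frame(
  userid = rep("abc", 12),
  status = rep("login_failed", 12),
  stringsAsFactors = FALSE
)

# Character first argument: the counter comes back as character
chr_seq <- ave(mytbl$status, mytbl$userid, mytbl$status, FUN = seq_along)
class(chr_seq)   # "character"
max(chr_seq)     # "9", because strings are compared, not numbers

# Integer first argument: the counter stays integer
mytbl$status_seq <- ave(seq_along(mytbl$status), mytbl$userid, mytbl$status,
                        FUN = seq_along)
class(mytbl$status_seq)   # "integer"
max(mytbl$status_seq)     # 12
```

With an integer first argument, the grouping columns can be either character or factor, since ave() only uses them to define the groups.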
Note that ave() becomes even more useful if you also group by date:
mytbl$status_seq <- ave(mytbl$status, mytbl$date, mytbl$userid, mytbl$status, FUN = seq_along)
as this will reset the counter for each new day in your dataset.
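A small sketch of the per-day reset, assuming mytbl has a "date" column of class Date. The data is invented for illustration, and an integer vector is used as the first argument so that the counter is numeric.

```r
# Hypothetical data: one user, failed logins spread over two days
mytbl <- data.frame(
  date   = as.Date(c("2015-01-01", "2015-01-01",
                     "2015-01-02", "2015-01-02", "2015-01-02")),
  userid = "abc",
  status = "login_failed",
  stringsAsFactors = FALSE
)

# Counter per date, user and status; it restarts at 1 on each new day
mytbl$status_seq <- ave(seq_along(mytbl$status),
                        mytbl$date, mytbl$userid, mytbl$status,
                        FUN = seq_along)
mytbl$status_seq
# [1] 1 2 1 2 3

# Failed logins per user per day
daily <- aggregate(status_seq ~ date + userid,
                   mytbl[mytbl$status == "login_failed", ], max)
daily
```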
Finally (at the risk of sharing a solution that might be too obvious), since you are only interested in counts, you can use table(), which gives you all the information right away:
table(mytbl$userid, mytbl$status)
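To pull a single status out of that table, you can index it by column name. A minimal sketch on made-up data (the rows of mytbl below are hypothetical):

```r
# Hypothetical sample data
mytbl <- data.frame(
  userid = c("abc", "abc", "abc", "abd", "abd", "abe"),
  status = c("login_failed", "login_failed", "logged_in",
             "login_failed", "logged_out", "logged_in"),
  stringsAsFactors = FALSE
)

# Cross-tabulate users (rows) against statuses (columns)
tab <- table(mytbl$userid, mytbl$status)

# Failed logins per user, as a named vector
failed <- tab[, "login_failed"]
failed
# abc abd abe
#   2   1   0
```

Unlike the aggregate() approach, this also lists users with zero failed logins.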