Select only the cases present in all time periods

I have a longitudinal data set covering one month, in which some users drop out over time.

I would like to subset the data to only those users who are active on all 30 days, but I could not find an example of this type of subset. Here is a sample of the data:

date        userID  x
2001-11-08  1       20
2001-11-08  2       2
2001-11-08  3       10
2001-11-08  4       5
2001-11-08  5       1
2001-11-09  1       19
2001-11-09  3       4
2001-11-09  4       5
...
2001-11-30  1       15
4 answers
 subset(dnow, ave(as.numeric(date), userID, FUN=function(x) length(unique(x)))==30) 
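To see what the `ave` call computes, here is a minimal sketch on made-up data. The toy data frame spans only 3 days, so the threshold is the number of distinct dates in the data rather than 30; the values and the names `n_days`, `days_per_user` and `complete` are illustrative assumptions, not part of the original answer.

```r
# Hypothetical toy data: 3 days instead of 30.
dnow <- data.frame(
  date   = as.Date(c("2001-11-08", "2001-11-08", "2001-11-08",
                     "2001-11-09", "2001-11-09",
                     "2001-11-10", "2001-11-10")),
  userID = c(1, 2, 3, 1, 3, 1, 3),
  x      = c(20, 2, 10, 19, 4, 15, 7)
)

# Number of distinct days covered by the data (3 here, 30 in the question).
n_days <- length(unique(dnow$date))

# For every row, count the distinct dates recorded for that row's user.
days_per_user <- ave(as.numeric(dnow$date), dnow$userID,
                     FUN = function(d) length(unique(d)))

# Keep only the rows of users seen on every day.
complete <- dnow[days_per_user == n_days, ]
```

Counting unique dates rather than rows means a user with two records on the same day is not mistaken for a user who was active on two days.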

You should consider using the data manipulation tools in the plyr package.

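 library(plyr)

 # Simulated data: 3 records per day for 31 days, users assigned at random.
 # as.Date is used so that adding 1 advances one day (ISOdate returns a
 # POSIXct, where adding 1 advances one second).
 startdate <- as.Date("2011-01-01")
 userdata <- data.frame(
   date   = startdate + rep(1:31, each = 3),
   userID = 1 + round(9 * runif(93)),
   x      = round(100 * runif(93))
 )

 # Count the distinct days on which each user appears
 summary <- ddply(userdata, .(userID), summarize,
                  activedays = length(unique(date)))
 summary[summary$activedays >= 30, ]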

You can learn more about plyr on Hadley's excellent website: http://had.co.nz/plyr/


I would use ave to count the number of days on which each user was active during the month. This assumes at most one row per user per day.

 Data$activeDays <- ave(Data$userID, Data$userID, FUN=length)
 Data[ Data$activeDays >= 30, ]

It would be a little more complicated if your data set contained several months ...
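One way the multi-month case might be handled is sketched below. It assumes that "complete" means active on every day that appears in the data for that month; the toy data, the helper `count_unique`, and the columns `month`, `monthDays` and `fullMonths` are illustrative names, not part of the original answer.

```r
# Hypothetical data spanning two months, with two observed days in each.
Data <- data.frame(
  date   = as.Date(c("2001-11-29", "2001-11-30", "2001-12-01", "2001-12-02",
                     "2001-11-29", "2001-11-30", "2001-12-01")),
  userID = c(1, 1, 1, 1, 2, 2, 2)
)

# Month key such as "2001-11".
Data$month <- format(Data$date, "%Y-%m")

count_unique <- function(d) length(unique(d))

# Distinct active days per user within each month.
Data$activeDays <- ave(as.numeric(Data$date), Data$userID, Data$month,
                       FUN = count_unique)

# Distinct days present in the data for each month (assumed definition
# of a "full" month).
Data$monthDays <- ave(as.numeric(Data$date), Data$month, FUN = count_unique)

# Keep the user-months in which the user was active on every observed day.
fullMonths <- Data[Data$activeDays == Data$monthDays, ]
```

If a user should be kept only when complete in every month, a further grouping step over userID would be needed on top of this.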

 which(tapply(userdata$date, userdata$userID, length) == 30) 
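Note that `which(tapply(...))` returns the matching user IDs (as names), not the rows themselves. A minimal sketch of turning that into a subset, on made-up 3-day data so the threshold is 3 instead of 30; the values and the names `days`, `keep` and `active` are assumptions for illustration.

```r
# Hypothetical toy data: users 1 and 3 appear on all 3 days, user 2 on one.
userdata <- data.frame(
  date   = as.Date("2001-11-08") + c(0, 0, 0, 1, 1, 2, 2),
  userID = c(1, 2, 3, 1, 3, 1, 3),
  x      = c(20, 2, 10, 19, 4, 15, 7)
)

# Distinct active days per user; the result is named by userID.
days <- tapply(userdata$date, userdata$userID,
               function(d) length(unique(d)))

# IDs (as character strings) of users active on all 3 days.
keep <- names(days)[days == 3]

# Rows belonging to those users.
active <- userdata[as.character(userdata$userID) %in% keep, ]
```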

Source: https://habr.com/ru/post/1338914/

