Calculation of simple retention in R

Question

For a data set test, my goal is to find out how many unique users carried over from one period to the next.

> test
   user_id period
1        1      1
2        5      1
3        1      1
4        3      1
5        4      1
6        2      2
7        3      2
8        2      2
9        3      2
10       1      2
11       5      3
12       5      3
13       2      3
14       1      3
15       4      3
16       5      4
17       5      4
18       5      4
19       4      4
20       3      4

For example, in the first period there were four unique users (1, 3, 4, and 5), two of whom were also active in the second period, so the retention rate is 0.5. In the second period there were three unique users, two of whom were active in the third period, so the retention rate is 0.666, and so on. How can I find the share of each period's unique users who are active in the next period? Any suggestions would be appreciated.

The expected output, with each rate reported against the later period, is:

> output
  period retention
1      1        NA
2      2     0.500
3      3     0.666
4      4     0.500

Data test:

> dput(test)
structure(list(user_id = c(1, 5, 1, 3, 4, 2, 3, 2, 3, 1, 5, 5, 
2, 1, 4, 5, 5, 5, 4, 3), period = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 
2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4)), .Names = c("user_id", "period"
), row.names = c(NA, -20L), class = "data.frame")

r dplyr retention

the_darkside May 19, '17 at 20:46

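How about this? Split the user IDs by period, write a function that computes the proportion carried over between two periods, then apply it to each pair of consecutive periods with mapply.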

splt <- split(test$user_id, test$period)

carryover <- function(x, y) {
    length(unique(intersect(x, y))) / length(unique(x))
}
mapply(carryover, splt[1:(length(splt) - 1)], splt[2:length(splt)])

        1         2         3 
0.5000000 0.6666667 0.5000000
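
The result above is a named vector rather than the data frame shown in the question. As a minimal sketch, assuming the periods are consecutive and that each rate should be reported against the later period (as in the expected output), the same idea can be wrapped up like this. The names rates and output are only illustrative.

```r
# Rebuild the example data from the question.
test <- data.frame(
  user_id = c(1, 5, 1, 3, 4, 2, 3, 2, 3, 1, 5, 5, 2, 1, 4, 5, 5, 5, 4, 3),
  period  = rep(1:4, each = 5)
)

# Unique user IDs per period.
splt <- lapply(split(test$user_id, test$period), unique)

# Share of the users in x who also appear in y.
carryover <- function(x, y) mean(x %in% y)

# Compare each period with the one that follows it.
rates <- mapply(carryover, head(splt, -1), tail(splt, -1))

# Shift by one period so the first period has no rate, as in the question.
output <- data.frame(
  period    = as.numeric(names(splt)),
  retention = c(NA, unname(rates))
)
print(output)
```

Because splt already holds unique IDs, mean(x %in% y) gives the same value as the intersect-based carryover function.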

Daniel Anderson May 19 '17 at 21:14

Here is a dplyr approach using summarise:

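library(dplyr)

test %>% 
  group_by(period) %>% 
  # share of this period's unique users who also appear in the next period
  summarise(retention = length(intersect(user_id, test$user_id[test$period == first(period) + 1])) / n_distinct(user_id)) %>% 
  # shift down one row so each rate is reported against the later period
  mutate(retention = lag(retention))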

Output:

period retention
   <dbl>     <dbl>
1      1        NA
2      2 0.5000000
3      3 0.6666667
4      4 0.5000000

Lamia May 20 '17 at 0:08

svenhalvorson · Accepted Answer · 2017-05-19T20:59:51+0000

A plain loop also works. Assuming df is your data frame:

# make a list to hold unique IDs by period
uniques = list()
for(i in 1:max(df$period)){
  uniques[[i]] = unique(df$user_id[df$period == i])
}

# hold the retention rates
retentions = rep(NA, times = max(df$period))

for(j in 2:max(df$period)){
  retentions[j] = mean(uniques[[j-1]] %in% uniques[[j]])
}

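%in% returns a logical vector saying, for each element of its first argument, whether it appears in the second. Taking the mean of that vector gives the proportion of users retained.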
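
To see why the mean of %in% is a retention rate, here is a small illustration using the unique users of periods 1 and 2 from the question. The names p1, p2, and retained are only for this sketch.

```r
p1 <- c(1, 5, 3, 4)   # unique users in period 1
p2 <- c(2, 3, 1)      # unique users in period 2

# One logical per period-1 user: did they come back in period 2?
retained <- p1 %in% p2
print(retained)        # TRUE FALSE TRUE FALSE

# TRUE counts as 1 and FALSE as 0, so the mean is the retained share.
print(mean(retained))  # 0.5
```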
