Here is an example data frame that resembles a larger data set:
Day <- c(1, 2, NA, 3, 4, NA, NA, NA, NA, NA, 1, 2, 3, NA, NA, NA, NA, 1, 2, NA, NA, 3, 4, 5)
y <- rpois(length(Day), 2)
z <- seq_along(Day) + 500
df <- data.frame(z, Day, y)
If the Day column contains a run of 4 or more consecutive missing (NA) values, that run represents the gap between cohorts in the study. If the run has fewer than 4 NA values, the missing values are still considered part of the cohort (for example, line 3 is part of cohort 1, but line 8 is not). There are 3 cohorts in the sample data frame (Cohort 1: lines 1-5, Cohort 2: lines 11-13, and Cohort 3: lines 18-24). I would like to add one column indicating the cohort number and another column indicating the day within the cohort. Here is the code I used:
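library(dplyr)
# Run-length encode the NA pattern of Day.
CheckNA <- rle(is.na(df$Day))
# Keep TRUE only for NA runs of length 4 or more (the gaps between cohorts).
CheckNA$values <- CheckNA$lengths >= 4 & CheckNA$values
# Expand back to one TRUE/FALSE flag per row.
ListNA <- rep(CheckNA$values, CheckNA$lengths)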
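# Cohort labels are hard-coded here: one entry per run of ListNA.
df$Co <- rep(c(1, NA, 2, NA, 3), rle(ListNA)$lengths) %>% as.factor()
# Number the rows within each cohort.
df <- df %>%
  group_by(Co) %>%
  mutate(CoDay = seq_along(Co)) %>%
  as.data.frame()
# Gap rows (Co is NA) get no cohort day.
df$CoDay <- ifelse(is.na(df$Co), NA, df$CoDay)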
Is there a way to do this without hard-coding the cohort labels? In the code above they are typed in by hand: c(1, NA, 2, NA, 3).
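One possible way to avoid the hand-typed label vector is to derive the cohort number from the gap flags themselves. The following is only a sketch in base R under the rule described above (a run of 4 or more NA values is a gap). The names min_gap, in_gap, starts and ok are made up for illustration, and the random y column is left out so the result is reproducible.

```r
# Example data from the question (y omitted because it is random).
Day <- c(1, 2, NA, 3, 4, NA, NA, NA, NA, NA, 1, 2, 3, NA, NA, NA, NA,
         1, 2, NA, NA, 3, 4, 5)
df <- data.frame(z = seq_along(Day) + 500, Day = Day)

min_gap <- 4  # assumed threshold: NA runs at least this long separate cohorts

# Run-length encode the NA pattern and flag the runs that count as gaps.
runs <- rle(is.na(df$Day))
is_gap_run <- runs$values & runs$lengths >= min_gap

# Expand back to one flag per row.
in_gap <- rep(is_gap_run, runs$lengths)

# A cohort starts on a non-gap row that is the first row or follows a gap row.
starts <- !in_gap & c(TRUE, head(in_gap, -1))

# The running count of starts is the cohort number; gap rows become NA.
df$Co <- ifelse(in_gap, NA, cumsum(starts))

# Row counter within each cohort; gap rows stay NA.
ok <- !in_gap
df$CoDay <- NA_integer_
df$CoDay[ok] <- ave(seq_along(df$Co)[ok], df$Co[ok], FUN = seq_along)

print(df)
```

Because the labels come from cumsum(), the same code should work for any number of cohorts. Short NA runs inside a cohort (such as line 3) keep their cohort number and still advance CoDay, which matches the behaviour of the original code.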