Fill NA Values with Group Sequence

I am trying to fill in some values in a dataset. A simplified version of my data is shown below:

    > example_df
              Date GROUP value
    157 2018-01-31 10180 3.464
    158 2018-02-28 10180 3.413
    159 2018-03-31 10180 3.418
    160 2018-04-30 10180    NA
    161 2018-05-31 10180    NA
    162 2018-06-30 10180    NA
    163 2018-07-31 10180    NA
    164 2018-08-31 10180    NA
    165 2018-09-30 10180    NA
    166 2018-10-31 10180    NA
    167 2018-11-30 10180    NA
    168 2018-12-31 10180    NA
    169 2019-01-31 10180    NA
    170 2019-02-28 10180    NA
    171 2019-03-31 10180    NA
    172 2019-04-30 10180    NA
    173 2019-05-31 10180    NA
    174 2019-06-30 10180    NA
    175 2019-07-31 10180    NA
    176 2019-08-31 10180    NA
    177 2019-09-30 10180    NA
    178 2019-10-31 10180    NA
    179 2019-11-30 10180    NA
    373 2018-01-31 10420 5.085
    374 2018-02-28 10420 5.051
    375 2018-03-31 10420 4.993
    376 2018-04-30 10420    NA
    377 2018-05-31 10420    NA
    378 2018-06-30 10420    NA
    379 2018-07-31 10420    NA
    380 2018-08-31 10420    NA
    381 2018-09-30 10420    NA
    382 2018-10-31 10420    NA
    383 2018-11-30 10420    NA
    384 2018-12-31 10420    NA
    385 2019-01-31 10420    NA
    386 2019-02-28 10420    NA
    387 2019-03-31 10420    NA
    388 2019-04-30 10420    NA
    389 2019-05-31 10420    NA
    390 2019-06-30 10420    NA
    391 2019-07-31 10420    NA
    392 2019-08-31 10420    NA
    393 2019-09-30 10420    NA
    394 2019-10-31 10420    NA
    395 2019-11-30 10420    NA
    589 2018-01-31 10500 5.796
    590 2018-02-28 10500 5.860
    591 2018-03-31 10500 5.913
    592 2018-04-30 10500    NA
    593 2018-05-31 10500    NA
    594 2018-06-30 10500    NA
    595 2018-07-31 10500    NA
    596 2018-08-31 10500    NA
    597 2018-09-30 10500    NA
    598 2018-10-31 10500    NA
    599 2018-11-30 10500    NA
    600 2018-12-31 10500    NA
    601 2019-01-31 10500    NA
    602 2019-02-28 10500    NA
    603 2019-03-31 10500    NA
    604 2019-04-30 10500    NA
    605 2019-05-31 10500    NA
    606 2019-06-30 10500    NA
    607 2019-07-31 10500    NA
    608 2019-08-31 10500    NA
    609 2019-09-30 10500    NA
    610 2019-10-31 10500    NA
    611 2019-11-30 10500    NA

As you can see, for each group I have values up to a certain month, followed by a run of NAs until the next group starts. For each group, I would like to fill these NAs with a sequence that starts from the last non-NA value and increases by a fixed step (I chose 0.065) until the group's final date. I would prefer a dplyr solution, but any pointers on how to achieve this would be very helpful. Thanks.

3 answers
    library(data.table)

    dt = as.data.table(yourdf)  # or convert in place using setDT
    dt[, value := value[1] + 0.065 * (1:.N - 1),
       by = .(GROUP, cumsum(!is.na(value)))]

You could do something like this (inspired by the comments and solutions from Frank and eddi):

    df$value2 <- ave(df$value, df$GROUP, cumsum(!is.na(df$value)),
                     FUN = function(x) x[1] + 0.065 * (1:length(x) - 1))
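
To see why grouping by `cumsum(!is.na(value))` works, here is a minimal base R sketch on a made-up toy vector (the first six values of group 10180, used purely for illustration; `run_id` and `filled` are hypothetical names). Each non-NA value starts a new run, and the NAs that follow it share that run's id, so `x[1]` inside each run is the last observed value.

```r
# Toy vector for illustration only (first rows of group 10180)
value <- c(3.464, 3.413, 3.418, NA, NA, NA)

# Every non-NA value increments the counter; trailing NAs inherit the last id
run_id <- cumsum(!is.na(value))
run_id
# [1] 1 2 3 3 3 3

# Within each run, start at the first element and add 0.065 per position
filled <- ave(value, run_id,
              FUN = function(x) x[1] + 0.065 * (seq_along(x) - 1))
filled
```

Note that this approach fills every run of NAs that follows an observed value, including runs in the middle of a group, not only the ones at the end.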

Or my original ave approach:

    df$value2 <- ave(df$value, df$GROUP, FUN = function(x) {
      # NAs located after the last non-NA value in the group
      nas_to_replace <- is.na(x) & seq_along(x) > tail(which(!is.na(x)), 1)
      replace(x, nas_to_replace,
              tail(x[!is.na(x)], 1) + 0.065 * (1:sum(nas_to_replace)))
    })

This function replaces only the NAs that appear after the last non-NA value. So for a vector like c(NA, 1, 2, NA, NA), it replaces only the last two elements.

    head(df)
    #         Date GROUP value value2
    # 1 2018-01-31 10180 3.464  3.464
    # 2 2018-02-28 10180 3.413  3.413
    # 3 2018-03-31 10180 3.418  3.418
    # 4 2018-04-30 10180    NA  3.483
    # 5 2018-05-31 10180    NA  3.548
    # 6 2018-06-30 10180    NA  3.613
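
The trailing-NA behaviour described above can be checked on small vectors. The sketch below wraps the same logic in a standalone helper; the name `fill_trailing` and the `step` argument are assumptions made for illustration, and `seq_len()` is swapped in for `1:sum(...)` so that a vector with no trailing NAs produces an empty replacement.

```r
# Hypothetical helper adapted from the ave() function above
fill_trailing <- function(x, step = 0.065) {
  last_obs <- tail(which(!is.na(x)), 1)                 # index of the last non-NA
  nas_to_replace <- is.na(x) & seq_along(x) > last_obs  # NAs after that index
  replace(x, nas_to_replace,
          x[last_obs] + step * seq_len(sum(nas_to_replace)))
}

# Leading NA is kept; the last two elements become 2 + 0.065 and 2 + 0.130
r1 <- fill_trailing(c(NA, 1, 2, NA, NA))
r1

# Interior NA is kept; only the final element is filled from the value 2
r2 <- fill_trailing(c(1, NA, 2, NA))
r2
```

This is where it differs from the `cumsum(!is.na(...))` grouping, which would also fill the interior NA in the second vector.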

Not as elegant as the data.table solution, but using dplyr and building on this answer, you can do something like:

    library(dplyr)

    df %>%
      group_by(GROUP, tmp = cumsum(!is.na(value))) %>%
      mutate(value = value[1] + 0.065 * (0:(length(value) - 1))) %>%
      ungroup() %>%
      select(-tmp)

Source: https://habr.com/ru/post/1276163/

