Replacing zero and NA with a recursive value

I am trying to replace NA and zero values recursively. I am working on time-series data, where an NA or zero is best replaced with the value from the previous week (measurements are taken every 15 minutes, so that means stepping back 672 positions). My data covers two years at 15-minute resolution, so it is a large set. Not many NAs or zeros are expected, and runs of more than 672 adjacent zeros or NAs are not expected either.

I found this thread (recursive replacement in R), which shows a recursive method that I adapted to my problem.

    load[is.na(load)] <- 0
    o <- rle(load)
    o$values[o$values == 0] <- o$values[which(o$values == 0) - 672]
    newload <- inverse.rle(o)

Is this the “best” or most elegant way? And how can I protect my code from errors when a zero value occurs within the first 672 values?

I am used to MATLAB, where I would do something like:

    % Replace NaN with 0
    Load(isnan(Load)) = 0;
    % Find zero values
    Ind = find(Load == 0);
    for f = Ind
        if f > 672
            fprintf('Replacing index %d with the load 1 week ago\n', f)
            % Replace zero with previous week value
            Load(f) = Load(f-672);
        end
    end

As I am not familiar with R, how would I set up such a for loop with an if inside it?

Reproducible example (I changed the code because the example from the other thread did not cope with adjacent zeros):

    day <- 1:24
    load <- rep(day, times=10)
    load[50:54] <- 0
    load[112:115] <- NA
    load[is.na(load)] <- 0
    load[load==0] <- load[which(load == 0) - 24]

This gives back the original load vector without zeros and NAs. When a zero exists within the first 24 values, however, it fails because there is no earlier value to replace it with:

    loadtest[c(10,50:54)] <- 0   # instead of load[50:54] <- 0

gives:

    Error in loadtest[which(loadtest == 0) - 24] :
      only 0's may be mixed with negative subscripts
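To see where the error comes from, the index vector can be inspected directly. This is only a small sketch on the toy data above; the names `zero_idx` and `source_idx` are made up for the illustration.

```r
# Same toy data as above, with an extra zero at position 10
day <- 1:24
loadtest <- rep(day, times = 10)
loadtest[c(10, 50:54)] <- 0

# Positions of the zeros, and the positions one period (24 steps) earlier
zero_idx <- which(loadtest == 0)
source_idx <- zero_idx - 24

zero_idx    # 10 50 51 52 53 54
source_idx  # -14 26 27 28 29 30
```

The zero at position 10 yields the index -14. R treats negative subscripts as "drop these elements" and does not allow them to be mixed with positive ones, hence the error.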

To get around this, one could use an if/else statement, but I don’t know how to set it up. Something like:

    day <- 1:24
    loadtest <- rep(day, times=10)
    loadtest[c(10,50:54)] <- 0
    loadtest[112:115] <- NA
    loadtest[is.na(loadtest)] <- 0
    if (INDEX(loadtest[loadtest==0]) < 24) {
      # nothing / mean / standard value
    } else {
      loadtest[loadtest==0] <- loadtest[which(loadtest == 0) - 24]
    }

INDEX is not valid code, of course.

2 answers

You can use this example:

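    set.seed(42)
    x <- sample(c(0,1,2,3,NA), 100, TRUE)
    stepback <- 6
    x_old <- x
    x_new <- x_old
    repeat {
      # positions that still hold a zero or NA
      filter <- x_new==0 | is.na(x_new)
      # take the value `stepback` positions earlier (NA where none exists)
      x_new[filter] <- c(rep(NA, stepback), head(x_new, -stepback))[filter]
      # stop once a pass changes nothing
      if (identical(x_old, x_new)) break
      x_old <- x_new
    }
    x
    x_new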

Result:

    > x
     [1] NA NA  1 NA  3  2  3  0  3  3  2  3 NA  1  2 NA NA  0  2  2 NA  0 NA NA  0
    [26]  2  1 NA  2 NA  3 NA  1  3  0 NA  0  1 NA  3  1  2  0 NA  2 NA NA  3 NA  3
    [51]  1  1  1  3  0  3  3  0  1  2  3 NA  3  2 NA  0  1 NA  3  1  0  0  1  2  0
    [76]  3  0  1  2  0  2  0  1  3  3  2  1  0  0  1  3  0  1 NA NA  3  1  2  3  3
    > x_new
     [1] NA NA  1 NA  3  2  3 NA  3  3  2  3  3  1  2  3  2  3  2  2  2  3  2  3  2
    [26]  2  1  3  2  3  3  2  1  3  2  3  3  1  1  3  1  2  3  1  2  3  1  3  3  3
    [51]  1  1  1  3  3  3  3  1  1  2  3  3  3  2  1  2  1  3  3  1  1  2  1  2  3
    [76]  3  1  1  2  2  2  3  1  3  3  2  1  3  1  1  3  2  1  3  1  3  1  2  3  3

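Please note that some values are still NA, because there is no earlier value to take for them (they fall within the first `stepback` positions, or every value they trace back to is itself missing). If your data has enough valid values at the start, this will not happen.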
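The MATLAB loop from the question can also be written almost literally in R. The following is only a sketch: `fill_from_lag` is a hypothetical helper name, and leaving zeros in the first `lag` positions untouched is an assumption about what should happen there (a mean or default value could be substituted instead).

```r
# Hypothetical helper: replace zeros/NAs with the value `lag` steps earlier.
fill_from_lag <- function(x, lag) {
  x[is.na(x)] <- 0                 # treat NA like zero, as in the question
  for (i in which(x == 0)) {       # indices are visited in increasing order
    if (i > lag) {
      x[i] <- x[i - lag]           # may itself be a value filled in earlier
    }                              # else: no earlier period, leave the zero
  }
  x
}

# Toy data from the question, with a period of 24 instead of 672
day <- 1:24
loadtest <- rep(day, times = 10)
loadtest[c(10, 50:54)] <- 0
loadtest[112:115] <- NA
newload <- fill_from_lag(loadtest, lag = 24)
```

Because the indices are processed in increasing order, a zero whose source position was itself a zero picks up the already repaired value. Here position 10 stays 0, since it has no earlier period to draw from. For the weekly data the call would use `lag = 672`.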


One option would be to turn your vector into a matrix with 672 rows:

 load2 <- matrix(load, nrow=672) 

Then apply last observation carried forward (either from the zoo package, or the method described above, or ...) to each row of the matrix:

 load3 <- apply( load2, 1, locf.function ) 

Then convert the resulting matrix back to a vector of the correct length:

 load4 <- t(load3)[ seq_along(load) ] 
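Put together, the three steps might look like the sketch below. Several things here are assumptions: `locf.function` is written out as a small base-R stand-in (`zoo::na.locf` would be an alternative), zeros are treated as missing, the period is 24 as in the question's toy data, and the vector is padded with NA so that its length is a multiple of the period.

```r
# Hypothetical base-R stand-in for a last-observation-carried-forward function
locf.function <- function(v) {
  v[!is.na(v) & v == 0] <- NA      # treat zeros as missing too
  ok <- !is.na(v)
  idx <- cumsum(ok)                # index of the last non-missing value so far
  idx[idx == 0] <- NA              # nothing earlier to carry forward
  v[ok][idx]
}

period <- 24                       # 672 for the weekly 15-minute data
load <- rep(1:24, times = 10)
load[50:54] <- 0
load[112:115] <- NA

# Pad with NA so the vector fills a whole number of periods
n_col <- ceiling(length(load) / period)
padded <- c(load, rep(NA, n_col * period - length(load)))

load2 <- matrix(padded, nrow = period)    # one column per period
load3 <- apply(load2, 1, locf.function)   # apply() returns the result transposed
load4 <- t(load3)[seq_along(load)]        # back to a vector of the original length
```

Each row of `load2` holds the values that share the same position within a period, so carrying the last observation forward along a row is the same as stepping back one period at a time. Values in the first period that are missing stay NA.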

Source: https://habr.com/ru/post/1502621/

