How to determine the NA series indices in a vector

Suppose we have a vector with missing values, such as:

test <- c(3,6,NA,7,8,NA,NA,5,8,6,NA,4,3,NA,NA,NA)

The goal is to identify series of NAs that are 2 or less in length and have non-NA values at both ends, so that linear interpolation can be applied to them. I was able to detect the end index of such series using this code:

which.na <- which(is.na(test))

diff.which.na <- diff(which.na)

which.diff.which.na <- which(diff.which.na>1)

end.index <- which.na[which.diff.which.na]

Result:

> end.index
[1]  3  7 11

The last NA series can be handled with a conditional statement. However, I cannot find the start index of each NA series, because diff does not accept a negative lag, so the following does not work:

diff.which.na <- diff(which.na,lag=-1)

The expected result is:

beg.index= c(3,6,11)

and

end.index=c(3,7,11)

Any ideas?

Thanks.

1 answer

You can try with rle:

seq_na <- rle(is.na(test))
seq_na
#Run Length Encoding
#  lengths: int [1:8] 2 1 2 2 3 1 2 3
#  values : logi [1:8] FALSE TRUE FALSE TRUE FALSE TRUE ...
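As a quick sanity check of what rle returns, here is a minimal sketch. The name runs is only illustrative and not part of the original answer. It tabulates each run and confirms that the encoding can be reversed with inverse.rle:

```r
test <- c(3,6,NA,7,8,NA,NA,5,8,6,NA,4,3,NA,NA,NA)
seq_na <- rle(is.na(test))

# One row per run: whether it is an NA run, and how long it is
# ('runs' is a hypothetical name used only for this illustration)
runs <- data.frame(is_na = seq_na$values, length = seq_na$lengths)
runs

# The run lengths add up to the length of the input vector
sum(runs$length) == length(test)

# inverse.rle() rebuilds the logical vector that was encoded
identical(inverse.rle(seq_na), is.na(test))
```

Because the runs partition the vector from left to right, the cumulative sum of the lengths gives the last position of each run.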

Look at the lengths of the TRUE runs, i.e. the NA series:

seq_na$lengths[seq_na$values]
# [1] 1 2 1 3 # three of them have length 2 or less

To get the indices, use cumsum (thanks, @Frank!):

end.index <- with(seq_na, cumsum(lengths)[lengths <= 2 & values])
#[1]  3  7 11

beg.index <- end.index - with(seq_na, +(lengths==2 & values)[lengths <= 2 & values])
#[1]  3  6 11
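The same idea can be generalized to any maximum run length, since the start of a run is its end minus its length plus one. The sketch below is one possible way to do it, not the method from the answer above: the helper na_runs, its max_len argument, and the variables gaps, idx and filled are hypothetical names. It assumes that runs touching either end of the vector should be skipped, and that approx() is an acceptable way to do the linear interpolation mentioned in the question.

```r
test <- c(3,6,NA,7,8,NA,NA,5,8,6,NA,4,3,NA,NA,NA)

# Hypothetical helper: start/end indices of NA runs no longer than max_len
# that have a non-NA value on both sides
na_runs <- function(x, max_len = 2) {
  r <- rle(is.na(x))
  end <- cumsum(r$lengths)       # last position of every run
  beg <- end - r$lengths + 1L    # first position of every run
  keep <- r$values & r$lengths <= max_len & beg > 1 & end < length(x)
  data.frame(beg = beg[keep], end = end[keep])
}

gaps <- na_runs(test)
gaps$beg  # 3 6 11
gaps$end  # 3 7 11

# Positions covered by the selected runs: 3, 6, 7, 11
idx <- unlist(Map(seq, gaps$beg, gaps$end))

# Linear interpolation at those positions only; approx() drops the NA
# pairs before interpolating, so the known values act as the knots
filled <- test
filled[idx] <- approx(seq_along(test), test, xout = idx)$y
filled[idx]  # 6.5, 7, 6, 5
```

The trailing run of three NAs (positions 14 to 16) is left untouched, both because it is longer than max_len and because it has no non-NA value to its right.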

Source: https://habr.com/ru/post/1620143/

