Count lines between NA

I am trying to get a specific count from a previously created result set: the number of rows between the rows containing NA. Aggregating the values of those rows is not of interest, only the count.

Below is a fairly simplified example that should make clearer what I mean. On the left is the actual data, on the right the desired result.

 +------+-------+     +------+--------+
 | TIME | Value |     | TIME | Result |
 +------+-------+     +------+--------+
 |   10 |    NA |     |   20 |      2 |
 |   20 |     0 |     |   60 |      3 |
 |   30 |     1 |     +------+--------+
 |   40 |    NA |
 |   50 |    NA |
 |   60 |    30 |
 |   70 |    68 |
 |   80 |     0 |
 |   90 |    NA |
 +------+-------+

Any comments are welcome. If additional input is needed, just let me know.
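
For anyone who wants to reproduce this, the left-hand table can be rebuilt as a small data frame (column names taken from the table above; this is just a convenience for trying out the answers below):

d <- data.frame(TIME  = seq(10, 90, by = 10),
                Value = c(NA, 0, 1, NA, NA, 30, 68, 0, NA))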

+5
4 answers

To complete my answer, here is a revised version:

d <- data.frame(TIME = seq(10, 90, by = 10),
                Value = c(NA, 0, 1, NA, NA, 30, 68, 0, NA))

# Run-length encoding of "is this row non-NA?" (1 = non-NA, 0 = NA)
aux <- rle(as.numeric(!is.na(d[, 2])))

# For each run of non-NA rows: the TIME at which it starts and its length
cbind(TIME   = d[cumsum(aux$lengths)[which(aux$values == 1)] -
                   aux$lengths[aux$values == 1] + 1, 1],
      Result = rle(is.na(d$Value))$lengths[!rle(is.na(d$Value))$values])
#      TIME Result
# [1,]   20      2
# [2,]   60      3
+6

Besides rle, you can also use a combination of diff, which and is.na:

dat <- data.frame(time = seq(10, 90, 10),
                  value = c(NA, 0, 1, NA, NA, 30, 68, 0, NA))

# Gaps between consecutive NA positions, minus one, give the run lengths
res <- data.frame(result = diff(which(is.na(dat$value))) - 1)

# time of the row directly after each NA, i.e. the start of each run
res$time <- dat$time[which(is.na(dat$value)) + 1][1:nrow(res)]

res[res$result != 0, ]
#   result time
# 1      2   20
# 3      3   60
+5

My "SOfun" package has a function called TrueSeq that acts as a kind of group maker for logical vectors. You can use this function in combination with "data.table" to get the desired result, for example:

library(SOfun)
library(data.table)

na.omit(data.table(TIME = df$TIME,
                   Val  = TrueSeq(!is.na(df$value), zero2NA = TRUE)))[
  , list(TIME = TIME[1], .N), by = Val]
#    Val TIME N
# 1:   1   20 2
# 2:   2   60 3
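
If you just want to see what the TrueSeq call above is doing without installing the package, here is a minimal base-R stand-in (my own sketch, not the actual SOfun::TrueSeq implementation): it numbers the runs of TRUE and turns the FALSE positions into NA, which is the behaviour the zero2NA = TRUE call relies on.

true_seq_sketch <- function(x) {
  r <- rle(x)
  # run number for runs of TRUE, 0 for runs of FALSE ...
  grp <- rep(cumsum(r$values) * r$values, r$lengths)
  # ... and FALSE positions become NA, mimicking zero2NA = TRUE
  grp[grp == 0] <- NA
  grp
}

true_seq_sketch(!is.na(c(NA, 0, 1, NA, NA, 30, 68, 0, NA)))
# [1] NA  1  1 NA NA  2  2  2 NA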

If you have "devtools" installed, you can install "SOfun" with:

library(devtools)
install_github("mrdwab/SOfun")

For reference, I posted this Gist so that the results of the different approaches to this question can be compared.

Summarizing:

  • If the first value in the "value" column is NA:
    • All approaches give the same answer.
  • If the first value in the "value" column is not NA:
    • This answer and @RStudent's give the same result, treating the first run of non-NA values (which starts at the first row of the input) as the first row of the results.
    • @konvas's answer and @beginneR's give the same result, treating the second run of non-NA values as the first row of the results (a small check of this edge case follows the list).
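
A quick hand-rolled check of that edge case (my own sketch, not taken from the Gist), using the example data with the first NA replaced by 0:

d <- data.frame(TIME = seq(10, 90, by = 10),
                Value = c(0, 0, 1, NA, NA, 30, 68, 0, NA))

# rle-style counting also reports the leading run of non-NA values ...
r <- rle(is.na(d$Value))
r$lengths[!r$values]
# [1] 3 3

# ... while diff(which(is.na(...))) only sees runs between NAs, so it skips it:
diff(which(is.na(d$Value))) - 1
# [1] 0 3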
+5

This may not be the easiest way to do it, but it gives the desired result, and since I had already written it, I thought I might as well post it (using the data from @konvas's example):

require(dplyr)

dat %>%
  group_by(m = cumsum(is.na(value))) %>%
  summarise(n = n() - 1,
            time = first(time[!is.na(value)])) %>%
  ungroup() %>%
  filter(n > 0 & m > 0) %>%
  select(-m)
#Source: local data frame [2 x 2]
#
#  n time
#1 2   20
#2 3   60

Edit: I made a small correction in response to Ananda's comment; I hope it works better now. For example, if the data were:

dat <- data.frame(time = seq(10, 90, 10),
                  value = c(0, 2, 1, NA, NA, 30, 68, 0, NA))
dat
#  time value
#1   10     0
#2   20     2
#3   30     1
#4   40    NA
#5   50    NA
#6   60    30
#7   70    68
#8   80     0
#9   90    NA

then the code would give:

dat %>%
  group_by(m = cumsum(is.na(value))) %>%
  summarise(n = n() - 1,
            time = first(time[!is.na(value)])) %>%
  ungroup() %>%
  filter(n > 0 & m > 0) %>%
  select(-m)
#Source: local data frame [1 x 2]
#
#  n time
#1 3   60
+3
