How to remove runs of more than two consecutive NAs in a column?

I'm new to R. My data frame has two columns, "Timestamp" and "values". I need to delete the rows that belong to a run of more than two consecutive NAs in the "values" column. My data frame looks like this:

Timestamp  | values  
-----------|--------
2011-01-02 |  2  
2011-01-03 |  3  
2011-01-04 |  NA  
2011-01-05 |  1  
2011-01-06 |  NA  
2011-01-07 |  NA    
2011-01-08 |  8  
2011-01-09 |  6  
2011-01-10 |  NA  
2011-01-11 |  NA  
2011-01-12 |  NA  
2011-01-13 |  2  

That is, whenever the second column contains more than two NAs in a row, all rows of that run should be removed; shorter runs of NA should be kept. Expected result:

Timestamp  | values  
-----------|--------
2011-01-02 |  2  
2011-01-03 |  3  
2011-01-04 |  NA  
2011-01-05 |  1  
2011-01-06 |  NA  
2011-01-07 |  NA    
2011-01-08 |  8  
2011-01-09 |  6 
2011-01-13 |  2  

Thanks in advance for any solution.

3 answers

You can use the run length encoding function rle. I assume that the data is already sorted by date.

r <- rle(is.na(df$values))                      # check runs of NA in value column
df[!rep(r$values & r$lengths > 2, r$lengths),]  # remove runs of >2 length
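
To see what the two lines above are doing, here is a minimal worked sketch on the sample data from the question. The data frame construction and the helper name drop_long_na_runs are assumptions added for illustration, not part of the original answer.

```r
# Sample data from the question (rebuilt by hand for this illustration)
df <- data.frame(
  Timestamp = as.Date("2011-01-02") + 0:11,
  values    = c(2, 3, NA, 1, NA, NA, 8, 6, NA, NA, NA, 2)
)

# Run-length encode the NA indicator: one entry per run of identical flags
r <- rle(is.na(df$values))
r$lengths  # 2 1 1 2 2 3 1
r$values   # FALSE TRUE FALSE TRUE FALSE TRUE FALSE

# Flag runs that are NA and longer than 2, then expand to one flag per row
drop <- rep(r$values & r$lengths > 2, r$lengths)
result <- df[!drop, ]

# Hypothetical helper: the same idea with a configurable run-length limit
drop_long_na_runs <- function(data, column, max_run = 2) {
  r <- rle(is.na(data[[column]]))
  data[!rep(r$values & r$lengths > max_run, r$lengths), , drop = FALSE]
}
```

Only the run of three NAs (2011-01-10 to 2011-01-12) is flagged, so result keeps the other nine rows. Calling drop_long_na_runs(df, "values", max_run = 1) would also remove the two-NA run.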

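You can also use rleid from data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)). Grouped by the run-length ID of is.na(values), get the row indices (.I) of every group except those that have more than 2 rows (.N > 2) and (&) consist entirely of NA. Then extract the index column ($V1) and use it to subset the rows.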

library(data.table)
setDT(df1)[df1[, .I[!(.N >2 & all(is.na(values)))], rleid(is.na(values))]$V1]
#    Timestamp values
#1: 2011-01-02      2
#2: 2011-01-03      3
#3: 2011-01-04     NA
#4: 2011-01-05      1
#5: 2011-01-06     NA
#6: 2011-01-07     NA
#7: 2011-01-08      8
#8: 2011-01-09      6
#9: 2011-01-13      2

You can use this one-liner:

Df[!duplicated(Df$column),]

Source: https://habr.com/ru/post/1671791/

