R: delete rows and replace values using multi-column conditions

I want to filter out all rows with var3 < 5, while keeping at least one occurrence of each value of var1.

> foo <- data.frame(var1=c(1, 1, 8, 8, 5, 5, 5), var2=c(1,2,3,2,4,6,8), var3=c(7,1,1,1,1,1,6))
> foo
  var1 var2 var3
1    1    1    7
2    1    2    1
3    8    3    1
4    8    2    1
5    5    4    1
6    5    6    1
7    5    8    6

subset(foo, (foo$var3>=5)) would delete rows 2-6, and I would lose var1 == 8.

  • I want to delete a row if there is another row with the same var1 value that fulfills the condition foo$var3 >= 5. See row 5.
  • I want to keep the row, setting var2 and var3 to NA, if no occurrence of its var1 value satisfies the condition foo$var3 >= 5.

As a result, I expect:

  var1 var2 var3
1    1    1    7
3    8   NA   NA
7    5    8    6

This is the closest I got:

> foo$var3[ foo$var3 < 5 ] = NA
> foo$var2[ is.na(foo$var3) ] = NA
> foo
  var1 var2 var3
1    1    1    7
2    1   NA   NA
3    8   NA   NA
4    8   NA   NA
5    5   NA   NA
6    5   NA   NA
7    5    8    6

Now I just need to know how to conditionally delete the right rows (2, 3 or 4, 5, 6): delete a row if var2 and var3 are NA and its var1 value occurs more than once.

But of course, there may be a much simpler, more elegant way to approach this little problem.


+3
5

You can do this with merge:

> merge(foo[foo$var3>=5,],unique(foo$var1),by.x=1,by.y=1,all.y=TRUE)
  var1 var2 var3
1    1    1    7
2    5    8    6
3    8   NA   NA

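unique(foo$var1) gives every distinct value of var1. The first argument keeps only the rows that satisfy the var3 condition. The two are joined on their first columns (by.x = 1, by.y = 1), keeping every row of y (all.y = T), so a var1 value with no matching row gets NA in the other columns. See ?merge.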

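If you want to keep the original order of var1:

> merge(foo[foo$var3>=5,],unique(foo$var1),by.x=1,by.y=1,
+ all.y=TRUE)[rank(unique(foo$var1)),]
  var1 var2 var3
1    1    1    7
3    8   NA   NA
2    5    8    6

merge sorts the result by the key column, so the rows come back in ascending var1 order. rank(unique(foo$var1)) gives, for each var1 value in its original order of appearance, its position in that sorted result, so indexing by it restores the original order. (order() returns the same index for this example, but not in general.) See ?rank.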
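As a self-contained sketch of the merge approach: the helper name keep_one_per_key and its threshold argument are hypothetical, not part of the answer above. It uses match() as one possible way to put the rows back in order of first appearance.

```r
# Hypothetical helper: keep rows with var3 >= threshold, plus one NA row
# for every var1 value that has no such row.
keep_one_per_key <- function(df, threshold = 5) {
  keys <- data.frame(var1 = unique(df$var1))            # distinct var1, in order of first appearance
  kept <- df[df$var3 >= threshold, , drop = FALSE]      # rows that pass the filter
  out  <- merge(kept, keys, by = "var1", all.y = TRUE)  # unmatched keys get NA in var2/var3
  # merge() sorts by the key; restore first-appearance order
  out  <- out[order(match(out$var1, keys$var1)), ]
  rownames(out) <- NULL
  out
}

foo <- data.frame(var1 = c(1, 1, 8, 8, 5, 5, 5),
                  var2 = c(1, 2, 3, 2, 4, 6, 8),
                  var3 = c(7, 1, 1, 1, 1, 1, 6))
res <- keep_one_per_key(foo)
print(res)
```

Naming the key column explicitly (by = "var1") avoids relying on column positions, at the cost of hard-coding the column names from the question.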

+10

Starting from what you already have:

foo$var3[ foo$var3 < 5 ] = NA
foo$var2[ is.na(foo$var3) ] = NA

Then drop the rows that contain NA and whose var1 value is a duplicate:

foo[!(!complete.cases(foo) & duplicated(foo$var1)), ]

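In other words, this removes every row that contains an NA and whose var1 value has already appeared earlier in the data frame.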

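Edit: this assumes that, within each var1 value, the complete row comes before the rows with NA, so that the NA rows are the ones flagged as duplicates. If your data.frame is not ordered that way, sort it first: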

foo <- foo[order(foo$var2),]   # ordering on var3 should be the same
foo[!(!complete.cases(foo) & duplicated(foo$var1)), ]
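To see why the ordering step matters, here is a small end-to-end sketch on the data from the question. The helper name drop_rows is hypothetical and only wraps the expression used above.

```r
foo <- data.frame(var1 = c(1, 1, 8, 8, 5, 5, 5),
                  var2 = c(1, 2, 3, 2, 4, 6, 8),
                  var3 = c(7, 1, 1, 1, 1, 1, 6))

# Blank out the rows that fail the condition
foo$var3[foo$var3 < 5] <- NA
foo$var2[is.na(foo$var3)] <- NA

# Hypothetical helper: TRUE for rows that are incomplete AND whose var1
# value has already been seen earlier in the data frame
drop_rows <- function(df) !complete.cases(df) & duplicated(df$var1)

# Without sorting, var1 == 5 survives twice: its first NA row (row 5)
# comes before its complete row (row 7), so it is not flagged as a duplicate
unsorted <- foo[!drop_rows(foo), ]

# order() puts NA last by default, so complete rows now come first
sorted <- foo[order(foo$var2), ]
result <- sorted[!drop_rows(sorted), ]
print(result)
```

After sorting, the result has one row per var1 value, in the order 1, 5, 8.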
+3
rbind(r <- subset(foo, (foo$var3>=5)), 
      unique(transform(subset(foo, !var1%in%r$var1), var2=NA, var3=NA)))

Step by step:

r <- subset(foo, (foo$var3>=5))

r2 <- subset(foo, !var1%in%r$var1) # rows whose var1 does not appear in r
r3 <- transform(r2, var2=NA, var3=NA) # replace var2 and var3 with NA
r4 <- unique(r3) # remove duplicates

rbind(r, r4) # bind them
+2

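You can use ddply and colwise from the plyr package, together with subset. First define a helper, null2na, that turns an empty vector into NA: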

library(plyr)  # provides ddply() and colwise()
null2na <- function(x) if ( length(x) == 0 ) NA else x

Then define the function filter that we want to apply to each sub-data frame sharing a particular value of var1:

filter <- function(df) cbind( data.frame( var1 = df[1,1]),
                              colwise(null2na) (subset(df, var3 >= 5)[,-1]))

Now apply ddply to foo, splitting on var1:

> ddply(foo, .(var1), filter)
  var1 var2 var3
1    1    1    7
2    5    8    6
3    8   NA   NA
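If you would rather not depend on plyr, the same split-apply-combine idea can be sketched in base R with split(), lapply() and rbind(). This is a swapped-in alternative, not the answer's own method, and the function name per_group is hypothetical.

```r
foo <- data.frame(var1 = c(1, 1, 8, 8, 5, 5, 5),
                  var2 = c(1, 2, 3, 2, 4, 6, 8),
                  var3 = c(7, 1, 1, 1, 1, 1, 6))

# Hypothetical per-group function: return the rows with var3 >= 5,
# or a single NA row if the group has none
per_group <- function(df) {
  hit <- df[df$var3 >= 5, , drop = FALSE]
  if (nrow(hit) == 0) {
    data.frame(var1 = df$var1[1], var2 = NA_real_, var3 = NA_real_)
  } else {
    hit
  }
}

# split() groups by var1 (groups come out in sorted order), then bind the pieces
res <- do.call(rbind, lapply(split(foo, foo$var1), per_group))
rownames(res) <- NULL
print(res)
```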
+1

Try the following:

foo <- data.frame(var1= c(1, 1, 2, 3, 3, 4, 4, 5), 
     var2=c(9, 5, 13, 9, 12, 11, 13, 9), 
     var3=c(6, 8, 3, 6, 4, 7, 2, 9))
f2=foo[which(foo$var3>5),]

missing = which(!(foo$var1 %in% f2$var1))
f3 = rbind(f2, list(foo$var1[missing], rep(NA, length(missing)),rep(NA,length(missing))))
f3[order(f3$var1),]

The last line is needed only if you care about the order (and it assumes the data was ordered by var1 in the first place).

0

Source: https://habr.com/ru/post/1785655/

