First of all, please let me know if I am using dplyr badly because I am not sure if I will be the best. I have the following data framework:
mydf = data.frame(user = c(7,7,7,7,7,7,7,8,8,8,8,8,8),
col1 = c('0','0','1','1','0','3','NULL','3','3','0','1','0','0'),
col2 = runif(n=13),
col3 = letters[1:13],
stringsAsFactors = FALSE)
> mydf
user col1 col2 col3
1 7 0 0.7607907 a
2 7 0 0.1580448 b
3 7 1 0.8063540 c
4 7 1 0.7331512 d
5 7 0 0.2433631 e
6 7 3 0.2357065 f
7 7 NULL 0.4864172 g
8 8 3 0.6806089 h
9 8 3 0.2229874 i
10 8 0 0.6187911 j
11 8 1 0.7617177 k
12 8 0 0.5884821 l
13 8 0 0.4985750 m
The filtering I would like to do is a bit verbose, but I will try - I would like to filter out the dataframe by deleting all rows where col1 == '0' if this row occurs AFTER the first row for this user, where col1 == '1 ' . (bold means I messed up the original question and switched 0 and 1).
, 7, col1 == '1', 3, col1 == '0' ( 5). 8 11- , col1 == '1', 12 13, col1 == '0'.
:
> mydf
user col1 col2 col3
1 7 0 0.7607907 a
2 7 0 0.1580448 b
3 7 1 0.8063540 c
4 7 1 0.7331512 d
6 7 3 0.2357065 f
7 7 NULL 0.4864172 g
8 8 3 0.6806089 h
9 8 3 0.2229874 i
10 8 0 0.6187911 j
11 8 1 0.7617177 k
, . rownums, , , , . , - :
mydf %>%
mutate(rownums = 1:nrow(mydf)) %>%
group_by(user) %>%
filter(!(col1 == "0" & rownums > min(which(col1 == "1"))))
user col1 col2 col3 rownums
<dbl> <chr> <dbl> <chr> <int>
1 7 0 0.2088034 a 1
2 7 0 0.2081894 b 2
3 7 1 0.1825428 c 3
4 7 1 0.2143353 d 4
5 7 3 0.1979774 f 6
6 7 NULL 0.2990799 g 7
7 8 3 0.7808038 h 8
8 8 3 0.1694272 i 9
9 8 1 0.1526450 k 11
, 10 .
!
EDIT - , group_by()% > % filter() - R dplyr. 99% group_by() summary(), , , .
EDIT2 - , !
mydf %>%
group_by(col0) %>%
mutate(rownums = 1:length(col0)) %>%
filter(!(col1 == "0" & rownums > min(which(col1 == "1"))))
mutate() group_by() mutate(), , . .