The real-life problem: I have subjects with MRI scan data. Some of them were scanned several times (one row per scan). Some of them were scanned under a different protocol each time. I want to keep one row per unique subject id, and if a subject was scanned under two different protocols, I want to prefer one protocol over the other.
Toy example:
library(dplyr)
df <- tibble(
  id = c("A", "A", "B", "C", "C", "D"),
  protocol = c("X", "Y", "X", "X", "X", "Y"),
  date = seq(as.Date("2018-01-01"), as.Date("2018-01-06"), by = "days"),
  var = 1:6
)
I want to return a data frame with one row per unique id. When an id appears more than once, instead of automatically keeping the first record, I want to keep the record whose protocol is "Y" if there is one, but otherwise keep the "X" row rather than dropping that id.
For the toy example, that means keeping rows 2, 3, 4 and 6.
I would prefer a dplyr solution.
Things I have tried, none of which work:
df %>% distinct(id, .keep_all = TRUE) #Nope!
df %>% distinct(id, protocol == "Y", .keep_all = TRUE) #Nope!
df$protocol <- factor(df$protocol, levels = c("Y", "X"))
df %>% distinct(id, .keep_all = TRUE) #Nope!
df %>% group_by(id) %>% filter(protocol == "Y") #Nope!
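The factor attempt fails because distinct() keeps the first row it meets for each id in the data's current row order, and converting protocol to a factor does not reorder any rows. A minimal sketch, assuming the same factor levels c("Y", "X") as in the attempt above, is to sort by that factor first so the preferred row comes first within each id:

```r
library(dplyr)

df <- tibble(
  id = c("A", "A", "B", "C", "C", "D"),
  protocol = c("X", "Y", "X", "X", "X", "Y"),
  date = seq(as.Date("2018-01-01"), as.Date("2018-01-06"), by = "days"),
  var = 1:6
)

# Level order encodes the preference: "Y" (level 1) sorts before "X" (level 2)
df$protocol <- factor(df$protocol, levels = c("Y", "X"))

# arrange() sorts factors by level order, so each id's "Y" row (if any)
# moves to the top; distinct() then keeps that first row per id
result <- df %>%
  arrange(id, protocol) %>%
  distinct(id, .keep_all = TRUE)

result
```

For id "C", both rows are "X"; arrange() keeps tied rows in their original order, so the earlier scan (row 4) is the one retained.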
Solution from @RobJensen in the comments:
df %>% arrange(id, desc(protocol == 'Y')) %>% distinct(id, .keep_all = TRUE)
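This works because protocol == 'Y' is a logical vector, and desc() on a logical sorts TRUE before FALSE, so within each id any "Y" row moves to the top before distinct() keeps the first row. Because the test is a comparison with a string, it behaves the same whether protocol is a character column or a factor. A self-contained check against the toy data:

```r
library(dplyr)

df <- tibble(
  id = c("A", "A", "B", "C", "C", "D"),
  protocol = c("X", "Y", "X", "X", "X", "Y"),
  date = seq(as.Date("2018-01-01"), as.Date("2018-01-06"), by = "days"),
  var = 1:6
)

# desc(protocol == "Y") puts TRUE (a "Y" row) ahead of FALSE within each id
picked <- df %>%
  arrange(id, desc(protocol == "Y")) %>%
  distinct(id, .keep_all = TRUE)

picked$var
# [1] 2 3 4 6
```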
An alternative from @joran, which also works:
df %>% group_by(id) %>% arrange(desc(protocol),var) %>% slice(1)
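One caveat with this version: desc(protocol) prefers "Y" only because "Y" sorts after "X" alphabetically in a character column. If protocol has already been converted to a factor with levels c("Y", "X"), as in the attempts above, desc() orders by the level codes instead and picks the "X" row for id "A". A sketch that states the preference explicitly ranks each row with match() against a preference vector; keep_preferred() and pref are illustrative names, not part of dplyr:

```r
library(dplyr)

# Hypothetical helper: keep one row per id, choosing the protocol that
# appears earliest in `pref`; tied rows keep their original order
keep_preferred <- function(data, pref) {
  data %>%
    # match() gives each row its position in `pref` (1 = most preferred);
    # protocols missing from `pref` get NA, which arrange() sorts last
    arrange(id, match(as.character(protocol), pref)) %>%
    distinct(id, .keep_all = TRUE)
}

df <- tibble(
  id = c("A", "A", "B", "C", "C", "D"),
  protocol = c("X", "Y", "X", "X", "X", "Y"),
  date = seq(as.Date("2018-01-01"), as.Date("2018-01-06"), by = "days"),
  var = 1:6
)

# Same data, but with protocol stored as a factor (levels "Y", "X")
df_factor <- df
df_factor$protocol <- factor(df_factor$protocol, levels = c("Y", "X"))

preferred <- keep_preferred(df, pref = c("Y", "X"))
preferred_factor <- keep_preferred(df_factor, pref = c("Y", "X"))

# For comparison: the desc(protocol) version on the factor column
alphabetical <- df_factor %>%
  group_by(id) %>%
  arrange(desc(protocol), var) %>%
  slice(1) %>%
  ungroup()
```

With more than two protocols, the same call works by listing them in order of preference, e.g. pref = c("Y", "X", "Z") for a hypothetical third protocol "Z".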
Thanks!