As a follow-up to this question: Remove duplicate lines with dplyr, I have the following:
How do you randomly delete duplicate rows using dplyr (among others)?
My current command is:
data.uniques <- distinct(data, KEYVARIABLE, .keep_all = TRUE)
But it returns the first occurrence of each KEYVARIABLE. I want this choice to be random: any row between 1 and ncases for that KEYVARIABLE.
For instance:
KEYVARIABLE BMI
1 24.2
2 25.3
2 23.2
3 18.9
4 19
4 20.1
5 23.0
My command currently returns:
KEYVARIABLE BMI
1 24.2
2 25.3
3 18.9
4 19
5 23.0
I want it to return one of the duplicate rows at random, for example:
KEYVARIABLE BMI
1 24.2
2 23.2
3 18.9
4 19
5 23.0
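
A minimal sketch of one way to get this behavior, assuming dplyr 1.0.0 or later (for slice_sample()): group by KEYVARIABLE and draw one row per group. The data frame below only recreates the example above, and set.seed() is there purely to make the draw reproducible; it is not part of the original command.

```r
library(dplyr)

# Recreate the example data from the question
data <- data.frame(
  KEYVARIABLE = c(1, 2, 2, 3, 4, 4, 5),
  BMI         = c(24.2, 25.3, 23.2, 18.9, 19, 20.1, 23.0)
)

# Optional: fix the seed so the random pick can be reproduced
set.seed(42)

# Keep exactly one randomly chosen row per KEYVARIABLE
data.uniques <- data %>%
  group_by(KEYVARIABLE) %>%
  slice_sample(n = 1) %>%
  ungroup()

print(data.uniques)
```

Keys that occur only once keep their single row; keys with several rows keep one of them, chosen uniformly at random. On older dplyr versions, sample_n(1) after group_by() should play the same role.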