I have a data.frame with multiple columns and I want to filter out low-frequency rows based on a combination of variables. For example, combine male/female (sex) with high/low cholesterol; in the toy example below the second variable is called Age. My data frame then looks like this:
set.seed(123)
Sex = sample(c('Male','Female'),size = 20,replace = TRUE)
Age = sample(c('Low','High'),size = 20,replace = TRUE)
Index = 1:20
df = data.frame(index = Index,Sex=Sex,Age=Age)
df
index Sex Age
1 1 Male High
2 2 Female High
3 3 Male High
4 4 Female High
5 5 Female High
6 6 Male High
7 7 Female High
8 8 Female High
9 9 Female Low
10 10 Male Low
11 11 Female High
12 12 Male High
13 13 Female High
14 14 Female High
15 15 Male Low
16 16 Female Low
17 17 Male High
18 18 Male Low
19 19 Male Low
20 20 Female Low
Now I want to keep only the Sex/Age combinations whose frequency is above 3:
table(df[,2:3])
        Age
Sex      High Low
  Female    8   3
  Male      5   4
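One base R way to act on the table above is to look up each row's cell count directly in it. This is a sketch on the printed example data (hard-coded from the output above so it does not depend on the random number generator); it relies on the fact that a table can be indexed by a character matrix with one column per dimension, matched against its dimnames:

```r
# Example data, copied from the printed output above.
df <- data.frame(
  index = 1:20,
  Sex = c("Male", "Female", "Male", "Female", "Female", "Male", "Female",
          "Female", "Female", "Male", "Female", "Male", "Female", "Female",
          "Male", "Female", "Male", "Male", "Male", "Female"),
  Age = c(rep("High", 8), "Low", "Low", rep("High", 4), "Low", "Low",
          "High", "Low", "Low", "Low")
)

# Contingency table of the two grouping columns (same as table(df[, 2:3])).
tab <- table(df[, c("Sex", "Age")])

# Character matrix (one column per table dimension): each row picks one cell,
# so this returns the Sex/Age count for every row of df.
row_counts <- tab[cbind(as.character(df$Sex), as.character(df$Age))]

# Keep only rows whose Sex/Age combination occurs more than 3 times.
result <- df[row_counts > 3, ]
result$index
```

Note that table() allocates one cell for every possible combination of levels, so with many grouping columns the table itself can get large.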
In other words, I want to keep the rows (indexes) for Female/High, Male/High and Male/Low, and drop the Female/Low rows.
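A compact alternative that skips the explicit table is base R's ave(), which returns, for every element, FUN applied to the group that element belongs to. With FUN = length this gives each row the size of its Sex/Age group. A minimal sketch on the same example data:

```r
# Example data, copied from the printed output above.
df <- data.frame(
  index = 1:20,
  Sex = c("Male", "Female", "Male", "Female", "Female", "Male", "Female",
          "Female", "Female", "Male", "Female", "Male", "Female", "Female",
          "Male", "Female", "Male", "Male", "Male", "Female"),
  Age = c(rep("High", 8), "Low", "Low", rep("High", 4), "Low", "Low",
          "High", "Low", "Low", "Low")
)

# Group size of each row's Sex/Age combination (one value per row).
group_size <- ave(df$index, df$Sex, df$Age, FUN = length)

# Keep rows from combinations seen more than 3 times.
result <- df[group_size > 3, ]
result$index
```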
Please note that 1) my real data frame has more grouping variables than the example above, 2) I don't want to use third-party R packages, and 3) I want it to be fast.
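Since the real data has more grouping variables, the lookup can be wrapped in a small helper that takes the column names as an argument. keep_frequent_combos() and its arguments are hypothetical names, not part of any package, and the sketch assumes the grouping columns contain no NA values. It uses only base R: interaction(..., drop = TRUE) gives one factor code per combination that actually occurs, and tabulate() counts those codes in a single pass, so it should scale better than table() when there are many columns:

```r
# Hypothetical helper: keep rows of `data` whose combination of values in
# `cols` occurs more than `min_count` times. Assumes no NAs in `cols`.
keep_frequent_combos <- function(data, cols, min_count) {
  # One factor level per observed combination; drop = TRUE discards
  # combinations that never occur.
  key <- interaction(data[cols], drop = TRUE)
  # Number of rows per combination, indexed by the factor's integer code.
  counts <- tabulate(as.integer(key), nbins = nlevels(key))
  # Map every row back to its combination's count and filter.
  data[counts[as.integer(key)] > min_count, , drop = FALSE]
}

# Example data, copied from the printed output above.
df <- data.frame(
  index = 1:20,
  Sex = c("Male", "Female", "Male", "Female", "Female", "Male", "Female",
          "Female", "Female", "Male", "Female", "Male", "Female", "Female",
          "Male", "Female", "Male", "Male", "Male", "Female"),
  Age = c(rep("High", 8), "Low", "Low", rep("High", 4), "Low", "Low",
          "High", "Low", "Low", "Low")
)

result <- keep_frequent_combos(df, c("Sex", "Age"), 3)
result$index
```

With cols = c("Sex", "Age") and min_count = 3 this keeps the Female/High, Male/High and Male/Low rows, i.e. every index except 9, 16 and 20. For the real data, list all the relevant column names in cols.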