I have been working with data.table in R recently, and it is quite popular and efficient. I am currently facing a problem that I think can be solved using data.table.
I have a dataset as follows:
event | group_ind
1 | group1
1 | group1
1 | group1
2 | group1
2 | group1
1 | group2
1 | group2
2 | group2
2 | group3
2 | group3
Now I want to know the percentage of each event within each group. For this dataset the result is easy to see: event 1 makes up 60% of group 1, 67% of group 2 and 0% of group 3. The real data set has many more observations and more than two types of events, and the rows are not sorted in any particular order. I can get what I want in a roundabout way in R (by counting the occurrences of each event and dividing by the total number of observations in each group), but I think there should be a more convenient way to do this.
The result I want looks like this (percentages given as proportions):
event | group_ind | percentage
1 | group1 | 0.6
2 | group1 | 0.4
1 | group2 | 0.67
2 | group2 | 0.33
1 | group3 | 0
2 | group3 | 1
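To make the example reproducible, here is a rough sketch of the counting approach described above (occurrences of each event divided by the number of rows in the group). The names `dt`, `events` and `res` are made up for illustration, and this is only one possible way to write it, not necessarily the idiomatic data.table one:

```r
library(data.table)

# Example data from the table above (dt is a hypothetical name)
dt <- data.table(
  event     = c(1, 1, 1, 2, 2, 1, 1, 2, 2, 2),
  group_ind = c(rep("group1", 5), rep("group2", 3), rep("group3", 2))
)

# All event types present anywhere in the data, so that events
# missing from a group still get a row with proportion 0
events <- sort(unique(dt$event))

# Per group: count each event (zeros included via factor levels),
# then divide by the number of rows in that group (.N)
res <- dt[, .(event      = events,
              percentage = as.vector(table(factor(event, levels = events))) / .N),
          by = group_ind]

print(res)
```

Going through `factor(event, levels = events)` is what keeps the zero rows (event 1 in group3); a plain count by group and event would drop combinations that never occur.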
Hope this can be done in data.table. Many thanks for the help.
Lambo