Use data.table to calculate percentage of occurrence based on category in another column

I have recently started working with data.table in R, which is popular and efficient. I am currently facing a problem that I think data.table can solve.

I have a dataset as follows:

event | group_ind 
  1   | group1
  1   | group1
  1   | group1
  2   | group1
  2   | group1
  1   | group2
  1   | group2
  2   | group2
  2   | group3
  2   | group3
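
For anyone who wants to reproduce this, the ten rows above can be rebuilt roughly as follows. The object name DT is an assumption (it matches the name used in the answer code), and storing event as an integer is also just a choice made for this sketch.

```r
library(data.table)

# Rebuild the example rows shown above.
# DT is an assumed name; event is stored as integer for this sketch.
DT <- data.table(
  event     = c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L),
  group_ind = rep(c("group1", "group2", "group3"), times = c(5, 3, 2))
)
```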

Now I want to know the percentage of event 1 in each group. The result for this dataset is obvious: 60% for event 1 in group 1, 67% in group 2, and 0% in group 3. The real dataset has many more observations and more than two types of events, and the rows are not sorted in any particular order. I can get what I want in a clumsy way in R (by counting the occurrences of each event and dividing by the total number of observations in each group), but I think there should be a more convenient way to do this.

The result I want looks like this:

 event | group_ind | percentage
   1   | group1    | 0.6
   2   | group1    | 0.4
   1   | group2    | 0.67
   2   | group2    | 0.33
   1   | group3    | 0
   2   | group3    | 1

Hope this can be done in data.table. Many thanks for the help.

1 answer

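A simple solution is to use tabulate() within each group:

# nbins = 2 keeps both bins even for a group that contains only event 1
setDT(DT)[, .(event = 1:2, percentage = tabulate(event, nbins = 2)/.N), by = group_ind]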
#    group_ind event percentage
# 1:    group1     1  0.6000000
# 2:    group1     2  0.4000000
# 3:    group2     1  0.6666667
# 4:    group2     2  0.3333333
# 5:    group3     1  0.0000000
# 6:    group3     2  1.0000000
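
To see why this works, it may help to look at tabulate() on a single group in isolation. The sketch below uses plain vectors rather than data.table, and the variable names are illustrative only. Inside the grouped call, .N plays the role of length() here, since it is the number of rows in the current group.

```r
# One group's events, as in group1 of the example data
group1_events <- c(1L, 1L, 1L, 2L, 2L)

# tabulate() counts how often each integer 1, 2, ..., max(x) occurs
counts <- tabulate(group1_events)          # 3 2
shares <- counts / length(group1_events)   # 0.6 0.4

# Without nbins the result stops at the largest value present,
# so a group holding only event 1 would yield a single bin
short  <- tabulate(c(1L, 1L))              # 2
padded <- tabulate(c(1L, 1L), nbins = 2)   # 2 0
```

Passing nbins explicitly keeps the result the same length in every group, which matters when a fixed vector such as 1:2 is placed next to it.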

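Alternatively, derive the event values with unique(event) instead of hardcoding them (as suggested by @EdM). Note that this only lines up when every group contains every event value and the values are the consecutive integers 1, 2, and so on: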

setDT(DT)[order(event), .(event = unique(event), percentage = tabulate(event)/.N), by = group_ind]
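
The unique(event) variant relies on each group containing every event value. In the example data, group3 has no event 1, so unique(event) has length 1 there while tabulate(event) has length 2, and the rows no longer line up. A minimal sketch of one way around this, assuming the set of events should be taken from the whole table, is shown below. The names all_events and res are hypothetical, and this is only one possible approach, not the answer author's code.

```r
library(data.table)

# Example data from the question (DT is an assumed name)
DT <- data.table(
  event     = c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L),
  group_ind = rep(c("group1", "group2", "group3"), times = c(5, 3, 2))
)

# All event values seen anywhere in the data, so every group reports the same set
all_events <- sort(unique(DT$event))

res <- DT[, .(
  event      = all_events,
  # match() maps each event to its position in all_events,
  # so the event values need not be 1, 2, ...
  percentage = tabulate(match(event, all_events), nbins = length(all_events)) / .N
), by = group_ind]
```

With this, groups that lack an event get an explicit 0 row for it, and the approach should also cope with more than two event types.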

Source: https://habr.com/ru/post/1617657/

