Nested groups with data.table

In the following data.table, I have information on the composition of teams participating in projects. The variable id is the team identifier, and the variable event is the project number. The variable freqrel describes the composition of each team (note that freqrel sums to 1 within each team).

dt <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 
4L, 4L, 5L, 5L, 5L), event = c("127b", "127b", "127b", "127b", 
"127b", "127b", "127b", "127b", "127b", "125t", "125t", "125t", 
"125t", "125t", "125t"), membr = c("engineer", "mathematician", 
"physicist", "mathematician", "physicist", "surgeon", "dentist", 
"mathematician", "programmer", "physicist", "sociologist", "surgeon", 
"musician", "sociologist", "surgeon"), freqrel = c(0.4, 0.4, 
0.2, 0.166666666666667, 0.5, 0.333333333333333, 0.333333333333333, 
0.5, 0.166666666666667, 0.75, 0.125, 0.125, 0.444444444444444, 
0.444444444444444, 0.111111111111111)), .Names = c("id", "event", 
"membr", "freqrel"), row.names = c(NA, -15L), class = c("data.table", 
"data.frame"), sorted = c("id", "event"))

As I see it, the data is divided into nested groups. The first split is at the project level (solid line), and the second at the team level (dashed line).

    id event         membr   freqrel
 1:  1  127b      engineer 0.4000000
 2:  1  127b mathematician 0.4000000
 3:  1  127b     physicist 0.2000000
--------------------------------------
 4:  2  127b mathematician 0.1666667
 5:  2  127b     physicist 0.5000000
 6:  2  127b       surgeon 0.3333333
--------------------------------------
 7:  3  127b       dentist 0.3333333
 8:  3  127b mathematician 0.5000000
 9:  3  127b    programmer 0.1666667
_____________________________________
10:  4  125t     physicist 0.7500000
11:  4  125t   sociologist 0.1250000
12:  4  125t       surgeon 0.1250000
--------------------------------------
13:  5  125t      musician 0.4444444
14:  5  125t   sociologist 0.4444444
15:  5  125t       surgeon 0.1111111

Starting from this, I would like to make the teams within the same project directly comparable by adding to each team every membr label that occurs in the project but not in that team, with freqrel = 0. The result should look like this:

    id event         membr   freqrel
 1:  1  127b       dentist 0.0000000
 2:  1  127b      engineer 0.4000000
 3:  1  127b mathematician 0.4000000
 4:  1  127b     physicist 0.2000000
 5:  1  127b    programmer 0.0000000
 6:  1  127b       surgeon 0.0000000
--------------------------------------
 7:  2  127b       dentist 0.0000000
 8:  2  127b      engineer 0.0000000
 9:  2  127b mathematician 0.1666667
10:  2  127b     physicist 0.5000000
11:  2  127b    programmer 0.0000000
12:  2  127b       surgeon 0.3333333
--------------------------------------
13:  3  127b       dentist 0.3333333
14:  3  127b      engineer 0.0000000
15:  3  127b mathematician 0.5000000
16:  3  127b     physicist 0.0000000
17:  3  127b    programmer 0.1666667
18:  3  127b       surgeon 0.0000000
_____________________________________
19:  4  125t      musician 0.0000000
20:  4  125t     physicist 0.7500000
21:  4  125t   sociologist 0.1250000
22:  4  125t       surgeon 0.1250000
--------------------------------------
23:  5  125t      musician 0.4444444
24:  5  125t     physicist 0.0000000
25:  5  125t   sociologist 0.4444444
26:  5  125t       surgeon 0.1111111

Is there a way to do this with data.table, grouping by event?



One way to do it:

setkey(dt, id, membr)
ans <- dt[, .SD[CJ(unique(id), unique(membr))], by=list(event)]

Then replace the NA values with 0:

ans[is.na(freqrel), freqrel := 0.0]
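
The same idea can also be sketched without relying on the key of dt, by naming the CJ columns and passing the join columns through on=. This is a variant under assumptions, not the code above: it assumes a data.table version that supports on= (1.9.6 or later), and the names dt2 and ans2 and the compact re-creation of the data are only for illustration.

```r
library(data.table)

# Re-create the example data (freqrel written as the fractions behind the values shown)
dt2 <- data.table(
  id      = rep(1:5, each = 3),
  event   = rep(c("127b", "125t"), times = c(9, 6)),
  membr   = c("engineer", "mathematician", "physicist",
              "mathematician", "physicist", "surgeon",
              "dentist", "mathematician", "programmer",
              "physicist", "sociologist", "surgeon",
              "musician", "sociologist", "surgeon"),
  freqrel = c(2/5, 2/5, 1/5, 1/6, 1/2, 1/3, 1/3, 1/2, 1/6,
              3/4, 1/8, 1/8, 4/9, 4/9, 1/9)
)

# Within each event, join the group's rows to every (id, membr) combination;
# on= names the join columns explicitly, so no key is required
ans2 <- dt2[, .SD[CJ(id = unique(id), membr = unique(membr)),
                  on = c("id", "membr")],
            by = event]

# Combinations absent from the original data come back as NA; set them to 0
ans2[is.na(freqrel), freqrel := 0]

# Sanity checks: 3 teams x 6 labels + 2 teams x 4 labels = 26 rows,
# and freqrel should still sum to 1 within each team
nrow(ans2)
ans2[, sum(freqrel), by = .(event, id)]
```

Checking the row count and the per-team sums is a cheap way to confirm that the padding added only zero rows.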

Explanation: for each event, all combinations of id and membr are built with CJ (a cross join) and then joined with .SD, which is why the key of dt is set to id, membr upfront.
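
To see what CJ contributes, it may help to run it by hand for a single event. The sketch below uses the ids and labels of event 125t from the question; the names ids, membrs and grid are hypothetical and appear nowhere in the answer.

```r
library(data.table)

# ids and labels that occur in event "125t" (taken from the example data)
ids    <- c(4L, 5L)
membrs <- c("physicist", "sociologist", "surgeon", "musician")

# CJ() builds every (id, membr) pair; by default the result is sorted
# and keyed by its columns
grid <- CJ(id = ids, membr = membrs)

nrow(grid)   # 2 ids x 4 labels = 8 rows
key(grid)    # "id" "membr"
print(grid)
```

Joining this grid against the rows of the event is what produces the NA entries for labels a team does not have.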


Source: https://habr.com/ru/post/1526600/

