R data.table: subsetting within a group and splitting the data table in two

I have the following data table.

    ts,id
    1,a
    2,a
    3,a
    4,a
    5,a
    6,a
    7,a
    1,b
    2,b
    3,b
    4,b

I want to split this data table in two. The criterion is to have approximately the first half of each group (in this case, grouped by the "id" column) in one data table, and the rest in another data table. So the expected result is two data.tables, as follows:

    ts,id
    1,a
    2,a
    3,a
    4,a
    1,b
    2,b

and

    ts,id
    5,a
    6,a
    7,a
    3,b
    4,b

I tried the following:

    z1 = x[, .SD[.I < .N/2, ], by = id]
    z1

and received only the following

    id ts
     a  1
     a  2
     a  3

It seems that .I does not work inside .SD the way I expected. Any help is appreciated. Thanks in advance.

2 answers

.I gives the row locations with respect to the whole data.table, not to the current group. It therefore cannot be used like that inside .SD.

Something like

    DT[, subset := seq_len(.N) > .N/2, by = 'id']
    subset1 <- DT[(subset)][, subset := NULL]
    subset2 <- DT[!(subset)][, subset := NULL]

    subset1
    #    ts id
    # 1:  4  a
    # 2:  5  a
    # 3:  6  a
    # 4:  7  a
    # 5:  3  b
    # 6:  4  b

    subset2
    #    ts id
    # 1:  1  a
    # 2:  2  a
    # 3:  3  a
    # 4:  1  b
    # 5:  2  b

should work.
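To see why .I misbehaves in the question's attempt, the following minimal sketch compares .I with seq_len(.N) within each group. It assumes the question's data is rebuilt as DT; the column names whole_table and within_group are made up for illustration.

```r
library(data.table)

# Example data rebuilt from the question: 7 rows for "a", 4 rows for "b"
DT <- data.table(ts = c(1:7, 1:4), id = rep(c("a", "b"), c(7, 4)))

# .I holds row numbers in the whole table, while seq_len(.N)
# restarts at 1 in every group
idx <- DT[, .(whole_table = .I, within_group = seq_len(.N)), by = id]
print(idx)
```

For group "b", .I runs from 8 to 11 while .N is 4, so a condition such as .I < .N/2 can never be true there, which is why group "b" is missing from the question's result.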

To split into more than two parts, you can use cut to create a factor with the appropriate number of levels.

Something like

    DT[, subset := cut(seq_len(.N), 3, labels = FALSE), by = 'id']

    # you could copy a subset for each part to the global environment,
    # but this will not be memory efficient!
    list2env(setattr(split(DT, DT[['subset']]), 'names', paste0('s', 1:3)), .GlobalEnv)

Here's the adjusted version of your expression:

    dt[, .SD[, .SD[.I <= .N/2]], by = id]
    #    id ts
    # 1:  a  1
    # 2:  a  2
    # 3:  a  3
    # 4:  b  1
    # 5:  b  2

The reason yours doesn't work is that .I and .N are not available in the i-expression (i.e. the first argument of [ ), and so the parent data.table's .I and .N (i.e. dt's) are used instead.
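As a possible alternative that avoids the nested .SD, here is a sketch that evaluates .I and .N in j, where they refer to the current group, and then indexes dt by the collected row numbers. The names first_rows, first_half and second_half are illustrative, and the data is rebuilt from the question.

```r
library(data.table)

# Example data rebuilt from the question: 7 rows for "a", 4 rows for "b"
dt <- data.table(ts = c(1:7, 1:4), id = rep(c("a", "b"), c(7, 4)))

# In j, .I and .N describe the current group, so .I[...] returns the
# whole-table row numbers of the first half of each group
first_rows <- dt[, .I[seq_len(.N) <= .N / 2], by = id]$V1

first_half  <- dt[first_rows]   # rows in the first half of each group
second_half <- dt[-first_rows]  # all remaining rows
```

With this condition a group of odd size puts its middle row in the second table; using ceiling(.N / 2) instead would put it in the first, which matches the expected output in the question.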


Source: https://habr.com/ru/post/1497360/

