R: select a range of columns in data.table

I am trying to understand the data.table documentation, and I would appreciate feedback on where my reasoning goes wrong in the following.

(1) I would like to select a range of columns from a data.table to create a new data.table.

(2) In addition, I would like to keep only the first value of each group. Regarding the first question, I think the answer is given here, but that answer selects the columns by number. I would like to use column names, which in my opinion are one of the main strengths (and selling points) of data.table.

Here is an example dataset.

    library(data.table)
    DT <- data.table(ID = c(101, 101, 101, 102, 103, 104, 104),
                     "year.1" = c(1, 5, 3, 2, 3, 4, 8),
                     "year.2" = c(4, 5, 6, NA, 1, 2, 3),
                     "year.3" = c(1, 2, 3, 7, 9, 8, 0),
                     "year.4" = c(4, 5, NA, 1, 2, 6, 9))
    setkey(DT, ID)

In my real data I have many more columns, not only the "year" ones.

    # ALL OF THESE DON'T WORK AND END IN ERRORS
    # To extract a range of columns I have tried this:
    dt.sub <- DT[, list(year.1:year.3, ID)]
    dt.sub <- DT[, c("year.1":"year.3", ID), with = FALSE]
    # I know this shouldn't work, since "with=FALSE" is only intended
    # in combination with := according to the documentation
    dt.sub <- DT[, lapply(SD), .SDcols = for (i in 1:3) paste0("year.", i)]

For the second question: if I wanted the result to contain only the first observation of each group, I would expect to be able to use the argument "mult". However, it also behaves differently than I expect. Using one column as an example:

 dt.sub1 <- DT[,year.1, by=ID,mult="first",] 

This runs without errors, but it does not return only the first row of each group. I know that a workaround such as:

 dt.sub1 <- unique(DT[,year.1, by=ID]) 

provides the expected result, but I feel like I'm missing something important with the mult option.
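A possible explanation for the workaround, stated as an assumption about versions: since data.table 1.9.8, `unique()` on a data.table compares all columns by default, whereas older versions compared only the key columns, which would let the workaround behave as described on a keyed result. Under the current default every (ID, year.1) pair in the example is distinct, so nothing is dropped; passing `by = "ID"` explicitly asks for the first row of each ID. The sketch below shows both; the object names `grouped`, `all_cols`, `first_per_id` and `first_rows_dt` are only illustrative.

```r
library(data.table)

DT <- data.table(ID = c(101, 101, 101, 102, 103, 104, 104),
                 year.1 = c(1, 5, 3, 2, 3, 4, 8),
                 year.2 = c(4, 5, 6, NA, 1, 2, 3),
                 year.3 = c(1, 2, 3, 7, 9, 8, 0),
                 year.4 = c(4, 5, NA, 1, 2, 6, 9))
setkey(DT, ID)

# One row per original row: columns ID and year.1
grouped <- DT[, year.1, by = ID]

# Current default: duplicates are judged on ALL columns, so all 7 rows survive
all_cols <- unique(grouped)

# Explicit by: keep the first row seen for each ID
first_per_id <- unique(grouped, by = "ID")

# The same idea applied directly to the full table
first_rows_dt <- unique(DT, by = "ID")
```

With an explicit `by`, the result no longer depends on which default a given data.table version uses.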

Answer:
    # (1)
    DT[, c(paste0('year.', 1:3), 'ID'), with = FALSE]
    # (2)
    DT[, year.1[1], by = ID]
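Approach (1) relies on the names following a `year.N` pattern. When the columns in the range have unrelated names, a minimal sketch is to look up the positions of the first and last column by name and take everything in between. `col_range()` below is a hypothetical helper, not part of data.table; it only relies on `with = FALSE`. Recent data.table versions also accept a name range directly, as in `DT[, year.1:year.3]` (this form appears in the package's introduction vignette), so check which your installed version supports.

```r
library(data.table)

DT <- data.table(ID = c(101, 101, 101, 102, 103, 104, 104),
                 year.1 = c(1, 5, 3, 2, 3, 4, 8),
                 year.2 = c(4, 5, 6, NA, 1, 2, 3),
                 year.3 = c(1, 2, 3, 7, 9, 8, 0),
                 year.4 = c(4, 5, NA, 1, 2, 6, 9))
setkey(DT, ID)  # sorts rows; column order is unchanged

# Hypothetical helper: names of all columns from `from` to `to` (inclusive),
# in their current order in the table.
col_range <- function(dt, from, to) {
  nms <- names(dt)
  idx <- match(c(from, to), nms)
  if (anyNA(idx)) {
    stop("column not found: ", paste(c(from, to)[is.na(idx)], collapse = ", "))
  }
  nms[idx[1]:idx[2]]
}

# Character vector of names, then select with with = FALSE
cols <- c(col_range(DT, "year.1", "year.3"), "ID")
dt.sub <- DT[, cols, with = FALSE]
```

Because the range is resolved from `names(DT)`, it follows the current column order, so reordering columns (for example with `setcolorder()`) changes which columns fall inside the range.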

mult is used when joining two data.tables and specifies what to return when a row of i matches several rows. Therefore, as @Arun pointed out, one way to use mult for your second question would be (given that DT is already keyed by ID):

 DT[J(unique(ID)), list(ID, year.1), mult = 'first'] 
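To make the role of `mult` concrete: it is only consulted when `i` is a join, and it decides which of the matching rows of `DT` to return for each row of `i` (`"all"` is the default, `"first"` and `"last"` are the alternatives). Outside a join, as in the `by = ID` attempt in the question, there is nothing for it to act on. The sketch below compares the three values on the example data and, for contrast, the grouping idiom `.SD[1]`, which returns the first row of each group without a join. It builds the lookup vector as `ids <- unique(DT$ID)` (an illustrative name) only to make the scoping explicit.

```r
library(data.table)

DT <- data.table(ID = c(101, 101, 101, 102, 103, 104, 104),
                 year.1 = c(1, 5, 3, 2, 3, 4, 8),
                 year.2 = c(4, 5, 6, NA, 1, 2, 3),
                 year.3 = c(1, 2, 3, 7, 9, 8, 0),
                 year.4 = c(4, 5, NA, 1, 2, 6, 9))
setkey(DT, ID)

ids <- unique(DT$ID)  # one lookup row per group

# Join on the key: each row of J(ids) matches 1 to 3 rows of DT
all_rows   <- DT[J(ids), mult = "all"]    # every match (the default)
first_rows <- DT[J(ids), mult = "first"]  # first match per ID
last_rows  <- DT[J(ids), mult = "last"]   # last match per ID

# Non-join alternative: first row of every group
first_by_group <- DT[, .SD[1], by = ID]
```

The join form needs a key (or `on =`) on ID, while `.SD[1]` with `by` works on any table and keeps rows in the order the groups first appear.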

Source: https://habr.com/ru/post/1492149/

