A column added in j of a data.table should be accessible within the same scope

I have this code:

  dat <- dat[, list(colA, colB,
                    RelativeIncome = Income / .SD[Nation == "America", Income],
                    RelativeIncomeLog2 = log2(Income) - log2(.SD[Nation == "America", Income])),  # Read 1)
             by = list(Name, Nation)]

1) I would like to write "RelativeIncomeLog2=log2(RelativeIncome)", but "RelativeIncome" is not available inside j. Is there a way to refer to it?

2) I tried the following instead (based on the FAQ). "RelativeIncome" is now available, but the columns are not added:

  dat <- dat[, {colA; colB
                RelativeIncome = Income / .SD[Nation == "America", Income]
                RelativeIncomeLog2 = log2(RelativeIncome)},
             by = list(Name, Nation)]
1 answer

You can create and assign objects in j; just wrap the expression in { curly braces }.

You can then return these objects (or functions and calculations of them) from j and assign them as columns of the data.table. To assign more than one column at a time:

  • wrap the LHS in c(...), making sure the column names are strings, and
  • make the last line of j (i.e. its "return" value) a list.

  dat[ , c("NewIncomeColumn", "AnotherNewColumn") := {
           RelativeIncome     <- Income / .SD[Nation == "A", Income]
           RelativeIncomeLog2 <- log2(RelativeIncome)
           ## this last line is what will be assigned.
           list(RelativeIncomeLog2 * 100, c("A", "hello", "World"))
           # assigned values are recycled as needed.
           # If the recycling does not match up, a warning is issued.
         }
       , by = list(Name, Nation)
     ]

You can think of j as a function evaluated within the environment of dat.

The expression in j can be as elaborate as required. You can also name the by arguments, using by=list(<someName>=col).
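As a small illustration of naming a by argument, here is a minimal sketch on made-up data. The table, its values, and the names Country and MeanIncome are assumptions for the example, not taken from the question:

```r
library(data.table)

# Hypothetical toy data, for illustration only.
dat <- data.table(
  Nation = c("America", "America", "France"),
  Income = c(100, 200, 50)
)

# by = list(<someName> = col): the grouping column appears in the
# result under the new name (here "Country") instead of "Nation".
res <- dat[, list(MeanIncome = mean(Income)), by = list(Country = Nation)]
```

The result has one row per nation, with the grouping column labelled Country.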

In fact, as with functions, simply creating an object in j and assigning a value to it does not make it available outside of j. For it to be assigned to your data.table, you must return it. j automatically returns its last line; if that last line is a list, each element of the list is treated as a column. If you are assigning by reference (i.e. using := ), you will get the results you expect.
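To make the scoping point concrete, here is a minimal runnable sketch. The data is invented, and it groups by Name only (an assumption that differs from the question, which groups by Name and Nation) so that each group contains its own "America" row to divide by:

```r
library(data.table)

# Hypothetical toy data; the values are made up for illustration.
dat <- data.table(
  Name   = c("a", "a", "b", "b"),
  Nation = c("America", "France", "America", "France"),
  Income = c(100, 50, 200, 400)
)

# Without := the braces return a NEW table built from the final list().
# `tmp` lives only inside j and never becomes a column by itself.
res <- dat[, {
  tmp <- Income / Income[Nation == "America"]
  list(Nation = Nation, RelativeIncome = tmp, RelativeIncomeLog2 = log2(tmp))
}, by = Name]

# With := the elements of the final list are added to dat by reference,
# under the names given on the LHS.
dat[, c("RelativeIncome", "RelativeIncomeLog2") := {
  tmp <- Income / Income[Nation == "America"]
  list(tmp, log2(tmp))
}, by = Name]
```

In both forms the intermediate object tmp can be reused on later lines of j, which is what the question was after; only what the final list() returns ends up as columns.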


On a separate note, I noticed the following in your code:

  Income / .SD[Nation == "America", Income]
  # Which instead could simply be:
  Income / Income[Nation == "America"]

.SD is great in that it is a wonderful shorthand. However, invoking it when you do not need all of the columns it encapsulates burdens your code with additional memory costs. If you are using only one column, consider naming that column explicitly, or perhaps add the .SDcols argument (after j) and name the columns you need there.
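A short sketch of the two alternatives, again on invented data (the table and the extra column Other are assumptions for the example):

```r
library(data.table)

# Hypothetical toy data; `Other` stands in for columns you do not need.
dat <- data.table(
  Name   = c("a", "a", "b"),
  Income = c(100, 50, 200),
  Other  = c(1, 2, 3)
)

# Option 1: name the column explicitly, so .SD is not used at all.
explicit <- dat[, list(Income = sum(Income)), by = Name]

# Option 2: keep .SD but restrict it with .SDcols, so that only
# the Income column is included in each group's .SD.
restricted <- dat[, lapply(.SD, sum), by = Name, .SDcols = "Income"]
```

Both calls give the same per-Name totals; the second form is mainly useful when the same function is applied to several named columns.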


Source: https://habr.com/ru/post/945188/

