When joining data.tables X and Y with X[Y], X must be keyed on the column(s) that Y's key is matched against. If X is a very large table that is normally keyed on columns not used in the join, then X's key has to be changed for the join and restored to the original key afterwards. Is there an efficient way to do the join without losing the original key on X?
I have a large environmental time-series dataset DT (1M rows, 36 columns), stored as a data.table keyed on the site and date columns. I need to do calculations on existing columns in DT and/or add a new column derived from an existing column, using a small lookup (recoding) table.
Here is a minimal example:
require(data.table)
To join the x2y lookup table with the main DT table, I set the DT key to "x":
setkey(DT,x)
Then the join works as expected:
DT[x2y]
and I can use "y" from the lookup table in calculations, or create a new column in DT:
DT[x2y, y:=y]
But now my DT time-series dataset is keyed on "x", and I need to reset the key to "site, date" for future use:
setkey(DT,site,date)
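The steps above can be put together as one runnable sketch. The toy contents of DT and x2y below are hypothetical stand-ins (the real table has 1M rows and 36 columns), chosen only so that the re-key, join, re-key sequence can be run end to end:

```r
library(data.table)

# Hypothetical toy data standing in for the large DT (assumed, not the real data)
DT <- data.table(
  site = rep(c("A", "B"), each = 3),
  date = rep(as.Date("2020-01-01") + 0:2, times = 2),
  x    = c(1L, 2L, 3L, 3L, 2L, 1L)
)
setkey(DT, site, date)     # the original key on the large table

# Hypothetical lookup table that recodes x to y
x2y <- data.table(x = 1:3, y = c("low", "mid", "high"))
setkey(x2y, x)

setkey(DT, x)              # 1. re-key the large table on the join column
DT[x2y, y := i.y]          # 2. join and add y to DT by reference
setkey(DT, site, date)     # 3. restore the original key

key(DT)                    # the key is back to site, date
```

Here `i.y` refers explicitly to the `y` column of the lookup table, which avoids ambiguity if DT ever has a column of the same name.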
Is this approach (key X, join, then re-key X) the fastest way to do this when DT is very large (1M rows), or is there an equally efficient way to do this kind of lookup without losing the original key on the large DT table?
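For comparison, one possible alternative in newer data.table versions (1.9.6 and later) is the `on=` argument, which names the join columns directly and so does not require DT to be keyed on "x". The sketch below uses hypothetical toy data and makes no claim about relative speed on a 1M-row table; that would need to be benchmarked on the real data:

```r
library(data.table)

# Hypothetical toy data standing in for the large DT (assumed, not the real data)
DT <- data.table(
  site = rep(c("A", "B"), each = 3),
  date = rep(as.Date("2020-01-01") + 0:2, times = 2),
  x    = c(1L, 2L, 3L, 3L, 2L, 1L)
)
setkey(DT, site, date)     # the original key, left in place throughout

# Hypothetical lookup table that recodes x to y; no key is needed here
x2y <- data.table(x = 1:3, y = c("low", "mid", "high"))

# Ad hoc join on column x; y is added to DT by reference
DT[x2y, on = "x", y := i.y]

key(DT)                    # still site, date
```

Because the new column `y` is not part of the key, adding it by reference leaves the existing key on "site, date" untouched.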