Can't use a comma inside a backquoted column name with data.table setkey?

If I have a column name that needs backquotes because it contains a comma, setkey throws an error telling me not to use a comma. The error points me to ?setkey, but I can't find anything in the examples there that covers this. The only workaround I have found is to rename the column, call setkey, and then rename it back.

Code example:

    > library(data.table)
    > DT = data.table(`X, in $` = rnorm(10))
    > DT
            X, in $
     1: -1.28475886
     2:  0.97789059
     3: -0.05023914
     4: -0.38133978
     5: -0.24949607
     6:  0.99213156
     7: -0.29310512
     8:  0.02840372
     9:  0.25294231
    10: -0.88955013
    > setkey(DT, `X, in $`)
    Error in setkeyv(x, cols, verbose = verbose) :
      Don't use comma inside quotes. Please see the examples in help('setkey')
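A minimal sketch of the rename workaround mentioned in the question, for data.table versions that still reject the comma. The temporary name `X_tmp` is a hypothetical placeholder, and the sketch assumes that `setnames()` also renames the column inside the key, which is what current data.table does.

```r
library(data.table)

DT <- data.table(`X, in $` = c(0.98, -1.28, 0.25))

# 1. Rename to a hypothetical temporary name that contains no comma
setnames(DT, "X, in $", "X_tmp")
# 2. Key (and physically sort) by the temporary name
setkey(DT, X_tmp)
# 3. Rename back; setnames also updates the key attribute
setnames(DT, "X_tmp", "X, in $")

key(DT)          # "X, in $"
DT[["X, in $"]]  # -1.28 0.25 0.98
```

On data.table 1.8.11 and later this detour should not be needed, since setkey accepts the name directly.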

Edit: here is a more realistic example.

For me, the main reason I ran into this is using reshape2's dcast to turn the character values of a column (which come from an external source, such as a database) into column names.

As long as you don't need the keyed "join" behaviour and only need sorting, you can get around this by copying the table or by using a data.frame. For instance:

    library(data.table)
    library(reshape2)

    DT = data.table(Office = rep(c("Cambridge, UK", "Cambridge, US", "London", "New York"), each = 12),
                    Product = rep(1:12, 4),
                    Sales = rnorm(48)^2)
    DF = dcast(DT, Product ~ Office)      # Office values become column names
    DT = data.table(DF)
    setkey(DT, 'Cambridge, UK')           # fails: "Don't use comma inside quotes"
    DT = DT[order(DF$`Cambridge, UK`), ]  # workaround: sort without setting a key
    DT

gives:

    > library(ggplot2)
    > library(reshape2)
    >
    > DT = data.table(Office = rep(c("Cambridge, UK", "Cambridge, US", "London", "New York"), each = 12), Product = rep(1:12,4), Sales = rnorm(48)^2)
    > DF = dcast(DT, Product~Office)
    Using Sales as value column: use value.var to override.
    > DT = data.table(DF)
    > setkey(DT, 'Cambridge, UK')
    Error in setkeyv(x, cols, verbose = verbose) :
      Don't use comma inside quotes. Please see the examples in help('setkey')
    > DT = DT[order(DF$`Cambridge, UK`),]
    > DT
        Product Cambridge, UK Cambridge, US      London    New York
     1:      12  0.0009257347  1.7183751269 0.818101229 0.002499808
     2:       1  0.0010855828  0.0889560105 0.083778108 1.451149328
     3:       2  0.0139649148  0.7385617360 0.221688602 4.771307440
     4:       5  0.0520875574  0.3389613574 0.934932759 0.127634044
     5:      10  0.0837778446  0.0598955035 0.015930174 0.715849795
     6:       9  0.0856246191  1.1303900183 1.555058058 0.367063297
     7:       6  0.1608235273  0.7147643550 0.004588596 2.995598768
     8:       8  0.4797866129  0.1783997616 0.016459971 0.497328990
     9:       4  0.5282546636  1.7011670679 0.016126768 0.024388172
    10:       7  0.5655147714  0.1106522938 0.045130643 0.442473457
    11:       3  0.8315246051  0.1399159784 5.792956446 1.632060601
    12:      11  3.9958208033  0.0005297928 0.003282897 1.635506818
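If only the sort order matters, newer data.table versions also provide `setorderv()`, which reorders rows by reference using column names passed as strings and does not set a key. A plain data.frame can be sorted with `[[`, so no backquotes are needed at all. This is a sketch with small made-up values rather than the random sales data above.

```r
library(data.table)

DT <- data.table(Product = 1:4,
                 `Cambridge, UK` = c(0.83, 0.001, 0.05, 3.99))

# Reorder by reference; the name is a string, so the comma is harmless
setorderv(DT, "Cambridge, UK")
DT$Product   # 2 3 1 4
key(DT)      # NULL: setorderv sorts but does not set a key

# Base R equivalent on a data.frame, using [[ instead of $ with backquotes
DF <- data.frame(Product = 1:4,
                 `Cambridge, UK` = c(0.83, 0.001, 0.05, 3.99),
                 check.names = FALSE)   # keep the comma in the name
DF_sorted <- DF[order(DF[["Cambridge, UK"]]), ]
DF_sorted$Product   # 2 3 1 4
```

Neither approach gives keyed joins; they only reproduce the ordering that setkey would have produced.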
1 answer

UPDATE (eddi): As of version 1.8.11 this error has been fixed, and arbitrary column names work with setkey.
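A quick check of the fix, assuming data.table 1.8.11 or later: both the backquoted `setkey()` form and the character-vector `setkeyv()` form accept a name containing a comma. The string form is convenient when the name comes from a variable, for example a column created by dcast.

```r
library(data.table)

DT <- data.table(`X, in $` = c(0.98, -1.28, 0.25))

setkey(DT, `X, in $`)   # backquoted column name
key(DT)                 # "X, in $"

setkey(DT, NULL)        # drop the key again
col <- "X, in $"        # e.g. a name produced by dcast
setkeyv(DT, col)        # character-vector form
key(DT)                 # "X, in $"
```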


I found a hack: (1) sort the table, then (2) set the "sorted" attribute with setattr.

Example:

    mydt <- data.table(`b,ah` = c(2L, 3:1), var = letters[1:4])
    mydt <- mydt[order(`b,ah`)]
    setattr(mydt, 'sorted', 'b,ah')

Now, to make sure it behaves correctly:

    key(mydt)
    # [1] "b,ah"
    mydt[.(2)]
    #    b,ah var
    # 1:    2   a
    # 2:    2   c
    mydt[, .N, by = `b,ah`]
    #    b,ah N
    # 1:    1 1
    # 2:    2 2
    # 3:    3 1
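The same hack can be wrapped in a small helper so the column name can come from a string. `setkey_hack` is a hypothetical name, not a data.table function. The ordering uses `method = "radix"` and `na.last = FALSE` on the assumption that this matches how data.table sorts key columns (C-locale order for strings, NAs first); if the rows are not in the order data.table expects, keyed joins can silently return wrong results.

```r
library(data.table)

# Hypothetical helper (not part of data.table): sort, then mark as keyed.
# [ returns a copy, so the caller must reassign the result.
setkey_hack <- function(x, col) {
  # radix: C-locale order for strings, stable; NAs first as in data.table keys
  ord <- order(x[[col]], na.last = FALSE, method = "radix")
  x <- x[ord]
  setattr(x, "sorted", col)
  x
}

mydt <- data.table(`b,ah` = c(2L, 3:1), var = letters[1:4])
mydt <- setkey_hack(mydt, "b,ah")
key(mydt)         # "b,ah"
mydt$var          # "d" "a" "c" "b"
mydt[.(2L)]$var   # "a" "c"  (keyed join on the hacked key)
```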

Comment: I did not use the OP's example because keying on a column of floating-point numbers seems odd (to me).

Who knows what negative side effects this could have? In any case, I would not use it, and I agree that it would be nice to be able to use a comma. Perhaps a setkeyn function that sets the key by column number, if supporting commas makes too much of a mess in setkey/setkeyv?
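The setkeyn idea can be sketched as a thin wrapper that maps column positions to names. It is hypothetical (not a data.table function) and assumes a version where `setkeyv()` accepts arbitrary names, i.e. 1.8.11 or later per the update above.

```r
library(data.table)

# Hypothetical setkeyn: key a data.table by column number(s) instead of name(s)
setkeyn <- function(x, n, ...) {
  cols <- names(x)[n]
  if (anyNA(cols)) stop("column number out of range")
  setkeyv(x, cols, ...)   # keys by reference and returns x invisibly
}

DT <- data.table(Product = 1:3, `Cambridge, UK` = c(0.8, 0.1, 3.9))
setkeyn(DT, 2)
key(DT)      # "Cambridge, UK"
DT$Product   # 2 1 3
```

Keying by position is fragile if columns are later added or reordered, which is one reason to prefer names when they work.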


Source: https://habr.com/ru/post/955174/

