Subsetting a row and a column at the same time

I am a little surprised at how data.table works:

 > library(data.table)
 data.table 1.8.2  For help type: help("data.table")
 > dt <- data.table(a=11:20, b=21:30, c=31:40, key="a")
 > dt[list(12)]
     a  b  c
 1: 12 22 32
 > dt[list(12), b]
     a  b
 1: 12 22
 > dt[list(12)][,b]
 [1] 22

What I'm trying to do is get the value of one column (or expression) in the rows matching a given key value. I see that I need to pass the key as a list, since a bare number would be interpreted as a row number, not a key value. So the first of the above is clear to me. But why the second and third subset expressions give different results seems rather confusing to me. I would like to get the third result, but be able to write it the second way.

Is there a good reason why subsetting a data.table by row and column at the same time will always include the key value as well as the computed result? Is there a syntactically shorter way to get just the single result, other than the double subset shown above?

I am using data.table 1.8.2 on R 2.15.1. If you cannot reproduce my example, you can also try a factor as the key:

 dt <- data.table(a=paste("a", 11:20, sep=""), b=21:30, c=31:40, key="a")
 dt["a11", b]
2 answers

Regarding this question:

Is there a good reason why subsetting a data.table by row and column at the same time will always include the key value as well as the computed result?

I believe the reason (good enough for me) is that Matthew Dowle has not yet gotten around to adding that option (probably because he has been prioritizing work on much more useful features, such as ":= with by").

In the comments following my answer here, Matthew seemed to indicate that it was on his TODO list, noting that "[this] is what drop=TRUE will do (with a speed advantage) when drop is added".

Until then, any of the following will work:

 dt[list(12)][,b]
 # [1] 22
 dt[list(12)][[2]]
 # [1] 22
 dt[dt[list(12), which=TRUE], b]
 # [1] 22
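As a quick sanity check, the three workarounds can be compared directly. This is only a minimal sketch that reuses the `dt` from the question. The `by_name` variant, which extracts the column by name rather than by position, is my own addition for illustration and was not part of the list above.

```r
library(data.table)

# Same table as in the question, keyed on column a
dt <- data.table(a = 11:20, b = 21:30, c = 31:40, key = "a")

# The three workarounds listed above
chained   <- dt[list(12)][, b]                   # subset rows, then pick the column
by_pos    <- dt[list(12)][[2]]                   # subset rows, then take column 2
via_which <- dt[dt[list(12), which = TRUE], b]   # look up the row number, then subset by it

# Assumption (not from the list above): extracting by name avoids
# depending on the position of column b
by_name   <- dt[list(12)][["b"]]

# All four should yield the same bare vector
stopifnot(identical(chained, by_pos),
          identical(chained, via_which),
          identical(chained, by_name))
print(chained)
```

Extracting by name may be the more robust choice if the column order of the table can change.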

One possibility is to use:

 dt[a == 12] 

and

 dt[a == 12, b] 

This works as expected, but it bypasses the binary search on the key and does a sequential (vector) scan instead (is there a plan to change this behavior?), which makes it potentially slower.
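To get a feel for the difference, the two forms can be timed side by side. The sketch below is an illustration under assumptions, not a benchmark: the table `big`, its size, and the `target` value are made up for the example. How large the gap is (or whether there is one at all) depends on the machine and on the data.table version, in particular on whether that version optimizes `==` on the key column.

```r
library(data.table)

# Hypothetical larger table so that a timing difference has a chance to show up
n <- 1e6L
big <- data.table(a = seq_len(n), b = rnorm(n), key = "a")

target <- 500000L

# Keyed lookup: a join on the key column
keyed <- big[list(target)][, b]

# Logical subset: written as a comparison against the whole column
scanned <- big[a == target, b]

# Both forms select the same row
stopifnot(identical(keyed, scanned))

# Rough timings over repeated lookups; the numbers will vary between runs
print(system.time(for (i in 1:100) big[list(target)][, b]))
print(system.time(for (i in 1:100) big[a == target, b]))
```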


UPDATE September 2014: now in version 1.9.3

From NEWS:

DT[column==value] is now optimized to use DT's key when key(DT)[1]=="column"; otherwise a secondary key (a.k.a. index) is added automatically, so the next DT[column==value] is much faster. DT[column %in% values] is equivalent; i.e. both == and %in% accept vector values. No code changes are needed; existing code should benefit automatically. Secondary keys can be added manually using set2key() and their existence checked using key2(). These optimizations and function names/arguments are experimental and can be turned off with options(datatable.auto.index=FALSE).
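A small sketch of what that entry describes, assuming a recent data.table release. One thing here does not come from the NEWS text: the sketch uses `indices()`, which is the name later releases use for what the entry calls `key2()`. The tables `dt2` and `dt3` are made up for the example.

```r
library(data.table)

# Unkeyed table, so the == subset cannot use a primary key
dt2 <- data.table(a = 11:20, b = 21:30, c = 31:40)

# Both == and %in% accept a vector of values on the right-hand side
eq   <- dt2[b == 25L, c]
isin <- dt2[b %in% c(25L, 27L), c]
print(eq)
print(isin)

# Assumption: indices() is the later name for key2(); it lists any
# secondary keys (indices) currently attached to the table
print(indices(dt2))

# Turn the experimental auto-indexing off via the option named in the NEWS entry
old <- options(datatable.auto.index = FALSE)
dt3 <- data.table(a = 11:20, b = 21:30, c = 31:40)
stopifnot(identical(dt3[b == 25L, c], eq))   # the result is the same either way
options(old)                                 # restore the previous setting
```

The option only affects how the lookup is carried out, not what it returns, so existing code should give the same answers with it on or off.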


Source: https://habr.com/ru/post/922044/

