Use a row's value to select a column for that row in dplyr (or base R)

I have a column populated with the names of other columns. For each row, I want to get the value from the column named there.

  # three columns with values and one "key" column
  library(dplyr)
  data = data.frame(
    x = runif(10),
    y = runif(10),
    z = runif(10),
    key = sample(c('x', 'y', 'z'), 10, replace = TRUE)
  )

  # now get the value named in 'key'
  data = data %>% mutate(value = VALUE_AT_COLUMN(key))

I'm sure the answer has something to do with lazy evaluation in mutate, but I can't get it to understand me.

Any help would be appreciated.

+5
4 answers

Here's a base R solution:

 data$value = diag(as.matrix(data[,data$key])) 
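A rough sketch of why this works, assuming (as in the question) that the key values line up with the column names: data[, data$key] picks one whole column per row, giving a 10 × 10 block, and diag() then takes the entry in row i of the column named by key[i]. Note that this materialises an n × n matrix, so it can get expensive for large data:

  # intermediate object built by the one-liner above (10 x 10 here)
  sel <- as.matrix(data[, data$key])
  dim(sel)                 # 10 10
  data$value <- diag(sel)  # element [i, i] is the value of column key[i] in row i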
+5

We can try data.table. After converting the 'data.frame' to a 'data.table' (setDT(data)) and grouping by row number, we use .SD to subset the column specified by key.

  library(data.table)
  setDT(data)[, .SD[, key[[1L]], with = FALSE], 1:nrow(data)]

Or another option with get, after converting key to character (it is a factor), again grouping by row number as before.

  setDT(data)[, get(as.character(key)), 1:nrow(data)] 
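Both calls return a two-column data.table (the row index and the extracted value) rather than modifying data. A minimal sketch, assuming the same data and that setDT(data) has already been run, of attaching the result as a new column by reference instead:

  # grouped assignment by reference; one group per row
  setDT(data)[, value := get(as.character(key)), by = 1:nrow(data)]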

Here is a dplyr option using do:

  library(dplyr)
  data %>%
    group_by(rn = row_number()) %>%
    do(data.frame(., value = .[[.$key]]))
+6

For a memory-efficient and fast solution, update the original data.table in place with the following join:

  # assumes data is already a data.table, e.g. via setDT(data)
  data[.(key2 = unique(key)), val := get(key2), on = c(key = "key2"), by = .EACHI][]

For each value of key2, the matching rows in data$key are located. Those rows are updated with the values from the column named in key2. For example, key2 = "x" matches rows 1, 2, 6, 8 and 10, and the corresponding values are data$x[c(1, 2, 6, 8, 10)]. by = .EACHI ensures that get(key2) is evaluated once for each value of key2.
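To make the mechanics concrete, here is a rough manual equivalent of what the join does for a single key2 value (an illustration only, assuming data is already a data.table):

  # manual equivalent for key2 == "x"
  idx <- which(data$key == "x")   # e.g. rows 1, 2, 6, 8, 10 in the example above
  data[idx, val := x]             # update those rows with data$x[idx] by reference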

Since the operation is performed only on the unique key values, it should be much faster than running row by row. And since the data.table is updated by reference, it should also be memory-efficient (which in turn helps speed).

+5

It seems like there should be a base R solution for this, but the best I could come up with using tidyr is to first gather the data into long form and then filter for just the observations that match the desired key.

  library(dplyr)
  library(tidyr)
  data %>%
    add_rownames("index") %>%
    gather(var, value, -index, -key) %>%
    filter(key == var)

A base R solution that almost works:

 data[cbind(seq_along(data$key), data$key)] 

On the data above it happens to work, but because it goes through a matrix, it has two serious problems. One is that the factor ordering matters: the factor is simply coerced to integer, so columns are selected by level number rather than by column name. The other is that the resulting output is character, not numeric, because converting to a matrix picks the character type on account of the key column. The underlying problem is the lack of a data.frame analogue of the matrix behaviour:

When indexing arrays by [, a single argument i can be a matrix with as many columns as there are dimensions of x; the result is then a vector with elements corresponding to the sets of indices in each row of i.
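A small sketch of both problems, reusing the question's data (the first one only bites when key is a factor, e.g. with stringsAsFactors = TRUE):

  # matrix indexing of a data.frame goes through as.matrix(), and the key
  # column forces everything to character
  class(data[cbind(seq_along(data$key), data$key)])   # "character", not "numeric"

  # if key is a factor, columns are picked by level number, so a different
  # level order silently selects the wrong columns
  key_rev <- factor(as.character(data$key), levels = c('z', 'y', 'x'))
  data[cbind(seq_along(key_rev), key_rev)]             # no longer matches data$key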

Given these problems, I would probably go with the tidyr solution anyway, since the fact that the columns are being selected by key suggests that they probably represent different observations of the same underlying quantity.

+4

Source: https://habr.com/ru/post/1241722/

