Working with rich objects in data.table columns

Let's say I have a data table in which one column contains linear models:

library(data.table)
set.seed(1014)

dt <- data.table(
  g = c(1, 1, 2, 2, 3, 3, 3),
  x = runif(7),
  y = runif(7)
)

models <- dt[, list(mod = list(lm(y ~ x, data = .SD))), by = g]

Now I want to extract the r-squared value from each model. Can I do better than this?

models[, list(rsq = summary(mod[[1]])$r.squared), by = g]

##    g      rsq
## 1: 1 1.000000
## 2: 2 1.000000
## 3: 3 0.004452

Ideally, I would like to remove the [[1]] and not rely on knowing the previous grouping variable (I know that I want each row to be its own group).

4 answers

The problem is simply that summary is not vectorized. So how about vectorizing it manually (this is about the same as @mnel's solution):

r.squared = Vectorize(function(x) summary(x)$r.squared)

models[, rsq := r.squared(mod)]
models
#   g  mod         rsq
#1: 1 <lm> 1.000000000
#2: 2 <lm> 1.000000000
#3: 3 <lm> 0.004451631
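
As a side note, Vectorize is a thin wrapper around mapply, so the call above still visits the list column one model at a time. Here is a minimal sketch of that equivalence. It assumes the built-in mtcars data in place of the table above, and rsq_one is a hypothetical helper name, not something from the original answer:

```r
# Hypothetical helper: r-squared of a single fitted model
rsq_one <- function(m) summary(m)$r.squared

# A plain list of lm objects, standing in for the `mod` list column
mods <- lapply(split(mtcars, mtcars$cyl), function(d) lm(mpg ~ wt, data = d))

# Vectorize() builds a function that calls mapply() over its argument
via_vectorize <- Vectorize(rsq_one)(mods)

# The same loop written out explicitly
via_mapply <- mapply(rsq_one, mods)

# Compare the two results
all.equal(via_vectorize, via_mapply)
```

One thing to watch for: a vectorized function treats a single lm object as a list of its components, so it should be handed a list of models rather than a bare model.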

You could use rapply with classes='lm'. Alternatively, use sapply (as below):

library(data.table)
set.seed(1014)

dt <- data.table(
  g = c(1, 1, 2, 2, 3, 3, 3),
  x = runif(7),
  y = runif(7)
)

models <- dt[, list(mod = list(lm(y ~ x, data = .SD))), by = g]
models[, rsq := sapply(mod, function(x) summary(x)$r.squared)]

models
#     g  mod         rsq
#  1: 1 <lm> 1.000000000
#  2: 2 <lm> 1.000000000
#  3: 3 <lm> 0.004451631
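
A possible variation on the sapply call, sketched here as a suggestion rather than as part of the original answer: vapply declares the expected return type up front, so a model whose summary does not yield a single number raises an error instead of silently changing the column type.

```r
library(data.table)
set.seed(1014)

dt <- data.table(
  g = c(1, 1, 2, 2, 3, 3, 3),
  x = runif(7),
  y = runif(7)
)

models <- dt[, list(mod = list(lm(y ~ x, data = .SD))), by = g]

# numeric(1) states that every model must yield exactly one number
models[, rsq := vapply(mod, function(m) summary(m)$r.squared, numeric(1))]

models
```

The two-point groups fit perfectly, so summary() may warn about an "essentially perfect fit"; the values are still returned.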

" " data.table - , .SD .

. lm data.table ? , . # 2590.


How about computing the r-squared directly inside the grouped call?

library(data.table)
set.seed(1014)

dt <- data.table(
  g = c(1, 1, 2, 2, 3, 3, 3),
  x = runif(7),
  y = runif(7)
)
models <- dt[, list(rsq = summary(lm(y ~ x))$r.squared), by = g]
#   g         rsq
#1: 1 1.000000000
#2: 2 1.000000000
#3: 3 0.004451631

Alternatively, using purrr and broom:

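library(purrr)
library(broom)
# glance() returns a one-row summary (including r.squared) per model;
# map_df() row-binds those rows into a single data frame
map_df(models$mod, glance)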
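
If only a few statistics are needed and extra packages are not wanted, a similar one-row-per-model table can be assembled with data.table's rbindlist. This is a rough sketch under stated assumptions: fit_stats is a hypothetical helper, and the choice of statistics (r-squared, number of observations, residual degrees of freedom) is only for illustration.

```r
library(data.table)
set.seed(1014)

dt <- data.table(
  g = c(1, 1, 2, 2, 3, 3, 3),
  x = runif(7),
  y = runif(7)
)

models <- dt[, list(mod = list(lm(y ~ x, data = .SD))), by = g]

# Hypothetical helper: a few per-model statistics as a named list
fit_stats <- function(m) {
  list(
    rsq      = summary(m)$r.squared,
    nobs     = nobs(m),
    df_resid = df.residual(m)
  )
}

# One row per model, then put the group key back alongside
stats <- rbindlist(lapply(models$mod, fit_stats))
out <- cbind(models[, list(g)], stats)
out
```

Unlike the map_df call, this keeps the g column next to the statistics, so no separate join is needed.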

Source: https://habr.com/ru/post/1535828/

