Run a regression on certain parts of the data frame and extract estimates + errors

I am trying to run multiple regressions on a selected part of a data frame. It has 22 columns: one is “DATE”, one is “INDEX”, and the rest are S1, S2, S3 ... S20.

I run the regression as follows:

Regression <- lm(as.matrix(df[c('S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9', 'S10', 'S11', 'S12', 'S13', 'S14', 'S15', 'S16', 'S17', 'S18', 'S19', 'S20')]) ~ df$INDEX)
Regression$coefficients

1) How can I make the code shorter? I would like something like a range that tells R: take columns S1 to S20 as the response variables and regress each of them on INDEX.

2) Regression formula: a + b * INDEX + error. I then want to extract all estimates of "b" from the regression. Assume the columns have 10 rows, so there should be 10 estimates. I also want to extract all errors: there should be 10 errors in each column and 10 * 20 = 200 errors in total.

Since I have no experience with R, any help is appreciated! Thanks!

2 answers

You can shorten your code considerably by using paste0() instead of typing out all the column names by hand:

Regression <- lm(as.matrix(df[paste0("S", 1:20)]) ~ df$INDEX)
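To see what the shortened call does, here is a minimal sketch. The data frame `df` below is made-up random data that only mimics the column layout described in the question (DATE, INDEX, S1 to S20, 10 rows); it is not the asker's data.

```r
# Hypothetical stand-in for the asker's data: DATE, INDEX, S1..S20, 10 rows
set.seed(1)
df <- data.frame(DATE = as.Date("2020-01-01") + 0:9, INDEX = rnorm(10))
for (nm in paste0("S", 1:20)) df[[nm]] <- rnorm(10)

# paste0() builds the column names without typing them out
cols <- paste0("S", 1:20)
cols[1:3]

# Same model as the long version: each S column is regressed on INDEX
Regression <- lm(as.matrix(df[cols]) ~ df$INDEX)

# One column of coefficients per S column, rows are intercept and slope
dim(coef(Regression))
```

Because the response is a matrix, lm() fits one regression per column in a single call.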

To access the fitted values of the regression, use Regression$fitted.values. For the errors (residuals), use Regression$residuals.

Example using the iris data:

data(iris)
Regression <- lm(cbind(Sepal.Length, Sepal.Width) ~ Petal.Length, data = iris)

head(Regression$fitted.values)
  Sepal.Length Sepal.Width
1     4.879095    3.306775
2     4.879095    3.306775
3     4.838202    3.317354
4     4.919987    3.296197
5     4.879095    3.306775
6     5.001771    3.275039

head(Regression$residuals)
  Sepal.Length Sepal.Width
1    0.2209054   0.1932249
2    0.0209054  -0.3067751
3   -0.1382024  -0.1173536
4   -0.3199868  -0.1961965
5    0.1209054   0.2932249
6    0.3982287   0.6249605
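The question also asks for the estimates of "b", which are the slope coefficients rather than the fitted values. A short sketch of one way to pull them out, again on iris: with a matrix response, coef() returns a matrix with one column per response, and the slope row is named after the predictor.

```r
data(iris)
Regression <- lm(cbind(Sepal.Length, Sepal.Width) ~ Petal.Length, data = iris)

# Coefficient matrix: one column per response, rows are intercept and slope
coef(Regression)

# Slope estimates ("b"), one per response column
b <- coef(Regression)["Petal.Length", ]
b

# Residuals: one row per observation, one column per response
dim(residuals(Regression))
```

For the model in the question, the slope row would be named after the predictor as it is written in the formula (for example "df$INDEX" if the formula uses df$INDEX).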

If you have 22 columns, you can simply use the positions of the columns in the data frame. Using the same dataset as LAP's answer:

# load iris dataset
data(iris)
# run regression
Regression <- lm(as.matrix(iris[1:3]) ~ Petal.Width, data = iris)

In your case, this translates to something like:

# run the regression
Regression <- lm(as.matrix(df[3:22]) ~ INDEX, data = df)

This assumes your dependent variables are in columns 3 through 22 (with the date in the 1st column and the index in the 2nd, or something like that).
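Positional selection silently picks the wrong columns if the layout differs, so it may be worth checking it first. A minimal sketch on made-up data (the `df` below is hypothetical random data with the assumed layout, not the asker's data), which also shows the 20 slopes and the 10 * 20 = 200 errors the question asks about:

```r
# Hypothetical data with the layout assumed above: DATE, INDEX, then S1..S20
set.seed(42)
df <- data.frame(DATE = as.Date("2020-01-01") + 0:9, INDEX = rnorm(10))
for (nm in paste0("S", 1:20)) df[[nm]] <- rnorm(10)

# Confirm that positions 3:22 really are the S columns before relying on them
stopifnot(identical(names(df)[3:22], paste0("S", 1:20)))

# run the regression
Regression <- lm(as.matrix(df[3:22]) ~ INDEX, data = df)

b <- coef(Regression)["INDEX", ]   # slope estimates, one per S column
errors <- residuals(Regression)    # matrix: one row per observation, one column per S column
length(b)
dim(errors)
```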


Source: https://habr.com/ru/post/1685877/

