Reshaping a data frame to long format in R

I am struggling with a reshape in R. I have 2 types of error (err and rel_err) that were calculated for 3 different models. This gives me 6 error variables (i.e. err_1, err_2, err_3, rel_err_1, rel_err_2 and rel_err_3). For each of these types of error, I have 3 different types of reliability test (for example, random delays, backcast, forecast). I would like to reshape my data to long format, keeping the test types long while also making the two error measurements long. In the end I would have one variable named err and one named rel_err, plus an id variable indicating which model the error corresponds to (1, 2 or 3).

Here is my data right now:

    iter       err_1  rel_err_1      err_2  rel_err_2      err_3  rel_err_3 test_type
       1 -0.09385732 -0.2235443 -0.1216982 -0.2898543 -0.1058366 -0.2520759    random
       1  0.16141630  0.8575728  0.1418732  0.7537442  0.1584816  0.8419816      back
       1  0.16376930  0.8700738  0.1431505  0.7605302  0.1596502  0.8481901     front
       1  0.14345986  0.6765194  0.1213689  0.5723444  0.1374676  0.6482615    random
       1  0.15890059  0.7435382  0.1589823  0.7439204  0.1608709  0.7527580      back
       1  0.14412360  0.6743928  0.1442039  0.6747684  0.1463520  0.6848202     front

and here is what I would like it to look like:

    iter model         err rel_err test_type
       1     1 -0.09385732   (#'s)    random
       1     2 -0.1216982    (#'s)    random
       1     3 -0.1216982    (#'s)    random

and so on ...

I tried playing with the reshape syntax, but I can't figure out what to put for the time.varying argument.

Thanks so much for any help you can offer.

2 answers

You can do it the "hard" way. For transparency, you can use names.

    with(dat, data.frame(
      iter      = rep(iter, 3),
      model     = rep(1:3, each = nrow(dat)),
      err       = c(err_1, err_2, err_3),
      rel_err   = c(rel_err_1, rel_err_2, rel_err_3),
      test_type = rep(test_type, 3)
    ))

Or, for brevity, indexes.

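    # unlist() stacks the selected columns end to end into a single vector
    data.frame(
      iter      = rep(dat[, 1], 3),
      model     = rep(1:3, each = nrow(dat)),
      err       = unlist(dat[, c(2, 4, 6)], use.names = FALSE),
      rel_err   = unlist(dat[, c(3, 5, 7)], use.names = FALSE),
      test_type = rep(dat[, 8], 3)
    )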

If you had many columns, the hard way might include grepping column names.
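As a rough sketch of that grep-based variant: it assumes the wide columns follow the err_<k> and rel_err_<k> naming from the question and that both sets appear in the same model order. The data frame is re-typed from the question, and the names err_cols, rel_cols and long are only illustrative.

```r
# Sample data re-typed from the question (iter is recycled to 6 rows)
dat <- data.frame(
  iter      = 1,
  err_1     = c(-0.09385732, 0.16141630, 0.16376930, 0.14345986, 0.15890059, 0.14412360),
  rel_err_1 = c(-0.2235443, 0.8575728, 0.8700738, 0.6765194, 0.7435382, 0.6743928),
  err_2     = c(-0.1216982, 0.1418732, 0.1431505, 0.1213689, 0.1589823, 0.1442039),
  rel_err_2 = c(-0.2898543, 0.7537442, 0.7605302, 0.5723444, 0.7439204, 0.6747684),
  err_3     = c(-0.1058366, 0.1584816, 0.1596502, 0.1374676, 0.1608709, 0.1463520),
  rel_err_3 = c(-0.2520759, 0.8419816, 0.8481901, 0.6482615, 0.7527580, 0.6848202),
  test_type = c("random", "back", "front", "random", "back", "front")
)

# Anchor the patterns with ^ so that "err_" does not also match "rel_err_"
err_cols <- grep("^err_", names(dat), value = TRUE)
rel_cols <- grep("^rel_err_", names(dat), value = TRUE)

long <- data.frame(
  iter      = rep(dat$iter, length(err_cols)),
  # Recover the model number from the column suffix
  model     = rep(as.integer(sub("^err_", "", err_cols)), each = nrow(dat)),
  err       = unlist(dat[err_cols], use.names = FALSE),
  rel_err   = unlist(dat[rel_cols], use.names = FALSE),
  test_type = rep(dat$test_type, length(err_cols))
)
```

Nothing here hard-codes the number of models, so the same lines should carry over to a wider table as long as the naming pattern holds.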

This "hard" way is about as terse as reshape and requires less thinking about how to use the commands. Sometimes I just skip thinking about reshape altogether.


The base R reshape function lets you do this.

    reshape(DT, direction = 'long',
            varying = list(paste('err', 1:3, sep = '_'),
                           paste('rel_err', 1:3, sep = '_')),
            v.names = c('err', 'rel_err'),
            timevar = 'model')

        iter test_type model         err    rel_err id
    1.1    1    random     1 -0.09385732 -0.2235443  1
    2.1    1      back     1  0.16141630  0.8575728  2
    3.1    1     front     1  0.16376930  0.8700738  3
    4.1    1    random     1  0.14345986  0.6765194  4
    5.1    1      back     1  0.15890059  0.7435382  5
    6.1    1     front     1  0.14412360  0.6743928  6
    1.2    1    random     2 -0.12169820 -0.2898543  1
    2.2    1      back     2  0.14187320  0.7537442  2
    3.2    1     front     2  0.14315050  0.7605302  3
    4.2    1    random     2  0.12136890  0.5723444  4
    5.2    1      back     2  0.15898230  0.7439204  5
    6.2    1     front     2  0.14420390  0.6747684  6
    1.3    1    random     3 -0.10583660 -0.2520759  1
    2.3    1      back     3  0.15848160  0.8419816  2
    3.3    1     front     3  0.15965020  0.8481901  3
    4.3    1    random     3  0.13746760  0.6482615  4
    5.3    1      back     3  0.16087090  0.7527580  5
    6.3    1     front     3  0.14635200  0.6848202  6

I agree that the syntax for reshape is hard to get your head around. Here is how this call works.

  • direction = 'long' - reshape to long format
  • varying = list(paste('err',1:3,sep ='_'), paste('rel_err',1:3,sep ='_')) - We pass a list of length 2 because we are stacking two different variables. The columns paste('err',1:3,sep ='_') will become the first new variable in long format and the columns paste('rel_err',1:3,sep ='_') will become the second new variable in long format
  • v.names = c('err','rel_err') sets the names of the two new variables in long format
  • timevar = 'model' sets the name of the time identifier (here the _1, _2, _3 suffix of the column names in wide format).
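To make these arguments concrete, here is a small self-contained sketch that rebuilds only the first two rows of the question's data (re-typed here, so treat this DT as an assumption) and runs the same call. The last two lines, which reset the row names and drop the generated id column, are optional tidying and not part of the call explained above.

```r
# First two rows of the question's data, re-typed for illustration
DT <- data.frame(
  iter      = c(1, 1),
  err_1     = c(-0.09385732, 0.16141630),
  rel_err_1 = c(-0.2235443, 0.8575728),
  err_2     = c(-0.1216982, 0.1418732),
  rel_err_2 = c(-0.2898543, 0.7537442),
  err_3     = c(-0.1058366, 0.1584816),
  rel_err_3 = c(-0.2520759, 0.8419816),
  test_type = c("random", "back")
)

long <- reshape(
  DT,
  direction = "long",
  # One list element per new long variable; each element names the
  # wide columns that get stacked into it
  varying   = list(paste("err", 1:3, sep = "_"),
                   paste("rel_err", 1:3, sep = "_")),
  v.names   = c("err", "rel_err"),
  timevar   = "model"
)

# Optional tidying: plain row numbers and no generated id column
rownames(long) <- NULL
long$id <- NULL
```

Because no times argument is given, the model column is filled with the positions 1, 2, 3 of the columns within each varying element, which here happen to match the suffixes.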

I hope this is somewhat clearer.


Source: https://habr.com/ru/post/1441966/

