How to apply spline () to a large data frame

I'm new to R and I'm trying to apply smooth.spline() to a large data frame. I looked at related threads ("Apply a list of n functions to each row of a data frame", "How to apply a spline basis matrix", ...). Here is my dataframe and what I have tried so far:

 > dim(mUnique) [1] 4565 9 > str(mUnique) 'data.frame': 4565 obs. of 9 variables: $ Group.1: Factor w/ 4565 levels "mal_mito_1","mal_mito_2",..: 1 2 3 4 5 6 7 8 9 10 ... $ h0 : num 0.18 -0.025 0.212 0.015 0.12 ... $ h6 : num -0.04 -0.305 -0.188 -0.185 -0.09 ... $ h12 : num -0.86 -1.1 -1.01 -1.04 -0.91 ... $ h18 : num -0.73 -1.215 -1.222 -0.355 -0.65 ... $ h24 : num 0.04 0.025 -0.143 0.295 0.09 ... $ h30 : num -0.14 1.275 0.732 -0.015 -0.27 ... $ h36 : num 1.44 1.795 1.627 0.385 0.91 ... $ h42 : num 1.49 1.385 1.397 0.305 1.12 ... > head(mUnique) ID h0 h6 h12 h18 h24 h30 h36 h42 1 mal_mito_1 0.1800 -0.0400 -0.8600 -0.7300 0.0400 -0.1400 1.4400 1.4900 2 mal_mito_2 -0.0250 -0.3050 -1.1050 -1.2150 0.0250 1.2750 1.7950 1.3850 3 mal_mito_3 0.2125 -0.1875 -1.0075 -1.2225 -0.1425 0.7325 1.6275 1.3975 4 mal_rna_10_rRNA 0.0150 -0.1850 -1.0450 -0.3550 0.2950 -0.0150 0.3850 0.3050 5 mal_rna_11_rRNA 0.1200 -0.0900 -0.9100 -0.6500 0.0900 -0.2700 0.9100 1.1200 6 mal_rna_14_rRNA 0.0200 -0.0200 -0.8400 -0.6600 0.1700 -0.0900 0.6200 0.0800 

I can apply smooth.spline to each line independently and looks good with spline() so far (I want 48 points. I will later figure out how to do this with smoooth.spline spar ):

 > time <- c(0,6,12,18,24,30,36,42) > plot(time, mUnique[1, 2:9]) > smooth <- smooth.spline(time, mUnique[1, 2:9]) > lines(smooth, col="blue") > splin <-spline(time, mUnique[1, 2:9], n=48) > lines(splin, col="blue") 

My question, I suppose, is basic, but how to apply smooth.spline() or spline() to the entire data frame and return the matrix 4565 * 49, where I have the coordinates for each node of the smooth spline? I don't care what this data does.

I tried:

 > smooth <- smooth.spline(time, mUnique[, 2:9]|factor(ID)) 

Now, I don’t know what to do. Is it a matter of creating loops?

Thank you in advance

+4
source share
2 answers

Is this what you are looking for?

 time <- c(0,6,12,18,24,30,36,42) t( apply(mUnique[-1],1, function(x){ tmp <- smooth.spline(time,x) predict(tmp,seq(min(time),max(time),length.out=49))$y } ) ) 

It should give you the matrix as you described.

Additional explanation:

I throw the first column ( mUnique[-1] ). This is a list of ways to do this, you can also do mUnique[,-1] , which is the matrix equivalent. Both work for data.

Then I say that we apply this function to the rows, which is the first field.

The function that I define is

 function(x){ tmp <- smooth.spline(time,x) predict(tmp,seq(min(time),max(time),length.out=49))$y } 

is two-line:

  • I calculate smooth spline
  • I calculate the predictions for a regular sequence of 49 points ( seq(min(time),max(time),length.out=49) ) and take the y values ​​of this prediction.

X in the definition of this function is the argument that is passed. In this case, it represents a single string, which is passed by the apply function.

Finally, I transfer the matrix ( t ) to get it in the format you requested.

The code works fine with the following test code:

 mUnique <- read.table(textConnection(" ID h0 h6 h12 h18 h24 h30 h36 h42 mal_mito_1 0.1800 -0.0400 -0.8600 -0.7300 0.0400 -0.1400 1.4400 1.4900 mal_mito_2 -0.0250 -0.3050 -1.1050 -1.2150 0.0250 1.2750 1.7950 1.3850 mal_mito_3 0.2125 -0.1875 -1.0075 -1.2225 -0.1425 0.7325 1.6275 1.3975 mal_rna_10_rRNA 0.0150 -0.1850 -1.0450 -0.3550 0.2950 -0.0150 0.3850 0.3050 mal_rna_11_rRNA 0.1200 -0.0900 -0.9100 -0.6500 0.0900 -0.2700 0.9100 1.1200 mal_rna_14_rRNA 0.0200 -0.0200 -0.8400 -0.6600 0.1700 -0.0900 0.6200 0.0800 ") ,header=T) time <- c(0,6,12,18,24,30,36,42) 

Before running my code, make sure you define time ...

+4
source

Using your piece of data in a dat object, we can do what (I think) you need. First, we write a small wrapper function, which is suitable for a smoothing spline via smooth.spline() , and then predicts the response from this spline for a set of locations n . You are requesting n = 48 , so we will use this as the default value.

Here is one such wrapper function:

 SSpline <- function(x, y, n = 48, ...) { ## fit the spline to x, and y mod <- smooth.spline(x, y, ...) ## predict from mod for n points over range of x pred.dat <- seq(from = min(x), to = max(x), length.out = n) ## predict preds <- predict(mod, x = pred.dat) ## return preds } 

We check this for the first line of your data:

 > res <- SSpline(time, dat[1, 2:9]) > res $x [1] 0.000000 0.893617 1.787234 2.680851 3.574468 4.468085 5.361702 [8] 6.255319 7.148936 8.042553 8.936170 9.829787 10.723404 11.617021 [15] 12.510638 13.404255 14.297872 15.191489 16.085106 16.978723 17.872340 [22] 18.765957 19.659574 20.553191 21.446809 22.340426 23.234043 24.127660 [29] 25.021277 25.914894 26.808511 27.702128 28.595745 29.489362 30.382979 [36] 31.276596 32.170213 33.063830 33.957447 34.851064 35.744681 36.638298 [43] 37.531915 38.425532 39.319149 40.212766 41.106383 42.000000 $y [1] 0.052349585 0.001126837 -0.049851737 -0.100341294 -0.150096991 [6] -0.198873984 -0.246427429 -0.292510695 -0.336721159 -0.378381377 [11] -0.416785932 -0.451229405 -0.481006377 -0.505411429 -0.523759816 [16] -0.535714043 -0.541224748 -0.540251293 -0.532753040 -0.518689349 [21] -0.498019582 -0.470750611 -0.437182514 -0.397727107 -0.352796426 [26] -0.302802508 -0.248157388 -0.189272880 -0.126447574 -0.059682959 [31] 0.011067616 0.085850805 0.164713260 0.247701633 0.334851537 [36] 0.425833795 0.519879613 0.616194020 0.713982047 0.812448724 [41] 0.910799082 1.008296769 1.104781306 1.200419068 1.295380186 [46] 1.389834788 1.483953003 1.577904960 > plot(time, dat[1, 2:9]) > lines(res, col = "blue") 

which gives:

plot of fitted spline

This seems to work, so now we can apply the data set function, save only the $y component of the object returned by SSpline() . To do this, we use apply() :

 > res2 <- apply(dat[, 2:9], 1, + function(y, x, ...) { SSpline(x, y, ...)$y }, + x = time) > head(res2) 1 2 3 4 5 6 [1,] 0.052349585 -0.02500000 0.21250000 -0.06117869 -0.02153366 -0.02295792 [2,] 0.001126837 -0.04293509 0.17175460 -0.10994988 -0.06538250 -0.06191095 [3,] -0.049851737 -0.06407856 0.12846458 -0.15838412 -0.10899505 -0.10074427 [4,] -0.100341294 -0.09168227 0.08005550 -0.20614476 -0.15213426 -0.13933920 [5,] -0.150096991 -0.12899810 0.02395291 -0.25289514 -0.19456304 -0.17757705 [6,] -0.198873984 -0.17927793 -0.04241763 -0.29829862 -0.23604434 -0.21533911 

Now res2 contains 48 rows and 6 columns, 6 columns refer to each dat row used here. If you want it the other way around, just rearrange res2 : t(res2) .

We can see what was done with a simple call to matplot() :

 > matplot(x = seq(min(time), max(time), length = 48), + y = res2, type = "l") 

which produces:

fitted splines

+4
source

Source: https://habr.com/ru/post/1345687/


All Articles