Calculate the area under the curve

I would like to calculate the area under a curve, i.e. integrate it numerically, without having to define a function as integrate() requires.

My data is as follows:

    Date        Strike  Volatility
    2003-01-01  20      0.2
    2003-01-01  30      0.3
    2003-01-01  40      0.4
    etc.

I plotted plot(strike, volatility) to look at the volatility smile. Is there a way to integrate this plotted "curve"?

+44
r numerical-integration
Feb 10 '11 at 7:35
7 answers

The AUC is approximated quite easily by summing a set of trapezoids, each bounded by x_i, x_{i+1}, y_{i+1} and y_i. Using rollmean from the zoo package, you can do:

    library(zoo)

    x <- 1:10
    y <- 3*x+25
    id <- order(x)

    AUC <- sum(diff(x[id])*rollmean(y[id],2))

Make sure you order the x values, or your result will not make sense. If you have negative values somewhere along the y axis, you need to decide exactly how you want to define the area under the curve and adjust accordingly (for example, by using abs()).
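To illustrate the ordering point, here is a minimal sketch in base R. The helper name trapezoid and the sample values are made up for the example; the helper is meant to mirror the rollmean expression above without needing zoo.

```r
# Trapezoid rule in base R: each interval width times the mean of its two y values.
trapezoid <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

x <- c(3, 1, 2, 4)   # deliberately unsorted (illustrative values)
y <- 3 * x + 25

id <- order(x)
auc_sorted   <- trapezoid(x[id], y[id])  # area between x = 1 and x = 4
auc_unsorted <- trapezoid(x, y)          # negative "widths" distort the sum

print(c(sorted = auc_sorted, unsorted = auc_unsorted))
```

Because y is linear in x here, the sorted result should match the exact integral, while the unsorted one does not.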

As for your follow-up: if you don't have a formal function, how would you plot it? If you only have values, the only thing you can approximate is a definite integral. Even if you have the function in R, you can only calculate definite integrals with integrate(). Plotting the formal function is possible only if you can also define it.
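A small sketch of the two situations, assuming the same linear example as above: when the function is known, integrate() gives the definite integral; when only sampled values are available, the trapezoid sum approximates it. The names f, exact and trap_value are chosen for this example only.

```r
# Case 1: the function itself is known, so integrate() can be used.
f <- function(x) 3 * x + 25
exact <- integrate(f, lower = 1, upper = 10)$value

# Case 2: only sampled values are available, so approximate with trapezoids.
x <- 1:10
y <- f(x)
trap_value <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

print(c(integrate = exact, trapezoid = trap_value))
```

For a straight line the two agree; for a curved function the trapezoid sum is only an approximation whose quality depends on how densely the curve is sampled.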

+31
Feb 10 '11 at 9:07

Just add the following to your program and you will get the area under the curve:

    require(pracma)
    AUC = trapz(strike,volatility)

From ?trapz :

This approach corresponds exactly to the approximation for integration using the trapezoidal rule with base points x.

+27
Mar 01 2018-12-12T00:

Three more options, including one using a spline method and one using Simpson's rule ...

    # get data
    n <- 100
    mean <- 50
    sd <- 50

    x <- seq(20, 80, length=n)
    y <- dnorm(x, mean, sd) *100

    # using sintegral in Bolstad2
    require(Bolstad2)
    sintegral(x,y)$int

    # using auc in MESS
    require(MESS)
    auc(x,y, type = 'spline')

    # using integrate.xy in sfsmisc
    require(sfsmisc)
    integrate.xy(x,y)

The trapezoidal method is generally less accurate than the spline method for smooth curves, so MESS::auc with type = 'spline' or Bolstad2::sintegral (which uses Simpson's rule) is likely to be preferable. DIY versions of these (and an additional approach using a quadrature rule) are here: http://www.r-bloggers.com/one-dimensional-integrals/
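As a rough check of the accuracy claim, here is a base-R sketch that compares hand-rolled versions against the exact value from pnorm(). It is not the code used inside MESS or Bolstad2. The grid of 101 evenly spaced points is an assumption made so that composite Simpson's rule applies directly, and the spline variant simply integrates splinefun() with integrate().

```r
n <- 101                                   # odd point count = even number of intervals
x <- seq(20, 80, length.out = n)
y <- dnorm(x, mean = 50, sd = 50) * 100
exact <- 100 * (pnorm(80, 50, 50) - pnorm(20, 50, 50))

# Trapezoid rule
trap <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

# Composite Simpson's rule on an evenly spaced grid (weights 1, 4, 2, ..., 4, 1)
h <- x[2] - x[1]
w4 <- seq(2, n - 1, by = 2)                # interior points weighted by 4
w2 <- seq(3, n - 2, by = 2)                # interior points weighted by 2
simpson <- h / 3 * (y[1] + 4 * sum(y[w4]) + 2 * sum(y[w2]) + y[n])

# Spline variant: interpolate with a cubic spline, then integrate the interpolant
spline_area <- integrate(splinefun(x, y), min(x), max(x))$value

# Absolute errors against the exact value
print(abs(c(trapezoid = trap, simpson = simpson, spline = spline_area) - exact))
```

With unevenly spaced or noisy data the ranking can differ, so it is worth checking against a known case before relying on any one method.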

+17
Jan 29 '13 at 21:34

OK, so I'm a little late to the party, but going through the answers, none of them offers a plain base-R solution. Here it is, simple and clean:

 sum(diff(x) * (head(y,-1)+tail(y,-1)))/2 

The solution for the OP then reads:

 sum(diff(strike) * (head(volatility,-1)+tail(volatility,-1)))/2 

This effectively calculates the area using the trapezoidal method, taking the average of the "left" and "right" y values.

NB: as @Joris already pointed out, you can use abs(y) if that makes sense.
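To see what the abs(y) choice changes, here is a small sketch using one full period of sin(x) as made-up data: the signed area cancels out, while the area of abs(y) does not. The helper name trapezoid is introduced only for this example.

```r
trapezoid <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1))) / 2

x <- seq(0, 2 * pi, length.out = 1001)
y <- sin(x)

signed_area   <- trapezoid(x, y)        # positive and negative lobes cancel
unsigned_area <- trapezoid(x, abs(y))   # both lobes counted as positive

print(c(signed = signed_area, unsigned = unsigned_area))
```

Note that when the curve crosses zero between two sample points, applying abs() to the samples only approximates the unsigned area in that interval.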

+8
May 16 '15 at 21:19

In the world of pharmacokinetics (PK), calculating different types of AUC is a common and fundamental task. There are many different AUC calculations in pharmacokinetics, such as

  • AUC0-t = AUC from zero to time t
  • AUC0-last = AUC from zero to the last point in time (may be the same as above)
  • AUC0-inf = AUC from zero to infinity
  • AUCint = AUC for a period of time
  • AUCall = AUC for the entire time period for which data exists

One of the best packages for these calculations is the relatively new PKNCA package from the folks at Pfizer. Check it out.
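To make the definitions in the list above concrete, here is a base-R sketch on synthetic data. It is not the PKNCA API. The concentration values, the linear trapezoid rule, and the use of the last three points to estimate the terminal slope lambda_z are all assumptions for illustration; real analyses follow their own conventions for these choices.

```r
# Synthetic mono-exponential concentration-time profile (illustrative only)
time <- c(0, 1, 2, 4, 8, 12, 24)
conc <- 10 * exp(-0.2 * time)

trapezoid <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1))) / 2

# AUC0-t: from zero up to a chosen time t (here t = 8)
keep <- time <= 8
auc_0_8 <- trapezoid(time[keep], conc[keep])

# AUC0-last: from zero to the last observed time point
auc_last <- trapezoid(time, conc)

# AUC0-inf: AUC0-last plus the extrapolated tail C_last / lambda_z, where
# lambda_z comes from a log-linear fit to the last three points (assumption)
idx <- tail(seq_along(time), 3)
fit <- lm(log(conc[idx]) ~ time[idx])
lambda_z <- -unname(coef(fit)[2])
auc_inf <- auc_last + conc[length(conc)] / lambda_z

print(c(AUC0_8 = auc_0_8, AUC0_last = auc_last, AUC0_inf = auc_inf))
```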

+2
May 03 '16 at 20:42

Joris Meys's answer was great, but I struggled to remove NAs from my samples. Here is the small function I wrote to deal with them:

    library(zoo) # for the rollmean function

    ######
    #' Calculate the Area Under Curve of y~x
    #'
    #'@param y Your y values (measures ?)
    #'@param x Your x values (time ?)
    #'@param start : The first x value
    #'@param stop : The last x value
    #'@param na.stop : returns NA if one value is NA
    #'@param ex.na.stop : returns NA if the first or the last value is NA
    #'
    #'@examples
    #'myX = 1:5
    #'myY = c(17, 25, NA, 35, 56)
    #'auc(myY, myX)
    #'auc(myY, myX, na.stop=TRUE)
    #'myY = c(17, 25, 28, 35, NA)
    #'auc(myY, myX, ex.na.stop=FALSE)
    auc = function(y, x, start=head(x, 1), stop=tail(x, 1), na.stop=FALSE, ex.na.stop=TRUE){
      if(all(is.na(y))) return(NA)
      # keep only the points between the start and stop x values
      bounds = which(x==start):which(x==stop)
      x=x[bounds]
      y=y[bounds]
      r = which(is.na(y))
      if(length(r)>0){
        if(na.stop==TRUE) return(NA)
        # base R head()/tail() pick the first and last remaining y values
        first.na = is.na(head(y, 1))
        last.na = is.na(tail(y, 1))
        if(ex.na.stop==TRUE && (first.na || last.na)) return(NA)
        if(last.na) warning("Last value is NA, so this AUC is bad and you should feel bad", call. = FALSE)
        if(first.na) warning("First value is NA, so this AUC is bad and you should feel bad", call. = FALSE)
        # drop the NA points; their neighbours are then joined by a straight line
        x = x[-r]
        y = y[-r]
      }
      sum(diff(x[order(x)])*rollmean(y[order(x)],2))
    }

I then use it with apply on my data frame: myDF$auc = apply(myDF, MARGIN=1, FUN=auc, x=c(0,5,10,15,20))
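For readers who only need the core idea, here is a stripped-down base-R sketch of the NA handling: drop incomplete (x, y) pairs, order by x, then apply the trapezoid rule. It uses the example values from the roxygen block and does not reproduce the start/stop or na.stop options of the function above.

```r
x <- 1:5
y <- c(17, 25, NA, 35, 56)

# Keep only the pairs where both x and y are present
ok <- complete.cases(x, y)
xs <- x[ok]
ys <- y[ok]

# Order by x, then sum the trapezoids; the gap left by the dropped point
# is bridged by a straight line between its neighbours
id <- order(xs)
auc_na_dropped <- sum(diff(xs[id]) * (head(ys[id], -1) + tail(ys[id], -1))) / 2

print(auc_na_dropped)
```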

Hope it can help noobs like me :-)

EDIT: added bounds (start and stop)

0
Jun 15 '17 at 9:53

You can use the ROCR package, where the following lines will give you the AUC of a classifier's ROC curve:

    library(ROCR)

    pred <- prediction(classifier.labels, actual.labs)
    attributes(performance(pred, 'auc'))$y.values[[1]]

-1
Oct 26 '12 at 8:26


