Why is the sum of the area under the density curve always greater than 1 (R)?

I found code for calculating the area under a density curve in R. Unfortunately, I do not understand why there is always an extra "0.000976" in the area:

nb.data = 500000
y = rnorm(nb.data,10,2)

de = density(y)

require(zoo)
sum(diff(de$x[order(de$x)])*rollmean(de$y[order(de$x)],2))

[1] 1.000976

Why is this so?

It should be equal to 1, right?

2 answers

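This is a numerical approximation. The code does not integrate the curve exactly; it approximates the integral of the output of density numerically (i.e. with the trapezoidal rule), so the result carries a small error, as illustrated here: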


[Figure: trapezoidal rule illustration. Intégration_num_trapèzes.svg: Scaler; derivative work: Cdang (talk), CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=8541370]


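Each trapezoid replaces the curve with a straight line over its interval (i.e. between two neighbouring grid points), so each one contributes a small error. In general, increasing the number of points (i.e. N) makes that error smaller.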
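A minimal sketch of the trapezoidal rule may make this concrete. The helper name trapz and the test function x^2 are assumptions chosen for illustration; they are not part of the question's code. The integral of x^2 on [0, 1] is exactly 1/3, so the error of each approximation can be measured directly.

```r
# Hypothetical helper (not from the question): trapezoidal rule on a grid.
trapz <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  # width of each interval times the mean height of its two endpoints
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Illustrative convex function; its exact integral on [0, 1] is 1/3.
f <- function(x) x^2

coarse <- seq(0, 1, length.out = 5)     # 4 trapezoids
fine   <- seq(0, 1, length.out = 1001)  # 1000 trapezoids

# Signed error of each approximation relative to the exact value.
err_coarse <- trapz(coarse, f(coarse)) - 1/3
err_fine   <- trapz(fine, f(fine)) - 1/3

print(c(coarse = err_coarse, fine = err_fine))
```

Both errors come out positive because the chords of a convex curve lie above it, and the error shrinks as the grid gets finer.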

Also, keep in mind that density returns a kernel estimate computed from the sample, not the true density.

If you want to check that the true normal density really integrates to 1, you can use integrate:

> integrate(dnorm, lower=-Inf, upper=Inf, mean=10, sd=2)
1 with absolute error < 4.9e-06

Increase n, the number of points at which the density is evaluated (the default is 512):

set.seed(42)
de = density(rnorm(500000, 10, 2))
sum(diff(sort(de$x)) * 0.5 * (de$y[-1] + head(de$y, -1)))
#[1] 1.00098

set.seed(42)
de = density(rnorm(500000, 10, 2), n = 1000)
sum(diff(sort(de$x)) * 0.5 * (de$y[-1] + head(de$y, -1)))
#[1] 1.000491

set.seed(42)
de = density(rnorm(500000, 10, 2), n = 10000)
sum(diff(sort(de$x)) * 0.5 * (de$y[-1] + head(de$y, -1)))
#[1] 1.000031

set.seed(42)
de = density(rnorm(500000, 10, 2), n = 100000)
sum(diff(sort(de$x)) * 0.5 * (de$y[-1] + head(de$y, -1)))
#[1] 1.000004

set.seed(42)
de = density(rnorm(500000, 10, 2), n = 1000000)
sum(diff(sort(de$x)) * 0.5 * (de$y[-1] + head(de$y, -1)))
#[1] 1
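The same comparison can be written more compactly as a loop over n. This is only a sketch: the helper name area_under_density is an assumption for illustration, and the grid sizes are a subset of those used above.

```r
set.seed(42)
y <- rnorm(500000, 10, 2)

# Hypothetical helper: trapezoidal area under density(y) evaluated at n points.
area_under_density <- function(y, n) {
  de <- density(y, n = n)
  # de$x is already in increasing order, so no sorting is needed
  sum(diff(de$x) * (head(de$y, -1) + tail(de$y, -1)) / 2)
}

ns <- c(512, 1000, 10000, 100000)
areas <- sapply(ns, function(n) area_under_density(y, n))

# One row per grid size, with the deviation from 1 in the last column.
print(data.frame(n = ns, area = areas, excess = areas - 1))
```

Generating the sample once and reusing it keeps the runs comparable without resetting the seed before each call.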

Source: https://habr.com/ru/post/1683778/

