Plotting a smooth line through all data points, perhaps with polynomial interpolation?

I am trying to draw a smooth line that passes exactly through all my data points and has a colour gradient based on another variable. In theory, polynomial interpolation would do the job, but I am not sure how to do this with ggplot. This is what I have come up with so far:

DATA:

 dayofweek hour impressions conversions      cvr
         1    0     3997982       352.0 8.80e-05
         1    1     3182678       321.2 1.01e-04
         1    2     2921004       248.6 8.51e-05
         1    3     1708627       115.6 6.77e-05
         1    4     1225059        98.4 8.03e-05
         1    5     1211708        62.0 5.12e-05
         1    6     1653280       150.0 9.07e-05
         1    7     2511577       309.4 1.23e-04
         1    8     3801969       397.8 1.05e-04
         1    9     5144399       573.0 1.11e-04
         1   10     5770269       675.6 1.17e-04
         1   11     6936943       869.8 1.25e-04
         1   12     7953053       996.4 1.25e-04
         1   13     8711737      1117.8 1.28e-04
         1   14     9114872      1217.4 1.34e-04
         1   15     9257161      1155.2 1.25e-04
         1   16     8437068      1082.0 1.28e-04
         1   17     8688057      1047.2 1.21e-04
         1   18     9200450      1114.0 1.21e-04
         1   19     8494295      1086.8 1.28e-04
         1   20     9409142      1092.6 1.16e-04
         1   21    10500000      1266.8 1.21e-04
         1   22     9783073      1196.4 1.22e-04
         1   23     8225267       812.0 9.87e-05

R CODE:

 ggplot(d) +
   geom_line(aes(y = impressions, x = hour, color = cvr)) +
   stat_smooth(aes(y = impressions, x = hour), method = lm,
               formula = y ~ poly(x, 10), se = FALSE)

With geom_line I get the gradient I want, but the line is not smooth. With stat_smooth I get a smooth line, but it does not pass through all the data points and does not have the desired gradient. Any ideas on how to do this?


1 answer

Polynomial interpolation in the sense you are using it is probably not the best idea if you want the curve to go through all of your points. You have 24 points, so a polynomial of degree 23 is needed to pass through all of them. I could not get poly to work with degree 23, but a lower degree is already enough to show why this will not work:

 ggplot(d) +
   geom_point(aes(x = hour, y = impressions, colour = cvr), size = 3) +
   stat_smooth(aes(x = hour, y = impressions), method = "lm",
               formula = y ~ poly(x, 21), se = FALSE) +
   coord_cartesian(ylim = c(0, 1.5e7))


This does more or less go through all the points (and it would go through them exactly if I could use a polynomial of even higher degree), but it is probably not the kind of smooth curve you want. A better option is interpolation with splines. This is also interpolation with polynomials, but instead of a single polynomial (as you tried), it uses many, one for each interval between neighbouring points. They are forced to pass through all the data points and are joined so that the curve is continuous.
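To make the "passes through every point" property concrete, here is a minimal sketch in base R. It rebuilds the hour and impressions columns from the question as a data frame d and uses splinefun() from the stats package; splinefun() and the helper names max_gap, hits_all_points and half_hours are chosen here for illustration and are not part of the original answer.

```r
# Data from the question (hour and impressions columns only)
d <- data.frame(
  hour = 0:23,
  impressions = c(3997982, 3182678, 2921004, 1708627, 1225059, 1211708,
                  1653280, 2511577, 3801969, 5144399, 5770269, 6936943,
                  7953053, 8711737, 9114872, 9257161, 8437068, 8688057,
                  9200450, 8494295, 9409142, 10500000, 9783073, 8225267)
)

# splinefun() returns a function that evaluates the interpolating cubic spline
f <- splinefun(d$hour, d$impressions)

# At the original hours the spline reproduces the data (up to rounding error)
max_gap <- max(abs(f(d$hour) - d$impressions))
hits_all_points <- isTRUE(all.equal(f(d$hour), d$impressions))

# Between the hours it returns interpolated values, here at every half hour
half_hours <- f(d$hour[-24] + 0.5)
```

Printing max_gap and hits_all_points shows how closely the curve matches the data at the original hours, while half_hours holds the values the curve takes in between.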

As far as I know (and I may be wrong), this cannot be done directly in ggplot, so I will show a solution where the spline interpolation is computed in a separate step:

 spline_int <- as.data.frame(spline(d$hour, d$impressions)) 

as.data.frame is needed because spline returns a list. You can now use this new data in the plot with geom_line():

 ggplot(d) +
   geom_point(aes(x = hour, y = impressions, colour = cvr), size = 3) +
   geom_line(data = spline_int, aes(x = x, y = y))
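The line in this plot has a single colour. One possible way to also get the gradient asked for in the question is to interpolate cvr at the same x positions and map it to colour. This is only a sketch and not part of the original answer: the grid size n = 240 and the use of a spline for cvr are assumptions, and linear interpolation with approx() would work just as well.

```r
library(ggplot2)

# Data from the question (columns needed for the plot)
d <- data.frame(
  hour = 0:23,
  impressions = c(3997982, 3182678, 2921004, 1708627, 1225059, 1211708,
                  1653280, 2511577, 3801969, 5144399, 5770269, 6936943,
                  7953053, 8711737, 9114872, 9257161, 8437068, 8688057,
                  9200450, 8494295, 9409142, 10500000, 9783073, 8225267),
  cvr = c(8.80e-05, 1.01e-04, 8.51e-05, 6.77e-05, 8.03e-05, 5.12e-05,
          9.07e-05, 1.23e-04, 1.05e-04, 1.11e-04, 1.17e-04, 1.25e-04,
          1.25e-04, 1.28e-04, 1.34e-04, 1.25e-04, 1.28e-04, 1.21e-04,
          1.21e-04, 1.28e-04, 1.16e-04, 1.21e-04, 1.22e-04, 9.87e-05)
)

# Spline through impressions on a fine grid (n = 240 is an arbitrary choice)
spline_int <- as.data.frame(spline(d$hour, d$impressions, n = 240))

# Interpolate cvr at the same x positions so every line segment has a colour
spline_int$cvr <- spline(d$hour, d$cvr, xout = spline_int$x)$y

# Points and line share the same continuous colour scale
p <- ggplot(d) +
  geom_point(aes(x = hour, y = impressions, colour = cvr), size = 3) +
  geom_line(data = spline_int, aes(x = x, y = y, colour = cvr))
print(p)
```

A finer grid gives a smoother looking gradient, because geom_line colours each short segment separately.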



Source: https://habr.com/ru/post/1242307/

