Linear interpolation using dplyr

I am trying to use the na.approx() function from the zoo library (combined with xts) to interpolate missing values in repeated-measures data for multiple individuals with multiple measurements.

Sample data ...

event.date <- c("2010-05-25", "2010-09-10", "2011-05-13", "2012-03-28", "2013-03-07",    
                "2014-02-13", "2010-06-11", "2010-09-10", "2011-05-13", "2012-03-28",
                "2013-03-07", "2014-02-13")
variable   <- c("neck.bmd", "neck.bmd", "neck.bmd", "neck.bmd", "neck.bmd", "neck.bmd",
                "wbody.bmd", "wbody.bmd", "wbody.bmd", "wbody.bmd", "wbody.bmd", "wbody.bmd")
value      <- c(0.7490, 0.7615, 0.7900, 0.7730, NA, 0.7420, 1.0520, 1.0665, 1.0760,
                1.0870, NA, 1.0550)
## Bind into a data frame
df <- data.frame(event.date, variable, value)
rm(event.date, variable, value)
## Convert date
df$event.date <- as.Date(df$event.date)
## Load libraries
library(magrittr)
library(xts)
library(zoo)

I can interpolate the single missing data point for one measurement for a given person using xts() and na.approx():

## Subset one variable
wbody <- subset(df, variable == "wbody.bmd")
## order/index and then interpolate
xts(wbody$value, wbody$event.date) %>%
  na.approx()
2010-06-11 1.052000
2010-09-10 1.066500
2011-05-13 1.076000
2012-03-28 1.087000
2013-03-07 1.070977
2014-02-13 1.055000

The matrix return is not ideal, but I can work around it. The main problem is that I have many results for several people. I naively thought that, since this is a split-apply-combine problem, I could use dplyr to achieve this as follows:

## Load library
library(dplyr)
## group and then arrange the data (to ensure dates are correct)
df %>%
  group_by(variable) %>%
    arrange(variable, event.date) %>%
      xts(.$value, .$event.date) %>%
        na.approx()
Error in xts(., .$value, .$event.date) : 
  order.by requires an appropriate time-based object
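
A note on the error above: with magrittr, the left-hand side is still inserted as the first argument when `.` appears only inside nested expressions such as `.$value`. The call therefore becomes xts(., .$value, .$event.date), and order.by receives the numeric values instead of the dates. A minimal sketch of one way around this for a single series is to wrap the call in braces. The three-row data frame below is a cut-down copy of the wbody.bmd rows, used only for illustration.

```r
library(magrittr)
library(xts)  # xts depends on zoo, which provides na.approx()

# Cut-down single series (last three wbody.bmd rows from the sample data)
wbody <- data.frame(
  event.date = as.Date(c("2012-03-28", "2013-03-07", "2014-02-13")),
  value      = c(1.0870, NA, 1.0550)
)

# Braces stop magrittr from inserting the data frame as the first argument,
# so xts() receives only the two vectors named explicitly.
filled <- wbody %>%
  { xts(.$value, order.by = .$event.date) } %>%
  na.approx()

print(filled)
```

This only addresses the error for one series; it does not by itself apply the interpolation per group.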

Is there a way to make dplyr work with xts/zoo here, or another way to do this grouped interpolation in R?



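Thanks to the comment from @docendodiscimus, it turns out that na.approx() can be called directly inside dplyr's mutate(), without going through xts: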

df %>%
  group_by(variable) %>%
    arrange(variable, event.date) %>%
      mutate(ip.value = na.approx(value, maxgap = 4, rule = 2))

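maxgap sets the maximum number of consecutive NAs to fill, and rule = 2 fills values outside the observed range with the nearest observed value.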
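
To see what maxgap and rule do in isolation, here is a minimal sketch on made-up vectors (the values are illustrative and not taken from the sample data). It also shows na.rm = FALSE, which is an assumption about what you would want inside mutate(): by default na.approx() trims leading and trailing NAs, which changes the length of the result.

```r
library(zoo)

# Made-up vectors, for illustration only
gaps  <- c(1, NA, 3, NA, NA, NA, 7)  # one short gap and one long gap
edges <- c(NA, 2, NA, 4, NA)         # NAs at both ends

# maxgap: runs of NAs longer than maxgap are left as NA
short_only <- na.approx(gaps, maxgap = 2)

# rule = 2 is passed on to approx(): positions outside the observed range
# are filled with the nearest observed value
with_rule <- na.approx(edges, rule = 2)

# Without rule = 2, na.rm = FALSE keeps the end NAs so the length is unchanged
keep_na <- na.approx(edges, na.rm = FALSE)

print(short_only)
print(with_rule)
print(keep_na)
```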


Another option is to use approx() directly:

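df %>%
  group_by(variable) %>%
  arrange(variable, event.date) %>%
  ## position of each observation within its group
  mutate(time = seq_len(n())) %>%
  ## linear interpolation at those positions
  mutate(ip.value = approx(time, value, time)$y) %>%
  select(-time)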
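
The pipeline above interpolates against row position, which treats the visits as equally spaced. If the spacing between dates should count, as it does in the xts result in the question, one possible variant is to pass the dates to approx() as numbers. This is a sketch under that assumption; the `days` helper column and the cut-down data frame are introduced here for illustration only.

```r
library(dplyr)

# Cut-down copy of the sample data: last three visits for each variable
df <- data.frame(
  event.date = as.Date(c("2012-03-28", "2013-03-07", "2014-02-13",
                         "2012-03-28", "2013-03-07", "2014-02-13")),
  variable   = rep(c("neck.bmd", "wbody.bmd"), each = 3),
  value      = c(0.7730, NA, 0.7420, 1.0870, NA, 1.0550)
)

by_date <- df %>%
  group_by(variable) %>%
  arrange(variable, event.date) %>%
  mutate(
    days     = as.numeric(event.date),               # days since 1970-01-01
    ip.value = approx(days, value, xout = days)$y    # weight by elapsed days
  ) %>%
  select(-days) %>%
  ungroup()

print(by_date)
```

With the default rule = 1, approx() returns NA for dates outside the observed range, so leading or trailing gaps stay missing unless rule = 2 is added.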

Or, with spline interpolation:

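df %>%
  group_by(variable) %>%
  arrange(variable, event.date) %>%
  ## position of each observation within its group
  mutate(time = seq_len(n())) %>%
  ## cubic spline evaluated at n() equally spaced points
  mutate(ip.value = spline(time, value, n = n())$y) %>%
  select(-time)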
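
One caveat with n = n(): spline() drops missing pairs first and then places n equally spaced points between the smallest and largest remaining x. Those points line up with the original positions only when the first and last values of a group are present. A small base R sketch on made-up numbers shows the difference, and shows xout as a possible alternative that evaluates the spline at the original positions.

```r
# Made-up series whose last value is missing (illustration only)
time  <- 1:6
value <- c(1, 4, NA, 16, 25, NA)

# n = 6 gives six equally spaced points between 1 and 5, not positions 1..6
by_n <- spline(time, value, n = length(time))

# xout evaluates the spline at the original positions instead
by_xout <- spline(time, value, xout = time)

print(by_n$x)
print(by_xout$x)
```

Note that with xout the value at position 6 is an extrapolation beyond the last observed point, so it should be treated with care.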

Source: https://habr.com/ru/post/1607723/

