Filling a ton of NA data in R by index?

I have price data indexed by three things: state, date and UPC (the product code).

I have a bunch of prices that are equal to NA.

I am trying to fill in the NAs as follows: for a missing price with index (S, D, UPC), fill in the average price of all data points with the same S and UPC. In other words, take the average over dates.

There must be an easy way to do this, because the operation itself is very simple. I have been using for loops, but I now realize that is very inefficient, and I would like to use a function, for example one from plyr or dplyr, that does all of this in as few steps as possible.

upc=c(1153801013,1153801013,1153801013,1153801013,1153801013,1153801013,2105900750,2105900750,2105900750,2105900750,2105900750,2173300001,2173300001,2173300001,2173300001)
date=c(200601,200602,200603,200604,200601,200602,200601,200602,200603,200601,200602,200603,200604,200605,200606)
price=c(26,28,NA,NA,23,24,85,84,NA,81,78,24,19,98,NA)
state=c(1,1,1,1,2,2,1,1,2,2,2,1,1,1,1)

# This is what I have:
data <- data.frame(upc,date,state,price)

# This is what I want:
price=c(26,28,27,27,23,24,85,84,79.5,81,78,24,19,98,47)
data2 <- data.frame(upc,date,state,price)

Any tips? Thanks.


You can use ave with several grouping variables and replace the NA values with the group mean:

with(data,
  ave(price, list(upc,state), FUN=function(x) replace(x,is.na(x),mean(x,na.rm=TRUE) ) )
)
# [1] 26.0 28.0 27.0 27.0 23.0 24.0 85.0 84.0 79.5 81.0 78.0 24.0 19.0 98.0 47.0
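
The call above only returns the filled vector; it does not modify the data frame. A minimal sketch of writing the result back, assuming the question's sample data (the helper name fill_group_mean is hypothetical and not part of the original answer):

```r
# Sample data from the question
data <- data.frame(
  upc   = c(1153801013,1153801013,1153801013,1153801013,1153801013,1153801013,
            2105900750,2105900750,2105900750,2105900750,2105900750,
            2173300001,2173300001,2173300001,2173300001),
  date  = c(200601,200602,200603,200604,200601,200602,200601,200602,200603,
            200601,200602,200603,200604,200605,200606),
  state = c(1,1,1,1,2,2,1,1,2,2,2,1,1,1,1),
  price = c(26,28,NA,NA,23,24,85,84,NA,81,78,24,19,98,NA)
)

# Hypothetical helper: replace the NAs in x with the mean of its non-NA values
fill_group_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# ave() applies the helper within each upc/state group and keeps the row order,
# so the result can be assigned straight back to the column
data$price <- ave(data$price, data$upc, data$state, FUN = fill_group_mean)
```

If a upc/state group contains only NAs, the mean of an empty set is NaN, so those rows end up as NaN rather than NA.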

Build a matrix of mean prices by upc and state (dat here is the data frame that the question calls data):

meanmtx <- tapply(dat$price, dat[c('upc','state')], mean, na.rm=TRUE)

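Then fill in the missing prices by indexing that matrix with the upc and state of each NA row. The values are converted to character so that they are matched against the matrix dimnames rather than treated as numeric positions: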

dat$price[is.na(dat$price)] <-  
          meanmtx[  cbind( as.character(dat[  is.na(dat$price), 'upc']), 
                           as.character(dat[  is.na(dat$price),'state']) )  ]

> dat
          upc   date state price
1  1153801013 200601     1  26.0
2  1153801013 200602     1  28.0
3  1153801013 200603     1  27.0
4  1153801013 200604     1  27.0
5  1153801013 200601     2  23.0
6  1153801013 200602     2  24.0
7  2105900750 200601     1  85.0
8  2105900750 200602     1  84.0
9  2105900750 200603     2  79.5
10 2105900750 200601     2  81.0
11 2105900750 200602     2  78.0
12 2173300001 200603     1  24.0
13 2173300001 200604     1  19.0
14 2173300001 200605     1  98.0
15 2173300001 200606     1  47.0

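Another option is na.aggregate (from the zoo package) together with data.table. By default, na.aggregate replaces NA values with the mean of the non-NA values. It has a FUN argument in case you want to replace the NAs with the median, min, max, or something else. The function can be used with dplyr, data.table or base R. With data.table, we convert the "data.frame" to a "data.table" (setDT(data)), group by "upc" and "state", and assign (:=) "price" as the na.aggregate of "price".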

library(data.table)
library(zoo)
setDT(data)[,  price:= na.aggregate(price) , .(upc, state)]
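
Since the question asks about dplyr, the same group-wise fill can probably be written with group_by and mutate. This is a sketch under the assumption that the data frame is the one from the question; the name filled is only illustrative, and none of this comes from the original answers:

```r
library(dplyr)

# Sample data from the question
data <- data.frame(
  upc   = c(1153801013,1153801013,1153801013,1153801013,1153801013,1153801013,
            2105900750,2105900750,2105900750,2105900750,2105900750,
            2173300001,2173300001,2173300001,2173300001),
  date  = c(200601,200602,200603,200604,200601,200602,200601,200602,200603,
            200601,200602,200603,200604,200605,200606),
  state = c(1,1,1,1,2,2,1,1,2,2,2,1,1,1,1),
  price = c(26,28,NA,NA,23,24,85,84,NA,81,78,24,19,98,NA)
)

filled <- data %>%
  group_by(upc, state) %>%                      # one group per product and state
  mutate(price = replace(price, is.na(price),   # overwrite only the NA entries
                         mean(price, na.rm = TRUE))) %>%
  ungroup()                                     # drop the grouping afterwards
```

Replacing the replace(...) expression with zoo::na.aggregate(price) inside mutate should give the same result, provided zoo is installed.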

Source: https://habr.com/ru/post/1628416/

