R: Replacing NA values with the hourly mean using dplyr

I am learning the dplyr package in R and I really like it. But now I am dealing with NA values in my data.

I would like to replace each NA with the mean for the corresponding hour, as in this very simple example:

    # create an example
    day = c(1, 1, 2, 2, 3, 3)
    hour = c(8, 16, 8, 16, 8, 16)
    profit = c(100, 200, 50, 60, NA, NA)
    shop.data = data.frame(day, hour, profit)

    # calculate the average for each hour
    library(dplyr)
    mean.profit <- shop.data %>%
      group_by(hour) %>%
      summarize(mean = mean(profit, na.rm = TRUE))

    > mean.profit
    Source: local data frame [2 x 2]

      hour mean
    1    8   75
    2   16  130

Can I use a dplyr transformation verb (such as mutate) to replace the day 3 NAs in profit with 75 (for 8:00) and 130 (for 16:00)?

2 answers

Try

    shop.data %>%
      group_by(hour) %>%
      mutate(profit = ifelse(is.na(profit), mean(profit, na.rm = TRUE), profit))
    #   day hour profit
    # 1   1    8    100
    # 2   1   16    200
    # 3   2    8     50
    # 4   2   16     60
    # 5   3    8     75
    # 6   3   16    130

Or you can use replace():

    shop.data %>%
      group_by(hour) %>%
      mutate(profit = replace(profit, is.na(profit), mean(profit, na.rm = TRUE)))
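
A third variant, not part of the answer above, is a minimal sketch using dplyr's coalesce(), which keeps the first non-missing value at each position. It assumes profit is numeric, so the group mean has a compatible type. The name filled is only illustrative, and the final ungroup() is an optional extra step that drops the grouping left behind by group_by().

```r
library(dplyr)

# Same example data as in the question
shop.data <- data.frame(
  day    = c(1, 1, 2, 2, 3, 3),
  hour   = c(8, 16, 8, 16, 8, 16),
  profit = c(100, 200, 50, 60, NA, NA)
)

filled <- shop.data %>%
  group_by(hour) %>%
  # keep profit where it is present, otherwise fall back to the hour's mean
  mutate(profit = coalesce(profit, mean(profit, na.rm = TRUE))) %>%
  # drop the grouping so later steps operate on the whole table
  ungroup()

filled$profit
# 100 200  50  60  75 130
```

If an hour has no non-missing profit at all, mean(profit, na.rm = TRUE) returns NaN for that group, so those rows would still lack a usable value.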

A (less elegant) approach using base R functions:

    transform(shop.data,
              profit = ifelse(is.na(profit),
                              ave(profit, hour, FUN = function(x) mean(x, na.rm = TRUE)),
                              profit))
    #   day hour profit
    # 1   1    8    100
    # 2   1   16    200
    # 3   2    8     50
    # 4   2   16     60
    # 5   3    8     75
    # 6   3   16    130
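
The same base R idea can probably be written without the outer ifelse(), by doing the replacement inside the function passed to ave(). This is only a sketch: fill_with_mean is a hypothetical helper name introduced here, not something from the answer.

```r
# Same example data as in the question
shop.data <- data.frame(
  day    = c(1, 1, 2, 2, 3, 3),
  hour   = c(8, 16, 8, 16, 8, 16),
  profit = c(100, 200, 50, 60, NA, NA)
)

# Hypothetical helper: fill the NAs of one group with that group's mean
fill_with_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# ave() applies the helper within each hour and returns a vector
# in the original row order
shop.data$profit <- ave(shop.data$profit, shop.data$hour, FUN = fill_with_mean)

shop.data$profit
# 100 200  50  60  75 130
```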

Source: https://habr.com/ru/post/976659/

