How to impute missing values in a data.table by group?

Take the following data table:

    library(data.table)

    # IMPUTING VALUES
    set.seed(1337)
    mydt <- data.table(Year = rep(2000:2005, each = 10),
                       Type = c("A", "B"),
                       Value = 30 + rnorm(60))
    naRows <- sample(nrow(mydt), 15)
    mydt[naRows, Value := NA]
    setkey(mydt, Year, Type)

How can I impute the NA values with the median by Year and Type? I tried the following:

    # computed medians
    computedMedians <- mydt[, .(Median = median(Value, na.rm = TRUE)), keyby = .(Year, Type)]
    # dataset of just NA rows
    dtNAs <- mydt[is.na(Value), .SD, by = .(Year, Type)]
    mydt[is.na(Value), Imputations := dtNAs[computedMedians, nomatch = 0][, Median], by = .(Year, Type)]
    mydt

But when you run the code, you will see that it works unless a group is missing its data entirely, and the computed medians get recycled. Is there an easier way? Or how can I fix just the last error?

thanks
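
To keep the two-table approach from the question, one option (a sketch, not taken from either answer below) is to join the per-group medians back onto mydt with an update join instead of assigning a join result inside by. In the attempt above, the right-hand side dtNAs[computedMedians, nomatch = 0][, Median] does not depend on the current group, so every group is handed the same full-length vector. With an update join, each row picks up exactly one median. The helper column Median is a hypothetical name used only for this illustration.

```r
library(data.table)

# Example table from the question
set.seed(1337)
mydt <- data.table(Year = rep(2000:2005, each = 10),
                   Type = c("A", "B"),
                   Value = 30 + rnorm(60))
naRows <- sample(nrow(mydt), 15)
mydt[naRows, Value := NA]
setkey(mydt, Year, Type)

# One median per Year x Type group (NA for groups with no observed values)
computedMedians <- mydt[, .(Median = median(Value, na.rm = TRUE)),
                        keyby = .(Year, Type)]

# Update join: copy each group's median onto every row of that group
# (Median is a hypothetical helper column)
mydt[computedMedians, on = .(Year, Type), Median := i.Median]

# Overwrite only the missing values, then drop the helper column
mydt[is.na(Value), Value := Median]
mydt[, Median := NULL]
```

Groups with no observed values still end up NA here, because their median is itself NA.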

2 answers

If you prefer to update only the missing rows in place, without copying the entire column:

    require(data.table)  # v1.9.6+ (for the on= argument)
    cols <- c("Year", "Type")
    mydt[is.na(Value), Value := mydt[.BY, median(Value, na.rm = TRUE), on = cols], by = c(cols)]

.BY is a special symbol: a named list holding the current group's values of the by columns. Although this requires a join against the entire data.table for every group, it should be fairly fast, since each join only looks up a single group.
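
A small sketch of what .BY holds, on a hypothetical four-row table: inside j, .BY is a list named after the by columns, and joining the full table on it returns exactly the rows of the current group, which is the lookup the answer above relies on. The table dt and the result names groupLabels, groupSizes, label and lookup are illustrative only.

```r
library(data.table)

# Hypothetical toy table
dt <- data.table(Year = c(2000L, 2000L, 2001L, 2001L),
                 Type = c("A", "B", "A", "A"),
                 Value = c(1, NA, 3, NA))
cols <- c("Year", "Type")

# .BY is a named list with one element per grouping column
groupLabels <- dt[, .(label = paste(.BY$Year, .BY$Type)), by = cols]

# Joining the full table on .BY returns exactly the current group's rows,
# so the lookup's row count equals the group size .N
groupSizes <- dt[, .(lookup = nrow(dt[.BY, on = cols]), N = .N), by = cols]
```

Here groupLabels$label is "2000 A", "2000 B", "2001 A", and each lookup count matches its group size.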

There is no need to build a secondary table; this can be done within a single grouped call:

    mydt[, Value := replace(Value, is.na(Value), median(Value, na.rm = TRUE)), by = .(Year, Type)]

This imputation does not guarantee that every missing value is filled: a group with no observed values (for example, 2005-B) has an NA median, so its values stay NA.
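
One way to handle groups that are entirely NA, assumed here rather than suggested by the answer, is a second, coarser pass: after the Year x Type median, fill whatever is still missing with the median of the same Type across all years. The sketch forces 2005-B to be all NA so the fallback is exercised regardless of the sample drawn, and uses a hypothetical helper column observed so the fallback median is computed from the original values only, not from values imputed in the first pass.

```r
library(data.table)

# Example table from the question
set.seed(1337)
mydt <- data.table(Year = rep(2000:2005, each = 10),
                   Type = c("A", "B"),
                   Value = 30 + rnorm(60))
naRows <- sample(nrow(mydt), 15)
mydt[naRows, Value := NA]

# Force one fully missing group so the fallback below has work to do
mydt[Year == 2005 & Type == "B", Value := NA]

# Hypothetical helper column: TRUE where the value was originally observed
mydt[, observed := !is.na(Value)]

# Pass 1 (the answer's approach): median within each Year x Type group
mydt[, Value := replace(Value, is.na(Value), median(Value, na.rm = TRUE)),
     by = .(Year, Type)]

# Groups where pass 1 could not impute anything
emptyGroups <- mydt[, .(allNA = all(is.na(Value))), by = .(Year, Type)][allNA == TRUE]

# Pass 2 (assumed fallback): median of the originally observed values of the same Type
mydt[, Value := replace(Value, is.na(Value), median(Value[observed])),
     by = Type]
```

Whether a Type-level median is a sensible fallback depends on the data; a global median or leaving the values NA are equally valid choices.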

Source: https://habr.com/ru/post/1233742/