Using R to insert a value for missing data with a value from another data frame

Question


All,

I have a question that, I'm afraid, might be too pedestrian to ask here, but searching for it elsewhere has led me astray. I may not be using the right search terms.

I have a panel data frame (country-year) in R with some missing values on a given variable. I am trying to impute them with values from a vector in another data frame. Here is an illustration of what I'm trying to do.

Suppose Data is the data frame of interest, which has missing values on a given vector that I am trying to fill in from another, donor data frame. It looks like this.

    country year     x
         70 1920 9.234
         70 1921 9.234
         70 1922 9.234
         70 1923 9.234
         70 1924 9.234
         80 1920    NA
         80 1921    NA
         80 1922    NA
         80 1923    NA
         80 1924    NA
         90 1920 7.562
         90 1921 7.562
         90 1922 7.562
         90 1923 7.562
         90 1924 7.562

This is the Donor data frame, which has a value for country == 80:

    country     x
         70 9.234
         80 1.523
         90 7.562

I am trying to find an easy way to automate this, rather than running commands like Data$x[Data$country == 80] <- 1.523 by hand. There are many countries with missing values on x.

It may be worth clarifying that a simple merge would be easiest, but it is not necessarily appropriate for what I'm trying to do. Some countries will see variation on x between years. Basically, what I'm trying to write is a command that says: if the value of x is missing in Data for all years for a given country, take the corresponding value for that country from the Donor data and paste it into all country-years of that country as the "best guess".

Thanks for any input. I suspect this is a rookie question, but I did not know the right terms to search for it.

Reproducible code for the above data follows.

    country <- c(70,70,70,70,70,80,80,80,80,80,90,90,90,90,90)
    year <- c(1920,1921,1922,1923,1924,1920,1921,1922,1923,1924,1920,1921,1922,1923,1924)
    x <- c(9.234,9.234,9.234,9.234,9.234,NA,NA,NA,NA,NA,7.562,7.562,7.562,7.562,7.562)
    Data=data.frame(country=country,year=year,x=x)
    summary(Data)

    country <- c(70,80,90)
    x <- c(9.234,1.523,7.562)
    Donor=data.frame(country=country,x=x)
    summary(Donor)

+4

r data-manipulation missing-data

steve Jun 16 '13 at 2:35


2 answers

Here is one option that should work in general:

    # Get the vector of countries with missing x
    country.na <- Data$country[is.na(Data$x)]

    # Get corresponding location of x in Donor
    index <- sapply(country.na, function(x) which(Donor$country == x))

    # Replace NA values with corresponding values in Donor
    Data$x[is.na(Data$x)] <- Donor$x[index]
    Data
    #    country year     x
    # 1       70 1920 9.234
    # 2       70 1921 9.234
    # 3       70 1922 9.234
    # 4       70 1923 9.234
    # 5       70 1924 9.234
    # 6       80 1920 1.523
    # 7       80 1921 1.523
    # 8       80 1922 1.523
    # 9       80 1923 1.523
    # 10      80 1924 1.523
    # 11      90 1920 7.562
    # 12      90 1921 7.562
    # 13      90 1922 7.562
    # 14      90 1923 7.562
    # 15      90 1924 7.562
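The lookup step can also be written with match(), which returns, for each country in Data, the position of its row in Donor (or NA when the country has no row there). A minimal sketch on the question's example data, assuming each country appears at most once in Donor; the name na.rows is only illustrative:

```r
# Example data from the question
Data <- data.frame(
  country = rep(c(70, 80, 90), each = 5),
  year    = rep(1920:1924, times = 3),
  x       = rep(c(9.234, NA, 7.562), each = 5)
)
Donor <- data.frame(country = c(70, 80, 90), x = c(9.234, 1.523, 7.562))

# Rows of Data where x is missing
na.rows <- is.na(Data$x)

# match() gives the Donor row for each of those countries;
# a country absent from Donor yields NA, so its x simply stays NA
Data$x[na.rows] <- Donor$x[match(Data$country[na.rows], Donor$country)]
```

Unlike the sapply()/which() version, this always returns a plain vector of positions, even when some country is missing from Donor.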

+5

alexwhan Jun 16 '13 at 3:21


topchef · Accepted Answer · 2013-06-16T03:21:32+0000

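Using merge:

    # merge() sorts its result by country by default, so the positional
    # assignment below assumes the rows of Data are already ordered by country
    r = merge(Data, Donor, by="country", suffixes=c(".Data", ".Donor"))
    Data$x = ifelse(is.na(r$x.Data), r$x.Donor, r$x.Data)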

If overwriting every x value is undesirable, use which to overwrite only the NA entries (with the same merge):

    r = merge(Data, Donor, by="country", suffixes=c(".Data", ".Donor"))
    na.idx = which(is.na(Data$x))
    Data[na.idx,"x"] = r[na.idx,"x.Donor"]
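Both approaches fill every NA, whereas the question asks to fill only when x is missing in all years for a country. If that distinction matters, something along these lines might work: ave() flags the countries whose x is entirely missing, and match() looks up the donor value. This is a sketch rather than a tested general solution; the name all.na and the extra missing year for country 90 are made up for illustration and are not part of the question's data.

```r
# Example data from the question
Data <- data.frame(
  country = rep(c(70, 80, 90), each = 5),
  year    = rep(1920:1924, times = 3),
  x       = rep(c(9.234, NA, 7.562), each = 5)
)
Donor <- data.frame(country = c(70, 80, 90), x = c(9.234, 1.523, 7.562))

# Hypothetical tweak (not in the question): one missing year for country 90,
# to show that partially missing countries are left alone
Data$x[Data$country == 90 & Data$year == 1920] <- NA

# TRUE for every row belonging to a country whose x is NA in all years
all.na <- as.logical(ave(is.na(Data$x), Data$country, FUN = all))

# Fill only those rows with the donor value for the same country
Data$x[all.na] <- Donor$x[match(Data$country[all.na], Donor$country)]
```

Here country 80 receives 1.523 in every year, while the single NA introduced for country 90 stays missing because that country has observed values in other years.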
