Filling from another data.frame file

Question


I often run into situations where I have to "fill in" missing information from another data source.

For instance:

    x <- data.frame(c1=letters[1:26],c2=letters[26:1])
    x[x$c1 == "m","c2"] <- NA
    x[x$c1 == "a","c2"] <- NA

       c1   c2
    1   a <NA>
    2   b    y
    3   c    x
    4   d    w
    5   e    v
    6   f    u
    7   g    t
    8   h    s
    9   i    r
    10  j    q
    11  k    p
    12  l    o
    13  m <NA>
    ...

Now I would like to check this column for missing values and fill them in from a separate data.frame, let's call it y:

 y <- data.frame(c1=c("m","a"),c2=c("n","z"))

So, I would like x to be filled in from y (row 13 should be c("m", "n") and row 1 should be c("a", "z")).

The method I currently use to solve this seems convoluted and indirect. What would your approach be? Bear in mind that my data is not necessarily in a tidy order like this example, but the row order of x must be maintained. I would prefer a solution that relies on nothing other than base R.


r

Brandon Bertelsen Oct 29 '12 at 1:58


2 answers

mnel · Answer 1 · 2012-10-29T02:11:42+0000

This is much simpler if you are dealing with character variables rather than factors.

I will show a simple data.table solution (chosen for its elegant and easy-to-use syntax, among many other benefits):

    x <- data.frame(c1=letters[1:26],c2=letters[26:1], stringsAsFactors =FALSE)
    x[x$c1 == "m","c2"] <- NA
    y <- data.frame(c1="m",c2="n", stringsAsFactors = FALSE)
    library(data.table)
    X <- as.data.table(x)
    Y <- as.data.table(y)

For ease of merging, I will create a column indicating whether c2 is missing:

    X[,missing_c2 := is.na(c2)]
    # a similar column in Y
    Y[,missing_c2 := TRUE]
    # key both tables on the join column c1 plus the missing flag
    setkey(X, c1, missing_c2)
    setkey(Y, c1, missing_c2)
    # merge and replace (by reference) those values in X with the values in `Y`
    X[Y, c2 := i.c2]

The i. prefix in i.c2 means that we use the values of c2 from the i argument to [ (here, Y).

This approach assumes that not every row of X with c1 = 'm' has a missing c2, and that you want to replace only the missing values of c2 where c1 = 'm', not all of them.
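Note that setkey() sorts a data.table by its key columns, which can change the row order the question asks to preserve. As a hedged sketch (assuming data.table 1.9.6 or later, where the `on` argument is available), the same update join can be written without keys. The example data below mirrors the question, and the flag column is dropped at the end:

```r
library(data.table)

# Example data mirroring the question (character columns, not factors)
X <- data.table(c1 = letters, c2 = rev(letters))
X[c1 %in% c("m", "a"), c2 := NA_character_]
Y <- data.table(c1 = c("m", "a"), c2 = c("n", "z"))

# Flag the rows of X that need filling; every row of Y is a replacement
X[, missing_c2 := is.na(c2)]
Y[, missing_c2 := TRUE]

# Update join by reference: no setkey(), so X keeps its original row order
X[Y, on = c("c1", "missing_c2"), c2 := i.c2]

# Remove the helper column again
X[, missing_c2 := NULL]
```

Only rows of X whose c1 matches a row of Y and whose c2 was missing are touched; everything else stays as it was.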

Base R solution

Here is a base R solution. I use merge, so the y data.frame may contain more replacement values than are needed (i.e. it could have values for every value of c1, even though only c1 == 'm' is required).

    # add a second row with a missing value to make the solution more generalizable
    x <- rbind(x, data.frame(c1 = 'm',c2 = NA, stringsAsFactors = FALSE) )
    missing <- x[is.na(x$c2),]
    merged <- merge(missing, y, by = 'c1')
    x[is.na(x$c2),] <- with(merged, data.frame(c1 = c1, c2 = c2.y, stringsAsFactors = FALSE))

If you use factors, you will be in for a world of pain getting the levels to match.
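One way to sidestep the factor problem, sketched here on the assumption that the columns can safely be treated as plain strings, is to convert any factor columns to character before filling. The variable name is_fac is illustrative only:

```r
# Example data built with factors, the default in R versions before 4.0.0
x <- data.frame(c1 = letters, c2 = rev(letters), stringsAsFactors = TRUE)

# Find the factor columns and convert each one to character,
# leaving any other columns untouched
is_fac <- vapply(x, is.factor, logical(1))
x[is_fac] <- lapply(x[is_fac], as.character)
```

After this step, assignments such as x[x$c1 == "m", "c2"] <- "n" no longer depend on "n" being an existing level.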

Drew Steen · Answer 2 · 2012-10-29T02:19:37+0000

In base R, I believe this will work for you:

    nas <- is.na(x$c2)
    x[nas, ] <- y[y$c1 %in% x[nas, 1], ]
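The assignment above copies the matching rows of y in the order they appear in y, so it relies on y listing its keys in the same order as the missing rows of x. A possible variant, offered as a sketch rather than as the answerer's method, uses match() to look up each missing key individually. It assumes character columns and unique c1 values in y; idx is an illustrative name:

```r
# Example data from the question, with character columns
x <- data.frame(c1 = letters, c2 = rev(letters), stringsAsFactors = FALSE)
x[x$c1 %in% c("m", "a"), "c2"] <- NA
y <- data.frame(c1 = c("m", "a"), c2 = c("n", "z"), stringsAsFactors = FALSE)

nas <- is.na(x$c2)

# For each missing row of x, find the position of its key in y
# (NA if y has no replacement for that key)
idx <- match(x$c1[nas], y$c1)

# Fill only c2; rows of x are never moved, and keys absent from y stay NA
x$c2[nas] <- y$c2[idx]
```

Because only the c2 column is assigned, the row order of x is unchanged regardless of how y is ordered.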
