Filling in missing values from another data.frame

I often run into situations where I have to "fill in" information from another data source.

For instance:

 x <- data.frame(c1=letters[1:26],c2=letters[26:1])
 x[x$c1 == "m","c2"] <- NA
 x[x$c1 == "a","c2"] <- NA

    c1   c2
 1   a <NA>
 2   b    y
 3   c    x
 4   d    w
 5   e    v
 6   f    u
 7   g    t
 8   h    s
 9   i    r
 10  j    q
 11  k    p
 12  l    o
 13  m <NA>
 ...

Now, with these missing values, I would like to check and populate them from a separate data.frame; let's call it y:

 y <- data.frame(c1=c("m","a"),c2=c("n","z")) 

So, I would like x to be filled in from y (row 13 should be c("m", "n"), and row 1 should be c("a", "z")).

The method I use to solve this problem seems confusing and indirect. What would your approach be? Bear in mind that my data is not necessarily in a neat order like this example, but the row order of x must be maintained. I would prefer a solution that relies on nothing other than base R.

+4
2 answers

This is a much simpler proposition if you are dealing with character variables rather than factors.

I will present a simple data.table solution (for its elegant, easy-to-use syntax, among many other benefits).

 x <- data.frame(c1=letters[1:26],c2=letters[26:1], stringsAsFactors = FALSE)
 x[x$c1 == "m","c2"] <- NA
 y <- data.frame(c1="m",c2="n", stringsAsFactors = FALSE)

 library(data.table)
 X <- as.data.table(x)
 Y <- as.data.table(y)

For ease of merging, I will create a column indicating which values of c2 are missing:

 X[,missing_c2 := is.na(c2)]
 # a similar column in Y
 Y[,missing_c2 := TRUE]
 # key on c1 (the lookup column) plus the missing flag;
 # note that setkey() sorts the table by its key columns
 setkey(X, c1, missing_c2)
 setkey(Y, c1, missing_c2)
 # merge and replace (by reference) those values in X with the values in `Y`
 X[Y, c2 := i.c2]

The i. prefix in i.c2 means that we use the values of c2 from the i argument of [ (here, Y).

This approach assumes that not every row of X with c1 = 'm' has a missing c2, and that you do not want to overwrite c2 in all rows where c1 = 'm', only in those where it is missing.


Base R solution

Here is a base R solution. I use merge(), so the y data.frame may contain more replacements than are needed (i.e. it can hold values for every value of c1, even though only c1 = 'm' is required).

 # add a second missing value row to make the solution more generalizable
 x <- rbind(x, data.frame(c1 = 'm',c2 = NA, stringsAsFactors = FALSE) )
 missing <- x[is.na(x$c2),]
 # merge() sorts its result by c1 by default, and every missing c1 needs a match in y;
 # the row-wise assignment below relies on both (here all missing rows have c1 = 'm')
 merged <- merge(missing, y, by = 'c1')
 x[is.na(x$c2),] <- with(merged, data.frame(c1 = c1, c2 = c2.y, stringsAsFactors = FALSE))

If you use factors, you will run into a wall of pain matching levels.
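To illustrate the point about factors, here is a small sketch. The three-row data and the names x_try and is_fac are made up for this example; it only shows the general behaviour of factor assignment, and one possible workaround of converting factor columns to character first.

```r
# Hypothetical three-row example with factor columns
x <- data.frame(c1 = c("a", "b", "c"), c2 = c("z", NA, "x"),
                stringsAsFactors = TRUE)

# Filling the gap with a value that is not an existing level does not work:
# R warns about an invalid factor level and the cell stays NA
x_try <- x
suppressWarnings(x_try$c2[2] <- "y")
is.na(x_try$c2[2])

# One workaround: convert every factor column to character before filling
is_fac <- vapply(x, is.factor, logical(1))
x[is_fac] <- lapply(x[is_fac], as.character)
x$c2[2] <- "y"
x
```

Alternatively, creating the data with stringsAsFactors = FALSE, as in the code above, avoids the conversion step altogether.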

+3

In base R, I believe this will work for you:

 nas <- is.na(x$c2)
 x[nas, ] <- y[y$c1 %in% x[nas, 1], ]
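One caveat: subsetting y with %in% returns its rows in y's own order, so the whole-row assignment lines up only when that order matches the order of the missing rows in x. With the y from the question ("m" listed before "a") the two rows would end up swapped. A minimal sketch of an order-independent variant using match() might look like this; fill_from() and its key and value arguments are hypothetical names introduced for illustration, and the columns are assumed to be character, not factors.

```r
# Example data from the question, with character columns
x <- data.frame(c1 = letters[1:26], c2 = letters[26:1],
                stringsAsFactors = FALSE)
x[x$c1 %in% c("m", "a"), "c2"] <- NA
y <- data.frame(c1 = c("m", "a"), c2 = c("n", "z"),
                stringsAsFactors = FALSE)

# fill_from() is a hypothetical helper, not part of base R
fill_from <- function(x, y, key = "c1", value = "c2") {
  nas <- is.na(x[[value]])
  # For each missing row of x, find the position of its key in y
  # (NA when y has no replacement for that key)
  idx <- match(x[[key]][nas], y[[key]])
  # Only the value column is touched, so the row order of x is unchanged;
  # keys without a match in y simply stay NA
  x[[value]][nas] <- y[[value]][idx]
  x
}

filled <- fill_from(x, y)
filled[c(1, 13), ]
```

Because only the c2 cells are assigned, y may list its rows in any order and may contain more keys than x actually needs.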
+2

Source: https://habr.com/ru/post/1442662/

