Append a data frame to a master data frame when only some columns are shared

I want to append one data frame to another (the master). The problem is that only a subset of their columns is shared. In addition, the order of the columns may differ.

Main data frame:

       a b  c
    r1 1 2 -2
    r2 2 4 -4
    r3 3 6 -6
    r4 4 8 -8

New data frame:

          d  a   c
    r1 -120 10 -20
    r2 -140 20 -40

Expected Result:

        a   b   c
    r1  1   2  -2
    r2  2   4  -4
    r3  3   6  -6
    r4  4   8  -8
    r5 10 NaN -20
    r6 20 NaN -40

Is there any smart way to do this? This question is similar, but the setup is different.

4 answers

Check out the bind_rows function from dplyr. By default, it does some nice things for you, such as filling columns that exist in one data.frame but not the other with NA instead of failing. Here is an example:

    # Use the dplyr package for binding rows and for selecting columns
    library(dplyr)

    # Generate some example data
    a <- data.frame(a = rnorm(10), b = rnorm(10))
    b <- data.frame(a = rnorm(5), c = rnorm(5))

    # Stack data frames
    bind_rows(a, b)

    Source: local data frame [15 x 3]

                a          b          c
    1   2.2891895  0.1940835         NA
    2   0.7620825 -0.2441634         NA
    3   1.8289665  1.5280338         NA
    4  -0.9851729 -0.7187585         NA
    5   1.5829853  1.6609695         NA
    6   0.9231296  1.8052112         NA
    7  -0.5801230 -0.6928449         NA
    8   0.2033514 -0.6673596         NA
    9  -0.8576628  0.5163021         NA
    10  0.6296633 -1.2445280         NA
    11  2.1693068         NA -0.2556584
    12 -0.1048966         NA -0.3132198
    13  0.2673514         NA -1.1181995
    14  1.0937759         NA -2.5750115
    15 -0.8147180         NA -1.5525338

To solve the problem in your question, first select the columns of the new data that also exist in your master data.frame. If a is the master data.frame and b contains the data you want to add, you can use the select function from dplyr to get the columns you need.

    # Select all columns in b with the same names as in the master data, a.
    # any_of() takes a character vector and ignores names that b lacks;
    # it is used here in place of the deprecated select_().
    b <- select(b, any_of(names(a)))

    # Combine
    bind_rows(a, b)

    Source: local data frame [15 x 2]

                a          b
    1   2.2891895  0.1940835
    2   0.7620825 -0.2441634
    3   1.8289665  1.5280338
    4  -0.9851729 -0.7187585
    5   1.5829853  1.6609695
    6   0.9231296  1.8052112
    7  -0.5801230 -0.6928449
    8   0.2033514 -0.6673596
    9  -0.8576628  0.5163021
    10  0.6296633 -1.2445280
    11  2.1693068         NA
    12 -0.1048966         NA
    13  0.2673514         NA
    14  1.0937759         NA
    15 -0.8147180         NA
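Applied to the data from the question, the same idea might look like the sketch below. It assumes dplyr 1.0.0 or later (for any_of()); the names main, new, new_shared and res are illustrative, not from the original answer. Note that bind_rows fills the missing column with NA rather than NaN.

```r
library(dplyr)

# Data from the question (row names omitted for brevity)
main <- data.frame(a = c(1, 2, 3, 4), b = c(2, 4, 6, 8), c = c(-2, -4, -6, -8))
new  <- data.frame(d = c(-120, -140), a = c(10, 20), c = c(-20, -40))

# Keep only the columns of `new` whose names also appear in `main`;
# any_of() silently skips names (here "b") that `new` does not have
new_shared <- select(new, any_of(names(main)))

# Stack the rows; the column `b` is filled with NA for the appended rows
res <- bind_rows(main, new_shared)
print(res)
```

Column d is dropped before binding, so the result has only the master's columns.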

Try the following:

    library(plyr)  # thanks to comment @ialm

    df  <- data.frame(a = 1:4, b = seq(2, 8, 2), c = seq(-2, -8, -2))
    new <- data.frame(d = c(-120, -140), a = c(10, 20), c = c(-20, -40))

    # we use %in% to pull the columns that are the same in the master,
    # then we use rbind.fill to put this data frame below the master,
    # filling any missing data with NA values
    # (drop = FALSE keeps a data frame even if only one column matches)
    res <- rbind.fill(df, new[, colnames(new) %in% colnames(df), drop = FALSE])

    > res
       a  b   c
    1  1  2  -2
    2  2  4  -4
    3  3  6  -6
    4  4  8  -8
    5 10 NA -20
    6 20 NA -40

The dplyr- and plyr-based solutions presented here are very natural for this task, using bind_rows and rbind.fill, respectively, although this is also possible as a one-liner in base R. Basically, I loop through the names of the first data frame, grabbing the corresponding column of the second data frame if it is there and otherwise returning all NaN values.

    rbind(A, sapply(names(A), function(x) if (x %in% names(B)) B[,x] else rep(NaN, nrow(B))))
    #     a   b   c
    # r1  1   2  -2
    # r2  2   4  -4
    # r3  3   6  -6
    # r4  4   8  -8
    # 5  10 NaN -20
    # 6  20 NaN -40
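For a self-contained variant of the base R approach, here is a minimal sketch. The helper append_shared and the r1, r2, ... row-naming pattern are assumptions made for illustration, and filling with NaN assumes the missing columns are numeric, as in the question.

```r
# Data from the question, including its row names
main <- data.frame(a = c(1, 2, 3, 4), b = c(2, 4, 6, 8), c = c(-2, -4, -6, -8),
                   row.names = paste0("r", 1:4))
new <- data.frame(d = c(-120, -140), a = c(10, 20), c = c(-20, -40),
                  row.names = paste0("r", 1:2))

# Hypothetical helper: append `new` below `main`, keeping only main's columns
append_shared <- function(main, new) {
  # Add every column that `new` lacks, filled with NaN (assumes numeric data)
  for (col in setdiff(names(main), names(new))) {
    new[[col]] <- rep(NaN, nrow(new))
  }
  # Selecting by names(main) reorders the columns and drops extras such as `d`
  res <- rbind(main, new[names(main)])
  # Renumber the rows as r1, r2, ... (assumed naming pattern)
  rownames(res) <- paste0("r", seq_len(nrow(res)))
  res
}

res <- append_shared(main, new)
print(res)
```

Unlike the one-liner, this version works on copies with explicit names and resets the row names, at the cost of a few more lines.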

Another option uses rbind.fill from the plyr package.

Read in your sample data:

    toread <- "
    a b c
    1 2 -2
    2 4 -4
    3 6 -6
    4 8 -8"
    master <- read.table(textConnection(toread), header = TRUE)

    toread <- "
    d a c
    -120 10 -20
    -140 20 -40"
    to.append <- read.table(textConnection(toread), header = TRUE)

Bind the data:

    library(plyr)
    rbind.fill(master, to.append)
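Note that rbind.fill keeps the union of the columns of its inputs, so the call above also carries column d into the result. One way to match the expected result from the question is to subset the bound data by the master's column names afterwards. This is a sketch; the names full and res are illustrative.

```r
library(plyr)

# Same sample data as above, built directly instead of via read.table
master <- data.frame(a = c(1, 2, 3, 4), b = c(2, 4, 6, 8), c = c(-2, -4, -6, -8))
to.append <- data.frame(d = c(-120, -140), a = c(10, 20), c = c(-20, -40))

# rbind.fill() returns every column found in either input, including `d`
full <- rbind.fill(master, to.append)

# Keep only the columns that the master data frame has
res <- full[names(master)]
print(res)
```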

Source: https://habr.com/ru/post/1238193/