I have two data frames that have multiple columns with the same names and others with different names. Data frames look something like this:
df1
  ID hello world hockey soccer
1  1    NA    NA      7      4
2  2    NA    NA      2      5
3  3    10     8      8     23
4  4     4    17      5     12
5  5    NA    NA      3     43

df2
  ID hello world football baseball
1  1     2     3       43        6
2  2     5     1       24       32
3  3    NA    NA        2       23
4  4    NA    NA        5       15
5  5     9     7       12       23
As you can see, in two common columns (“hello” and “world”), some data is in one of the data frames, and the rest is in the other.
What I'm trying to do is (1) merge the two data frames by "ID", (2) combine the data from the hello and world columns of both data frames into a single hello column and a single world column, and (3) keep all the other columns from the two source data frames (hockey, soccer, football, baseball) in the final data frame. So I want the end result to look like this:
  ID hello world hockey soccer football baseball
1  1     2     3      7      4       43        6
2  2     5     1      2      5       24       32
3  3    10     8      8     23        2       23
4  4     4    17      5     12        5       15
5  5     9     7      3     43       12       23
I am new to R, so the only code I have tried is variants of merge, plus the answer I found here to a similar question: R: merging copies of the same variable. However, my data sets are actually much larger than what I show here (about 20 shared columns like “hello” and “world”, and 100 non-shared ones like “hockey” and “football”), so I'm looking for something that does not require me to write everything out manually.
Any idea if this can be done? Sorry that I can’t provide a better example of my efforts, but I really don’t know where to start beyond:
mydata <- merge(df1, df2, by=c("ID"), all = TRUE)
To reproduce the data frames:
# df1 holds hockey/soccer and df2 holds football/baseball, matching the tables above
df1 <- structure(list(ID = c(1L, 2L, 3L, 4L, 5L),
                      hello = c(NA, NA, 10, 4, NA),
                      world = c(NA, NA, 8, 17, NA),
                      hockey = c(7, 2, 8, 5, 3),
                      soccer = c(4, 5, 23, 12, 43)),
                 class = "data.frame", row.names = c(NA, -5L))
df2 <- structure(list(ID = c(1L, 2L, 3L, 4L, 5L),
                      hello = c(2, 5, NA, NA, 9),
                      world = c(3, 1, NA, NA, 7),
                      football = c(43, 24, 2, 5, 12),
                      baseball = c(6, 32, 23, 15, 23)),
                 class = "data.frame", row.names = c(NA, -5L))
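For reference, here is a minimal sketch in base R of the kind of result I am after. It rests on assumptions: the key column is "ID", the shared columns are numeric, and when both frames have a value for the same row the one from df1 wins. The names shared, merged, and result are only illustrative, and the data is rebuilt with data.frame() so the snippet runs on its own. It may well not be the best way to do this.

```r
# Sample data matching the tables in the question
df1 <- data.frame(ID = 1:5,
                  hello = c(NA, NA, 10, 4, NA),
                  world = c(NA, NA, 8, 17, NA),
                  hockey = c(7, 2, 8, 5, 3),
                  soccer = c(4, 5, 23, 12, 43))
df2 <- data.frame(ID = 1:5,
                  hello = c(2, 5, NA, NA, 9),
                  world = c(3, 1, NA, NA, 7),
                  football = c(43, 24, 2, 5, 12),
                  baseball = c(6, 32, 23, 15, 23))

# Columns present in both frames, excluding the key (found automatically)
shared <- setdiff(intersect(names(df1), names(df2)), "ID")

# Full outer join; merge() returns each shared column as <name>.x and <name>.y
merged <- merge(df1, df2, by = "ID", all = TRUE)

# For each shared column, use the df1 value and fall back to df2 where it is NA
# (assumption: numeric columns; ifelse() drops attributes such as factor levels)
for (col in shared) {
  x <- merged[[paste0(col, ".x")]]
  y <- merged[[paste0(col, ".y")]]
  merged[[col]] <- ifelse(is.na(x), y, x)
}

# Drop the suffixed helper columns, then put ID and the shared columns first
helpers <- c(paste0(shared, ".x"), paste0(shared, ".y"))
merged <- merged[, !names(merged) %in% helpers]
result <- merged[, c("ID", shared, setdiff(names(merged), c("ID", shared)))]

print(result)
```

Because shared is computed from the column names, nothing has to be listed by hand, which is the part that matters for my real data with about 20 shared columns.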