I have two data frames: Data1 and Data2, which I want to combine based on the variable "ID".
This sample data can be downloaded here: http://dl.dropbox.com/u/52600559/example.RData
Here is the first data frame:
> Data1
   ID     Fruit  Color Weight
1   1     Apple    Red      5
2   2    Orange Orange      7
3   3    Banana Yellow      3
4   4      Pear  Green      5
5   5    Tomato    Red      4
6   6     Berry   Blue      4
7   7  Mandarin Orange      4
8   8 Pineapple Yellow      9
9   9 Nectarine Orange      5
10 10      Beet    Red      5
And here is the second data frame:
> Data2
   ID       Fruit  Color Weight
1   1       Apple    Red      5
2   2      Orange Orange      7
3   3      Banana Yellow      3
4   4        Pear  Green      5
5   5      Tomato    Red      4
6  11 Pomegranate    Red      6
7  12       Grape  Green      4
8  13   Cranberry    Red      4
9  14       Melon   Pink      5
10 15     Pumpkin Orange     10
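In case the download link is unavailable, the two data frames can be recreated from the printed output above. This is a minimal sketch, assuming the text columns are plain character vectors rather than factors (the original .RData file may store them differently):

```r
# Recreate the two example data frames from the printed output.
# Assumption: Fruit and Color are character columns, not factors.
Data1 <- data.frame(
  ID     = 1:10,
  Fruit  = c("Apple", "Orange", "Banana", "Pear", "Tomato",
             "Berry", "Mandarin", "Pineapple", "Nectarine", "Beet"),
  Color  = c("Red", "Orange", "Yellow", "Green", "Red",
             "Blue", "Orange", "Yellow", "Orange", "Red"),
  Weight = c(5, 7, 3, 5, 4, 4, 4, 9, 5, 5),
  stringsAsFactors = FALSE
)

Data2 <- data.frame(
  ID     = c(1:5, 11:15),
  Fruit  = c("Apple", "Orange", "Banana", "Pear", "Tomato",
             "Pomegranate", "Grape", "Cranberry", "Melon", "Pumpkin"),
  Color  = c("Red", "Orange", "Yellow", "Green", "Red",
             "Red", "Green", "Red", "Pink", "Orange"),
  Weight = c(5, 7, 3, 5, 4, 6, 4, 4, 5, 10),
  stringsAsFactors = FALSE
)
```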
I tried to combine them as follows:
> merge(Data1, Data2, by = "ID", sort = FALSE, all.x = TRUE, all.y = TRUE)
   ID   Fruit.x Color.x Weight.x     Fruit.y Color.y Weight.y
1   1     Apple     Red        5       Apple     Red        5
2   2    Orange  Orange        7      Orange  Orange        7
3   3    Banana  Yellow        3      Banana  Yellow        3
4   4      Pear   Green        5        Pear   Green        5
5   5    Tomato     Red        4      Tomato     Red        4
6   9 Nectarine  Orange        5        <NA>    <NA>       NA
7   6     Berry    Blue        4        <NA>    <NA>       NA
8   7  Mandarin  Orange        4        <NA>    <NA>       NA
9   8 Pineapple  Yellow        9        <NA>    <NA>       NA
10 10      Beet     Red        5        <NA>    <NA>       NA
11 14      <NA>    <NA>       NA       Melon    Pink        5
12 11      <NA>    <NA>       NA Pomegranate     Red        6
13 12      <NA>    <NA>       NA       Grape   Green        4
14 13      <NA>    <NA>       NA   Cranberry     Red        4
15 15      <NA>    <NA>       NA     Pumpkin  Orange       10
As you can see, both data frames have the same variables. However, some IDs in Data1 are not in Data2, and vice versa, while other IDs appear in both data frames.
Question 1: I want to combine the duplicated columns shown above. For example, I want Fruit.x and Fruit.y merged into a single column called "Fruits". How can I do this?
Question 2: What if, for one of the samples present in both Data1 and Data2, one of the values is not consistent? For example, for sample ID 1, suppose Fruit.x is Apple but Fruit.y is incorrectly encoded as Aple (with a spelling error). Is there a way to check for all such instances quickly so that I can choose which one is correct? Or can I tell R to always treat Data1 as correct and Data2 as incorrect when this happens?
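To make both questions concrete, here is a rough sketch of the kind of result I am after. It uses a cut-down version of the data with the misspelling introduced on purpose; the names `mismatches` and `result` and the NA-based fallback rule are only my illustration, and I do not know whether this is the recommended way to do it:

```r
# Cut-down versions of the two data frames; "Aple" is a deliberate typo.
Data1 <- data.frame(ID = c(1, 2, 6),
                    Fruit = c("Apple", "Orange", "Berry"),
                    stringsAsFactors = FALSE)
Data2 <- data.frame(ID = c(1, 2, 11),
                    Fruit = c("Aple", "Orange", "Pomegranate"),
                    stringsAsFactors = FALSE)

# Full outer join on ID; shared columns get the .x / .y suffixes.
merged <- merge(Data1, Data2, by = "ID", all = TRUE)

# Question 2: rows whose ID is in both data frames but whose values disagree.
both <- !is.na(merged$Fruit.x) & !is.na(merged$Fruit.y)
mismatches <- merged[both & merged$Fruit.x != merged$Fruit.y, ]
print(mismatches)

# Question 1: one column that uses Data1's value when it exists
# and falls back to Data2's value otherwise.
merged$Fruits <- ifelse(is.na(merged$Fruit.x), merged$Fruit.y, merged$Fruit.x)
result <- merged[, c("ID", "Fruits")]
print(result)
```

Note that this treats an NA in Fruit.x as "not present in Data1", so a genuinely missing value in Data1 would also be filled from Data2. The same steps would have to be repeated for Color and Weight.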
Thanks to everyone who can help!