Merge leading to unexpected results in R

Question

I am trying to combine:

    to_graph <- structure(
      list(
        Teacher = c("BS", "BS", "FA"),
        Level = structure(c(2L, 1L, 1L), .Label = c("BE", "AE", "ME", "EE"), class = "factor"),
        Count = c(2L, 25L, 28L)
      ),
      .Names = c("Teacher", "Level", "Count"),
      row.names = c(NA, 3L),
      class = "data.frame"
    )

and

    graph_avg <- structure(
      list(
        Teacher = structure(c(1L, 1L, 2L), .Label = c("BS", "FA"), class = "factor"),
        Count.Fraction = c(0.0740740740740741, 0.925925925925926, 1)
      ),
      .Names = c("Teacher", "Count.Fraction"),
      row.names = c(NA, -3L),
      class = "data.frame"
    )

with merge(to_graph, graph_avg, by="Teacher"), but instead of getting what I expect (3 rows), I get:

      Teacher Level Count Count.Fraction
    1      BS    AE     2     0.07407407
    2      BS    AE     2     0.92592593
    3      BS    BE    25     0.07407407
    4      BS    BE    25     0.92592593
    5      FA    BE    28     1.00000000

Any ideas? Thanks!

+4

r data-management

Jeff Erickson Nov 24 '11 at 22:26

2 answers

Since one of your data sets is clearly derived from the other, I suggest you don't need to merge at all. Instead, do the analysis in a way that keeps all the data together.

For example, use ddply in the plyr package to derive one set from the other. Notice how this result contains all the necessary information:

    > library(plyr)
    > ddply(to_graph, .(Teacher), transform, Count.Fraction=Count/sum(Count))
      Teacher Level Count Count.Fraction
    1      BS    AE     2     0.07407407
    2      BS    BE    25     0.92592593
    3      FA    BE    28     1.00000000
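If you would rather not depend on plyr, the same per-teacher fraction can be sketched in base R with ave(). This is only an illustration: it rebuilds to_graph with data.frame() instead of the structure() call from the question, and the names group_total and result are introduced here.

```r
# Same data as to_graph in the question, rebuilt with data.frame()
to_graph <- data.frame(
  Teacher = c("BS", "BS", "FA"),
  Level = factor(c("AE", "BE", "BE"), levels = c("BE", "AE", "ME", "EE")),
  Count = c(2L, 25L, 28L),
  stringsAsFactors = FALSE
)

# ave() returns the per-Teacher sum of Count, repeated on every row of that group
group_total <- ave(to_graph$Count, to_graph$Teacher, FUN = sum)

# Add the fraction as a new column; the number of rows is unchanged
result <- transform(to_graph, Count.Fraction = Count / group_total)
print(result)
```

Because nothing is joined, the row count stays at three and each fraction sits next to the Count it was computed from.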

To answer the question about merge: a merge in R is like a database join. To join two tables one-to-one, you must be able to match a primary key in both tables. The primary key in your case is the combination of Teacher and Level. Since the Level column does not exist in the second data.frame, a one-to-one merge is not possible: merging on Teacher alone pairs every BS row in one table with every BS row in the other.

The only way to recover from this situation is to add the missing part of the primary key back to the data. Assuming the data is sorted in exactly the same order, you can do it with cbind, and then do the merge:
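The five rows in the question follow directly from how a join pairs rows: for each key value, the result has (matches on the left) times (matches on the right) rows. A rough sketch of that arithmetic, with the data rebuilt via data.frame() and the names left_n, right_n and expected_rows introduced here for illustration:

```r
to_graph <- data.frame(
  Teacher = c("BS", "BS", "FA"),
  Level = factor(c("AE", "BE", "BE"), levels = c("BE", "AE", "ME", "EE")),
  Count = c(2L, 25L, 28L),
  stringsAsFactors = FALSE
)
graph_avg <- data.frame(
  Teacher = c("BS", "BS", "FA"),
  Count.Fraction = c(2, 25, 28) / c(27, 27, 28),
  stringsAsFactors = FALSE
)

# How many rows each Teacher has on each side of the join
left_n <- table(to_graph$Teacher)
right_n <- table(graph_avg$Teacher)

# Rows per key in the merged result = left matches * right matches
expected_rows <- sum(as.vector(left_n) * as.vector(right_n[names(left_n)]))

merged <- merge(to_graph, graph_avg, by = "Teacher")
print(expected_rows)
print(nrow(merged))
```

BS appears twice on each side (2 * 2 = 4 rows) and FA once on each side (1 * 1 = 1 row), which gives the five rows seen in the question.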

    > merge(to_graph, cbind(graph_avg, Level=to_graph$Level))
      Teacher Level Count Count.Fraction
    1      BS    AE     2     0.07407407
    2      BS    BE    25     0.92592593
    3      FA    BE    28     1.00000000
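Since the cbind step relies on both data.frames being in the same row order, it may be worth checking that assumption first. The sketch below is one possible check, not part of the original answer: it compares the Teacher columns and, because the fractions in this example are derived from Count, also recomputes them as a cross-check. The names same_teachers, recomputed and keyed are illustrative.

```r
to_graph <- data.frame(
  Teacher = c("BS", "BS", "FA"),
  Level = factor(c("AE", "BE", "BE"), levels = c("BE", "AE", "ME", "EE")),
  Count = c(2L, 25L, 28L),
  stringsAsFactors = FALSE
)
graph_avg <- data.frame(
  Teacher = factor(c("BS", "BS", "FA")),
  Count.Fraction = c(2, 25, 28) / c(27, 27, 28)
)

# Check 1: the Teacher columns line up row by row
same_teachers <- identical(as.character(to_graph$Teacher),
                           as.character(graph_avg$Teacher))

# Check 2: the fractions match what Count implies, so rows within a teacher line up too
recomputed <- to_graph$Count / ave(to_graph$Count, to_graph$Teacher, FUN = sum)
same_fractions <- isTRUE(all.equal(graph_avg$Count.Fraction, recomputed))

stopifnot(same_teachers, same_fractions)

# Only now add Level to graph_avg and merge on the full key (Teacher + Level)
keyed <- cbind(graph_avg, Level = to_graph$Level)
merged <- merge(to_graph, keyed)
print(merged)
```

If either check fails, stopifnot() stops the script before any misaligned rows can be glued together.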

+2

Andrie Nov 25 '11 at 14:54

John · Accepted Answer · 2011-11-24T22:40:25+0000

I'm not sure what you are trying to achieve. merge is doing exactly what it is supposed to do here.

Look at both data.frames and the result of merging them:

    graph_avg
      Teacher Count.Fraction
    1      BS     0.07407407
    2      BS     0.92592593
    3      FA     1.00000000

    to_graph
      Teacher Level Count
    1      BS    AE     2
    2      BS    BE    25
    3      FA    BE    28

    merge(to_graph, graph_avg)
      Teacher Level Count Count.Fraction
    1      BS    AE     2     0.07407407
    2      BS    AE     2     0.92592593
    3      BS    BE    25     0.07407407
    4      BS    BE    25     0.92592593
    5      FA    BE    28     1.00000000

When I am going to merge two data.frames, I look at them to see what they have in common and what I will get as a result. Teacher is in both. But if I merge on Teacher alone, what should happen? There is no unique identifier for BS: it appears twice in both data.frames. If it appeared only once in one of them, this would be easy to resolve. So I would check whether one data.frame has a column, such as Level, that together with Teacher uniquely identifies a row, and then find a way to combine the two that does not lose any of your data. merge is really convenient when you have a small data.frame with, say, each teacher in it once along with the teacher's age or sex. You can merge that into your other data.frame with repeated measures per teacher, and every row for a teacher will then carry that information too. But for what you are doing, it is not the right tool.

merge is not what you want here. If these really are your data.frames, use cbind instead.
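To make the "each teacher appears once" case concrete, here is a small sketch. The teacher_info table and its Age values are invented purely for illustration and do not come from the question.

```r
to_graph <- data.frame(
  Teacher = c("BS", "BS", "FA"),
  Level = factor(c("AE", "BE", "BE"), levels = c("BE", "AE", "ME", "EE")),
  Count = c(2L, 25L, 28L),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: one row per teacher (ages are made up)
teacher_info <- data.frame(
  Teacher = c("BS", "FA"),
  Age = c(41L, 35L),
  stringsAsFactors = FALSE
)

# The key is unique on the lookup side, so the merge cannot multiply rows
stopifnot(anyDuplicated(teacher_info$Teacher) == 0)

merged <- merge(to_graph, teacher_info, by = "Teacher")
print(merged)
```

Here the result keeps the three rows of to_graph, and each row simply gains the Age of its teacher.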

    cbind(to_graph, Count.Fraction = graph_avg$Count.Fraction)
      Teacher Level Count Count.Fraction
    1      BS    AE     2     0.07407407
    2      BS    BE    25     0.92592593
    3      FA    BE    28     1.00000000

This is probably what you were looking for.
