Using what you call "shortcuts" is actually the right way to do things in R. I do think writing the loop by hand is sometimes a good exercise, but in your "production code", i.e. the code you want to rely on, use the built-in functions when applicable.
You are just missing one argument to data.frame. Everything else is fine. The problem is that by default (in R versions before 4.0.0), data.frame converts character vectors to factors, and when you try to replace a value with a value from a factor vector, it is replaced with the underlying integer code of that level. Here is the complete code:
    df1 <- data.frame(x = 1:3, y = letters[1:3], stringsAsFactors = FALSE)
    df2 <- data.frame(x = rep(c(1, 2, 3), each = 3))
    for (i in 1:length(df2$x)) {
      df2$y[i] <- df1$y[which(df1$x == df2$x[i])]
    }
    df2
      x y
    1 1 a
    2 1 a
    3 1 a
    4 2 b
    5 2 b
    6 2 b
    7 3 c
    8 3 c
    9 3 c
For more information on the stringsAsFactors option, see ?data.frame.
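As a sketch of the "built-in functions" route mentioned at the top, the lookup done by the loop can probably be written as a single vectorized step with match(). The variable name idx is only illustrative, and the data is the same as in the code above.

```r
# Same data as above; stringsAsFactors = FALSE keeps y as character.
df1 <- data.frame(x = 1:3, y = letters[1:3], stringsAsFactors = FALSE)
df2 <- data.frame(x = rep(c(1, 2, 3), each = 3))

# match() returns, for each df2$x, the position of its first match in df1$x.
idx <- match(df2$x, df1$x)

# Index df1$y with those positions to fill the whole column at once.
df2$y <- df1$y[idx]
df2
```

merge(df2, df1, by = "x") is another built-in option, though by default it sorts the result by the key, so the row order may differ from the loop version on other data.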
Since you seem to be interested in learning, here is how you could start debugging. Suppose your commands are in a file called temp.R. Then:
    > source('temp.R')
    > ls()
    [1] "df1" "df2" "i"
i still exists after the for loop. We can take advantage of that: the next commands, which contain i, will work as written. You can reassign i to see what your command gives for other values. Now let's start breaking the code apart to see where the problem is.
    > i
    [1] 9
    > which(df1$x == df2$x[i])
    [1] 3
Looks good. 3 is what we expect here, right?
    > df1$y[which(df1$x == df2$x[i])]
    [1] c
    Levels: a b c
Here you need to recognize "oh, this is a factor!". Whenever you see Levels in the output, the factor light should go on in your head.
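To make the factor point concrete, here is a small standalone sketch of what a factor holds internally. The names f and third are only illustrative and are not part of the original code.

```r
# A factor stores integer codes plus a table of labels (the levels).
f <- factor(c("a", "b", "c"))

is.factor(f)       # TRUE
levels(f)          # "a" "b" "c"
as.integer(f)      # 1 2 3, the underlying codes
as.character(f)    # "a" "b" "c", the labels

# Picking one element keeps all the levels, which is why "Levels:" is printed.
third <- f[3]
as.integer(third)  # 3: the code for "c", not the letter itself
```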
Look at the value before we try the replacement to make sure that the rest of your code does not accidentally change it:
    > df2$y[9]
    [1] 3
Looks good. We know the result is wrong after the replacement, so something is going wrong in the assignment. Let's try the assignment on its own to see what happens:
    > df2$y[9] <- as.factor("c")
    > df2$y[9]
    [1] 1
Clearly, something is wrong, so we have narrowed the problem down to this step. Now we need to work backwards to find out why we are assigning a factor. Hopefully this leads you to the data.frame help page.
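One way to confirm where the factor comes from is to inspect the column types of df1 directly. The sketch below sets stringsAsFactors = TRUE explicitly so that it reproduces the problem on any R version; that is an assumption about how the original df1 was built, and col_classes is just an illustrative name.

```r
# stringsAsFactors = TRUE is set explicitly here to reproduce the problem
# on any R version (an assumption about how the original df1 was built).
df1 <- data.frame(x = 1:3, y = letters[1:3], stringsAsFactors = TRUE)

str(df1)                          # lists y as a Factor with 3 levels
col_classes <- sapply(df1, class)
col_classes                       # x is "integer", y is "factor"

# Convert the factor column to character before using it in assignments.
df1$y <- as.character(df1$y)
class(df1$y)                      # "character"
```

Converting with as.character() after the fact is an alternative to passing stringsAsFactors = FALSE when the data frame is created.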
Things like this are annoying in R, but you just have to trust that there are reasons for the behavior, and once you learn more about coding in R and more of the R philosophy, you won't run into so many surprises. Good luck!