How to compare df1 and df2 of unequal length and assign values in R

These are the definitions of df1 and df2:

    df1 <- data.frame(x = 1:3, y = letters[1:3])
    df2 <- data.frame(x = rep(c(1, 2, 3), each = 3))

I want to copy the values of column y in df1 into a new column y in df2, wherever the value of x in df1 equals the value of x in df2. As shown above, df1 and df2 have different numbers of rows.

    for (i in 1:length(df2$x)) {
      df2$y[i] <- df1$y[which(df1$x == df2$x[i])]
    }

I am not looking for shortcuts (no built-in functions, please). I want to learn to do this the right way.

Is my logic correct? If so, why does this not work?

Any guidance would be greatly appreciated.

1 answer

Using what you call "shortcuts" is actually the right way to do things in R, although I do think that writing a loop by hand is sometimes a good exercise. In your "production code", i.e. code you want to rely on, use the built-in functions when applicable.
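For comparison, here is a minimal sketch of what the built-in route could look like. It uses match() from base R. The variable name idx is only illustrative, and this is one possible approach rather than the only one.

```r
# Same data as in the question; stringsAsFactors = FALSE keeps y as character
df1 <- data.frame(x = 1:3, y = letters[1:3], stringsAsFactors = FALSE)
df2 <- data.frame(x = rep(c(1, 2, 3), each = 3))

# For each value of df2$x, match() returns the position of the first
# equal value in df1$x (or NA when there is no match)
idx <- match(df2$x, df1$x)

# Index df1$y with those positions to fill the new column in one step
df2$y <- df1$y[idx]
```

This produces the same column as the loop, without the element-by-element assignment.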

You are just missing one argument to data.frame. Everything else is fine. The problem is that by default (in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE), character vectors are stored as factors in a data.frame, and when you assign an element of a factor into a plain vector, what gets stored is the underlying integer code of that level rather than its label. Here is the complete code:

    df1 <- data.frame(x = 1:3, y = letters[1:3], stringsAsFactors = FALSE)
    df2 <- data.frame(x = rep(c(1, 2, 3), each = 3))
    for (i in 1:length(df2$x)) {
      df2$y[i] <- df1$y[which(df1$x == df2$x[i])]
    }
    df2
      x y
    1 1 a
    2 1 a
    3 1 a
    4 2 b
    5 2 b
    6 2 b
    7 3 c
    8 3 c
    9 3 c

For more information on the stringsAsFactors option, see ?data.frame.
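To make the difference between a factor's integer codes and its labels concrete, here is a small sketch. It sets stringsAsFactors = TRUE explicitly so that it behaves the same on any R version. The names codes and labels are illustrative, and converting with as.character() inside the loop is shown only as an assumed alternative to changing the data.frame() call.

```r
# Force factor storage explicitly so the result does not depend on the R version
df1 <- data.frame(x = 1:3, y = letters[1:3], stringsAsFactors = TRUE)

codes  <- as.integer(df1$y)    # underlying integer codes: 1 2 3
labels <- as.character(df1$y)  # the level labels: "a" "b" "c"

# Alternative fix: leave df1 as it is and convert to character at lookup time
df2 <- data.frame(x = rep(c(1, 2, 3), each = 3))
df2$y <- character(nrow(df2))  # pre-allocate a character column
for (i in seq_len(nrow(df2))) {
  df2$y[i] <- as.character(df1$y[which(df1$x == df2$x[i])])
}
```

Pre-allocating df2$y as a character vector also makes the intended type of the new column explicit before the loop starts.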

Since you seem interested in learning, here is how you could start debugging. Suppose your original commands are in a file called temp.R. Then:

    > source('temp.R')
    > ls()
    [1] "df1" "df2" "i"

i still exists after the for loop has finished. Let's take advantage of that, so the next commands that use i work as written. You can reassign i to see what your command gives for other values. Now let's start breaking the code apart to see where the problem is.

    > i
    [1] 9
    > which(df1$x == df2$x[i])
    [1] 3

Looks good. 3 is what we expect here, right?

    > df1$y[which(df1$x == df2$x[i])]
    [1] c
    Levels: a b c

This is where you need to recognize: "oh, this is a factor!" Whenever you see Levels, a factor warning light should go on in your head.
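Besides spotting Levels in printed output, you can also ask R directly whether a column is a factor. The sketch below uses only base R functions. The variable names are illustrative, and stringsAsFactors = TRUE is set explicitly to reproduce the situation from the question on any R version.

```r
# Reproduce the factor column from the question regardless of R version
df1 <- data.frame(x = 1:3, y = letters[1:3], stringsAsFactors = TRUE)

is_fac <- is.factor(df1$y)         # TRUE when the column is a factor
cls    <- class(df1$y)             # "factor"
lev    <- levels(df1$y)            # the level labels: "a" "b" "c"

# Check every column at once: a named character vector of column classes
col_classes <- sapply(df1, class)
```

Running str(df1) gives a similar per-column overview in a single call.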

Look at the value before we try the replacement to make sure that the rest of your code does not accidentally change it:

    > df2$y[9]
    [1] 3

Looks fine. We already know what the value is after the replacement, so something is wrong with the assignment. Let's try one by hand just to see what happens:

    > df2$y[9] <- as.factor("c")
    > df2$y[9]
    [1] 1

Clearly, something is wrong. So we have narrowed the problem down to this step. Now we need to work backwards to find out why we are assigning a factor in the first place. Hopefully that leads you to the help page for data.frame.

Things like this are annoying in R, but you just have to trust that there are reasons for this behavior. Once you learn more about coding in R and more of the R philosophy, you won't run into so many surprises. Good luck!


Source: https://habr.com/ru/post/1384555/

