How to compare two data frames?

I have two data frames, each of which has two columns (e.g. x and y). I need to compare the two data frames and see whether any of the values in x or in y are the same, or whether both x and y match, across the two data frames.

+6
2 answers

Use the all.equal() function. It does not sort the data frames; it simply checks each cell in one data frame against the corresponding cell in the other. You can also use the identical() function.
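As a rough illustration of the difference between the two functions (the data frames df_a, df_b, df_c and df_d below are made up for this sketch, not taken from the question): all.equal() compares numeric values with a small tolerance and returns a description of the differences instead of FALSE, so it is usually wrapped in isTRUE(), while identical() requires exact equality.

```r
# Made-up example data; the names df_a .. df_d are assumptions for this sketch.
df_a <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
df_b <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
df_c <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6 + 1e-10))  # tiny numeric difference
df_d <- df_a[3:1, ]                                         # same rows, reversed order

# Exact copies: both functions agree.
same_exact   <- identical(df_a, df_b)
same_approx  <- isTRUE(all.equal(df_a, df_b))

# A tiny floating-point difference: identical() is strict, all.equal() is tolerant.
tiny_exact   <- identical(df_a, df_c)
tiny_approx  <- isTRUE(all.equal(df_a, df_c))

# Reordered rows: neither function sorts, so the comparison fails.
order_approx <- isTRUE(all.equal(df_a, df_d))

print(c(same_exact, same_approx, tiny_exact, tiny_approx, order_approx))
```

If row order should not matter, both data frames would need to be sorted the same way before comparing.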

+27

Without an example, I cannot be sure that I understand what you want. However, I think you want something like the following. If so, there are almost certainly better ways to do the same thing.

    a <- matrix(c(1,2, 3,4, 5,6, 7,8), nrow=4, byrow=TRUE, dimnames = list(NULL, c("x","y")))
    b <- matrix(c(1,2, 9,4, 9,6, 7,9), nrow=4, byrow=TRUE, dimnames = list(NULL, c("x","y")))

    # cc starts out all NA and keeps only the cells where a and b agree
    cc <- matrix(c(NA,NA, NA,NA, NA,NA, NA,NA), nrow=4, byrow=TRUE, dimnames = list(NULL, c("x","y")))

    for(i in 1:dim(a)[1]) {
      for(j in 1:dim(a)[2]) {
        if(a[i,j]==b[i,j]) cc[i,j]=a[i,j]
      }
    }

    cc
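One of the "better ways" might be to skip the loop and use logical indexing. This is only a sketch of that idea, using the same a and b as above; the names cc_vec and expected are introduced here for illustration.

```r
a <- matrix(c(1,2, 3,4, 5,6, 7,8), nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("x", "y")))
b <- matrix(c(1,2, 9,4, 9,6, 7,9), nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("x", "y")))

# Start from a copy of a and blank out every cell where a and b disagree.
cc_vec <- a
cc_vec[a != b] <- NA

# What the loop version produces for these inputs, written out by hand.
expected <- matrix(c(1,2, NA,4, NA,6, 7,NA), nrow = 4, byrow = TRUE,
                   dimnames = list(NULL, c("x", "y")))

print(cc_vec)
```

Note that this assumes neither matrix already contains NA; with missing values, a != b is itself NA for those cells and they would need separate handling.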

EDIT: January 8, 2013

The next line tells you which cells differ between the two matrices:

    which(a != b, arr.ind=TRUE)
    #      row col
    # [1,]   2   1
    # [2,]   3   1
    # [3,]   4   2

If the two matrices a and b are identical, then:

    which(a != b)
    # integer(0)

    which(a != b, arr.ind=TRUE)
    #      row col

EDIT January 9, 2012

The following code demonstrates the impact that row names can have on identical, all.equal and which when one of the two data frames is created by subsetting a third data frame. If the row names differ between the two compared data frames, neither identical nor all.equal returns TRUE. However, which can still be used to compare columns x and y between the two data frames. If the row names are set to NULL for each of the two compared data frames, then both identical and all.equal will return TRUE.

    df1 <- read.table(text = "
        group x y
        1 10 20
        1 10 20
        1 10 20
        1 10 20
        2 1 2
        2 3 4
        2 5 6
        2 7 8
    ", sep = "", header = TRUE)

    df2 <- read.table(text = "
        group x y
        2 1 2
        2 3 4
        2 5 6
        2 7 8
    ", sep = "", header = TRUE)

    # df3 is a subset of df1
    df3 <- df1[df1$group==2,]

    # rownames differ between df2 and df3 and
    # therefore neither 'all.equal' nor 'identical' return TRUE
    # even though the i,j cells of df2 and df3 are the same.
    # Note that 'which' indicates no i,j cells differ between df2 and df3
    df2
    df3
    all.equal(df2, df3)
    identical(df2, df3)
    which(df2 != df3)

    # set row names to NULL in both data sets and
    # now both 'all.equal' and 'identical' return TRUE.
    # Note that 'which' still indicates no i,j cells differ between df2 and df3
    rownames(df2) <- NULL
    rownames(df3) <- NULL
    df2
    df3
    all.equal(df2, df3)
    identical(df2, df3)
    which(df2 != df3)

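If this comparison is needed more than once, the row-name reset could be wrapped in a small helper. The function same_cells() below is hypothetical (it is not part of base R or of the answer above), and the data frames full, part and sub are made up for the sketch.

```r
# Hypothetical helper: compare two data frames cell by cell while
# ignoring row names. Works on copies, so the inputs are left unchanged.
same_cells <- function(d1, d2) {
  rownames(d1) <- NULL
  rownames(d2) <- NULL
  isTRUE(all.equal(d1, d2))
}

# Made-up data mirroring the situation above: 'sub' is a subset of 'full'
# and therefore keeps the original row names 3 and 4.
full <- data.frame(group = c(1, 1, 2, 2), x = c(10, 10, 1, 3), y = c(20, 20, 2, 4))
part <- data.frame(group = c(2, 2), x = c(1, 3), y = c(2, 4))
sub  <- full[full$group == 2, ]

strict_result <- identical(part, sub)   # row names differ
helper_result <- same_cells(part, sub)  # row names ignored

print(c(strict_result, helper_result))
```

Because the helper relies on all.equal(), it also inherits that function's numeric tolerance; swapping in identical() would make the check exact.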
+2

Source: https://habr.com/ru/post/917782/

