Create a new data frame to match the contrast between two similar df

I have a dataframe made like this:

    X Y  Z T
    1 2  4 2
    3 2  1 4
    7 5 NA 3

After several steps (it doesn't matter which ones), I got this df:

    X Y  Z T
    1 2  4 2
    3 2 NA 4
    7 5 NA 3

I want to get a new data frame made up only of the rows that did not change during those steps. The result should look like this:

    X Y  Z T
    1 2  4 2
    7 5 NA 3

How can I do this?

4 answers

One option in base R is to paste the rows of each dataset into strings and compare them (==) to create a logical vector, which we then use to subset the dataset:

    dfO[do.call(paste, dfO) == do.call(paste, df), ]
    #   X Y  Z T
    # 1 1 2  4 2
    # 3 7 5 NA 3

where 'dfO' is the old dataset and 'df' is the new one.
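The comparison can be broken into steps to see why rows containing NA are still matched. This is only a sketch: it rebuilds the question's data with data.frame(), and the intermediate names key_old, key_new and unchanged are made up for illustration. It assumes both data frames have the same number of rows in the same order.

```r
# Old and new versions of the data from the question.
dfO <- data.frame(X = c(1, 3, 7), Y = c(2, 2, 5), Z = c(4, 1, NA), T = c(2, 4, 3))
df  <- data.frame(X = c(1, 3, 7), Y = c(2, 2, 5), Z = c(4, NA, NA), T = c(2, 4, 3))

# Build one string per row; paste() renders NA as the text "NA",
# so two rows with NA in the same column produce equal strings.
key_old <- do.call(paste, dfO)
key_new <- do.call(paste, df)

# TRUE where the row at the same position is identical in both frames.
unchanged <- key_old == key_new

result <- dfO[unchanged, ]
print(result)
```

Because the values are compared as text, this works well for small integer-like data such as this; for floating-point columns the string form may hide or exaggerate tiny differences.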


You can use the dplyr intersect function:

    library(dplyr)
    intersect(d1, d2)
    #   X Y  Z T
    # 1 1 2  4 2
    # 2 7 5 NA 3

This is the data.frame equivalent of the base R function intersect.

If you work with data.tables, that package also provides such a function:

    library(data.table)
    setDT(d1)
    setDT(d2)
    fintersect(d1, d2)
    #    X Y  Z T
    # 1: 1 2  4 2
    # 2: 7 5 NA 3

I'm afraid that neither semi join, nor intersect, nor merge is the correct answer. merge and intersect will not handle duplicate rows properly, and a semi join will reorder the rows.

From this point of view, I think the only correct answer so far is akrun's.
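The duplicate-row problem can be seen with base merge on a small made-up example. The data frames old and new below are hypothetical and not taken from the question; they are only a sketch of the issue.

```r
# Rows 1 and 2 are duplicates and unchanged; row 3 was modified.
old <- data.frame(X = c(1, 1, 3), Y = c(2, 2, 4))
new <- data.frame(X = c(1, 1, 9), Y = c(2, 2, 4))

# merge() joins on all common columns, so every duplicate in `old`
# is paired with every duplicate in `new`.
merged <- merge(old, new)

# Position-by-position comparison keeps exactly the unchanged rows.
positional <- old[do.call(paste, old) == do.call(paste, new), ]

print(nrow(merged))      # 4 rows: the duplicated pair is multiplied
print(nrow(positional))  # 2 rows: the rows that did not change
```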

You can also do something like:

    df1[rowSums((df1 == df2) | (is.na(df1) & is.na(df2)), na.rm = TRUE) == ncol(df1), ]

But I think akrun's method is more elegant and probably performs better in terms of speed.
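To make the rowSums expression easier to follow, here is the same logic split into named steps. It is a sketch using the question's data; the names same_value, both_na, cell_match and keep are introduced only for illustration.

```r
df1 <- data.frame(X = c(1, 3, 7), Y = c(2, 2, 5), Z = c(4, 1, NA), T = c(2, 4, 3))
df2 <- data.frame(X = c(1, 3, 7), Y = c(2, 2, 5), Z = c(4, NA, NA), T = c(2, 4, 3))

# Cell-wise equality; the result is NA wherever either side is NA.
same_value <- df1 == df2

# TRUE only where both cells are NA, which we want to count as a match.
both_na <- is.na(df1) & is.na(df2)

# NA | TRUE is TRUE, NA | FALSE stays NA.
cell_match <- same_value | both_na

# A row is unchanged when all of its cells match; remaining NAs
# are dropped by na.rm = TRUE and therefore count as mismatches.
keep <- rowSums(cell_match, na.rm = TRUE) == ncol(df1)

result <- df1[keep, ]
print(result)
```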


Another dplyr solution: semi_join.

    library(dplyr)
    dt1 %>% semi_join(dt2, by = colnames(.))
    #   X Y  Z T
    # 1 1 2  4 2
    # 2 7 5 NA 3

Data

    dt1 <- read.table(text = "X Y Z T
                              1 2 4 2
                              3 2 1 4
                              7 5 NA 3",
                      header = TRUE, stringsAsFactors = FALSE)

    dt2 <- read.table(text = "X Y Z T
                              1 2 4 2
                              3 2 NA 4
                              7 5 NA 3",
                      header = TRUE, stringsAsFactors = FALSE)

Source: https://habr.com/ru/post/1271685/
