Create a new data frame to match the contrast between two similar df

Question


I have a data frame like this:

  X Y  Z T
  1 2  4 2
  3 2  1 4
  7 5 NA 3

After several processing steps (which ones doesn't matter), I get this data frame:

  X Y  Z T
  1 2  4 2
  3 2 NA 4
  7 5 NA 3

I want a new data frame that contains only the rows that did not change during those steps. The result should be:

  X Y  Z T
  1 2  4 2
  7 5 NA 3

How can I do this?


Tags: r, dataframe

Silvia Sep 11 '17 at 10:13


4 answers

You can use the dplyr intersect function:

 library(dplyr)
 intersect(d1, d2)
 #   X Y  Z T
 # 1 1 2  4 2
 # 2 7 5 NA 3

This is the data.frame equivalent of the base R function intersect.

If you work with data.tables, that package provides the same operation as fintersect:

 library(data.table)
 setDT(d1)
 setDT(d2)
 fintersect(d1, d2)
 #    X Y  Z T
 # 1: 1 2  4 2
 # 2: 7 5 NA 3


docendo discimus Sep 11 '17 at 10:21


I'm afraid the answers based on a semi join, intersect, or merge are not fully correct: merge and intersect do not handle duplicate rows properly, and a semi join reorders the rows.

From that point of view, I think the only correct answer is still akrun's.
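The duplicate-row point can be illustrated in base R. In this sketch (the data frames old and new and the helper row_key are hypothetical, made up for illustration), the first two rows are identical and unchanged. A positional comparison keeps both copies, while a set-style result, built here with unique(), keeps only one.

```r
# Hypothetical data: rows 1 and 2 are identical, and neither changes
old <- data.frame(X = c(1, 1, 3), Y = c(2, 2, 2), Z = c(4, 4, 1))
new <- data.frame(X = c(1, 1, 3), Y = c(2, 2, 2), Z = c(4, 4, NA))

# One string key per row (NA is pasted as the string "NA")
row_key <- function(d) do.call(paste, d)

# Positional comparison: row i of old against row i of new
positional <- old[row_key(old) == row_key(new), ]

# Set-style matching: rows of old found anywhere in new, then deduplicated
set_style <- unique(old[row_key(old) %in% row_key(new), ])

nrow(positional)  # 2: both copies of the unchanged row are kept
nrow(set_style)   # 1: the duplicate is collapsed
```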

You can also do something like:

 same <- (df1 == df2) | (is.na(df1) & is.na(df2))  # equal, or NA in both
 df1[rowSums(same, na.rm = TRUE) == ncol(df1), ]    # keep rows where every column matched

But I think akrun's method is more elegant and probably faster.
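Whether the paste() approach is actually faster than the rowSums() comparison depends on the data, so the claim is easy to check locally. The sketch below uses hypothetical random data (the row count and the roughly 10% change rate are arbitrary choices), times both expressions with system.time(), and checks that they flag the same rows. No timings are claimed here.

```r
set.seed(42)
n <- 1e5
vals <- c(1:5, NA)

# Hypothetical data: four integer columns with some NAs
df1 <- data.frame(X = sample(vals, n, replace = TRUE),
                  Y = sample(vals, n, replace = TRUE),
                  Z = sample(vals, n, replace = TRUE),
                  T = sample(vals, n, replace = TRUE))

# Copy it and overwrite Z with NA in about 10% of the rows
df2 <- df1
df2$Z[sample(n, n / 10)] <- NA

# Cell-wise comparison with explicit NA handling
t_rowsums <- system.time({
  same <- (df1 == df2) | (is.na(df1) & is.na(df2))
  keep_rowsums <- rowSums(same, na.rm = TRUE) == ncol(df1)
})

# One pasted string per row, compared position by position
t_paste <- system.time({
  keep_paste <- do.call(paste, df1) == do.call(paste, df2)
})

# Both should flag the same rows (rowSums() keeps row names, so drop them)
identical(unname(keep_rowsums), keep_paste)

# Elapsed seconds for each approach on this machine
c(rowSums = t_rowsums[["elapsed"]], paste = t_paste[["elapsed"]])
```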


ira Sep 11 '17 at 11:21


Another dplyr solution: semi_join.

 dt1 %>% semi_join(dt2, by = colnames(.))
 #   X Y  Z T
 # 1 1 2  4 2
 # 2 7 5 NA 3

Data

 dt1 <- read.table(text = "X Y Z T
 1 2 4 2
 3 2 1 4
 7 5 NA 3", header = TRUE, stringsAsFactors = FALSE)
 dt2 <- read.table(text = "X Y Z T
 1 2 4 2
 3 2 NA 4
 7 5 NA 3", header = TRUE, stringsAsFactors = FALSE)
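Without dplyr, a semi join on all columns can be approximated in base R by matching row keys with %in%. This is a sketch, and the helper row_key and the shuffled copy are hypothetical. Unlike the positional comparison in akrun's answer, it still finds the unchanged rows when the second data frame lists its rows in a different order. It also treats NA as equal to NA, because NA is pasted as the string "NA".

```r
# Same values as the Data block above
dt1 <- data.frame(X = c(1, 3, 7), Y = c(2, 2, 5), Z = c(4, 1, NA), T = c(2, 4, 3))
dt2 <- data.frame(X = c(1, 3, 7), Y = c(2, 2, 5), Z = c(4, NA, NA), T = c(2, 4, 3))

# One string key per row; NA becomes "NA", so NA matches NA
row_key <- function(d) do.call(paste, d)

# Semi join: keep rows of dt1 whose key appears anywhere in dt2
semi <- dt1[row_key(dt1) %in% row_key(dt2), ]

# The same rows are found when dt2's rows are shuffled...
dt2_shuffled <- dt2[c(3, 1, 2), ]
semi_shuffled <- dt1[row_key(dt1) %in% row_key(dt2_shuffled), ]

# ...whereas a positional comparison no longer lines the rows up
positional_shuffled <- dt1[row_key(dt1) == row_key(dt2_shuffled), ]
```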


www Sep 11 '17 at 10:25


akrun · Accepted Answer · 2017-09-11T10:19:31+0000

One option with base R is to paste together the values in each row of both data sets and compare the results (==). This gives a logical vector that we use to subset the old data set:

 dfO[do.call(paste, dfO) == do.call(paste, df), ]
 #   X Y  Z T
 # 1 1 2  4 2
 # 3 7 5 NA 3

where dfO is the old data set and df is the new one.
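One caveat with pasting rows: paste() joins columns with a single space by default, so character columns that themselves contain spaces can give two different rows the same key. The sketch below shows the problem and one workaround, passing a separator that is unlikely to occur in the data. The choice of "\r" and the example columns are assumptions. Like the answer above, the comparison also assumes both data frames have the same number of rows in the same order.

```r
# Hypothetical character data where row 1 differs but pastes identically
old <- data.frame(a = c("x y", "p"), b = c("z", "q"), stringsAsFactors = FALSE)
new <- data.frame(a = c("x", "p"), b = c("y z", "q"), stringsAsFactors = FALSE)

# Default sep = " ": row 1 becomes "x y z" in both, a false match
naive <- do.call(paste, old) == do.call(paste, new)

# A separator unlikely to appear in the data keeps the rows apart
row_key <- function(d, sep = "\r") do.call(paste, c(d, sep = sep))
safer <- row_key(old) == row_key(new)

old[safer, ]  # only the genuinely unchanged row 2
```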
