Assign values to a df$column from another df?

Example: I have a df whose first column is

    dat <- c("A", "B", "C", "A")

and another df, dat2, whose first two columns are:

    dat2[, 1]
    [1] A B C
    Levels: A B C
    dat2[, 2]
    [1] 21000 23400 26800

How can I add the values from the second df (dat2) to the first df (dat)? The keys repeat in the first df, and every time "A" appears I want the corresponding value (21000) from the second df added in a new column.

4 answers

First, create reproducible data frames:

    dat1 <- data.frame(x1 = c("A", "B", "C", "A"),
                       stringsAsFactors = FALSE)
    dat2 <- data.frame(x1 = c("A", "B", "C"),
                       x2 = c(21000, 23400, 26800),
                       stringsAsFactors = FALSE)

Then use the match function.

    dat1$dat2_vals <- dat2$x2[match(dat1$x1, dat2$x1)]

It is safest to store the key columns as character rather than factor, so that the comparison is made on the labels themselves and not on the underlying integer codes. I mention this because of the Levels attribute shown in your dat2 output.
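To make the lookup step more concrete, here is a minimal sketch that splits it in two and shows what happens when a key has no counterpart. The extra key "D" and the intermediate name idx are assumptions added for illustration; they are not part of the question.

```r
# "D" is a hypothetical key that does not occur in dat2
dat1 <- data.frame(x1 = c("A", "B", "C", "A", "D"),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(x1 = c("A", "B", "C"),
                   x2 = c(21000, 23400, 26800),
                   stringsAsFactors = FALSE)

# For each key in dat1, the row position of its first match in dat2
idx <- match(dat1$x1, dat2$x1)
idx
# [1]  1  2  3  1 NA

# Indexing with NA yields NA, so unmatched keys get a missing value
dat1$dat2_vals <- dat2$x2[idx]
```

Rows without a match end up as NA rather than being dropped, so dat1 keeps its original row count and order.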


A third option, which I prefer, is left_join from dplyr. It seems to be faster than merge with large data frames.

    library(dplyr)

    dat1 <- data.frame(x1 = c("A", "B", "C", "A"),
                       stringsAsFactors = FALSE)
    dat2 <- data.frame(x1 = c("A", "B", "C"),
                       x2 = c(21000, 23400, 26800),
                       stringsAsFactors = FALSE)

    # Keep every row of dat1 and attach the matching x2 from dat2
    dat1 <- left_join(dat1, dat2, by = "x1")

Let's race the two on big data with microbenchmark, just for fun!

Create large data frames:

    dat1 <- data.frame(x1 = rep(c("A", "B", "C", "A"), 1000),
                       stringsAsFactors = FALSE)
    # runif(1, 0) draws a single value, which data.frame() recycles across all rows.
    # The keys repeat in both data frames, so the join is many-to-many.
    dat2 <- data.frame(x1 = rep(c("A", "B", "C", "D"), 1000),
                       x2 = runif(1, 0),
                       stringsAsFactors = FALSE)

On your marks, get set, GO!

    library(dplyr)
    library(microbenchmark)

    mbm <- microbenchmark(
      left_join = left_join(dat1, dat2, by = "x1"),
      merge     = merge(dat1, dat2, by = "x1"),
      times     = 20
    )

Many, many seconds later... left_join is MUCH faster than merge for large data frames.


Use the merge function.

    # Input data
    dat <- data.frame(ID = c("A", "B", "C", "A"))
    dat2 <- data.frame(ID = c("A", "B", "C"), value = c(1, 2, 3))

    # Merge two data.frames by specified column
    merge(dat, dat2, by = "ID")
      ID value
    1  A     1
    2  A     1
    3  B     2
    4  C     3
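Note that the merged result comes back sorted by ID, not in the original row order of dat. If the original order matters, one possible workaround is sketched below. The helper column row and the name out are assumptions introduced only for this example, and all.x = TRUE is added so that rows of dat without a match would be kept.

```r
dat <- data.frame(ID = c("A", "B", "C", "A"))
dat2 <- data.frame(ID = c("A", "B", "C"), value = c(1, 2, 3))

# Hypothetical helper column recording the original row order
dat$row <- seq_len(nrow(dat))

# all.x = TRUE keeps every row of dat, even those without a match in dat2
out <- merge(dat, dat2, by = "ID", all.x = TRUE)

# Restore the original order, then drop the helper column
out <- out[order(out$row), ]
out$row <- NULL
```

After this, out lists the IDs as A, B, C, A with values 1, 2, 3, 1.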

Source: https://habr.com/ru/post/1271550/

