Data Merge in R

Question

I have dataset A

paper_id author_id
  1       521630
  1       1611750
  2       9
  3       627950
  4       1456512
  8       15
  ........

and dataset B

author_id    author_name        author_affiliation
    9       Ernest Jordan            Cambridge                                                    
    14         K. MORIBE               NA                                                 
    15     D. Jakominich               NA                                                 
    25     William H. Nailon                                                                
    37     P. B. Littlewood    Cavendish Laboratory|Cambridge University 
    ........

I want to combine these two datasets so that the merge is done through author_id, but the result should look like:

paper id    author_id        author_name     author_affiliation
  2            9             Ernest Jordan     Cambridge
  8            15            D. Jakominich       NA

That is, the merge should be done on author_id, but the result should keep the original paper_id order of dataset A.

Here is what I am doing:

b <- merge(A, B, by="author_id")

and this is what I get. The paper_id order is lost:

 author_id paper_id       author_name      author_affiliation
     9     1468598       Ernest Jordan       cambridge
     9     1682105       Ernest Jordan       cambridge

I then have to sort this result by the paper_id column, which is very inefficient.

How can I merge and keep the paper_id order directly?

Thanks.

+4

merge r

user3171906 Mar 18 '14 at 21:36

3 answers

Try the following.

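b <- merge(A, B, by="author_id", sort=FALSE)
b <- b[, c(2,1,3,4)]

With by=... and sort=FALSE, merge(...) does not sort the result on the key, but it still puts the key column first. The second line swaps columns 1 and 2 so that paper_id comes first.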
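The column swap can also be written by name rather than by position, which keeps working if the column layout changes. This is only a sketch on made-up toy data; the values in A and B below are assumptions modelled on the question, not the asker's real data.

```r
# Toy data (hypothetical values, column names taken from the question)
A <- data.frame(paper_id = c(1, 2, 8), author_id = c(521630, 9, 15))
B <- data.frame(author_id = c(9, 14, 15),
                author_name = c("Ernest Jordan", "K. MORIBE", "D. Jakominich"),
                author_affiliation = c("Cambridge", NA, NA),
                stringsAsFactors = FALSE)

b <- merge(A, B, by = "author_id", sort = FALSE)

# merge() puts the key column first; move paper_id to the front by name
b <- b[, c("paper_id", setdiff(names(b), "paper_id"))]
names(b)
```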

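EDIT (in response to @BrianDiggs)

As @BrianDiggs points out, using sort=F with by=... does not guarantee that the order of A is preserved. So here is a way to do it with data.table: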

# create an example
A <- data.frame(paper_id=1:10000, author_id=rev(LETTERS[1:4]))
B <- data.frame(author_id=LETTERS[1:4],
                author_name=c("Davies","Hawking","Carlyle","Higgs"),
                author_affiliation=c("Oxford","Cambridge","UCL","Edinburgh"),
                stringsAsFactors=F)

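library(data.table)
# key both tables on the join column
A <- data.table(A, key="author_id")
B <- data.table(B, key="author_id")
# join on the key and add B's columns to A by reference
A[B, c("author_name","author_affiliation") := list(author_name, author_affiliation)]
# re-key on paper_id, which sorts A back into paper_id order
setkey(A, paper_id)
head(A)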
#    paper_id author_id author_name author_affiliation
# 1:        1         D       Higgs          Edinburgh
# 2:        2         C     Carlyle                UCL
# 3:        3         B     Hawking          Cambridge
# 4:        4         A      Davies             Oxford
# 5:        5         D       Higgs          Edinburgh
# 6:        6         C     Carlyle                UCL

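This still involves a sort (through setkey rather than sort(...)), but data.table sorts keys with a radix sort, which is fast.

Also, the A[B,...] join with := adds the new columns to A by reference, so no copy of A is made, unlike with merge(...).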
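For anyone who wants to stay in base R, a similar effect can be had by recording each row's position in A before the merge and restoring it afterwards. This is a rough sketch, not a benchmarked solution: the toy data and the helper column name row_pos are assumptions, and it still sorts once, though only on an integer index.

```r
# Toy data (hypothetical values); author ids are deliberately not in paper_id order
A <- data.frame(paper_id = c(1, 2, 3, 8), author_id = c(15, 37, 14, 9))
B <- data.frame(author_id = c(9, 14, 15, 37),
                author_name = c("Ernest Jordan", "K. MORIBE",
                                "D. Jakominich", "P. B. Littlewood"),
                stringsAsFactors = FALSE)

A$row_pos <- seq_len(nrow(A))        # remember the original row order of A
b <- merge(A, B, by = "author_id")   # row order after this is not the order of A
b <- b[order(b$row_pos), ]           # restore the order of A
b$row_pos <- NULL                    # drop the helper column
rownames(b) <- NULL
b <- b[, c("paper_id", "author_id", "author_name")]
b
```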

+2

jlhoward Mar 18 '14 at 21:56

When I run your example, I get the following:

A <- read.table(text="paper_id author_id
1       521630
1       1611750
2       9
3       627950
4       1456512
8       15", header=T)

B <- read.table(text="author_id  author_name author_affiliation
9       Ernest_Jordan            Cambridge
14         K._MORIBE               NA
15     D._Jakominich               NA
25     William_H._Nailon           NA
37     P._B._Littlewood    Cavendish_Laboratory|Cambridge_University", 
header=T)

b <- merge(A, B, by="author_id")
b
#   author_id paper_id   author_name author_affiliation
# 1         9        2 Ernest_Jordan          Cambridge
# 2        15        8 D._Jakominich               <NA>

Is this not the result you want?
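One possible reason the small example looks fine is that the matching author ids happen to increase along with paper_id. A sketch with made-up ids that run the other way shows how the default sort on the key changes the paper_id order:

```r
# Toy data (hypothetical values): author_id is not ordered the same way as paper_id
A <- data.frame(paper_id = 1:3, author_id = c(15, 9, 14))
B <- data.frame(author_id = c(9, 14, 15),
                author_name = c("Ernest_Jordan", "K._MORIBE", "D._Jakominich"),
                stringsAsFactors = FALSE)

b <- merge(A, B, by = "author_id")   # default sort = TRUE orders rows by author_id
b$paper_id
# [1] 2 3 1
```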

0

gung Mar 18 '14 at 21:48

Henrik · Accepted Answer · 2014-03-18T22:23:04+0000

If you are not restricted to base R, plyr has an alternative to merge: join. From ?join: "Unlike merge, preserves the order of x no matter what join type is used."

library(plyr)
join(A, B, type = "inner")
# Joining by: author_id
#   paper_id author_id  author_name author_affiliation
# 1        2         9 ErnestJordan          Cambridge
# 2        8        15   Jakominich               <NA>

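inner_join from dplyr is another option. In the output below, the row order of x is preserved, but the columns of y come out in a different order: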

library(dplyr)
inner_join(x = A, y = B)
# Joining by: "author_id"
#   paper_id author_id author_affiliation  author_name
# 1        2         9          Cambridge ErnestJordan
# 2        8        15               <NA>   Jakominich
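If neither package is available, an order-preserving inner join can be sketched in base R with match(), which looks up each author_id of A in B without reordering A. This assumes author_id is unique in B (match() only returns the first hit); the toy data below is modelled on the question and is not the asker's real data.

```r
# Toy data (hypothetical values, column names taken from the question)
A <- data.frame(paper_id = c(1, 1, 2, 3, 4, 8),
                author_id = c(521630, 1611750, 9, 627950, 1456512, 15))
B <- data.frame(author_id = c(9, 14, 15, 37),
                author_name = c("Ernest Jordan", "K. MORIBE",
                                "D. Jakominich", "P. B. Littlewood"),
                author_affiliation = c("Cambridge", NA, NA,
                                       "Cavendish Laboratory|Cambridge University"),
                stringsAsFactors = FALSE)

idx  <- match(A$author_id, B$author_id)  # row of B for each row of A, NA if absent
keep <- !is.na(idx)                      # inner join: keep only rows of A with a match
res  <- cbind(A[keep, ],
              B[idx[keep], c("author_name", "author_affiliation")])
rownames(res) <- NULL
res
```

Because rows are only looked up and filtered, never sorted, the result stays in the row order of A.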
