Intersect data frames on multiple columns

I have 2 data frames:

dat: 1900 obs of 9 variables

                      V1    V2        V3        V4 V5      V6       V7     V8             V9
    1        V_P50P50_Q3 chr12 106642383 106642395  + 18.1425 4.03e-08 0.0515  GGGGGACTCCCCC
    2 V_P50RELAP65_Q5_01  chr8 142276666 142276677  - 16.6429 2.51e-07 0.2780   GGGATTTCCCAC
    3          V_RELA_Q6 chr22  51020067  51020078  - 15.9395 2.71e-07 0.3350   GGGAATTTCCCC
    4       V_NFKB_Q6_01 chr14  98601454  98601469  + 17.0684 3.08e-07  0.236 GGAGTGGAAATTCC
    5          V_CREL_Q6 chr22  51020068  51020079  - 16.1165 3.19e-07 0.4050   AGGGAATTTCCC

dat.markov: 1486 obs of 9 variables

                V1    V2        V3        V4 V5      V6       V7    V8               V9
    1 V_NFKB_Q6_01 chr14  98601454  98601469  + 17.2212 1.33e-07 0.146 GGAGTGGAAATTCCCT
    2  V_P50P50_Q3 chr12 106642383 106642395  + 16.9358 1.57e-07 0.201    GGGGGACTCCCCC
    3    V_CREL_Q6 chr22  51020068  51020079  - 16.0549 2.29e-07 0.292     AGGGAATTTCCC
    4 V_NFKB_Q6_01 chr22  51020064  51020079  + 16.9906 2.32e-07 0.146 TTGGGGGAAATTCCCT
    5    V_RELA_Q6 chr22  51020067  51020078  - 15.7496 3.42e-07 0.433     GGGAATTTCCCC

I need to combine the two data frames so that I get only the rows whose values in columns V1, V2, V3 and V4 match between the two data frames.

I tried:

    y <- merge(dat, dat.markov, by = c("V1", "V2", "V3", "V4"))

which gives me a combined data frame, but with 1513 observations. I expected the number of observations to be equal to or less than that of the smaller data frame, that is, 1486 observations.
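A likely cause of the extra rows, sketched with made-up toy frames rather than the real data: merge() performs an inner join and returns one row for every pair of matching rows, so a V1-V4 combination that occurs more than once in either frame is repeated in the result. The sketch below also counts such duplicated keys and, as one possible option, merges against only the distinct keys of the second frame, which keeps each row of the first frame at most once. The names dat_toy, markov_toy and key_cols are hypothetical.

```r
# Toy stand-ins for dat and dat.markov (made-up values, for illustration only)
dat_toy <- data.frame(
  V1 = c("A", "B", "C"),
  V2 = c("chr1", "chr1", "chr2"),
  V3 = c(100, 200, 300),
  V4 = c(110, 210, 310),
  V6 = c(1.0, 2.0, 3.0)
)
markov_toy <- data.frame(
  V1 = c("A", "A", "B"),  # key A/chr1/100/110 occurs twice
  V2 = c("chr1", "chr1", "chr1"),
  V3 = c(100, 100, 200),
  V4 = c(110, 110, 210),
  V6 = c(1.1, 1.2, 2.1)
)
key_cols <- c("V1", "V2", "V3", "V4")

# Inner join: one output row per matching pair of rows
joined <- merge(dat_toy, markov_toy, by = key_cols)
nrow(joined)  # 3 rows, although only 2 distinct keys are shared

# Number of rows whose V1-V4 combination repeats an earlier row
sum(duplicated(markov_toy[key_cols]))  # 1

# Merge against the distinct keys only, so each dat_toy row appears at most once
semi <- merge(dat_toy, unique(markov_toy[key_cols]), by = key_cols)
nrow(semi)  # 2
```

Running the same duplicated() check on the key columns of both real data frames would show whether repeated keys account for the 1513 rows.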

The merged data.frame looks fine in terms of the number of columns returned:

             V1    V2        V3        V4 V5.x    V6.x     V7.x  V8.x       V9.x V5.y
    1 V_CREL_01 chr10 112778464 112778473    + 12.9434 1.94e-05 0.694 TGGGTTTTCC    +
         V6.y     V7.y  V8.y       V9.y
    1 12.8838 2.35e-05 0.788 TGGGTTTTCC
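The .x and .y endings are added by merge() itself: when a non-key column name exists in both inputs, both copies are kept and told apart by the suffixes argument, which defaults to c(".x", ".y"). A minimal sketch with one-row frames built from the key and V6 values in the output above; the frame names and the suffix names ".dat" and ".markov" are just an illustrative choice.

```r
# One-row frames using the key and V6 values from the merged output above
dat_row    <- data.frame(V1 = "V_CREL_01", V2 = "chr10", V3 = 112778464, V4 = 112778473, V6 = 12.9434)
markov_row <- data.frame(V1 = "V_CREL_01", V2 = "chr10", V3 = 112778464, V4 = 112778473, V6 = 12.8838)

# suffixes replaces the default c(".x", ".y") on non-key columns present in both inputs
m <- merge(dat_row, markov_row, by = c("V1", "V2", "V3", "V4"), suffixes = c(".dat", ".markov"))
names(m)  # "V1" "V2" "V3" "V4" "V6.dat" "V6.markov"
```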

I know how to intersect data.frames on a single column, but is there a way to intersect two data.frames on multiple columns?

1 answer

If I understand correctly, you want to keep only the rows whose columns 1, 2, 3 and 4 match exactly in both frames? Instead of merge, I would use the interaction and match functions:

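    # One key string per row from V1-V4 (drop = TRUE discards unused level combinations as they are built)
    dat$combine <- as.character(interaction(dat$V1, dat$V2, dat$V3, dat$V4, drop = TRUE))
    dat.markov$combine <- as.character(interaction(dat.markov$V1, dat.markov$V2, dat.markov$V3, dat.markov$V4, drop = TRUE))
    # Keep the rows of dat whose key also occurs in dat.markov
    dat.overlap <- dat[!is.na(match(dat$combine, dat.markov$combine)), ]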
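A variant of the same composite-key idea, sketched on made-up frames: build the key with paste() instead of interaction(), since paste() produces one string per row directly without creating factor levels, and test membership with %in%, which is TRUE exactly where match() finds a position. The names dat_toy, markov_toy, key_cols, key_dat and key_markov are hypothetical, and the "|" separator is an assumption; any character that never occurs in V1-V4 would do.

```r
# Toy frames reusing a few key values from the sample rows in the question;
# the second frame is deliberately trimmed so that one row of dat_toy has no match
dat_toy <- data.frame(
  V1 = c("V_P50P50_Q3", "V_RELA_Q6", "V_CREL_Q6"),
  V2 = c("chr12", "chr22", "chr22"),
  V3 = c(106642383, 51020067, 51020068),
  V4 = c(106642395, 51020078, 51020079)
)
markov_toy <- data.frame(
  V1 = c("V_CREL_Q6", "V_P50P50_Q3"),
  V2 = c("chr22", "chr12"),
  V3 = c(51020068, 106642383),
  V4 = c(51020079, 106642395)
)
key_cols <- c("V1", "V2", "V3", "V4")

# One key string per row, e.g. "V_P50P50_Q3|chr12|106642383|106642395"
key_dat    <- do.call(paste, c(dat_toy[key_cols], sep = "|"))
key_markov <- do.call(paste, c(markov_toy[key_cols], sep = "|"))

# Logical row filter: each matching row of dat_toy is kept once, in its original order
overlap <- dat_toy[key_dat %in% key_markov, ]
overlap$V1  # "V_P50P50_Q3" "V_CREL_Q6"
```

Because this filters the rows of the first frame instead of joining, the result can never have more rows than that frame, even when a key is repeated in the second one.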

Source: https://habr.com/ru/post/953513/

