R: How to use setdiff on two string vectors, comparing only the first 3 tab-delimited elements of each line, without using qdap?

I previously asked this question, and the answer I got worked: "R: How to use setdiff for two string vectors, only comparing the first 3 elements with tab delimiters on each line?". However, qdap requires rJava and a correctly configured user system (see "cannot load qdap R-package"). So I am asking again: is there a way to do this without using qdap? I repeat the question below:

I am trying to find a way in R to take the difference between two string vectors, but based only on the first three tab-delimited columns of each line. For example, here are list1 and list2:

list1:

"1\t1113200\t1118399\t1\t1101465\t1120176\tENSRNOG00000040300\tRaet1l\t0\n"
"1\t1180200\t1187599\t1\t1177682\t1221416\tENSRNOG00000061316\tAABR07000121.1\t0\n"
"1\t1180200\t1187599\t1\t1177632\t1221416\tENSRNOG00000061316\tAABR07000121.1\t0\n"

list2:

"1\t1113200\t1118399\t1\t1101465\t1120176\tENSRNOG00000040300\tRaet1l\t0\n"
"1\t1180200\t1187599\t1\t1177682\t1221416\tENSRNOG00000061316\tAABR07000121.1\t0\n"

I want to run setdiff(list2, list1) so that I get everything in list2 that is not in list1, but comparing only the first three tab-delimited fields of each line. So in list1, only this part would be considered:

  "1\t1113200\t1118399"

from the first entry. However, I still need the full line in the result; I only want the comparison to use the first three columns. I am having trouble working out how to do this, so any help would be appreciated. I've already looked at a few SO posts, and none of them seemed to help.

1 answer

, ( ) list1 list2?

In base R, you could do the following:

# first, let's get the first 3 columns of `list1` (everything up to and including the third tab)
m = regexec("^(?:[^\t]+\t){3}", list1)
# regmatches() returns a list holding the first 3 columns of each element of `list1`;
# unlist() flattens it into a character vector
first3.list1 = unlist(regmatches(list1, m))

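Then do the same for list2, and subset list2 with %in%. (setdiff would return only the 3-column keys, whereas %in% lets you index into list2 and keep the full lines.)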

m = regexec("^(?:[^\t]+\t){3}", list2)
first3.list2 = unlist(regmatches(list2, m))
list2[!(first3.list2 %in% first3.list1)]

(That is, this returns the full lines of list2 whose first three columns do not appear in list1.)
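One thing to watch, as far as I can tell: unlist(regmatches(...)) drops any line that does not match the pattern (fewer than three tab-terminated fields), so the keys can fall out of step with the lines of list2. Below is a minimal sketch that keeps exactly one key per line. The helper name first3_key and the extra line appended to list2 are made up for illustration and are not part of the question or the original answer.

```r
# Hypothetical helper: return the first three tab-terminated fields of each
# line, or NA for lines that have fewer than three such fields.
first3_key <- function(x) {
  m <- regexpr("^([^\t]+\t){3}", x)
  key <- rep(NA_character_, length(x))
  hit <- as.vector(m) > 0          # regexpr() gives -1 where there is no match
  key[hit] <- regmatches(x, m)     # regmatches() returns only the matched pieces
  key
}

# Sample data from the question
list1 <- c(
  "1\t1113200\t1118399\t1\t1101465\t1120176\tENSRNOG00000040300\tRaet1l\t0\n",
  "1\t1180200\t1187599\t1\t1177682\t1221416\tENSRNOG00000061316\tAABR07000121.1\t0\n",
  "1\t1180200\t1187599\t1\t1177632\t1221416\tENSRNOG00000061316\tAABR07000121.1\t0\n"
)
list2 <- c(
  "1\t1113200\t1118399\t1\t1101465\t1120176\tENSRNOG00000040300\tRaet1l\t0\n",
  "1\t1180200\t1187599\t1\t1177682\t1221416\tENSRNOG00000061316\tAABR07000121.1\t0\n"
)

# Full lines of list2 whose 3-column key is absent from list1
only_in_list2 <- list2[!(first3_key(list2) %in% first3_key(list1))]

# Made-up extra line (assumption, for illustration only) whose key is not in list1
list2_extra <- c(list2, "2\t500\t900\tmade-up\n")
only_in_extra <- list2_extra[!(first3_key(list2_extra) %in% first3_key(list1))]
```

With the question's sample data, both keys in list2 also occur in list1, so only_in_list2 is empty; only the made-up line survives in only_in_extra.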


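Alternatively, you could use strsplit or read.delim to split each line into columns, paste the first 3 back together, and compare on those.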
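A rough sketch of the strsplit variant, assuming list1 and list2 are plain character vectors as in the question. The helper name key_from_fields and its n argument are invented here for illustration.

```r
# Hypothetical helper: build a comparison key from the first n tab-separated
# fields of each line.
key_from_fields <- function(x, n = 3) {
  fields <- strsplit(x, "\t", fixed = TRUE)   # one character vector per line
  vapply(fields,
         function(f) paste(head(f, n), collapse = "\t"),
         character(1))
}

# Sample data from the question
list1 <- c(
  "1\t1113200\t1118399\t1\t1101465\t1120176\tENSRNOG00000040300\tRaet1l\t0\n",
  "1\t1180200\t1187599\t1\t1177682\t1221416\tENSRNOG00000061316\tAABR07000121.1\t0\n",
  "1\t1180200\t1187599\t1\t1177632\t1221416\tENSRNOG00000061316\tAABR07000121.1\t0\n"
)
list2 <- c(
  "1\t1113200\t1118399\t1\t1101465\t1120176\tENSRNOG00000040300\tRaet1l\t0\n",
  "1\t1180200\t1187599\t1\t1177682\t1221416\tENSRNOG00000061316\tAABR07000121.1\t0\n"
)

keys1 <- key_from_fields(list1)
keys2 <- key_from_fields(list2)

# Keep the full lines of list2 whose key does not occur among list1's keys
diff_lines <- list2[!(keys2 %in% keys1)]
```

Here the key has no trailing tab (for the first line it is "1\t1113200\t1118399", as in the question), and a line with fewer than n fields simply yields a shorter key instead of being dropped.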


Source: https://habr.com/ru/post/1656025/

