I have one table with the coordinates ( start , end ) of approx. 500,000 fragments and another table with 60,000 single coordinates, which I would like to match against those fragments. That is, for each record in the dtCoords table I need to find the records in the dtFrags table with the same chr and start <= coord <= end (and get type from each matching dtFrags record). Is R a good fit for this at all, or should I look at other languages?
Here is my example:
    require(data.table)

    dtFrags <- fread(
    "id,chr,start,end,type
    1,1,100,200,exon
    2,2,300,500,intron
    3,X,400,600,intron
    4,2,250,600,exon
    ")

    dtCoords <- fread(
    "id,chr,coord
    10,1,150
    20,2,300
    30,Y,500
    ")
In the end, I would like to have something like this:
    "idC,chr,coord,idF,type
    10, 1, 150, 1, exon
    20, 2, 300, 2, intron
    20, 2, 300, 4, exon
    30, Y, 500, NA, NA
    "
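To pin down the matching rule, here is a minimal brute-force sketch that reproduces the table above on the toy data. It pairs every coordinate with every fragment on the same chr and then keeps the pairs where start <= coord <= end. The names pairs, hits and res are only illustrative, and this is a reference for the expected result rather than a solution: the intermediate table grows with the number of coordinate/fragment pairs per chr, so it is probably not practical for the full tables.

```r
library(data.table)

dtFrags <- fread(
"id,chr,start,end,type
1,1,100,200,exon
2,2,300,500,intron
3,X,400,600,intron
4,2,250,600,exon
")
dtCoords <- fread(
"id,chr,coord
10,1,150
20,2,300
30,Y,500
")

# Make sure chr has the same type in both tables before merging on it
dtFrags[, chr := as.character(chr)]
dtCoords[, chr := as.character(chr)]

# All coordinate/fragment pairs that share a chr; the clashing id columns
# become idC (coordinates) and idF (fragments)
pairs <- merge(dtCoords, dtFrags, by = "chr",
               allow.cartesian = TRUE, suffixes = c("C", "F"))

# Keep only the pairs where the coordinate lies inside the fragment
hits <- pairs[start <= coord & coord <= end, .(idC, idF, type)]

# Left join back onto the coordinates so unmatched ones are kept with NA
res <- merge(dtCoords[, .(idC = id, chr, coord)], hits,
             by = "idC", all.x = TRUE)
setorder(res, idC, idF)
print(res)
```

Coordinate 20 appears twice because it falls inside two fragments, and coordinate 30 is kept with NA because no fragment exists on chr Y.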
I can simplify the task a bit by splitting the tables into subsets by chr , so that I only need to compare the coordinates:
    setkey(dtCoords, 'chr')
    setkey(dtFrags, 'chr')

    for (chr in unique(dtCoords$chr)) {
        dtCoordsSub <- dtCoords[chr];
        dtFragsSub <- dtFrags[chr];
        dtCoordsSub[, {
but it's still not clear to me how to work inside ... I would be very grateful for any tips.
UPD: just in case, I put my real tables in the archive here. After unpacking them into the working directory, the tables can be loaded with the following code:
    dtCoords <- fread("dtCoords.txt", sep="\t", header=TRUE)
    dtFrags <- fread("dtFrags.txt", sep="\t", header=TRUE)