Increased rendering performance of overlapping segments

I have a set of x point pairs for drawing segments along the x axis to create a custom reading map in R:

example read map

Half of the task of constructing these segments solves its y positions, so that no two segments that overlap are at the same y level. For each segment, I iterate over the y levels from the first position until I get to a position that does not yet contain a segment that will overlap the current one. Then I record the end position of the current segment and move on to the next.

The actual code is a function as follows:

# Dummy data # A list of start and end positions for each segment along the X axis. Sorted by start. # Passing the function few.reads draws a map in half a second. Passing it many.reads takes about half an hour to complete. few.reads <- data.frame( start=c(rep(10,150), rep(16,100), rep(43,50)), end=c(rep(30,150), rep(34,100), rep(57,50)) ); many.reads <- data.frame( start=c(rep(10,15000), rep(16,10000), rep(43,5000)), end=c(rep(30,15000), rep(34,10000), rep(57,5000)) ); #--- # A function to draw a series of overlapping segments (or "reads" in my along # The x-axis. Where reads overlap, they are "stacked" down the y axis #--- drawReads <- function(reads){ # sort the reads by their start positions reads <- reads[order(reads$start),]; # minimum and maximum for x axis minstart <- min(reads$start); maxend <- max(reads$end); # initialise yread: a list to keep track of used y levels yread <- c(minstart - 1); ypos <- c(); #holds the y position of the ith segment #--- # This iteration step is the bottleneck. Worst case, when all reads are stacked on top # of each other, it has to iterate over many y levels to find the correct position for # the later reads #--- # iterate over segments for (r in 1:nrow(reads)){ read <- reads[r,]; start <- read$start; placed <- FALSE; # iterate through yread to find the next availible # y pos at this x pos (start) y <- 1; while(!placed){ if(yread[y] < start){ ypos[r] <- y; yread[y] <- read$end; placed <- TRUE; } # current y pos is used by another segment, increment y <- y + 1; # initialize another y pos if we're at the end of the list if(y > length(yread)){ yread[y] <- minstart-1; } } } #--- # This is the plotting step # Once we are here the rest of the process is very quick #--- # find the maximum y pos that is used to size up the plot maxy <- length(yread); miny = 1; reads$ypos <- ypos + miny; print("New Plot...") # Now we have all the information, start the plot plot.new(); plot.window(xlim=c(minstart, maxend+((maxend-minstart)/10)), ylim=c(1,maxy)); axis(3,xaxp=c(minstart,maxend,(maxend-minstart)/10)); axis(2, yaxp=c(miny,maxy,3),tick=FALSE,labels=FALSE); print("Draw the reads..."); maxy <- max(reads$ypos); segments(reads$start, maxy-reads$ypos, reads$end, maxy-reads$ypos, col="blue"); } 

My actual dataset is very large and contains regions that can contain up to 600,000 readings, as far as I can tell. The readings naturally add up to each other, so it is very easy to implement the worst-case scenario, where all readings overlap with each other. The time taken to create a large number of readings is unacceptable to me, so I'm looking for a way to make the process more efficient. Can I replace my loops with something faster? Is there an algorithm that can read readings faster? I really can't think of a better way to do this at the moment.

Thanks for your help.

+4
source share
2 answers

Fill each level y in a greedy way. Once the level is full, go down one level and never return.

pseudo code:

  y <- 1 while segment-list.not-empty i <- 1 current <- segment-list[i] current.plot(y) segment-list.remove(i) i <- segment-list.find_first_greater(current.end) while (i > 0) current <- segment-list[i] current.plot(y) segment-list.remove(i) y <- y + 1 

This does not necessarily create an β€œoptimal” plot in any sense, but at least it is O (n log n).

+2
source

Can you sort the starting value? Then you scroll the list from front to back. For each item, write it down, then do a binary search in the rest of the list for the first item larger than the final coordinate of the item you just built. If none are found, increase Y. Remove each item when plotting it.

The sort is O (N lg N) and the binary search is O (lg N) for each element, so the total is O (N lg N).

+1
source

Source: https://habr.com/ru/post/1403441/


All Articles