I have a list of data.frames that needs a very specific kind of duplicate removal, and the condition is different for each data.frame in the list. For the first element, I want complete duplicate removal, so that every row is kept only once. For the second element, I need to find rows that appear more than two times (freq > 2) and keep only one copy of each. For the third element, I need to find rows that appear more than three times (freq > 3) and keep two copies of each in that data.frame.

I am looking for a more programmatic, dynamic solution to this data processing task. I tried to work out a good solution myself, but could not get the desired result. How can I do this easily? Is there a more efficient way to get my specific result? Any ideas?
Reproducible data:
myList <- list(
  bar = data.frame(start.pos = c(9,19,34,54,70,82,136,9,34,70,136,9,82,136),
                   end.pos   = c(14,21,39,61,73,87,153,14,39,73,153,14,87,153),
                   pos.score = c(48,6,9,8,4,15,38,48,9,4,38,48,15,38)),
  cat = data.frame(start.pos = c(7,21,21,72,142,7,16,21,45,72,100,114,142,16,72,114),
                   end.pos   = c(10,34,34,78,147,10,17,34,51,78,103,124,147,17,78,124),
                   pos.score = c(53,14,14,20,4,53,20,14,11,20,7,32,4,20,20,32)),
  foo = data.frame(start.pos = c(12,12,12,58,58,58,118,12,12,44,58,102,118,12,58,118),
                   end.pos   = c(36,36,36,92,92,92,139,36,36,49,92,109,139,36,92,139),
                   pos.score = c(48,48,48,12,12,12,5,48,48,12,12,11,5,48,12,5))
)
Since myList is the result of a user-defined function, the data.frames cannot be separated. I am looking for a more programmatic solution for this specific duplicate removal. How can I do this kind of duplicate removal when the input is a list of data.frames?
My desired result is as follows:
expectedList <- list(
  bar = data.frame(start.pos = c(9,19,34,54,70,82,136),
                   end.pos   = c(14,21,39,61,73,87,153),
                   pos.score = c(48,6,9,8,4,15,38)),
  cat = data.frame(start.pos = c(7,21,72,142,7,16,45,100,114,142,16,114),
                   end.pos   = c(10,34,78,147,10,17,51,103,124,147,17,124),
                   pos.score = c(53,14,20,4,53,20,11,7,32,4,20,32)),
  foo = data.frame(start.pos = c(12,12,44,58,58,118,102,118,118),
                   end.pos   = c(36,36,49,92,92,139,109,139,139),
                   pos.score = c(48,48,12,12,12,5,11,5,5))
)
Edit:
In the second data.frame, cat, I look for rows that appear three times and keep each of them only once; if a row appears only twice, I do not remove its duplicate.
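In data.frame foo, rows that appear more than three times (freq > 3) should be kept only twice; rows that appear three times or fewer are left as they are.
How can I do this for a list of data.frames?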
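To make the rules above concrete, here is a minimal base R sketch of one way they could be expressed. It is not a definitive solution. The helper name `cap_duplicates` and its `threshold` and `keep` parameters are hypothetical names introduced for illustration, and the toy list is made-up data, not `myList`. It assumes that a row counts as a duplicate only when all of its columns match.

```r
# Hypothetical helper (name and parameters are illustrative assumptions):
# a row that occurs more than `threshold` times keeps only its first `keep`
# copies; rows occurring `threshold` times or fewer are left untouched.
cap_duplicates <- function(df, threshold, keep) {
  # Build one key per row from all columns
  key <- do.call(paste, c(unname(as.list(df)), sep = "\r"))
  # Total number of occurrences of each row
  freq <- ave(seq_along(key), key, FUN = length)
  # Running occurrence index within each group of identical rows
  occ <- ave(seq_along(key), key, FUN = seq_along)
  df[freq <= threshold | occ <= keep, , drop = FALSE]
}

# Toy data, only to show the call pattern
toy <- list(
  a = data.frame(x = c(1, 1, 2), y = c(5, 5, 6)),
  b = data.frame(x = c(1, 1, 1, 2, 2), y = c(5, 5, 5, 6, 6))
)

# One (threshold, keep) pair per list element, applied in parallel
out <- Map(cap_duplicates, toy, threshold = c(1, 2), keep = c(1, 1))
```

For the data in the question, the corresponding call would be `Map(cap_duplicates, myList, threshold = c(1, 2, 3), keep = c(1, 1, 2))`. This sketch keeps the surviving rows in their original order and with their original row names, so the rows of `foo` may come out in a different order than in `expectedList`, even though the same rows are kept.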