Find elements that appear in at least a given percentage of several vectors

Say I have 4 vectors:

a <- c("Mark","Kate","Greg", "Mathew")
b <- c("Mark","Tobias","Mary", "Mathew", "Greg")
c <- c("Mary","Chuck","Igor", "Mathew", "Robin", "Tobias")
d <- c("Kate","Mark","Igor", "Greg", "Robin", "Mathew")

I would like to select the names that overlap across these vectors, on the condition that a name must appear in at least 3 of the 4 vectors. Ideally, the required percentage of vectors should be easy to adjust.

Is there a way to adapt intersect for this?
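To illustrate why plain intersect does not fit, here is a minimal sketch of the strict version. Folding intersect over all four vectors with Reduce keeps only the names present in every vector, and intersect has no parameter for a "3 out of 4" threshold.

```r
a <- c("Mark", "Kate", "Greg", "Mathew")
b <- c("Mark", "Tobias", "Mary", "Mathew", "Greg")
c <- c("Mary", "Chuck", "Igor", "Mathew", "Robin", "Tobias")
d <- c("Kate", "Mark", "Igor", "Greg", "Robin", "Mathew")

# Reduce() applies intersect() pairwise across the list, so a name survives
# only if it is present in every vector (a 100% threshold).
all_four <- Reduce(intersect, list(a, b, c, d))
print(all_four)
# [1] "Mathew"
```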

1 answer

I think this will work. We use the function table to do most of the heavy lifting: it counts how many times each name occurs across all the vectors.

find_perc <- function(..., perc = .75){
    list_len <- length(list(...)) # how many vectors
    tab_it <- table(c(...)) # tabulate all the names
    tab_it_perc <- tab_it / list_len # calculate the frequencies
    names(tab_it_perc[tab_it_perc >= perc]) # return those with freq >= perc
}


> find_perc(a, b, c, d)
[1] "Greg"   "Mark"   "Mathew"
> find_perc(a, b, c, d, perc = .5)
[1] "Greg"   "Igor"   "Kate"   "Mark"   "Mary"   "Mathew" "Robin"  "Tobias"
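One caveat: table(c(...)) counts every occurrence, so a name repeated inside a single vector is counted more than once and can pass the threshold without appearing in enough vectors. Below is a minimal sketch of a variant, assuming each vector should count a name at most once. The name find_perc_unique is hypothetical and not part of the original answer.

```r
# Hypothetical variant of find_perc(): each vector contributes at most one
# count per name, so repeats within a single vector are not over-counted.
find_perc_unique <- function(..., perc = 0.75) {
  vecs <- list(...)
  n_vecs <- length(vecs)                           # number of vectors supplied
  counts <- table(unlist(lapply(vecs, unique)))    # in how many vectors each name occurs
  names(counts)[counts / n_vecs >= perc]           # names meeting the threshold
}

x <- c("Kate", "Kate", "Kate")  # one vector that repeats a name
y <- c("Mark")
z <- c("Mark")
w <- c("Greg")

# "Kate" is in only 1 of 4 vectors, so only "Mark" (2 of 4) qualifies at 50%.
print(find_perc_unique(x, y, z, w, perc = 0.5))
# [1] "Mark"
```

For the same input, the original find_perc(x, y, z, w, perc = .5) would also return "Kate", because its three occurrences give it a frequency of 0.75. For vectors without internal duplicates, such as a, b, c and d above, both functions give the same result.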

Source: https://habr.com/ru/post/1665886/

