I have an input file listing about 50,000 clusters and the factors present in each of them (about 10 million records in total); see the example below:
    set.seed(1)
    x = paste("cluster-", sample(c(1:100), 500, replace=TRUE), sep="")
    y = c(
      paste("factor-", sample(c(letters[1:3]), 300, replace=TRUE), sep=""),
      paste("factor-", sample(c(letters[1]), 100, replace=TRUE), sep=""),
      paste("factor-", sample(c(letters[2]), 50, replace=TRUE), sep=""),
      paste("factor-", sample(c(letters[3]), 50, replace=TRUE), sep="")
    )
    data = data.frame(cluster=x, factor=y)
With a bit of help from another question, I managed to create a pie chart of the joint occurrence of these factors:
    counts = with(data, table(tapply(factor, cluster, function(x)
      paste(as.character(sort(unique(x))), collapse='+'))))
    pie(counts[counts>1])
But now I would like to have a Venn diagram of the co-occurrence of factors. Ideally, it would also accept a threshold for the minimum count of each factor. For example, a Venn diagram in which a factor is counted as present in a cluster only if it occurs more than 10 times (n > 10) in that cluster.
I tried to build a table of counts using aggregate, but could not get it to work.
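To make the kind of count table I am after more concrete, here is a minimal base R sketch. The function name `factor_combinations`, its `min_n` argument, and the `toy` data are illustrative assumptions, not existing code. It treats the threshold as inclusive (a factor counts when it has at least `min_n` records in a cluster), so n > 10 would correspond to `min_n = 11`.

```r
# Hypothetical helper: the name and the min_n argument are illustrative assumptions.
factor_combinations <- function(data, min_n = 1) {
  # Cluster-by-factor contingency table of record counts.
  per_cluster <- table(data$cluster, data$factor)
  # TRUE where a factor has at least min_n records within a cluster.
  present <- per_cluster >= min_n
  # Label each cluster with the combination of factors that pass the threshold.
  combo <- apply(present, 1, function(row) {
    paste(colnames(present)[row], collapse = "+")
  })
  # Drop clusters in which no factor reaches the threshold.
  combo <- combo[combo != ""]
  # Number of clusters per factor combination.
  table(combo)
}

# Toy data, made up purely for illustration.
toy <- data.frame(
  cluster = c("c1", "c1", "c1", "c2", "c2", "c3"),
  factor  = c("a", "a", "b", "a", "b", "b")
)
factor_combinations(toy, min_n = 1)
factor_combinations(toy, min_n = 2)
```

Each entry of the result is the number of clusters that contain exactly that combination of factors, which is the size of one exclusive region of the Venn diagram. What is still missing is the plotting step.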
r data-visualization combinations
719016 Nov 14 2018-11-11T00: 00Z