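Sounds like a combinatorics problem: for each URL, enumerate every non-empty subset of its tags (each subset encoded as a bitmask over the URL's ordered tag list), then group identical subsets across URLs and count them.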
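```cypher
// Collect each URL's tags in a fixed order (by internal id) so that the
// same set of tags always produces the same list.
MATCH (U:Url)-[:IS_ABOUT]->(T:Tag)
WITH U, T ORDER BY id(T)
WITH U, collect(DISTINCT T) AS TAGS
// n tags have 2^n subsets. 2 ^ n is exact for these small exponents, unlike
// exp(log(2) * n), whose rounding error can make floor/ceil hit the wrong integer.
WITH U, TAGS, toInteger(2 ^ size(TAGS)) AS numberOfCombinations
// Mask 0 is the empty set, so start at 1; range() is inclusive, so stop at 2^n - 1.
UNWIND range(1, numberOfCombinations - 1) AS combinationIndex
UNWIND range(0, size(TAGS) - 1) AS tagIndex
WITH U, TAGS, combinationIndex, tagIndex,
     toInteger(2 ^ tagIndex) AS pw2
// Keep the tag only if bit `tagIndex` is set in the mask. apoc.bitwise.op is a
// function in current APOC; older releases exposed it as a procedure
// (CALL apoc.bitwise.op(...) YIELD value).
WITH U, TAGS, combinationIndex, tagIndex,
     apoc.bitwise.op(combinationIndex, "&", pw2) AS value
WHERE value > 0
WITH U, TAGS, combinationIndex, collect(TAGS[tagIndex]) AS combination
// Identical tag lists from different URLs fall into the same group.
RETURN combination, count(U) AS freq, collect(U) AS urls
ORDER BY freq DESC
```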
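To see what the bitmask enumeration produces without touching any data, here is a small standalone sketch. The literal list `['a', 'b', 'c']` is a made-up placeholder, and the bit test uses integer division and modulo instead of `apoc.bitwise.op`, which may be handy if APOC isn't installed:

```cypher
// Standalone demo: enumerate the non-empty subsets of a literal list.
// The tags 'a', 'b', 'c' are placeholders, not data from the graph.
WITH ['a', 'b', 'c'] AS tags
// Masks 1 .. 2^n - 1 cover every non-empty subset exactly once.
UNWIND range(1, toInteger(2 ^ size(tags)) - 1) AS mask
// Bit i of the mask is set when (mask / 2^i) is odd; dividing two
// integers in Cypher truncates, so no APOC call is needed here.
RETURN mask,
       [i IN range(0, size(tags) - 1)
        WHERE (mask / toInteger(2 ^ i)) % 2 = 1 | tags[i]] AS combination
```

This should return seven rows, one per non-empty subset; for example mask 5 (binary 101) selects positions 0 and 2, giving `['a', 'c']`.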
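Since a URL with n tags yields 2^n - 1 combinations, running this over the whole graph on every request gets expensive quickly. I think it is better to compute the combinations with this algorithm when a URL is tagged and store each one as a `TagsCombination` node. The read query would then be something like this: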
```cypher
MATCH (Comb:TagsCombination)<-[:IS_ABOUT]-(U:Url)
WITH Comb, collect(U) AS urls, count(U) AS freq
MATCH (Comb)-[:CONTAIN]->(T:Tag)
RETURN Comb, collect(T) AS Tags, urls, freq
ORDER BY freq DESC
```
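The write side isn't shown above. A rough sketch of what "compute and save during tagging" could look like, run for one URL after its tags change, follows. The `url` property, the `$url` parameter and the `key` property (a string of sorted tag ids identifying a combination) are hypothetical names for illustration; a stable tag property would make a safer key than internal ids if you have one. It only adds combinations, so ones left over from removed tags would need separate cleanup, and a uniqueness constraint on `TagsCombination.key` would help MERGE avoid duplicates under concurrent writes.

```cypher
// Hypothetical write-side sketch: `url`, `$url` and `key` are assumed names.
MATCH (U:Url {url: $url})-[:IS_ABOUT]->(T:Tag)
WITH U, T ORDER BY id(T)
WITH U, collect(T) AS tags
// Same bitmask enumeration as the read query: masks 1 .. 2^n - 1.
UNWIND range(1, toInteger(2 ^ size(tags)) - 1) AS mask
WITH U, [i IN range(0, size(tags) - 1)
         WHERE (mask / toInteger(2 ^ i)) % 2 = 1 | tags[i]] AS combination
// Canonical key: tags are already sorted by id, so equal sets give equal keys.
WITH U, combination,
     reduce(k = '', t IN combination | k + '|' + toString(id(t))) AS key
MERGE (C:TagsCombination {key: key})
MERGE (U)-[:IS_ABOUT]->(C)
WITH C, combination
UNWIND combination AS t
MERGE (C)-[:CONTAIN]->(t)
```

The cost is still 2^n per URL, but it is paid once when that URL is tagged rather than on every read.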