How can I efficiently find subsets of a set among the keys of a map?

Suppose I have a map from sets of values to values. In Java, the type of this map would be:

Map<Set<Object>, Object> setToObjMap; 

Given a new set of objects (the "search set"), I want to find all the values in setToObjMap whose associated key is a subset of the search set.

So, for example, if my map was:

 ["telephone", "hat"] -> "book"
 ["laugh", "fry", "mouse"] -> "house"
 ["dog", "cat"] -> "monster"

Then, given the search set ["telephone", "hat", "book", "dog", "cat"], I would get the values "book" and "monster".

In practice, setToObjMap can contain tens of thousands of entries, with tens of thousands of possible values in the sets. A search set will usually have about 10 items.

I hope there is an efficient way to do this that does not require iterating through all the keys of the map. Can anyone offer any suggestions?

+3
5 answers

You can create a lookup data structure:

 Map<String,List<Finder>> 

Here a Finder has two int fields, count and max, plus a res field holding the mapped value. The map value is a list because several sets in setToObjMap may contain the same word, although that does not happen in your example.

 "telephone" -> [{res:"book",count=0,max=2}]
 "hat" -> same object as above
 "laugh" -> [{res:"house",count=0,max=3}]
 ...

This lookup structure is quick to build and even quicker to reset after a search.

The search iterates through the search set: for each word, and for each Finder registered for that word, it increments count. In a second pass, go through all the Finders in the lookup map; if count == max, add res to the result.

Init Algorithm:

 for Entry e in setToObjMap
     Finder f = new Finder(e.value, 0, e.key.size)   // res, count, max
     for String word in e.key
         lookup.get(word).add(f)                     // create the list first if word is new

Search Algorithm:

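 for String word in set
     for Finder f in lookup.get(word)                // skip words that have no entry
         f.count++
 for List<Finder> finders in lookup.values()
     for Finder f in finders
         if (f.count == f.max) result.add(f.res)     // result is a set: a Finder is listed once per word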

Reset Algorithm:

 for List<Finder> finders in lookup.values()
     for Finder f in finders
         f.count = 0

As for complexity, if n is the number of elements in set and m is the number of entries in setToObjMap, then the search is O(n + m), assuming the key sets are small enough that their size can be treated as a constant.
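The counting idea above might look like this in Java. This is only a sketch under a few assumptions: the names FinderIndex, search, touched and emptyKeyValues are made up for illustration, and instead of the second pass over every Finder it remembers which Finders were touched, so matching and resetting happen together without a full scan. An empty key set, which is a subset of everything, is handled separately because it is never touched.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of the counting lookup; all names here are illustrative. */
final class FinderIndex {
    /** One Finder per map entry: the mapped value, the key size, and a hit counter. */
    private static final class Finder {
        final Object res;
        final int max;
        int count;
        Finder(Object res, int max) { this.res = res; this.max = max; }
    }

    private final Map<Object, List<Finder>> lookup = new HashMap<>();
    private final List<Object> emptyKeyValues = new ArrayList<>();

    FinderIndex(Map<Set<Object>, Object> setToObjMap) {
        for (Map.Entry<Set<Object>, Object> e : setToObjMap.entrySet()) {
            if (e.getKey().isEmpty()) {          // the empty set is a subset of any search set
                emptyKeyValues.add(e.getValue());
                continue;
            }
            Finder f = new Finder(e.getValue(), e.getKey().size());
            for (Object word : e.getKey()) {
                // Create the list on first use instead of assuming it exists.
                lookup.computeIfAbsent(word, k -> new ArrayList<>()).add(f);
            }
        }
    }

    /** Returns the values whose key is a subset of searchSet. Not thread-safe: counters are shared. */
    List<Object> search(Set<Object> searchSet) {
        List<Finder> touched = new ArrayList<>();
        for (Object word : searchSet) {
            for (Finder f : lookup.getOrDefault(word, List.of())) {
                if (f.count++ == 0) touched.add(f);   // remember each Finder once
            }
        }
        List<Object> result = new ArrayList<>(emptyKeyValues);
        for (Finder f : touched) {
            if (f.count == f.max) result.add(f.res);  // every word of the key was seen
            f.count = 0;                              // reset for the next search
        }
        return result;
    }
}
```

With this variation the cost of one search depends on how many Finders the search words hit, not on the total number of entries in the map.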

+3

Iterating over the map is one option. This takes O(n × m) time, where n is the number of entries in the map and m is the number of elements in the query set; the factor m comes from the subset check.
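For reference, the plain iteration could be sketched as follows. The class and method names (LinearScan, findSubsetValues) are hypothetical, and the sketch assumes the search set is a hash-based set so that containsAll costs time proportional to the size of the key being checked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Baseline sketch: check every key of the map against the search set. */
final class LinearScan {
    /** Returns every value whose key set is contained in searchSet. */
    static <K, V> List<V> findSubsetValues(Map<Set<K>, V> setToObjMap, Set<K> searchSet) {
        List<V> result = new ArrayList<>();
        for (Map.Entry<Set<K>, V> e : setToObjMap.entrySet()) {
            Set<K> key = e.getKey();
            // Cheap size check first: a key larger than the search set cannot be a subset.
            if (key.size() <= searchSet.size() && searchSet.containsAll(key)) {
                result.add(e.getValue());
            }
        }
        return result;
    }
}
```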

Another option is to generate all subsets of the search set and look each of them up in the map. This takes O(2^m) lookups. It may be faster than the first option if 2^m is small compared to n (so m must be very small). In your use case, 2^m = 2^10 = 1024, which is less than tens of thousands.

If the size of the query set is known to vary, you can even use a hybrid strategy: compute 2^m and check whether it is less than n, then pick the better of the two options depending on the result of the check.
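A minimal sketch of the subset enumeration, using a bitmask over the elements of the search set. The names SubsetEnumeration and findSubsetValues are made up for illustration. The sketch relies on the Set contract that two sets with the same elements are equal and have the same hash code, and it assumes the key sets are not mutated after being put into the map.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: look up each of the 2^m subsets of the search set in the map. */
final class SubsetEnumeration {
    static <K, V> List<V> findSubsetValues(Map<Set<K>, V> setToObjMap, Set<K> searchSet) {
        List<K> items = new ArrayList<>(searchSet);
        int m = items.size();
        if (m > 30) {
            throw new IllegalArgumentException("search set too large to enumerate");
        }
        List<V> result = new ArrayList<>();
        for (int mask = 0; mask < (1 << m); mask++) {
            // Bit i of mask decides whether items.get(i) is part of this subset.
            Set<K> candidate = new HashSet<>();
            for (int i = 0; i < m; i++) {
                if ((mask & (1 << i)) != 0) {
                    candidate.add(items.get(i));
                }
            }
            if (setToObjMap.containsKey(candidate)) {
                result.add(setToObjMap.get(candidate));
            }
        }
        return result;
    }
}
```

Building and hashing each candidate costs time proportional to its size, so the total work is closer to m × 2^m basic steps than to 2^m.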

+1

If the search sets are small and the map is large, the best way would be to generate all the subsets of the search set and look them up in the map.

If your search set has k elements and there are n associations in the map, this requires 2^k lookups, versus n subset checks the other way. You can see that for n = 1000 and k = 20 this would be a bad idea, but for n = 100000 and k = 10 it would be a win.

+1

Another option is to create an index from each single element to the key sets that contain it:

 "hat" -> ["telephone", "hat"]
 "telephone" -> ["telephone", "hat"]
 "laugh" -> ["laugh", "fry", "mouse"]
 "fry" -> ["laugh", "fry", "mouse"]
 "mouse" -> ["laugh", "fry", "mouse"]
 "dog" -> ["dog", "cat"]
 "cat" -> ["dog", "cat"]

This lets you quickly retrieve the candidate key sets for a query.
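One way this element index could be used is sketched below; the names ElementIndex, keysByElement and findSubsetValues are hypothetical. The answer only describes the index itself, so the verification step (checking each candidate key with containsAll) is an assumption about how the lookup would be completed. An empty key set would never be found this way, since no element points to it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: index from a single element to the key sets containing it. */
final class ElementIndex<K, V> {
    private final Map<Set<K>, V> setToObjMap;
    private final Map<K, List<Set<K>>> keysByElement = new HashMap<>();

    ElementIndex(Map<Set<K>, V> setToObjMap) {
        this.setToObjMap = setToObjMap;
        for (Set<K> key : setToObjMap.keySet()) {
            for (K element : key) {
                keysByElement.computeIfAbsent(element, x -> new ArrayList<>()).add(key);
            }
        }
    }

    /** Collects the keys sharing an element with searchSet, then keeps the true subsets. */
    List<V> findSubsetValues(Set<K> searchSet) {
        Set<Set<K>> candidates = new HashSet<>();   // de-duplicates keys reached via several elements
        for (K element : searchSet) {
            candidates.addAll(keysByElement.getOrDefault(element, List.of()));
        }
        List<V> result = new ArrayList<>();
        for (Set<K> key : candidates) {
            if (searchSet.containsAll(key)) {
                result.add(setToObjMap.get(key));
            }
        }
        return result;
    }
}
```

Only keys that share at least one element with the search set are ever examined, which helps when most keys are unrelated to the query.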

+1

If the members of your sets can be ordered, you can store the key sets in a tree structure and attach the mapped values to the leaves. Then, when you follow the path of a subset down the tree, all the leaves under that subtree will be sets containing your subset.
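As described, following a path down the tree finds stored sets that contain the query, which is the opposite direction from the question. The same kind of tree can also be walked to find stored keys that are subsets of the search set, and that adaptation is what the sketch below shows. It is not taken from the answer: the names SetTrie, put and findSubsetValues are made up, and the elements are assumed to be Comparable so that every set has one sorted path.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch of a trie over sorted key sets, searched for keys that are subsets of a query. */
final class SetTrie<K extends Comparable<K>, V> {
    private static final class Node<K, V> {
        final Map<K, Node<K, V>> children = new HashMap<>();
        boolean terminal;   // true if some key set ends at this node
        V value;
    }

    private final Node<K, V> root = new Node<>();

    /** Inserts a key set by following its elements in sorted order. */
    void put(Set<K> key, V value) {
        Node<K, V> node = root;
        for (K element : new TreeSet<>(key)) {
            node = node.children.computeIfAbsent(element, x -> new Node<>());
        }
        node.terminal = true;
        node.value = value;
    }

    /** Returns the values of all stored keys that are subsets of searchSet. */
    List<V> findSubsetValues(Set<K> searchSet) {
        List<V> result = new ArrayList<>();
        collect(root, new ArrayList<>(new TreeSet<>(searchSet)), 0, result);
        return result;
    }

    private void collect(Node<K, V> node, List<K> sorted, int from, List<V> out) {
        if (node.terminal) {
            out.add(node.value);
        }
        // Only descend along elements of the search set that actually exist as children.
        for (int i = from; i < sorted.size(); i++) {
            Node<K, V> child = node.children.get(sorted.get(i));
            if (child != null) {
                collect(child, sorted, i + 1, out);
            }
        }
    }
}
```

The walk explores only those subsets of the search set that are prefixes of some stored key, so it prunes most of the 2^m combinations when the keys are sparse.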

0

Source: https://habr.com/ru/post/908545/

