Pythonic way to group a list using a dictionary that has lists as values

I am looking for a Putin way or a more efficient way to solve this problem. I have a dictionary that has many meanings (duplicates are allowed on all keys). Given a list, I have to create a dictionary that maps each category to an element using the key from the main dictionary. I will give an example to illustrate.

Main vocabulary

{ "KeyA": ['Aron', 'Ranom Value', 'Abhishek'], "KeyB": ['Ball', 'Foo', 'Bar', 'Badge', 'Dog'], "KeyZ": ['Random Value', 'Foo', 'Bar'] } 

entrance

 ['Foo', 'Bar', 'Dog', 'Aron'] 

Output

 { "KeyA": ['Aron'], "KeyB": ['Bar', 'Foo', 'Dog'], "KeyZ": ['Foo', 'Bar'] } 

My current thoughts

Invert individual elements in sets as keys, and then perform a search.

 { 'Aron' : ['KeyA'], 'Foo' : ['KeyB', 'KeyZ'], 'Bar' : ['KeyB', 'KeyZ'], 'Random Value' : ['KeyA', 'KeyZ'] } 

I would initialize the inverted dictionary by going through each element in each set. The approximate time to create such a dictionary is O (n). Look for the item in the list in the inverted dictionary created in this way. Say the value Bar . Create a new dictionary using the 'Bar': ['KeyB', 'KeyZ'] . The resulting dictionary will be {'KeyB': ['Bar'], 'KeyZ': ['Bar']} . For the next element, I will need to do some accounting work in the existing dictionary, for example, the key exists or not, if so, add the existing list, etc.

Use the in operator on the given mapping (check that it belongs) to each key

The main dictionary and input list will in most cases be quite small. (less than 500 unique items in all sets together). Therefore, I could check the membership in the set returned by each key and create a dictionary. This is obviously less effective, but works in most cases.

I have a few more manipulations that are similar to the example above. I do not want to do manual bookkeeping for everyone, because they are error prone and slower than the built-in functions.

What I need?

  • Better approaches (faster algorithm)
  • Built-in functions in itertools because they are faster
  • Third party library
  • Some esoteric considerations that a normal Python user would not think about?
+5
source share
5 answers

How about you convert a list to a set before starting the conversion? Search set is faster than linear search in lists.

 input_set = set(input) 

Once you have this, you can use regular dictation, in my opinion:

 output = {key: [x for x in value if x in input_set] for key, value in master_dict.items()} 

Result:

 output == {'KeyB': ['Foo', 'Bar', 'Dog'], 'KeyA': ['Aron'], 'KeyZ': ['Foo', 'Bar']} 
+5
source

one way uses intersection in python as follows:

 x={ "KeyA": ['Aron', 'Ranom Value', 'Abhishek'], "KeyB": ['Ball', 'Foo', 'Bar', 'Badge', 'Dog'], "KeyZ": ['Random Value', 'Foo', 'Bar'] } items = ['Foo', 'Bar', 'Dog', 'Aron'] {k: set(items).intersection(set(v)) for k, v in x.items()} 
+4
source

How about defining defaultdict and list.

 from collections import defaultdict result = defaultdict(list) d = { "KeyA": ['Aron', 'Ranom Value', 'Abhishek'], "KeyB": ['Ball', 'Foo', 'Bar', 'Badge', 'Dog'], "KeyZ": ['Random Value', 'Foo', 'Bar'] } items = ['Foo', 'Bar', 'Dog', 'Aron'] [result[k].append(e) for k,v in d.items() for e in v if e in items] print(result) # defaultdict(<type 'list'>, {'KeyB': ['Foo', 'Bar', 'Dog'], 'KeyA': ['Aron'], 'KeyZ': ['Foo', 'Bar']}) print(dict(result)) # {'KeyB': ['Foo', 'Bar', 'Dog'], 'KeyA': ['Aron'], 'KeyZ': ['Foo', 'Bar']} 
+1
source

Another possible approach:

One way to speed up the search time to check if a value exists in input_set is to use a binary search that is O(logn) .

Here is sample code that also uses convienient collections.defaultdict :

 from collections import defaultdict master = { "KeyA": ['Aron', 'Ranom Value', 'Abhishek'], "KeyB": ['Ball', 'Foo', 'Bar', 'Badge', 'Dog'], "KeyZ": ['Random Value', 'Foo', 'Bar'] } input_set = ['Foo', 'Bar', 'Dog', 'Aron'] sorted_list = sorted(input_set) d = defaultdict(list) for key, value in master.items(): for v in value: if binary_search(sorted_list, v): d[key].append(v) print(d) 

What outputs:

 defaultdict(<class 'list'>, {'KeyA': ['Aron'], 'KeyB': ['Foo', 'Bar', 'Dog'], 'KeyZ': ['Foo', 'Bar']}) 

Where binary_search() is defined below:

 def binary_search(item_list,item): first = 0 last = len(item_list)-1 while first <= last: mid = (first + last)//2 if item_list[mid] == item : return True elif item < item_list[mid]: last = mid - 1 else: first = mid + 1 return False 

The above code looks like the inventor of the wheel. You can see the bisect module, which provides some ways to invoke a binary search without having to write your own function.

Note. . To use binary search, you also need to sort the values ​​before hand, namely O(nlogn) . I'm not quite sure how much this will affect you, you will have to run several tests with a different approach to see the difference.

In addition, as @SuperSaiyan noted, converting input_set to set is the most efficient approach, since in the best case, O(1) and O(n) are searched in the worst case (rarely).

+1
source

OP suggested a reverse dictionary . This is perhaps still pythonic, so here's how to implement it.

Considering

 import collections as ct master_dict = { "KeyA": ['Aron', 'Random Value', 'Abhishek'], "KeyB": ['Ball', 'Foo', 'Bar', 'Badge', 'Dog'], "KeyZ": ['Random Value', 'Foo', 'Bar'] } input_list = ['Foo', 'Bar', 'Dog', 'Aron'] 

code

We use collections.defaultdict to facilitate the creation of list values.

 reverse_dict = ct.defaultdict(list) for k, v in master_dict.items(): for item in v: reverse_dict[item].append(k) reverse_dict 

Output

 defaultdict(list, {'Abhishek': ['KeyA'], 'Aron': ['KeyA'], 'Badge': ['KeyB'], 'Ball': ['KeyB'], 'Bar': ['KeyB', 'KeyZ'], 'Dog': ['KeyB'], 'Foo': ['KeyB', 'KeyZ'], 'Random Value': ['KeyA', 'KeyZ']}) 

Now that you can search for inputs by key, the search is faster than searching for each list of strings. We build the final dictionary from the list of input values.

 final_dict = ct.defaultdict(list) for v in input_list: for k in reverse_dict[v]: final_dict[k].append(v) final_dict 

Output

 defaultdict(list, {'KeyA': ['Aron'], 'KeyB': ['Foo', 'Bar', 'Dog'], 'KeyZ': ['Foo', 'Bar']}) 

@SuperSaiyan suggested rebuilding lists for each key in the main dictionary by searching for a set of input list. This is a brilliant and excellent approach for this particular application.

+1
source

Source: https://habr.com/ru/post/1274338/


All Articles