Flattening nested loops / reducing complexity - an algorithm for counting complementary pairs

I recently tried to solve a problem in Python, and I found a solution that seems to have O(n log n) complexity, but I believe it is very inefficient for some inputs (for example, when the first parameter is 0 and values is a very long list of zeros).

It also has three levels of for loops. I believe it can be optimized, but at the moment I can't optimize it any further; I'm probably just not seeing something obvious ;)

So basically the problem is this:

Given a list of integers ( values ), the function should return the number of pairs of indices that meet the following criteria:

  • assuming that a pair of indices is a tuple of the form (index1, index2) ,
  • then values[index1] == complementary_diff - values[index2] holds,

Example : for a list such as [1, 3, -4, 0, -3, 5] as values and 1 as complementary_diff , the function should return 4 (which is the length of the following list of index pairs: [(0, 3), (2, 5), (3, 0), (5, 2)] ).
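To pin down the counting rule, here is a naive quadratic reference sketch (the function name is mine, and note that nothing in the statement excludes index1 == index2, although this example never triggers that case):

```python
def complementary_pairs_number_naive(complementary_diff, values):
    """O(n^2) reference: check every ordered pair of indices."""
    count = 0
    for index1 in range(len(values)):
        for index2 in range(len(values)):
            if values[index1] == complementary_diff - values[index2]:
                count += 1
    return count

print(complementary_pairs_number_naive(1, [1, 3, -4, 0, -3, 5]))  # 4
```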

This is what I have so far. It should work fine in most cases, but, as I said, in some cases it can run very slowly, despite its average complexity being close to O(n log n) (its worst-case complexity looks like O(n^2)).

    def complementary_pairs_number(complementary_diff, values):
        value_key = {}  # dictionary storing indexes indexed by values
        for index, item in enumerate(values):
            try:
                value_key[item].append(index)
            except KeyError:  # the item has not been found among value_key's keys
                value_key[item] = [index]
        key_pairs = set()  # key pairs are unique by nature
        for pos_value in value_key:  # iterate through keys of the value_key dictionary
            sym_value = complementary_diff - pos_value
            if sym_value in value_key:  # check whether the symmetric value exists
                for i1 in value_key[pos_value]:  # iterate through pos_value's indexes
                    for i2 in value_key[sym_value]:  # as above, through sym_value's indexes
                        # add the index pairs, or ignore them if already in the set
                        key_pairs.add((i1, i2))
                        key_pairs.add((i2, i1))
        return len(key_pairs)

In this example, it behaves like this:

    >>> complementary_pairs_number(1, [1, 3, -4, 0, -3, 5])
    4

If you see how the code can be flattened or simplified, let me know.

I'm also not sure whether special-casing complementary_diff == 0 and the like is the best approach - if you think it is, let me know.

EDIT: I fixed the example (thanks, unutbu!).


I think this achieves O(n) complexity:

  • value_key.setdefault(item, []).append(index) is faster than using try..except blocks. It is also faster than using collections.defaultdict(list) . (I tested this with ipython's %timeit .)
  • The original code visits each solution twice. For each pos_value in value_key there is a unique sym_value associated with pos_value . There are solutions when sym_value is also in value_key . But when we loop over the keys in value_key , pos_value is eventually assigned the value of an old sym_value , which makes the code repeat a calculation that has already been done. So you can cut the work in half if you stop pos_value from taking the value of an old sym_value . I implemented this using seen = set() to keep track of the seen sym_value s.
  • The code only cares about len(key_pairs) , not key_pairs itself. Therefore, instead of keeping track of the pairs (with a set ), we can simply keep track of the count (with num_pairs ). So we can replace the two inner loops with

     num_pairs += 2*len(value_key[pos_value])*len(value_key[sym_value]) 

    or half that amount in the "diagonal" case, pos_value == sym_value .


    def complementary_pairs_number(complementary_diff, values):
        value_key = {}  # dictionary storing indexes indexed by values
        for index, item in enumerate(values):
            value_key.setdefault(item, []).append(index)
        # print(value_key)
        num_pairs = 0
        seen = set()
        for pos_value in value_key:
            if pos_value in seen:
                continue
            sym_value = complementary_diff - pos_value
            seen.add(sym_value)
            if sym_value in value_key:
                # print(pos_value, sym_value, value_key[pos_value], value_key[sym_value])
                n = len(value_key[pos_value]) * len(value_key[sym_value])
                if pos_value == sym_value:
                    num_pairs += n
                else:
                    num_pairs += 2 * n
        return num_pairs
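The same counting logic can also be written with collections.Counter, which replaces the index lists with plain occurrence counts (my sketch, not part of the original answer):

```python
from collections import Counter

def complementary_pairs_number_counts(complementary_diff, values):
    counts = Counter(values)  # value -> number of occurrences
    num_pairs = 0
    seen = set()
    for pos_value in counts:
        if pos_value in seen:
            continue  # already handled as some earlier sym_value
        sym_value = complementary_diff - pos_value
        seen.add(sym_value)
        if sym_value in counts:
            n = counts[pos_value] * counts[sym_value]
            # the "diagonal" case pos_value == sym_value is counted only once
            num_pairs += n if pos_value == sym_value else 2 * n
    return num_pairs

print(complementary_pairs_number_counts(1, [1, 3, -4, 0, -3, 5]))  # 4
```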

You might want to explore functional programming idioms such as reduce, etc.

Often, nested list logic can be simplified using functions such as reduce, map, filter, etc.

For an example (in JavaScript), check out the underscore.js library. I'm not very fluent in Python, so I don't know which libraries it has.
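In Python, the closest analogue of those idioms is a generator expression fed to sum (my sketch; it is still O(n^2) and is shown only to illustrate the style):

```python
from itertools import product

def complementary_pairs_number_fp(complementary_diff, values):
    # count every ordered index pair that satisfies the condition
    indices = range(len(values))
    return sum(
        1
        for i1, i2 in product(indices, repeat=2)
        if values[i1] == complementary_diff - values[i2]
    )

print(complementary_pairs_number_fp(1, [1, 3, -4, 0, -3, 5]))  # 4
```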


I think (some or all of) this would help, but I'm not sure how to prove it.

1) Take the values and reduce them to the set of distinct values, recording the count of each element. (O(n))

2) Sort the resulting array. (O(n log n))

3) If you can afford a lot of memory, I think you could fill a sparse array with the values: if the range of values is -100..+100, allocate an array of 201 slots, and every value that exists in the set marks its index in the large sparse array.

4) For any value you want to check against the condition, you can now compute the complementary index, look it up in the sparse array, and see whether a value exists there.

5) As noted by unutbu, the relation is trivially symmetric, so if (a, b) is a pair, then so is (b, a).
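A rough sketch of steps 1 and 3-5, assuming the values fit a known range (here -100..+100, as in the answer); the names and the range check are my own:

```python
from collections import Counter

def complementary_pairs_number_sparse(complementary_diff, values, lo=-100, hi=100):
    counts = Counter(values)           # step 1: distinct values with their counts, O(n)
    present = [False] * (hi - lo + 1)  # step 3: sparse presence array over the range
    for value in counts:
        present[value - lo] = True
    num_pairs = 0
    for value, count in counts.items():
        sym = complementary_diff - value        # step 4: complementary value to look up
        if lo <= sym <= hi and present[sym - lo]:
            # step 5: iterating over every value visits both (a, b) and (b, a)
            num_pairs += count * counts[sym]
    return num_pairs

print(complementary_pairs_number_sparse(1, [1, 3, -4, 0, -3, 5]))  # 4
```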


I think you can improve this by separating the algebra part from the search part and using smarter data structures.

  • Go through the list and, for each element in the list, subtract it from the complementary diff:

     resultlist[index] = complementary_diff - originallist[index] 

    You can use either a map or a simple loop. → Takes O(n) time.

  • See whether each number in the result list exists in the original list.

    • Here, with a naive list, you would really get O(n^2), because you might end up searching the whole original list for each item.

    • However, there are smarter ways to organize your data than that. If the original list is sorted , your search time drops to O(n log n + n log n) = O(n log n): n log n for sorting, and n log n for a binary search for each item.

    • If you want to be smarter still, you can turn your list into a dictionary (or hash table) , and then this step becomes O(n + n) = O(n): n to create the dictionary and 1 * n to look up each element in the dictionary. (EDIT: Since you cannot assume that each value in the original list is unique, you may also need to record how many times each value appears in it.)

So now you get O (n) total execution time.

Using your example:

 1, [1, 3, -4, 0, -3, 5] 
  • Create the result list:

     >>> resultlist
     [0, -2, 5, 1, 4, -4]

  • Now the search:

    • Flatten the original list into a dictionary. I chose to use the original list's index as the value, as that seems to be the side data you are interested in.

       >>> original_table
       {1: 0, 3: 1, -4: 2, 0: 3, -3: 4, 5: 5}

    • For each item in the result list, look it up in the hash table and create a tuple:

       (resultlist_index, original_table[resultlist[resultlist_index]])

      This should match the example solution you had.

  • Now you just take the length of the resulting list of tuples.

Now here is the code:

    example_diff = 1
    example_values = [1, 3, -4, 0, -3, 5]

    example2_diff = 1
    example2_values = [1, 0, 1]

    def complementary_pairs_number(complementary_diff, values):
        """
        Given an integer complement and a list of values,
        count how many pairs of complementary pairs there are in the list.
        """
        print "Input:", complementary_diff, values

        # Step 1. Result list
        resultlist = [complementary_diff - value for value in values]
        print "Result List:", resultlist

        # Step 2. Flatten into dictionary
        original_table = {}
        for original_index in xrange(len(values)):
            if values[original_index] in original_table:
                original_table[values[original_index]].append(original_index)
            else:
                original_table[values[original_index]] = [original_index]
        print "Flattened dictionary:", original_table

        # Step 2.5 Search through dictionary and count up the resulting pairs.
        pair_count = 0
        for resultlist_index in xrange(len(resultlist)):
            if resultlist[resultlist_index] in original_table:
                pair_count += len(original_table[resultlist[resultlist_index]])
        print "Complementary Pair Count:", pair_count

        # (Optional) Step 2.5 Search through dictionary and create complementary
        # pairs. Adds O(n^2) complexity.
        pairs = []
        for resultlist_index in xrange(len(resultlist)):
            if resultlist[resultlist_index] in original_table:
                pairs += [(resultlist_index, original_index)
                          for original_index in original_table[resultlist[resultlist_index]]]
        print "Complementary Pair Indices:", pairs

        # Step 3
        return pair_count

    if __name__ == "__main__":
        complementary_pairs_number(example_diff, example_values)
        complementary_pairs_number(example2_diff, example2_values)

Output:

    $ python complementary.py
    Input: 1 [1, 3, -4, 0, -3, 5]
    Result List: [0, -2, 5, 1, 4, -4]
    Flattened dictionary: {0: [3], 1: [0], 3: [1], 5: [5], -4: [2], -3: [4]}
    Complementary Pair Count: 4
    Complementary Pair Indices: [(0, 3), (2, 5), (3, 0), (5, 2)]
    Input: 1 [1, 0, 1]
    Result List: [0, 1, 0]
    Flattened dictionary: {0: [1], 1: [0, 2]}
    Complementary Pair Count: 4
    Complementary Pair Indices: [(0, 1), (1, 0), (1, 2), (2, 1)]

Thanks!


A modified version of the solution provided by @unutbu:

The problem can be reduced to comparing these two dictionaries:

  • values

  • a pre-computed dictionary for (complementary_diff - values[i])

     def complementary_pairs_number(complementary_diff, values):
         value_key = {}  # dictionary storing indexes indexed by values
         for index, item in enumerate(values):
             value_key.setdefault(item, []).append(index)
         answer_key = {}  # dictionary storing indexes indexed by (complementary_diff - values)
         for index, item in enumerate(values):
             answer_key.setdefault(complementary_diff - item, []).append(index)
         num_pairs = 0
         print(value_key)
         print(answer_key)
         for pos_value in value_key:
             if pos_value in answer_key:
                 num_pairs += len(value_key[pos_value]) * len(answer_key[pos_value])
         return num_pairs
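As a quick self-contained check on the question's examples (my harness: the prints are dropped and the loop is collapsed into a sum, but the logic is the same as in the answer above):

```python
def complementary_pairs_number(complementary_diff, values):
    value_key = {}
    for index, item in enumerate(values):
        value_key.setdefault(item, []).append(index)
    answer_key = {}
    for index, item in enumerate(values):
        answer_key.setdefault(complementary_diff - item, []).append(index)
    return sum(
        len(value_key[v]) * len(answer_key[v])
        for v in value_key
        if v in answer_key
    )

print(complementary_pairs_number(1, [1, 3, -4, 0, -3, 5]))  # 4
```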

Source: https://habr.com/ru/post/905916/

