Align the two lists by adding special values ​​for the missing entries

I have two lists of values ​​of the same sortable type, which are sorted in ascending order, but (i) they do not have the same length, and (ii) the entries in one list may not be in the other and vice versa. However, I know that most of the values ​​in one list are in another, and that there are no duplicates in any list.

Thus, we may have this situation:

list1 = [value1-0, value1-1, value1-2, value1-3] list2 = [value2-0, value2-1, value2-2] 

If that happens, the order of values ​​from both lists:

 value1-0 < (value1-1 = value2-0) < value2-1 < value1-2 < value1-3 < value2-2 

we can assign the combined names of sorted values ​​to values ​​from two lists, for example:

 valueA < valueB < valueC < valueD < valueE < valueF 

so that the two lists can be written as:

 list1 = [valueA, valueB, valueD, valueE] list2 = [valueB, valueC, valueF] 

Given this, I want the lists to become:

 new_list1 = [valueA, valueB, "MISSING", valueD, valueE, "MISSING"] new_list2 = ["MISSING", valueB, valueC, "MISSING", "MISSING", valueF ] 

Can anyone help?

EDIT: An original question regarding datetime objects in particular (hence the comments regarding datetime s), but has been generalized to any sortable type.

+4
source share
3 answers

How about something like this:

 set1 = set(list1) set2 = set(list2) total = sorted(set1|set2) new_list1 = [x if x in set1 else "MISSING" for x in total] new_list2 = [x if x in set2 else "MISSING" for x in total] 
+4
source

This problem aroused my interest, so I wrote a too general solution.

Here is the function that

  • aligns any number of sequences
  • runs on iterators, so it can efficiently handle long (or infinite) sequences.
  • supports duplicate values
  • compatible with Python 2 and 3 (although I would use align_iterables(*inputs, missing_value=None) if I were not interested in historical versions of Python)

 import itertools def align_iterables(inputs, missing=None): """Align sorted iterables Yields tuples with values from the respective `inputs`, placing `missing` if the value does not exist in the corresponding iterable. Example: align_generator('bc', 'bf', '', 'abf') yields: (None, None, None, 'a') ('b', 'b', None, 'b') ('c', None, None, None) (None, 'f', None, 'f') """ End = object() iterators = [itertools.chain(i, [End]) for i in inputs] values = [next(i) for i in iterators] while not all(v is End for v in values): smallest = min(v for v in values if v is not End) yield tuple(v if v == smallest else missing for v in values) values = [next(i) if v == smallest else v for i, v in zip(iterators, values)] 

# Adapter for this problem:

 def align_two_lists(list1, list2, missing="MISSING"): value = list(zip(*list(align_iterables([list1, list2], missing=missing)))) if not value: return [[], []] else: a, b = value return [list(a), list(b)] 

# A set of tests for the issue issue:

 if __name__ == '__main__': assert align_two_lists('abcef', 'abcdef', '_') == [['a', 'b', 'c', '_', 'e', 'f'], ['a', 'b', 'c', 'd', 'e', 'f']] assert align_two_lists('a', 'abcdef', '_') == [['a', '_', '_', '_', '_', '_'], ['a', 'b', 'c', 'd', 'e', 'f']] assert align_two_lists('abcdef', 'a', '_') == [['a', 'b', 'c', 'd', 'e', 'f'], ['a', '_', '_', '_', '_', '_']] assert align_two_lists('', 'abcdef', '_') == [['_', '_', '_', '_', '_', '_'], ['a', 'b', 'c', 'd', 'e', 'f']] assert align_two_lists('abcdef', '', '_') == [['a', 'b', 'c', 'd', 'e', 'f'], ['_', '_', '_', '_', '_', '_']] assert align_two_lists('ace', 'abcdef', '_') == [['a', '_', 'c', '_', 'e', '_'], ['a', 'b', 'c', 'd', 'e', 'f']] assert align_two_lists('bdf', 'ace', '_') == [['_', 'b', '_', 'd', '_', 'f'], ['a', '_', 'c', '_', 'e', '_']] assert align_two_lists('ace', 'bdf', '_') == [['a', '_', 'c', '_', 'e', '_'], ['_', 'b', '_', 'd', '_', 'f']] assert align_two_lists('aaacd', 'acd', '_') == [['a', 'a', 'a', 'c', 'd'], ['a', '_', '_', 'c', 'd']] assert align_two_lists('acd', 'aaacd', '_') == [['a', '_', '_', 'c', 'd'], ['a', 'a', 'a', 'c', 'd']] assert align_two_lists('', '', '_') == [[], []] list1 = ["datetimeA", "datetimeB", "datetimeD", "datetimeE"] list2 = ["datetimeB", "datetimeC", "datetimeD", "datetimeF"] new_list1 = ["datetimeA", "datetimeB", "MISSING", "datetimeD", "datetimeE", "MISSING"] new_list2 = ["MISSING", "datetimeB", "datetimeC", "datetimeD", "MISSING", "datetimeF"] assert align_two_lists(list1, list2) == [new_list1, new_list2] 

# And some additional tests:

  # Also test multiple generators for expected, got in zip( [(None, None, None, 'a'), ('b', 'b', None, 'b'), ('c', None, None, None), (None, 'f', None, 'f')], align_iterables(['bc', 'bf', '', 'abf'])): assert expected == got assert list(align_iterables([])) == [] # And an infinite generator for expected, got in zip( [(0, 0), ('X', 1), (2, 2), ('X', 3), (4, 4)], align_iterables([itertools.count(step=2), itertools.count()], missing='X')): assert expected == got 
+3
source

You can try:

 new_list1=[] new_list2=[] i=j=0 while True: print '1 ' + str(new_list1) +' '+str(i) print '2 ' + str(new_list2) +' '+str(j) if list1[i]==list2[j]: new_list1 += [list1[i]] new_list2 += [list2[j]] i=i+1 j=j+1 elif list1[i]>list2[j]: new_list1 += ["MISSING"] new_list2 += [list2[j]] j=j+1 else: # list1[i]<list2[j] new_list1 += [list1[i]] new_list2 += ["MISSING"] i=i+1 if i>=len(list1) or j>=len(list2): break while i<len(list1): new_list1 += [list1[i]] new_list2 += ["MISSING"] i=i+1 while j<len(list2): new_list1 += ["MISSING"] new_list2 += [list2[j]] j=j+1 

It looks like a lot of code, but it should work and cycle through the lists only once.

+1
source

Source: https://habr.com/ru/post/1497702/


All Articles