Removing duplicates from a list of lists based on a comparison of an internal list item

I have a large list of lists and you need to remove duplicate items based on certain criteria:

  • Uniqueness is determined by the first element of lists.
  • Duplicate removal is determined by comparing the value of the second element of the duplicate lists, namely saving the list with the smallest second element.

[[1, 4, 5], [1, 3, 4], [1, 2, 3]]

All the above lists are considered duplicate, since their first elements are equal. The third list needs to be kept since the second element is the smallest. Please note that the actual list of lists contains more than 4 million items, is double sorted and the save order is saved.

The list is first sorted based on the second element of the internal lists and in reverse (descending) order, followed by the normal (ascending) order based on the first element:

sorted(sorted(the_list, key=itemgetter(1), reverse=True), key=itemgetter(0))

An example of three duplicate lists in their actual order:

[...
[33554432, 50331647, 1695008306],
[33554432, 34603007, 1904606324],
[33554432, 33554687, 2208089473],
...]

The goal is to prepare a search list in half. Can someone let me know how this can be achieved using Python?

+4
source share
2 answers

You can group items with a dict, always keeping a sublist with a smaller second item:

l = [[1, 2, 3], [1, 3, 4], [1, 4, 5], [2, 4, 3], [2, 5, 6], [2, 1, 3]]
d = {}
for sub in l:
    k = sub[0]
    if k not in d or sub[1] < d[k][1]:
        d[k] = sub

You can also pass two keys for sorting, you do not need to sort calls twice:

In [3]:  l = [[1,4,6,2],[2,2,4,6],[1,2,4,5]]
In [4]: sorted(l,key=lambda x: (-x[1],x[0]))
Out[4]: [[1, 4, 6, 2], [1, 2, 4, 5], [2, 2, 4, 6]]

If you want to keep order in a dict, since the order has to be saved .:

from collections import OrderedDict

l = [[1, 2, 3], [1, 3, 4], [1, 4, 5], [2, 4, 3], [2, 5, 6], [2, 1, 3]]
d = OrderedDict()
for sub in l:
    k = sub[0]
    if k not in d or sub[1] < d[k][1]:
        d[sub[0]] = sub

, , .

, sortedcontainers.sorteddict:

A SortedDict , dict. , SortedDict . , , popitem ..

, , Pythons, dict. , dict. .

from sortedcontainers import SortedDict

l = [[1, 2, 3], [1, 3, 4], [1, 4, 5], [2, 4, 3], [2, 5, 6], [2, 1, 3]]
d = SortedDict()
for sub in l:
    k = sub[0]
    if k not in d or sub[1] < d[k][1]:
        d[k] = sub


print(list(d.values()))

, bisect, bisect_left ..

+2

, :

mylist = [[1, 2, 3], [1, 3, 4], [1, 4, 5], [7, 3, 6], [7, 1, 8]]

ordering = []
newdata = {}

for a, b, c in mylist:
    if a in newdata:
        if b < newdata[a][1]:
            newdata[a] = [a, b, c]
    else:
        newdata[a] = [a, b, c]
        ordering.append(a)

newlist = [newdata[v] for v in ordering]

, newlist [[1, 2, 3], [7, 1, 8]].

+1

Source: https://habr.com/ru/post/1620547/


All Articles