Remove duplicates in a list of lists based on the fourth item in each sublist

I have a list of lists that looks like this:

c = [['470', '4189.0', 'asdfgw', 'fds'],
     ['470', '4189.0', 'qwer', 'fds'],
     ['470', '4189.0', 'qwer', 'dsfs fdv'],
     ...]

c has about 30,000 inner lists. I would like to eliminate duplicates based on the 4th item in each inner list, so that the list above ends up looking like this:

c = [['470', '4189.0', 'asdfgw', 'fds'],['470', '4189.0', 'qwer', 'dsfs fdv'] ...]

Here is what I have so far:

d = [] #list that will contain condensed c
d.append(c[0]) #append first element, so I can compare lists
for bact in c: #c is my list of lists with 30,000 interior list
    for items in d:
        if bact[3] != items[3]:
            d.append(bact)  

I think this should work, but it just runs and runs. I let it run for 30 minutes and then killed it. I don't think the program should take that long, so I assume something is wrong with my logic.



You can do this with a set and a list comprehension:

seen = set()
cond = [x for x in c if x[3] not in seen and not seen.add(x[3])]

Explanation:

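seen is a set that keeps track of the fourth elements already encountered, and cond is the condensed list. If x[3] (where x is a sublist of c) is not in seen, x is added to cond and x[3] is added to seen.

seen.add(x[3]) returns None, so not seen.add(x[3]) is always True, but that part is only evaluated when x[3] not in seen is True, because Python uses short-circuit evaluation. When the second condition is evaluated, it always returns True and has the side effect of adding x[3] to seen. Here is another example of the same thing (print returns None and has the "side effect" of printing something):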

>>> False and not print('hi')
False
>>> True and not print('hi')
hi
True
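
To see the one-liner in action, here is a small sketch that runs it on the three sample rows from the question. The rest of the real list is elided there, so only those rows are used; the expected result is the condensed list the question asks for.

```python
# Sample rows copied from the question; the real list has about 30,000 rows.
c = [['470', '4189.0', 'asdfgw', 'fds'],
     ['470', '4189.0', 'qwer', 'fds'],
     ['470', '4189.0', 'qwer', 'dsfs fdv']]

seen = set()
# Keep x only if its fourth item is new. seen.add() returns None, so
# "not seen.add(...)" is always True and only serves to record the key.
cond = [x for x in c if x[3] not in seen and not seen.add(x[3])]

print(cond)
# [['470', '4189.0', 'asdfgw', 'fds'], ['470', '4189.0', 'qwer', 'dsfs fdv']]
```

The first row for each distinct fourth item is kept, and the original order is preserved.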

Your current code has a logic error in the inner loop:

for items in d:
    if bact[3] != items[3]:
        d.append(bact)  

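This appends bact to d once for every item already in d whose fourth element differs, instead of once when no item in d matches, so d fills up with duplicates and grows very quickly. Use for/else instead: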

for items in d:
    if bact[3] == items[3]:
        break
else:
    d.append(bact)  

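Now bact is appended only if the inner loop finishes without hitting break, that is, when no item already in d has the same fourth element.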
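
For completeness, here is a sketch of the whole corrected loop on the sample rows from the question. Starting with an empty d is an assumption made here; it works because the else branch also runs when the inner loop has nothing to iterate over, so the first row does not need to be appended by hand.

```python
# Sample rows copied from the question.
c = [['470', '4189.0', 'asdfgw', 'fds'],
     ['470', '4189.0', 'qwer', 'fds'],
     ['470', '4189.0', 'qwer', 'dsfs fdv']]

d = []  # condensed list
for bact in c:
    for items in d:
        if bact[3] == items[3]:
            break           # fourth item already present, skip this row
    else:
        d.append(bact)      # runs only when the inner loop did not break

print(d)
# [['470', '4189.0', 'asdfgw', 'fds'], ['470', '4189.0', 'qwer', 'dsfs fdv']]
```

This version is correct, but it still scans d for every row, so the work grows roughly with len(c) times the number of distinct fourth items.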


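However, a better approach (as the other answer does) is to keep a set of the fourth elements seen so far, because a membership test on a set takes constant time on average, whereas scanning the list d takes time proportional to its length.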

d = []
seen = set()
for bact in c:
    if bact[3] not in seen: # membership test
        seen.add(bact[3])
        d.append(bact)
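
If the same pattern is needed for other columns, the loop can be wrapped in a small helper. This is only a sketch: unique_by is a hypothetical name, not a standard library function, and it assumes the key values are hashable (strings, as in the question, are).

```python
from operator import itemgetter


def unique_by(rows, key):
    """Yield each row whose key(row) has not been seen before, in order.

    unique_by is a hypothetical helper written for this example.
    """
    seen = set()
    for row in rows:
        k = key(row)
        if k not in seen:   # set membership test
            seen.add(k)
            yield row


# Sample rows copied from the question.
c = [['470', '4189.0', 'asdfgw', 'fds'],
     ['470', '4189.0', 'qwer', 'fds'],
     ['470', '4189.0', 'qwer', 'dsfs fdv']]

d = list(unique_by(c, key=itemgetter(3)))  # de-duplicate on the fourth item
print(d)
# [['470', '4189.0', 'asdfgw', 'fds'], ['470', '4189.0', 'qwer', 'dsfs fdv']]
```

Passing itemgetter(2) instead would de-duplicate on the third item, keeping the first row for each distinct value.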

Use pandas. I assume you also have column names.

import pandas as pd

c = [['470', '4189.0', 'asdfgw', 'fds'],
     ['470', '4189.0', 'qwer', 'fds'],
     ['470', '4189.0', 'qwer', 'dsfs fdv']]
df = pd.DataFrame(c, columns=['col_1', 'col_2', 'col_3', 'col_4'])
# Keep the first row for each distinct value in col_4
df.drop_duplicates('col_4', inplace=True)
print(df)

  col_1   col_2   col_3     col_4
0   470  4189.0  asdfgw       fds
2   470  4189.0    qwer  dsfs fdv

Source: https://habr.com/ru/post/1545395/

