How to remove case-insensitive duplicates from a list while preserving the original list order?

I have a list of strings, for example:

myList = ["paper", "Plastic", "aluminum", "PAPer", "tin", "glass", "tin", "PAPER", "Polypropylene Plastic"]

I want this result (and this is the only acceptable result):

myList = ["paper", "Plastic", "aluminum", "tin", "glass", "Polypropylene Plastic"]

Note that if one element ("Polypropylene Plastic") contains another element ("Plastic"), I still want to keep both. So the case may differ, but an item must match letter for letter to be removed as a duplicate.

The original list order must be kept. All duplicates after the first instance of an item must be removed. The original case of that first instance must be preserved, as well as the original case of all non-duplicated elements.

I searched and found only questions that address one problem or the other, not both.

+4
6

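Use a helper set that holds the lowercased form of every item already seen. Loop over the list and append an item to the result only if its lowercased form is not yet in the set.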

myList = ["paper", "Plastic", "aluminum", "PAPer", "tin", "glass", "tin", "PAPER", "Polypropylene Plastic"]
result=[]

marker = set()

for l in myList:
    ll = l.lower()
    if ll not in marker:   # test presence
        marker.add(ll)
        result.append(l)   # preserve order

print(result)

Output:

['paper', 'Plastic', 'aluminum', 'tin', 'glass', 'Polypropylene Plastic']

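Note: .casefold() is a better choice than .lower() when the text may contain locale-specific characters (for example the German double "s" in Strasse/Straße).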
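To illustrate the difference, here is a minimal sketch comparing the two methods on the Strasse/Straße pair. The variable names are only for illustration.

```python
# Compare lower() and casefold() on two spellings of the German word for "street".
words = ["Straße", "STRASSE"]

lowered = [w.lower() for w in words]
folded = [w.casefold() for w in words]

print(lowered)  # ['straße', 'strasse'] -> still two different keys
print(folded)   # ['strasse', 'strasse'] -> one key, so they count as duplicates
```

With lower() the two spellings would both survive deduplication; with casefold() the second is treated as a duplicate of the first.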

Edit: this can also be done with a list comprehension, but it is hacky:

marker = set()
result = [not marker.add(x.casefold()) and x for x in myList if x.casefold() not in marker]

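This relies on set.add returning None: "not None" is true, so the "and" always yields x, and the call to add happens as a side effect inside the comprehension (rarely a good thing). The main drawbacks:

  • readability suffers, and casefold() is called twice for each kept item: once for the membership test and once to store the key in the set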
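One possible way around the double casefold() call, sketched here as an assumption rather than as part of the answer above, is to let a dict do the bookkeeping. It relies on dicts preserving insertion order (guaranteed since Python 3.7); dedupe_casefold is a hypothetical helper name.

```python
def dedupe_casefold(items):
    """Keep the first spelling of each string, comparing case-insensitively."""
    first_seen = {}
    for item in items:
        # setdefault stores the item only if its folded key is new,
        # so casefold() runs once per element.
        first_seen.setdefault(item.casefold(), item)
    # Dict values come back in insertion order, i.e. order of first appearance.
    return list(first_seen.values())


sample = ["paper", "Plastic", "aluminum", "PAPer", "tin", "glass",
          "tin", "PAPER", "Polypropylene Plastic"]
print(dedupe_casefold(sample))
# ['paper', 'Plastic', 'aluminum', 'tin', 'glass', 'Polypropylene Plastic']
```

The explicit loop with a set remains the more readable option; this variant only trades the separate result list for the dict's values.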
+12

EDIT: a function-based version:

def custom_filter(my_list):
    seen = set()          # case-folded keys already emitted
    result_list = []
    for item in my_list:
        key = item.casefold()
        if key not in seen:
            # append the original string so its case is preserved
            result_list.append(item)
            seen.add(key)
    return result_list


print(custom_filter(myList))

Output:

['paper', 'Plastic', 'aluminum', 'tin', 'glass', 'Polypropylene Plastic']
0
mydict = {}
myList = ["paper", "Plastic", "aluminum", "PAPer", "tin", "glass", "tin", "PAPER", "Polypropylene Plastic"]
mynewList = []
for elem in myList:
    key = elem.lower()
    if key in mydict:      # case-insensitive duplicate: skip it
        continue
    mydict[key] = elem     # remember the first spelling seen for this key
    mynewList.append(elem)
print(mynewList)

['paper', 'Plastic', 'aluminum', 'tin', 'glass', 'Polypropylene Plastic']

This follows the same idea as Jean-François Fabre's answer, with a dict in place of the set.

0
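import pandas as pd

df = pd.DataFrame(myList)        # myList as defined in the question
df['lower'] = df[0].str.lower()  # lowercased key column
# sort=False keeps groups in order of first appearance; first() takes the first original spelling
df.groupby('lower', sort=False)[0].first().tolist()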

Output:

['paper', 'Plastic', 'aluminum', 'tin', 'glass', 'Polypropylene Plastic']
0

Another option: collections.defaultdict

from collections import defaultdict

myList = ["paper", "Plastic", "aluminum", "PAPer", "tin", "glass", "tin", "PAPER", "Polypropylene Plastic"]
d_dict = defaultdict(list)
for k,v in enumerate(myList):
    d_dict[v.lower()].append(k)

[myList[j] for j in sorted(i[0] for i in d_dict.values())]

['paper', 'Plastic', 'aluminum', 'tin', 'glass', 'Polypropylene Plastic']
0

Following up on @Gábor Fekete's comment:

myList = ["paper", "Plastic", "aluminum", "PAPer", "tin", "glass",
          "tin", "PAPER", "Polypropylene Plastic"]

def is_already_in(value, used_elements):
  low = value.lower()
  if low in used_elements:
    return True
  used_elements.add(low)
  return False

used_elements = set()
print([ e for e in myList if not is_already_in(e, used_elements) ])
-1

Source: https://habr.com/ru/post/1692266/

