Enumerate unique strings in a list

Disclaimer: I am not an experienced Python user.

I ran into a task, and now I'm trying to find the most elegant way to do it in Python.

Here is the task itself: given a list of strings, return a list of ints (each int from 0 to N - 1, where N is the number of unique strings in the list), where each int corresponds to a specific string from the original list. Identical strings must map to the same number, and different strings to different numbers.

The first thing I came up with seems "a bit" complicated:

    a = ["a","b","a","c","b","a"]
    map(lambda x: dict(map(lambda x: reversed(x), enumerate(set(a))))[x], a)

The result of the code above:

 [0, 2, 0, 1, 2, 0] 
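
The requirement stated above can be written down as a small check. This is only a sketch in Python 3; the helper name `is_valid_labelling` is hypothetical and not part of the original question. It accepts any numbering that is consistent, one-to-one, and uses exactly the values 0 to N - 1, regardless of which string gets which number.

```python
def is_valid_labelling(strings, labels):
    """Return True if labels map the N unique strings one-to-one onto 0..N-1."""
    if len(strings) != len(labels):
        return False
    pairs = set(zip(strings, labels))
    unique_strings = set(strings)
    unique_labels = set(labels)
    # If the number of distinct (string, label) pairs equals both the number
    # of distinct strings and the number of distinct labels, every string has
    # exactly one label and every label belongs to exactly one string.
    one_to_one = len(pairs) == len(unique_strings) == len(unique_labels)
    # The labels must be exactly 0, 1, ..., N - 1.
    return one_to_one and unique_labels == set(range(len(unique_strings)))


a = ["a", "b", "a", "c", "b", "a"]
print(is_valid_labelling(a, [0, 2, 0, 1, 2, 0]))  # True
print(is_valid_labelling(a, [0, 1, 0, 2, 1, 0]))  # True
print(is_valid_labelling(a, [0, 1, 0, 1, 1, 0]))  # False: "b" and "c" share 1
```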
5 answers

You can use dict and list comprehensions:

    >>> a = ["a","b","a","c","b","a"]
    >>> d = {x:i for i, x in enumerate(set(a))}
    >>> [d[item] for item in a]
    [0, 2, 0, 1, 2, 0]

To preserve the order of first appearance:

    >>> seen = set()
    >>> d = {x:i for i, x in enumerate(y for y in a if y not in seen and not seen.add(y))}
    >>> [d[item] for item in a]
    [0, 1, 0, 2, 1, 0]

The dict comprehension above is equivalent to:

    >>> seen = set()
    >>> lis = []
    >>> for item in a:
    ...     if item not in seen:
    ...         seen.add(item)
    ...         lis.append(item)
    ...
    >>> lis
    ['a', 'b', 'c']
    >>> d = {x:i for i,x in enumerate(lis)}
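
The loop above can also be folded into a single pass with `dict.setdefault`. This is a minimal sketch in Python 3, not the answerer's code; the function name `enumerate_unique` is made up for illustration.

```python
def enumerate_unique(items):
    """Label each item with the rank of its first appearance (0, 1, 2, ...)."""
    index_of = {}
    # len(index_of) is evaluated before setdefault runs, so a new item is
    # stored with the current number of known items; an item that is
    # already present keeps the label it received the first time.
    return [index_of.setdefault(item, len(index_of)) for item in items]


a = ["a", "b", "a", "c", "b", "a"]
print(enumerate_unique(a))  # [0, 1, 0, 2, 1, 0]
```

No separate `seen` set is needed because the dict itself records which items have already been numbered.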

I think your approach with set can lead to errors if you want to preserve the order in which the characters appear. In fact, you can see this in your example: 'b' got index 2 instead of 1. If you want to preserve the order, you can use OrderedDict:

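    >>> from collections import OrderedDict
    >>> from itertools import izip  # Python 2; on Python 3 use the built-in zip
    >>> a = ["a","b","a","c","b","a"]
    >>> d = {x:i for i, x in enumerate(OrderedDict(izip(a, a)).values())}
    >>> [d[x] for x in a]
    [0, 1, 0, 2, 1, 0]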
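
For Python 3, a similar order-preserving de-duplication can be sketched with `OrderedDict.fromkeys`, which keeps each key at the position of its first occurrence. The variable names here are illustrative; on Python 3.7 and later a plain `dict.fromkeys(a)` should behave the same way, since regular dicts preserve insertion order.

```python
from collections import OrderedDict

a = ["a", "b", "a", "c", "b", "a"]

# fromkeys drops duplicates while keeping first-appearance order.
unique_in_order = list(OrderedDict.fromkeys(a))
print(unique_in_order)  # ['a', 'b', 'c']

# Number the unique strings, then look each original element up.
d = {x: i for i, x in enumerate(unique_in_order)}
labels = [d[x] for x in a]
print(labels)  # [0, 1, 0, 2, 1, 0]
```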

If you care about readability rather than speed, I would use the list's index method with a list comprehension:

    >>> a = ["a","b","a","c","b","a"]
    >>> b = list(set(a))
    >>> c = [b.index(x) for x in a]
    >>> c
    [0, 2, 0, 1, 2, 0]

First, get the unique strings from the list and enumerate them so that each string has a number (from 0 to N-1). Then look up that number for each string in the original list and collect the results in a list. Here is how to do it in one line:

    a = ["a","b","a","c","b","a"]
    [{s:i for i, s in enumerate(set(a))}[s] for s in a]

You can also do this with a defaultdict and a count iterator:

    >>> from collections import defaultdict
    >>> from itertools import count
    >>> a = ["a","b","a","c","b","a"]
    >>> x = defaultdict(count().next)  # Python 2; on Python 3 use count().__next__
    >>> [x[i] for i in a]
    [0, 1, 0, 2, 1, 0]
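
A Python 3 version of the same idea might look like the sketch below, where the bound method `__next__` takes the place of `next`. The function name `label_strings` is hypothetical and only wraps the technique so that the counter is not shared between calls.

```python
from collections import defaultdict
from itertools import count


def label_strings(strings):
    """Number strings in order of first appearance using defaultdict + count."""
    counter = count()  # yields 0, 1, 2, ...
    # The default factory runs only for a missing key, so each new string
    # draws the next integer and repeated strings reuse their stored value.
    labels = defaultdict(counter.__next__)
    return [labels[s] for s in strings]


a = ["a", "b", "a", "c", "b", "a"]
print(label_strings(a))  # [0, 1, 0, 2, 1, 0]
```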

Source: https://habr.com/ru/post/1502383/