Matching two lists without loops

I have two lists of the same length. The first list l1 contains data.

 l1 = [2, 3, 5, 7, 8, 10, ... , 23] 

The second list l2 contains the category to which each element of l1 belongs:

 l2 = [1, 1, 2, 1, 3, 4, ... , 3] 

How can I split the first list based on the positions defined by the numbers (1, 2, 3, 4, ...) in the second list, using a list comprehension or a lambda function? For example, 2, 3 and 7 from the first list belong to the same section, because their corresponding values in the second list are all 1.

The number of sections is known at the beginning.

7 answers

You can use a dictionary:

 >>> l1 = [2, 3, 5, 7, 8, 10, 23]
 >>> l2 = [1, 1, 2, 1, 3, 4, 3]
 >>> d = {}
 >>> for i, j in zip(l1, l2):
 ...     d.setdefault(j, []).append(i)
 ...
 >>> d
 {1: [2, 3, 7], 2: [5], 3: [8, 23], 4: [10]}

If a dict is OK, I suggest using defaultdict:

 >>> from collections import defaultdict
 >>> d = defaultdict(list)
 >>> for number, category in zip(l1, l2):
 ...     d[category].append(number)
 ...
 >>> d
 defaultdict(<type 'list'>, {1: [2, 3, 7], 2: [5], 3: [8, 23], 4: [10]})

If you are using Python 2, consider itertools.izip instead of zip to save memory (in Python 3, zip is already lazy).

This is basically the same solution as Kasramvd's, but I think defaultdict makes it a little easier to read.
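
Since the question says the number of sections is known up front, a plain list of lists can stand in for the dictionary. A minimal Python 3 sketch, assuming the categories are exactly the integers 1..n_sections (n_sections is a name made up for this example, and the loop is still there, as in the answers above):

```python
l1 = [2, 3, 5, 7, 8, 10, 23]
l2 = [1, 1, 2, 1, 3, 4, 3]
n_sections = 4  # assumption: known in advance, categories run from 1 to n_sections

# One empty bucket per section, created up front
sections = [[] for _ in range(n_sections)]
for number, category in zip(l1, l2):
    # Categories are 1-based, list indices are 0-based
    sections[category - 1].append(number)

print(sections)  # [[2, 3, 7], [5], [8, 23], [10]]
```

This makes a single pass over the data, like the dictionary versions, but it would raise an IndexError if a category fell outside 1..n_sections.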


This will give a list of sections using a list comprehension:

 >>> l1 = [2, 3, 5, 7, 8, 10, 23]
 >>> l2 = [1, 1, 2, 1, 3, 4, 3]
 >>> [[value for i, value in enumerate(l1) if j == l2[i]] for j in set(l2)]
 [[2, 3, 7], [5], [8, 23], [10]]
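
Note that the order of the sections above follows the iteration order of set(l2), which Python does not guarantee in general. A small variation, offered only as a sketch, sorts the categories first and pairs the two lists with zip instead of indexing (the names categories, sections, cat and wanted are just for illustration):

```python
l1 = [2, 3, 5, 7, 8, 10, 23]
l2 = [1, 1, 2, 1, 3, 4, 3]

# sorted() pins down the order of the sections; a bare set has no guaranteed order
categories = sorted(set(l2))

# For each category, keep the values whose paired category matches
sections = [[value for value, cat in zip(l1, l2) if cat == wanted]
            for wanted in categories]

print(sections)  # [[2, 3, 7], [5], [8, 23], [10]]
```

Like the original, this scans the whole input once per category, so it is convenient for a handful of categories rather than thousands.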

Nested list comprehension:

 [[l1[j] for j in range(len(l1)) if l2[j] == i] for i in range(1, max(l2) + 1)]
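
One thing to keep in mind with the range-based version: every integer between 1 and max(l2) gets a section, so a category that never occurs yields an empty list. A quick illustration with made-up data (not the lists from the question):

```python
# Hypothetical data: categories 2 and 3 never occur
l1 = [2, 3, 5, 7]
l2 = [1, 1, 4, 1]

sections = [[l1[j] for j in range(len(l1)) if l2[j] == i]
            for i in range(1, max(l2) + 1)]

print(sections)  # [[2, 3, 7], [], [], [5]]
```

Whether those empty lists are welcome depends on the use case; the set-based versions skip absent categories instead.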


If it is reasonable to store your data in numpy ndarrays, you can use advanced indexing with a boolean mask:

 {i: l1[l2 == i] for i in set(l2)}

This builds a dictionary of ndarrays keyed by category code.
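
For completeness, here is a self-contained sketch of the same idea. It assumes numpy is installed and converts the lists with np.asarray first; the names a1, a2 and groups are only for illustration. The conversion matters: with plain Python lists the expression does not raise an error, because l2 == i evaluates to False and l1[False] is simply l1[0].

```python
import numpy as np

# Convert the plain lists to ndarrays so that == works element-wise
a1 = np.asarray([2, 3, 5, 7, 8, 10, 23])
a2 = np.asarray([1, 1, 2, 1, 3, 4, 3])

# a2 == i is a boolean mask; indexing a1 with it keeps the matching elements
groups = {int(i): a1[a2 == i] for i in np.unique(a2)}

for category, values in groups.items():
    print(category, values.tolist())
```

np.unique is used here instead of set so that the keys come out sorted; int(i) turns the numpy integers into ordinary Python ints.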

There is overhead associated with l2 == i (a new boolean array is built for each category) that grows with the number of categories, so you may want to check which alternative, numpy or defaultdict, is faster with your data.

I tested with n=200000, nc=20, and numpy was faster than defaultdict + izip (124 vs 165 ms), but with nc=10000 numpy was much slower (11300 vs 251 ms).
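
If you want to repeat that kind of comparison on your own data, something along these lines should do. This is only a sketch, not the script behind the numbers above: the helper names, the random data and the sizes n and nc are assumptions, and the timings will differ from machine to machine.

```python
import timeit
from collections import defaultdict

import numpy as np


def group_defaultdict(values, categories):
    """Group plain Python lists in a single pass."""
    d = defaultdict(list)
    for v, c in zip(values, categories):
        d[c].append(v)
    return d


def group_numpy(values, categories):
    """Group ndarrays with one boolean mask per category."""
    return {c: values[categories == c] for c in np.unique(categories)}


n, nc = 20000, 20  # hypothetical sizes; raise nc to see the mask overhead grow
rng = np.random.default_rng(0)
values = rng.integers(0, 1000, size=n)
categories = rng.integers(0, nc, size=n)
values_list, categories_list = values.tolist(), categories.tolist()

repeats = 5
t_dict = timeit.timeit(lambda: group_defaultdict(values_list, categories_list), number=repeats)
t_numpy = timeit.timeit(lambda: group_numpy(values, categories), number=repeats)
print(f"defaultdict: {t_dict / repeats * 1e3:.2f} ms, numpy: {t_numpy / repeats * 1e3:.2f} ms")
```

Each approach is timed on its natural input type (lists for defaultdict, ndarrays for numpy), so the cost of converting between the two is deliberately left out.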


Using some itertools and operator goodies and sorting, you can do this in a one-liner:

 >>> import itertools, operator
 >>> l1 = [2, 3, 5, 7, 8, 10, 23]
 >>> l2 = [1, 1, 2, 1, 3, 4, 3]
 >>> itertools.groupby(sorted(zip(l2, l1)), operator.itemgetter(0))

The result is an itertools.groupby object, which can be iterated over:

 >>> for g, li in itertools.groupby(sorted(zip(l2, l1)), operator.itemgetter(0)):
 ...     print(g, list(map(operator.itemgetter(1), li)))
 ...
 1 [2, 3, 7]
 2 [5]
 3 [8, 23]
 4 [10]
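
If you would rather end up with a dictionary, the grouped pairs can be fed into a dict comprehension. A sketch, assuming Python 3.7+ so the dict keeps insertion order; the names pairs and grouped are only for illustration. Sorting on the category alone (key=itemgetter(0)) relies on sorted being stable, so the values inside each group keep their original order from l1, whereas sorting the bare tuples would also sort the values within each category.

```python
from itertools import groupby
from operator import itemgetter

l1 = [2, 3, 5, 7, 8, 10, 23]
l2 = [1, 1, 2, 1, 3, 4, 3]

# Stable sort on the category only, so values keep their relative order
pairs = sorted(zip(l2, l1), key=itemgetter(0))

# groupby only merges adjacent items, which is why the sort comes first
grouped = {cat: [value for _, value in group]
           for cat, group in groupby(pairs, key=itemgetter(0))}

print(grouped)  # {1: [2, 3, 7], 2: [5], 3: [8, 23], 4: [10]}
```

The sort makes this O(n log n), compared with the single pass of the dictionary-based answers.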

This is not a list comprehension, but a dictionary comprehension. It is similar to @cromod's solution, but retains the "categories" from l2:

 {k: [val for i, val in enumerate(l1) if k == l2[i]] for k in set(l2)}

Output:

 >>> l1
 [2, 3, 5, 7, 8, 10, 23]
 >>> l2
 [1, 1, 2, 1, 3, 4, 3]
 >>> {k: [val for i, val in enumerate(l1) if k == l2[i]] for k in set(l2)}
 {1: [2, 3, 7], 2: [5], 3: [8, 23], 4: [10]}

Source: https://habr.com/ru/post/1247806/

