How to replace a list of values in a numpy array?

I have an unsorted array of numbers.

I need to replace certain numbers (given in a list) with specific alternatives (given in a corresponding list).

I wrote the following code (which seems to work):

    import numpy as np

    numbers = np.arange(0, 40)
    np.random.shuffle(numbers)
    problem_numbers = [33, 23, 15]      # table, night_stand, plant
    alternative_numbers = [12, 14, 26]  # desk, dresser, flower_pot
    for i in range(len(problem_numbers)):
        idx = numbers == problem_numbers[i]
        numbers[idx] = alternative_numbers[i]

However, this seems extremely inefficient (this needs to be done several million times for much larger arrays).

I found this question that answers a similar problem; however, in my case the numbers are not sorted and they need to keep their original positions.

Note: numbers may contain few or no occurrences of the elements in problem_numbers.

2 answers

EDIT: I implemented a TensorFlow version of this in this answer (almost the same, except that the replacements are given as a dict).


Here is an easy way to do this:

    import numpy as np

    numbers = np.arange(0, 40)
    np.random.shuffle(numbers)
    problem_numbers = [33, 23, 15]      # table, night_stand, plant
    alternative_numbers = [12, 14, 26]  # desk, dresser, flower_pot

    # Replace values
    problem_numbers = np.asarray(problem_numbers)
    alternative_numbers = np.asarray(alternative_numbers)
    n_min, n_max = numbers.min(), numbers.max()
    # Lookup table: entry i holds the replacement for value n_min + i
    replacer = np.arange(n_min, n_max + 1)
    # Mask replacements out of range
    mask = (problem_numbers >= n_min) & (problem_numbers <= n_max)
    replacer[problem_numbers[mask] - n_min] = alternative_numbers[mask]
    numbers = replacer[numbers - n_min]

This works well and should be efficient as long as the range of values in numbers (the difference between the smallest and the largest) is not huge, because the lookup table needs one entry per value in that range (for example, you don't have something like 1, 7 and 10000000000).
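If the range of values is too wide for a lookup table, one possible alternative is to sort the problem values and locate each element of numbers with np.searchsorted. This is only a sketch of a different technique, not the method from this answer; the function name replace_sparse is hypothetical, and it assumes problem_numbers contains no duplicates.

```python
import numpy as np

def replace_sparse(numbers, problem_numbers, alternative_numbers):
    """Hypothetical helper: replace values without building a full-range table."""
    numbers = np.asarray(numbers)
    problem_numbers = np.asarray(problem_numbers)
    alternative_numbers = np.asarray(alternative_numbers)
    if problem_numbers.size == 0:
        return numbers.copy()
    # Sort the problem values so they can be binary-searched
    order = np.argsort(problem_numbers)
    sorted_problems = problem_numbers[order]
    sorted_alternatives = alternative_numbers[order]
    # Candidate position of every element of numbers in the sorted problem values
    pos = np.searchsorted(sorted_problems, numbers)
    # Elements larger than every problem value get pos == len; clip to stay in bounds
    pos = np.clip(pos, 0, len(sorted_problems) - 1)
    # Only positions where the value really matches are replaced
    hit = sorted_problems[pos] == numbers
    return np.where(hit, sorted_alternatives[pos], numbers)

numbers = np.array([1, 7, 10_000_000_000, 7], dtype=np.int64)
problems = np.array([7, 10_000_000_000], dtype=np.int64)
alternatives = np.array([8, 2], dtype=np.int64)
result = replace_sparse(numbers, problems, alternatives)
print(result)  # [1 8 2 8]
```

Memory use here depends on the sizes of numbers and problem_numbers rather than on the spread of the values, at the cost of a binary search per element.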

Benchmarking

I compared the code in OP with the three (currently) proposed solutions with this code:

    import numpy as np

    def method_itzik(numbers, problem_numbers, alternative_numbers):
        numbers = np.asarray(numbers)
        for i in range(len(problem_numbers)):
            idx = numbers == problem_numbers[i]
            numbers[idx] = alternative_numbers[i]
        return numbers

    def method_mseifert(numbers, problem_numbers, alternative_numbers):
        numbers = np.asarray(numbers)
        replacer = dict(zip(problem_numbers, alternative_numbers))
        numbers_list = numbers.tolist()
        numbers = np.array(list(map(replacer.get, numbers_list, numbers_list)))
        return numbers

    def method_divakar(numbers, problem_numbers, alternative_numbers):
        numbers = np.asarray(numbers)
        problem_numbers = np.asarray(problem_numbers)
        alternative_numbers = np.asarray(alternative_numbers)
        # Pre-process problem_numbers and correspondingly alternative_numbers
        # such that repeats and no matches are taken care of
        sidx_pn = problem_numbers.argsort()
        pn = problem_numbers[sidx_pn]
        mask = np.concatenate(([True], pn[1:] != pn[:-1]))
        an = alternative_numbers[sidx_pn]

        minN, maxN = numbers.min(), numbers.max()
        mask &= (pn >= minN) & (pn <= maxN)

        pn = pn[mask]
        an = an[mask]

        # Pre-processing done. Now, we need to use pn and an in place of
        # problem_numbers and alternative_numbers respectively. Map, index and assign.
        sidx = numbers.argsort()
        idx = sidx[np.searchsorted(numbers, pn, sorter=sidx)]
        valid_mask = numbers[idx] == pn
        numbers[idx[valid_mask]] = an[valid_mask]
        return numbers

    def method_jdehesa(numbers, problem_numbers, alternative_numbers):
        numbers = np.asarray(numbers)
        problem_numbers = np.asarray(problem_numbers)
        alternative_numbers = np.asarray(alternative_numbers)
        n_min, n_max = numbers.min(), numbers.max()
        replacer = np.arange(n_min, n_max + 1)
        # Mask replacements out of range
        mask = (problem_numbers >= n_min) & (problem_numbers <= n_max)
        replacer[problem_numbers[mask] - n_min] = alternative_numbers[mask]
        numbers = replacer[numbers - n_min]
        return numbers

Results:

    import numpy as np

    np.random.seed(100)

    MAX_NUM = 100000
    numbers = np.random.randint(0, MAX_NUM, size=100000)
    problem_numbers = np.unique(np.random.randint(0, MAX_NUM, size=500))
    alternative_numbers = np.random.randint(0, MAX_NUM, size=len(problem_numbers))

    %timeit method_itzik(numbers, problem_numbers, alternative_numbers)
    10 loops, best of 3: 63.3 ms per loop

    # This method expects lists
    problem_numbers_l = list(problem_numbers)
    alternative_numbers_l = list(alternative_numbers)
    %timeit method_mseifert(numbers, problem_numbers_l, alternative_numbers_l)
    10 loops, best of 3: 20.5 ms per loop

    %timeit method_divakar(numbers, problem_numbers, alternative_numbers)
    100 loops, best of 3: 9.45 ms per loop

    %timeit method_jdehesa(numbers, problem_numbers, alternative_numbers)
    1000 loops, best of 3: 822 µs per loop

If not all values in problem_numbers are present in numbers, and they can occur several times:

In this case, I would simply use a dict to store the replacement values and use dict.get to translate the problem numbers:

    replacer = dict(zip(problem_numbers, alternative_numbers))
    numbers_list = numbers.tolist()
    numbers = np.array(list(map(replacer.get, numbers_list, numbers_list)))

Even though it needs to go through Python, it is almost self-explanatory and probably not much slower than a NumPy solution.
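The two-iterable form of map in the snippet above may not be obvious at first glance: map calls replacer.get(x, x) for each element, so the element itself serves as the default when it has no entry in the dict. A minimal illustration with plain Python lists (the sample values are made up for the example):

```python
# Replacement table taken from the question's example values
replacer = {33: 12, 23: 14, 15: 26}
values = [5, 33, 15, 8, 33]

# map(f, a, b) calls f(a[i], b[i]); here that is replacer.get(x, x),
# i.e. "look x up, and fall back to x itself if it is not a key"
translated = list(map(replacer.get, values, values))
print(translated)  # [5, 12, 26, 8, 12]
```

Values missing from the dict pass through unchanged, and repeated problem values are all replaced.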

If each value in problem_numbers is really present in the numbers array, and only once:

If you have numpy_indexed, you can just use numpy_indexed.indices:

    >>> import numpy_indexed as ni
    >>> numbers[ni.indices(numbers, problem_numbers)] = alternative_numbers

This should be quite efficient even for large arrays.
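If numpy_indexed is not installed, a similar index lookup can be sketched in plain NumPy with argsort and searchsorted. This is an illustration under the same assumption (every problem value occurs in numbers exactly once), not the library's implementation; the function name replace_unique is hypothetical.

```python
import numpy as np

def replace_unique(numbers, problem_numbers, alternative_numbers):
    """Hypothetical helper: assumes each problem value occurs exactly once in numbers."""
    numbers = np.array(numbers)  # copy, so the caller's array is left untouched
    # sorter[k] is the index in numbers of the k-th smallest element
    sorter = np.argsort(numbers)
    # Find each problem value in the sorted view, then map back to original indices
    positions = sorter[np.searchsorted(numbers, problem_numbers, sorter=sorter)]
    numbers[positions] = alternative_numbers
    return numbers

numbers = np.array([40, 15, 7, 33, 23, 2])
result = replace_unique(numbers, [33, 23, 15], [12, 14, 26])
print(result)  # [40 26  7 12 14  2]
```

If a problem value is missing from numbers, this sketch silently overwrites an unrelated element or raises an IndexError, so it should only be used when the assumption is known to hold.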


Source: https://habr.com/ru/post/1270961/

