Replacing NumPy array entries with their frequencies / dictionary values

Problem: from two input arrays, I want to produce an output array in which each value of input_1 is replaced by the count of True values (from input_2) associated with that value.

import numpy as np
from scipy.stats import itemfreq
input_1 = np.array([3,6,6,3,6,4])
input_2 = np.array([False, True, True, False, False, True])

In this example, the output I want is:

output_1 = np.array([0,2,2,0,2,1])

My current approach first masks input_1 so that only the values matching True are left:

locs = np.where(input_2 == True, input_1, 0)

Then I count the frequency of each remaining value, build a dictionary from those counts, and replace the matching keys in input_1 with their values (the True frequencies).

loc_freq = itemfreq(locs)        # rows of (value, count); note itemfreq is deprecated in newer SciPy
dic = {}
for key, val in loc_freq:
    dic[key] = val
print(dic)
for k, v in dic.items():
    input_1[input_1 == k] = v

which outputs [3,2,2,3,2,1].

The problem here is twofold: 1) values that never appear as keys in the dictionary are left untouched, even though they should become 0 (for example, how can I turn the 3s into 0s?); and 2) it seems very inefficient. Is there a better way to approach this?
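A sketch of one way to patch problem 1 while staying with the dictionary idea (my addition, not from the original post): build the counts only from the True entries and look every value up with a default of 0, writing into a fresh output array instead of overwriting input_1 in place. It still loops in Python, so it does not address problem 2.

import numpy as np
from collections import Counter

input_1 = np.array([3, 6, 6, 3, 6, 4])
input_2 = np.array([False, True, True, False, False, True])

# count only the values of input_1 that line up with a True in input_2
true_counts = Counter(input_1[input_2].tolist())

# look up every value with a default of 0, so unmatched values (the 3s) become 0
output_1 = np.array([true_counts.get(v, 0) for v in input_1])
print(output_1)  # [0 2 2 0 2 1]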


np.bincount is exactly what you are looking for:

output_1 = np.bincount(input_1[input_2])[input_1]
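To see what the one-liner does with the example arrays: input_1[input_2] keeps the values paired with True, bincount turns them into a count table indexed by value, and the final fancy indexing looks every element of input_1 up in that table:

import numpy as np

input_1 = np.array([3, 6, 6, 3, 6, 4])
input_2 = np.array([False, True, True, False, False, True])

print(input_1[input_2])                        # [6 6 4] -> values paired with True
print(np.bincount(input_1[input_2]))           # [0 0 0 0 1 0 2] -> count of each value 0..6
print(np.bincount(input_1[input_2])[input_1])  # [0 2 2 0 2 1]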

@memecs's answer is correct, +1. However, it will raise an IndexError when input_1 contains a value that is larger than every value paired with a True in input_2, because np.bincount(input_1[input_2]).size only reaches up to the largest value of input_1 that has a True in input_2.
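A sketch of one way to guard against that edge case (my addition, not part of the comment above): pass minlength to bincount so the count table always covers every value that will be used as an index. Here input_1 is modified so that its largest value, 9, never gets a True:

import numpy as np

input_1 = np.array([3, 6, 6, 3, 6, 9])   # 9 is never paired with a True
input_2 = np.array([False, True, True, False, False, False])

# without minlength the table would stop at index 6 and table[9] would raise IndexError
table = np.bincount(input_1[input_2], minlength=input_1.max() + 1)
print(table[input_1])  # [0 2 2 0 2 0]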

I would use unique together with bincount. Use unique to extract the unique elements of input_1 and the indices that reconstruct it, then run bincount on those indices, weighting each occurrence with 1 or 0 taken from input_2 (True or False):

# extract unique elements and the indices to reconstruct the array
unq, idx = np.unique(input_1, return_inverse=True)
# calculate the weighted frequencies of these indices
freqs_idx = np.bincount(idx, weights=input_2)
# reconstruct the array of frequencies of the elements
frequencies = freqs_idx[idx]
print(frequencies)
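One detail worth noting (my observation, not part of the answer): because bincount is given floating-point weights, frequencies comes back as float64. Condensed into a self-contained sketch with an explicit integer cast:

import numpy as np

input_1 = np.array([3, 6, 6, 3, 6, 4])
input_2 = np.array([False, True, True, False, False, True])

# return_inverse gives, for each element, the position of its unique value
unq, idx = np.unique(input_1, return_inverse=True)
# bincount with boolean weights sums the True entries per unique value (float64), so cast
output_1 = np.bincount(idx, weights=input_2).astype(int)[idx]
print(output_1)  # [0 2 2 0 2 1]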

This follows the same idea as @Jaime's answer: collapse input_1 to its distinct values and work with the indices that unique returns.

Here is a variant built around two calls to unique:

import numpy as np
input_1 = np.array([3, 6, 6, 3, 6, 4])
input_2 = np.array([False, True, True, False, False, True])

non_zero_hits, counts = np.unique(input_1[input_2], return_counts=True)
all_hits, idx = np.unique(input_1, return_inverse=True)
frequencies = np.zeros_like(all_hits)

#2nd step, with broadcasting
idx_non_zero_hits_in_all_hits = np.where(non_zero_hits[:, np.newaxis] - all_hits == 0)[1]
frequencies[idx_non_zero_hits_in_all_hits] = counts
print(frequencies[idx])

Note that if many distinct values of input_1 are paired with a True in input_2, the broadcasting in the second step creates a large 2D array inside where; in that case a plain for loop can be used instead:

#2nd step, but with a for loop.
for j, val in enumerate(non_zero_hits):
    index = np.where(val == all_hits)[0]
    frequencies[index] = counts[j]
print(frequencies[idx])

This trades the 2D intermediate for a Python-level for loop, so which version is preferable depends on the number of distinct values involved.
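Since np.unique returns its values sorted, another option for the second step (my sketch, not from the original answer) is np.searchsorted, which locates each non-zero value inside all_hits without either the 2D intermediate or the Python loop:

import numpy as np

input_1 = np.array([3, 6, 6, 3, 6, 4])
input_2 = np.array([False, True, True, False, False, True])

non_zero_hits, counts = np.unique(input_1[input_2], return_counts=True)
all_hits, idx = np.unique(input_1, return_inverse=True)
frequencies = np.zeros_like(all_hits)

# all_hits is sorted, so searchsorted gives the position of each non_zero_hit within it
pos = np.searchsorted(all_hits, non_zero_hits)
frequencies[pos] = counts
print(frequencies[idx])  # [0 2 2 0 2 1]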


The bincount-based solution above is pretty elegant, but the numpy_indexed package provides more general solutions for problems of this kind:

import numpy_indexed as npi
idx = npi.as_index(input_1)
unique_labels, true_count_per_label = npi.group_by(idx).sum(input_2)
print(true_count_per_label[idx.inverse])
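For reference (my note, not part of the answer): numpy_indexed is a separate package available on PyPI (pip install numpy-indexed). With the example arrays above, this should print the same result as the other solutions, [0 2 2 0 2 1].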

Source: https://habr.com/ru/post/1570855/

