Fill an array of 1D numpy arrays with indices

Question


Background

I have one 1D NumPy array initialized with zeros.

import numpy as np
section = np.zeros(12000, dtype=bool)  # length must cover the largest 'end' index (12000)

I also have a pandas DataFrame with start and end indices in two columns:

import pandas as pd

d = {'start': {0: 7200, 1: 7500, 2: 7560, 3: 8100, 4: 11400},
     'end': {0: 10800, 1: 8100, 2: 8100, 3: 8150, 4: 12000}}

df = pd.DataFrame(data=d, columns=['start', 'end'])

For each (start, end) pair, I want to set the elements section[start:end] of the NumPy array to True.

My current solution

I can do this by applying a function to each row of the DataFrame:

def fill_array(row):
    # Mark the half-open range [start, end) of this row
    section[row.start:row.end] = True

df.apply(fill_array, axis=1)
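For reference, here is a self-contained version of the row-by-row approach that can be run as-is. It assumes the array only needs to be as long as the largest end value (12000 here) and uses a boolean array; the helper name fill_with_loop is hypothetical.

```python
import numpy as np
import pandas as pd

d = {'start': {0: 7200, 1: 7500, 2: 7560, 3: 8100, 4: 11400},
     'end': {0: 10800, 1: 8100, 2: 8100, 3: 8150, 4: 12000}}
df = pd.DataFrame(data=d, columns=['start', 'end'])

def fill_with_loop(df, length):
    """Hypothetical helper: mark every [start, end) range with a plain Python loop."""
    section = np.zeros(length, dtype=bool)
    for start, end in zip(df['start'], df['end']):
        section[start:end] = True
    return section

section = fill_with_loop(df, length=df['end'].max())
print(section.sum())  # 4200 positions covered by at least one interval
```

The union of the intervals is [7200, 10800) plus [11400, 12000), i.e. 3600 + 600 = 4200 positions.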

I want to vectorize this operation

This works as expected, but out of curiosity I would like to do it as a vectorized operation. I don't really understand how, and searching online did not point me in the right direction.

I would really appreciate any suggestions on how to do this as a vectorized operation, if that is possible at all.


python arrays vectorization numpy pandas

Knut Flage Henriksen, Jul 12 '17 at 11:59

2 Answers

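One option is to build all the indices in a single step with np.r_, which expands each slice object into the integer range it covers, and then assign them all at once: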

indices = np.r_[tuple(slice(row.start, row.end) for row in df.itertuples())]
section[indices] = True
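To see what np.r_ does here: indexing it with a tuple of slice objects expands each slice like np.arange(start, stop) and concatenates the results. The check below, run on the question's data, also shows that overlapping intervals produce repeated indices; assigning True to a repeated index is harmless, but the same positions are processed more than once.

```python
import numpy as np
import pandas as pd

# Small example: each slice is expanded like np.arange(start, stop)
small = np.r_[slice(1, 3), slice(5, 8)]
print(small)  # [1 2 5 6 7]

d = {'start': {0: 7200, 1: 7500, 2: 7560, 3: 8100, 4: 11400},
     'end': {0: 10800, 1: 8100, 2: 8100, 3: 8150, 4: 12000}}
df = pd.DataFrame(data=d, columns=['start', 'end'])

indices = np.r_[tuple(slice(row.start, row.end) for row in df.itertuples())]
section = np.zeros(12000, dtype=bool)
section[indices] = True

# Overlaps mean some positions appear in indices more than once
print(len(indices), np.unique(indices).size)  # 5390 4200
```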

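This builds one index for every position of every interval, so where intervals overlap, the same positions are generated and assigned more than once. The intervals in the question overlap heavily; after merging the overlapping ones, the data reduces to

d = {'start': {0: 7200, 1: 11400},
     'end': {0: 10800, 1: 12000}}

With the merged intervals the operation ran about 60% faster. Overlapping intervals can be merged like this: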

slices = [(row.start, row.end) for row in df.itertuples()]
slices_union = []
for start, end in sorted(slices):
    # Slices are half-open [start, end), so they overlap or touch when start <= previous end
    if slices_union and slices_union[-1][1] >= start:
        slices_union[-1][1] = max(slices_union[-1][1], end)
    else:
        slices_union.append([start, end])
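As a quick check of the merge step, the sketch below wraps the loop in a function (the name merge_slices is hypothetical) and runs it on the question's data. It compares with >= start, which is the right test for half-open intervals: intervals that only touch are joined, while intervals separated by even one index stay separate, so no extra position gets filled.

```python
import pandas as pd

def merge_slices(slices):
    """Hypothetical helper: merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(slices):
        if merged and merged[-1][1] >= start:
            # Overlaps or touches the previous interval: extend it
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

d = {'start': {0: 7200, 1: 7500, 2: 7560, 3: 8100, 4: 11400},
     'end': {0: 10800, 1: 8100, 2: 8100, 3: 8150, 4: 12000}}
df = pd.DataFrame(data=d, columns=['start', 'end'])

slices = [(row.start, row.end) for row in df.itertuples()]
print(merge_slices(slices))            # [[7200, 10800], [11400, 12000]]
print(merge_slices([(0, 5), (6, 8)]))  # [[0, 5], [6, 8]]: index 5 stays uncovered
```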

and then fill the array from the merged intervals:

for start, end in slices_union:
    section[start:end] = True


Jonas Adler, Jul 12 '17 at 12:41

Divakar · Accepted Answer · 2017-07-12T12:43:57+0000

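Here's a vectorized approach: initialize an int array, put 1s at the start indices and -1s at the end indices, then take the cumulative sum. Overlapping intervals can hit the same index several times, so the additions have to be unbuffered, which np.add.at provides. After the cumulative sum, every position with a positive value is covered by at least one interval, so comparing against zero gives the boolean mask. Thus, the implementation would be -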

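def filled_array(start, end, length):
    # One extra slot so that an end index equal to length stays in bounds
    out = np.zeros(length + 1, dtype=int)
    np.add.at(out, start, 1)   # +1 where each interval starts
    np.add.at(out, end, -1)    # -1 at the (exclusive) end of each interval
    return out.cumsum()[:length] > 0

def filled_array_v2(start, end, length): # Using @Daniel suggestion
    # bincount counts repeated indices, so overlapping intervals are handled too
    out = np.bincount(start, minlength=length + 1) - np.bincount(end, minlength=length + 1)
    return out.cumsum()[:length].astype(bool)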
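To see why the cumulative sum works, here is the same idea on a tiny hand-checkable input (values chosen for illustration). The intermediate coverage array counts how many intervals cover each position, which is why the overlap at position 2 shows a 2; any positive count means the position lies inside some interval. The array gets one extra slot so that an end equal to length stays in bounds.

```python
import numpy as np

start = np.array([1, 2])
end = np.array([4, 3])
length = 6

diff = np.zeros(length + 1, dtype=int)  # extra slot so end == length stays in bounds
np.add.at(diff, start, 1)   # +1 where an interval opens
np.add.at(diff, end, -1)    # -1 at the first index past its end
coverage = diff.cumsum()[:length]  # number of intervals covering each position
mask = coverage > 0

print(diff)      # [ 0  1  1 -1 -1  0  0]
print(coverage)  # [0 1 2 1 0 0]
print(mask)      # [False  True  True  True False False]
```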

Sample run -

In [2]: start
Out[2]: array([ 4,  7,  5, 15])

In [3]: end
Out[3]: array([12, 12,  7, 17])

In [4]: out = filled_array(start, end, length=20)

In [7]: pd.DataFrame(out) # print as dataframe for easy verification
Out[7]: 
        0
0   False
1   False
2   False
3   False
4    True
5    True
6    True
7    True
8    True
9    True
10   True
11   True
12  False
13  False
14  False
15   True
16   True
17  False
18  False
19  False
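To apply this to the DataFrame from the question, pass the two columns as integer arrays and use the largest end as the length (assumed here to be the required array size, 12000). The sketch re-declares a variant of filled_array_v2 that allocates one extra bin so an end equal to length is accepted, and compares its result with the plain row-by-row loop.

```python
import numpy as np
import pandas as pd

def filled_array_v2(start, end, length):
    # One extra bin so that end == length is a valid index
    counts = np.bincount(start, minlength=length + 1) - np.bincount(end, minlength=length + 1)
    return counts.cumsum()[:length].astype(bool)

d = {'start': {0: 7200, 1: 7500, 2: 7560, 3: 8100, 4: 11400},
     'end': {0: 10800, 1: 8100, 2: 8100, 3: 8150, 4: 12000}}
df = pd.DataFrame(data=d, columns=['start', 'end'])

start = df['start'].to_numpy()
end = df['end'].to_numpy()
length = int(end.max())  # the array must reach the largest end index

section = filled_array_v2(start, end, length)

# Reference result from the row-by-row loop in the question
expected = np.zeros(length, dtype=bool)
for s, e in zip(start, end):
    expected[s:e] = True

print(np.array_equal(section, expected), section.sum())  # True 4200
```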
