Fill a 1D NumPy array from arrays of indices

Background

I have one 1D NumPy array initialized with zeros.

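import numpy as np
import pandas as pd

# Must be at least as long as the largest 'end' index used below (12000);
# slices past the end of a shorter array would silently select nothing.
section = np.zeros(12000)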

Then I have a pandas DataFrame with start and end indices in two columns:

d = {'start': {0: 7200, 1: 7500, 2: 7560, 3: 8100, 4: 11400},
    'end': {0: 10800, 1: 8100, 2: 8100, 3: 8150, 4: 12000}}

df = pd.DataFrame(data=d, columns=['start', 'end'])

For each (start, end) pair, I want to set the elements of the NumPy array from start up to, but not including, end to True.

My current solution

I can do this by applying a function to each row of the DataFrame:

def fill_array(row):
    section[row.start:row.end] = True

df.apply(fill_array, axis=1)

I want to vectorize this operation

This works as I expect, but for the fun of it I would like to do this as a vectorized operation. I am not very familiar with vectorization, and my searches online have not put me on the right track.

I would really appreciate any suggestions on how to turn this into a vectorized operation, if that is possible at all.


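One trick is to put 1s at the start indices and -1s at the end indices of a zero-initialized int array. Taking the cumulative sum then gives non-zero values at every position covered by a (start, end) pair, so the final step is to test for non-zero values to get a boolean array. This leads to two vectorized implementations: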

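def filled_array(start, end, length):
    # +1 where an interval opens, -1 where it closes
    out = np.zeros(length, dtype=int)
    np.add.at(out, start, 1)
    np.add.at(out, end, -1)
    # The running sum is positive wherever at least one interval is open
    return out.cumsum() > 0

def filled_array_v2(start, end, length):  # using @Daniel's suggestion
    # bincount counts how many intervals open or close at each index
    out = np.bincount(start, minlength=length) - np.bincount(end, minlength=length)
    return out.cumsum().astype(bool)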

Sample run:

In [2]: start
Out[2]: array([ 4,  7,  5, 15])

In [3]: end
Out[3]: array([12, 12,  7, 17])

In [4]: out = filled_array(start, end, length=20)

In [7]: pd.DataFrame(out) # print as dataframe for easy verification
Out[7]: 
        0
0   False
1   False
2   False
3   False
4    True
5    True
6    True
7    True
8    True
9    True
10   True
11   True
12  False
13  False
14  False
15   True
16   True
17  False
18  False
19  False
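
As a rough sketch, this is how the cumulative-sum approach could be applied to the DataFrame from the question. The name `filled_array_safe` and the extra trailing slot are assumptions added for illustration, not part of the answer above. The extra slot is there because the largest end index (12000) equals the array length, which would otherwise be out of bounds for `np.add.at`.

```python
import numpy as np
import pandas as pd

def filled_array_safe(start, end, length):
    # Hypothetical variant: allocate one extra slot so that end == length
    # is a valid index, then trim it off after the cumulative sum.
    delta = np.zeros(length + 1, dtype=int)
    np.add.at(delta, start, 1)   # an interval opens here
    np.add.at(delta, end, -1)    # an interval closes here
    return delta.cumsum()[:length] > 0

d = {'start': {0: 7200, 1: 7500, 2: 7560, 3: 8100, 4: 11400},
     'end': {0: 10800, 1: 8100, 2: 8100, 3: 8150, 4: 12000}}
df = pd.DataFrame(data=d, columns=['start', 'end'])

length = 12000
mask = filled_array_safe(df['start'].to_numpy(), df['end'].to_numpy(), length)

# Reference result from plain slice assignment, for comparison
expected = np.zeros(length, dtype=bool)
for s, e in zip(df['start'], df['end']):
    expected[s:e] = True

print(np.array_equal(mask, expected))
```

The comparison against the slice-assignment loop is only a sanity check; whether the vectorized form is actually faster depends on the number of intervals and the array length.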

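You have already done the most important vectorization by using slice assignment, but this cannot be fully vectorized with slicing alone, because Python does not support "multiple slices".

If you really want a "vectorized" form, you can build a single array of all the indices to set: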

indices = np.r_[tuple(slice(row.start, row.end) for row in df.itertuples())]
section[indices] = True

However, this is likely to be slower, since it creates a temporary array that holds every index.
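
To make the index-array idea concrete, here is a small sketch on toy data (the slices and the array length are made up for illustration). It shows that `np.r_` expands the slices into one flat integer array, which is the temporary that the slice-assignment loop avoids.

```python
import numpy as np

# Two toy half-open ranges: [2, 5) and [8, 10)
toy_slices = (slice(2, 5), slice(8, 10))

# np.r_ concatenates the expanded slices into one integer index array
indices = np.r_[toy_slices]
print(indices)

toy_section = np.zeros(12, dtype=bool)
toy_section[indices] = True   # a single fancy-indexing assignment
print(toy_section.nonzero()[0])
```

The size of `indices` grows with the total length of all ranges, including any overlap between them.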

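That said, you can reduce duplicated work. Specifically, you can take the union of the ranges, which leaves a set of disjoint intervals.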

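In your case, the first interval covers every other interval except the last, so your data is equivalent to:

d = {'start': {0: 7200, 1: 11400},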
    'end': {0: 10800, 1: 12000}}

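This cuts the number of slice assignments by 60%! First, though, the overlapping intervals have to be merged, which can be done like this: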

slices = [(row.start, row.end) for row in df.itertuples()]
slices_union = []
for start, end in sorted(slices):
    if slices_union and slices_union[-1][1] >= start:  # half-open slices overlap or touch
        slices_union[-1][1] = max(slices_union[-1][1], end)
    else:
        slices_union.append([start, end])

Then assign each of the merged (disjoint) slices:

for start, end in slices_union:
    section[start:end] = True
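
The merging step can be sketched as a small standalone function. The name `merge_intervals` is hypothetical, and the sketch assumes half-open `[start, end)` intervals, as in Python slicing, so two intervals are merged only when the previous end is greater than or equal to the next start.

```python
import numpy as np

def merge_intervals(slices):
    # Merge overlapping or touching half-open [start, end) intervals.
    merged = []
    for start, end in sorted(slices):
        if merged and merged[-1][1] >= start:
            # Overlaps or touches the previous interval: extend it
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

# The (start, end) pairs from the question
question_slices = [(7200, 10800), (7500, 8100), (7560, 8100),
                   (8100, 8150), (11400, 12000)]
merged = merge_intervals(question_slices)
print(merged)

section = np.zeros(12000, dtype=bool)
for start, end in merged:
    section[start:end] = True
print(int(section.sum()))
```

With the question's data this leaves two intervals. A gap of even one index between intervals keeps them separate, so no element outside the original ranges is set.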

Source: https://habr.com/ru/post/1684162/

