Python DataFrame: cumulative sum of a column until a condition is met, and return the index

I am new to Python and am currently facing a problem that I cannot solve. I really hope you can help me. English is not my native language, so I apologize if I do not express myself properly.

Let's say I have a simple data frame with two columns:

index  Num_Albums  Num_authors
0      10          4
1      1           5
2      4           4
3      7           1000
4      1           44
5      3           8

Num_Albums_tot = sum(Num_Albums) = 30

I need to compute a cumulative sum of the data in Num_Albums until a certain condition is reached, record the index at which the condition is satisfied, and get the corresponding value from Num_authors.

Example: cumulative sum of Num_Albums until the sum equals 50% ± 1/15 of 30 (-> 15 ± 2):

10 = 15 ± 2? No, then continue;
10 + 1 = 15 ± 2? No, then continue;
10 + 1 + 4 = 15 ± 2? Yes, stop.

The condition is reached at index 2. Then we get Num_authors at this index: Num_authors(2) = 4

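Before I start working out how to do this with a while/for loop, I would like to know whether pandas already has a function for it.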

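[I would also like to specify the column from which to retrieve the value at the matching index. This is handy when I have, e.g., 4 columns: sum the elements of column 1, and when the condition is reached (= yes), get the corresponding value from column 2; then do the same with columns 3 and 4.]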
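A vectorized helper along these lines lets you choose both the column to sum and the column to read from. The name `value_at_condition` and its parameters are hypothetical; the condition is assumed to be |cumsum - target| <= tol, and the sketch returns None when no row qualifies:

```python
import pandas as pd

def value_at_condition(df, sum_col, value_col, target, tol):
    """Hypothetical helper: return (index label, value of value_col) for the
    first row whose running total of sum_col lies within target +/- tol,
    or None if no row qualifies."""
    hits = (df[sum_col].cumsum() - target).abs() <= tol  # boolean Series
    if not hits.any():
        return None
    label = hits.idxmax()             # label of the first True
    return label, df.at[label, value_col]

df = pd.DataFrame({"Num_Albums": [10, 1, 4, 7, 1, 3],
                   "Num_authors": [4, 5, 4, 1000, 44, 8]})
print(value_at_condition(df, "Num_Albums", "Num_authors", target=15, tol=2))
```

For the 4-column case you would call it once per pair, e.g. with ("col1", "col2") and then ("col3", "col4"), where those column names stand in for your own.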


Option 1:

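Compute the cumulative sum with cumsum. Then use np.isclose with its tolerance argument (atol) to check whether each value of this series lies within 15 +/- 2. This returns a boolean array.

np.flatnonzero returns the positions at which the array is True; take the first of them.

Finally, use .iloc to retrieve the value of the column you need at that position.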

import numpy as np

val = np.flatnonzero(np.isclose(df.Num_Albums.cumsum().values, 15, atol=2))[0]
df['Num_authors'].iloc[val]      # for faster access, use .iat
4

This is what np.isclose returns for the cumulative-sum series:

np.isclose(df.Num_Albums.cumsum().values, 15, atol=2)
array([False, False,  True, False, False, False], dtype=bool)
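If no running total falls inside the band, np.flatnonzero returns an empty array and the [0] lookup raises an IndexError. A sketch that guards against this; the function name `first_close` and the None fallback are assumptions. It also passes rtol=0, because np.isclose otherwise adds rtol * |target| (rtol=1e-05 by default) to atol:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Num_Albums": [10, 1, 4, 7, 1, 3],
                   "Num_authors": [4, 5, 4, 1000, 44, 8]})

def first_close(df, target, tol):
    # rtol=0 makes the band exactly target +/- tol
    close = np.isclose(df.Num_Albums.cumsum().values, target, atol=tol, rtol=0)
    pos = np.flatnonzero(close)           # positions of all True values
    if pos.size == 0:
        return None                       # no running total in the band
    return df['Num_authors'].iat[pos[0]]  # the first True wins

print(first_close(df, 15, 2))   # 4
print(first_close(df, 40, 2))   # None
```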

Option 2:

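Build a pd.Index from the cumulative sum and look up 15 with the nearest method and a tolerance of 2. Older answers wrote this as get_loc(15, 'nearest', tolerance=2) followed by df.get_value, but current pandas no longer accepts a method argument in Index.get_loc and no longer has DataFrame.get_value, so the equivalent get_indexer call and .iat are used:

val = pd.Index(df.Num_Albums.cumsum()).get_indexer([15], method='nearest', tolerance=2)[0]
df['Num_authors'].iat[val]      # check val != -1 first: -1 means nothing lies within 15 +/- 2
4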
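Index.get_indexer with method='nearest' and a tolerance returns -1 for every target that has no match within the tolerance, instead of raising, so several targets can be checked in one call. A sketch on the same data; it assumes the cumulative sums are unique and sorted (true here because no count is zero), which this kind of lookup relies on:

```python
import pandas as pd

df = pd.DataFrame({"Num_Albums": [10, 1, 4, 7, 1, 3],
                   "Num_authors": [4, 5, 4, 1000, 44, 8]})

# cumulative sums 10, 11, 15, 22, 23, 26 (unique and increasing)
idx = pd.Index(df.Num_Albums.cumsum())

targets = [15, 40]
# one position per target; -1 means "nothing within tolerance"
positions = idx.get_indexer(targets, method='nearest', tolerance=2)
print(positions)

for target, pos in zip(targets, positions):
    if pos == -1:
        print(target, "-> no match")
    else:
        print(target, "->", df['Num_authors'].iat[pos])
```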

Option 3:

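Subtract 15 from the cumsum series with sub, take the absolute value with abs, compare with the tolerance using le(2), and use idxmax to get the index of the first True: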

df.at[df.Num_Albums.cumsum().sub(15).abs().le(2).idxmax(), 'Num_authors']
4
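One caveat with idxmax: on an all-False boolean Series it still returns the first label, so this one-liner would silently report the row at index 0 when nothing matches. A sketch that checks .any() first; the name `authors_at` and the None fallback are assumptions:

```python
import pandas as pd

df = pd.DataFrame({"Num_Albums": [10, 1, 4, 7, 1, 3],
                   "Num_authors": [4, 5, 4, 1000, 44, 8]})

def authors_at(df, target, tol):
    mask = df.Num_Albums.cumsum().sub(target).abs().le(tol)
    if not mask.any():
        return None                      # idxmax alone would return label 0 here
    return df.at[mask.idxmax(), 'Num_authors']

print(authors_at(df, 15, 2))   # 4
print(authors_at(df, 40, 2))   # None
```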

Using your example data:

In [3]: df
Out[3]: 
   index  Num_Albums  Num_authors
0      0          10            4
1      1           1            5
2      2           4            4
3      3           7         1000
4      4           1           44
5      5           3            8

In [4]: df['cumsum'] = df['Num_Albums'].cumsum()

In [5]: df
Out[5]: 
   index  Num_Albums  Num_authors  cumsum
0      0          10            4      10
1      1           1            5      11
2      2           4            4      15
3      3           7         1000      22
4      4           1           44      23
5      5           3            8      26

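With the running total stored in the cumsum column, use where to keep only the rows that satisfy the condition, with a tolerance tol: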

In [18]: tol = 2

In [19]: cond = df.where((df['cumsum']>=15-tol)&(df['cumsum']<=15+tol)).dropna()

In [20]: cond
Out[20]: 
   index  Num_Albums  Num_authors  cumsum
2    2.0         4.0          4.0    15.0
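df.where(...).dropna() turns the integer columns into floats, as the 2.0 and 4.0 values above show. Plain boolean indexing with Series.between (inclusive on both ends by default) keeps the original dtypes. A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({"Num_Albums": [10, 1, 4, 7, 1, 3],
                   "Num_authors": [4, 5, 4, 1000, 44, 8]})
df['cumsum'] = df['Num_Albums'].cumsum()

tol = 2
# rows with 13 <= cumsum <= 17; dtypes stay integer
cond = df[df['cumsum'].between(15 - tol, 15 + tol)]
print(cond)

# Num_authors of the first matching row, if any row matched
if not cond.empty:
    print(cond['Num_authors'].iloc[0])
```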

You could also write a function with a plain loop:

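def your_function(df, target=15, tol=2):
    total = 0
    for index, num in enumerate(df['Num_Albums'].tolist()):
        total += num
        # your condition here; this one checks total == target +/- tol
        if abs(total - target) <= tol:
            return index, df['Num_authors'].iloc[index]
    return None  # condition never met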

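This returns the position together with the corresponding Num_authors value. If you only need the index of the row where the condition is met: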

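def your_function(df, target=15, tol=2):
    total = 0
    for label, num in df['Num_Albums'].items():
        total += num
        # your condition here; this one checks total == target +/- tol
        if abs(total - target) <= tol:
            return label  # index label of the row where the condition is met
    return None  # condition never met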

Adapt the condition in the if statement to your own case!

I am also a beginner, so I hope this helps!


Source: https://habr.com/ru/post/1665894/

