I am trying to detect the first date on which an event occurs. In my data frame, product A (see the pivot table) has 20 items saved for the first time on 2017-03-31,
so I want to create a new variable called new_var_2017-03-31, which stores the increment. On the next date, 2017-04-03, I do not mind that the value is now 50 instead of 20; I want to save only one event, the first one.
My attempt below gives me some errors. I would like to know, at least, whether the underlying logic makes sense, whether it is "pythonic", or whether I'm completely wrong.
import numpy as np
import pandas as pd

raw_data = {'name': ['B', 'A', 'A', 'B'],
            'date': pd.to_datetime(['2017-03-30', '2017-03-31', '2017-04-03', '2017-04-04']),
            'age': [10, 20, 50, 30]}
df1 = pd.DataFrame(raw_data, columns=['date', 'name', 'age'])
table = pd.pivot_table(df1, index=['name'], columns=['date'], values=['age'], aggfunc='sum')
table
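I then collect the dates in a list:
dates = df1['date'].tolist()  # list of Timestamps (.values.tolist() can return raw integers instead)
Then I apply a function to each row that compares each date i with the previous one, i-1: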
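def my_fun(x, dates):
    # x is one row of the pivot table, indexed by date
    dates = pd.to_datetime(dates)
    out = {}
    for prev, curr in zip(dates, dates[1:]):
        diff = x[curr] - x[prev]  # increment relative to the previous date
        out['new_var_' + curr.strftime('%Y-%m-%d')] = diff if diff > 0 else 0
    return pd.Series(out)

print(table['age'].apply(lambda x: my_fun(x, dates), axis=1))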
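For comparison, the same "difference with the previous date" idea can be sketched without a row-wise loop by using `DataFrame.diff` along the columns. This is only a sketch on the sample data above; the names `wide` and `increments` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'date': pd.to_datetime(['2017-03-30', '2017-03-31', '2017-04-03', '2017-04-04']),
    'name': ['B', 'A', 'A', 'B'],
    'age': [10, 20, 50, 30],
})

# One row per name, one column per date (NaN where nothing was recorded).
wide = df1.pivot_table(index='name', columns='date', values='age', aggfunc='sum')

# Difference with the previous date column; negative steps are clipped to 0
# and comparisons against a missing value (NaN) are filled with 0.
increments = wide.diff(axis=1).clip(lower=0).fillna(0)
increments.columns = ['new_var_' + d.strftime('%Y-%m-%d') for d in increments.columns]
print(increments)
```

Note that a first appearance is compared against NaN here, so it ends up as 0 rather than as an increment. On its own, this does not produce the result I am after.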
The result I expect is:
raw_data2 = {'new_var': ['new_var_2017-03-30', 'new_var_2017-03-31', 'new_var_2017-04-03', 'new_var_2017-04-04'],
             'result_a': [np.nan, 20, np.nan, np.nan],
             'result_b': [10, np.nan, np.nan, np.nan]}
df2 = pd.DataFrame(raw_data2, columns=['new_var', 'result_a', 'result_b'])
df2.T
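One possible way to get a table of this shape, as a minimal sketch: keep only the earliest row per name and pivot that. It assumes "first event" means the earliest date on which a name appears in `df1`; the names `first`, `all_cols` and `result` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'date': pd.to_datetime(['2017-03-30', '2017-03-31', '2017-04-03', '2017-04-04']),
    'name': ['B', 'A', 'A', 'B'],
    'age': [10, 20, 50, 30],
})

# Keep only the earliest row for each name (its first event).
first = df1.sort_values('date').drop_duplicates('name', keep='first').copy()
first['new_var'] = 'new_var_' + first['date'].dt.strftime('%Y-%m-%d')

# Column labels for every date in the data, in chronological order.
all_cols = ('new_var_' + df1['date'].sort_values().dt.strftime('%Y-%m-%d')).drop_duplicates().tolist()

# One row per name; dates without a first event stay NaN.
result = (first.pivot(index='name', columns='new_var', values='age')
               .reindex(columns=all_cols))
print(result)
```

Later values for the same name (for example the 50 recorded for A on 2017-04-03) are dropped before pivoting, so each name contributes a single event.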