Using the values of the previous row in a pandas Series

I have a CSV that looks like this (and when read into a pandas DataFrame with read_csv(), it looks the same).

[image: the CSV contents]

I want to update the values in the ad_requests column according to the following logic:

For a given row, if ad_requests has a value, leave it alone. Otherwise, set it to the previous row's ad_requests minus the previous row's impressions. So, for the first example, we would like to get:

[image: the desired result]

I get partially there:

 df["ad_requests"] = [i if not pd.isnull(i) else ??? for i in df["ad_requests"]] 

And here I am stuck. After the else I want to "go back" and access the previous row, although I know this is not how pandas is meant to be used. Another thing to note is that the rows will always come in groups of three, according to the ad_tag_name column. If I call df.groupby("ad_tag_name"), I can turn the result into a list and start slicing and indexing, but again, I think pandas should offer a better way to do this (there is a lot of data).

Python: 2.7.10

Pandas: 0.18.0

1 answer

You need to do something like this:

    pd.options.mode.chained_assignment = None  # suppresses "SettingWithCopyWarning"
    for index, elem in enumerate(df['ad_requests']):
        if pd.isnull(elem):
            df['ad_requests'][index]=df['ad_requests'][index-1]-df['impressions'][index-1]

The warning arises because we are changing values through what may be a view of the DataFrame, which also changes the original DataFrame. That is exactly what we want here, so the warning is not a concern.

(Python 2.7.12 and Pandas 0.19.0)

EDIT:

Change the last line of code from

 df['ad_requests'][index]=df['ad_requests'][index-1]-df['impressions'][index-1] 

to

 df.at[index,'ad_requests']=df.at[index-1,'ad_requests']-df.at[index-1,'impressions'] 

This eliminates the need to suppress any warnings:

    for index, elem in enumerate(df['ad_requests']):
        if pd.isnull(elem):
            df.at[index,'ad_requests']=df.at[index-1,'ad_requests']-df.at[index-1,'impressions']
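
To try the loop end to end, here is a minimal, self-contained sketch. The original screenshots are not available, so the sample values are made up, and the function names fill_with_loop and fill_with_shift are hypothetical. The loop version assumes a default 0..n-1 index and a first row that is not missing, so it starts at row 1. The second function is not part of the answer above: it is a possible vectorized variant built on groupby(...).shift(), which stays within each ad_tag_name group but only fills a gap when the row directly before it already has a value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (the real CSV is not shown in the question).
df = pd.DataFrame({
    "ad_tag_name": ["a", "a", "a", "b", "b", "b"],
    "ad_requests": [100.0, np.nan, np.nan, 50.0, np.nan, 20.0],
    "impressions": [10.0, 20.0, 5.0, 5.0, 8.0, 2.0],
})


def fill_with_loop(frame):
    """Row-by-row fill with df.at, as in the answer above.

    Assumes a default 0..n-1 index and a non-missing first row.
    Each filled value is visible to the next iteration, so runs of
    consecutive missing values are filled one after another.
    """
    out = frame.copy()
    for index in range(1, len(out)):
        if pd.isnull(out.at[index, "ad_requests"]):
            out.at[index, "ad_requests"] = (
                out.at[index - 1, "ad_requests"] - out.at[index - 1, "impressions"]
            )
    return out


def fill_with_shift(frame):
    """Vectorized variant (an assumption, not from the original answer).

    shift(1) inside each ad_tag_name group lines up every row with the
    row before it, so no value is taken from a different group. It works
    on the original values only, so a gap that follows another gap stays
    missing.
    """
    out = frame.copy()
    prev = out.groupby("ad_tag_name")[["ad_requests", "impressions"]].shift(1)
    out["ad_requests"] = out["ad_requests"].fillna(
        prev["ad_requests"] - prev["impressions"]
    )
    return out


print(fill_with_loop(df))
print(fill_with_shift(df))
```

On this sample, the loop fills all three gaps, while the shift variant leaves the third row missing because the row before it was itself missing in the input. Which behavior is right depends on how the real data is laid out.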

Source: https://habr.com/ru/post/1260112/

