I have a csv that contains 3 columns, count_id, AMV and time.
I use pandas and read this as a data frame.
results= pd.read_csv('./output.csv')
First, I sort the data frame first for count_id, and then for AMV.
results_sorted = results.sort_index(by=['count_id','AMV'], ascending=[True, True])
This gives
count_id AMV Hour 0 16012E 4004 14 1 16012E 4026 12 2 16012E 4099 15 3 16012E 4167 11 4 16012E 4239 10 5 16012E 4324 13 6 16012E 4941 16 7 16012E 5088 17 8 16012E 5283 9 9 16012E 5620 8 10 16012E 5946 18 11 16012E 6146 7 12 16012W 3622 10 13 16012W 3904 12 14 16012W 3979 11 15 16012W 4076 9 16 16012W 4189 13 17 16012W 4870 14 18 16012W 4899 18 19 16012W 5107 15 20 16012W 5659 8 21 16012W 6325 7 22 16012W 6460 17 23 16012W 6500 16
Now I want to do some normalization of the data so that I can ultimately build it on the same plot. I want to find the minimum value for AMV for each series (count_id), and then subtract this minimum value from the given AMV. This will give me a new AMV_norm column.
What it looks like:
count_id AMV Hour AMV_norm 0 16012E 4004 14 0 1 16012E 4026 12 22 2 16012E 4099 15 95 3 16012E 4167 11 163 4 16012E 4239 10 235 5 16012E 4324 13 320 6 16012E 4941 16 937 7 16012E 5088 17 1084 8 16012E 5283 9 1279 9 16012E 5620 8 1616 10 16012E 5946 18 1942 11 16012E 6146 7 2142 12 16012W 3622 10 0 13 16012W 3904 12 282 14 16012W 3979 11 357 15 16012W 4076 9 454 16 16012W 4189 13 567 17 16012W 4870 14 1248 18 16012W 4899 18 1277 19 16012W 5107 15 1485 20 16012W 5659 8 2037 21 16012W 6325 7 2703 22 16012W 6460 17 2838 23 16012W 6500 16 2878
How to define a function that finds the minimum AMV for each series, rather than the minimum AMV in general? It will look something like this:
def minimum_series_value(AMV): return AMV.argmin()
Then I need to create a new column, and with the help of the lambda function this row will be filled. I know this will look something like this:
results_sorted['AMV_norm'] = results_sorted.apply(lambda row:results_sorted(row['AMV']))