Multi-indexing: accessing the last time of each day

I'm new to multi-indexing in Pandas. I have data that looks like this:

Date        Time      value
2014-01-14  12:00:04   .424
            12:01:12   .342
            12:01:19   .341
            ...
            12:05:49   .23
2014-05-12  ...
            1:02:42    .23
....

Now I want to access the last time for each individual date and store its value in some kind of array. I built a MultiIndex like this:

df = pd.read_csv("df.csv", index_col=0)
df.index = pd.to_datetime(df.index, infer_datetime_format=True)
df.index = pd.MultiIndex.from_arrays([df.index.date, df.index.time], names=['Date', 'Time'])

df = df[~df.index.duplicated(keep='first')]
dates = df.index.get_level_values(0)

So I have the dates stored as an array. I want to iterate over them, but either I can't get the syntax right or I get the wrong values. I tried a for loop (for date in dates) but could not get it to run, and direct access (df.loc[dates[i]] or something like that) did not work either. Also, the number of time entries differs from date to date. Is there any way to fix this?


You can use groupby/max. To group by Date and take the max of Time, first move Time out of the index and back into a column (with reset_index):

import pandas as pd

df = pd.DataFrame({
    'Date': ['2014-01-14', '2014-01-14', '2014-01-14', '2014-01-14', '2014-05-12', '2014-05-12'],
    'Time': ['12:00:04', '12:01:12', '12:01:19', '12:05:49', '01:01:59', '01:02:42'],
    'value': [0.424, 0.342, 0.341, 0.23, 0.0, 0.23],
})
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index(['Date', 'Time'])

df = df.reset_index('Time', drop=False)
max_times = df.groupby(level=0)['Time'].max()
print(max_times)

Date
2014-01-14    12:05:49
2014-05-12    01:02:42
Name: Time, dtype: object
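
One caveat with the snippet above: Time is still a column of strings there, so max compares text character by character. That gives the latest time only when every value is zero-padded to the same width. A minimal sketch of the difference, using two made-up times that are not from the question:

```python
import pandas as pd

# Hypothetical times for illustration; the first one is not zero-padded.
times = pd.Series(['9:59:59', '10:00:00'])

# String comparison goes character by character, so '9...' beats '1...'.
latest_as_text = times.max()

# Converting to timedeltas compares the actual durations instead.
latest_as_duration = pd.to_timedelta(times).max()

print(latest_as_text)
print(latest_as_duration)
```

If the times in the file might lack leading zeros, converting them with pd.to_timedelta before taking the max is probably the safer choice.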

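If you need the whole row rather than just the time, idxmax is more useful. idxmax returns the index label of the row holding the maximum. Because Date is not unique, call reset_index before idxmax so that every row has a unique integer label (Time is also converted to timedeltas here, so the comparison is numeric):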

df = pd.DataFrame({
    'Date': ['2014-01-14', '2014-01-14', '2014-01-14', '2014-01-14', '2014-05-12', '2014-05-12'],
    'Time': ['12:00:04', '12:01:12', '12:01:19', '12:05:49', '01:01:59', '1:02:42'],
    'value': [0.424, 0.342, 0.341, 0.23, 0.0, 0.23],
})
df['Date'] = pd.to_datetime(df['Date'])
df['Time'] = pd.to_timedelta(df['Time'])
df = df.set_index(['Date', 'Time'])

df = df.reset_index()
idx = df.groupby(['Date'])['Time'].idxmax()
print(df.loc[idx])

        Date     Time  value
3 2014-01-14 12:05:49   0.23
5 2014-05-12 01:02:42   0.23
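
The question also asks for the values to end up in some kind of array. A minimal sketch of that last step, assuming the same column names as above and a trimmed copy of the sample data; the name last_values is only illustrative:

```python
import pandas as pd

# Trimmed copy of the sample data, two rows per date.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2014-01-14', '2014-01-14', '2014-05-12', '2014-05-12']),
    'Time': pd.to_timedelta(['12:01:19', '12:05:49', '01:01:59', '01:02:42']),
    'value': [0.341, 0.23, 0.0, 0.23],
})

# Row label of the latest time within each date.
idx = df.groupby('Date')['Time'].idxmax()

# Select only the value column for those rows and convert to a NumPy array.
last_values = df.loc[idx, 'value'].to_numpy()
print(last_values)
```

The array holds one entry per date, in the order of the sorted dates, regardless of how many times each date has.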

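In fact, you may not need a MultiIndex at all. A MultiIndex is not required for groupby. If you keep the datetimes in a single column, things get simpler. Since that column is datetime/period-like, the .dt accessor gives you the Date or Time part whenever you need it. For example, instead of grouping by a separate Date column, you can group by the date derived from DateTime: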

df = pd.DataFrame({
    'DateTime': ['2014-01-14 12:00:04', '2014-01-14 12:01:12', '2014-01-14 12:01:19',
                 '2014-01-14 12:05:49', '2014-05-12 01:01:59', '2014-05-12 01:02:42'],
    'value': [0.424, 0.342, 0.341, 0.23, 0.0, 0.23],
})
df['DateTime'] = pd.to_datetime(df['DateTime'])
# df = pd.read_csv('df.csv', parse_dates=[0])

idx = df.groupby(df['DateTime'].dt.date)['DateTime'].idxmax()
result = df.loc[idx]
print(result)

             DateTime  value
3 2014-01-14 12:05:49   0.23
5 2014-05-12 01:02:42   0.23
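
If you would rather keep the MultiIndex from the question, one possible approach (a sketch, not part of the approaches above) is to sort the index and take the last row of each Date group. The data below is a trimmed, deliberately unsorted copy of the sample, and the names last_rows and last_values are only illustrative:

```python
import pandas as pd

# Trimmed sample data, with the first date's rows out of order on purpose.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2014-01-14', '2014-01-14', '2014-05-12', '2014-05-12']),
    'Time': pd.to_timedelta(['12:05:49', '12:01:19', '01:01:59', '01:02:42']),
    'value': [0.23, 0.341, 0.0, 0.23],
}).set_index(['Date', 'Time'])

# Sort so that, within each date, the latest time comes last.
df = df.sort_index()

# One row per date: the last (latest) row of each group.
last_rows = df.groupby(level='Date').tail(1)

# Iterating over the groups also works when per-date logic is needed.
last_values = [group['value'].iloc[-1] for _, group in df.groupby(level='Date')]

print(last_rows)
print(last_values)
```

Iterating over df.groupby(level='Date') yields each date once together with its rows, which avoids looping over dates, an array that repeats every date once per row.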

Source: https://habr.com/ru/post/1648703/

