Get the latest date in each month of the pandas time series

Question

I am currently generating a DatetimeIndex using a specific function, zipline.utils.tradingcalendar.get_trading_days. The time series is roughly daily, but with some gaps.

My goal is to get the latest date in the DatetimeIndex for each month.

.to_period('M') and .to_timestamp('M') do not work, because they return the last calendar day of the month rather than the last date that actually appears in the index for that month.

For example, given the time series below, I would like to select '2015-05-29', whereas the last calendar day of the month is '2015-05-31'.

['2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22', '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01']

python pandas zipline

ikemblem Jun 09 '15 at 10:21

5 answers

My strategy would be to group by month, and then choose the “maximum” for each group:

If "dt" is your DatetimeIndex object:

    last_dates_of_the_month = []
    dt_month_group_dict = dt.groupby(dt.month)
    for month in dt_month_group_dict:
        last_date = max(dt_month_group_dict[month])
        last_dates_of_the_month.append(last_date)

The list last_dates_of_the_month contains the last date of each month in your dataset. You can use it to build a new pandas DatetimeIndex (or whatever else you need).
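A minimal runnable sketch of this strategy on the sample dates from the question. One deliberate change, not part of the answer above: it groups on dt.to_period('M') instead of dt.month, so the same month in different years (for example January 2015 and January 2016) ends up in separate groups. Index.groupby returns a dict-like mapping from each key to the dates in that group.

```python
import pandas as pd

# Sample trading days from the question (weekends and 2015-05-25 are absent)
dt = pd.DatetimeIndex(['2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21',
                       '2015-05-22', '2015-05-26', '2015-05-27', '2015-05-28',
                       '2015-05-29', '2015-06-01'])

# Index.groupby maps each key to an Index of its members. Keying on monthly
# Periods (swapped in for dt.month) keeps equal months of different years apart.
month_groups = dt.groupby(dt.to_period('M'))

# The "maximum" of each group is the latest date present in that month
last_dates_of_the_month = pd.DatetimeIndex(
    sorted(group.max() for group in month_groups.values())
)
print(last_dates_of_the_month)
```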

Condla Jun 09 '15 at 23:05

This is an old question, but none of the existing answers is perfect. Here is the solution I came up with (assuming the dates form a sorted index). It could even be written on one line, but I split it up for readability:

    month1 = pd.Series(apple.index.month)
    month2 = pd.Series(apple.index.month).shift(-1)
    mask = (month1 != month2)
    apple[mask.values].head(10)

A few notes here:

Wrapping the month numbers in a new pd.Series is required to shift them (see here)
Indexing with the boolean mask requires .values, because the mask has a default integer index that does not align with apple's DatetimeIndex (see here)

By the way, if the dates are business days, it is easier to resample: apple.resample('BM').last()
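A self-contained sketch of the shift-and-compare mask described in this answer. The apple DataFrame here is a placeholder built on the question's sample dates; its row column holds row numbers only, not real prices. A row is kept when the next row falls in a different month, and the final row is always kept because shift(-1) leaves NaN there. The sketch shifts month1 directly instead of building a second Series, which is equivalent.

```python
import pandas as pd

dates = pd.DatetimeIndex(['2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21',
                          '2015-05-22', '2015-05-26', '2015-05-27', '2015-05-28',
                          '2015-05-29', '2015-06-01'])
# Placeholder frame: 'row' is just the row number, not real data
apple = pd.DataFrame({'row': range(len(dates))}, index=dates)

month1 = pd.Series(apple.index.month)   # month number of each row
month2 = month1.shift(-1)               # month number of the next row (NaN after the last)
mask = month1 != month2                 # True where the next row starts a new month
last_rows = apple[mask.values]          # .values avoids aligning on the integer index
print(last_rows)
```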

Maxim Feb 21 '18 at 18:17

Maybe the answer is no longer needed, but while looking for an answer to the same question I found what may be a simpler solution:

    import pandas as pd

    sample_dates = pd.date_range(start='2010-01-01', periods=100, freq='B')
    month_end_dates = sample_dates[sample_dates.is_month_end]
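One caveat, illustrated below on the question's sample dates: is_month_end tests for the last calendar day of the month, so it only matches months whose last calendar day is itself in the index. May 2015 ends on a Sunday, so nothing is selected. The second selection is a swapped-in alternative, not part of this answer, that groups the dates by monthly period and takes the maximum of each group.

```python
import pandas as pd

dates = pd.DatetimeIndex(['2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21',
                          '2015-05-22', '2015-05-26', '2015-05-27', '2015-05-28',
                          '2015-05-29', '2015-06-01'])

# Calendar-based test: 2015-05-31 is not in the index, so this is empty
calendar_ends = dates[dates.is_month_end]

# Data-based alternative: latest date actually present in each month
last_in_data = dates.to_series().groupby(dates.to_period('M')).max()

print(calendar_ends)
print(last_in_data)
```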

MMCM_ Aug 21 '15 at 8:04

Suppose your DataFrame looks like this:

[Image: source DataFrame]

Then the following code gives you the last row (the latest date) of each month:

    df_monthly = df.reset_index().groupby([df.index.year, df.index.month], as_index=False).last().set_index('index')

[Image: transformed DataFrame]

This one-liner does the job :)
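A sketch of a variant that keeps the original index: group the rows by df.index.to_period('M') and take tail(1) of each group. Because it never calls reset_index, it also works when the index has a name; with a named index such as Date, reset_index() creates a Date column rather than index, and set_index('index') would raise a KeyError. The DataFrame and its value column below are placeholders built on the question's sample dates, and the frame is sorted first because tail(1) takes the last row in the frame's order.

```python
import pandas as pd

dates = pd.DatetimeIndex(['2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21',
                          '2015-05-22', '2015-05-26', '2015-05-27', '2015-05-28',
                          '2015-05-29', '2015-06-01'], name='Date')
# Placeholder DataFrame: 'value' is dummy data (the row number)
df = pd.DataFrame({'value': range(len(dates))}, index=dates)

# Sort first so the grouping key below is computed on the same row order
df_sorted = df.sort_index()

# Monthly Periods separate both year and month; tail(1) keeps each group's
# last row together with its original (named) DatetimeIndex label
df_monthly = df_sorted.groupby(df_sorted.index.to_period('M')).tail(1)
print(df_monthly)
```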

user3570984 May 24 '19 at 20:56

ikemblem · Accepted Answer · 2015-06-10T12:15:02+0000

Condla's answer came closest to what I needed, except that, because my time index spanned more than one year, I had to group by both year and month and then select the maximum date. Below is the code I ended up with.

    import pandas as pd

    # tempTradeDays is the initial DatetimeIndex
    dateRange = []
    tempYear = None
    dictYears = tempTradeDays.groupby(tempTradeDays.year)
    for yr in dictYears.keys():
        tempYear = pd.DatetimeIndex(dictYears[yr]).groupby(pd.DatetimeIndex(dictYears[yr]).month)
        for m in tempYear.keys():
            dateRange.append(max(tempYear[m]))
    # DatetimeIndex.order() is deprecated/removed in later pandas; sort_values() replaces it
    dateRange = pd.DatetimeIndex(dateRange).sort_values()
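The nested year-then-month loop can also be collapsed into a single groupby on (year, month) pairs. A sketch, assuming a hypothetical stand-in for tempTradeDays built with pd.bdate_range (weekdays only, no exchange holidays) so that the range crosses a year boundary:

```python
import pandas as pd

# Hypothetical stand-in for tempTradeDays: weekdays from Dec 2014 to Feb 2015
tempTradeDays = pd.bdate_range('2014-12-01', '2015-02-27')

# A Series whose values are the dates, so max() per group returns a date
dates = tempTradeDays.to_series()

# Group by year and month together in one pass instead of nesting two loops
last_per_month = dates.groupby([tempTradeDays.year, tempTradeDays.month]).max()

dateRange = pd.DatetimeIndex(last_per_month.values).sort_values()
print(dateRange)
```

Here January 2015 ends on a Saturday, so its latest weekday is 2015-01-30.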
