How do I floor a date to the first day of its month?

I have a pandas DataFrame whose index is a date column.

Input:

             value
 date
 1986-01-31  22.93
 1986-02-28  15.46

I want to set each date to the first day of its month.

Output:

             value
 date
 1986-01-01  22.93
 1986-02-01  15.46

What I tried:

 df.index.floor('M')
 ValueError: <MonthEnd> is a non-fixed frequency

This is potentially because df was generated with df = df.resample("M").sum() (the result of this code is the input shown at the beginning of the question).

I also tried df = df.resample("M", convention='start').sum() . However, this does not work.

I know in R, you can just call floor(date, 'M') .

7 answers

You can use the MonthBegin time series offset:

 import pandas as pd
 from pandas.tseries.offsets import MonthBegin

 df['date'] = pd.to_datetime(df['date']) - MonthBegin(1)

Edit: The above solution does not handle dates that already fall on the first day of the month; those are moved back to the first day of the previous month. Here is an alternative solution.

Here is a data frame with additional test cases:

             value
 date
 1986-01-31  22.93
 1986-02-28  15.46
 2018-01-01  20.00
 2018-02-02  25.00

Using the timedelta method,

 df.index = pd.to_datetime(df.index)
 df.index = df.index - pd.to_timedelta(df.index.day - 1, unit='d')

             value
 date
 1986-01-01  22.93
 1986-02-01  15.46
 2018-01-01  20.00
 2018-02-01  25.00
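To make the difference between the two approaches concrete, here is a minimal sketch that applies both to the four test dates above. The variable names (idx, via_offset, via_timedelta) are made up for illustration, and unit="D" is used in place of 'd' as a spelling that should be accepted across pandas versions.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin

# The four test dates from the data frame above.
idx = pd.to_datetime(["1986-01-31", "1986-02-28", "2018-01-01", "2018-02-02"])

# Subtracting MonthBegin(1) moves a date that is already the 1st
# back to the 1st of the previous month.
via_offset = idx - MonthBegin(1)

# Subtracting (day - 1) days leaves a date on the 1st untouched.
via_timedelta = idx - pd.to_timedelta(idx.day - 1, unit="D")

print(list(via_offset.strftime("%Y-%m-%d")))
print(list(via_timedelta.strftime("%Y-%m-%d")))
```

Only the third date differs between the two results: the offset version turns 2018-01-01 into 2017-12-01, while the timedelta version keeps it.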

Here is another "pandonic" way to do this:

 df.date - pd.Timedelta('1 day') * (df.date.dt.day - 1) 

This does the trick and requires no extra imports. NumPy has a datetime64 dtype, which pandas sets to [ns] precision by default, as you can see by checking the dtype. You can change it to month precision, which starts on the first day of the month, by accessing the underlying NumPy array and changing its type.

 df.date = pd.to_datetime(df.date.values.astype('datetime64[M]')) 

It would be nice if pandas supported this through its own astype() method, but unfortunately it does not.

The above works for data stored as datetime values or as datetime strings. If your data is already of type datetime64[ns], you can omit pd.to_datetime() and simply write:

 df.date = df.date.values.astype('datetime64[M]') 
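As a small self-contained sketch of this truncation, the example below builds a hypothetical three-row frame and casts through datetime64[M]. It differs from the one-liner above in one respect: it casts back to datetime64[ns] explicitly in NumPy before assigning, which is an assumption made here to avoid relying on how a given pandas version converts month-precision arrays.

```python
import pandas as pd

# Hypothetical sample data; the column is already datetime64[ns].
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["1986-01-31", "2018-01-01", "2018-02-02"]),
        "value": [22.93, 20.00, 25.00],
    }
)

# Casting to month precision discards the day (and any time of day).
months = df["date"].to_numpy().astype("datetime64[M]")

# Casting back to nanosecond precision yields midnight on the 1st of each month.
df["date"] = months.astype("datetime64[ns]")

print(df)
```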

There is a pandas issue about the floor problem.

The suggested approach is:

 import pandas as pd

 pd.to_datetime(df.date).dt.to_period('M').dt.to_timestamp()
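Since the question has the dates in the index and not in a column, the same idea can be sketched directly on a DatetimeIndex, where the .dt accessor is not needed. The frame below is sample data rebuilt from the test cases in this thread.

```python
import pandas as pd

# Sample frame with the dates in the index, as in the question.
df = pd.DataFrame(
    {"value": [22.93, 15.46, 20.00, 25.00]},
    index=pd.to_datetime(["1986-01-31", "1986-02-28", "2018-01-01", "2018-02-02"]),
)
df.index.name = "date"

# Convert each timestamp to its monthly period, then back to a timestamp
# at the start of that period.
df.index = df.index.to_period("M").to_timestamp()

print(df)
```

Dates that are already on the first of the month are left unchanged by this round trip.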
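For dates stored as strings, plain string manipulation also works:

 dt_1 = "2016-02-01"

 def first_day(dt):
     # Split "YYYY-MM-DD" and replace the day part with "01"
     lt_split = dt.split("-")
     return "-".join([lt_split[0], lt_split[1], "01"])

 print(first_day(dt_1))

For a pandas DataFrame, you can use df["col_name_date"].apply(first_day), provided the column holds date strings in YYYY-MM-DD format.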


You can also use strftime formatting (note that this produces strings, not datetimes):

df['month'] = df['date'].dt.strftime('%Y-%m-01')


As of August 2019:

This should work:

 [x.replace(day=1).date() for x in df['date']] 

The only requirement is that the date column actually holds datetimes, which we can guarantee by calling pd.to_datetime(df['date']) first.
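A minimal sketch of this approach, starting from a hypothetical column of date strings so that the pd.to_datetime() step is visible. Note that the result is a plain Python list of datetime.date objects, not a pandas datetime column.

```python
import datetime

import pandas as pd

# Hypothetical sample data: the dates start out as strings.
df = pd.DataFrame({"date": ["1986-01-31", "2018-01-01", "2018-02-02"]})

# Make sure we are iterating over Timestamps, not strings.
dates = pd.to_datetime(df["date"])

# Timestamp.replace(day=1) keeps year and month; .date() drops the time part.
first_days = [x.replace(day=1).date() for x in dates]

print(first_days)
```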


Source: https://habr.com/ru/post/1264318/