Pandas: Type conversion using `df.loc` from datetime64 to int

When I try to reassign certain values in a column with df.loc[], I get a strange type conversion error that converts datetimes to integers.

Minimal example:

import numpy as np
import pandas as pd
import datetime
d = pd.DataFrame(zip(['12/6/2015', np.nan], [1, 2]), columns=list('ab'))
print(d)
d.loc[pd.notnull(d.a), 'a'] = d.a[pd.notnull(d.a)].apply(lambda x: datetime.datetime(2015,12,6))
print(d)

Full example:

Here is my dataframe (contains NaN):

>>> df.head()

  prior_ea_date quarter
0    12/31/2015      Q2
1    12/31/2015      Q3
2    12/31/2015      Q3
3    12/31/2015      Q3
4    12/31/2015      Q2

>>> df.prior_ea_date

0         12/31/2015
1         12/31/2015
...
341486     1/19/2016
341487      1/6/2016
Name: prior_ea_date, dtype: object

I want to run the following line of code:

df.loc[pd.notnull(df.prior_ea_date), 'prior_ea_date'] = df.prior_ea_date[pd.notnull(df.prior_ea_date)].apply(dt, usa=True)

where dt is a string-to-datetime parser, which when run on its own gives:

>>> df.prior_ea_date[pd.notnull(df.prior_ea_date)].apply(dt, usa=True).head()

0   2015-12-31
1   2015-12-31
2   2015-12-31
3   2015-12-31
4   2015-12-31
Name: prior_ea_date, dtype: datetime64[ns]

However, when I run .loc[], I get the following:

>>> df.loc[pd.notnull(df.prior_ea_date), 'prior_ea_date'] = df.prior_ea_date[pd.notnull(df.prior_ea_date)].apply(dt, usa=True)
>>> df.head()

         prior_ea_date quarter
0  1451520000000000000      Q2
1  1451520000000000000      Q3
2  1451520000000000000      Q3
3  1451520000000000000      Q3
4  1451520000000000000      Q2

and it has turned my datetime objects into integers.

  • Why is this happening?
  • How can I avoid this behavior?

I have managed to put together a temporary workaround, so while any one-line hacks would be appreciated, I would prefer a pandas-style solution.

Thanks.


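Let's start with the second question: how to avoid this behavior.

As I understand it, you want to convert the prior_ea_date column to datetime. The pandas-style way to do that is to_datetime: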

df.prior_ea_date = pd.to_datetime(df.prior_ea_date, format='%m/%d/%Y')
df.prior_ea_date

0   2015-12-31
1   2015-12-31
2   2015-12-31
3   2015-12-31
4   2015-12-31
5          NaT
Name: prior_ea_date, dtype: datetime64[ns]
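The whole-column conversion above can be tried on a small stand-in frame. This is a minimal sketch, not the OP's data: the three rows below are made up for illustration, and only the column names come from the question. It checks that the column ends up with a datetime dtype and that the missing value becomes NaT.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the OP's frame: date strings plus one missing value.
df = pd.DataFrame({
    "prior_ea_date": ["12/31/2015", "1/6/2016", np.nan],
    "quarter": ["Q2", "Q3", "Q3"],
})

# Replace the whole column instead of assigning into a slice of it.
df["prior_ea_date"] = pd.to_datetime(df["prior_ea_date"], format="%m/%d/%Y")

# The column now has a datetime dtype, and the NaN became NaT.
print(df["prior_ea_date"].dtype)
print(df["prior_ea_date"].isna().tolist())
```

Because the assignment replaces the column outright, pandas has no existing object dtype to preserve, which is the difference from the sliced .loc assignment in the question.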

Now the first question: why is this happening?

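My guess is that when you use df.loc[pd.notnull(df.prior_ea_date), 'prior_ea_date'] = ...., you are setting only a slice of the prior_ea_date column, not the whole column. Pandas keeps the existing dtype of prior_ea_date rather than changing it. So the values you assign are cast to fit the column, and that is where the integers come from.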

For example:

##
# Example of type casting on slice
##

d = pd.DataFrame(zip(['12/6/2015', np.nan], [1, 2]), columns=list('ab'))

# Column-a is still dtype: object
d.a
0    12/6/2015
1          NaN
Name: a, dtype: object

d.loc[pd.notnull(d.a), 'a'] = d.a[pd.notnull(d.a)].apply(lambda x: datetime.datetime(2015,12,6))

# Column-a is still dtype: object
d.a
0    1449360000000000000
1                    NaN
Name: a, dtype: object

##
# Example of overwriting whole column
##

d = pd.DataFrame(zip(['12/6/2015', np.nan], [1, 2]), columns=list('ab'))
d.a = pd.to_datetime(d.a, format='%m/%d/%Y')

# Column-a dtype is now datetime
d.a
0   2015-12-06
1          NaT
Name: a, dtype: datetime64[ns]

The OP asked me about this, so I stepped through it in PyCharm. TL;DR: it comes down to how datetime dtypes are handled in NumPy.

d = np.datetime64('2015-12-30T16:00:00.000000000-0800')
d.astype(np.dtype(object))
#>>> 1451520000000000000L

... and that is what ends up happening inside .loc ...

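In detail: the column does not have a datetime dtype; it is object. When you assign through loc, the column's dtype is kept, so the incoming values are converted to it.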

When you use loc, pandas goes through _LocationIndexer in the indexing module. That ends up calling self.obj._data = self.obj._data.setitem(indexer, value).

Following that call down, you eventually arrive at line 742 of pandas.core.internals.py:

values[indexer] = value  

values is a NumPy ndarray with dtype object. It holds the column's current data. indexer is just the index of the rows being set. value is an ndarray of NumPy datetime64.

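The setitem here is NumPy's own, and it fills each slot by effectively calling np.asarray(value, self.dtype). self.dtype is the dtype of the target array: object, not datetime.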

np.asarray(d, np.dtype(object))
#>>> array(1451520000000000000L, dtype=object)
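The cast described above can be reproduced in plain NumPy, without pandas. The following is a small sketch under my own assumptions: the variable names are illustrative, and the comparison with microsecond resolution is an addition that is not in the original answer. As far as I know, NumPy returns a Python integer here because datetime.datetime only has microsecond resolution, so a nanosecond value cannot be represented as one.

```python
import datetime
import numpy as np

# The same instant at two resolutions.
ns_value = np.datetime64("2015-12-06", "ns")
us_value = np.datetime64("2015-12-06", "us")

# Casting to object: nanoseconds become a plain integer count since the epoch,
# microseconds become a datetime.datetime.
as_obj_ns = ns_value.astype(object)
as_obj_us = us_value.astype(object)
print(repr(as_obj_ns), repr(as_obj_us))

# The same cast happens when datetime64[ns] values are assigned into
# an object array, which mirrors values[indexer] = value.
values = np.array(["12/6/2015", np.nan], dtype=object)
mask = np.array([True, False])
values[mask] = np.array(["2015-12-06"], dtype="datetime64[ns]")
print(values)
```

The first slot of values ends up holding the same integer that appears in the question's output, while the NaN in the second slot is left alone.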

... and that integer is what loc writes back into the column.

... The column still has dtype = object in pandas, though. It is not int, because of the NaNs.

So, in the end, the integers come from how NumPy casts datetime values to object. Why NumPy does that is a separate question.


Source: https://habr.com/ru/post/1651810/

