Why does Pandas force my numpy float32 to float64?

Question

Why does Pandas force my numpy float32 to float64 in this code snippet:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([[1, 2, 'a'], [3, 4, 'b']], dtype=np.float32)
>>> A = df.ix[:, 0:1].values
>>> df.ix[:, 0:1] = A
>>> df[0].dtype
dtype('float64')

The behavior seems so strange to me that I wonder if it is a bug. I am on Pandas version 0.17.1 (the current PyPI release) and I note that related bugs have been fixed recently, see https://github.com/pydata/pandas/issues/11847 . I have not tried this piece of code with the current GitHub master.

Is this a bug, or am I misunderstanding some "feature" in Pandas? If it is a feature, how do I get around it?

(The coercion problem is related to a question I recently asked about the performance of Pandas assignments: Assigning a Pandas DataFrame with float32 and float64 slow)

python numpy pandas coercion

Finn Årup Nielsen Feb 05 '16 at 17:47

2 answers

Martin Valgur · Answer 1 · 2016-02-05T18:32:45+0000

I think it's worth posting this as a GitHub issue. The behavior is certainly inconsistent.

The code takes a different branch depending on whether the DataFrame is mixed-type or not (source).

In the mixed-type case, the ndarray is converted to a list of Python floats and then converted back into a float64 ndarray, disregarding the dtype information of the DataFrame (function maybe_convert_objects()).
In the non-mixed case, the contents of the DataFrame are updated more or less directly (source), and the DataFrame keeps its float32 dtypes.
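
The upcast described for the mixed-type case can be reproduced with NumPy alone. The following is a minimal sketch that mimics the round trip through Python floats; it is not the pandas source, and the small DataFrame built here (with explicit per-column data) is an illustrative assumption. The last step shows one possible way to cast affected columns back with astype.

```python
import numpy as np
import pandas as pd

A = np.array([[1, 2], [3, 4]], dtype=np.float32)

# Python floats carry no float32 information, so rebuilding an
# array from them falls back to NumPy's default float64.
as_list = A.tolist()
round_trip = np.array(as_list)
print(A.dtype, "->", round_trip.dtype)

# Illustrative frame whose numeric columns ended up as float64.
df = pd.DataFrame({0: round_trip[:, 0], 1: round_trip[:, 1], 2: ["a", "b"]})

# One possible workaround: cast the numeric columns back explicitly.
fixed = df.astype({0: np.float32, 1: np.float32})
print(fixed.dtypes)
```

Passing a column-to-dtype mapping to astype leaves the string column untouched, so only the two numeric columns are converted.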

hpaulj · Answer 2 · 2016-02-05T20:44:56+0000

Not an answer, but my recreation of the problem:

    In [2]: df = pd.DataFrame([[1, 2, 'a'], [3, 4, 'b']], dtype=np.float32)
    In [3]: df.dtypes
    Out[3]:
    0    float32
    1    float32
    2     object
    dtype: object
    In [4]: A=df.ix[:,:1].values
    In [5]: A
    Out[5]:
    array([[ 1.,  2.],
           [ 3.,  4.]], dtype=float32)
    In [6]: df.ix[:,:1] = A
    In [7]: df.dtypes
    Out[7]:
    0    float64
    1    float64
    2     object
    dtype: object
    In [8]: pd.__version__
    Out[8]: '0.15.0'

I am not as familiar with pandas as with numpy, but I am puzzled why ix[:,:1] gives me a 2-column result. In numpy that kind of index gives only 1 column.
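
On the 2-column puzzle: with the default integer column labels, .ix treats the slice as label-based, and pandas label slices include the end label, unlike NumPy's half-open positional slices. A minimal sketch of the difference, using .loc and .iloc (the accessors that replaced .ix in later pandas versions); the all-float frame is an illustrative assumption:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# Label-based slice: the end label 1 is included, so columns 0 and 1.
label_slice = df.loc[:, :1]

# Positional slice: the end position is excluded, so column 0 only.
pos_slice = df.iloc[:, :1]

# Plain NumPy slicing is positional as well.
np_slice = df.values[:, :1]

print(label_slice.shape, pos_slice.shape, np_slice.shape)
```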

If I assign a single column, the dtype does not change:

    In [47]: df.ix[:,[0]]=A[:,0]
    In [48]: df.dtypes
    Out[48]:
    0    float32
    1    float32
    2     object

The same actions on a frame without mixed data types do not change the dtypes:

    In [100]: df1 = pd.DataFrame([[1, 2, 1.23], [3, 4, 3.32]], dtype=np.float32)
    In [101]: A1=df1.ix[:,:1].values
    In [102]: df1.ix[:,:1]=A1
    In [103]: df1.dtypes
    Out[103]:
    0    float32
    1    float32
    2    float32
    dtype: object

The key must be that with mixed values the dataframe is, in some sense, a dtype=object array, whether that is true of its internal data storage or just of its numpy interface.

    In [104]: df1.as_matrix()
    Out[104]:
    array([[ 1.        ,  2.        ,  1.23000002],
           [ 3.        ,  4.        ,  3.31999993]], dtype=float32)
    In [105]: df.as_matrix()
    Out[105]:
    array([[1.0, 2.0, 'a'],
           [3.0, 4.0, 'b']], dtype=object)
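
The distinction between per-column storage and the combined NumPy view can be checked directly. This is a small sketch using .values, since as_matrix() is no longer available in later pandas versions; the frame is built with explicit float32 columns as an illustrative assumption:

```python
import numpy as np
import pandas as pd

mixed = pd.DataFrame({
    0: np.array([1, 3], dtype=np.float32),
    1: np.array([2, 4], dtype=np.float32),
    2: ["a", "b"],
})

# Each column keeps its own dtype inside the frame.
print(mixed.dtypes)

# Only the combined NumPy view of the whole frame is dtype=object.
whole = mixed.values
print(whole.dtype)

# Selecting just the numeric columns gives a float32 array again.
numeric = mixed[[0, 1]].values
print(numeric.dtype)
```

So the object dtype belongs to the whole-frame NumPy view; the numeric columns themselves are still reported as float32.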
