Why does Pandas force my numpy float32 to float64?

Why does Pandas force my numpy float32 to float64 in this code snippet:

    >>> import pandas as pd
    >>> import numpy as np
    >>> df = pd.DataFrame([[1, 2, 'a'], [3, 4, 'b']], dtype=np.float32)
    >>> A = df.ix[:, 0:1].values
    >>> df.ix[:, 0:1] = A
    >>> df[0].dtype
    dtype('float64')

The behavior seems so strange to me that I wonder if this is a bug. I am on pandas version 0.17.1 (the latest version on PyPI), and I note that related bugs have been fixed recently, see https://github.com/pydata/pandas/issues/11847 . I have not tried this code with the latest GitHub master.

Is this a bug, or am I misunderstanding a "feature" in pandas? If it is a feature, how do I get around it?

(This issue is related to a question I recently asked about the performance of pandas assignments: Assigning a Pandas DataFrame with float32 and float64 slow )

2 answers

I think it's worth posting this as a GitHub issue. The behavior is certainly inconsistent.

The code takes a different branch depending on whether the DataFrame has mixed dtypes or not.

  • In the mixed-dtype case, the ndarray is converted to a list of Python floats and then converted back to a float64 ndarray, ignoring the dtypes of the DataFrame (see the function maybe_convert_objects()).

  • In the non-mixed case, the contents of the DataFrame are updated almost directly, so the DataFrame retains its float32 dtypes.
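As for getting around it, here is a minimal sketch of two possible workarounds, not a statement of how pandas intends this to be handled. It assumes a recent pandas, where .ix has been removed, so it uses plain column selection instead, and it builds the frame from a dict because the exact constructor call from the question may not be accepted by newer versions. Whether the block assignment itself still upcasts depends on the pandas version; the sketch sidesteps that by never relying on it.

```python
import numpy as np
import pandas as pd

# Mixed-dtype frame: two float32 columns plus one string column.
df = pd.DataFrame({
    0: np.array([1, 3], dtype=np.float32),
    1: np.array([2, 4], dtype=np.float32),
    2: ["a", "b"],
})

A = df[[0, 1]].values  # float32 ndarray of shape (2, 2)

# Workaround 1: replace each column wholesale. A full-column assignment
# takes the dtype of the array being assigned.
for j, col in enumerate([0, 1]):
    df[col] = A[:, j]

# Workaround 2: after any block assignment, cast the affected columns back.
df = df.astype({0: np.float32, 1: np.float32})

print(df.dtypes)
```

The column-by-column form avoids the float64 round trip entirely, while the astype call is a cheap safety net when a block assignment cannot be avoided.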


Not an answer, but my reproduction of the problem:

    In [2]: df = pd.DataFrame([[1, 2, 'a'], [3, 4, 'b']], dtype=np.float32)

    In [3]: df.dtypes
    Out[3]:
    0    float32
    1    float32
    2     object
    dtype: object

    In [4]: A=df.ix[:,:1].values

    In [5]: A
    Out[5]:
    array([[ 1.,  2.],
           [ 3.,  4.]], dtype=float32)

    In [6]: df.ix[:,:1] = A

    In [7]: df.dtypes
    Out[7]:
    0    float64
    1    float64
    2     object
    dtype: object

    In [8]: pd.__version__
    Out[8]: '0.15.0'

I am not as familiar with pandas as I am with numpy, but I am puzzled by why ix[:,:1] gives me a 2-column result. In numpy, such an index gives only 1 column.

If I assign a single column, the dtype does not change:

    In [47]: df.ix[:,[0]]=A[:,0]

    In [48]: df.dtypes
    Out[48]:
    0    float32
    1    float32
    2     object

The same operations without mixed data types do not change the dtypes:

    In [100]: df1 = pd.DataFrame([[1, 2, 1.23], [3, 4, 3.32]], dtype=np.float32)

    In [101]: A1=df1.ix[:,:1].values

    In [102]: df1.ix[:,:1]=A1

    In [103]: df1.dtypes
    Out[103]:
    0    float32
    1    float32
    2    float32
    dtype: object

The key seems to be that, with mixed values, the DataFrame is in one way or another an array of dtype=object, whether that is true of its internal data store or only of its numpy interface.

    In [104]: df1.as_matrix()
    Out[104]:
    array([[ 1.        ,  2.        ,  1.23000002],
           [ 3.        ,  4.        ,  3.31999993]], dtype=float32)

    In [105]: df.as_matrix()
    Out[105]:
    array([[1.0, 2.0, 'a'],
           [3.0, 4.0, 'b']], dtype=object)
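For readers on a newer pandas, where as_matrix() is no longer available, the same comparison can be sketched with the .values attribute. The frame names mixed and uniform are only illustrative, and the frame is built from a dict rather than with the constructor call used above.

```python
import numpy as np
import pandas as pd

# Two float32 columns plus one string column.
mixed = pd.DataFrame({
    0: np.array([1, 3], dtype=np.float32),
    1: np.array([2, 4], dtype=np.float32),
    2: ["a", "b"],
})

# Only the numeric columns, so every column shares one dtype.
uniform = mixed[[0, 1]]

print(mixed.values.dtype)    # object
print(uniform.values.dtype)  # float32
```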

Source: https://habr.com/ru/post/1242392/

