Mixed item types in a DataFrame column

Consider the following three DataFrames:

    df1 = pd.DataFrame([[1,2],[4,3]])
    df2 = pd.DataFrame([[1,.2],[4,3]])
    df3 = pd.DataFrame([[1,'a'],[4,3]])

These are the types of the items in the second column of each DataFrame:

    In [56]: map(type,df1[1])
    Out[56]: [numpy.int64, numpy.int64]

    In [57]: map(type,df2[1])
    Out[57]: [numpy.float64, numpy.float64]

    In [58]: map(type,df3[1])
    Out[58]: [str, int]

In the first case, all ints are cast to numpy.int64. Fine. In the third case, there is basically no casting. In the second case, however, the integer (3) is cast to numpy.float64, probably because the other number is a float.

How can I control the casting? In the second case, I want to have either [float64, int64] or [float, int] as the types.

Workaround:

Using a callable print function can be a workaround, as shown here:

    import numpy as np
    import pandas as pd

    def printFloat(x):
        # Show floats with no fractional part as integers
        if np.modf(x)[0] == 0:
            return str(int(x))
        else:
            return str(x)

    pd.options.display.float_format = printFloat
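
To see what this workaround does and does not change, here is a minimal sketch. It assumes a reasonably recent pandas; the names `print_float` and `rendered` are illustrative and not from the original post, and `pd.option_context` is used only so the display setting does not leak into the rest of the session. Only the rendering changes: the column is still stored as float64.

```python
import numpy as np
import pandas as pd

def print_float(x):
    """Render floats that have no fractional part without the trailing '.0'."""
    if np.modf(x)[0] == 0:
        return str(int(x))
    return str(x)

df2 = pd.DataFrame([[1, 0.2], [4, 3]])

# Apply the formatter only inside this block instead of changing the global option.
with pd.option_context("display.float_format", print_float):
    rendered = df2.to_string()

print(rendered)
print(df2[1].dtype)  # the stored values are still floats
```

Since this is purely a display setting, any computation on the column still sees floats.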
1 answer

The columns of a pandas DataFrame (or a Series) are homogeneous in type. You can check this with dtype (or DataFrame.dtypes):

    In [14]: df1[1].dtype
    Out[14]: dtype('int64')

    In [15]: df2[1].dtype
    Out[15]: dtype('float64')

    In [16]: df3[1].dtype
    Out[16]: dtype('O')
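
The same check can be run as a plain script. This is only a sketch, assuming a recent pandas; the `second_col_dtypes` name is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [4, 3]])
df2 = pd.DataFrame([[1, 0.2], [4, 3]])
df3 = pd.DataFrame([[1, "a"], [4, 3]])

# Collect the dtype of the second column (label 1) of each frame.
second_col_dtypes = {
    name: str(df[1].dtype)
    for name, df in [("df1", df1), ("df2", df2), ("df3", df3)]
}
print(second_col_dtypes)

# DataFrame.dtypes reports every column at once.
print(df2.dtypes)
```

Each column reports exactly one dtype, which is why the integer 3 in df2 cannot stay an integer next to 0.2 unless the column falls back to object.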

Only the generic 'object' dtype can hold any Python object, and so it can also hold mixed types:

    In [18]: df2 = pd.DataFrame([[1,.2],[4,3]], dtype='object')

    In [19]: df2[1].dtype
    Out[19]: dtype('O')

    In [20]: map(type,df2[1])
    Out[20]: [float, int]

But this is really not recommended, as it defeats the purpose (or at least the performance) of pandas.
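
A rough way to see the performance cost for yourself is to time the same reduction on a float64 column and on an object column. This is a sketch, not a benchmark: the data, the sizes and the variable names are made up for illustration, and the actual timings depend on your machine and pandas version.

```python
import timeit
import pandas as pd

# Mixed Python ints (even positions) and floats (odd positions).
values = [float(i) if i % 2 else i for i in range(100_000)]

as_float = pd.Series(values)                 # inferred as float64
as_object = pd.Series(values, dtype=object)  # keeps the Python int/float objects

# Time 20 sums over each Series.
float_time = timeit.timeit(as_float.sum, number=20)
object_time = timeit.timeit(as_object.sum, number=20)

print(as_float.dtype, float_time)
print(as_object.dtype, object_time)
```

Both Series give the same sum; the object column just has to go through individual Python objects instead of a contiguous numeric array.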

Is there a reason why you specifically want both int and float in the same column?


Source: https://habr.com/ru/post/1208639/

