Convert an entire pandas DataFrame to integers in pandas (0.17.0)

My question is very similar to this one, but I need to convert the entire data frame, not just a series. The to_numeric function works on only one series at a time and is not a good replacement for the deprecated convert_objects. Is there a way to get results similar to the convert_objects(convert_numeric=True) command in the new version of pandas?

Thanks to Mike Muller for the example. df.apply(pd.to_numeric) works very well if all the values can be converted to integers. What should I do if my data frame contains strings that cannot be converted to integers? Example:

    import pandas as pd

    df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
    df.dtypes
    Out[59]:
    Words    object
    ints     object
    dtype: object

Then I could run the deprecated function and get:

    df = df.convert_objects(convert_numeric=True)
    df.dtypes
    Out[60]:
    Words    object
    ints      int64
    dtype: object

Running the apply command gives me errors, even when I wrap it in try/except handling.
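One way to get per-column behaviour close to the example above, without relying on any deprecated option, is to try pd.to_numeric on each column separately and catch the failure per column rather than around the whole apply() call. The helper name convert_numeric_columns below is hypothetical, and this is only a sketch: it does not try to reproduce every edge case of the removed convert_objects (for instance, a column where only some values are numeric is left untouched here, not partially converted).

```python
import pandas as pd


def convert_numeric_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper approximating convert_objects(convert_numeric=True).

    Each column is passed to pd.to_numeric with the default errors='raise';
    any column that raises is kept exactly as it was.
    """
    out = df.copy()  # work on a copy so the caller's frame is not modified
    for col in out.columns:
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            # The column holds values that cannot be parsed as numbers; leave it.
            pass
    return out


df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
converted = convert_numeric_columns(df)
print(converted.dtypes)
```

Catching the exception inside the loop is the key difference from wrapping the whole apply() in try/except: one unparsable column no longer prevents the other columns from being converted.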

python pandas
Jan 17 '16 at 22:48

All columns are convertible

You can apply the function to all columns:

 df.apply(pd.to_numeric) 

Example:

    >>> df = pd.DataFrame({'a': ['1', '2'], 'b': ['45.8', '73.9'], 'c': [10.5, 3.7]})
    >>> df.info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 2 entries, 0 to 1
    Data columns (total 3 columns):
    a    2 non-null object
    b    2 non-null object
    c    2 non-null float64
    dtypes: float64(1), object(2)
    memory usage: 64.0+ bytes
    >>> df.apply(pd.to_numeric).info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 2 entries, 0 to 1
    Data columns (total 3 columns):
    a    2 non-null int64
    b    2 non-null float64
    c    2 non-null float64
    dtypes: float64(2), int64(1)
    memory usage: 64.0 bytes
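As a self-contained script version of the same idea: DataFrame.apply calls the function once per column (axis=0 is the default), so each column gets its own resulting dtype. This sketch only reuses the data from the session above; the dtype checks are deliberately loose because newer pandas versions may store text and numbers with different (e.g. nullable) dtypes.

```python
import pandas as pd

# 'a' and 'b' hold numeric strings, 'c' is already float.
df = pd.DataFrame({'a': ['1', '2'], 'b': ['45.8', '73.9'], 'c': [10.5, 3.7]})

# pd.to_numeric is called separately on each column, so 'a' can become an
# integer column while 'b' becomes a float column.
converted = df.apply(pd.to_numeric)
print(converted.dtypes)
```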

Not all columns are convertible

pd.to_numeric has a keyword argument errors:

    Signature: pd.to_numeric(arg, errors='raise')
    Docstring:
    Convert argument to a numeric type.

    Parameters
    ----------
    arg : list, tuple or array of objects, or Series
    errors : {'ignore', 'raise', 'coerce'}, default 'raise'
        - If 'raise', then invalid parsing will raise an exception
        - If 'coerce', then invalid parsing will be set as NaN
        - If 'ignore', then invalid parsing will return the input

Setting errors='ignore' returns the column unchanged if it cannot be converted to a numeric type.
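To see the difference between the remaining modes on a single Series, here is a small sketch using only 'raise' and 'coerce'. It leaves out 'ignore' on purpose: pandas 2.2 deprecated errors='ignore' in pd.to_numeric and recommends catching the exception explicitly instead, so on current versions the 'ignore' examples in this answer may emit a FutureWarning.

```python
import pandas as pd

s = pd.Series(['3', '5', 'Kobe'])

# errors='raise' (the default): the unparsable string 'Kobe' raises ValueError.
try:
    pd.to_numeric(s)
    raised = False
except ValueError:
    raised = True

# errors='coerce': unparsable entries become missing values instead of raising.
coerced = pd.to_numeric(s, errors='coerce')
print(raised)
print(coerced)
```

With 'coerce' the whole column is still converted, but any text is lost, which is why it is not a drop-in substitute for 'ignore' on a column like Words.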

As Anton Protopopov noted, the most elegant way is to pass errors='ignore' as a keyword argument to apply(), which forwards it to pd.to_numeric:

    >>> df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
    >>> df.apply(pd.to_numeric, errors='ignore').info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 2 entries, 0 to 1
    Data columns (total 2 columns):
    Words    2 non-null object
    ints     2 non-null int64
    dtypes: int64(1), object(1)
    memory usage: 48.0+ bytes

My previously proposed method, using partial from the functools module, is more verbose:

    >>> from functools import partial
    >>> df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
    >>> df.apply(partial(pd.to_numeric, errors='ignore')).info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 2 entries, 0 to 1
    Data columns (total 2 columns):
    Words    2 non-null object
    ints     2 non-null int64
    dtypes: int64(1), object(1)
    memory usage: 48.0+ bytes
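Both forms shown in this answer rely on the same mechanism: extra keyword arguments given to DataFrame.apply are passed through to the function, and partial simply fixes that argument in advance. As a sketch of that forwarding with a different option, here it is with errors='coerce' (chosen instead of 'ignore' only to avoid the deprecation in recent pandas); note that it changes the result, because the text column is turned into missing values rather than kept.

```python
from functools import partial

import pandas as pd

df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})

# Keyword form: apply forwards errors='coerce' to pd.to_numeric for each column.
via_kwarg = df.apply(pd.to_numeric, errors='coerce')

# partial form: the keyword is bound into the callable before apply sees it.
via_partial = df.apply(partial(pd.to_numeric, errors='coerce'))

print(via_kwarg)
```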
Jan 17 '16 at 23:05

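Apply pd.to_numeric with errors='ignore' via apply() and assign the result back to the DataFrame. apply() returns a new DataFrame rather than modifying df in place, so without the assignment the dtypes do not change: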

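    import pandas as pd

    df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
    print("Orig: \n", df.dtypes)

    # apply() returns a new DataFrame; without assignment df is unchanged
    df.apply(pd.to_numeric, errors='ignore')
    print("\nto_numeric: \n", df.dtypes)

    # Assigning the result back replaces df with the converted frame
    df = df.apply(pd.to_numeric, errors='ignore')
    print("\nto_numeric with assign: \n", df.dtypes)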

Output:

    Orig:
     ints     object
    Words    object
    dtype: object

    to_numeric:
     ints     object
    Words    object
    dtype: object

    to_numeric with assign:
     ints      int64
    Words    object
    dtype: object
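If only some columns are known to be numeric, assigning a single converted column back is another way to update df, and it leaves the other columns alone. This is a minimal sketch using the same data; it uses errors='coerce' in the discarded apply() call purely to show that the return value is a new object, and avoids 'ignore' because of its deprecation in pandas 2.2.

```python
import pandas as pd

df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})

# apply() returns a new DataFrame; discarding the return value leaves df as it was.
df.apply(pd.to_numeric, errors='coerce')
unchanged = df['ints'].tolist()

# Replacing one column converts only that column; 'Words' is not touched.
df['ints'] = pd.to_numeric(df['ints'])
print(unchanged)
print(df.dtypes)
```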
Jun 05 '19 at 10:31


