Check which columns in the DataFrame are categorical

Question

I'm new to Pandas. I want a simple and general way to find which columns in my DataFrame are categorical when I don't manually specify each column type, unlike in this SO question. df is created using:

    import pandas as pd
    df = pd.read_csv("test.csv", header=None)

e.g.

              0         1         2         3       4
    0  1.539240  0.423437 -0.687014   Chicago  Safari
    1  0.815336  0.913623  1.800160    Boston  Safari
    2  0.821214 -0.824839  0.483724  New York  Safari

UPDATE (2018/02/04): The question assumes that numeric columns are NOT categorical; @Zero's accepted answer solves this.

BE CAREFUL - As @Sagarkar notes, this is not always the case. The difficulty is that data types and categorical / ordinal / nominal types are orthogonal concepts, so mapping between them is not straightforward. @Jeff's answer below shows the precise way to do the mapping manually.
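To illustrate the caveat, here is a minimal sketch with made-up data (the column names `price` and `rating` are hypothetical, not from the question). An integer-coded column is numeric by dtype, so a dtype-based check cannot see that it is categorical until the column is converted explicitly.

```python
import pandas as pd

# Hypothetical data: "rating" holds integer codes that are really categories.
df = pd.DataFrame({
    "price": [1.5, 0.8, 2.1, 1.1],
    "rating": [1, 3, 2, 3],
})

# By dtype alone, both columns look numeric.
numeric_before = df.select_dtypes(include="number").columns.tolist()

# Declare the categorical meaning explicitly (the manual mapping).
df["rating"] = df["rating"].astype("category")

categorical_after = df.select_dtypes(include="category").columns.tolist()
numeric_after = df.select_dtypes(include="number").columns.tolist()

print(numeric_before)
print(categorical_after)
print(numeric_after)
```

Which columns deserve this conversion is a judgment about the data, not something pandas can infer from the dtype.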

+20

python pandas

pds Apr 22 '15 at 16:03

10 answers

One way I found is to upgrade to Pandas v0.16.0 and then exclude the numeric dtypes using:

 df.select_dtypes(exclude=["number","bool_","object_"])

This works provided that no types are changed and no more are added to NumPy. @Jeff's suggestion in the comments on the question was include=["category"], but that didn't seem to work.

NumPy Types: Link

+15

pds Apr 22 '15 at 16:12

For posterity: the canonical method for selecting columns by dtype is .select_dtypes. You can specify an actual numpy dtype, something convertible to one, or 'category', which is not a numpy dtype.

    # Assumes: import numpy as np; from pandas import DataFrame, Series
    In [1]: df = DataFrame({'A' : Series(range(3)).astype('category'), 'B' : range(3), 'C' : list('abc'), 'D' : np.random.randn(3) })

    In [2]: df
    Out[2]:
       A  B  C         D
    0  0  0  a  0.141296
    1  1  1  b  0.939059
    2  2  2  c -2.305019

    In [3]: df.select_dtypes(include=['category'])
    Out[3]:
       A
    0  0
    1  1
    2  2

    In [4]: df.select_dtypes(include=['object'])
    Out[4]:
       C
    0  a
    1  b
    2  c

    In [5]: df.select_dtypes(include=['object']).dtypes
    Out[5]:
    C    object
    dtype: object

    In [6]: df.select_dtypes(include=['category','int']).dtypes
    Out[6]:
    A    category
    B       int64
    dtype: object

    In [7]: df.select_dtypes(include=['category','int','float']).dtypes
    Out[7]:
    A    category
    B       int64
    D     float64
    dtype: object
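As a rough sketch of how these calls could be combined, the helper below groups column names by dtype family. The function name `columns_by_kind` is hypothetical, and the string column is given an explicit object dtype so the result does not depend on how a particular pandas version infers strings.

```python
import pandas as pd

# Same shape of data as the session above, with fixed values for column D.
df = pd.DataFrame({
    "A": pd.Series(range(3)).astype("category"),
    "B": range(3),
    "C": pd.Series(list("abc"), dtype=object),
    "D": [0.1, 0.9, -2.3],
})

def columns_by_kind(frame):
    """Group column names into category / number / other (hypothetical helper)."""
    categorical = frame.select_dtypes(include=["category"]).columns.tolist()
    numeric = frame.select_dtypes(include=["number"]).columns.tolist()
    # Whatever is neither category nor numeric (strings, datetimes, ...).
    other = [c for c in frame.columns if c not in categorical + numeric]
    return {"category": categorical, "number": numeric, "other": other}

kinds = columns_by_kind(df)
print(kinds)
```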

+9

Jeff Apr 23 '15 at 11:15

Use .dtypes

    In [10]: df.dtypes
    Out[10]:
    0    float64
    1    float64
    2    float64
    3     object
    4     object
    dtype: object
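Since df.dtypes is a Series indexed by column name, it can also be turned into lists of names. The following is a small sketch with made-up columns (`city`, `browser` and `score` are assumptions for illustration), using `pd.CategoricalDtype` and `pd.api.types.is_numeric_dtype` for the checks.

```python
import pandas as pd

df = pd.DataFrame({
    "city": pd.Series(["Chicago", "Boston", "New York"], dtype="category"),
    "browser": pd.Series(["Safari", "Safari", "Safari"], dtype=object),
    "score": [1.54, 0.82, 0.82],
})

# Columns whose dtype is pandas' categorical dtype.
category_cols = [
    col for col, dtype in df.dtypes.items()
    if isinstance(dtype, pd.CategoricalDtype)
]

# Columns that are not numeric by dtype (note: bool counts as numeric here).
non_numeric_cols = [
    col for col, dtype in df.dtypes.items()
    if not pd.api.types.is_numeric_dtype(dtype)
]

print(category_cols)
print(non_numeric_cols)
```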

+1

Liam Foley Apr 22 '15 at 16:11

    # Numeric variables
    numeric_var = [key for key in dict(df.dtypes) if dict(df.dtypes)[key] in ['float64', 'float32', 'int32', 'int64']]
    # Categorical variables
    cat_var = [key for key in dict(df.dtypes) if dict(df.dtypes)[key] in ['object']]

+1

Sudhir tiwari Jun 03 '18 at 22:00

You can get a list of categorical columns using this code:

 dfName.select_dtypes(exclude=['int', 'float']).columns

And, analogously, for numeric columns:

 dfName.select_dtypes(include=['int', 'float']).columns

Hope this helps.
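A variation on the same idea, as a sketch: the generic 'number' selector matches every NumPy numeric width, whereas what 'int' and 'float' match may depend on the pandas version. The column names below are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "small_int": np.array([1, 2, 3], dtype="int8"),
    "single": np.array([0.5, 1.5, 2.5], dtype="float32"),
    "double": [1.0, 2.0, 3.0],
    "label": pd.Series(["a", "b", "a"], dtype="category"),
})

# 'number' covers int8, float32 and float64 alike.
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# Everything else is treated as non-numeric (bool columns would land here too).
other_cols = df.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(other_cols)
```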

+1

Shikhar mar Aug 2 '18 at 18:30

This returns an array with the names of all object-dtype (typically string) columns in the data frame, which are treated here as the categorical variables.

 dataset.select_dtypes(include=['O']).columns.values

0

ankit2saxena Apr 15 '18 at 17:37

    # Import packages
    import numpy as np
    import pandas as pd

    # Data
    df = pd.DataFrame({"Country" : ["France", "Spain", "Germany", "Spain", "Germany", "France"],
                       "Age" : [34, 27, 30, 32, 42, 30],
                       "Purchased" : ["No", "Yes", "No", "No", "Yes", "Yes"]})
    df
    Out[1]:
       Country  Age Purchased
    0   France   34        No
    1    Spain   27       Yes
    2  Germany   30        No
    3    Spain   32        No
    4  Germany   42       Yes
    5   France   30       Yes

    # Checking data type
    df.dtypes
    Out[2]:
    Country      object
    Age           int64
    Purchased    object
    dtype: object

    # Saving CATEGORICAL variables
    # (np.object was removed from NumPy; np.object_ is the supported name,
    # and .iloc makes the positional lookup explicit)
    cat_col = [c for i, c in enumerate(df.columns) if df.dtypes.iloc[i] in [np.object_]]
    cat_col
    Out[3]: ['Country', 'Purchased']

0

Hamza chennaq Dec 9 '18 at 0:09

Use pandas.DataFrame.select_dtypes. Categorical dtypes can be selected with the 'category' flag. For strings, you can use the numpy object dtype.

Additional information: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.select_dtypes.html

Example:

    import pandas as pd

    df = pd.DataFrame({'Integer': [1, 2] * 3,
                       'Bool': [True, False] * 3,
                       'Float': [1.0, 2.0] * 3,
                       'String': ['Dog', 'Cat'] * 3})
    df
    Out[1]:
       Integer   Bool  Float String
    0        1   True    1.0    Dog
    1        2  False    2.0    Cat
    2        1   True    1.0    Dog
    3        2  False    2.0    Cat
    4        1   True    1.0    Dog
    5        2  False    2.0    Cat

    df.select_dtypes(include=['category', object]).columns
    Out[2]: Index(['String'], dtype='object')

0

dcrystal Dec 17 '18 at 13:29

    # select categorical column names
    cat_features = [i for i in df.columns if df.dtypes[i] == 'object']

0

Gucci148 Apr 16 '19 at 17:49

Zero · Accepted Answer · 2015-04-22T16:11:59+0000

You can use df._get_numeric_data() to get the numeric columns and then derive the categorical columns from the rest:

    In [66]: cols = df.columns

    In [67]: num_cols = df._get_numeric_data().columns

    In [68]: num_cols
    Out[68]: Index([u'0', u'1', u'2'], dtype='object')

    In [69]: list(set(cols) - set(num_cols))
    Out[69]: ['3', '4']
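Because _get_numeric_data() is a private method, a similar result can be sketched with public API only. This is one possible equivalent, not the accepted answer's own code: it uses select_dtypes(include="number") together with Index.difference, and inlines the question's sample rows so that it runs without test.csv. The two approaches may differ on boolean columns.

```python
import io
import pandas as pd

# The question's sample rows, inlined in place of test.csv.
csv_text = (
    "1.539240,0.423437,-0.687014,Chicago,Safari\n"
    "0.815336,0.913623,1.800160,Boston,Safari\n"
    "0.821214,-0.824839,0.483724,New York,Safari\n"
)
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Numeric columns via the public API.
num_cols = df.select_dtypes(include="number").columns

# Everything that is not numeric is treated as categorical.
cat_cols = df.columns.difference(num_cols).tolist()

print(num_cols.tolist())
print(cat_cols)
```

Unlike the set difference in the accepted answer, Index.difference returns the remaining labels in sorted order.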
