In a large dataset, determine which variables are categorical and which are numeric

Question

I have a dataset with 65 variables, and I want to get separate lists of the numeric variables and the categorical variables.

What could I use for this task?

+4

r

user3619169 May 28 '14 at 9:12

2 answers

You can do this (assuming your data.frame is named df):

sapply(df, class)

The output is less tidy when there is a date-time variable, because such a column has more than one class:

library(lubridate)
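# stringsAsFactors = TRUE keeps V1 a factor on R >= 4.0;
# tz = "UTC" makes ymd() return POSIXct rather than Date
df <- data.frame(V1 = character(10),
                 V2 = numeric(10),
                 V3 = ymd(paste("2014-05", 21:30, sep="-"), tz = "UTC"),
                 stringsAsFactors = TRUE)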
sapply(df, class)
##$V1
##[1] "factor"
##
##$V2
##[1] "numeric"
##
##$V3
##[1] "POSIXct" "POSIXt"

If you want only the names of the variables of a given class, you can do:

names(df)[sapply(df, class) == "factor"]
##[1] "V1"

# for a time variable it is less obvious...
names(df)[grepl("POSIXct", sapply(df, class))]
##[1] "V3"
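
A possibly more robust variant is to test each column with a type predicate instead of comparing class strings, since a predicate returns a single logical value per column even when the column has several classes. This is only a minimal sketch: the data frame df_demo and its column names are made up for illustration and are not from the question, and treating character columns as categorical is an assumption.

```r
# Hypothetical example data; column names are illustrative only.
df_demo <- data.frame(
  group = factor(c("a", "b", "a")),
  label = c("x", "y", "z"),
  value = c(1.5, 2.5, 3.5),
  count = c(1L, 2L, 3L),
  when  = as.POSIXct(c("2014-05-21", "2014-05-22", "2014-05-23"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# vapply() guarantees exactly one logical value per column.
is_num <- vapply(df_demo, is.numeric, logical(1))
# Assumption: factors and character columns both count as categorical.
is_cat <- vapply(df_demo, function(x) is.factor(x) || is.character(x), logical(1))

numeric_vars     <- names(df_demo)[is_num]
categorical_vars <- names(df_demo)[is_cat]
# Everything else (here the date-time column) is left for manual review.
other_vars       <- names(df_demo)[!is_num & !is_cat]
```

Keeping a separate other_vars group makes it easy to spot columns, such as dates, that fit neither category.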

+5

Victorp May 28 '14 at 9:14

James · Accepted Answer · 2014-05-28T09:23:43+0000

You can use split with sapply to group the variables by class:

split(names(iris),sapply(iris, function(x) paste(class(x), collapse=" ")))
$factor
[1] "Species"

$numeric
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"

Note the use of paste to collapse the class names into a single string for variables that have multiple classes.
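
To see why the collapse matters, here is a small sketch with a date-time column, which carries two classes. The data frame df_time is a made-up example and not part of the original answer.

```r
# Hypothetical example data with one multi-class (date-time) column.
df_time <- data.frame(
  id   = c(1, 2, 3),
  kind = factor(c("a", "b", "a")),
  when = as.POSIXct(c("2014-05-21", "2014-05-22", "2014-05-23"), tz = "UTC")
)

# One string per column; the date-time column becomes "POSIXct POSIXt".
class_key <- vapply(df_time,
                    function(x) paste(class(x), collapse = " "),
                    character(1))

# Named list with one element per distinct class string.
groups <- split(names(df_time), class_key)
```

Without the collapse, class() would return two strings for the when column, so there would no longer be exactly one grouping key per variable.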
