How to determine whether columns hold quantitative or categorical data?

If I have a file with many columns and all the data are numbers, how can I tell whether a particular column is categorical or quantitative? Is there an area of study for this problem? If not, what heuristics can be used to decide?

Some heuristics I can think of:

More likely to be categorical data:

  • count the unique values; if the count is < some_threshold, the column is more likely to be categorical
  • if the data is highly concentrated (low spread)
  • if the unique values are consecutive and start at 1
  • if all the values in the column have a fixed length (maybe an ID or a date)
  • if it has a very small p-value in a test against Benford's law
  • if it has a very small p-value in a chi-square test against the outcome column
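
A few of the heuristics above can be sketched in R. This is only an illustration: the helper name `categorical_hints` and the default threshold of 10 are assumptions made for the example, not established conventions, and the flags are hints rather than a decision rule.

```r
# Hypothetical helper: returns logical flags hinting that a numeric
# column might be categorical. The threshold is an arbitrary assumption.
categorical_hints <- function(x, max_unique = 10) {
  x <- x[!is.na(x)]          # ignore missing values
  u <- sort(unique(x))       # distinct values in ascending order
  # Render each number without padding or scientific notation
  txt <- format(x, trim = TRUE, scientific = FALSE)
  c(
    # fewer distinct values than the chosen threshold
    few_unique = length(u) < max_unique,
    # distinct values are exactly 1, 2, 3, ...
    consecutive_from_one = length(u) > 0 && all(u == seq_along(u)),
    # every value is printed with the same number of characters
    fixed_width = length(unique(nchar(txt))) == 1
  )
}

categorical_hints(c(1, 2, 3, 1, 2, 3, 3))
```

A column that raises several flags at once is a stronger candidate for a categorical variable than one that raises a single flag.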

More likely to be quantitative data:

  • if the column contains floating-point numbers
  • if the column has sparse values
  • if the column contains negative values
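
Two of these checks are easy to express in R. The sketch below uses a hypothetical helper, `quantitative_hints`, and leaves out the "sparse" heuristic because the question does not define it precisely.

```r
# Hypothetical helper: returns logical flags hinting that a numeric
# column might be quantitative.
quantitative_hints <- function(x) {
  x <- x[!is.na(x)]                       # ignore missing values
  c(
    has_fraction = any(x != floor(x)),    # at least one non-integer value
    has_negative = any(x < 0)             # at least one negative value
  )
}

quantitative_hints(c(1.5, 2, -3))
```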

Other

  • Perhaps quantitative columns tend to sit next to other quantitative columns (and vice versa)

I use R, but the question does not have to be specific to R.

1 answer

This assumes that someone has coded the data correctly.

my.data <- read.table(text = '
    aa     bb      cc     dd
    10    100    1000      1
    20    200    2000      2
    30    300    3000      3
    40    400    4000      4
    50    500    5000      5
    60    600    6000      6
', header = TRUE, colClasses = c('numeric', 'character', 'numeric', 'character'))

my.data

# one way
str(my.data)

'data.frame':   6 obs. of  4 variables:
 $ aa: num  10 20 30 40 50 60
 $ bb: chr  "100" "200" "300" "400" ...
 $ cc: num  1000 2000 3000 4000 5000 6000
 $ dd: chr  "1" "2" "3" "4" ...

Another way, with a for-loop:

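my.class <- rep('empty', ncol(my.data))

# seq_len() is safe even when the data frame has no columns
for(i in seq_len(ncol(my.data))) {
    # take the first class in case a column has several (e.g. ordered factors)
    my.class[i] <- class(my.data[[i]])[1]
}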

> my.class
[1] "numeric"   "character" "numeric"   "character"

The same with sapply, without the for-loop:

my.class <- sapply(my.data, class)
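
If the declared classes can be trusted, they can be mapped to the two kinds of data directly. The following is a minimal sketch under that assumption; the labels "quantitative", "categorical" and "other", the variable name `col.kind`, and the small example data frame are all chosen for illustration.

```r
# Small example data frame: one numeric and one character column
example.data <- data.frame(
  aa = c(10, 20, 30),
  bb = c("100", "200", "300"),
  stringsAsFactors = FALSE
)

# Label each column from its declared type
col.kind <- sapply(example.data, function(col) {
  if (is.numeric(col)) {
    "quantitative"
  } else if (is.character(col) || is.factor(col)) {
    "categorical"
  } else {
    "other"
  }
})

col.kind
```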

Source: https://habr.com/ru/post/1527165/

