Read.xlsx and colClasses

Does anyone know why the colClasses argument does not work in read.xlsx?

I create a sample .xlsx file:

    > library(xlsx)
    > d1 = data.frame(A=LETTERS[1:3], B=letters[1:3], C=1:3, D=c(1.1, NA, NA))
    > str(d1)
    'data.frame': 3 obs. of 4 variables:
     $ A: Factor w/ 3 levels "A","B","C": 1 2 3
     $ B: Factor w/ 3 levels "a","b","c": 1 2 3
     $ C: int 1 2 3
     $ D: num 1.1 NA NA
    > write.xlsx(d1, 'test.xlsx', sheetName='Sheet1', row.names=F, showNA=F)

Then I try reading it with read.xlsx, without and with the colClasses argument:

    > d2 = read.xlsx('test.xlsx', sheetName='Sheet1')
    > str(d2)
    'data.frame': 3 obs. of 4 variables:
     $ A: Factor w/ 3 levels "A","B","C": 1 2 3
     $ B: Factor w/ 3 levels "a","b","c": 1 2 3
     $ C: num 1 2 3
     $ D: num 1.1 NA NA
    > d2 = read.xlsx('test.xlsx', sheetName='Sheet1', colClasses=c(B='character', 'A'='character'))
    > str(d2)
    'data.frame': 3 obs. of 4 variables:
     $ A: Factor w/ 3 levels "A","B","C": 1 2 3
     $ B: Factor w/ 3 levels "a","b","c": 1 2 3
     $ C: num 1 2 3
     $ D: num 1.1 NA NA

The colClasses argument seems to have no effect. Any ideas?

Thank you for your help.

Alexei

PS I have R 3.0.1, xlsx 0.5.1

1 answer

colClasses= does work. The problem is that, by default on your system, character columns are converted to factors when the data is imported.

If you import test.xlsx and set all the columns to "character", you will see that every column comes back as a factor, including the numeric ones.

    d2 = read.xlsx('test.xlsx', sheetName='Sheet1', colClasses=rep("character",4))
    str(d2)
    'data.frame': 3 obs. of 4 variables:
     $ A: Factor w/ 3 levels "A","B","C": 1 2 3
     $ B: Factor w/ 3 levels "a","b","c": 1 2 3
     $ C: Factor w/ 3 levels "1","2","3": 1 2 3
     $ D: Factor w/ 1 level "1.1": 1 NA NA
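The same conversion can be reproduced in base R without an Excel file. This is only a sketch of the mechanism: it assumes read.xlsx builds its result with data.frame()-style handling of strings, which is an assumption about the package internals rather than something shown above. stringsAsFactors is passed explicitly because its default in data.frame() changed from TRUE to FALSE in R 4.0.0.

```r
# Character vectors standing in for cells that were read as text
# (hypothetical data, mirroring columns A and C of test.xlsx)
a <- c("A", "B", "C")
cc <- c("1", "2", "3")

# stringsAsFactors = TRUE: every character column becomes a factor
as_factor <- data.frame(A = a, C = cc, stringsAsFactors = TRUE)

# stringsAsFactors = FALSE: the columns stay character
as_char <- data.frame(A = a, C = cc, stringsAsFactors = FALSE)

str(as_factor)
str(as_char)
```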

To ensure that characters are not converted to factors, you can add the stringsAsFactors=FALSE argument to the read.xlsx() function.

    d2 = read.xlsx('test.xlsx', sheetName='Sheet1', colClasses=c(B='character', A='character'), stringsAsFactors=FALSE)
    str(d2)
    'data.frame': 3 obs. of 4 variables:
     $ A: chr "A" "B" "C"
     $ B: chr "a" "b" "c"
     $ C: num 1 2 3
     $ D: num 1.1 NA NA
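If the data frame has already been read with factors, the columns can also be converted afterwards. The following is a base-R sketch of that alternative, not part of the answer above; the data frame is built by hand here as a stand-in for the result of read.xlsx.

```r
# Stand-in for a data frame whose text columns came back as factors
d2 <- data.frame(A = c("A", "B", "C"), B = c("a", "b", "c"), C = 1:3,
                 stringsAsFactors = TRUE)

# Find the factor columns and convert only those to character;
# numeric columns are left untouched
is_fac <- vapply(d2, is.factor, logical(1))
d2[is_fac] <- lapply(d2[is_fac], as.character)

str(d2)
```

Converting through as.character() keeps the original labels. Note that calling as.numeric() directly on a factor would return the internal level codes rather than the labels.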

Source: https://habr.com/ru/post/951904/

