I found what I consider to be a bug in the way Microsoft R handles metadata from SPSS .sav files.
Here is a summary of the Variable View:
ColumnA: 1 - Yes, 2 - No
ColumnB: 0.33 - Yes, 0.5 - Maybe, 0.66 - No, 0.99 - Why not, 1.00 - Yes, for sure.
ColumnC: A - Yes, B - No
My code is:
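library(RevoScaleR)
# Define the SPSS data source, keeping value labels as metadata
df <- RxSpssData(
  "RoundingTest.sav",
  stringsAsFactors = FALSE,
  labelsAsInfo = TRUE,
  labelsAsLevels = TRUE,
  mapMissingCodes = "none"
)
# Import the data source into a data frame
test <- rxImport(df)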
The data itself is read in correctly:
  ColumnA ColumnB ColumnC Var0001
1     Yes    0.33     Yes      NA
2      No    0.50     Yes      NA
3     Yes    0.66      No      NA
However, the .rxValueInfoCodes values do not match the codes defined in SPSS:
attr(test$ColumnA, ".rxValueInfoCodes")
attr(test$ColumnB, ".rxValueInfoCodes")
attr(test$ColumnC, ".rxValueInfoCodes")
It seems that some kind of floor function is applied to the value codes of numeric columns before they are converted to character strings.
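To illustrate what I suspect is happening, here is a minimal base R sketch. It does not call RevoScaleR at all; it only assumes that the ColumnB codes are floored before being turned into strings, which is my guess and not confirmed behavior of RxSpssData. The variable names are made up for the example.

```r
# Value codes defined for ColumnB in the SPSS Variable View
codes <- c(0.33, 0.5, 0.66, 0.99, 1.00)

# What I would expect: codes converted to strings as they are
exact <- as.character(codes)

# Suspected behavior (an assumption): floor first, then convert
truncated <- as.character(floor(codes))

print(exact)
print(truncated)

# Under this assumption the first four codes collapse to one value,
# so they can no longer be told apart
print(anyDuplicated(truncated) > 0)
```

If the metadata really goes through a step like this, the codes below 1 would all end up identical, which would explain why they stop matching the Variable View.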
I tried options(scipen = 12) and rxOptions(numDigits = 12), with no success. Using rxDataStep instead of rxImport does not help either. I believe the bug is somewhere in the RxSpssData() function.
- Has anyone experienced this with RxSpssData or any other type of file?
- Is there a workaround?
- Is there an official way to report this to Microsoft if this is a genuine error?
Thanks!
Session info:
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
EDIT: I have uploaded the SAV file to GitHub.