sas7bdat worked fine for all but one of the files I was looking at (specifically, this one). When I reported the error to the sas7bdat developer, Matthew Shotwell, he pointed me towards Hadley Wickham's haven package for R, which also has a read_sas function.
This function is superior for two reasons:
1) It had no problems reading the linked file.
2) It is much (and I do mean much) faster than read.sas7bdat.
Here's a quick benchmark (on this file, which is smaller than the others) to demonstrate:
    library(sas7bdat)        # read.sas7bdat
    library(haven)           # read_sas
    library(microbenchmark)

    microbenchmark(times = 10L,
                   read.sas7bdat("psu97ai.sas7bdat"),
                   read_sas("psu97ai.sas7bdat"))
    Unit: milliseconds
                                  expr        min         lq       mean     median         uq        max neval cld
     read.sas7bdat("psu97ai.sas7bdat") 66696.2955 67587.7061 71939.7025 68331.9600 77225.1979 82836.8152    10   b
          read_sas("psu97ai.sas7bdat")   397.9955   402.2627   410.4015   408.5038   418.1059   425.2762    10   a
That's right: haven::read_sas takes (on average) roughly 99.4% less time than sas7bdat::read.sas7bdat, i.e. it is about 175 times faster.
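The percentage can be checked directly from the mean column of the benchmark output. This is just the arithmetic, with the two mean timings copied in by hand; the variable names are mine.

```r
# Mean timings in milliseconds, copied from the microbenchmark output
mean_sas7bdat <- 71939.7025
mean_haven    <- 410.4015

# Share of time saved by read_sas relative to read.sas7bdat
reduction <- 1 - mean_haven / mean_sas7bdat
round(100 * reduction, 1)          # 99.4

# The same comparison expressed as a speed-up factor
round(mean_sas7bdat / mean_haven)  # 175
```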
minor update
I previously hadn't been able to check whether the two functions return the same data (i.e., whether both read the file with equal fidelity), but I have finally done so:
    library(data.table)  # for setDT

    # Keep as data.tables
    sas7bdat <- setDT(read.sas7bdat("psu97ai.sas7bdat"))
    haven <- setDT(read_sas("psu97ai.sas7bdat"))
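One way such a comparison could be done is sketched below. This is not necessarily the check I ran originally: same_values is a hypothetical helper written for illustration (it is not part of sas7bdat or haven), and it assumes both readers return columns in the same order. It strips attributes before comparing, so that labels and formats attached by either reader do not cause spurious differences.

```r
# Hypothetical helper, not part of sas7bdat or haven.
# Returns TRUE when two data frames hold the same values column by column,
# ignoring attributes (labels, formats, classes).
same_values <- function(x, y, tolerance = 1e-8) {
  x <- as.data.frame(x)
  y <- as.data.frame(y)
  if (!identical(dim(x), dim(y))) return(FALSE)

  strip <- function(v) {
    if (is.factor(v)) v <- as.character(v)  # compare factors as text
    attributes(v) <- NULL                   # drop labels, formats, classes
    v
  }

  # Compare by position, so differing column-name case does not matter
  col_equal <- vapply(seq_along(x), function(i) {
    isTRUE(all.equal(strip(x[[i]]), strip(y[[i]]), tolerance = tolerance))
  }, logical(1))
  all(col_equal)
}

# Toy check with made-up data: same values, one column carries a label
a <- data.frame(id = c("A1", "B2"), n = c(1.5, 2))
b <- a
attr(b$id, "label") <- "UNIQUE ID"
same_values(a, b)
```

With the two objects read above, the call would be same_values(sas7bdat, haven). Note that stripping attributes also strips date classes, so date columns are compared as raw numbers.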
Note, however, that read.sas7bdat retains a massive list of attributes for the file, presumably a holdover from SAS:
    str(sas7bdat)
    # ...
    # - attr(*, "column.info")=List of 70
    #  ..$ :List of 12
    #  .. ..$ name  : chr "NCESSCH"
    #  .. ..$ offset: int 200
    #  .. ..$ length: int 12
    #  .. ..$ type  : chr "character"
    #  .. ..$ format: chr "$"
    #  .. ..$ fhdr  : int 0
    #  .. ..$ foff  : int 76
    #  .. ..$ flen  : int 1
    #  .. ..$ label : chr "UNIQUE SCHOOL ID (NCES ASSIGNED)"
    #  .. ..$ lhdr  : int 0
    #  .. ..$ loff  : int 44
    #  .. ..$ llen  : int 32
    # ...
So, if you really need these attributes (I know some people are especially interested in the labels, for example), read.sas7bdat may be the option for you after all.
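If the labels are what you are after, they can be pulled out of that attribute as a named vector. The following is a minimal sketch that assumes only the structure shown in the str() output (a "column.info" list whose elements have name and label entries); get_labels is a hypothetical helper, and the toy object is made up so the snippet runs without the SAS file.

```r
# Hypothetical helper: collect name -> label pairs from "column.info"
get_labels <- function(df) {
  info <- attr(df, "column.info")
  if (is.null(info)) return(character(0))
  labels <- vapply(info, function(col) {
    # Columns without a usable label get NA
    if (length(col$label) == 1) as.character(col$label) else NA_character_
  }, character(1))
  names(labels) <- vapply(info, function(col) as.character(col$name),
                          character(1))
  labels
}

# Toy object mimicking the printed structure (not read from a file)
toy <- data.frame(NCESSCH = "010000200277")
attr(toy, "column.info") <- list(
  list(name = "NCESSCH", type = "character",
       label = "UNIQUE SCHOOL ID (NCES ASSIGNED)")
)
get_labels(toy)
```

On the real object this would be get_labels(sas7bdat).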
MichaelChirico May 05 '15 at 2:20 AM