Reading an SPSS data file into R

At my company, we are thinking about phasing out SPSS in favor of R. During the transition, however, we will still have data arriving in SPSS data file format (.sav).

I am having problems importing these SPSS data files into R. When I import an SPSS file into R, I want to keep both the values and the value labels of the variables. The read.spss() function from the foreign package lets me keep either the values or the value labels of a variable, but not both.

AFAIK, R allows factor variables to have values (levels) and value labels (level labels). I'm just wondering whether it is possible to modify the read.spss() function somehow to enable this.
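One caveat on the factor idea: factor() accepts separate levels and labels arguments, but the resulting factor stores only the labels, so the original codes are not kept. A minimal sketch of keeping both in plain R is below. The helper codes_to_factor() is hypothetical, and it assumes the value labels arrive as a named numeric vector (names = label text, values = SPSS codes); it does not modify read.spss().

```r
# Sketch only: codes_to_factor() is a hypothetical helper, not part of foreign.
# Assumes value_labels is a named numeric vector:
# names = label text, values = SPSS codes, e.g. c(Male = 1, Female = 2).
codes_to_factor <- function(x, value_labels) {
  ord <- order(value_labels)               # order levels by numeric code
  codes <- unname(value_labels[ord])
  labs <- names(value_labels)[ord]
  f <- factor(x, levels = codes, labels = labs)
  # factor() stores only the labels, so keep the label-to-code map as well
  attr(f, "codes") <- setNames(codes, labs)
  f
}

sex <- codes_to_factor(c(1, 2, 2, 1), c(Male = 1, Female = 2))
levels(sex)                                      # "Male" "Female"
# Recover the original SPSS codes from the labels
unname(attr(sex, "codes")[as.character(sex)])    # 1 2 2 1
```

Custom attributes such as "codes" may be dropped by operations like subsetting, so for anything long-lived it is safer to keep the code/label lookup in a separate object.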

As an alternative, I came across the spss.system.file() function from the memisc package, which supposedly supports this, but it requires a separate syntax file (code.file), which is not always available to me.

Here is a sample data file.

I would be grateful for any help in solving this problem.

Thanks.

r spss
Jan 30 '13 at 14:51
4 answers

You can read SPSS data files into R through an ODBC driver.

1) IBM provides the IBM SPSS Statistics Data File Driver. I could not find a download link; I got it from my SPSS vendor. The standalone driver is all you need: SPSS does not have to be installed to install or use it.

2) Create a DSN for the SPSS data driver.

3) Using the RODBC package, you can then read any SPSS data file into R. The value labels for each variable come back as separate tables, which you can use in R however you wish.

Here is a working example on Windows (SPSS is not installed on this computer) that reads the example data file into R. I have not tested it on Linux, but it will probably work there too, since the SPSS data driver is also available for Linux.

    require(RODBC)
    # Create connection
    # Change the DSN name and CP_CONNECT_STRING according to your setting
    con <- odbcDriverConnect("DSN=spss_ehsis;SDSN=SAVDB;HST=C:\\Program Files\\IBM\\SPSS\\StatisticsDataFileDriver\\20\\Standalone\\cfg\\oadm.ini;PRT=StatisticsSAVDriverStandalone;CP_CONNECT_STRING=C:\\temp\\data_expt.sav")
    # List of tables
    Tables <- sqlTables(con)
    Tables
    # List of table names to extract
    table.names <- Tables$TABLE_NAME[Tables$TABLE_SCHEM != "SYSTEM"]
    # Function to query a table by name
    sqlQuery.tab.name <- function(table) {
      sqlQuery(con, paste0("SELECT * FROM [", table, "]"))
    }
    # Retrieve all tables
    Data <- lapply(table.names, sqlQuery.tab.name)
    # See the data
    lapply(Data, head)
    # Close connection
    close(con)

For example, these are the value labels for two of the variables:

    [[5]]
      VAR00002 VAR00002_label
    1        1           Male
    2        2         Female

    [[6]]
      VAR00003 VAR00003_label
    1        2        Student
    2        3       Employed
    3        4     Unemployed
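To illustrate how such a label table can be used, here is a small sketch with made-up data frames standing in for the RODBC results: cases is a hypothetical stand-in for the [Cases] table and vl2 for the VAR00002 label table shown above. A left join keeps the code and the label side by side, which is what the question asks for; the factor() call is the same conversion that get.spss() performs.

```r
# Hypothetical stand-ins for the RODBC results (not real query output):
cases <- data.frame(id = 1:4, VAR00002 = c(1, 2, 2, 1))
vl2 <- data.frame(VAR00002 = c(1, 2),
                  VAR00002_label = c("Male", "Female"),
                  stringsAsFactors = FALSE)

# Option 1: keep both the code and the label as separate columns (left join).
both <- merge(cases, vl2, by = "VAR00002", all.x = TRUE, sort = FALSE)
both <- both[order(both$id), ]   # merge() does not preserve the row order
rownames(both) <- NULL

# Option 2: replace the codes with a labelled factor.
cases$VAR00002_f <- factor(cases$VAR00002,
                           levels = vl2$VAR00002,
                           labels = vl2$VAR00002_label)
```

Option 1 never loses the numeric codes; option 2 is more convenient for modelling and tables but keeps only the labels.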

Additional Information

Here is a function that reads SPSS data once you are connected to the SPSS data file. It lets you select a subset of variables. If value.labels = TRUE, the selected variables that have value labels in the SPSS data file are converted to R factors with those labels attached.

I have to say that I am not happy with the performance of this solution. It works well for small data files, but the RAM limit is reached quite often with large SPSS data files (even when only a subset of variables is selected).

    get.spss <- function(channel, variables = NULL, value.labels = FALSE) {
      # Names of all variables in the SPSS data file
      VarNames <- sqlQuery(channel = channel,
                           query = "SELECT VarName FROM [Variables]",
                           as.is = TRUE)$VarName
      if (is.null(variables)) {
        variables <- VarNames
      } else {
        if (any(!variables %in% VarNames)) stop("Wrong variable names")
      }
      if (value.labels) {
        # Selected variables that have a value-label table
        ValueLabelTableName <- sqlQuery(channel = channel,
                                        query = "SELECT VarName FROM [Variables] WHERE ValueLabelTableName is not null",
                                        as.is = TRUE)$VarName
        ValueLabelTableName <- intersect(variables, ValueLabelTableName)
      }
      variables <- paste(variables, collapse = ", ")
      data <- sqlQuery(channel = channel,
                       query = paste("SELECT", variables, "FROM [Cases]"),
                       as.is = TRUE)
      if (value.labels) {
        # Replace codes with labelled factors, one label table per variable
        for (var in ValueLabelTableName) {
          VL <- sqlQuery(channel = channel,
                         query = paste0("SELECT * FROM [VLVAR", var, "]"),
                         as.is = TRUE)
          data[, var] <- factor(data[, var], levels = VL[, 1], labels = VL[, 2])
        }
      }
      return(data)
    }
Feb 26 '13 at 15:19

I don't know how to read SPSS metadata directly; I usually read .csv files and add the metadata myself, or write a small one-off Perl script for the job. However, the recently released Rz package may help you bring SPSS data into R. I had a quick look at it and it seems useful.

Jan 30 '13 at 17:15

My work goes through the same transition.

read.spss() returns the variable labels as an attribute of the object it creates. In the example below, I have a data frame called rvm that was created by read.spss() with to.data.frame = TRUE. It has 3,500 variables with short names (a1, a2, etc.), each with a long label in SPSS. I can access the variable labels with

    cbind(attributes(rvm)$variable.labels)

which returns all 3,500 variable names with their labels, ending with

    …
    x23 "Other Expenditure Uncapped Daily Expenditure In Region"
    x24 "Accommodation Expenditure In Region"
    x25 "Food/Meals/Drink Expenditure In Region"
    x26 "Local Transport Expenditure In Region"
    x27 "Sightseeing/Attractions Expenditure In Region"
    x28 "Event/Conference Expenditure In Region"
    x29 "Gambling/Casino Expenditure In Region"
    x30 "Gifts/Souvenirs Expenditure In Region"
    x31 "Other Shopping Expenditure In Region"
    x0  "Accommodation Daily Expenditure In Region"

What to do with them is another matter, but at least I have them, and I can put them into another object for storage, search them with grep, etc.
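A short sketch of the "store and grep" idea, using a few of the labels above. In real use var_labels would be attributes(rvm)$variable.labels; here it is a hand-made stand-in, and the sketch assumes that attribute is a named character vector (names = variable names, values = labels), as the cbind() output suggests.

```r
# Hypothetical stand-in for attributes(rvm)$variable.labels
var_labels <- c(x23 = "Other Expenditure Uncapped Daily Expenditure In Region",
                x24 = "Accommodation Expenditure In Region",
                x25 = "Food/Meals/Drink Expenditure In Region",
                x0  = "Accommodation Daily Expenditure In Region")

# Keep the labels in a plain data frame for storage or export (e.g. to CSV)
label_table <- data.frame(variable = names(var_labels),
                          label = unname(var_labels),
                          stringsAsFactors = FALSE)

# Find every variable whose label mentions "accommodation" (case-insensitive)
hits <- var_labels[grepl("accommodation", var_labels, ignore.case = TRUE)]
names(hits)   # "x24" "x0"
```

Subsetting with grepl() keeps the variable names attached, so the matching short names can be used directly to select columns, e.g. rvm[, names(hits)].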

Jan 30 '13 at 20:29

Since you have SPSS, I recommend installing the "Essentials for R" plugin (free, but you need to register; see also the installation instructions), which lets you run R from within SPSS. The plugin includes an R package with functions that transfer the active SPSS dataset to R (and back), including labelled factor levels, dates, and German umlauts, details that are notoriously tricky. In my experience, it is more reliable than R's own foreign package.

Once everything is set up, open the data in SPSS and run something like the following in a syntax window:

    begin program r.
    myDf <- spssdata.GetDataFromSPSS(missingValueToNA=TRUE, factorMode="labels", rDate="POSIXct")
    save(myDf, file="d:/path/to/your/myDf.Rdata")
    end program.

Link to the Essentials for R plugin (it apparently breaks the link markup syntax, so here it is in full):

 https://www.ibm.com/developerworks/mydeveloperworks/wikis/home/wiki/We70df3195ec8_4f95_9773_42e448fa9029/page/Downloads%20for%20IBM®%20SPSS®%20Statistics?lang=en 
Jan 30 '13 at 19:41