Reading specific columns of a delimited file in SAS

It seems like this should be simple, but I cannot find how to do this in the documentation. I want to read in a comma-delimited file, but it's very wide, and I just want to read a few columns.

I thought I could do this, but the @ pointer seems to point to columns of text, not the column numbers given by the delimiter:

 data tmp; infile 'results.csv' delimiter=',' MISSOVER DSD lrecl=32767 firstobs=2; @1 id @5 name$ run; 

In this example, I want to read only what is in the 1st and 5th columns based on the separator, but SAS reads what is in the 1st and 5th positions of the text file. So, if the first line of the input file starts like this

1234567, "x", "y", "asdf", "bubba", ... more variables ...

I want id=1234567 and name=bubba , but I get name=567, " .

I understand that I could read in each column and drop those that I don't need, but there should be a better way.

+4
source share
3 answers

Indeed, @ points to a column of text, not a separator column. The only method using standard input that I ever found was to read in an empty state, i.e.

 input id blank $ blank $ blank $ name $ ; 

and then leave it blank.

However, there is a better solution if you do not mind writing your input in different ways.

 data tmp; infile datalines; input @; id = scan(_INFILE_,1,','); name = scan(_INFILE_,5,','); put _all_; datalines; 12345,x,y,z,Joe 12346,x,y,z,Bob ;;;; run; 

This makes formatting a little more messy, since you need to enter or enter statements for each variable that you don't want in the base character format, but it can be simpler depending on your needs.

+6
source

You can skip fields quite efficiently, if you know a little syntax of the INPUT operator, pay attention to the use of (3 * dummy) (: $ 1.). Reading just one byte should also slightly improve performance.

 data tmp; infile cards DSD firstobs=2; input id $ (3*dummy)(:$1.) name $; drop dummy; cards; id,x,y,z,name 1234567, "x", "y", "asdf", "bubba", ... more variables 1234567, "x", "y", "asdf", "bubba", ... more variables run; proc print; run; 
+1
source

Another option that I thought about when I answered a question related to it from another user.

 filename tempfile temp; data _null_; set sashelp.cars; file tempfile dlm=',' dsd lrecl=32767; put (Make--Wheelbase) ($); run; data mydata; infile tempfile dlm=',' dsd truncover lrecl=32767; length _tempvars1-_tempvars100 $32; array _tempvars[100] $; input (_tempvars[*]) ($); make=_tempvars[1]; type=_tempvars[3]; MSRP=input(_tempvars[6],dollar8.); keep make type msrp; run; 

Here we use an array of effectively temporary (cannot actually be temporary, unfortunately) variables, and then we extract exactly what we want to specify columns. This is probably an excess for a small file - just read all the variables and process it, but for 100 or 200 variables, where you only want 15, 18 and 25, it might be easier if you know which column you want exactly. (I could use this when working with census data, for example, if you have it in a CSV form. Very often, just a few columns are required, most of which are on 100 or 200 columns from the starting column.)

You have to take care of your length for the temporary array (it should be until you need your long column!), And you must be sure that you will not ruin the columns, as you will not know if you will ruin it will be apparent from the data.

0
source

Source: https://habr.com/ru/post/1435280/


All Articles