Reading data in SAS, columns are not aligned

I have a data file that looks like this:

001 Mayo Clinic  120 78 7 15 
Patient has had a persistent cough for 3 weeks
023 Mayo Clinic  157 72 10 2 
Patient complained of ear ache
064 HMC  201 59 . . 
Patient left against medical advice
003 HMC  166 58 8 15 
Patient placed on beta-blockers on 7/1/2006

I find the task of reading this in SAS basically impossible. And no, in this case, reformatting the data file is out of the question. So let me explain what you are looking at here:

Each subject has two lines of data. The first line is:

subject number / clinic / wt / hr / dx / sx (don't worry about what the numbers mean, it doesn't matter).

The second line is text: a note with additional information about the subject whose data are on the previous line. For example:

001 Mayo Clinic  120 78 7 15 
Patient has had a persistent cough for 3 weeks

Both of these lines describe Subject 001. This is my attempt at reading the file in SAS:

data ClinData;
    infile "&wdir.clinic_data.txt";
    retain patno clinic weight hr dx sx exinfo;
    input patno clinic $1. @;
    if clinic='M' then
        input patno @5 clinic $11. weight hr dx sx / @1 exinfo $30.;
    else if clinic='H' then
        input patno @5 clinic $3. weight hr dx sx / @1 exinfo $30.;
    run;

This is the output I get:

http://i61.tinypic.com/2uswl90.png


First, the patient number ('patno') does not come through correctly. Why?

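Second, the clinic name is not read correctly either. The names differ in length, so a single fixed-width informat does not fit both.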

Third, "exinfo" is cut off. I gave it a width of 30 characters, but the notes vary in length and some are longer than that.

How should I read this file in SAS?


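In your code, clinic is first read with $1., which fixes its length at one character, and the second INPUT statement reads patno again from wherever the held pointer is rather than from column 1.

Here is one way to read the file: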

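data ClinData(drop=s varlen);
  retain patno clinic weight hr dx sx;   /* sets the variable order */

  /* read patno, then 30 characters starting at the clinic name, and hold the line */
  input patno clinic $30. @;
    /* keep only letters and blanks; the ' ' argument preserves the space in "Mayo Clinic" */
    clinic=compress(clinic,' ','ka');
    s=length(clinic)+4+2;          /* a column in the blank gap after the clinic name */
   input @s weight hr dx sx /@;    /* read the numbers, then move to the note line and hold it */
     varlen=length(_infile_);      /* actual length of the note */
    input  @1 exinfo $varying256. varlen;
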
datalines4;
001 Mayo Clinic  120 78 7 15 
Patient has had a persistent cough for 3 weeks
023 Mayo Clinic  157 72 10 2 
Patient complained of ear ache
064 HMC  201 59 . . 
Patient left against medical advice
003 HMC  166 58 8 15 
Patient placed on beta-blockers on 7/1/2006
;;;;
run; 
proc print data=ClinData; run;

Another way to do this in SAS:

data want;
  length patno $3 clinic $20 weight hr dx sx 8 exinfo $80;
  input;
  patno  = scan(_infile_,1,' ');
  clinic = substr(_infile_,5,index(_infile_,'  ')-5);
  weight = input(scan(_infile_,-4,' '),8.);
  hr     = input(scan(_infile_,-3,' '),8.);
  dx     = input(scan(_infile_,-2,' '),8.);
  sx     = input(scan(_infile_,-1,' '),8.);
  input exinfo $80.;

datalines;
001 Mayo Clinic  120 78 7 15 
Patient has had a persistent cough for 3 weeks
023 Mayo Clinic  157 72 10 2 
Patient complained of ear ache
064 HMC  201 59 . . 
Patient left against medical advice
003 HMC  166 58 8 15 
Patient placed on beta-blockers on 7/1/2006
;
run;

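This works on _INFILE_, the automatic variable that holds the current input line. The clinic name is everything from column 5 up to the first double space, and the remaining fields are pulled out with substr, index and scan.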


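clinic ends up with a length of 1 because the first time it appears in an INPUT statement it is read with $1. Either read that first character into a throwaway variable instead, or declare the length of clinic beforehand.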
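A minimal sketch of the throwaway-variable idea applied to the code from the question, assuming the only clinics are the two shown. The name `_first`, the `$80.` width for the note, and the shortened sample data are illustrative choices, not part of the original code.

```sas
data ClinData;
    infile datalines truncover;
    length clinic $11 exinfo $80;   /* declare lengths before the first INPUT */
    input patno _first $1. @;       /* _first: hypothetical throwaway variable */
    if _first = 'M' then input @5 clinic $11. weight hr dx sx;
    else if _first = 'H' then input @5 clinic $3. weight hr dx sx;
    else input;                     /* unknown clinic: release the held line */
    input exinfo $80.;              /* the note is on the next line */
    drop _first;
datalines;
001 Mayo Clinic  120 78 7 15
Patient has had a persistent cough for 3 weeks
064 HMC  201 59 . .
Patient left against medical advice
;
run;
```

Because patno is read only once and clinic gets its length from the LENGTH statement, neither of the two problems above should occur. The branch per clinic still has to be extended by hand for every new clinic name.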

I recommend an approach like the one below. It is easier to work with _INFILE_ (an automatic variable, created during input, that holds the current line of data) than to rely on INPUT methods alone. Your data is fairly simple; if it is more complex than you suggest (for example, if you have more clinics than this), regular expressions or other logic can take this further and make the file easier to parse. Functions such as ANYDIGIT and NOTDIGIT, as well as COMPRESS, can also help.

data want;
length clinic $12;
/* Hold the line so _infile_ exists and we can work with it; patid can be read here too. */
input @1 patid 3. @;
array numvars weight hr dx sx;   /* the four numeric fields are filled via this array */
/* Walk the line from the right: (_t-5) maps 4 -> -1, 3 -> -2, and so on.
   The blank is given explicitly so that the period is not treated as a delimiter. */
do _t = 4 to 1 by -1;
 numvars[_t] = input(scan(_infile_,(_t-5),' '), ?? 8.);
end;
clinic = scan(_infile_,2);   /* start with the second word */
/* Append the third word when it is 'Clinic'. You could instead check whether
   compress(scan(_infile_,3),,'ka') is not missing. */
if scan(_infile_,3) = 'Clinic' then clinic=catx(' ',clinic,scan(_infile_,3));
input;                  /* release the held line */
input @1 exinfo $50.;   /* read the note from the next line */
drop _t;
put _all_;
datalines;
001 Mayo Clinic  120 78 7 15 
Patient has had a persistent cough for 3 weeks
023 Mayo Clinic  157 72 10 2 
Patient complained of ear ache
064 HMC  201 59 . . 
Patient left against medical advice
003 HMC  166 58 8 15 
Patient placed on beta-blockers on 7/1/2006
;;;;
run;
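The regular-expression route mentioned above could look roughly like this. It is only a sketch: the pattern assumes the clinic name is followed by at least two blanks and then exactly four blank-separated fields, and the names `want_prx` and `re` are made up for the example.

```sas
data want_prx;
    infile datalines truncover;
    length patno 8 clinic $20 weight hr dx sx 8 exinfo $80;
    retain re;
    /* compile the pattern once: id, clinic, 2+ blanks, four fields */
    if _n_ = 1 then
        re = prxparse('/^(\d+)\s+(.+?)\s{2,}(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$/');
    input;                                   /* load the data line into _infile_ */
    if prxmatch(re, _infile_) then do;
        patno  = input(prxposn(re, 1, _infile_), ?? 8.);
        clinic = prxposn(re, 2, _infile_);
        weight = input(prxposn(re, 3, _infile_), ?? 8.);
        hr     = input(prxposn(re, 4, _infile_), ?? 8.);
        dx     = input(prxposn(re, 5, _infile_), ?? 8.);   /* '.' becomes missing */
        sx     = input(prxposn(re, 6, _infile_), ?? 8.);
    end;
    input exinfo $80.;                       /* the note is on the next line */
    drop re;
datalines;
001 Mayo Clinic  120 78 7 15
Patient has had a persistent cough for 3 weeks
064 HMC  201 59 . .
Patient left against medical advice
;
run;
```

The advantage over testing for specific words is that any clinic name works, including ones with several words, as long as the double blank after the name holds.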

Source: https://habr.com/ru/post/1530137/

