Dummy variables in SAS

Suppose we have a data set people with a categorical variable income that has 4 levels (1, 2, 3, 4). How would we code this in SAS? Would it be:

data people;
set people;
if income=1 then income1=1;
else if income=2 then income2=1;
else if income=3 then income3=1;
run;

In other words, this would create three dummy variables for four levels. Is it correct?

5 answers

Arrays are a slightly more flexible way to do this.

data people;
set people;
array incomes income1-income4;
do _t = 1 to dim(incomes);
  if income=_t then incomes[_t] = 1;
  else if not missing(income) then incomes[_t]=0;
  else incomes[_t]=.;
end;
drop _t;
run;

I have modified your code below. This gives three dummy-coded variables; income = 4 is the reference category.

data people_dummy;
         set people;
         if income=1 then income1=1 ; else income1=0;
         if income=2 then income2=1 ; else income2=0; 
         if income=3 then income3=1 ; else income3=0;
run;

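A macro makes this general: it creates one dummy variable for each distinct value of the chosen variable.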

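%macro cat(indata, variable);
  %local mvals mdim _i _v;

  /* Collect the distinct non-missing levels into a |-delimited list */
  proc sql noprint;
    select distinct &variable. into :mvals separated by '|'
    from &indata.
    where &variable. is not missing;

    %let mdim=&sqlobs;
  quit;

  /* One 0/1 dummy per level, named <variable><level> */
  data &indata.;
    set &indata.;
    %do _i=1 %to &mdim.;
      %let _v = %scan(&mvals., &_i., |);
      if &variable. = &_v. then &variable.&_v. = 1; else &variable.&_v. = 0;
    %end;
  run;
%mend;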

%cat(people, income);

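You don't need the "else". In SAS a comparison evaluates to 1 (true) or 0 (false), so the result can be assigned directly: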

    income1_ind=(income eq 1);
    income2_ind=(income eq 2);
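
A minimal, self-contained sketch of this idiom, assuming a small made-up data set (the names demo and demo_dummy and the sample values are illustrative, not from the question). Note that a missing income gives 0 for every dummy here rather than a missing value, which may or may not be what you want.

```sas
/* Hypothetical sample data: the four income levels plus one missing value */
data demo;
  input income @@;
  datalines;
1 2 3 4 .
;
run;

data demo_dummy;
  set demo;
  /* Each comparison returns 1 (true) or 0 (false); income = 4 is the reference level */
  income1 = (income eq 1);
  income2 = (income eq 2);
  income3 = (income eq 3);
run;

proc print data=demo_dummy noobs;
run;
```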

Code:

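/* Build the list of dummy names (income1 income2 ...) from the distinct non-missing levels */
proc sql noprint;
 select distinct 'income' || strip(put(income,8.)) into :income_var separated by ' '
 from people
 where income is not missing;
quit;

/* Rename income temporarily so that the income: prefix list matches only the dummies */
data people(rename = (_inc = income));
 set people(rename = (income = _inc));
 length &income_var. 8;
 array tmp_arr(*) income:;
 do i = 1 to dim(tmp_arr);
    /* Match on the level in the variable name, not the array position, so gaps in the levels are handled */
    tmp_arr(i) = (lowcase(vname(tmp_arr(i))) eq cats('income', _inc));
 end;
 drop i;
run;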

How it works: the SAS code is dynamic and works for any number of levels of the income variable, because it creates one variable for each distinct level found in the people data set.

The data step sets the corresponding variable to 1, and the others to 0 according to the value of the income variable.
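
One way to check the result is a quick cross-tabulation. This is only a sketch, and it assumes people now contains income together with the dummies income1-income4, i.e. that all four levels from the question are present in the data; adjust the variable list if your levels differ.

```sas
/* Assumption: people holds income plus income1-income4 (the four levels from the question) */
proc freq data=people;
  /* LIST prints one row per observed combination; MISSING keeps missing income as its own row */
  tables income*income1*income2*income3*income4 / list missing;
run;
```

Each non-missing income level should appear on a single row, with exactly one dummy equal to 1 and the others equal to 0.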


Source: https://habr.com/ru/post/1524531/

