How to sort variables horizontally with arrays and CALL SORTC

A user on sasprofessionals.net was unable to group a data set by several variables whose values are interchangeable across the variables within an observation, because all of the variables carry the same kind of value.

In the sample data set, observations 2, 3 and 7 are equivalent: each has A14, A14 and A10 as the values of Stat1-Stat3, and only the order differs. They should be grouped together and their counts summed. Observations 5 and 6 form another group, whose counts should also be summed.

Dataset example:

Obs Stat1 Stat2 Stat3 Count
1   A14   A14   A14   53090
2   A14   A14   A10   6744
3   A14   A10   A14   5916
4   A01   A01   A01   4222
5   A10   A10   A10   3085
6   A10   A10   A10   2731
7   A10   A14   A14   2399

Desired Result:

Obs Stat1 Stat2 Stat3 Count
1   A14   A14   A14   53090
4   A01   A01   A01   4222
6   A10   A10   A10   5816
7   A10   A14   A14   15059

The actual data set is bigger and more complex. I do not know if the user tried to use any methods to solve the problem.

This question was brought over from sasprofessionals.net to StackOverflow as a Q&A.

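This answer loads Stat1-Stat3 into an array, sorts the values within each observation, and builds an ID from the sorted Stat1-Stat3 to group by.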
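To see what the horizontal sort does to a single row before reading the full program, here is a minimal sketch. The variable names a, b and c are illustrative only and do not come from the original posts; the values are those of observation 3.

```sas
/* Illustrative only: sort three same-length character variables in place */
data _null_;
  length a b c $3;
  a = 'A14'; b = 'A10'; c = 'A14';
  call sortc(a, b, c);       /* values are rearranged across a, b, c in ascending order */
  id = catx('/', a, b, c);   /* build a grouping key from the sorted values */
  put id=;                   /* writes id=A10/A14/A14 to the log */
run;
```

Because rows 2, 3 and 7 all reduce to the same key after sorting, they can then be grouped with an ordinary BY statement.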

/* Loading the data into SAS dataset */ 
/* Loading Stat1-Stat3 into an array */
/* Sorting stat1-stat3 creating a new ID */
data have; 
input obs stat1 $ stat2 $ stat3 $ count; 
array stat{3} stat1-stat3;
call sortc(of stat1-stat3); 
ID = CATX("/",stat1,stat2,stat3);
datalines; 
1 A14 A14 A14 53090
2 A14 A14 A10 6744
3 A14 A10 A14 5916
4 A01 A01 A01 4222
5 A10 A10 A10 3085
6 A10 A10 A10 2731
7 A10 A14 A14 2399
; 


/* sorting the data set in preparation for data step with by statement*/
PROC SORT data=have out=sorted_arrays; 
BY ID OBS; 
RUN; 

/* Summarising the data set and outputting the final data set; */
/* the running total SUM is renamed to COUNT to match the desired result */
DATA summed (drop=ID count rename=(sum=count)); 
set sorted_arrays; 
by ID; 
retain sum 0; 
if first.ID then sum = 0; 
sum + count; 
if last.ID then output; 
RUN; 

/* Sorting it back into original order */
PROC SORT data=summed out=want; 
BY OBS; 
RUN; 
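
If the original observation number does not need to be kept, the summation could also be done with PROC SUMMARY instead of the DATA step. This is a swapped-in alternative, not the answer's own method. It is a minimal sketch that assumes `have` was built as above, so Stat1-Stat3 are already sorted within each row; the output name `want_alt` is hypothetical.

```sas
/* Alternative sketch: sum COUNT for each distinct sorted Stat1-Stat3 combination */
proc summary data=have nway;          /* NWAY keeps only the full class combination */
  class stat1 stat2 stat3;
  var count;
  output out=want_alt (drop=_type_ _freq_) sum=count;
run;
```

The result has one row per combination, ordered by the class variables rather than by Obs.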

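An alternative is a hash-object approach: one hash table sorts the Stat values within each observation, and a second hash table, keyed on the sorted values, sums Count.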

data have; 
input stat1 $ stat2 $ stat3 $ count; 
datalines; 
A14 A14 A14 53090
A14 A14 A10 6744
A14 A10 A14 5916
A01 A01 A01 4222
A10 A10 A10 3085
A10 A10 A10 2731
A10 A14 A14 2399
; 

data want;
  length _stat $3;

  if _n_=1 then do;
    declare hash  hstat(multidata:"y", ordered:"y");
    declare hiter hstatiter ("hstat" ) ;      
    hstat.definekey('_stat');
    hstat.definedata('_stat'); 
    hstat.definedone();
    call missing(_stat);

    declare hash  hsum(suminc: "count", ordered: "y");
    declare hiter hsumiter ("hsum" ) ;      
    hsum.definekey("stat1","stat2","stat3");
    hsum.definedone();
  end;

  set have end=last;

  array stat{3};

  *load the array values into htable hstat to sort them;
  *then iterate over the hash, returning the values to array in sorted order;
  do _i=1 to dim(stat);  
    hstat.add(key:stat{_i},data:stat{_i});
  end;
  do _i=1 to dim(stat);
    hstatiter.next();
    stat{_i}=_stat;
  end;
  _rc=hstatiter.next(); *hack- there is no next, this releases hiter lock so can clear hstat;
  hstat.clear();

  *now that the stat keys have been sorted, can use them as key in hash table hsum;
  *as data are loaded into/checked against the hash table, counts are summed;
  *Then if last, iterate over hsum writing it to output dataset;

  hsum.ref(); *This sums count as records are loaded/checked;

  if last then do;
    _rc = hsumiter.next();
    do while(_rc = 0);
      _rc = hsum.sum(sum: count);
      output ;
      _rc = hsumiter.next();
    end;
  end;

  drop _: ;
run;

Source: https://habr.com/ru/post/1588991/