Multiple Hash Objects in SAS

I have two SAS datasets. The first is relatively small and contains unique dates and their corresponding identifiers:

    date    dateID
    1jan90  10
    2jan90  15
    3jan90  20
    ...

The second data set is very large and has two date variables:

    dt1     dt2
    1jan90  2jan90
    3jan90  1jan90
    ...

I need to map both dt1 and dt2 to dateID, so the output will look like this:

    id1  id2
    10   15
    20   10

Efficiency is very important here. I know how to use a hash object to do one lookup, so I could use one data step to match dt1 and then another to match dt2, but I would like to do both in a single data step. How can I do that?

Here is how I would make a match only for dt1:

    data tbl3;
      if 0 then set tbl1 tbl2;
      if _n_=1 then do;
        declare hash dts(dataset:'work.tbl2');
        dts.DefineKey('date');
        dts.DefineData('dateid');
        dts.DefineDone();
      end;
      set tbl1;
      /* look up dt1 in the hash keyed on date */
      if dts.find(key:dt1)=0 then output;
    run;
3 answers

A format is likely to work just as efficiently, considering the size of your lookup table:

    data fmt ;
      retain fmtname 'DTID' type 'N' ;
      set tbl1 ;
      start = date ;
      label = dateid ;
    run ;

    proc format cntlin=fmt ;
    run ;

    data tbl3 ;
      set tbl2 ;
      id1 = put(dt1,DTID.) ;
      id2 = put(dt2,DTID.) ;
    run ;

An edited version, based on the comments below, that builds a numeric informat instead and maps unmatched dates to missing:

    data fmt ;
      retain fmtname 'DTID' type 'I' ;
      set tbl1 end=eof ;
      start = date ;
      label = dateid ;
      output ;
      if eof then do ;
        hlo = 'O' ;
        label = . ;
        output ;
      end ;
    run ;

    proc format cntlin=fmt ;
    run ;

    data tbl3 ;
      set tbl2 ;
      id1 = input(dt1,DTID.) ;
      id2 = input(dt2,DTID.) ;
    run ;

I don't have SAS in front of me right now to test this, but the code would look something like this:

    data tbl3;
      if 0 then set tbl1 tbl2;
      if _n_=1 then do;
        declare hash dts(dataset:'work.tbl2');
        dts.DefineKey('date');
        dts.DefineData('dateid');
        dts.DefineDone();
      end;
      set tbl1;
      /* copy each date into the key variable, then look it up */
      date = dt1;
      if dts.find()=0 then do;
        id1 = dateId;
      end;
      date = dt2;
      if dts.find()=0 then do;
        id2 = dateId;
      end;
      /* keep only records that matched at least one */
      if not missing(id1) or not missing(id2) then output;
      drop date dateId;
    run;

I agree that the format solution is a good one, but if you want a hash solution, here it is. The main point is that you pass the variable you are matching as the key in each find() call (key:dt1, key:dt2), rather than relying on the key variable defined in the hash itself.

    data tbl2;
      informat date DATE7.;
      input date dateID;
      datalines;
    01jan90 10
    02jan90 15
    03jan90 20
    ;;;;
    run;

    data tbl1;
      informat dt1 dt2 DATE7.;
      input dt1 dt2;
      datalines;
    01jan90 02jan90
    03jan90 01jan90
    ;;;;
    run;

    data tbl3;
      if 0 then set tbl1 tbl2;
      if _n_=1 then do;
        declare hash dts(dataset:'work.tbl2');
        dts.DefineKey('date');
        dts.DefineData('dateid');
        dts.DefineDone();
      end;
      set tbl1;
      rc1 = dts.find(key:dt1);
      if rc1=0 then id1=dateID;
      rc2 = dts.find(key:dt2);
      if rc2=0 then id2=dateID;
      if rc1=0 and rc2=0 then output;
    run;
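The step above writes a row only when both dates are found. If you would rather keep every row of the large table and leave an unmatched ID missing, a minimal sketch could look like the following. It assumes the same naming as this answer (tbl2 is the small lookup table, tbl1 is the large one) and that only id1 and id2 are wanted as new columns; the output name tbl3_all is just a placeholder.

```sas
data tbl3_all;                      /* hypothetical output name */
  if 0 then set tbl2;               /* put date and dateID into the PDV for the hash */
  if _n_=1 then do;
    declare hash dts(dataset:'work.tbl2');
    dts.DefineKey('date');
    dts.DefineData('dateID');
    dts.DefineDone();
  end;
  set tbl1;
  call missing(dateID);             /* clear the value retained from the previous row */
  /* id1 and id2 start each iteration as missing, so a failed find leaves them missing */
  if dts.find(key:dt1)=0 then id1=dateID;
  if dts.find(key:dt2)=0 then id2=dateID;
  drop date dateID;                 /* helper variables, not needed in the output */
run;
```

Because there is no explicit output statement, every row read from tbl1 is written, with id1 or id2 left missing where the date was not in the lookup table.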

Source: https://habr.com/ru/post/1433925/

