SAS: hidden RETAIN statement inside a SET statement?

Consider the following example:

    /* Create two not too interesting datasets: */
    Data ones (keep = A);
        Do i = 1 to 3;
            A = 1;
            output;
        End;
    run;

    Data numbers;
        Do B = 1 to 5;
            output;
        End;
    Run;

    /* The interesting step: */
    Data together;
        Set ones numbers;
        if B = 2 then A = 2;
    run;

So the data set ones contains one variable, A, with three observations, all equal to 1, and the data set numbers contains one variable, B, with five observations: the numbers 1 to 5. I expect the resulting data set to have two columns (A and B), with column A reading (top to bottom) 1, 1, 1, ., 2, ., ., .

However, when I run the code, I find that column A reads 1, 1, 1, ., 2, 2, 2, 2.

Apparently, the 2 assigned in the fifth observation is retained all the way down for no apparent reason. What's going on here?

(For completeness: when I split the last data step into two, as shown below:

    Data together;
        set ones numbers;
    run;

    Data together;
        set together;
        if B = 2 then A = 2;
    run;

it does what I expect.)

1 answer

Yes: any variable that comes from a SET, MERGE or UPDATE statement is automatically retained (it is not reset to missing at the top of the data step loop). You can effectively override this with

    output;
    call missing(of <list of variables to clear out>);
    run;

at the end of the data step.
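A minimal sketch of that fix applied to the question's combined step, assuming A is the only variable that needs clearing. The output data set name together_fixed is hypothetical. The explicit OUTPUT statement suppresses the implicit output at the end of the step, so each row is written exactly once, before A is cleared.

```sas
/* Recreate the two input data sets from the question. */
data ones (keep = A);
  do i = 1 to 3;
    A = 1;
    output;
  end;
run;

data numbers;
  do B = 1 to 5;
    output;
  end;
run;

/* together_fixed is a hypothetical name for the corrected result. */
data together_fixed;
  set ones numbers;
  if B = 2 then A = 2;
  output;           /* write the current row explicitly                  */
  call missing(A);  /* clear A so the 2 is not retained into B = 3, 4, 5 */
run;
```

For rows from ones, SET simply reads A = 1 again on the next iteration, so clearing it does no harm there; column A should come out as 1, 1, 1, ., 2, ., ., .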

This is how MERGE makes one-to-many merges work, by the way, and it is also the reason why many-to-many merges don't usually work the way you want.
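To see the same retention in a MERGE, here is a small one-to-many sketch with made-up data sets parent and child (hypothetical names and values). Because parent has only one row per id, x is not re-read for the later child rows of that id, so a value changed on the first row carries down the rest of the BY group, just like A in the question.

```sas
/* Hypothetical one-row-per-id data set (already sorted by id). */
data parent;
  input id x;
  datalines;
1 10
2 20
;
run;

/* Hypothetical many-rows-per-id data set (already sorted by id). */
data child;
  input id y;
  datalines;
1 100
1 101
1 102
2 200
;
run;

data merged;
  merge parent child;
  by id;
  /* Change x only on the first child row of id 1 ...                */
  if y = 100 then x = 99;
  /* ... yet x stays 99 for y = 101 and 102: parent has no more rows
     for id 1, so the retained x is never re-read.                   */
run;
```

At the start of the id = 2 BY group the values are reset and x = 20 is read from parent, so the change does not leak across BY groups.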


The difference between the "together" and "separate" cases is that in the together case you are reading two data sets with different variables. If you run it interactively, i.e. in the SAS Program Editor or Enhanced Editor (not in EG or batch mode), you can use the data step debugger to see this more clearly. You will see the following:

At the end of the last row of the ones dataset:

    i  A  B
    3  1  .

Notice that B exists but is missing. The step then returns to the top of the data step loop. All three variables keep their values, because they all come from data sets. Then it tries to read from ones again, which produces:

    i  A  B
    .  .  .

Then it realizes that it cannot read any more from ones and begins to read from numbers. At the end of the first row of the numbers dataset:

    i  A  B
    .  .  1

Then it moves back to the top, again changing nothing, and reads in 2 for B:

    i  A  B
    .  .  2

Then your IF statement sets A to 2:

    i  A  B
    .  2  2

Then it returns to the top of the data step loop:

    i  A  B
    .  2  2

Then it reads in B = 3:

    i  A  B
    .  2  3

It continues the loop in the same way for B = 4 and 5.
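If the interactive debugger is not available (for example in EG or batch mode), a rough substitute, suggested here rather than taken from the original answer, is to write the program data vector to the log with PUT _ALL_ at a few points in the step. Because ones keeps only A in the question's code, the log lines show A, B and the automatic variables _ERROR_ and _N_, not i.

```sas
/* Recreate the two input data sets from the question. */
data ones (keep = A);
  do i = 1 to 3;
    A = 1;
    output;
  end;
run;

data numbers;
  do B = 1 to 5;
    output;
  end;
run;

/* The question's combined step, instrumented with PUT _ALL_ traces. */
data together;
  put 'Top of loop: ' _all_;  /* PDV before SET reads the next row   */
  set ones numbers;
  put 'After SET:   ' _all_;  /* PDV after the row has been read      */
  if B = 2 then A = 2;
  put 'After IF:    ' _all_;  /* PDV just before the implicit OUTPUT  */
run;
```

In the log, the "Top of loop" line after B = 2 should still show A=2, which is the retained value the walkthrough describes.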

Now compare this to the second step of the split version, which reads a single data set. It is almost the same (apart from a small difference at the point where the combined step switches data sets, which does not change the result). Skip ahead to the point where A = 2 and B = 2:

    i  A  B
    .  2  2

Now, when the data step reads the next row, that row contains all three variables. So it gives:

    i  A  B
    .  .  3

Because it reads A = . from the row, A is set to missing. In the single-step version there was no value of A to read from numbers, so nothing replaced the 2 with a missing value.
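Another way around the problem, not from the original answer but following from the same rule, is to leave the SET variable alone and write the result to a new variable (A_new is a hypothetical name). Because A itself is never modified, the missing value it gets when the step switches from ones to numbers is never overwritten, and A_new, which is created by assignment rather than read by SET, is recomputed from A on every iteration.

```sas
/* Recreate the two input data sets from the question. */
data ones (keep = A);
  do i = 1 to 3;
    A = 1;
    output;
  end;
run;

data numbers;
  do B = 1 to 5;
    output;
  end;
run;

/* together_new and A_new are hypothetical names. */
data together_new;
  set ones numbers;
  A_new = A;                /* copy A as read; A itself is never changed  */
  if B = 2 then A_new = 2;  /* only A_new is modified, on the B = 2 row   */
run;
```

Here A_new should read 1, 1, 1, ., 2, ., ., . while A keeps the values actually read from the inputs.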


Source: https://habr.com/ru/post/955915/

