SAS PROC MEANS: how to capture non-default statistics such as NMISS, P1, P99 in an output dataset?

The original question:

By default, PROC MEANS writes N, MIN, MEAN, MAX, and STD to the output dataset. How can I add NMISS, P1, P5, etc. to this list?


Additional Information 1:

I need statistics for all the numeric variables in my dataset, so I use _numeric_ in the VAR statement.

I want each statistic in its own row and the variables as columns:

    Obs  _TYPE_  _FREQ_  _STAT_      var1  var2  var3  etc
      1       0   84829  N       84826.00
      2       0   84829  MIN         0.00
      3       0   84829  MAX      5000.00
      4       0   84829  MEAN      151.22
      5       0   84829  STD      1989.47
      6       0   84829  NMISS          3
      7       0   84829  P1          2.00
      8       0   84839  P99      4999.00

How to do it?

Thanks!

2 answers

This article has an excellent discussion of the exact problem that you are describing, along with a macro to output a dataset that matches your description.

A Better Means - The ODS Data Trap

Update: I found a more recent paper that "presents a revised version of a macro that supports additional features and fixes an amazing bug." This is the updated solution:

Solve the SAS® ODS Data Trap in PROC MEANS

The macro looks well designed and avoids many potential problems. The steps it uses to build the output dataset include calls to PROC MEANS (of course), PROC SQL, PROC CONTENTS, and PROC DATASETS, plus extensive use of the macro language, so describing them here would probably not be instructive. I do not claim to fully understand it.

However, once you have compiled the macro, you can create the data set you need with one simple statement.

 %better_means(data=MyDataSet) 

Now that I have found this convenient solution, I can start using it myself.


Assuming you are using the OUTPUT statement in PROC MEANS (and not ODS OUTPUT), you can control which statistics are written to that dataset, for example:

    proc means data=sashelp.class;
        var age;
        class sex;
        output out=mymeans nmiss= P1= P5= / autoname;
    run;

A complete list of statistics names is available in the PROC MEANS documentation under the "statistics keyword" section.
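If you would rather choose the output variable names yourself than rely on AUTONAME, the OUTPUT statement also accepts the form keyword(variable)=name. A minimal sketch; the dataset name mymeans_named and the output variable names are made up for illustration:

```sas
/* Sketch: name each output statistic explicitly instead of using AUTONAME. */
/* mymeans_named and the *_nmiss / *_p1 / *_p99 names are illustrative.     */
proc means data=sashelp.class noprint;
    class sex;
    var age height;
    output out=mymeans_named
        nmiss(age)=age_nmiss   nmiss(height)=height_nmiss
        p1(age)=age_p1         p1(height)=height_p1
        p99(age)=age_p99       p99(height)=height_p99;
run;
```

This gets verbose with many variables, which is why AUTONAME is convenient when you use _numeric_.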

You can also achieve the same result (with a slightly different output format) with ODS OUTPUT.

    ods output summary=mymeans;
    ods trace on;
    proc means data=sashelp.class nmiss p1 p5;
        var age;
        class sex;
    run;
    ods trace off;
    ods output close;

Turning ODS TRACE on and off shows the name of the table that is created (here, 'summary'); it is not needed in production. With this approach you request the statistics the same way you request them for the output window: in the PROC MEANS statement.

Based on your edit, you want the result transposed (one row per statistic). You cannot get that directly, but transposing is not very difficult.

    proc means data=sashelp.class nmiss p1 p5;
        class sex;
        var _numeric_;
        output out=mymeans n= mean= nmiss= p1= p5= / autoname;
    run;

    data mymeans_out;
        set mymeans(drop=_type_ _freq_);
        by sex;
        array numvars _numeric_;    /* all statistic columns, e.g. Age_Mean */
        format var stat $32.;
        do _t = 1 to dim(numvars);
            var   = scan(vname(numvars[_t]), 1, '_');   /* text before the first underscore */
            stat  = scan(vname(numvars[_t]), -1, '_');  /* text after the last underscore */
            value = numvars[_t];
            output;
        end;
        keep sex var stat value;
    run;

This has several limitations. If your variable names already contain underscores, the var=scan(...) line must be rewritten to find the last underscore and take everything before it: var = substr(vname(...), 1, position_of_last_underscore - 1). The stat line stays correct, since it uses -1 (scanning from the right). If your variable names can exceed about 23 characters, you may not be able to recover the exact variable name, because it may be truncated or altered. In that case the ODS OUTPUT solution above will help (it reports the source variable name in an additional column), but extra work is needed to link that value to the truncated name.
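The underscore limitation can be handled along these lines. This is a sketch rather than a tested drop-in: it uses FINDC with the 'b' modifier to search from the right, and the names class2, body_height, mymeans2, and mymeans2_out are made up so the example has a variable containing an underscore.

```sas
/* Hypothetical test data: rename a variable so its name contains an underscore. */
data class2;
    set sashelp.class(rename=(height=body_height));
run;

proc means data=class2 noprint;
    class sex;
    var _numeric_;
    output out=mymeans2 n= mean= nmiss= p1= p5= / autoname;
run;

data mymeans2_out;
    set mymeans2(drop=_type_ _freq_);
    array numvars _numeric_;
    length name var stat $32;
    do _t = 1 to dim(numvars);
        name = vname(numvars[_t]);
        _pos = findc(name, '_', 'b');          /* 'b' = search from right to left */
        if 1 < _pos < length(name) then do;
            var   = substr(name, 1, _pos - 1); /* everything before the last underscore */
            stat  = substr(name, _pos + 1);    /* statistic suffix added by AUTONAME */
            value = numvars[_t];
            output;
        end;
    end;
    keep sex var stat value;
run;
```

Because the split happens at the last underscore, body_height_Mean yields var=body_height and stat=Mean. It does not help with names that were truncated.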

I also drop _TYPE_ and _FREQ_ to keep the array definition simple; if you need them, you will have to write some code to exclude them from the array and keep them alongside the output.
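To arrive at the layout shown in the question (one row per statistic, one column per variable), the long dataset mymeans_out can be pivoted with PROC TRANSPOSE. A sketch, assuming mymeans_out has the columns sex, var, stat, and value as created above; the name mymeans_wide is arbitrary:

```sas
/* Sort so that each (sex, stat) combination forms one BY group. */
proc sort data=mymeans_out;
    by sex stat;
run;

/* Pivot: one row per sex and statistic, one column per analysis variable. */
proc transpose data=mymeans_out out=mymeans_wide(drop=_name_);
    by sex stat;
    id var;        /* values of VAR (Age, Height, Weight) become column names */
    var value;
run;
```

Sorting by stat puts the rows in alphabetical order of the statistic name rather than the order shown in the question. The rows with a blank sex are the overall (_TYPE_=0) statistics.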


Source: https://habr.com/ru/post/1485060/

