How do I detect all empty columns in a dataset and delete / drop them?

As the title says, I would like to remove all empty columns / variables (columns in which every entry is missing, zero, or "") to reduce the time spent on subsequent processing.

Detailed scenario:

I have a dataset with 1000 columns, many of which are empty. I now want to create a new dataset that adds columns based on conditions on the columns of the old one:

    data new;
        set old;
        if oldcol1 ne "" then newcol1 = '<a>' || strip(oldcol1) || '</a>';
        if oldcol2 ne "" then newcol2 = '<a>' || strip(oldcol2) || '</a>';
        ...
        drop oldcol1 oldcol2 ... oldcol1000;
    run;

This takes a long time to run, for the following reasons:

  • The number of old columns is huge.

  • I actually have to loop over another dataset to get the number that goes after "oldcol":

    Colnumber
    1
    2
    3
    ...
    1000

So you can imagine how many lookups and assignments this takes.

One way to reduce the run time would therefore be to drop all empty columns first. Any advice on optimizing the algorithm itself is also welcome, though.
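On the optimization side, the near-identical IF statements in the step above can be written once and applied in a loop over SAS arrays. This also removes the need to look up the column number in a separate dataset, because the number is simply the array index. A minimal sketch, assuming the old columns are character variables with a numeric suffix; the toy dataset, the macro variable `n` (standing in for 1000) and the $250 length of the new columns are illustrative assumptions.

```sas
/* Hypothetical toy input standing in for OLD: 3 columns instead of 1000. */
%let n = 3;

data old;
    length oldcol1-oldcol&n $ 20;
    /* Variables are already character, so list input reads them as text; */
    /* a lone "." is read as a blank (missing) character value.            */
    input oldcol1-oldcol&n;
    datalines;
abc . x
def . .
;
run;

/* One loop replaces the &n separate IF statements. */
data new;
    set old;
    array oldc{&n} $ oldcol1-oldcol&n;
    /* Assumption: 250 characters is enough for the wrapped values. */
    array newc{&n} $ 250 newcol1-newcol&n;
    do i = 1 to dim(oldc);
        if oldc{i} ne "" then newc{i} = '<a>' || strip(oldc{i}) || '</a>';
    end;
    drop i oldcol1-oldcol&n;
run;
```

The loop does the same per-row work as the explicit statements; the gain is that the step no longer has to be written or generated column by column.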

Thanks.

3 answers

Here is a general macro that builds a list of the empty columns in a source dataset; you can then pass that list to a DROP statement or DROP= option. It uses PROC FORMAT and PROC FREQ, so it is relatively fast.

    %macro findmiss(ds, macvar);
        %local noteopt;
        %let noteopt = %sysfunc(getoption(notes));
        option nonotes;
        /* ds:     the data set to scan for missing values                     */
        /* macvar: the macro variable that will store the list of empty columns */
        %global &macvar;

        /* Collapse every value to one of two levels: blank (missing) or '1' */
        proc format;
            value nmis  .-.z = ' ' other = '1';
            value $nmis ' '  = ' ' other = '1';
        run;

        /* Keep only the tables whose first level already covers 100% of rows, */
        /* i.e. variables that take a single (formatted) value                 */
        ods listing close;
        ods output OneWayFreqs = OneValue(
            where = (frequency = cumfrequency AND CumPercent = 100));
        proc freq data=&ds;
            table _All_ / Missing;
            format _numeric_ nmis. _character_ $nmis.;
        run;
        ods listing;

        /* A single level that is also blank means the column is empty; */
        /* the variable name is the last word of the Table column       */
        data missing(keep=var);
            length var $32;
            set OneValue end=eof;
            if percent eq 100 AND sum(of F_:) < 1;
            var = scan(Table, -1, ' ');
        run;

        proc sql noprint;
            select var into :&macvar separated by " " from missing;
        quit;

        option &noteopt.;
    %mend;

Here is how you can use it:

    %findmiss(old, droplist);  /* generate the list of empty columns */
    data new;
        set old(drop=&droplist);
    run;

I agree that proc transpose is a good idea:

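    %let dropvars = ;  /* initialize, so that SYMGET below finds the variable */

    /* One row per original column; the observations become COL1, COL2, ... */
    proc transpose data=old out=temp;
        var _ALL_;
    run;

    data _NULL_;
        set temp end=eof;
        array cols {*} COL: ;
        /* Flag each value: 0 if blank or ".", 1 otherwise */
        do i = 1 to dim(cols);
            cols[i] = ifn((strip(cols[i]) = " " or strip(cols[i]) = "."), 0, 1);
        end;
        /* If every value of this column is flagged 0, append its name */
        if sum(of COL:) = 0 then
            call symput("dropvars", catx(" ", symget("dropvars"), _NAME_));
    run;

    data new;
        set old(drop=&dropvars);
    run;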
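The two approaches above treat only blank and missing values as empty, while the question also counts zeros. A minimal sketch of a single DATA step pass that applies the question's definition (blank character values; missing or zero numeric values) without transposing. The toy dataset, the macro variable name `droplist`, and the limit of 1000 columns per type are assumptions, and the step assumes the data has at least one character and one numeric column.

```sas
/* Hypothetical toy data standing in for OLD:            */
/* c2 is always blank and n2 is always missing or zero.  */
data old;
    input c1 $ c2 $ n1 n2;
    datalines;
a . 1 0
b . . .
;
run;

data _null_;
    set old end=last;
    array cc{*} _character_;
    array nn{*} _numeric_;
    /* Assumption: at most 1000 columns of each type. Temporary arrays */
    /* are retained automatically, so the flags survive across rows.   */
    array chit{1000} _temporary_ (1000*0);
    array nhit{1000} _temporary_ (1000*0);
    /* Mark a column as non-empty as soon as one real value shows up */
    do i = 1 to dim(cc);
        if not missing(cc{i}) then chit{i} = 1;
    end;
    do i = 1 to dim(nn);
        if not missing(nn{i}) and nn{i} ne 0 then nhit{i} = 1;
    end;
    /* After the last row, collect the names of columns never marked */
    if last then do;
        length droplist $ 32767;
        do i = 1 to dim(cc);
            if chit{i} = 0 then droplist = catx(' ', droplist, vname(cc{i}));
        end;
        do i = 1 to dim(nn);
            if nhit{i} = 0 then droplist = catx(' ', droplist, vname(nn{i}));
        end;
        call symputx('droplist', droplist);
    end;
run;
```

When no column is empty, `droplist` is blank, and an empty DROP= option is not valid, so check the list (for example with `%length(&droplist)` inside a macro) before writing `set old(drop=&droplist);`.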

Something like this?

    /* Add a row id so the transpose yields one row per (row, column) pair */
    data work.temp1;
        attrib idcol length=8;
        set work.old;
        idcol = _n_;
    run;

    proc transpose data=work.temp1 out=work.temp2 name=varname;
        var oldcol1-oldcol1000;
        by idcol;
    run;

    /* Names of the columns that have at least one non-missing value */
    proc sql;
        create table work.temp3 as
            select distinct varname from work.temp2
            where not missing(col1);
    quit;

    /* Generate a DATA step that only touches the non-empty columns */
    data _null_;
        set work.temp3 end=lastrec;
        attrib nvarname length=$32;
        if _n_ = 1 then do;
            call execute('data work.new;');
            call execute('set work.old;');
        end;
        /* oldcolNNN -> newcolNNN: the number starts at position 7 */
        nvarname = cats('newcol', substr(varname, 7));
        call execute('attrib ' || strip(nvarname) || ' length=$250;');
        call execute('if ' || strip(varname) || ' ne "" then ' || strip(nvarname)
                     || ' = "<a>" || strip(' || strip(varname) || ') || "</a>";');
        if lastrec then do;
            call execute('drop oldcol1-oldcol1000;');
            call execute('run;');
        end;
    run;

Source: https://habr.com/ru/post/1345827/

