April, 2008
by Phil Mason
Often when I am programming I need to know something about a SAS dataset that I am using. Perhaps I want to know how many observations it has, how long a particular variable is or what a label is. Well with SAS there are always many ways to do anything, and getting information like this is no exception. I could perhaps use a Proc Contents like so:
proc contents data=sashelp.class out=contents noprint ;
run ;
This would give me a dataset with all the information I want, but then I would need to go through it to pick out the parts I want.
Another way to get the information would be to use one of the SQL dictionary tables. To get the number of observations in a dataset I could use some code like this:
proc sql ;
create table nobs as select nobs from dictionary.tables where libname='SASHELP' and memname='CLASS' ;
I might want to get this into a macro variable since it is often more useful. If it’s in a macro variable then I could use it in a datastep, macro program, title statement, and so on. Some code like this would do the job:
proc sql noprint ;
select nobs into :nobs from dictionary.tables where libname='SASHELP' and memname='CLASS' ;
Another way to get information like this into macro variables is by using functions which originated in the SAS/AF product as part of Screen Control Language – now called SAS Component Language. These functions can be used from the data step and enable you to open datasets, read/write to them, get information and so on. In a datastep I could get the number of observations and put it into a macro variable like this:
data _null_ ;
dsid=open('sashelp.class') ; /* open dataset, and keep the ID */
nobs=attrn(dsid,'nobs') ; /* get the numeric attribute NOBS - number of observations */
call symputx('nobs',nobs) ; /* write the value out to a macro variable */
dsid=close(dsid) ; /* close dataset using the ID */
run ;
%put nobs=&nobs ;
Of course if I only wanted the number of observations then another way to do it with a datastep would be like this:
data _null_ ;
call symputx('nobs',nobs) ; /* nobs is available before the set statement executes */
stop ; /* stop dataset execution before we read an observation */
set sashelp.class nobs=nobs ; /* put number of observations into variable NOBS */
run ;
%put nobs=&nobs ;
What’s the benefit of using the attrn() function you may be thinking. Well the attrn() function can give you a range of other information as well as number of observations including creation date, whether integrity constraints are in use, whether the dataset is indexed, last modified date, number of variables, etc. There is a similar function called the attrc() function. This will give you character attributes from a dataset. It is very useful if you want to know things like what character set is in use (EBCDIC, ASCII, etc.), name of engine being used, variables sorted by, dataset label, etc.
The thing that I mostly want to tell you in this article is that I think there is an even better way of getting information from datasets and it involves using the dataset functions from macro language. As you may know, you can use the %SYSFUNC macro function to call most (but not all) dataset functions directly from macro language. You can then write a macro program which will open a dataset, get the number of observations (or other information), and then close the dataset again. For example, the following code shows a general purpose macro which will substitute the number of observations when called.
%macro nobs(dataset) ;
%let dsid=%sysfunc(open(&dataset)) ;
%sysfunc(attrn(&dsid,nobs))
%let dsid=%sysfunc(close(&dsid)) ;
%mend nobs ;
title "sashelp.class has %nobs(sashelp.class) obs" ;
proc print data=sashelp.class ;
run ;
Or if you wanted a macro program which would give you a dataset label, then you could use some code like this:
%macro label(dataset) ;
%let dsid=%sysfunc(open(&dataset)) ;
%sysfunc(attrc(&dsid,label))
%let dsid=%sysfunc(close(&dsid)) ;
%mend label ;
title "%label(sashelp.air)" ;
proc print data=sashelp.air ;
run ;
And there is one slightly more complicated thing that macros can make easy. You might want to get some information about a variable by using these dataset functions being called from a macro program. Using the functions you can get all kinds of information about variables such as label, length, format, informat, etc. To show you how, here is one last example.
%macro varlabel(dataset,variable) ;
%let dsid=%sysfunc(open(&dataset)) ;
%let varnum=%sysfunc(varnum(&dsid,&variable)) ;
%sysfunc(varlabel(&dsid,&varnum))
%let dsid=%sysfunc(close(&dsid)) ;
%mend varlabel ;
title "%varlabel(sashelp.prdsale,predict) vs. %varlabel(sashelp.prdsale,actual)" ;
proc gplot data=sashelp.prdsale ;
plot predict*actual ;
run ;