October, 2007
by Phil Mason
ABSTRACT
SAS® 9 has many new features, as well as all the old ones. This paper will try to outline some of the best new features that I have found over the last 3 years that I have been using SAS 9. The paper will be published in three parts. In this first part I will discuss the Data Step. Parts two and three will contain discussions on graphics, ODS, macro processing, SQL processing, and some miscellaneous topics.
Introduction
The paper is based on a series of tips & techniques that I publish to a list of subscribers to my newsletter (see www.woodstreet.org.uk). This being a hands on workshop means that I will outline the feature we are looking at and then we will follow through some example code to see how the feature works and what can be varied. Remember that the tips & techniques covered here are only a small subset of everything that is available in SAS 9. You should make sure that you read the “What’s New” documentation for the version of SAS 9 that you have. And don’t forget to always read the changes and enhancements for each new service pack you get too.
Data Step fEATURES
Writing messages to the LOG, while writing text elsewhere
In SAS 9 there is a new statement called PUTLOG, which explicitly writes to the SAS LOG. This means that you can direct regular PUT statements to write to another destination, and write to the log using PUTLOG without the need to redirect output to the LOG with a FILE LOG statement.
PUTLOG is similar to the ERROR statement, except PUTLOG does not set _error_ to 1. The ERROR statement is also available in SAS 6 & 8.
SAS Log
55 data test ;
56 put 'This goes to LOG by default' ;
57 file print ;
58 put 'This goes to OUTPUT window, since I selected print' ;
59 putlog 'but this still goes to the LOG' ;
60 put 'This goes to OUTPUT' ;
61 putlog 'NOTE: and I can write proper messages using colours' ;
62 putlog 'WARNING: ...' ;
63 putlog 'ERROR: ...' ;
64 run ;
This goes to LOG by default
but this still goes to the LOG
NOTE: and I can write proper messages using colours
WARNING: ...
ERROR: ...
NOTE: 2 lines were written to file PRINT.
NOTE: The data set WORK.TEST has 1 observations and 0 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
IN operator now accepts integer ranges
In SAS 9 the IN operator has been enhanced to accept integer ranges. This works well with IF statements, but doesn’t seem to work with WHERE statements (yet).
SAS Log
73 data sample ;
74 set sashelp.class ;
75 if age in (11, 13:15, 18:25) ;
76 run ;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.SAMPLE has 13 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.02 seconds
Concatenating strings the easy way
In SAS 9 there is a new function called CATX which makes concatenating strings easy. It will concatenate any number of character strings, removing leading & trailing blanks and inserting a separator.
SAS Log
1 data test ;
2 a=' Phil ' ;
3 b=' Mason ' ;
4 c=trim(left(a))!!' '!!left(b) ;
5 d=catx(' ',a,b) ;
6 put c= d= ;
7 run ;
c=Phil Mason d=Phil Mason
NOTE: The data set WORK.TEST has 1 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.55 seconds
cpu time 0.06 seconds
Counting words
Rather than using a more convoluted technique for counting the number of occurrences of a word in a character string, in SAS 9 we can now use the COUNT function. It will simply count the number of sub-strings that occur in a string, optionally ignoring the case (as in my example).
SAS Log
15 data _null_ ;
16 sentence='This is ONE way of using one in One sentence' ;
17 num=count(sentence,'one','i') ;
18 put num= ;
19 run ;
num=3
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Ways to use the PUT statement
The PUT statement is a very flexible tool in the data step programmers toolkit. Here are some different ways of using it – hopefully there may be one or two you have not seen before. These are not new in SAS 9, but I threw this in because I think its very useful.
|
Statement |
Explanation |
|
Put x y z ; |
Write values of 3 variables out separated by a space |
|
Put ‘hello’ ’09’x ; |
Write text followed by hexadecimal 09 – which is a tab character (in ASCII) |
|
Put 132*’_’ ; |
Write 132 underscores |
|
Put #3 @44 cost ; |
Write value of cost out beginning at line 3 column 44 |
|
Put var 1-5 ; |
Write the value of var out into the columns from column 1 to column 5 |
|
Put cost dollar12.2 ; |
Write the value of cost out using the dollar12.2 format |
|
Put (a b) (1. ‘,’ $3.) ; |
Write the value of a out using a 1. format, then a comma, then the value of b using a $3. format |
|
Put _infile_ ; |
Write out the current input buffer, as read by the last input statement |
|
Put _all_ ; |
Write out the values of all variables, including _error_ and _n_ |
|
Put _ods_ ; |
Write out the default or previously defined variables to the ODS destination |
|
Put a b c @ ; |
Write values of variables a, b & c out, separated by spaces & keep line “open” so that next put statement will continue on. If we reach end of data step iteration then line is “closed” |
|
Put d e @@ ; |
Write values of variables d & e out, separated by spaces & keep line “open”, even if we reach end of data step iteration. |
|
Put @10 name ; |
Write value of name at column 10 |
|
Put @pos name ; |
Write value of name at column specified in variable pos |
|
Put @(3*pos) name ; |
Write value of name at column calculated by value of pos multiplied by 3 |
|
Put a +3 b ; |
Write value of a followed by 3 spaces and then value of b |
|
Put a +gap b ; |
Write value of a followed by a number of spaces specified in variable gap, and then value of b |
|
Put a +(2*gap) b ; |
Write value of a followed by a number of spaces calculated by value of gap multiplied by 2, and then value of b |
|
Put #2 text ; |
Write value of text at line 2 |
|
Put #line text ; |
Write value of text at line specified in variable line |
|
Put #(line*3) text ; |
Write value of text at line calculated by value of line multiplied by 3 |
|
Put line1 / line2 ; |
Write value of line1, then go to a new line and write value of line2 |
|
Put @1 title overprint |
Write value of title and then overprint underscores on that value. This only works on some print destinations and usually looks wrong on the screen. |
|
Put _blankpage_ ; |
Ensure that a totally blank page is produced. This means that if we had written even 1 character on a page, then that page will be written as well as another totally blank page. |
|
Put _page_ ; |
This finishes the current page, causing the next thing we write out to be on a new page. |
|
Put name= phone= ; |
Write the text “name=” followed by the value of name, and then “phone=” followed by the value of phone. |
|
Put my_big_array(*) ; |
Write each element of my_big_array in the form variable=value |
Using PERL regular expressions for searching text
In SAS 9 there are new functions available for using PERL regular expressions to search text for sub-strings. There are two parts to using these, which I demonstrate in the code below.
Note: For a useful quick reference on PERL regular expressions see http://www.erudil.com/preqr.pdf
SAS Program:
data _null_;
if _N_=1 then
do;
retain patternID;
pattern = "/ave|avenue|dr|drive|rd|road/i";
patternID = prxparse(pattern);
end;
input street $80.;
call prxsubstr(patternID, street, position, length);
if position ^= 0 then
do;
match = substr(street, position, length);
put match : $QUOTE. "found in " street : $QUOTE.;
end;
datalines;
153 First Street
6789 64th Ave
4 Moritz Road
7493 Wilkes Place
;
run ;
Lines written to Log
"Ave" found in "6789 64th Ave"
"Road" found in "4 Moritz Road"
Flexible new date format
In SAS version 9 there is a new date informat which interprets dates being read based on the value of a system option. The system option is called DATESTYLE and it is used to identify sequence of month, day and year when the ANYDATE informat data is ambiguous. When dates being read are not ambiguous, then the option is ignored and date is read correctly.
SAS Log
33 options datestyle=mdy;
34 data _null_;
35 date=input('01/02/03',anydtdte8.); * ambiguous date ;
36 put date=date9.;
37 run;
date=02JAN2003
NOTE: DATA statement used (Total process time):
real time 0.51 seconds
cpu time 0.00 seconds
38 options datestyle=ydm;
39 data _null_;
40 date=input('01/02/03',anydtdte8.); * ambiguous date ;
41 put date=date9.;
42 run;
date=02MAR2001
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
43 options datestyle=myd;
44 data _null_;
45 date=input('01/31/2003',anydtdte10.); * unambiguous date, so option ignored ;
46 put date=date9.;
47 run;
date=31JAN2003
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
New IF functions
It’s always worth looking at “What’s New” for the latest SAS release. It often reveals very useful new additions, which often turn out to be available in prior releases of SAS also, though they are undocumented there.
Two of the more useful new functions are the IFN & IFC functions. These are useful for more efficient coding and can be used anywhere a function can be used, even in where clauses or macro language (using %sysfunc). In fact using the IF functions from macro language means that you can use IF logic in open code, rather than being forced to use a macro program.
IFN(condition, true-numeric-value, false-numeric-value, missing-numeric-value)
IFN returns a numeric value. It returns the true, false or missing value depending on whether the condition is true, false or missing.
IFC(condition, true-character-value, false-character-value, missing-character-value)
IFC returns a character value. It returns the true, false or missing value depending on whether the condition is true, false or missing.
Example SAS Log
21 * without IFN function ;
22 data test1 ;
23 set sashelp.class ;
24 * set entry price based on age ;
25 if age>=13 then
26 price=12.50 ;
27 else
28 price=8 ;
29 run ;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.TEST1 has 19 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
30 * with IFN function ;
31 data test2 ;
32 set sashelp.class ;
33 * set entry price based on age ;
34 price=ifn(age>=13,12.50,8) ;
35 run ;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.TEST2 has 19 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
36 %put %sysfunc(ifc(&sysscp=WIN,You are using Windows!,You are not using Windows)) ;
You are using Windows!
Sorting array elements
In SAS 9 there is a new routine which can be used to sort the values of a list of variables passed to it. SORTN should be used for sorting numerics, and SORTC for character values. If the variables belong to an array then these sort routines effectively sort the values of the array. The sorts are always done into ascending sequence, but by specifying the variables in reverse order you can effectively sort in descending sequence.
Note: character variables must be same length.
SAS Log
1 *** SORTING ARRAY ELEMENTS
2 data _null_;
3 array v(50) 8;
4 do i=1 to 50;
5 v(i)=i;
6 end;
7 put ‘Up: ‘ v(1)= v(2)= v(3)= V(48)= v(49)= v(50)=;
8 call sortn(of v50-v1);
9 put 'Down: ' v(1)= v(2)= v(3)= v(48)= v(49)= v(50)=;
10 * sort values between 3 character variables;
11 * note: character variables must be same length to avoid errors;
12 x='3 dogs ';
13 y='1 cat ';
14 z='2 frogs';
15 call sortc(x,y,z);
16 put x= y= z=;
17 run;
Up: v1=1 v2=2 v3=3 v48=48 v49=49 v50=50
NOTE: The SORTN function or routine is experimental in release 9.1.
Down: v1=50 v2=49 v3=48 v48=3 v49=2 v50=1
NOTE: The SORTC function or routine is experimental in release 9.1.
x=1 cat y=2 frogs z=3 dogs
NOTE: DATA statement used (Total process time):
real time 0.17 seconds
cpu time 0.01 seconds
Some basics of creating XML in SAS 9
SAS 9 handles XML extremely well via the XML libname engine. To create an XML file I could use some code like this. Remember to free the libref so you can access the XML file once you have created it.
SAS Code to create XML
libname test xml 'c:\test.xml' ;
data test.sample ;
name='Phil Mason' ;
age=40 ;
sex='M' ;
phone='01491 824905' ;
country='England' ;
run ;
libname test ;
XML created in c:\test.xml
<?xml version="1.0" encoding="windows-1252" ?>
<TABLE>
<SAMPLE>
<name> Phil Mason </name>
<age> 40 </age>
<sex> M </sex>
<phone> 01491 824905 </phone>
<country> England </country>
</SAMPLE>
</TABLE>
What happens if you write multiple datasets to an XML file?
The following code will only write the last file (test.c) to the XML.
libname test xml 'c:\test.xml' ;
data test.a ; x=1 ; run ;
data test.b ; x=1 ; run ;
data test.c ; x=1 ; run ;
libname test ;
This code will also only write the last file (test.c) to the XML, even though the log seems to indicate that they have all been written.
libname test xml 'c:\test.xml' ;
data test.a test.b test.c ; x=1 ; run ;
libname test ;
I could use the following code though, which would create an XML file containing all three tables.
libname test xml 'c:\test.xml' ;
data a b c ; x=1 ; run ;
proc copy in=work out=test ;
select a b c ;
run ;
libname test ;
SAS Code to use XML
To make use of the XML file produced above I could simply point my libname statement at it, and then refer to the table name defined in the XML.
libname test2 xml 'c:\test.xml' ;
proc print data=test2.sample ;
run ;
libname test2 ;
Ignoring case for comparisons
When comparing text in SAS you will note that uppercase and lowercase are not equivalent. So “abc” does not equal “ABC”. Sometimes you may want to compare text and ignore case. There are several ways to do this.
Method 1
If using a where clause you can use the upcase() function against any variables, and make sure that any text literals are in uppercase.
e.g.
proc print data=mydata ;
where upcase(name)='PHIL' ;
run ;
Method 2
In a datastep convert all text to uppercase, so you will then compare uppercase with uppercase. You also need to make sure that any text in custom formats is also in uppercase. The following code demonstrates a simple but effective technique for creating an array containing all character variables and then processing them in some way. You can also use the VNAME() function to handle individual variables as exceptions.
e.g.
data test ;
set sashelp.prdsale ;
* make an array containing all character variables currently defined ;
array c(*) _character_ ;
* loop through array ;
do _i=1 to dim(c) ;
vname=vname(c(_i)) ;
* uppercase each character variable, except COUNTRY ;
if vname^="COUNTRY" then
c(_i)=upcase(c(_i)) ;
end ;
run ;
New currency formats for UK and others
Some of the SAS 9.1.3 features include new National Language formats, including a long awaited currency format for British Pounds. There are a range of others, reflecting a range of currencies from around the world:
Sample SAS Program
DATA _NULL_;
x = 1234.56;
PUT 'UK - ' @20 x NLMNLGBP. @40 x NLMNIGBP.
/ 'Japan - ' @20 x NLMNLJPY. @40 x NLMNIJPY.
/ 'Australia - ' @20 x NLMNLAUD. @40 x NLMNIAUD.
/ 'Euro - ' @20 x NLMNLEUR. @40 x NLMNIEUR.
/ 'New Zealand - ' @20 x NLMNLNZD. @40 x NLMNINZD. ;
RUN;
Sample SAS Log
77 DATA _NULL_;
78 x = 1234.56;
79 PUT 'UK - ' @20 x NLMNLGBP. @40 x NLMNIGBP.
80 / 'Japan - ' @20 x NLMNLJPY. @40 x NLMNIJPY.
81 / 'Australia - ' @20 x NLMNLAUD. @40 x NLMNIAUD.
82 / 'Euro - ' @20 x NLMNLEUR. @40 x NLMNIEUR.
83 / 'New Zealand - ' @20 x NLMNLNZD. @40 x NLMNINZD. ;
84 RUN;
UK - £1,234.56 GBP1,234.56
Japan - JPY1,235 JPY1,235
Australia - AU$1,234.56 AUD1,234.56
Euro - €1,234.56 EUR1,234.56
New Zealand - NZ$1,234.56 NZD1,234.56
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Contact Information
Your comments and questions are valued and encouraged. Contact the author at:
Phil Mason
Wood Street Consultants Ltd.
16 Wood Street
Wallingford, Oxfordshire, OX10 0AY, ENGLAND
E-mail: phil@woodstreet.org.uk
Web: www.woodstreet.org.uk
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.