SAS Procedures

Introduction to Procedures
Sorting and Running a proc by Subgroups
Numerical Summaries
Graphical Summaries

Introduction to Procedures

Procedures come in many forms. Each consists of the proc phrase followed by a set of sub-phrases particular to the procedure invoked. In its simplest form (using the print procedure to illustrate), the proc phrase is

proc print;

This automatically uses the most recently created data set, typically the one from the previous data step or from a previous proc with an output phrase. The form

proc print data=a;

explicitly uses the data set a rather than the most recently created one. Procedures usually produce printed output (in myfile.lst), but they do not create new data sets or add to existing ones unless this is made explicit with an output phrase. For instance,

proc means;
   . . .
   output out = newname . . . ;

This explicitly creates the data set newname. Each proc has its own sub-phrases (the first ". . ." above) and its own set of variables that can be added to the new data set. The general form of the output phrase is:

   output out=d1 a=a1 b=b1 c=c1;

with out= being the keyword for the data set name d1, and a=, b= and c= being any number of optional keywords for new variables. The names after the equals signs (a1, b1 and c1, respectively) are up to you; they are the variable names that you can use later. Now to specifics. Here I give some phrases which may be useful. Others can be found in the SAS/STAT book. For the output phrase, I indicate some keywords.
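
To make the output phrase concrete, here is a minimal sketch. It assumes a small made-up data set a with variables x and y; the data values and the names stats, mx, my, sx and sy are illustrative only and can be anything you like.

```sas
data a;                         /* hypothetical example data */
   input x y;
   datalines;
1 2
2 4
3 9
;
proc means data=a noprint;      /* suppress the printed summary */
   var x y;
   output out=stats mean=mx my std=sx sy;  /* means and SDs of x and y */
proc print data=stats;          /* look at the new data set */
run;
```

The data set stats should hold a single observation with mx, my, sx and sy, plus the automatic variables _TYPE_ and _FREQ_ that proc means adds.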

Sorting and Running a `proc` by Subgroups

Sometimes it is very helpful to run each of several subgroups through some summary or analysis procedure. This can be done with the sort procedure and use of the by phrase:

proc sort; by trt;
proc means; by trt;

will first sort the data by treatment trt and then run the means procedure separately for each treatment group. This is much cleaner than running SAS 3 times, each time retaining only the treatment group under study. However, it does produce a lot more output! The by phrase is available in nearly all procedures, BUT you MUST sort before you use it. You can sort by several variables at once:

proc sort; by sex trt;
proc means; by sex trt;

Sorting is cheap. It is a good idea to always run proc sort before using by with other procedures, even if you think you did it earlier in your program.
NOTE: While you get a separate printed listing for each treatment (in myfile.lst), an output phrase used together with by still creates only one data set, with one observation per by group.
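
The note above can be checked with a small sketch. The data set a, its values, and the names trtmeans and my are assumptions made up for illustration.

```sas
data a;                         /* hypothetical example data */
   input trt $ y;
   datalines;
A 10
B 14
A 12
B 18
;
proc sort data=a; by trt;       /* always sort before using by */
proc means data=a noprint; by trt;
   var y;
   output out=trtmeans mean=my; /* one data set for all treatments */
proc print data=trtmeans;       /* one observation per value of trt */
run;
```

The printed listing of trtmeans should show one row for each treatment, not a separate data set per treatment.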

Numerical Summaries

proc univariate;		/* detailed univariate summaries */
   var x y;			/* for variables x and y */
   output out=b mean=mx std=sx;	/* create set b with mean and SD for x only */

proc means;			/* means, SDs, min, max for each variable */
   var x y;			/* for variables x and y */
   output out=b mean=mx my std=sx sy;/* output means and SD for x and y */

proc means noprint;		/* useful form if you do not want printout */
   var x y;
   output out=b mean=mx my std=sx sy;/* output means and SD for x,y */

Graphical Summaries

The main character-based graphic routines are univariate plot (one-dimensional) and plot (two-dimensional). There is a system of fancy graphics routines (beginning with the letter g) introduced briefly at the end of this section. [Feedback so far is that manuals for these are confusing.] In addition, SAS has a module called INSIGHT which some have found very nice for graphics and general user interface. [I have no experience with this -- volunteers for documentation out there?]

proc univariate plot normal;	/* histogram type summaries */
   var x;

The plot option to proc univariate produces a stem-and-leaf plot, a box plot, and a normal probability plot. The normal option tests for normal distribution.

proc plot;			/* scatter plot */
   plot y*x;			/* plot y vertical and x horizontal */
   plot y*x='*';		/* use "*" as plotting symbol */
   plot y*z=trt;		/* use value of trt as plotting symbol */
   plot y*x='*' y*z / overlay;	/* overlay two plots on same page */

Here is a way to construct Interaction Plots. It gives you a plot of the average values of y for each period and trt.

proc sort; by period trt;
proc means noprint; by period trt;
   var y;
   output out=means mean=my;
proc plot;
   plot my*period=trt;

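The noprint option used in proc means is available for many procedures. Sometimes it can be very handy in shortening output. You can also produce a separate plot for each level of another variable by adding a by phrase to proc plot (after sorting by that variable).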
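
Here is a sketch of plotting by another variable, under stated assumptions: the data set a and its values are made up, and the cell means are re-sorted by trt so that proc plot can use by trt.

```sas
data a;                         /* hypothetical example data */
   input period trt $ y;
   datalines;
1 A 10
1 A 12
1 B 14
1 B 18
2 A 11
2 A 15
2 B 20
2 B 22
;
proc sort data=a; by period trt;
proc means data=a noprint; by period trt;
   var y;
   output out=means mean=my;    /* cell means of y */
proc sort data=means; by trt;   /* re-sort before using by trt */
proc plot data=means; by trt;   /* one plot per treatment */
   plot my*period;
run;
```

This gives one plot of mean y against period for each treatment, instead of a single plot with trt as the plotting symbol.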

Here is a fancier way to construct Interaction Plots and some Diagnostic Plots, which allows you to use the full value of statistical modelling. The basic idea is to fit the desired model, save the least squares means (lsmeans) as a data set, and plot from that. Diagnostic information is saved with the output phrase.

proc glm;
   class a b;
   model y = a | b;
   lsmeans a*b / out=lsm;
   output out=diag p=py r=ry;
proc plot data=lsm;			/* Interaction Plot */
   plot lsmean*a=b;			/* cell mean v. a by b */
   plot lsmean*b=a;			/* cell mean v. b by a */
proc plot data=diag;			/* Diagnostic Plots */
   plot y*py py*py='*' / overlay;	/* observed v. predicted */	
   plot ry*py;				/* residual v. predicted */

Here is a way to keep uniform axes for separate plots by location:

proc plot uniform; by location;
   plot y*x;

You can set several plot features:

   plot y*x / vaxis=10 to 100 by 5;	/* vertical axis ticks */
   plot y*x / haxis=10 to 20 by 2;	/* horizontal axis ticks */
   plot y*x / vzero hzero;		/* include origin on plot */
   plot ry*py / vref=0;			/* horizontal reference line at 0 */
   plot y*x py*x='*' / overlay;		/* overlay two plots */

Fancy Graphics

Here is some brief information about fancy graphics with SAS. The goptions phrase allows you to pick a device (there is an X-windows device, but I forget what it is called). The first example creates a biplot with text labels (cf. proc prinqual in SAS/STAT book) as a two-dimensional generic Tektronix window (say on a Macintosh).

goptions device=tek4014;
data biplot; set results;
   if _type_ = 'SCORE' then do; text = substr(_name_,4,5); end;
   if _type_ = 'CORR' then do;  text = substr(_name_,1,3); end;
   x = prin1; y = prin2; z = prin3;
   xsys = '2'; ysys = '2'; zsys = '2';
   size = 1;
   label z = 'Dimension 3'; label y = 'Dimension 2'; label x = 'Dimension 1';
   keep x y z text xsys ysys zsys size;
proc gplot;
   title3 'BiPlot of Stimuli and Chemicals';
   symbol1 v=none;
   plot y*x=1 / annotate=biplot frame href=0 vref=0;

The second example creates a 3-dimensional plot. The third example creates the same 3-D plot as PostScript output. The filename phrase assigns the fileref gsasfile to the file hatps.dat, and gaccess=gsasfile directs the graphics output to that file rather than straight to a print device; the file can then be sent to a PostScript printer.

goptions device=tek4014;
proc g3d;
   plot y*x=z / tilt=45 rotate=45;

filename gsasfile 'hatps.dat';
goptions device=ps hsize=8.5 vsize=8.5 gaccess=gsasfile;
proc g3d;
   plot y*x=z / tilt=45 rotate=45;
