    proc reg;                        /* simple linear regression */
      model y = x;
    run;
    proc reg;                        /* weighted linear regression */
      model y = x;
      weight w;
    run;
    proc reg;                        /* multiple regression */
      model y = x1 x2 x3;
    run;

The model phrase indicates which variable is the response (y) and which are predictors (x, or x1, x2, x3). Here are some print options for the model phrase:
    model y = x / noint;             /* regression with no intercept */
    model y = x / ss1;               /* print type I sums of squares */
    model y = x / p;                 /* print predicted values and residuals */
    model y = x / r;                 /* option p plus residual diagnostics */
    model y = x / clm;               /* option p plus 95% CI for estimated mean */
    model y = x / cli;               /* option p plus 95% prediction interval for a new value */
    model y = x / r cli clm;         /* options can be combined */

CAUTION: SAS listings label the standard error of the estimated mean as STD ERROR PREDICT. Be wary, and know what these quantities mean! Some of the residual diagnostics go beyond the material covered here; you may explore these on your own.
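For simple linear regression, the two standard errors behind clm and cli are the standard textbook formulas below, stated here for reference (the notation is ours, not SAS's). The only difference is the extra 1 inside the second root, which accounts for the scatter of a single new observation about its mean; each interval is the predicted value plus or minus t with n - 2 degrees of freedom times the corresponding standard error.

```latex
\operatorname{se}(\hat\mu_0) = \sqrt{\mathrm{MSE}\left(\frac{1}{n} + \frac{(x_0-\bar x)^2}{S_{xx}}\right)},
\qquad
\operatorname{se}(\hat y_{\mathrm{new}}) = \sqrt{\mathrm{MSE}\left(1 + \frac{1}{n} + \frac{(x_0-\bar x)^2}{S_{xx}}\right)},
\qquad
S_{xx} = \sum_{i=1}^{n} (x_i-\bar x)^2,
\qquad
\hat y_0 \pm t_{0.975,\,n-2}\,\operatorname{se}
```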
SAS can also predict new observations and/or estimate mean responses for you. To do this, enter the x values (or x1, x2, x3 for multiple regression) you are interested in during the data input step, but put a period (.) for the unknown y value. For example:
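    data new;
      input x y;
      cards;
    1 0
    2 3
    3 .
    4 3
    5 6
    ;
    proc reg data=new;
      model y = x / r cli clm;
    run;

Observations with a missing y are not used to fit the model, but SAS still reports their predicted values and intervals. Try it, and check the standard errors and confidence intervals by hand. Here are some other model options for more advanced work: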
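The "check by hand" step can be sketched in a few lines of Python, used here purely as a calculator. It fits the line to the four complete rows of the example data and computes, at x = 3, the standard error of the estimated mean (what SAS labels STD ERROR PREDICT) and the larger standard error for a new observation. The t quantile is typed in from a t table, since the Python standard library has no t distribution.

```python
import math

# Data from the example above. The row with y missing (x = 3) is left out
# of the fit and used only as the point at which to predict.
xs = [1, 2, 4, 5]
ys = [0, 3, 3, 6]
x0 = 3

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b1 = sxy / sxx                      # slope
b0 = ybar - b1 * xbar               # intercept

# Mean squared error with n - 2 degrees of freedom.
df = n - 2
mse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / df

yhat0 = b0 + b1 * x0
lev = 1 / n + (x0 - xbar) ** 2 / sxx
se_mean = math.sqrt(mse * lev)          # SE of estimated mean ("STD ERROR PREDICT")
se_new = math.sqrt(mse * (1 + lev))     # SE for predicting a new observation

# t(0.975, 2) copied from a t table: the standard library has no t quantile.
t = 4.302653
clm = (yhat0 - t * se_mean, yhat0 + t * se_mean)
cli = (yhat0 - t * se_new, yhat0 + t * se_new)

print(f"fit: y = {b0:.3f} + {b1:.3f} x, MSE = {mse:.3f} on {df} df")
print(f"x0 = {x0}: predicted {yhat0:.3f}")
print(f"  SE mean {se_mean:.4f}, 95% CLM ({clm[0]:.3f}, {clm[1]:.3f})")
print(f"  SE new  {se_new:.4f}, 95% CLI ({cli[0]:.3f}, {cli[1]:.3f})")
```

For these data the fitted line is y = -0.6 + 1.2x with MSE = 1.8, so the prediction at x = 3 is 3.0, with standard errors sqrt(0.45) ≈ 0.671 (mean) and 1.5 (new observation). Compare these with the SAS listing.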
    model y = x / covb;              /* covariance matrix for estimates */
    model y = x / collin;            /* collinearity diagnostics */
    model y = x / collinoint;        /* collin without intercept */

The output phrase can have several keywords (which can be used together):
    output out=b predicted=py;       /* predicted values in "py" */
    output out=b p=py;               /* same as predicted */
    output out=b residual=ry;        /* residual values in "ry" */
    output out=b r=ry;               /* same as residual */
    output out=b stdr=sr;            /* standard errors of residuals in "sr" */
    output out=b student=sy;         /* studentized residuals in "sy" */

Only one output phrase can be used, but you can combine keywords on one line:
    output out=b p=py r=ry stdr=sr student=sy;

The new variables created in data set b are available for later plotting and other analyses.
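To see what the four output keywords hold, here is a small Python sketch (a calculator, not SAS) that computes them for the four complete rows of the earlier example. It uses the simple-regression leverage h = 1/n + (x - mean x)^2 / Sxx; stdr is sqrt(MSE(1 - h)) and student is the residual divided by stdr. The variable names mirror py, ry, sr, sy above only for readability.

```python
import math

# Same data as the prediction example (the row with y missing is dropped).
xs = [1, 2, 4, 5]
ys = [0, 3, 3, 6]
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
b0 = ybar - b1 * xbar

py = [b0 + b1 * x for x in xs]                   # like p=py
ry = [y - p for y, p in zip(ys, py)]             # like r=ry
mse = sum(r ** 2 for r in ry) / (n - 2)
h = [1 / n + (x - xbar) ** 2 / sxx for x in xs]  # leverages (hat diagonal)
sr = [math.sqrt(mse * (1 - hi)) for hi in h]     # like stdr=sr
sy = [r / s for r, s in zip(ry, sr)]             # like student=sy

for row in zip(xs, ys, py, ry, sr, sy):
    print("x=%g y=%g py=%.3f ry=%.3f sr=%.4f sy=%.4f" % row)
```

Points far from the mean of x have larger leverage and therefore a smaller stdr, which is why studentizing puts the residuals on a common scale.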
    proc anova;                      /* one-way analysis of variance */
      class trt;
      model y = trt;
    run;
    proc anova;                      /* one-way with multiple comparisons */
      class trt;
      model y = trt;
      means trt / lsd snk;           /* LSD and Student-Newman-Keuls */
    run;
    proc anova;                      /* two-way anova */
      class fert var;
      model y = fert var;
      means fert var / lsd;          /* means by fert and var with LSD */
    run;
    proc anova;                      /* two-way anova with interaction */
      class fert var;
      model y = fert var fert*var;   /* interaction signified by asterisk */
      means fert var / lsd;
      means fert*var;                /* for each fert-var combination */
    run;

The class phrase is required; it identifies all factors as categorical variables. The model phrase has only a few options, and these are rarely used. The means phrase is quite handy for multiple comparisons. Options include:
    means trt / t;                   /* Least Significant Difference */
    means trt / lsd;                 /* Least Significant Difference (same as t) */
    means trt / bon;                 /* Bonferroni */
    means trt / snk;                 /* Student-Newman-Keuls */
    means trt / lsd alpha=.05;       /* LSD at level 5% (default) */
    means trt / lsd lines;           /* list means in order, grouping similar ones */
    means trt / lsd cldiff;          /* confidence intervals for pairwise differences */

The lines option is the default when the data are balanced. The cldiff option can be useful at times, but it gives confidence intervals only for the differences, not for the means themselves. None of these options works for two-way combinations such as means fert*var;.
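The LSD comparison behind means trt / lsd declares two means different when their difference exceeds t(alpha/2, error df) * sqrt(MSE (1/n_i + 1/n_j)). The Python sketch below applies that rule to hypothetical one-way data (three treatments, three replicates each). The t quantile for 6 error degrees of freedom is typed in from a t table; Bonferroni would use the same formula with a larger t.

```python
import math
from itertools import combinations

# Hypothetical one-way data: three treatments, three replicates each.
data = {
    "A": [10, 12, 11],
    "B": [15, 16, 14],
    "C": [12, 11, 13],
}

means = {k: sum(v) / len(v) for k, v in data.items()}
sse = sum((y - means[k]) ** 2 for k, v in data.items() for y in v)
df_error = sum(len(v) for v in data.values()) - len(data)
mse = sse / df_error

# t(0.975, df = 6) from a t table (the standard library has no t quantile).
t_crit = 2.446912

significant = set()
for a, b in combinations(sorted(data), 2):
    lsd = t_crit * math.sqrt(mse * (1 / len(data[a]) + 1 / len(data[b])))
    diff = means[a] - means[b]
    if abs(diff) > lsd:
        significant.add((a, b))
    print(f"{a}-{b}: diff={diff:+.2f} LSD={lsd:.3f}",
          "*" if abs(diff) > lsd else "")
```

Here MSE = 1 and LSD ≈ 2.0, so B differs from both A and C, while A and C (means 11 and 12) are not separated.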
If you want to save predicted values or residuals, or to evaluate contrasts, you must use proc glm instead of proc anova. See below.
    proc glm;                        /* analysis of covariance */
      class trt;                     /* trt = factor, x = covariate */
      model y = x trt;
    run;
    proc glm;                        /* analysis of covariance */
      class trt;                     /* with different slopes */
      model y = x trt x*trt;
    run;

More advanced use of ANCOVA can be found in the section on Multiple Responses.
    proc glm;                        /* simple linear regression */
      model y = x / solution;
    run;
    proc glm;                        /* weighted linear regression */
      model y = x / solution;
      weight w;
    run;
    proc glm;                        /* multiple regression */
      model y = x1 x2 x3 / solution;
    run;
    proc glm;                        /* one-way analysis of variance */
      class trt;
      model y = trt;
    run;
    proc glm;                        /* additive two-factor anova */
      class fert var;
      model y = fert var;
    run;
    proc glm;                        /* full two-factor anova */
      class fert var;
      model y = fert | var;          /* same as fert var fert*var */
    run;
    proc glm;                        /* analysis of covariance */
      class trt;                     /* trt = factor, x = covariate */
      model y = x trt;
    run;
    data testlin;                    /* copy level as a continuous variable x */
      set resps;
      x = level;
    run;
    proc glm data=testlin;           /* test for non-linearity */
      class level;
      model resp = x level;          /* Type I SS for level = lack of fit */
    run;

The class phrase works as in proc anova. However, here the model can contain both categorical variables (identified in class) and continuous variables. The model phrase indicates which variable is the response (y) and which are predictors (x, or x1, x2, x3). If there is a class phrase, you won't get parameter estimates unless you ask for them with the solution option. Here are some options:
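The non-linearity test works because x enters first: the Type I sum of squares for level, after x, is the error sum of squares of the straight line minus that of the cell-means model, on (number of levels - 2) degrees of freedom. The Python sketch below does that arithmetic for hypothetical data (three levels, two replicates each); it is a calculator check, not what SAS runs internally.

```python
# Hypothetical data: response at three levels, two replicates each.
level = [1, 1, 2, 2, 3, 3]
resp = [1, 3, 5, 7, 6, 8]

n = len(level)
xbar = sum(level) / n
ybar = sum(resp) / n
sxx = sum((x - xbar) ** 2 for x in level)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(level, resp)) / sxx
b0 = ybar - b1 * xbar
sse_line = sum((y - b0 - b1 * x) ** 2 for x, y in zip(level, resp))

# Cell-means model: one mean per level (what "class level" fits).
groups = {}
for x, y in zip(level, resp):
    groups.setdefault(x, []).append(y)
cell_mean = {k: sum(v) / len(v) for k, v in groups.items()}
sse_cells = sum((y - cell_mean[x]) ** 2 for x, y in zip(level, resp))

k = len(groups)
ss_lack = sse_line - sse_cells       # Type I SS for level after x
df_lack = k - 2
ms_pure = sse_cells / (n - k)        # error mean square of the full model
f_lack = (ss_lack / df_lack) / ms_pure
print(f"lack-of-fit SS={ss_lack:.3f} on {df_lack} df, "
      f"F={f_lack:.3f} on ({df_lack}, {n - k}) df")
```

For these numbers the line leaves SSE = 9, the cell means leave 6, so the lack-of-fit SS is 3 on 1 df and F = 3 / 2 = 1.5.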
    model y = trt x / solution;      /* print parameter estimates and SEs */
    model y = x / noint;             /* no intercept (as in proc reg) */
    model y = x / ss1;               /* print only type I sums of squares */
    model y = x / ss2;               /* print only type II sums of squares */
    model y = x / p;                 /* print predicted values and residuals */
    model y = x / clm;               /* option p plus 95% CI for estimated mean */
    model y = x / cli;               /* option p plus 95% prediction interval */
    model y = x / cli alpha=.01;     /* only .01, .05 and .10 available */

By default, SAS estimates the model parameters by setting the estimate for the last level of each factor to 0. Thus if there are 3 treatment groups, the estimated mean for group 1 is the intercept plus the estimate for trt=1, and the estimated mean for group 2 is the intercept plus the estimate for trt=2; the estimated mean for group 3 is just the intercept, since the estimate for trt=3 is 0. This can be changed by another option.
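In a one-way model the fitted group means are the sample means, so the "last level set to 0" estimates follow directly from them. The Python sketch below shows that mapping for hypothetical group means; it illustrates the parameterization described above and does not reproduce SAS's output format.

```python
# Hypothetical estimated means for treatment levels 1, 2, 3.
group_means = {1: 20.0, 2: 26.0, 3: 23.0}

# Reference-level coding: the last (sorted) level gets estimate 0,
# so the intercept equals that level's mean.
last = sorted(group_means)[-1]
intercept = group_means[last]
estimate = {k: m - intercept for k, m in group_means.items()}

for k in sorted(group_means):
    print(f"trt={k}: estimate {estimate[k]:+.1f}; "
          f"intercept + estimate = {intercept + estimate[k]:.1f}")
```

Each estimate is therefore a difference from the last group, not a group mean in its own right.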
The means phrase works much the same in proc glm as in proc anova. Contrasts can be set up if means aren't enough. Here is an example from the glue data. The contrast phrase contains a quoted title, the variable name, and the contrast coefficient values. Note that the order of factor levels is lexicographic (sorted), which may not be what you expect; you can check it by examining the order of the levels in the output of the solution option to the model phrase. Further, contrasts can get very complicated for higher-order designs. Consult a book for further help.
    contrast 'A vs. rest' glue 1 -.25 -.25 -.25 -.25;
    contrast 'BD vs. CE'  glue 0  .5  -.5   .5  -.5;

Predicted and residual (and other) values can be passed to other procedures and data steps using the output phrase, in the same manner as in proc reg.
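Each contrast has an estimate L = sum of c_i times mean_i and a one-degree-of-freedom sum of squares L^2 / sum(c_i^2 / n_i); SAS then divides that SS by the error mean square to get its F test. The Python sketch below evaluates both glue contrasts for hypothetical group means (levels A to E, 4 observations each). These numbers are made up for illustration and are not the glue data.

```python
# Hypothetical group means for glue levels A..E (sorted order), 4 obs each.
means = {"A": 10.0, "B": 8.0, "C": 6.0, "D": 9.0, "E": 7.0}
n = {k: 4 for k in means}

contrasts = {
    "A vs. rest": {"A": 1, "B": -.25, "C": -.25, "D": -.25, "E": -.25},
    "BD vs. CE": {"A": 0, "B": .5, "C": -.5, "D": .5, "E": -.5},
}

results = {}
for label, c in contrasts.items():
    # Contrast coefficients sum to zero.
    assert abs(sum(c.values())) < 1e-12
    est = sum(c[k] * means[k] for k in means)               # estimate L
    ss = est ** 2 / sum(c[k] ** 2 / n[k] for k in means)    # 1-df SS
    results[label] = (est, ss)
    print(f"{label}: L={est:.3f}, SS={ss:.3f}")
```

Note that the coefficients are matched to levels by position, which is why the sorted order of the levels matters.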
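For a two-factor model with factors A and B and interaction A*B, each type of sum of squares adjusts a term for the terms listed after the bar, where u denotes the overall mean (intercept):

    Source   Type I SS        Type II SS       Type III or IV SS
    A        SS(A|u)          SS(A|u,B)        SS(A|B,AB)
    B        SS(B|u,A)        SS(B|u,A)        SS(B|A,AB)
    A*B      SS(A*B|u,A,B)    SS(A*B|u,A,B)    SS(A*B|A,B)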
The Type II approach is appropriate for model building and is the natural choice for regression.
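The difference between Type I and Type II matters most when predictors are correlated. The Python sketch below (a small least-squares calculator, not SAS code) fits hypothetical data with correlated x1 and x2 and computes both types from residual sums of squares, following the table above. The helper functions solve and sse are written here for illustration only.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def sse(y, cols):
    """Residual SS from least squares of y on an intercept plus the given columns."""
    X = [[1.0] + [col[i] for col in cols] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


# Hypothetical data with correlated predictors x1 and x2.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3, 4, 8, 7, 12, 11]

sse_u = sse(y, [])
sse_1 = sse(y, [x1])
sse_2 = sse(y, [x2])
sse_12 = sse(y, [x1, x2])

type1 = {"x1": sse_u - sse_1, "x2": sse_1 - sse_12}    # SS(x1|u), SS(x2|u,x1)
type2 = {"x1": sse_2 - sse_12, "x2": sse_1 - sse_12}   # SS(x1|u,x2), SS(x2|u,x1)
print("Type I :", type1)
print("Type II:", type2)
```

The last term entered gets the same SS under both types, but x1 drops from 56.7 (Type I, entered first) to about 3.8 (Type II, adjusted for x2), because x1 and x2 carry much of the same information.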
Type III and Type IV tests differ only if the design has empty cells. SAS automatically gives you Types I and III with proc glm. You can explicitly choose types with options to the model phrase:
    proc glm;
      class a b;
      model y = a b a*b / ss1 ss2 ss3 ss4;   /* select all 4 types */
    run;
    proc glm;
      class a b;
      model y = a | b / e;                   /* general form of estimable functions */
    run;
    proc glm;
      class a b;                             /* estimable function coefficients */
      model y = a | b / e1 e2 e3;            /* for Types I, II, III */
    run;
    proc stepwise;
      model y = x1 x2 x3;
    run;

Here are model options for the method of selection and elimination:
    model y = x1 x2 x3 / forward;        /* forward selection */
    model y = x1 x2 x3 / backward;       /* backward elimination */
    model y = x1 x2 x3 / stepwise;       /* forward in & backward out */
    model y = x1 x2 x3 / maxr stop=4;    /* like stepwise, but using R^2 */

The cheapest methods are backward (or b) and forward (or f). The stepwise option (the default) is not much more costly and is a good idea in practice, as it checks back and forth. The maxr option is much more expensive, but it does consider pairs of variables in ways possibly missed by stepwise; the stop=4 option to maxr considers only models with 4 or fewer variables, at considerable time savings. There is an alternative to maxr (called minr) which is even more costly.
    model y = x1 x2 x3 / noint;          /* no intercept */
    model y = x1 x2 x3 / slentry=0.5;    /* signif. level for selection */
    model y = x1 x2 x3 / slstay=0.1;     /* signif. level for elimination */
    model y = x1 x2 x3 / include=2;      /* force in first 2 variables */
    model y = x1 x2 x3 / start=2;        /* start with 2 variables */
    model y = x1 x2 x3 / details;        /* more details of R^2, F stats */

The significance levels slentry (or sle) and slstay (or sls) shown are the defaults (but 0.15 is used for both selection and elimination with the stepwise option). The include option is useful if you want to force certain variables to always be in the model. The start option indicates how many variables must be in the model before elimination is considered (stepwise and maxr only).
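The forward method can be sketched as a loop: at each step, add the candidate with the largest partial F, and stop when none is large enough. One substitution should be noted. proc stepwise compares a significance level (slentry), but this Python sketch compares the partial F with a fixed cutoff, the hypothetical parameter f_enter, because the standard library has no F distribution. It also does forward selection only, with none of the back-checking that stepwise does. The data are a hypothetical 2^3 factorial in which y depends strongly on x1, weakly on x3, and not at all on x2.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def sse(y, cols):
    """Residual SS from least squares of y on an intercept plus the given columns."""
    X = [[1.0] + [col[i] for col in cols] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def forward_select(y, candidates, f_enter=4.0):
    """Add the variable with the largest partial F until none reaches f_enter."""
    chosen = []
    n = len(y)
    while len(chosen) < len(candidates):
        current = sse(y, [candidates[c] for c in chosen])
        df_err = n - len(chosen) - 2        # intercept + chosen + 1 new variable
        best, best_f = None, f_enter
        for name in candidates:
            if name in chosen:
                continue
            new = sse(y, [candidates[c] for c in chosen + [name]])
            f = (current - new) / (new / df_err)
            if f >= best_f:
                best, best_f = name, f
        if best is None:
            break
        chosen.append(best)
    return chosen


# Hypothetical 2^3 factorial data.
x1 = [-1, 1, -1, 1, -1, 1, -1, 1]
x2 = [-1, -1, 1, 1, -1, -1, 1, 1]
x3 = [-1, -1, -1, -1, 1, 1, 1, 1]
y = [7, 11, 5, 13, 9, 13, 7, 15]

print(forward_select(y, {"x1": x1, "x2": x2, "x3": x3}))
```

Here x1 enters first (partial F = 27) and x3 second (F = 5); x2 never qualifies. Raising f_enter to 6 would stop after x1, which is the same trade-off that a smaller slentry makes.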
Last modified: Mon Jun 19 14:23:40 1995 by Brian Yandell Wed Mar 22 11:26:59 1995 by Stat Www (statwww@stat.wisc.edu)