proc anova; /* randomized complete block */ class block trt; model y = block trt; proc anova; /* three-way anova with all interactions */ class a b c; model y = a b c a*b a*c b*c a*b*c; proc anova; /* three-way anova--short hand */ class a b c; model y = a | b | c; /* use a vertical bar between factors a,b,c */
proc anova; /* nested anova for balanced design */ class block var fert; model y = fert block(fert) var var*fert; test h = fert e = block(fert); proc glm; /* nested anova -- more general form */ class block var fert; model y = fert block(fert) var var*fert; test h = fert e = block(fert); random block(fert) / test; lsmeans var var*fert / stderr pdiff; lsmeans fert / stderr pdiff e = block(fert);The proc glm form almost works if there are missing data. However, the lsmeans for the whole-plot factor (fert) does not work properly; the standard errors of estimates are usually wrong for nested designs, and some combinations are deemed not-estimable when there is missing data. The random phrase with the test option uses the Satterthwaite approximation, which is somewhat better than the test phrase, as it matches expected mean squares.
The newer proc mixed (version 6.0.9) handles nested designs in a natural way, fixing many of the shortcomings of the proc glm approach. Here is the setup for the split plot used above.
proc mixed; class block var fert; model yield = fert var var*fert / outpm = blue; random block block*fert / solution; lsmeans fert | var; ods output SolutionR = random; proc sort data = blue; by fert block; proc plot data = blue; by fert block; plot Resid*Pred = var;The preferred way to test fixed effects is with the anova tests that come naturally with proc mixed. However, inference for random effects should be done by comparing likelihood ratios with and without the variance component of interest. This involves running proc mixed twice. ThHere is a SAS macro called compmix that can assist in this process.
The above code includes a residual plot of the predicted (estimated fixed effects) against the residual (random effects). If you want to examine predictors of random effects (BLUPs) or separate residuals based on the size of experimental unit, here is some code that might help using proc mixed. If the equation is
y = Xb + Zu + ethen Pred in dataset blue estimates Xb, Pred in dataset blup predicts Xb+Zu, and Resid in each of these is y-Pred. The dataset random has Estimate equal to the predictors of u. See the Syntax and Linear Models Theory for proc mixed for further details. Here is the code:
proc mixed; class block var fert; model yield = fert var var*fert / outpp = blup outpm = blue; random block block*fert / solution; ods output SolutionR = random; /* Output data set random contains the random effects */ proc print data = random; /* sort out whole plot and split plot residuals */ data blup; set blup; /* best linear unbiased predictors */ blup = pred; spresid = resid; /* split plot residual */ drop Pred L95 U95 Resid; data blue; set blue; /* best linear unbiased estimators */ blue = pred; /* estimate of fixed effects */ wpresid = resid; drop Pred L95 U95 Resid; proc sort data = blup; by fert block var; proc sort data = blue; by fert block var; data both; merge blue blup; by fert block var; wpresid = wpresid - spresid; /* whole plot residual */ proc sort data = both; by fert block; proc plot data = both; by fert block; plot spresid*wpresid = var;
proc reg; /* fit multiple responses in regression */ model y1 y2 y3 = x;Sometimes multiple responses are correlated, and it is desireable to examine their relationship to each other adjusted by the experimental design. Alternatively, one may simply want to identify the "best" response, or linear combination of responses, which shows off treatment differences. Several related approaches and procedures are available. Note that these multivariate methods may be less powerfull than univariate analyses if responses are not correlated. Further, there is no guarantee that multivariate analysis leads to ready, intuitive interpretation. Multivariate ANOVA, or MANOVA, includes several related methods for examining these questions. It is invoked by a manova phrase with either proc anova or proc glm.
proc glm; /* multivariate analysis of variance */ class a; model y1 y2 y3 = a; manova h = a / printe printh; /* see below for printe, printh */The manova phrase results in several overall multivariate tests (Wilk's lambda, Pillai's trace, Hotelling-Lawley trace and Roy's greatest root) which, in different ways, reduce the responses on the left of the model to one dimension. The printe option produces the error covariance matrix (almost -- need to divide by degrees of freedom!) and the matrix of partial correlations among the responses. The printh option yields the hypothesis SS matrix for the factor (a).
Discriminant Analysis (DA) is a closely related technique which allows one to identify linear combinations of responses that do the "best" in discriminating among groups. The goal is to reduce "dimensionality" by either focusing on only a few responses (stepwise) or finding a few linear combinations (canonical). Often these two approaches are used in tandem.
Stepwise DA uses proc stepdisc to pick the most significant responses, in order, that explain group differences. It can also be performed using a sequence of analyses of covariance (ANCOVA) via proc glm and checking factors adjusted for covariates; however this produces a lot more output!.
proc stepdisc data=iris bsscp tsscp; /* Stepwise DA */ class species; var seplen sepwid petlen petwid; proc glm; /* Stepwise DA using PROC GLM (i.e. ANCOVA) */ class species; model seplen sepwid petlen petwid = species / ss1; proc glm; /* check species adjusted for petlen */ class species; model seplen sepwid petwid = petlen species / ss1; proc glm; /* check species adjusted for petlen+sepwid */ class species; model petwid = petlen sepwid species / ss1; proc glm; /* check species adjusted for 3 covariates */ class species; model seplen = petlen sepwid petwid species / ss1;Canonical DA can use all multiple responses, or a subset determined by Stewise DA. The eigenvalues, in decreasing value, indicate the ratio of SS_H to SS_E for each linear combination of responses. Typically, one only examines the significant canonical variates, as assessed by an F-test. Canonical variate correlations (called Total Canonical Structure) are a good method to assess the "importance" of responses by noting how strongly they are correlated with canonical variates. The weights (called Raw Canonical Coefficients) are poor indicators of importance as they are greatly influenced by correlation among responses.
proc candisc out=can distance anova; /* canonical DA */ class species; var seplen sepwid petlen petwid; proc plot data=can; /* plot of first two canonical variates */ plot can2*can1=species;Step-Down Analysis assumes that the scientist has a predetermined idea of cause and effect to guide analysis of group differences. That is, groups may affect y1, which in turn affects y2 and so on. This can be analysed using a sequence of ANCOVA via proc glm and checking factors adjusted for covariates, much as in Stepwise DA above. The difference here is that the order of entry of responses is specified in advance by the scientist (based on knowledge from other experiments) rather than ordering based on data analysis.
proc glm; /* Step-Down Analysis */ class a; model y1 = a / ss1; proc glm; class a; model y2 = y1 a / ss1; /* check factor a adjusted for y1 */ proc glm; class a; model y3 = y1 y2 a / ss1;
proc anova; /* repeated measures with polynomials */ class trt; model t1 t2 t3 t4 = trt; means trt; repeated time 4 (1 2 3 4) polynomial / summary; proc glm; /* repeated measures with profile */ class trt; model t1 t2 t3 t4 = trt; means trt; repeated time 4 (1 2 3 4) profile / summary;The repeated phrase has several components: variable name (time), number of levels (4), values of levels (1,2,3,4), types of curves over time (polynomial or profile). The option summary insures that the polynomial or profile analyses are printed.
Several overall multivariate tests (Wilk's lambda, Pillai's trace, Hotelling-Lawley trace and Roy's greatest root) reduce, in different ways, the responses over time to one dimension. These tests are the same ones considered for the manova phrase discussed above.
Then new proc mixed (version 6.0.9) handles repeated measures in a very natural way. It is newer, more general and can handle missing data. However, its options and setup are different from the above. Best to check out (Technical Report P-229) for full details. Here are three ways to fit the compound symmetry model (with repeated measures over time on each unit):
proc mixed; /* compound symmetry */ class trt unit; model y = trt | time / solution; repeated / type = cs subject = unit; lsmean trt | time; proc mixed; /* compound symmetry */ class trt unit; /* note change in DDF */ model y = trt | time / solution; random unit / solution; proc mixed; /* compound symmetry */ class trt unit; /* no ratio estimate */ model y = trt | time; repeated intercept diag / subject = unit;The solution option to the model (or random) phrase prints the solution for the fixed (or random) effects. The lsmeans phrase computes correct estimates and standard errors, but does not have the same options as found for proc glm. Note in addition the different form of the repeated phrase from that used in proc anova and proc glm. Notice also the difference between treating time as fixed (repeated phrase used) or random (random phrase used).
You can also fit autoregressive models of order p (type=ar(p)) or order n (type=toep). Now there are a wide variety of other options as well.
proc mixed; /* auto-regression(1) */ class trt unit; model y = trt | time; random intercept time / type = ar(1) subject = unit;Here are two ways to fit a multivariate model with unstructured covariance matrix. The first treats effects over time as random, while the second treates time as fixed.
proc mixed; /* unstructured (multivariate) */ class trt unit; /* random coefficients */ model y = trt | time; random intercept time / type = un subject = unit; proc mixed; /* unstructured (multivariate) */ class trt unit; /* fixed coefficients */ model y = trt | time; repeated / hlps hlm type = un subject = unit;The multivariate statistic of Hotelling-Lawley-McKeon (hlm) or Hotelling-Lawley-Pillai-Samson (hlps) is availabe with the repeated phrase when type = un is specified. Both are superior to the F statistic for small samples; hlm appears to be slightly preferred Other MV test (Roy's greatest root, Wilk's lambda, Pillai trace) are apparently not available.
proc prinqual n=2 out=results replace standard scores correlations; id row; transform monotone(col1-col20); proc sort data=results; by prin1; proc print; var row _type_ prin1 prin2;
proc freq; /* one entry for each subject */ tables sex; /* one-way table */ proc freq; /* one entry for each cell */ tables infilt*score; /* two-way table */ weight count; /* cell weighted as "count" subjects */Options, as for other procedure, can be combined. The first set request additional or reduced information by cell:
tables infilt*score / expected; /* expected values */ tables infilt*score / deviation; /* observed - expected */ tables infilt*score / cellchi2; /* contribution to chi-square */ tables infilt*score / nocol; /* no column percentage */ tables infilt*score / norow; /* no row percentage */ tables infilt*score / nopercent; /* no percentages */ tables infilt*score / nofreq; /* no frequencies */ tables infilt*score / missprint; /* show missing value freqs */ tables infilt*score / sparse; /* condense for sparse table */The following options specify statistical tests:
tables infilt*score / chisq; /* chisquare homogeneity test */ tables infilt*score / alpha=.05 chisq; /* chisquare at level .05 */ tables infilt*score / cmh; /* tests for association */ tables infilt*score / exact; /* Fisher's exact test */ tables infilt*score / measures; /* measures of association */ tables infilt*score / all; /* all of the above */
proc catmod; direct score; response means; model infilt = score; weight count;The direct phrase states that score is a regression type predictor. The response phrases sets up the linear regression response.
The second example performs logistic regression of y (=1 for response, =0 for no response) on dose:
proc catmod; weight count; direct dose; response clogits; model y = dose / freq ml nogls;It is possible to save predicted values using an out= phrase; however, the setup is rather different than for glm or reg. Again, it is best to consult the User's Guide and/or get consulting help.
gee.sas test program geemac.sas GEE IML macro gee1.dat test data gee.doc documentation for Gee macro (including test output)Contact Karim or Zeger if you want to obtain this. (It may be at StatLib)
Last modified: Thu Apr 2 09:07:24 1998 by Brian Yandell Wed Mar 22 11:21:45 1995 by Stat Www (statwww@stat.wisc.edu)