Practical Data Analysis for Designed Experiments

Brian S. Yandell (1997) Chapman & Hall, London

B. Working with Groups of Data

4. Comparison of Groups
4.1 Graphical Summaries
4.2 Estimates of Means and Variance
4.3 Assumptions and Pivot Statistics
4.4 Interval Estimates of Means
4.5 Testing Hypotheses about Means
4.6 Formal Inference on the Variance
5. Comparing Several Means
5.1 Linear Contrasts of Means
5.2 Overall Test of Difference
5.3 Partitioning Sums of Squares
5.4 Expected Mean Squares
5.5 Power and Sample Size
6. Multiple Comparisons of Means
6.1 Experiment- and Comparison-Wise Error Rates
6.2 Comparisons Based on F-Tests
6.3 Comparisons Based on Range of Means
6.4 Comparison of Comparisons

4. Comparison of Groups

determine nature & extent of differences among groups

4.1 Graphical Summaries

stem-and-leaf plot
histogram
box-plot
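
A stem-and-leaf plot can be built by hand in a few lines. The sketch below is only an illustration: it assumes non-negative integer data with tens as stems and units as leaves, and the function name `stem_and_leaf` and the data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative integers (stem = tens, leaf = units)."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    lo, hi = min(rows), max(rows)
    # Include empty stems so gaps in the data stay visible.
    return [
        f"{stem:>2} | {''.join(str(leaf) for leaf in rows.get(stem, []))}"
        for stem in range(lo, hi + 1)
    ]

data = [12, 15, 21, 23, 23, 28, 41, 47]  # hypothetical measurements
for row in stem_and_leaf(data):
    print(row)
```

Keeping the empty stem (here the 30s) shows the gap in the data, which a histogram with wide bins might hide.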

4.2 Estimates of Means and Variance

balanced design -- equal sample size per group
population mean, variance & SD by group
math notation
least squares
  normal equations
  degrees of freedom
common variance -- equal across groups
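
As a minimal sketch of these estimates, assuming a common variance across groups, the code below computes group means and the pooled variance with its degrees of freedom. The data and the helper name `pooled_variance` are hypothetical.

```python
from statistics import mean, variance

groups = {  # hypothetical balanced data, 4 units per group
    "A": [4.1, 5.0, 4.6, 5.3],
    "B": [6.2, 5.8, 6.9, 6.1],
    "C": [5.1, 4.9, 5.6, 5.4],
}

def pooled_variance(groups):
    """Pool within-group sums of squares over the error degrees of freedom."""
    ss_error = sum((len(y) - 1) * variance(y) for y in groups.values())
    df_error = sum(len(y) - 1 for y in groups.values())
    return ss_error / df_error, df_error

group_means = {g: mean(y) for g, y in groups.items()}  # least squares estimates
s2, df_error = pooled_variance(groups)
```

In a balanced design the pooled variance is simply the average of the group variances; with unequal sample sizes the weighting by degrees of freedom matters.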

4.3 Assumptions and Pivot Statistics

basic assumptions
  (Linear) model correct (could be nonlinear)
  Independent experimental units
  Equal variance across units & groups
  Normal distribution of errors (symmetry most important)
distribution notation (~)
pivot statistic
  confidence interval / hypothesis testing
  (usually) known & tabled distribution
  relies on the basic assumptions
  does not depend on unknown parameters
critical value & alpha
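
A standard example of a pivot statistic, written here for a single group of n observations under the four basic assumptions. The symbols (sample mean, sample SD s, quantile notation) are assumed notation, not taken from the outline.

```latex
% Pivot for a single mean: its distribution is free of \mu and \sigma.
t = \frac{\bar{y} - \mu}{s / \sqrt{n}} \sim t_{n-1}
% Reject H_0: \mu = \mu_0 at level \alpha when |t| exceeds the
% critical value t_{1-\alpha/2,\, n-1}.
```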

4.4 Interval Estimates of Means

standard error (SE) vs. SD -- reporting
significant digits & precision
confidence interval
  +/- 2 SE as rough guide
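
The distinction between SD and SE, and the rough "+/- 2 SE" interval, can be sketched as follows. The sample is hypothetical, and the interval is the rough guide only, not an exact t-based interval.

```python
from math import sqrt
from statistics import mean, stdev

y = [4.1, 5.0, 4.6, 5.3, 4.8, 5.2, 4.4, 5.0, 4.7]  # hypothetical sample

n = len(y)
ybar = mean(y)
sd = stdev(y)        # spread of individual observations
se = sd / sqrt(n)    # precision of the sample mean

# Rough guide: mean +/- 2 SE (an exact interval would use a t critical value).
rough_ci = (ybar - 2 * se, ybar + 2 * se)
```

The SD describes the population and does not shrink with more data; the SE shrinks like 1 / sqrt(n).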

4.5 Testing Hypotheses about Means

test of hypothesis
how large is large?
  alpha -- risk of error
  experience -- substantial deviation
1- vs. 2- vs. 3-sided
power to detect differences
  tradeoff
    close to expected (null)
    chance to find deviation significant
  more later with 2-sample testing
classical significance level
  Fisher's idea vs. dogma
  strength of evidence
p-value as random variable
  uniform (0,1) under null hypothesis
null & alternative hypotheses include basic assumptions
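
The claim that the p-value is uniform (0,1) under the null hypothesis can be checked by simulation. This sketch simplifies by assuming a known sigma and using a two-sided z-test rather than a t-test; the function name, seed, and sample sizes are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def two_sided_p(sample, mu0, sigma):
    """Two-sided z-test p-value, assuming sigma is known (a simplification)."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(1)  # fixed seed so the run is repeatable
p_values = [
    two_sided_p([rng.gauss(0, 1) for _ in range(10)], mu0=0, sigma=1)
    for _ in range(2000)
]

# Under the null, roughly a fraction a of the p-values should fall below a.
frac_below_05 = sum(p < 0.05 for p in p_values) / len(p_values)
frac_below_50 = sum(p < 0.50 for p in p_values) / len(p_values)
```

If any basic assumption fails (for example, sigma is misstated), the simulated p-values drift away from uniform, which is one way to see that the hypotheses include the assumptions.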

4.6 Formal Inference on the Variance

chi-square variate
importance of assumptions (normality)
confidence interval
precision & significant digits
chi-square vs. normal approximation
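
For reference, the usual pivot and interval for a variance from a single normal sample of size n. The quantile notation is an assumption of this note: the subscript p denotes the p-th quantile of the chi-square distribution.

```latex
% Pivot: depends on normality much more heavily than the t pivot does.
\frac{(n-1)\, s^2}{\sigma^2} \sim \chi^2_{n-1}
% 100(1-\alpha)\% confidence interval for \sigma^2:
\left( \frac{(n-1)\, s^2}{\chi^2_{1-\alpha/2,\, n-1}},\;
       \frac{(n-1)\, s^2}{\chi^2_{\alpha/2,\, n-1}} \right)
```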

5. Comparing Several Means

pivot statistic for simple hypothesis
hypothesis tests & confidence intervals for differences
using plots for tests of difference among means
  shrinking mean CIs -- approximate tests based on overlap
  notched box-plots
compound hypotheses
  all at once / as set of simple hypotheses
  hint at orthogonality issue

5.1 Linear Contrasts of Means

linear combination of means
contrasts sum to zero
pivot statistic
confidence intervals & hypothesis tests
orthogonal contrasts
  balanced experiment -- easy check
  unbalanced experiment -- depends on sample sizes
orthogonal polynomials
  recursive calculation
  sequential fitting
  Type I approach -- Gram-Schmidt
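
The orthogonality check can be sketched in code. Two contrasts with coefficients c and d are orthogonal when the sum of c_i * d_i / n_i is zero, so in a balanced experiment the check reduces to the plain dot product. The function names and sample sizes below are hypothetical.

```python
def is_contrast(c, tol=1e-12):
    """A contrast's coefficients sum to zero."""
    return abs(sum(c)) < tol

def are_orthogonal(c, d, n, tol=1e-12):
    """Contrasts c and d are orthogonal when sum(c_i * d_i / n_i) is zero."""
    return abs(sum(ci * di / ni for ci, di, ni in zip(c, d, n))) < tol

# Orthogonal polynomial coefficients for 3 equally spaced levels.
linear = [-1, 0, 1]
quadratic = [1, -2, 1]

balanced = [5, 5, 5]     # hypothetical sample sizes
unbalanced = [3, 5, 8]

print(are_orthogonal(linear, quadratic, balanced))    # easy check: dot product
print(are_orthogonal(linear, quadratic, unbalanced))  # depends on sample sizes
```

With the unbalanced sizes the same two contrasts are no longer orthogonal, which is why sequential (Type I) fitting gives order-dependent sums of squares in that case.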

5.2 Overall Test of Difference

quadratic form & chi-square
mean square for null hypothesis
F-statistics & F distribution
what is "large enough"?

5.3 Partitioning Sums of Squares

basic idea
  centered response = group mean + error
  sums of squares
  total = model + error
ANOVA table
F-statistic
  expected value & rough critical value (4)
general F-test: full & reduced model
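
A minimal sketch of the partition for a one-way layout, using hypothetical data. The function name `one_way_anova` and the dictionary keys are illustrative only.

```python
from statistics import mean

groups = {  # hypothetical balanced data, 4 units per group
    "A": [4.1, 5.0, 4.6, 5.3],
    "B": [6.2, 5.8, 6.9, 6.1],
    "C": [5.1, 4.9, 5.6, 5.4],
}

def one_way_anova(groups):
    """Partition the total sum of squares into model and error parts."""
    all_y = [v for y in groups.values() for v in y]
    grand = mean(all_y)
    ss_total = sum((v - grand) ** 2 for v in all_y)
    ss_model = sum(len(y) * (mean(y) - grand) ** 2 for y in groups.values())
    ss_error = sum((v - mean(y)) ** 2 for y in groups.values() for v in y)
    df_model = len(groups) - 1
    df_error = len(all_y) - len(groups)
    f_stat = (ss_model / df_model) / (ss_error / df_error)
    return {
        "ss_total": ss_total, "ss_model": ss_model, "ss_error": ss_error,
        "df_model": df_model, "df_error": df_error, "F": f_stat,
    }

table = one_way_anova(groups)
```

For these data the F-statistic is 12 on 2 and 9 degrees of freedom, well above the rough critical value of 4, so there is evidence of differences among the means.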

5.4 Expected Mean Squares

moments of chi-square statistic
non-central chi-square
  non-centrality parameter
  replace sample means with population means
  shift to right & increased spread
relation to moments of F-statistic
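
The expected mean squares for the one-way layout can be written out as follows. The notation is assumed (k groups, group sizes n_i, N total units), and the non-centrality parameter is defined here without the factor of 1/2 that some texts use.

```latex
% Weighted grand mean and non-centrality parameter
\bar{\mu} = \frac{1}{N} \sum_{i=1}^{k} n_i \mu_i , \qquad
\lambda = \frac{1}{\sigma^2} \sum_{i=1}^{k} n_i (\mu_i - \bar{\mu})^2
% Expected mean squares
E(MS_{error}) = \sigma^2 , \qquad
E(MS_{model}) = \sigma^2 + \frac{1}{k-1} \sum_{i=1}^{k} n_i (\mu_i - \bar{\mu})^2
              = \sigma^2 \left( 1 + \frac{\lambda}{k-1} \right)
% Moments of a non-central chi-square with \nu degrees of freedom:
% shift to the right and increased spread as \lambda grows
E\left[\chi^2_{\nu}(\lambda)\right] = \nu + \lambda , \qquad
Var\left[\chi^2_{\nu}(\lambda)\right] = 2(\nu + 2\lambda)
```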

5.5 Power and Sample Size

power for composite hypothesis seldom computed
  power tables for F-test are complicated
  noncentrality in "worst case scenario"
reduce to power study for t-test
  power of 50% at critical value
  balance power & size
  4 SEs = 2 * critical value
modest imbalance -- slight adjustment of power
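
The "50% at critical value" and "4 SEs" rules of thumb can be checked with a normal approximation. This sketch assumes a one-sided test with rough critical value 2 and treats the SE as known; the function name `approx_power` is hypothetical.

```python
from statistics import NormalDist

def approx_power(diff_in_se, critical=2.0):
    """Approximate power when the true difference equals diff_in_se standard errors.

    Normal approximation: estimate / SE is treated as Normal(diff_in_se, 1),
    and the null hypothesis is rejected when it exceeds `critical`.
    """
    return 1 - NormalDist().cdf(critical - diff_in_se)

power_at_critical = approx_power(2.0)  # true difference sits at the critical value
power_at_4_se = approx_power(4.0)      # true difference of 4 SEs = 2 * critical value
```

A true difference exactly at the critical value is detected half the time; at 4 SEs the approximate power is about 98%, so the risk of missing the difference is comparable to the size of the test.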

6. Multiple Comparisons of Means

examine differences among means in detail
compromise
  find significant differences
    comparison-wise error rate
  avoid reporting spurious differences as real
    experiment-wise error rate
generic methods vs. methods focussed on specific interests
  all-contrast comparisons (Scheffé)
  all-pairs comparisons
  multiple comparisons with best (Hsu)
  multiple comparisons with control (Dunnett)
abuses & misconceptions (Hsu 1996)

6.1 Experiment- and Comparison-Wise Error Rates

before (pre-planned) & after (post-hoc) comparisons
conservative & liberal approaches
general procedure
  conduct overall F-test
  if evidence of differences
    conduct pre-planned comparisons
    investigate post-hoc comparisons
  if no evidence for differences
    only examine pre-planned comparisons
    interpret results conservatively
cautions
  overall test depends on choice of groups
  null hypothesis is rarely true in practice
  harmonic mean -- adjustment for minor imbalance
multiple-stage or step-down tests
  compare progressively fewer means
  SNK, REGWF, REGWQ
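
To see why the two error rates differ, the sketch below computes the chance of at least one false positive across m comparisons, each run at comparison-wise level alpha. It assumes the comparisons are independent, which pairwise comparisons of means are not, so the numbers are only indicative. The function name is hypothetical.

```python
def experimentwise_rate(alpha, m):
    """Chance of at least one false positive in m independent comparisons at level alpha."""
    return 1 - (1 - alpha) ** m

rates = {m: experimentwise_rate(0.05, m) for m in (1, 3, 10)}
for m, rate in rates.items():
    print(f"{m:>2} comparisons: experiment-wise rate about {rate:.3f}")
```

At alpha = 0.05 the experiment-wise rate is roughly 14% for 3 comparisons and 40% for 10, which is the motivation for the conservative procedures listed here.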

6.2 Comparisons Based on F-Tests

can address arbitrary contrasts
directly related to overall test of means
Fisher's LSD
  ordinary t-tests for each comparison
  can be too liberal
Bonferroni (BSD)
  crude adjustment for multiple comparisons
  can be too conservative if many comparisons done
Scheffé
  all contrasts -- equivalent to overall F-test
  tends to be conservative
  liked by statisticians, ignored by many practitioners
REGWF
  multiple-stage improvement on Scheffé
Waller-Duncan
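
The gap between the LSD and Bonferroni approaches shows up in their critical values. This sketch uses normal quantiles as a large-sample stand-in for the t quantiles an actual analysis would use; the function name and the choice of 6 comparisons are illustrative.

```python
from statistics import NormalDist

def critical_values(alpha, m):
    """Two-sided normal critical values: unadjusted (LSD-like) and Bonferroni-adjusted."""
    z = NormalDist()
    unadjusted = z.inv_cdf(1 - alpha / 2)
    bonferroni = z.inv_cdf(1 - alpha / (2 * m))  # each comparison run at alpha / m
    return unadjusted, bonferroni

lsd_crit, bsd_crit = critical_values(alpha=0.05, m=6)  # 6 pairs among 4 means
```

With 6 comparisons the Bonferroni critical value is about 2.64 against 1.96 unadjusted, and it keeps growing as comparisons are added, which is why the adjustment becomes too conservative when many comparisons are done.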

6.3 Comparisons Based on Range of Means

studentized range statistic
  known properties only for balanced experiment
  more powerful for simple comparison of pairs of means
  less powerful for general comparisons of means
Tukey's honest significant difference (HSD)
  designed for pairwise comparisons
Student-Newman-Keuls (SNK)
  multiple-stage test based on Tukey's HSD
  can be too liberal
REGWQ
  multiple-stage improvement on SNK
Duncan's multiple range
  more liberal than SNK
  higher level than alpha for comparing all k means
  several practitioners recommend against this method
  but still used by selected disciplines
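
The studentized range statistic itself is simple to compute for a balanced experiment. The sketch below covers only the statistic; critical values come from studentized range tables or statistical software. The group means, mean square error, and function name are hypothetical.

```python
from math import sqrt

def studentized_range(means, mse, n):
    """q = (largest mean - smallest mean) / sqrt(MSE / n) for a balanced design."""
    return (max(means) - min(means)) / sqrt(mse / n)

# Hypothetical summary: 3 group means, n = 4 per group, MSE on 9 error df.
q = studentized_range([4.75, 6.25, 5.25], mse=1.75 / 9, n=4)
```

Here q is about 6.8. Tukey's HSD compares every pairwise difference against the same critical range, while the multiple-stage tests (SNK, REGWQ) change the critical value with the number of means spanned.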

6.4 Comparison of Comparisons

liberal < conservative
  LSD < REGWF < Scheffé
  LSD < Bonferroni
  LSD < SNK < REGWQ < Tukey
  Waller < Duncan < SNK
  Waller < LSD < SNK
Duncan can be more liberal than LSD for all k means

Last modified: Tue Feb 17 08:47:37 1998 by Brian Yandell (yandell@stat.wisc.edu)