Brian S. Yandell (1997)
Chapman & Hall, London
C. Sorting out Effects with Data
- 7. Factorial Designs
- 7.1 Cell Means Models
- 7.2 Effects Models
- 7.3 Estimable Functions
- 7.4 Linear Constraints
- 7.5 General Form of Estimable Functions
- 8. Balanced Experiments
- 8.1 Additive Models
- 8.2 Full Models with Two Factors
- 8.3 Interaction Plots
- 8.4 Higher Order Models
- 9. Model Selection
- 9.1 Pooling Interactions
- 9.2 Selecting the "Best" Model
- 9.3 Model Selection Criteria
- 9.4 One Observation per Cell
- 9.5 Tukey's Test for Interaction
response = group mean + random error
one-factor and two-factor means models
estimable means
at least one observation per group
unique unbiased estimator
linear combination of responses
linear combination of estimable functions is estimable
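A minimal sketch of the cell means idea above, using made-up one-factor data (the group labels and values are hypothetical). Each group mean is estimated by a linear combination of the responses in that group, and it is only estimable when the group has at least one observation.

```python
# Hypothetical one-factor data: responses grouped by treatment level.
data = {
    "a": [4.0, 5.0, 6.0],
    "b": [7.0, 9.0],
    "c": [10.0],
}

def group_mean(values):
    """Sample mean written as a linear combination of responses (weights 1/n)."""
    if not values:
        raise ValueError("group mean is not estimable for an empty group")
    weights = [1.0 / len(values)] * len(values)
    return sum(w * y for w, y in zip(weights, values))

means = {g: group_mean(ys) for g, ys in data.items()}

# A linear combination of estimable means is itself estimable,
# for example the difference between groups "b" and "a".
diff_b_a = means["b"] - means["a"]
print(means, diff_b_a)
```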
one-factor effects model
response = reference + group effect + random error
group effect = group mean - reference
reference is arbitrary
overall (population grand) mean
intercept (SAS)
not estimable
two factor effects model
population cell & marginal means -- no data yet
cell means are estimable provided cell is not empty
may combine multiple factors into one
additive effects model
functions of parameters which do not depend on
particular solution to normal equations
normal equations for effects model
one factor & two factors
matrix form -- overspecified model
linear contrasts
main effects contrasts
pure interaction contrasts
simplification in additive model
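The contrasts listed above can be illustrated on a small table of cell means. This is a sketch only: the 2 x 3 table, the effect values, and the helper name `contrast` are assumptions made for illustration. A main effects contrast compares marginal means, a pure interaction contrast has coefficients that sum to zero over every row and every column, and under an additive model the interaction contrast vanishes.

```python
# Hypothetical 2 x 3 table of cell means (rows: factor A, columns: factor B).
cell_means = [
    [10.0, 12.0, 17.0],
    [13.0, 15.0, 14.0],
]

def contrast(coefs, means):
    """Apply a table of contrast coefficients to a table of cell means."""
    return sum(c * m for crow, mrow in zip(coefs, means)
               for c, m in zip(crow, mrow))

# Main effects contrast for A: marginal mean of row 1 minus that of row 2.
main_a = [[1 / 3, 1 / 3, 1 / 3], [-1 / 3, -1 / 3, -1 / 3]]

# Pure interaction contrast: coefficients sum to zero in each row and column.
inter = [[1, 0, -1], [-1, 0, 1]]

print(contrast(main_a, cell_means), contrast(inter, cell_means))

# Additive cell means: reference + A effect + B effect (hypothetical values).
ref, a_eff, b_eff = 10.0, [0.0, 3.0], [0.0, 2.0, 7.0]
additive = [[ref + a + b for b in b_eff] for a in a_eff]

# The interaction contrast is zero when the cell means are additive.
print(contrast(inter, additive))
```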
sum-to-zero linear constraints
reference = population grand mean
group effect = deviation from grand mean
matrix form
set-to-zero linear constraints
reference = last group mean
group effect = deviation from last group mean
matrix form
(particular) solutions of normal equations
estimable functions in terms of constraints
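The two sets of linear constraints above pick out different particular solutions of the same normal equations. The sketch below uses hypothetical one-factor data, and the function names are invented for illustration. It checks that both solutions satisfy the normal equations and give the same estimable quantities (group means and differences of effects), even though the reference and the effects themselves differ.

```python
# Hypothetical one-factor data with unequal group sizes.
groups = [[4.0, 5.0, 6.0], [7.0, 9.0], [10.0, 12.0]]
means = [sum(g) / len(g) for g in groups]

def solve_sum_to_zero(means):
    """Reference = unweighted mean of group means; effects sum to zero."""
    ref = sum(means) / len(means)
    return ref, [m - ref for m in means]

def solve_set_to_zero(means):
    """Reference = last group mean; last effect is set to zero."""
    ref = means[-1]
    return ref, [m - ref for m in means]

def satisfies_normal_equations(ref, effects, groups, tol=1e-9):
    """Check n*ref + sum(n_i*effect_i) = y.. and n_i*(ref + effect_i) = y_i."""
    n = sum(len(g) for g in groups)
    total = sum(sum(g) for g in groups)
    lhs = n * ref + sum(len(g) * e for g, e in zip(groups, effects))
    ok = abs(lhs - total) < tol
    for g, e in zip(groups, effects):
        ok = ok and abs(len(g) * (ref + e) - sum(g)) < tol
    return ok

ref_s, eff_s = solve_sum_to_zero(means)
ref_z, eff_z = solve_set_to_zero(means)

print(ref_s, eff_s)   # reference and effects under sum-to-zero
print(ref_z, eff_z)   # reference and effects under set-to-zero

# Estimable functions agree across the two solutions.
fitted_s = [ref_s + e for e in eff_s]
fitted_z = [ref_z + e for e in eff_z]
print(fitted_s, fitted_z)
```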
L-notation as in SAS (Littell et al 1991)
overspecified model
relations among columns <-> among L's
substituting for redundant L's
set-to-zero constraints
sum-to-zero constraints
one- & two-factor effects models
show GFEF has unique solution of normal equations (one factor)
response = reference + factor A + factor B + error
without replication and with balanced replication
model equation & null hypotheses
partition of sum of squares
expected mean squares & F-statistics
relation of marginal means to model & estimators
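As an illustration of the partition of sums of squares for the additive model without replication, the sketch below works through a hypothetical 4 x 3 layout with one observation per cell. The data are made up. The F-statistics use the residual mean square with (a-1)(b-1) degrees of freedom as the error term.

```python
# Hypothetical 4 x 3 layout, one observation per cell
# (rows: levels of factor A, columns: levels of factor B).
y = [
    [12.0, 15.0, 18.0],
    [14.0, 16.0, 23.0],
    [10.0, 11.0, 16.0],
    [16.0, 18.0, 23.0],
]
a, b = len(y), len(y[0])

# Grand mean and marginal means estimate the reference and factor effects.
grand = sum(map(sum, y)) / (a * b)
row_means = [sum(row) / b for row in y]
col_means = [sum(y[i][j] for i in range(a)) / a for j in range(b)]

# Partition: SS total = SS A + SS B + SS error.
ss_total = sum((y[i][j] - grand) ** 2 for i in range(a) for j in range(b))
ss_a = b * sum((m - grand) ** 2 for m in row_means)
ss_b = a * sum((m - grand) ** 2 for m in col_means)
ss_error = sum(
    (y[i][j] - row_means[i] - col_means[j] + grand) ** 2
    for i in range(a) for j in range(b)
)

df_a, df_b, df_error = a - 1, b - 1, (a - 1) * (b - 1)
ms_error = ss_error / df_error
f_a = (ss_a / df_a) / ms_error
f_b = (ss_b / df_b) / ms_error
print(ss_a, ss_b, ss_error, ss_total, f_a, f_b)
```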
cell means model & effects model
estimates of cell means & marginal means
standard errors
main effects & interaction hypotheses
partition of total sum of squares
expected sum of squares
F-statistics & non-centrality parameters
two-factor anova table
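The two-factor anova table for the full model can be sketched in a few lines for a balanced design. The data and the function name `two_factor_anova` are hypothetical. The code assumes equal replication in every cell and returns sums of squares, degrees of freedom, and F-statistics for the main effects and the interaction.

```python
# Hypothetical balanced data: y[i][j] holds the replicates for
# level i of factor A and level j of factor B.
y = [
    [[8.0, 10.0], [12.0, 14.0]],
    [[9.0, 11.0], [20.0, 22.0]],
    [[7.0, 9.0], [15.0, 19.0]],
]

def two_factor_anova(y):
    """Sums of squares, df and F-statistics for a balanced two-factor layout."""
    a, b, n = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(y[i][j]) / n for j in range(b)] for i in range(a)]
    row = [sum(cell[i]) / b for i in range(a)]
    col = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(row) / a
    ss = {
        "A": b * n * sum((r - grand) ** 2 for r in row),
        "B": a * n * sum((c - grand) ** 2 for c in col),
        "A*B": n * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                       for i in range(a) for j in range(b)),
        "error": sum((v - cell[i][j]) ** 2
                     for i in range(a) for j in range(b) for v in y[i][j]),
        "total": sum((v - grand) ** 2
                     for i in range(a) for j in range(b) for v in y[i][j]),
    }
    df = {"A": a - 1, "B": b - 1, "A*B": (a - 1) * (b - 1),
          "error": a * b * (n - 1), "total": a * b * n - 1}
    ms_error = ss["error"] / df["error"]
    f = {term: (ss[term] / df[term]) / ms_error for term in ("A", "B", "A*B")}
    return ss, df, f

ss, df, f = two_factor_anova(y)
for term in ("A", "B", "A*B", "error", "total"):
    print(term, df[term], round(ss[term], 3), round(f.get(term, float("nan")), 3))
```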
interaction plot
plot levels of factor A against cell means
connect levels of factor B by lines
label levels of both factors accordingly
try switching A & B for better clarity
order levels by marginal mean?
add SE or LSD bar to help interpretation
parallel lines or curves
constant separation across levels of factor A
parallel if no interaction
unequal separation vs. crossing lines
margin plots
use marginal means along horizontal axis
include identity line for reference
straight lines = Tukey interaction (see 9.4)
parallel straight lines = no interaction
three-factor interaction
separate plots by levels of factor C
switch roles of A,B,C for clarity
or combine two factors on one plot
more lines or more horizontal levels
plots to examine size of effects
half-normal plot
factors all at two levels
significant effects deviate from identity line
effect plot
effect = deviation used in MS calculation
effects rescaled for mean square by df
plot one point for each level
main effect -- label by level
interactions
residuals
spread (SD) relative to residual
indicates size of effect
can identify cells that contribute
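The coordinates for a half-normal plot, as described above for factors all at two levels, can be computed without any plotting library. This is a sketch under assumptions: the unreplicated 2^3 responses are made up, and the plotting positions 0.5 + 0.5(i - 0.5)/m are one common convention, not necessarily the one used in the text. Effects that stand well away from the line through the bulk of small effects are candidates for real effects.

```python
from itertools import combinations, product
from math import prod
from statistics import NormalDist

factors = "ABC"
# Runs in standard product order: A varies slowest, C fastest.
runs = list(product([-1, 1], repeat=len(factors)))
# Hypothetical responses, one per run.
y = [20.0, 21.0, 19.0, 22.0, 30.0, 31.0, 29.0, 33.0]

def effect(term):
    """Mean response at the +1 level of the contrast minus mean at -1."""
    signs = [prod(run[factors.index(f)] for f in term) for run in runs]
    return sum(s * v for s, v in zip(signs, y)) / (len(y) / 2)

# All main effects and interactions: A, B, C, AB, AC, BC, ABC.
terms = ["".join(c) for k in range(1, len(factors) + 1)
         for c in combinations(factors, k)]
effects = {t: effect(t) for t in terms}

# Order absolute effects and pair them with half-normal quantiles.
ordered = sorted(effects.items(), key=lambda kv: abs(kv[1]))
m = len(ordered)
points = [
    (NormalDist().inv_cdf(0.5 + 0.5 * (i - 0.5) / m), abs(e), t)
    for i, (t, e) in enumerate(ordered, start=1)
]
for q, size, t in points:
    print(f"{t:>3}  quantile={q:.3f}  |effect|={size:.3f}")
```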
cell means & effects models
estimates & partition of sums of squares
three-factor anova table
3-factor interaction / interpretation
two or more 2-factor interactions
interaction plots
separate plots by level of third factor
possibly averaged over third factor
again, switch roles to find best view
parsimonious model
balance bias & over-fit
bias -- miss key features
over-fit -- high variability in parameter estimates
hierarchy of factorial models
usually keep main effects if interaction significant
testing nested models
formal F tests & other statistics
comparing non-nested models
decision paths for two-factor models
pragmatic consideration of full & additive model
report results honestly
decision paths for three-factor additive model
18 hierarchical models from which to choose
suggested method of analysis for full model
if 3-factor interaction is significant
separately analyze 2-factor models
by level of third factor
if no 3-factor interaction
easy if only one 2-factor interaction
analyze several ways if more than one
separate analyses as above
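The formal F test for nested models mentioned above compares the drop in error sum of squares against the error mean square of the larger model. The sketch below is generic; the function name and the numbers in the usage line are hypothetical.

```python
def nested_f(sse_reduced, df_reduced, sse_full, df_full):
    """F-statistic for testing a reduced model nested within a full model.

    sse_* are error sums of squares and df_* are error degrees of freedom;
    the reduced model has fewer terms, hence more error df.
    """
    if df_reduced <= df_full:
        raise ValueError("reduced model must have more error df than full model")
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator

# Hypothetical example: dropping an interaction with 2 df from the full model.
f_stat = nested_f(sse_reduced=100.0, df_reduced=8, sse_full=60.0, df_full=6)
print(f_stat)
```

The statistic would be referred to an F distribution with (df_reduced - df_full, df_full) degrees of freedom.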
how to move among models?
forward selection
add terms one at a time
begin with nothing or a few terms
danger of biased model -- too simple
backward elimination
drop one at a time from full model
danger of bloated model
rule of 2 for pooling interactions
sweep down from main effects
only examine lower terms if large
simplifies hierarchy for interpretation
what if different approaches differ?
look further
look ahead more than one step
be skeptical -- take broad view
automated tools
useful but can be limited
designed for regression, not factors
consider important contrasts
plots
half-normal plots when 2 levels per factor
effect plots
selected interaction plots
based on full model fit?
test statistic vs. model df (=p)
especially Mallows' C(p)
F-test
careful of multiple testing issues
explained variation R^2 (adjusted for p)
heuristic guide
unadjusted always increases as model grows
but how fast does it increase?
mean squared error
does it change dramatically among models?
Mallows' C(p)
C(p) > p indicates 'large' model bias
C(p) = p if model bias eliminated
pick smallest such p to avoid overfit
sensitive to estimate of variance
tricky if no or few df error
initial artful choice of reduced model
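A small sketch of the criteria above, assuming the usual definitions C(p) = SSE_p / s^2 - (n - 2p), with s^2 the error mean square of the full model, and adjusted R^2 = 1 - [SSE_p / (n - p)] / [SS_total / (n - 1)]. The error sums of squares below are invented summaries for a balanced 3 x 2 experiment with two replicates, not results from the text.

```python
n = 12            # total observations (hypothetical 3 x 2 layout, 2 replicates)
ss_total = 300.0  # total sum of squares about the grand mean

# Candidate hierarchical models: name -> (model df p, error sum of squares).
models = {
    "mean only": (1, 300.0),
    "A": (3, 250.0),
    "B": (2, 110.0),
    "A + B": (4, 60.0),
    "A * B": (6, 30.0),
}

# Variance estimate from the full model; C(p) is sensitive to this choice.
p_full, sse_full = models["A * B"]
sigma2 = sse_full / (n - p_full)

def mallows_cp(p, sse):
    """Mallows' C(p) using the full-model variance estimate."""
    return sse / sigma2 - (n - 2 * p)

def adjusted_r2(p, sse):
    """R^2 adjusted for the number of model df p."""
    return 1.0 - (sse / (n - p)) / (ss_total / (n - 1))

for name, (p, sse) in models.items():
    print(f"{name:>9}  p={p}  C(p)={mallows_cp(p, sse):6.2f}  "
          f"adj R^2={adjusted_r2(p, sse):.3f}")
```

By construction the full model always has C(p) = p, so the comparison of interest is among the reduced models.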
effects model with no replication
how to simplify interaction -- fewer df
Tukey interaction model
interaction plots / margin plots
formal test (under null additive model)
Mandel interaction model
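The formal test under the Tukey interaction model can be sketched for a two-factor layout with one observation per cell. The code assumes the standard one-degree-of-freedom form, in which the interaction is proportional to the product of the row and column effects; the data and the function name are hypothetical.

```python
def tukey_nonadditivity(y):
    """Tukey one-df sum of squares, additive-model error SS, and F-statistic."""
    a, b = len(y), len(y[0])
    grand = sum(map(sum, y)) / (a * b)
    row_dev = [sum(row) / b - grand for row in y]
    col_dev = [sum(y[i][j] for i in range(a)) / a - grand for j in range(b)]

    denom = sum(r ** 2 for r in row_dev) * sum(c ** 2 for c in col_dev)
    if denom == 0:
        raise ValueError("row or column effects are all zero")

    # One-df sum of squares for the product-of-effects interaction.
    cross = sum(y[i][j] * row_dev[i] * col_dev[j]
                for i in range(a) for j in range(b))
    ss_nonadd = cross ** 2 / denom

    # Error sum of squares from the additive model.
    ss_error = sum((y[i][j] - grand - row_dev[i] - col_dev[j]) ** 2
                   for i in range(a) for j in range(b))

    # Remaining error after removing the one-df interaction term.
    df_rem = (a - 1) * (b - 1) - 1
    f_stat = ss_nonadd / ((ss_error - ss_nonadd) / df_rem)
    return ss_nonadd, ss_error, f_stat

# Hypothetical 4 x 3 layout, one observation per cell.
y = [
    [12.0, 15.0, 18.0],
    [14.0, 16.0, 23.0],
    [10.0, 11.0, 16.0],
    [16.0, 18.0, 23.0],
]
ss_nonadd, ss_error, f_stat = tukey_nonadditivity(y)
print(ss_nonadd, ss_error, f_stat)
```

Under the null additive model the statistic would be compared with an F distribution on 1 and (a-1)(b-1) - 1 degrees of freedom.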
Last modified: Tue Feb 17 08:47:45 1998 by Brian Yandell
(yandell@stat.wisc.edu)