Practical Data Analysis for Designed Experiments

Brian S. Yandell (1997) Chapman & Hall, London
C. Sorting out Effects with Data

7. Factorial Designs: 7.1 Cell Means Models; 7.2 Effects Models; 7.3 Estimable Functions; 7.4 Linear Constraints; 7.5 General Form of Estimable Functions
8. Balanced Experiments: 8.1 Additive Models; 8.2 Full Models with Two Factors; 8.3 Interaction Plots; 8.4 Higher Order Models
9. Model Selection: 9.1 Pooling Interactions; 9.2 Selecting the "Best" Model; 9.3 Model Selection Criteria; 9.4 One Observation per Cell; 9.5 Tukey's Test for Interaction
7. Factorial Designs
7.1 Cell Means Models
	response = group mean + random error
	one-factor and two-factor means models
	estimable means
		at least one observation per group
		unique unbiased estimator
		linear combination of responses
		linear combination of estimable functions is estimable
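
As a minimal sketch of the estimability conditions above (Python, with made-up data; the helper name `cell_means` is an assumption, not from the text): the sample mean of each non-empty cell is a linear combination of responses that estimates the cell mean, and a cell with no observations has no estimate at all.

```python
from collections import defaultdict
from statistics import mean

def cell_means(observations):
    """Return {cell: sample mean} for (cell, response) pairs.

    Cells with no observations never appear in the result, which
    mirrors the rule that their means are not estimable.
    """
    groups = defaultdict(list)
    for cell, y in observations:
        groups[cell].append(y)
    return {cell: mean(ys) for cell, ys in groups.items()}

# Hypothetical two-factor data: cell = (level of A, level of B)
data = [
    (("a1", "b1"), 10.0), (("a1", "b1"), 12.0),
    (("a1", "b2"), 15.0),
    (("a2", "b1"), 20.0), (("a2", "b1"), 22.0),
    # cell (a2, b2) is empty, so its mean cannot be estimated
]
means = cell_means(data)

# A linear combination of estimable means is itself estimable,
# e.g. the difference between levels of A within b1.
diff = means[("a2", "b1")] - means[("a1", "b1")]
```

Combining the two factors into a single cell label, as done here, is the "one-factor" view of a two-factor means model.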

7.2 Effects Models
	one-factor effects model
	response = reference + group effect + random error
	group effect = group mean - reference
	reference is arbitrary
		overall (population grand) mean
		intercept (SAS)
		not estimable		
	two factor effects model
		population cell & marginal means -- no data yet
		cell means are estimable provided cell is not empty
		may combine multiple factors into one
	additive effects model
	
7.3 Estimable Functions
	functions of parameters which do not depend on
		particular solution to normal equations
	normal equations for effects model
		one factor & two factors
	matrix form -- overspecified model
	linear contrasts
		main effects contrasts
		pure interaction contrasts
		simplification in additive model

7.4 Linear Constraints
	sum-to-zero linear constraints
		reference = population grand mean
		group effect = deviation from grand mean
		matrix form
	set-to-zero linear constraints
		reference = last group mean
		group effect = deviation from last group mean
		matrix form
	(particular) solutions of normal equations
	estimable functions in terms of constraints
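
The two constraint systems above can be compared in a small sketch (Python; the group means are made up, and the function names are assumptions). It assumes the one-factor effects model and the unweighted sum-to-zero constraint. Both functions return a particular solution of the normal equations; the reference differs between them, while group means and contrasts of effects do not.

```python
from statistics import mean

def sum_to_zero(group_means):
    """Effects sum to zero: reference = unweighted mean of the group means."""
    mu = mean(list(group_means.values()))
    return mu, {g: m - mu for g, m in group_means.items()}

def set_to_zero(group_means):
    """Last effect set to zero: reference = last group mean."""
    last = list(group_means)[-1]
    mu = group_means[last]
    return mu, {g: m - mu for g, m in group_means.items()}

group_means = {"g1": 10.0, "g2": 14.0, "g3": 18.0}  # hypothetical sample means
mu_s, alpha_s = sum_to_zero(group_means)
mu_z, alpha_z = set_to_zero(group_means)

# The reference depends on the constraint chosen, so it is not estimable.
reference_differs = mu_s != mu_z

# Estimable functions take the same value under either solution.
fitted_s = {g: mu_s + alpha_s[g] for g in group_means}
fitted_z = {g: mu_z + alpha_z[g] for g in group_means}
contrast_s = alpha_s["g1"] - alpha_s["g2"]
contrast_z = alpha_z["g1"] - alpha_z["g2"]
```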

7.5 General Form of Estimable Functions
	L-notation as in SAS (Littell et al 1991)
		overspecified model
		relations among columns <-> among L's
		substituting for redundant L's
	set-to-zero constraints
	sum-to-zero constraints
	one- & two-factor effects models
	show GFEF has unique solution of normal equations (one factor)

8. Balanced Experiments
8.1 Additive Models
	response = reference + factor A + factor B + error
	without replication and with balanced replication
		model equation & null hypotheses
		partition of sum of squares
		expected mean squares & F-statistics
		relation of marginal means to model & estimators
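
A minimal sketch of the partition of the sum of squares for the additive model without replication (Python; the data table and the function name `additive_anova` are made up for illustration). Marginal means give the effect estimates, and whatever the two main effects leave unexplained serves as the error term.

```python
def additive_anova(y):
    """Additive two-factor anova for a table y[i][j] with one observation per cell."""
    a, b = len(y), len(y[0])
    grand = sum(map(sum, y)) / (a * b)
    row = [sum(r) / b for r in y]                                 # marginal means of A
    col = [sum(y[i][j] for i in range(a)) / a for j in range(b)]  # marginal means of B
    ss_a = b * sum((r - grand) ** 2 for r in row)
    ss_b = a * sum((c - grand) ** 2 for c in col)
    ss_e = sum((y[i][j] - row[i] - col[j] + grand) ** 2
               for i in range(a) for j in range(b))
    ss_total = sum((v - grand) ** 2 for r in y for v in r)
    df_a, df_b, df_e = a - 1, b - 1, (a - 1) * (b - 1)
    ms_e = ss_e / df_e
    return {
        "SS": {"A": ss_a, "B": ss_b, "Error": ss_e, "Total": ss_total},
        "df": {"A": df_a, "B": df_b, "Error": df_e},
        "F": {"A": (ss_a / df_a) / ms_e, "B": (ss_b / df_b) / ms_e},
    }

# Hypothetical 2 x 3 layout: rows are levels of A, columns are levels of B
table = [[4.0, 6.0, 8.0],
         [6.0, 10.0, 14.0]]
result = additive_anova(table)
```

Each F-statistic would be compared with an F distribution on the factor's df and the error df.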

8.2 Full Models with Two Factors
	cell means model & effects model
		estimates of cell means & marginal means
		standard errors
		main effects & interaction hypotheses
	partition of total sum of squares
	expected sum of squares
	F-statistics & non-centrality parameters
	two-factor anova table
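
The two-factor anova table for a balanced full model can be sketched as follows (Python; made-up data, and `two_factor_anova` is a hypothetical helper). It assumes equal replication n > 1 in every cell, so that the four sums of squares add up to the total.

```python
def two_factor_anova(y):
    """y[i][j] is the list of n replicates in cell (i, j); balanced data assumed."""
    a, b, n = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(obs) / n for obs in row] for row in y]           # cell means
    row = [sum(r) / b for r in cell]                              # marginal means of A
    col = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(row) / a
    ss = {
        "A": b * n * sum((r - grand) ** 2 for r in row),
        "B": a * n * sum((c - grand) ** 2 for c in col),
        "AB": n * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                      for i in range(a) for j in range(b)),
        "Error": sum((obs - cell[i][j]) ** 2
                     for i in range(a) for j in range(b) for obs in y[i][j]),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "Error": a * b * (n - 1)}
    ms = {k: ss[k] / df[k] for k in ss}
    f = {k: ms[k] / ms["Error"] for k in ("A", "B", "AB")}
    return ss, df, ms, f

# Hypothetical 2 x 2 layout with two replicates per cell
data = [[[1.0, 3.0], [5.0, 7.0]],
        [[4.0, 6.0], [14.0, 16.0]]]
ss, df, ms, f = two_factor_anova(data)
```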

8.3 Interaction Plots
	interaction plot
		plot cell means against levels of factor A
		connect levels of factor B by lines
		label levels of both factors accordingly
		try switching A & B for better clarity
		order levels by marginal mean?
		add SE or LSD bar to help interpretation
	parallel lines or curves
		constant separation across levels of factor A
		parallel if no interaction
		unequal separation vs. crossing lines
	margin plots
		use marginal means along horizontal axis
		include identity line for reference
		straight lines = Tukey interaction (see 9.4)
		parallel straight lines = no interaction
	three-factor interaction
		separate plots by levels of factor C
		switch roles of A,B,C for clarity
		or combine two factors on one plot
			more lines or more horizontal levels
	plots to examine size of effects
		half-normal plot
			factors all at two levels
			significant effects deviate from identity line
		effect plot
			effect = deviation used in MS calculation
			effects rescaled for mean square by df
			plot one point for each level
				main effect -- label by level
				interactions
				residuals
			spread (SD) relative to residual
				indicates size of effect
				can identify cells that contribute
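
The "parallel lines" reading of an interaction plot can be checked numerically with a rough sketch (Python; the cell means and helper names are made up). Each list holds the cell means for one level of factor B across the levels of factor A; two lines are parallel when their separation is the same at every level of A.

```python
def separations(line_1, line_2):
    """Vertical gap between two lines of the plot at each level of factor A."""
    return [y2 - y1 for y1, y2 in zip(line_1, line_2)]

def is_parallel(line_1, line_2, tol=1e-9):
    """True when the gap is constant across levels of A (within tol)."""
    gaps = separations(line_1, line_2)
    return max(gaps) - min(gaps) <= tol

# Hypothetical cell means: one list per level of B, indexed by level of A
lines = {
    "b1": [10.0, 12.0, 15.0],
    "b2": [13.0, 15.0, 18.0],   # constant separation from b1
    "b3": [11.0, 16.0, 14.0],   # unequal separation, crosses b2
}
gaps_12 = separations(lines["b1"], lines["b2"])
gaps_13 = separations(lines["b1"], lines["b3"])
parallel_12 = is_parallel(lines["b1"], lines["b2"])
parallel_13 = is_parallel(lines["b1"], lines["b3"])
```

Sample cell means are almost never exactly parallel, so in practice the tolerance would be judged against an SE or LSD bar rather than the tiny default used here.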

8.4 Higher Order Models
	cell means & effects models
	estimates & partition of sums of squares
	three-factor anova table
	3-factor interaction / interpretation
	two or more 2-factor interactions
	interaction plots
		separate plots by level of third factor
		possibly averaged over third factor
		again, switch roles to find best view

9. Model Selection
parsimonious model
	balance bias & over-fit
	bias -- miss key features
	over-fit -- high variability in parameter estimates
hierarchy of factorial models
	usually keep main effects if interaction significant
testing nested models
	formal F tests & other statistics
	comparing non-nested models

9.1 Pooling Interactions
	decision paths for two-factor models
	pragmatic consideration of full & additive model
	report results honestly

9.2 Selecting the "Best" Model
	decision paths for three-factor additive model
	18 hierarchical models from which to choose
	suggested method of analysis for full model
		if 3-factor interaction is significant
			separately analyze 2-factor models
			by level of third factor
		if no 3-factor interaction
			easy if only one 2-factor interaction
			analyze several ways if more than one
			separate analyses as above
	how to move among models?
		forward selection
			add terms one at a time
			begin with nothing or a few terms
			danger of biased model -- too simple
		backward elimination
			drop one at a time from full model
			danger of bloated model
		rule of 2 for pooling interactions
			sweep down from main effects
			only examine lower terms if large
			simplifies hierarchy for interpretation
		what if different approaches differ?
			look further
			look ahead more than one step
			be skeptical -- take broad view
		automated tools
			useful but can be limited
			designed for regression, not factors
			consider important contrasts

9.3 Model Selection Criteria
	plots
		half-normal plots when 2 levels per factor
		effect plots
		selected interaction plots
			based on full model fit?
		test statistic vs. model df (=p)
			especially Mallows' C(p)
	F-test
		careful of multiple testing issues
	explained variation R^2 (adjusted for p)
		heuristic guide
		unadjusted always increases as model grows
		but how fast does it increase?
	mean squared error
		does it change dramatically among models?
	Mallows' C(p)
		C(p) > p indicates 'large' model bias
		C(p) = p if model bias eliminated
		pick smallest such p to avoid overfit
		sensitive to estimate of variance
			tricky if no or few df error
			initial artful choice of reduced model

9.4 One Observation per Cell
	effects model with no replication
	how to simplify interaction -- fewer df
	Tukey interaction model
	interaction plots / margin plots

9.5 Tukey's Test for Interaction
	formal test (under null additive model)
	Mandel interaction model
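
A sketch of Tukey's one-degree-of-freedom test for a table with one observation per cell (Python; the data and the function name are made up, and the formula is the standard textbook form, assumed here rather than taken from the outline). The interaction is modelled as a multiple of the product of the row and column effects, and its sum of squares is split out of the additive-model residual.

```python
def tukey_one_df(y):
    """Tukey's test for non-additivity on a table y[i][j] without replication."""
    a, b = len(y), len(y[0])
    grand = sum(map(sum, y)) / (a * b)
    row_eff = [sum(r) / b - grand for r in y]
    col_eff = [sum(y[i][j] for i in range(a)) / a - grand for j in range(b)]
    # Residual sum of squares under the (null) additive model
    ss_resid = sum((y[i][j] - grand - row_eff[i] - col_eff[j]) ** 2
                   for i in range(a) for j in range(b))
    # One-df sum of squares for the product (Tukey) interaction term
    num = sum(row_eff[i] * col_eff[j] * y[i][j]
              for i in range(a) for j in range(b))
    ss_nonadd = num ** 2 / (sum(r ** 2 for r in row_eff) *
                            sum(c ** 2 for c in col_eff))
    df_rem = (a - 1) * (b - 1) - 1
    ss_rem = ss_resid - ss_nonadd
    f_stat = ss_nonadd / (ss_rem / df_rem)   # compare with F on (1, df_rem) df
    return ss_nonadd, ss_resid, df_rem, f_stat

# Hypothetical 3 x 3 table whose rows fan out, suggesting a product interaction
table = [[1.0, 2.0, 3.0],
         [2.0, 4.0, 6.0],
         [3.0, 6.0, 12.0]]
ss_nonadd, ss_resid, df_rem, f_stat = tukey_one_df(table)
```

The function needs at least a 2 x 3 table so that some df remain for the denominator, and it will divide by zero if the product term fits the residuals exactly.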
Last modified: Tue Feb 17 08:47:45 1998 by Brian Yandell (yandell@stat.wisc.edu)