Practical Data Analysis for Designed Experiments

Brian S. Yandell (1997) Chapman & Hall, London
A. Placing Data in Context

1. Practical Data Analysis: 1.1 Effect of Factors; 1.2 Nature of Data; 1.3 Summary Tables; 1.4 Plots for Statistics; 1.5 Computing; 1.6 Interpretation
2. Collaboration in Science: 2.1 Asking Questions; 2.2 Learning from Plots; 2.3 Mechanics of Consulting Session; 2.4 Philosophy & Ethics; 2.5 Intelligence, Culture & Learning; 2.6 Writing
3. Experimental Design: 3.1 Types of Studies; 3.2 Designed Experiments; 3.3 Design Structure; 3.4 Treatment Structure; 3.5 Designs in This Book

1. Practical Data Analysis
practical data analysis (pda)
	data in context of scientific experiment
	Chatfield: initial data analysis -- tables & graphs
	Tukey: exploratory data analysis
	confirmatory data analysis
	human judgement
	interpretation in terms of original problem
key questions

1.1 Effect of Factors
	factors have levels; factor combinations as cells
	analysis of variance (ANOVA)
	main effects & interaction
	
	word model / math symbols / computer language
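
One way to see the three forms side by side is a two-factor model with interaction. The notation below is the usual textbook convention and is an assumption here, since this outline does not fix any symbols. In words: response = overall mean + factor A + factor B + A-by-B interaction + error. In math symbols:

```latex
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad i = 1,\dots,a, \quad j = 1,\dots,b, \quad k = 1,\dots,n_{ij}
```

In computer language, the same model might be written `y ~ A * B` in S-Plus formula notation, or `model y = A B A*B;` in SAS PROC GLM.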

1.2 Nature of Data
	quality & structure
		garbage in, garbage out (gigo)
	mechanics of manipulation
		store, transfer, handle
		very large data sets
	analysis & display
	description & inference
	data mining
		dangers of fishing
		new views on very large problems

1.3 Summary Tables
	table of means
		order by mean values, not alphabetical
		significant digits
		avoid repetition
		cross-tables for two or more factors
		plots for moderate to large number of levels
	anova table
		needed? put in appendix?
		key results in text (p-values)
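
The advice on tables of means can be sketched in a few lines of Python. This is only an illustration: the group names and response values are made up, and `table_of_means` is a hypothetical helper, not something from the book or from any statistics package.

```python
from statistics import mean

# Hypothetical responses for four treatment groups (illustrative values only).
data = {
    "control": [4.1, 3.9, 4.4],
    "drug_a": [5.6, 5.9, 5.2],
    "drug_b": [4.8, 5.1, 4.7],
    "drug_c": [3.2, 3.6, 3.1],
}


def table_of_means(groups, digits=2):
    """Return (level, n, mean) rows ordered by mean value, not alphabetically."""
    rows = [(level, len(values), mean(values)) for level, values in groups.items()]
    rows.sort(key=lambda row: row[2])
    # Round only for display, so the ordering uses the full-precision means.
    return [(level, n, round(m, digits)) for level, n, m in rows]


rows = table_of_means(data)
for level, n, m in rows:
    print(f"{level:<8} n={n}  mean={m:.2f}")
```

Sorting by the mean puts similar groups next to each other, which makes patterns easier to see than alphabetical order does. The `digits` argument keeps the number of reported digits modest.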

1.4 Plots for Statistics
	annotation
		use plot symbols, circles & arrows
		identify unusual points
		label axes & subject matter
		show central tendency & variation
	compromise
		crammed with important details
		easy to absorb & grasp
	plots of relationships guide analysis
		crystallize questions
		highlight design issues
		sketch vs. publication quality
	single group or side-by-side groups
		histogram or dot diagram
		stem-and-leaf diagram
		survival curve or cumulative distribution
		boxplot
		eschew bargraphs & piecharts
	multiple factors
		interaction plots
		scatter plots
			response vs. covariate or group mean
			residual plot: vs. predicted or covariate
		use plot symbols for factor levels!
		invent symbols for factor combinations (cells)
		care with unbalanced designs
	nested designs
		care separating & identifying sources of variation
		blocking & subsampling
		split plot design -- key features of nesting
		repeated measures -- correlation over levels (time)
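
The numbers behind an interaction plot are just the cell means. The sketch below computes them for a hypothetical two-factor data set and prints them as a cross-table; the factor names, levels and values are invented for illustration. It leaves the drawing to whatever graphics tool is at hand.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations: (level of factor A, level of factor B, response).
observations = [
    ("low", "ctl", 10.0), ("low", "ctl", 12.0),
    ("low", "trt", 14.0), ("low", "trt", 16.0),
    ("high", "ctl", 11.0), ("high", "ctl", 13.0),
    ("high", "trt", 21.0), ("high", "trt", 23.0),
]


def cell_means(obs):
    """Group responses by factor combination (cell) and average each cell."""
    cells = defaultdict(list)
    for a, b, y in obs:
        cells[(a, b)].append(y)
    return {cell: mean(ys) for cell, ys in cells.items()}


means = cell_means(observations)

# Cross-table of cell means: rows are levels of A, columns are levels of B.
a_levels = ["low", "high"]
b_levels = ["ctl", "trt"]
print("A / B " + "  ".join(f"{b:>6}" for b in b_levels))
for a in a_levels:
    print(f"{a:<6}" + "  ".join(f"{means[(a, b)]:6.1f}" for b in b_levels))

# Effect of B within each level of A; unequal effects give non-parallel lines.
effect_of_b = {a: means[(a, "trt")] - means[(a, "ctl")] for a in a_levels}
interaction_contrast = effect_of_b["high"] - effect_of_b["low"]
print("effect of B by level of A:", effect_of_b)
print("interaction contrast:", interaction_contrast)
```

On an interaction plot each level of A would be one line across the levels of B. A nonzero contrast corresponds to lines that are not parallel, though whether the departure matters still has to be judged against the variation in the data.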

1.5 Computing
	primary tools suggested in this course
		SAS
			industry and government standard
			handles complicated designs well
			large staff of statisticians
			local expertise
			tends to be used in "batch" mode
		S-Plus
			becoming industry standard
			excellent interactive functions & graphics
			easily extensible with functions
			intelligent data structures
			on your own for more complicated designs
	others
		whatever works (Minitab, SPSS, Systat, ...)
		know in detail what it does
		strengths & weaknesses
		accuracy & accessibility
		fancy graphics do not imply correct calculations
	complement computing tools
		exploratory vs. presentation graphics
		complicated analyses
		ease of transfer to written report
	dynamic graphics
		interactive adjustment of plot features
	Internet
		StatLib -- http://lib.stat.cmu.edu/
		NetLib -- ftp://netlib.att.com/netlib/master/readme.html
		http://www.stat.wisc.edu/
		interactive Internet resources

1.6 Interpretation
	inference: sampled vs. target population
	comparing distributions
	means & variances may differ
	assumptions: how important are they?
	models vs. reality
		curve fitting to match data in hand
		mechanistic model to match process under study
		Box: "all models are wrong, but some models are useful"

2. Collaboration in Science
communication takes practice
	applied statistician -- building career in collaborative consulting
	lab or field scientist -- organizing thoughts before & during research
environment for healthy collaboration
	embark on knowledge discovery process
	convey concepts in simple, accessible language
	neutral, comfortable climate for listening
consulting as a series of interviews
	initial grasp of experiment & key questions
	later elaboration of specific aspects of design & analysis

2.1 Asking Questions
	general -> specific -> general
	start with background of experiment
	avoid blunt questions & jargon
	ask neutral questions
	rephrase material to check comprehension
	anything else?

2.2 Learning from Plots
	initial plots
		physical layout of experiment
		raw sketches -- scatter plots & tables
		augment plots with symbols & comments
		order factor levels by mean values
	model fit & check
		start with simple models using well-behaved subsets
		subdivide when suggested by analysis (interactions)
		overlay model on data
		include precision; identify sources of variation
		use plots to check assumptions & identify outliers
	interpretation & presentation
		keep audience in mind
		stick to a few self-contained figures
		annotate to highlight results & key features

2.3 Mechanics of Consulting Session
	many activities at once
		organization of time & responsibilities
		science of research problem
		interpersonal dynamics
	beginning
		build mutual respect
		importance of opening climate
		set clear agenda & time frame
		establish levels of expertise
	middle
		goals, scientific issues
		statistical approach
			start simple with plots
			build complexity at comfortable pace
			keep technical level appropriate to problem
			always have goals in mind
	ending
		review progress
		outline future tasks
		reevaluate time frame & goals as necessary

2.4 Philosophy & Ethics
	articles
		philosophy of consulting
		training of statisticians for consulting
		history of statistics & science
	science does not always move forward
	statistician as disinterested party
	statistician's role in ethical misconduct
		error/oversight vs. misuse/fraud
		ethical guidelines & avenues for help
	
2.5 Intelligence, Culture & Learning
	learning process & concept of intelligence
		Herrmann: complementary thinking processes
			cerebral/limbic - left/right
		Gardner: seven intelligences
			linguistic, musical, logical/mathematical,
			spatial, bodily/kinesthetic,
			intra-personal, inter-personal
		Markova: perceptual channels
			visual, auditory, kinesthetic
			front/middle/back channels
	statistical consultant as anthropologist

2.6 Writing
	science writing
		protocols of materials & methods
		articulate key questions & goals
		lay out experimental design
		plan strategy for analysis
		visualize data as sketched plots
		notes before, during & after consulting sessions
		keep in mind how to communicate with peers
	sample report outline
		title page (informative title / name / date)
		abstract / summary (half-page / condensed / specific results)
		introduction (overview / big picture, literature)
		experimental design / materials & methods / data description
		results (plots / tables / plain reporting)
		conclusions (interpretation / cautions / future work)
		references (full citations of work referred to in report)
		appendix (brief! needed?)
	writing guides
		Strunk & White: elements of style
		Gowers: classic writing ideas
		Goldberg: creative writing
		Higham: handbook of writing for math sciences

3. Experimental Design
data analysis drives experimental design drives data analysis

3.1 Types of Studies
	pure observational study (natural history)
	sample survey
	designed experiment
		protocol established ahead of time
		scientist controls key aspects
	biostatistics
		prospective study
		retrospective study
		clinical trial
		
3.2 Designed Experiments
	factor & levels, groups
	what is the experimental unit (EU)?
	factor combination as cell
	factor combination as group
	designed experiment
		key questions drive experiment
		treatment structure: factor levels under study
		design structure: restrictions on randomization
		assumptions, goals for inference

3.3 Design Structure
	must be understood for proper analysis
	replication
		increase precision (central limit theorem)
		smooth over odd situations (outliers)
		pseudo-replication, repeated measures
	randomization
		sample EUs drawn from one population of interest
			randomly assign factor levels to EU (drug)
		samples drawn from several populations
			random sample of EUs from population (gender)
		same analysis, different inference / interpretation
		randomize over extraneous factors, trends, etc.
	examples
		one factor
			subsampling or pseudoreplication
			completely randomized design (CRD)
			randomized complete block design (RCBD)
		two factor
			strip plot, CRD, split plot
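
The difference between a CRD and an RCBD lies in the restriction on randomization, which a short sketch can make concrete. The function names `crd` and `rcbd`, the treatment labels and the block names are all hypothetical, and the code uses only Python's standard `random` module.

```python
import random


def crd(treatments, reps, seed=None):
    """Completely randomized design: shuffle all treatment labels over all EUs."""
    rng = random.Random(seed)
    plan = [t for t in treatments for _ in range(reps)]
    rng.shuffle(plan)
    return plan  # plan[i] is the treatment assigned to experimental unit i


def rcbd(treatments, blocks, seed=None):
    """Randomized complete block design: every treatment appears once per block,
    and the order is randomized separately within each block."""
    rng = random.Random(seed)
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)
        layout[block] = order
    return layout


treatments = ["A", "B", "C"]
crd_plan = crd(treatments, reps=4, seed=1)
rcbd_plan = rcbd(treatments, blocks=["block1", "block2", "block3", "block4"], seed=1)

print("CRD: ", crd_plan)
for block, order in rcbd_plan.items():
    print("RCBD:", block, order)
```

Both plans use twelve experimental units and four replicates per treatment. In the CRD a treatment may by chance cluster in one part of the layout, while the RCBD guarantees each block contains every treatment. That restriction is what the analysis must account for.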

3.4 Treatment Structure
	one-factor (one-way layout)
	two-factors (two-way layout)
	factorial arrangements
	fractional factorial arrangement (stat 424)
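
A factorial arrangement is every combination of factor levels, so it can be enumerated directly. The sketch below assumes three hypothetical two-level factors coded -1 and +1, and also shows one common way to take a half fraction, keeping the runs that satisfy the defining relation I = ABC. That choice of fraction is an illustration, not a recommendation from this outline.

```python
from itertools import product

# Three hypothetical two-level factors, coded -1 (low) and +1 (high).
factors = {"A": [-1, 1], "B": [-1, 1], "C": [-1, 1]}

# Full factorial: every combination of levels (2 x 2 x 2 = 8 cells).
full = [dict(zip(factors, levels)) for levels in product(*factors.values())]

# Half fraction with defining relation I = ABC: keep runs where A*B*C = +1.
half = [run for run in full if run["A"] * run["B"] * run["C"] == 1]

print("full factorial runs:", len(full))
for run in half:
    print("half-fraction run:", run)
```

The half fraction needs only four runs, at the cost of aliasing: with I = ABC each main effect is confounded with a two-factor interaction (A with BC, B with AC, C with AB).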

3.5 Designs in This Book
	B: groups, one factor
	1,2,3 factors
		C: balanced designs
		D: unbalanced / missing cell
	E: assumptions
		residual & diagnostics / unequal variances
		transformations / distribution-free methods
	F: covariates
	G: random / fixed / mixed effects
	H: nested designs
		blocking / subsampling
		split plot, strip plot
	I: correlated measurements (over time, space)
		repeated measures
		cross-over designs
Last modified: Tue Feb 17 08:47:29 1998 by Brian Yandell (yandell@stat.wisc.edu)