Brian S. Yandell (1997)
Chapman & Hall, London
A. Placing Data in Context
- 1. Practical Data Analysis
- 1.1 Effect of Factors
- 1.2 Nature of Data
- 1.3 Summary Tables
- 1.4 Plots for Statistics
- 1.5 Computing
- 1.6 Interpretation
- 2. Collaboration in Science
- 2.1 Asking Questions
- 2.2 Learning from Plots
- 2.3 Mechanics of Consulting Session
- 2.4 Philosophy & Ethics
- 2.5 Intelligence, Culture & Learning
- 2.6 Writing
- 3. Experimental Design
- 3.1 Types of Studies
- 3.2 Designed Experiments
- 3.3 Design Structure
- 3.4 Treatment Structure
- 3.5 Designs in This Book
practical data analysis (pda)
data in context of scientific experiment
Chatfield: initial data analysis -- tables & graphs
Tukey: exploratory data analysis
confirmatory data analysis
human judgement
interpretation in terms of original problem
key questions
factors have levels; factor combinations as cells
analysis of variance (ANOVA)
main effects & interaction
word model / math symbols / computer language
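The three ways of writing a model can be illustrated for two crossed factors. This is a sketch only: the symbols and factor names A and B are assumptions chosen for illustration, not notation taken from these notes. As a word model: response = overall mean + A effect + B effect + A-by-B interaction + error. In math symbols:

```latex
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk},
\qquad i = 1,\dots,a,\quad j = 1,\dots,b,\quad k = 1,\dots,n_{ij}
```

Here \alpha_i and \beta_j are the main effects, (\alpha\beta)_{ij} is the interaction for cell (i, j), and e_{ijk} is the error for the k-th unit in that cell. In S-style formula notation the same model is commonly written `y ~ A * B`.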
quality & structure
garbage in, garbage out (gigo)
mechanics of manipulation
store, transfer, handle
very large data sets
analysis & display
description & inference
data mining
dangers of fishing
new views on very large problems
table of means
order by mean values, not alphabetical
significant digits
avoid repetition
cross-tables for two or more factors
plots for moderate to large number of levels
anova table
needed? put in appendix?
key results in text (p-values)
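A minimal sketch of a table of means ordered by mean value rather than alphabetically, with limited significant digits. The group names and numbers are hypothetical, made up for illustration; only the Python standard library is assumed.

```python
from statistics import mean, stdev

# Hypothetical responses for four groups (illustrative numbers only).
data = {
    "control": [12.1, 11.8, 12.6],
    "drug A": [15.2, 14.7, 15.9],
    "drug B": [13.4, 13.9, 13.1],
    "drug C": [10.2, 10.9, 10.4],
}

def table_of_means(groups, digits=1):
    """Return (level, n, mean, sd) rows ordered by mean, rounded for display."""
    raw = [(name, len(ys), mean(ys), stdev(ys)) for name, ys in groups.items()]
    raw.sort(key=lambda row: row[2])  # order by mean value, not by name
    return [(name, n, round(m, digits), round(s, digits)) for name, n, m, s in raw]

rows = table_of_means(data)
for name, n, m, s in rows:
    print(f"{name:<8} n={n}  mean={m:5.1f}  sd={s:4.1f}")
```

Sorting on the unrounded means and rounding only for display keeps the ordering stable when two groups round to the same value.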
annotation
use plot symbols, circles & arrows
identify unusual points
label axes & subject matter
show central tendency & variation
compromise
crammed with important details
easy to absorb & grasp
plots of relationships guide analysis
crystallize questions
highlight design issues
sketch vs. publication quality
single group or side-by-side groups
histogram or dot diagram
stem-and-leaf diagram
survival curve or cumulative distribution
boxplot
eschew bargraphs & piecharts
multiple factors
interaction plots
scatter plots
response vs. covariate or group mean
residual plot: vs. predicted or covariate
use plot symbols for factor levels!
invent symbols for factor combinations (cells)
care with unbalanced designs
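The numbers behind an interaction plot are just the cell means, traced across the levels of one factor with one line per level of the other. A rough sketch, assuming two hypothetical factors A (low/high) and B (x/y) with made-up responses:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations: (level of A, level of B, response).
obs = [
    ("low", "x", 4.0), ("low", "x", 5.0),
    ("low", "y", 6.0), ("low", "y", 7.0),
    ("high", "x", 8.0), ("high", "x", 9.0),
    ("high", "y", 6.0), ("high", "y", 5.0),
]

def cell_means(observations):
    """Average the response within each factor combination (cell)."""
    cells = defaultdict(list)
    for a, b, y in observations:
        cells[(a, b)].append(y)
    return {cell: mean(ys) for cell, ys in cells.items()}

means = cell_means(obs)

# One trace per level of B, followed across the levels of A.
for b in ("x", "y"):
    print(b, [means[(a, b)] for a in ("low", "high")])

# Interaction contrast: difference between the two slopes (0 means parallel traces).
contrast = (means[("high", "x")] - means[("low", "x")]) - (
    means[("high", "y")] - means[("low", "y")]
)
print("interaction contrast:", contrast)
```

With unbalanced data the cells have different sample sizes, so cell means differ in precision even though the traces look the same; that is one reason for care with unbalanced designs.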
nested designs
care separating & identifying sources of variation
blocking & subsampling
split plot design -- key features of nesting
repeated measures -- correlation over levels (time)
primary tools suggested in this course
SAS
industry and government standard
handles complicated designs well
large staff of statisticians
local expertise
tends to be used in "batch" mode
S-Plus
becoming industry standard
excellent interactive functions & graphics
easily extensible with functions
intelligent data structures
on your own for more complicated designs
others
whatever works (Minitab, SPSS, Systat, ...)
know in detail what it does
strengths & weaknesses
accuracy & accessibility
fancy graphics do not imply correct calculations
complement computing tools
exploratory vs. presentation graphics
complicated analyses
ease of transfer to written report
dynamic graphics
interactive adjustment of plot features
Internet
StatLib -- http://lib.stat.cmu.edu/
NetLib -- ftp://netlib.att.com/netlib/master/readme.html
http://www.stat.wisc.edu/
interactive Internet resources
inference: sampled vs. target population
comparing distributions
means & variances may differ
assumptions: how important are they?
models vs. reality
curve fitting to match data in hand
mechanistic model to match process under study
Box: "all models are wrong, but some models are useful"
communication takes practice
applied statistician -- building career in collaborative consulting
lab or field scientist -- organizing thoughts before & during research
environment for healthy collaboration
embark on knowledge discovery process
convey concepts in simple, accessible language
neutral, comfortable climate for listening
consulting as a series of interviews
initial grasp of experiment & key questions
later elaboration of specific aspects of design & analysis
general -> specific -> general
start with background of experiment
avoid blunt questions & jargon
ask neutral questions
rephrase material to check comprehension
anything else?
initial plots
physical layout of experiment
raw sketches -- scatter plots & tables
augment plots with symbols & comments
order factor levels by mean values
model fit & check
start with simple models using well-behaved subsets
subdivide when suggested by analysis (interactions)
overlay model on data
include precision; identify sources of variation
use plots to check assumptions & identify outliers
interpretation & presentation
keep audience in mind
stick to a few self-contained figures
annotate to highlight results & key features
many activities at once
organization of time & responsibilities
science of research problem
interpersonal dynamics
beginning
build mutual respect
importance of opening climate
set clear agenda & time frame
establish levels of expertise
middle
goals, scientific issues
statistical approach
start simple with plots
build complexity at comfortable pace
keep technical level appropriate to problem
always have goals in mind
ending
review progress
outline future tasks
reevaluate time frame & goals as necessary
articles
philosophy of consulting
training of statisticians for consulting
history of statistics & science
science does not always move forward
statistician as disinterested party
statistician's role in ethical misconduct
error/oversight vs. misuse/fraud
ethical guidelines & avenues for help
learning process & concept of intelligence
Herrmann: complementary thinking processes
cerebral/limbic - left/right
Gardner: seven intelligences
linguistic, musical, logical/mathematical,
spatial, bodily/kinesthetic,
intra-personal, inter-personal
Markova: perceptual channels
visual, auditory, kinesthetic
front/middle/back channels
statistical consultant as anthropologist
science writing
protocols of materials & methods
articulate key questions & goals
lay out experimental design
plan strategy for analysis
visualize data as sketched plots
notes before, during & after consulting sessions
keep in mind how to communicate with peers
sample report outline
title page (informative title / name / date),
abstract / summary (half-page / condensed / specific results),
introduction (overview / big picture, literature),
experimental design / materials & methods / data description,
results (plots / tables / plain reporting),
conclusions (interpretation / cautions / future work),
references (full citations of work referred to in report),
appendix (brief! needed?)
writing guides
Strunk & White: elements of style
Gower: classic writing ideas
Goldberg: creative writing
Higham: handbook of writing for math sciences
data analysis drives experimental design drives data analysis
pure observational study (natural history)
sample survey
designed experiment
protocol established ahead
scientist controls key aspects
biostatistics
prospective study
retrospective study
clinical trial
factor & levels, groups
what is the experimental unit (EU)?
factor combination as cell
factor combination as group
designed experiment
key questions drive experiment
treatment structure: factor levels under study
design structure: restrictions on randomization
assumptions, goals for inference
must be understood for proper analysis
replication
increase precision (central limit theorem)
smooth over odd situations (outliers)
pseudo-replication, repeated measures
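How replication increases precision can be sketched with the standard error of a mean, sd / sqrt(n). This assumes n independent experimental units with a common standard deviation; the value sd = 10 is hypothetical.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of a mean of n independent units with standard deviation sd."""
    return sd / sqrt(n)

# Quadrupling the number of independent EUs halves the standard error.
for n in (2, 4, 8, 16):
    print(f"n={n:2d}  se={standard_error(10.0, n):.2f}")
```

Here n counts experimental units. Subsamples or repeated measures taken on the same unit do not add to n in this formula, which is the trap of pseudo-replication.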
randomization
sample EUs drawn from one population of interest
randomly assign factor levels to EU (drug)
samples drawn from several populations
random sample of EUs from population (gender)
same analysis, different inference / interpretation
randomize over extraneous factors, trends, etc.
examples
one factor
subsampling or pseudoreplication
completely randomized design (CRD)
randomized complete block design (RCBD)
two factor
strip plot, CRD, split plot
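Design structure shows up as a restriction on randomization. A minimal sketch for one factor, contrasting a completely randomized design with a randomized complete block design; the plot names, block names, treatment labels, and seed are all hypothetical.

```python
import random

def crd(units, treatments, rng):
    """CRD: shuffle a balanced list of treatment labels over all units."""
    reps, extra = divmod(len(units), len(treatments))
    if extra:
        raise ValueError("units must be a multiple of the number of treatments")
    labels = list(treatments) * reps
    rng.shuffle(labels)
    return dict(zip(units, labels))

def rcbd(blocks, treatments, rng):
    """RCBD: every block gets each treatment once, randomized within the block."""
    plan = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError("each block needs one unit per treatment")
        labels = list(treatments)
        rng.shuffle(labels)  # separate randomization for each block
        for unit, label in zip(units, labels):
            plan[(block, unit)] = label
    return plan

rng = random.Random(1998)  # fixed seed so the layout can be reproduced
treatments = ["A", "B", "C"]

crd_plan = crd([f"plot{i}" for i in range(1, 10)], treatments, rng)
blocks = {f"block{b}": [f"plot{i}" for i in range(1, 4)] for b in range(1, 4)}
rcbd_plan = rcbd(blocks, treatments, rng)

print(crd_plan)
print(rcbd_plan)
```

In the CRD any unit can receive any treatment; in the RCBD the shuffle is restricted to one block at a time, which is why block must appear in the analysis.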
one-factor (one-way layout)
two-factors (two-way layout)
factorial arrangements
fractional factorial arrangement (stat 424)
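A full factorial arrangement crosses every level of each factor with every level of the others, so each combination is one cell. A short sketch with two hypothetical factors (the names and levels are made up):

```python
from itertools import product

# Hypothetical treatment structure: 2 varieties x 3 fertilizer levels.
factors = {
    "variety": ["V1", "V2"],
    "fertilizer": ["none", "low", "high"],
}

# Each cell is one combination of factor levels.
cells = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for cell in cells:
    print(cell)
print("number of cells:", len(cells))
```

The number of cells is the product of the numbers of levels, which grows quickly as factors are added.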
B: groups, one factor
1,2,3 factors
C: balanced designs
D: unbalanced / missing cell
E: assumptions
residual & diagnostics / unequal variances
transformations / distribution-free methods
F: covariates
G: random / fixed / mixed effects
H: nested designs
blocking / subsampling
split plot, strip plot
I: correlated measurements (over time, space)
repeated measures
cross-over designs
Last modified: Tue Feb 17 08:47:29 1998 by Brian Yandell
(yandell@stat.wisc.edu)