January 2017

evolution of QTL models

original ideas focused on rare & costly markers
models & methods refined as technology advanced

  • single marker regression
  • QTL (quantitative trait loci)
    • single locus models: interval mapping for QTL
    • QTL model search: QTLs & epistasis
  • GWA (genome-wide association mapping)
    • adjust for population structure
    • capture "missing heritability"
    • genome-wide selection

phenotype data: flowering time

Satagopan JM, Yandell BS, Newton MA, Osborn TC (1996) Genetics

covariates

  • examples: treatment, location, age, weight, height
  • account for important design structure (location)
  • adjust for important predictors
  • reduce residual variation to increase power
    • covariate with strong effect

\(\text{H}_0: y = \mu + \beta_x x + e\)
\(\text{H}_a: y = \mu_q + \beta_x x + e\)

  • more complicated (ignored here)
    • QTL * covariate interactions
    • covariate as ratio \(y/x\)

covariate cautions

  • use care when the covariate is another phenotype
  • permutations: keep phenotype & covariates together

permtest

additive covariate

add covar

other phenotype as covariate

flowering time: QTL as covariate

QTL model search

pros & cons of multiple QTL models

  • benefits
    • reduce residual variation
    • increased power
    • separate linked QTL
    • identify interactions among QTL (epistasis)
  • shortcomings
    • only includes significant loci
    • gets complicated very quickly
    • selection bias: overestimate effects of included loci
    • many loci of small effect ignored …

epistasis in BC or DH

Epistasis BC

epistasis in F2

Epistasis F2

multiple loci models

basic model looks the same

\[ y = \mu_q + e \] but now QTL has parts: \(q = (q_1, q_2, \cdots)\) \[ \mu_q = \mu(q_1, q_2, \cdots) = \mu+q_1\beta_1+q_2\beta_2+\cdots\]

  • allows for multiple loci
  • can add epistasis (here for BC)

\[ \mu_q = \mu+\beta_1q_1+\beta_2q_2+\gamma q_1 q_2 \] - more terms for F2 …

LOD-based tests for 2 QTL

For all pairs of positions, fit the following models:

\(\texttt{H}_f: y = \mu + \beta_1 q_1 + \beta_2 q_2 + \gamma q_1 q_2 + e\)
\(\texttt{H}_a: y = \mu + \beta_1 q_1 + \beta_2 q_2 + e\)
\(\texttt{H}_1: y = \mu + \beta_1 q_1 + e\)
\(\texttt{H}_0: y = \mu + e\)

log\(_{10}\) likelihoods for QTL positions \(\lambda_1\) (for \(q_1\)) and \(\lambda_2\) (for \(q_2\))

\(l_f(\lambda_1,\lambda_2)\)
\(l_a(\lambda_1,\lambda_2)\)
\(l_1(\lambda_1)\)
\(l_0\)

LOD scores for 2-QTL scan

full (interactive) vs no QTL (\(f-0\)): \[\texttt{lod}_f(\lambda_1,\lambda_2) = l_f(\lambda_1,\lambda_2) - l_0\]

additive vs no QTL (\(a-0\)): \[\texttt{lod}_a(\lambda_1,\lambda_2) = l_a(\lambda_1,\lambda_2) - l_0\]

interaction, or full vs additive (\(f-a\)): \[\texttt{lod}_i(\lambda_1,\lambda_2) = l_f(\lambda_1,\lambda_2) - l_a(\lambda_1,\lambda_2)\]

usual 1 QTL vs no QTL (\(1-0\)): \[\texttt{lod}_1(\lambda_1) = l_1(\lambda_1) - l_0\]

flowering time: add QTL?

R/qtl tools: sim.geno, makeqtl, addqtl
fancier form of using first QTL as covariate

flowering time 2-D search

Brassica FLC homologs (6 years later)

model search and selection

  • QTL mapping began as hypothesis testing: is this a QTL?
    • much focus on adjusting for multiple testing
  • better to view problem as model selection
    • what set of QTL are well supported?
    • is there evidence for QTL-QTL interactions?

Model = an identified set of QTL and QTL-QTL interactions
(and possibly covariates and QTL-covariate interactions)

QTL model search and selection

  • Class of models: begin with additive models
    • add pairwise or higher interactions?
    • other approaches?
  • Model fit (MLE, Haley-Knott, …)
  • Model comparison
    • estimated prediction error
    • model effects criterion: AIC, BIC, penalized likelihood
    • Bayes method (prior across model space)
  • Model search
    • forward, backward, stepwise selection
    • randomized algorithm

QTL model search goal

Goal: identify major players

  • selected model has two types of errors:
    • miss important terms (QTLs or interactions)
    • include extraneous terms
  • both errors likely at the same time
    • identify as many correct terms as possible
    • while controlling rate of inclusion of extra terms
  • hypothesis testing only has one error at a time
    • pick no QTL model, but there is really a QTL at \(\lambda_1\)
    • pick 1 QTL model, but there is really no QTL

special nature of QTL models

What is special here?

  • continuum of ordinal-valued predictors (the genetic loci)
  • association among these QTL predictors
  • loci on different chromosomes are independent
  • along chromosome:
    • simple (and known) correlation structure

See Broman MultiQTL talk for more details

pros & cons of multiple QTL revisted

  • Benefits
    • reduce residual variation
    • increased power
    • separate linked QTL
    • identify interactions among QTL (epistasis)
  • Shortcomings
    • only includes significant loci
    • gets complicated very quickly
    • selection bias: overestimate effects of included loci
    • many loci of small effect ignored …

selection bias

selection bias

  • estimated QTL effect QTL varies from true effect
  • detect QTL when estimated effect is large
  • experiments with detected QTL often have larger estimated than true effect
  • selection bias largest in QTLs with small or moderate effects
  • true QTL effects smaller than those observed

implications of selection bias

  • estimated % variance explained by identified QTLs: too high
  • repeating an experiment: different QTL (Beavis effect)
  • congenics (or near isogenic lines): off base
  • marker-assisted selection: missed effect

See Broman (2003) and Haley, Knott (1992).
Beavis WD (1994). The power and deceit of QTL experiments: Lessons from comparative QTL studies. In DB Wilkinson, (ed) 49th Ann Corn Sorghum Res Conf, pp 252–268. Amer Seed Trade Asso, Washington, DC.

PHE = GWA + QTL * ENV example

PHE = QTL + GWA * ENV example

Pareto chart: from QTL to GWA

Pareto Diagram