A CLUSTER HIRE PROPOSAL IN MOLECULAR BIOMETRY
Modeling for the New Biology
Overview:
We propose a cluster hire focusing on the integration of the mathematical and biological sciences, with particular attention to models and methods of inference that will guide our understanding of biological systems and processes. The immediate goal of this hiring proposal is to build strong, new connections between the mathematical and biological sciences on the campus. The interdisciplinary effort that these connections will foster should place the University of Wisconsin in a position of leadership in the search for new insight into biological processes, insight that requires new approaches beyond classical observation and experimentation. In earlier centuries, the mathematical sciences played a major role in the advancement of many of the physical sciences. The biological sciences have now reached a point at which the mathematical sciences will play a comparable role in their advancement. The University of Wisconsin needs to be in the vanguard of this new integration.
We will target investigators
trained in mathematics, statistics, or related fields, whose research responds
to challenges from the biological sciences.
Cognizant of the significance of both molecular measurement and methods of theoretical analysis, we use
the term 'molecular biometry' to characterize our overall effort. Molecular biometry involves mathematical
modeling, statistical inference, and computation. It is a collaborative and interdisciplinary enterprise.
Why this Effort is Needed and Why Now:
a) This
proposal is a necessary response to the technological and scientific advances
that are transforming biology.
Molecular technologies enable us to manipulate organisms and to obtain
detailed, extensive data on sub-cellular processes, cellular interactions, and
effects on whole organisms. Information technologies allow us to store
these data and navigate burgeoning
public databases. Further, opportunities
afforded by the genome projects are having a profound effect on biological and
biomedical investigation. One clear
consequence of these advances is that biological researchers are faced with
large amounts of data.
The wealth of new data does not
translate simply into a wealth of new understanding. Questions concerning
process, dynamics, and control of intracellular communication or gene
interaction, for example, are difficult
to address by measurement alone; they demand a synthesis that may be offered
through a mathematical approach.
Mathematical modeling, guided by biological questions, informed by
statistical principles, and enabled by computational machinery, can provide an
organizing framework to understand the coherent behavior of a whole system from
the integration and coordination of its parts. The mathematical scaffolding is
essential in the search for new biological insights, both to cope with large
amounts of information and to achieve accurate quantitative predictions.
b) Many
fields of biology are in transition from largely qualitative disciplines to
increasingly quantitative ones. Although many trained in the biological
sciences have substantial facility in mathematics, statistics, and computing,
the mathematics demanded by emerging problems is increasingly sophisticated. Thus, this cluster will focus on scientists
who are trained in the mathematical sciences and whose research addresses
biological systems and processes.
The Department of Mathematics, in particular, lacks a critical mass of faculty with interests in the biological sciences at a time when there is growing recognition that enhanced training for biologists in the mathematical sciences is essential. A cluster-hire program will enable us to attract high-quality mathematical scientists who would be unlikely to come if recruited one at a time by a single department, and at the same time ensure that they arrive with strong connections to established campus biological science programs. The statistics and bioinformatics community has a fairly good track record of building links with biology. However, most of this collaboration has heretofore focused on issues like clinical trials, agricultural research, and epidemiological studies. Lacking here as well is a critical mass in the areas we call molecular biometry.
c) Although
some work in the direction of molecular biometry is happening on campus now,
the scope of the problem and the pace of development force us to look beyond
the current efforts to the next stage of advances. This is a post-genomics proposal. The 'next stage' will require
more detailed modeling of biological processes that underlie molecular data,
and an analysis of methods of data
collection. A better understanding of
these two components together can result in great improvement in study design
and in modeling derived from the
resulting studies. If our campus can get 'the jump' on this next stage, UW
could position itself as a world leader in the post-genomics era. Achieving
these goals will involve a new synergy between biologists using molecular
approaches and mathematical scientists.
Glossary of Key Terms:
genomics: the study of whole genomes, i.e., an organism's complete DNA sequence
bioinformatics: the process of collecting, storing, and organizing genomic data
molecular biometry: the modeling of, and inference from, molecular data
Research Needs and the Scientific Context:
The specific research needs are for modeling molecular processes in biology and for sharpening the inferences drawn from data obtained in studies of these processes. This research is collaborative and interdisciplinary. We envision teams of individuals from the cluster hire and existing faculty and staff working on cutting-edge problems driven by molecular information. In many cases, new models and new statistical methodology will guide the discovery process.
Mathematical modeling of complex biological systems requires expertise in stochastic and deterministic dynamical systems with both discrete and continuous states, and in computational methods. Statistical analysis for these systems will require a solid grounding in inference and experimental design, skills in a variety of approaches in computational statistics, and knowledge of the evolving suite of methods used in the analysis of molecular data. Statistical methodology will be drawn from traditional areas such as linear and nonlinear models as well as more modern areas such as learning theory, visualization of geometric structures, and nonparametric methods based on differential topology and differential geometry. For both mathematical modeling and statistical analysis, a fundamental knowledge of the underlying biological processes is essential.
Presented here are several specific research areas of
current interest at UW that call for new mathematical,
statistical, and computational tools and ways of thinking. Although our new
hires might work on these specific problems, in the
context of this proposal they serve only as examples of the types of problems
that these individuals would address.
(Although we do not include a specific example related to it, molecular
medicine offers a different set of challenges that can be addressed by
molecular biometry. We imagine that a typical
clinical trial will record a broad spectrum of molecular phenotypes on each
participant. For example, the
expression profile in tumor cells from a cancer patient could easily generate
thousands of data points per person; NCI is funding large studies in this
area. It is also quite likely
that the viral genome of an infected patient could be sequenced in its entirety to
assist in therapy.)
1: Intracellular processes. A central problem in biology is to understand the workings
of a cell. Investigations in cell
biology have provided a detailed description of cell structure and basic
function, but no one fully understands how the ensemble of component molecules
interacts to enable cellular processes.
Modern measurement is approaching an ideal in which we can monitor
fluctuations in abundance of all the molecular constituents in cells. A traditional reductionist approach is
unable to address questions of coordination and control of a cellular system,
but a more integrative, theoretical approach might provide a way. Cell biology
is entering a more mature stage, one that is integrative in nature, and one in
which mathematical modeling will play a critical role. We note that this modeling goes well beyond
the organizing and mining of genomic and gene expression data into databases,
activities that are currently underway. (In other words, it is beyond
bioinformatics.)
Intracellular control and
communication have often been described qualitatively in terms of pathways and
circuits, leading to detailed networks requiring precise knowledge of protein
concentrations and interaction constants. It is now believed that intracellular
processes are quite robust, with modules of interacting proteins adjusting
readily to perturbations. How do we
use mathematical modeling to identify members of these modules and their
interactions? How do these modules interact
to define cell function? A key
attribute of these mathematical models will be that they facilitate biological
testing. The challenge to biologists, chemists and physicists is the
development of physical/biological models to test the predictions of the
mathematical models and thus define cell function.
2: Gene function and interaction. Key innovations in genomic biology have enabled plant biologists to begin understanding the functions and interactions of large numbers of genes. Will this knowledge tell us how plants work? Local plant molecular biologists in the Arabidopsis Training Grant are studying the action of proteins crucial for various aspects of plant life. They recognize that the successful study of large gene families requires interdisciplinary collaboration with researchers in both statistical genomics and bioinformatics. The explosion of genome information and the inadequacy of traditional gene-by-gene approaches are prompting plant scientists to adopt a larger-scale, systemic approach to understanding the basic processes of plant life. How will we model the interrelationships and co-evolution of gene families? How will we relate their roles in cell biology to observed whole-plant processes?
Complex processes in plants, such as winter survival and flowering response to environmental cues, involve many aspects of development as well as stochastic responses to subtle changes in internal chemistry. While progress has been made with one-at-a-time discovery of candidate genes, further advances will require more comprehensive multidimensional approaches employing simultaneous measurements of many plant functions. How will plant breeders use such data to design the next generation of crops? New microarray gene expression methodology, coupled with traditional marker approaches, is beginning to provide some hints. Mechanistic models employing detailed plant physiology as modified by environmental influences are likely to play key roles.
3: Complex diseases. Complex diseases are
multifaceted and often heterogeneous. For instance, while particular studies
of type II diabetes have yielded numerous breakthroughs, each tends
to explain only a small portion of the cases; it is extremely difficult to
isolate what makes the system break, as the body tends to compensate in many
ways through apparently redundant pathways.
Recent work on the physiology and cell biology of adipocytes and
pancreatic islet cells suggests that many biochemical pathways and body organs,
including the brain, may be involved in proper control of insulin and glucose.
Mouse models showing dramatic differences in diabetes suggest broad profiles of
differences in functionally related genes. As this becomes clearer, new
mathematical models of cell signaling between disparate body organs will be
needed. These models may ultimately rely on measurements of hundreds of proteins
and thousands of metabolites. While many intuitive stories will emerge from
such investigations, other aspects may at first or second glance be quite
counterintuitive, particularly as attempts are made to apply this knowledge to
human studies.
Instructional Needs:
There is a critical need for instructional offerings in bioinformatics, statistical genomics, and, more generally, mathematical modeling in the biological sciences. We expect faculty in this cluster to play a leading role in developing the following instructional programs.
(1) For undergraduates in the biological sciences, we anticipate at least two new courses that will include an introduction to bioinformatics, statistical inference and related questions of data collection, mathematical modeling, and computational biology. The courses will be at two levels. Courses at the first level will be directed at students with at least a semester of calculus, but without strong interest in pursuing mathematical issues more deeply. The goal of this course (or courses) will be for students to obtain a modest grasp of some of the key issues in the new computational biology. Courses at the second level will be directed at students who want or need mathematical and computational training comparable to that now received by students in engineering and the physical sciences. These courses, new ones as well as some current ones modified as appropriate, will be directed towards students with at least two semesters of mathematics (including calculus and possibly some differential equations and linear algebra) and a semester of statistics. The objective of this suite of courses will be to provide students a more rigorous grounding in the mathematical, statistical, and computational aspects of the new biology. With the increasing quantification of biology, we anticipate that the number of students interested in this higher level program will grow over time. (Efforts towards creating new courses and modifying existing ones will interface with ideas emerging from the current SyMBiosis project.)
(2) For graduate students in the biological sciences with some statistics background (e.g., Stat 572), we plan an additional one-semester course on statistical inference -- and the corresponding issues in the design of data collection -- for biological data of the type described above. (We note that Brian Yandell is currently teaching an experimental course in this area that has attracted an enthusiastic audience.)
(3) For graduate students in the biological sciences with a solid background in mathematics, we propose a focused course on mathematical modeling of biological processes in genomics and molecular biology. This course could also serve undergraduates who have completed the program outlined above and who wish to develop their quantitative skills further.
(4) For graduate students in mathematics and in statistics, we plan several graduate level classes, perhaps taught on an every-other-year basis. These courses will provide training for the next generation of mathematicians and statisticians embarking on research careers in molecular biometry.
(5) We also see a need for more specialized courses such as probabilistic/computational genomics, statistical genomics in plant and animal breeding, and modeling of cell communication systems. These courses will serve advanced students and provide continuing education for faculty and staff. Some of these courses will be offered as seminars, perhaps co-taught with biologists. These seminars could be developed on a modular basis, perhaps with substantial web-based material, and could be offered in a format suitable for distance education.
Beyond the development of new courses, we believe that there are strong needs for new programs. For undergraduate students, we will give serious consideration to the development of a new interdisciplinary degree program integrating biology, mathematics, statistics, and perhaps physics. This will be modeled on the very successful program in Applied Mathematics, Engineering, and Physics that produces a modest number of highly qualified graduates every year.
For graduate students, we will expand and broaden the successful M.S. Biometry Degree Program to provide an increased range of opportunities within computational biology. The main goal of this program will be to provide added training in modeling, inference, and computational biology for students in biological fields who are simultaneously working towards a graduate degree (usually Ph.D.) and wish to expand their quantitative capabilities.
Linkages to Existing Efforts and Strengths on Campus:
Expertise in fields related to genomics exists in many places on campus, including L&S, CALS, Pharmacy, and the Medical School. The Genome Center of Wisconsin has been a focus for a number of hires in this area. Computer Sciences and Biostatistics & Medical Informatics have added several bioinformaticists to their faculties. Numerous biological scientists on campus -- including many whose names are attached to this proposal -- are attempting to extend their understanding of biological problems by drawing deeper inferences from their data and by modeling their systems. The campus, however, lacks a core of sufficient size in the areas of the mathematical sciences related to molecular biometry. The current proposal addresses this need.
We envision the cluster of new faculty serving as a focus and catalyst for molecular biometry. We see the new hires joining with biologists, bioinformaticists, and other scientists already on campus to foster a community of scholars in the new biology. It is highly likely that this community will serve as a magnet for future recruitment of outstanding new faculty and postdocs in a range of areas for which the expertise of this community is central.
Structure of Cluster:
We propose a cluster with four positions. We would like to hold open the opportunity to fill one or two of these at the senior level, should outstanding senior candidates be available, but anticipate that most will be at the junior level. These four individuals will have their major training in various areas within the mathematical sciences with strong interest and demonstrated experience in a biological area dealing with genomics or molecular biology. We aim for a mix of biological interests spanning agriculture, medicine, and basic science. We believe that it is particularly important to find a mathematician at the senior level to play a leading role in strengthening the connections and cultural dialog between the Mathematics Department and the biological community.
Ideally we would like our cluster to contain two mathematicians and two statisticians, all with substantial biological background and computational skills. However, with the growth of new programs -- many of them highly interdisciplinary -- the exact background requirements cannot be precisely stated. We expect that one of the mathematicians and one of the statisticians will have major involvement with the Medical School, and that the other two will have similar involvement in CALS. We expect the two Medical School faculty to be jointly appointed in Biostatistics and Medical Informatics and in either Mathematics (for the mathematician) or Statistics (for the statistician). We anticipate that the two CALS faculty will be jointly appointed in a specific CALS department (e.g., Animal Science or Genetics) and in Mathematics or Statistics, respectively, with a strong tie-in to the current Biometry group. These appointments will carry responsibilities for consulting and collaborative research similar to those of current faculty in biostatistics and biometry.
We expect to create a 'horizontal' structure to foster the community of scholars that will coalesce around this cluster. We leave open the exact form of this structure, although a 'center' might be a possibility. We will coordinate closely with other groups on campus, in particular the Department of Biostatistics and Medical Informatics, the Biometry Program, and the Genome Center of Wisconsin.
Space needs, salaries, and start-up costs:
The faculty in this cluster will require adequate office space; each should have an office in each of their joint department homes. In addition, it is important to have an area where faculty, postdocs, and students within the molecular biometry community can work and exchange ideas. Thus, modest group space -- perhaps on the order of 1400 square feet -- will be very helpful in fostering a cohesive community. Ideally, this space should be as centrally located as possible so that it is accessible to all community members. All space planning must take into account the computational needs of the cluster hires.
Salaries for junior faculty will be at competitive rates for new hires in the departments that will serve as homes for the faculty in our cluster. A salary for a potential senior hire will depend on the level of the candidate. We believe that 12-month appointments should be made available to all of our hires, although 9-month appointments may be appropriate in some circumstances.
Faculty in this cluster will require modern networked computers that would be part of a semi-private computing cluster (e.g., using Condor, developed by Computer Sciences). Such a computer cluster might cost $250,000 for the team; additional ongoing resources will be needed to maintain hardware and software, as well as, potentially, some extra power supply and storage space. We anticipate that some seed funds will be required for the first few years, until collaborative grants emerge to cover ongoing costs. The computational needs of this cluster nicely complement those of other campus efforts, notably the existing Genome Center SUN and any Super Computing Facility that may be developed in conjunction with prospective hires under the ongoing Computational Sciences Cluster Hiring. Some economies might be achieved by sharing resources.
Diversity:
We will emphasize diversity in our hiring efforts. For example, within the rapidly growing field of statistical genomics, a substantial fraction of recent graduate students, postdocs, and new faculty members are women. Strong programs such as those at UC-Berkeley, the University of Washington, North Carolina State, and elsewhere all list a substantial number of women on their web pages. The table below is incomplete but shows that many women are being trained in these areas.
| Program | Faculty: men | Faculty: women | Students: total | Students: women | URL |
|---|---|---|---|---|---|
| NCSU Bioinformatics | 75 | 19 | ? | several | http://genomics.ncsu.edu/bioinfo.html |
| Berkeley Statistics | 1 | 0 | 8 | 7 | http://www.stat.Berkeley.EDU/users/terry/zarray/Html/group.html |
| VA Tech Bioinformatics | 9 | 4 | ? | ? | http://www.bioinformatics.vt.edu/faculty.html |
| Purdue U Genomics | 12 | 7 | ? | ? | http://www.genomics.purdue.edu/ |
| U WA Statistical Genetics | 9 | 5 | several | 1+ | http://www.stat.washington.edu/thompson/Statgen/ |
| USC Genomics | 8 | 0 | 22 | several | http://www-hto.usc.edu/ |
| U MI Statistical Genetics | 9 | 4 | 10 | 5+ | http://www.sph.umich.edu/statgen/ |
| Harvard U Computational Biology | 1 | 0 | 5 | ? | http://www.biostat.harvard.edu/complab/ |
Potential Search Committee and approximate timeline:
Tentatively, we anticipate that Prof. Tom Kurtz, jointly appointed in Mathematics and Statistics, will chair the Search Committee. We anticipate that the committee will also include a member of Biostatistics and Medical Informatics (probably Michael Newton), a member of the Biometry Program (Rick Nordheim or Brian Yandell), an additional faculty member from Mathematics (possibly Paul Milewski or David Griffeath), and two or three of the biologists active in the preparation of this document (e.g., Michael Gould, Mike Culbertson, Brian Kirkpatrick, John Yin), for a committee of about six or seven individuals.
For most searches in Mathematics and Statistics, there is a main 'hiring season', which requires scheduling interviews during the months of November through March. For senior faculty, there is less of a specific season. We anticipate beginning our search to coincide with the 2002 hiring season, which would allow the first successful hires to begin in Fall 2002. We hope to complete our hiring within two hiring seasons.
Contributors to the Proposal:
The following played a role in the preparation of this document:
| Name | Department(s) | College(s) |
|---|---|---|
| Kurtz, Thomas | Mathematics; Statistics | L&S |
| Newton, Michael | Biostat & Med Info; Statistics | Med School; L&S |
| Nordheim, Rick | Statistics; Forest Ecol & Mgmt | L&S; CALS |
| Yandell, Brian | Statistics; Horticulture | L&S; CALS |
| Blattner, Fred | Genetics; Med Genetics | CALS; Med School |
| Culbertson, Mike | Genetics; Med Genetics | CALS; Med School |
| DeMets, David | Biostat & Med Info; Statistics | Med School; L&S |
| Gould, Michael | Oncology; Medical Physics | Med School |
| Kirkpatrick, Brian | Animal Sci; Dairy Sci | CALS |
| Milewski, Paul | Mathematics | L&S |
| Moss, Richard | Physiology | Med School |
| Yin, John | Chemical Engineering | Eng |
Other faculty who have expressed interest and provided support:
| Name | Department(s) | College(s) |
|---|---|---|
| Adem, Alejandro | Mathematics | L&S |
| Anantharaman, Thomas | Biostat & Med Info | Med School |
| Attie, Alan | Biochemistry | CALS |
| Bownds, Deric | Zoology | L&S |
| Bradfield, Chris | Oncology | Med School |
| Craven, Mark | Biostat & Med Info; Comp Sci | Med School; L&S |
| Forest, Katrina | Bacteriology | CALS |
| Gianola, Daniel | Animal Sci; Dairy Sci | CALS |
| Goodman, Robert | Plant Pathology | CALS |
| Goodrich-Blair, Heidi | Bacteriology | CALS |
| Griffeath, David | Mathematics | L&S |
| Lee, Carol Eunmi | Zoology | L&S |
| Osborn, Tom | Agronomy | CALS |
| Palmenberg, Ann | Biochemistry; Virology | CALS |
| Phillips, George | Biochemistry | CALS |
| Porter, Warren | Zoology | L&S |
| Reinsel, Greg | Statistics | L&S |
| Schwartz, David | Chemistry; Genetics | L&S; CALS |
| Sytsma, Ken | Botany | L&S |
| Triplett, Eric | Agronomy | CALS |
| Turner, Robert | Mathematics | L&S |
| Waller, Don | Botany | L&S |
Appendix: Letters of Support
Short biographies of external experts:
Simon Tavaré holds the George and Louise Kawamoto Chair in Biological Sciences at the University of Southern California and holds joint appointments in the Departments of Mathematics and Preventive Medicine. With Michael Waterman, he has led an NSF-funded program directed at training postdoctoral researchers in computational genetics and molecular biology. His research interests include population genetics, molecular evolution, and analysis of expression array data.
http://www-hto.usc.edu/people/Tavare.html
Richard Durrett is Professor of Mathematics at Cornell University. He has been a leader in the development of interdisciplinary training programs in the mathematical and biological sciences. His research interests include application of stochastic models to complex biological systems, in particular, spatial models in ecology and modeling and analysis of genetic data.
http://math.cornell.edu/~durrett/
Terence Speed is Professor in the Department of Statistics and the Interdepartmental Group in Biostatistics at the University of California, Berkeley. He is a member of the Program in Mathematics and Molecular Biology, a national multi-university interdisciplinary research and training consortium funded by the Burroughs Wellcome Fund (BWF) Interfaces Program. Speed has played a leading role in the application of statistics to problems in genetics and molecular biology. His major interests within this area are in the mapping of genes in mice and humans, including disease genes and genes contributing to the variation of quantitative traits, and the statistics of production DNA sequencing. He is currently on the editorial board of the Journal of Computational Biology.
http://www.stat.Berkeley.edu/users/terry/
Richard Simon is Chief, Biometric Research Branch, and Head, Molecular Statistics & Bioinformatics Section, at the Division of Cancer Treatment & Drugs, National Cancer Institute. His current research interests include Bayesian methods in clinical trial design and analysis, and the development of methods for the analysis of genome sequence and expression data to identify cancer-related genes, elucidate their functions, determine the steps of tumor development, identify molecular targets, and develop genome-based approaches to the prevention, detection, diagnosis, and treatment of cancer.
http://linus.nci.nih.gov/~brb/rsimon.htm