A CLUSTER HIRE PROPOSAL IN MOLECULAR BIOMETRY

Modeling for the New Biology

 

 

Overview:

 

We propose a cluster hire focused on the integration of the mathematical and biological sciences, with particular attention to models and methods of inference that will guide our understanding of biological systems and processes. The immediate goal of this proposal is to build strong new connections between the mathematical and biological sciences on campus. The interdisciplinary effort that these connections foster should place the University of Wisconsin in a position of leadership in the search for new insight into biological processes, insight that requires approaches beyond classical observation and experimentation. In earlier centuries, the mathematical sciences played a major role in the advancement of many of the physical sciences. The biological sciences have now reached a point at which the mathematical sciences will play a comparable role in their advancement. The University of Wisconsin needs to be in the vanguard of this new integration.

 

We will target investigators trained in mathematics, statistics, or related fields whose research responds to challenges from the biological sciences. Because both molecular measurement and methods of theoretical analysis are central to this work, we use the term 'molecular biometry' to characterize our overall effort. Molecular biometry involves mathematical modeling, statistical inference, and computation. It is a collaborative and interdisciplinary enterprise.

 

 

Why this Effort is Needed and Why Now:

 

a)         This proposal is a necessary response to the technological and scientific advances that are transforming biology. Molecular technologies enable us to manipulate organisms and to obtain detailed, extensive data on sub-cellular processes, cellular interactions, and effects on whole organisms. Information technologies allow us to store these data and to navigate burgeoning public databases. Further, the opportunities afforded by the genome projects are having a profound effect on biological and biomedical investigation. One clear consequence of these advances is that biological researchers now face very large amounts of data.

 

The wealth of new data does not translate simply into a wealth of new understanding. Questions concerning the process, dynamics, and control of intracellular communication or gene interaction, for example, are difficult to address by measurement alone; they demand a synthesis that a mathematical approach can offer. Mathematical modeling, guided by biological questions, informed by statistical principles, and enabled by computational machinery, can provide an organizing framework for understanding how the coherent behavior of a whole system arises from the integration and coordination of its parts. This mathematical scaffolding is essential in the search for new biological insight, both to cope with large amounts of information and to achieve accurate quantitative predictions.

 

b)         Many fields of biology are in transition from disciplines that are largely qualitative to ones that are increasingly quantitative. Although many scientists trained in the biological sciences have substantial facility in mathematics, statistics, and computing, the mathematical demands of emerging problems are increasingly complex. Thus, this cluster will focus on scientists who are trained in the mathematical sciences and whose research addresses biological systems and processes.

 

The Department of Mathematics, in particular, lacks a critical mass of faculty with interests in the biological sciences at a time when there is growing recognition that enhanced training in the mathematical sciences is essential for biologists. A cluster-hire program will enable us to attract high-quality mathematical scientists who would be unlikely to come if recruited one at a time by a single department, and at the same time will ensure that they arrive with strong connections to established biological science programs on campus. The statistics and bioinformatics community has a fairly good track record of building links with biology. However, most of this collaboration has so far focused on areas such as clinical trials, agricultural research, and epidemiological studies. Here, too, a critical mass is lacking in the areas we call molecular biometry.

 

 

c)         Although some work in the direction of molecular biometry is under way on campus now, the scope of the problem and the pace of development force us to look beyond current efforts to the next stage of advances. This is a post-genomics proposal. The 'next stage' will require more detailed modeling of the biological processes that underlie molecular data, together with an analysis of methods of data collection. A better understanding of these two components together can greatly improve both study design and the modeling derived from the resulting studies. If our campus can get the jump on this next stage, UW could position itself as a world leader in the post-genomics era. Achieving these goals will require a new synergy between biologists using molecular approaches and mathematical scientists.

 

 

Glossary of Key Terms:

 

genomics              the study of whole genomes (an organism's complete DNA sequence)

bioinformatics        the process of collecting, storing, and organizing genomic data

molecular biometry    the modeling of, and drawing of inferences from, molecular data

 

 

Research Needs and the Scientific Context:

 

The specific research needs lie in modeling molecular processes in biology and in sharpening the inferences drawn from data obtained in studies of these processes. This research is collaborative and interdisciplinary. We envision teams made up of members of the cluster hire and existing faculty and staff working on cutting-edge problems driven by molecular information. In many cases, new models and new statistical methodology will guide the discovery process.

 

Mathematical modeling of complex biological systems requires expertise in computational methods and in both stochastic and deterministic dynamical systems, with discrete as well as continuous states. Statistical analysis of these systems will require a solid grounding in inference and experimental design, skill in a variety of approaches in computational statistics, and knowledge of the evolving suite of methods used to analyze molecular data. Statistical methodology will be drawn from traditional areas such as linear and nonlinear models as well as from more modern areas such as learning theory, visualization of geometric structures, and non-parametric methods based on differential topology and differential geometry. For both mathematical modeling and statistical analysis, a fundamental knowledge of the underlying biological processes is essential.

 

 

 

Presented here are several specific research areas of current interest at UW that need new mathematical, statistical, and computational tools and new ways of thinking. Although our new hires might work on these specific problems, in the context of this proposal they serve only as examples of the type of problems these individuals would address. (Although we do not include a specific example, molecular medicine offers a different set of challenges that molecular biometry can address. We imagine that a typical clinical trial will record a broad spectrum of molecular phenotypes on each participant. For example, the expression profile of tumor cells from a cancer patient could easily generate thousands of data points per person; NCI is funding large studies in this area. It is also quite likely that the viral genome of an infected patient will be sequenced in its entirety to assist in therapy.)

 

 

 

1: Intracellular processes

A central problem in biology is to understand the workings of a cell. Investigations in cell biology have provided a detailed description of cell structure and basic function, but no one fully understands how the ensemble of component molecules interacts to enable cellular processes. Modern measurement is approaching an ideal in which we can monitor fluctuations in the abundance of all the molecular constituents of a cell. A traditional reductionist approach cannot address questions of coordination and control of a cellular system, but a more integrative, theoretical approach might. Cell biology is entering a more mature stage, one that is integrative in nature and in which mathematical modeling will play a critical role. We note that this modeling goes well beyond the organizing and mining of genomic and gene expression data in databases, activities that are already under way. (In other words, it goes beyond bioinformatics.)

 

Intracellular control and communication have often been described qualitatively in terms of pathways and circuits, leading to detailed networks that require precise knowledge of protein concentrations and interaction constants. It is now believed that intracellular processes are quite robust, with modules of interacting proteins adjusting readily to perturbations. How can we use mathematical modeling to identify the members of these modules and their interactions? How do these modules interact to define cell function? A key requirement of these mathematical models is that they facilitate biological testing. The challenge to biologists, chemists, and physicists will be to develop physical and biological models that test the predictions of the mathematical models and thus define cell function.

 

 

           

2: Gene function and interaction

Key innovations in genomic biology have enabled plant biologists to begin understanding the functions and interactions of large numbers of genes. Will this knowledge tell us how plants work? Plant molecular biologists on campus in the Arabidopsis Training Grant are studying the action of proteins crucial for various aspects of plant life. They recognize that the successful study of large gene families requires interdisciplinary collaboration with researchers in both statistical genomics and bioinformatics. The explosion of genome information and the inadequacy of traditional gene-by-gene approaches are prompting plant scientists to adopt a larger-scale, systemic approach to understanding the basic processes of plant life. How will we model the interrelationships and co-evolution of gene families? How will we relate their roles in cell biology to observed whole-plant processes?

 

 

 

Complex processes in plants, such as winter survival and flowering in response to environmental cues, involve many aspects of development as well as stochastic responses to subtle changes in internal chemistry. While progress has been made through one-at-a-time discovery of candidate genes, further advances will require more comprehensive, multidimensional approaches employing simultaneous measurements of many plant functions. How will plant breeders use such data to design the next generation of crops? New microarray gene expression methodology, coupled with traditional marker approaches, is beginning to provide some hints. Mechanistic models employing detailed plant physiology as modified by environmental influences are likely to play key roles.

 

 

3: Complex diseases

Complex diseases are multifaceted and often heterogeneous. For instance, while individual studies of type II diabetes have produced numerous breakthroughs, each tends to explain only a small portion of cases; it is extremely difficult to isolate what makes the system break, because the body tends to compensate in many ways through apparently redundant pathways. Recent work on the physiology and cell biology of adipocytes and pancreatic islet cells suggests that many biochemical pathways and body organs, including the brain, may be involved in the proper control of insulin and glucose. Mouse models showing dramatic differences in diabetes suggest broad profiles of differences in functionally related genes. As this picture becomes clearer, new mathematical models of cell signaling between disparate body organs will be needed. These models may ultimately rely on measurements of hundreds of proteins and thousands of metabolites. While many intuitive stories will emerge from such investigations, other aspects may, at first or second glance, be quite counterintuitive, particularly as attempts are made to apply this knowledge to human studies.

 

 

 

Instructional Needs:

 

There is a critical need for instructional offerings in bioinformatics, statistical genomics, and, more generally, mathematical modeling in the biological sciences. We expect faculty in this cluster to play a leading role in developing the following instructional programs.

 

(1) For undergraduates in the biological sciences, we anticipate at least two new courses that will include an introduction to bioinformatics, statistical inference and related questions of data collection, mathematical modeling, and computational biology. The courses will be offered at two levels. Courses at the first level will be directed at students who have taken at least a semester of calculus but have no strong interest in pursuing mathematical issues more deeply. The goal of this course (or courses) will be for students to obtain a modest grasp of some of the key issues in the new computational biology. Courses at the second level will be directed at students who want or need mathematical and computational training comparable to that now received by students in engineering and the physical sciences. These courses, new ones as well as some current ones modified as appropriate, will be directed towards students with at least two semesters of mathematics (including calculus and possibly some differential equations and linear algebra) and a semester of statistics. The objective of this suite of courses will be to give students a more rigorous grounding in the mathematical, statistical, and computational aspects of the new biology. With the increasing quantification of biology, we anticipate that the number of students interested in this higher-level program will grow over time. (Efforts to create new courses and modify existing ones will interface with ideas emerging from the current SyMBiosis project.)

 

(2) For graduate students in the biological sciences with some statistics background (e.g. Stat 572), we plan an additional one-semester course that focuses on statistical inference, and on the corresponding issues in the design of data collection, for biological data of the type described above. (We note that Brian Yandell is currently teaching an experimental course in this area that has attracted an enthusiastic audience.)

 

(3) For graduate students in the biological sciences with a solid background in mathematics, we propose a focused course on mathematical modeling of biological processes in genomics and molecular biology.  This course could also serve undergraduates who have completed the program outlined above and who wish to develop their quantitative skills further.

 

(4) For graduate students in mathematics and in statistics, we plan several graduate level classes, perhaps taught on an every-other-year basis.  These courses will provide training for the next generation of mathematicians and statisticians embarking on research careers in molecular biometry.

 

(5) We also see a need for more specialized courses, such as probabilistic/computational genomics, statistical genomics in plant and animal breeding, and modeling of cell communication systems. These courses will serve advanced students and provide continuing education for faculty and staff. Some of them will be offered as seminars, perhaps co-taught with biologists. These seminars could be developed on a modular basis, perhaps with substantial web-based material, and could be offered in a format suitable for distance education.

 

 

Beyond the development of new courses, we believe that there are strong needs for new programs.  For undergraduate students, we will give serious consideration to the development of a new interdisciplinary degree program integrating biology, mathematics, statistics, and perhaps physics.  This will be modeled on the very successful program in Applied Mathematics, Engineering, and Physics that produces a modest number of highly qualified graduates every year. 

 

For graduate students, we will expand and broaden the successful M.S. Biometry Degree Program to provide an increased range of opportunities within computational biology.  The main goal of this program will be to provide added training in modeling, inference, and computational biology for students in biological fields who are simultaneously working towards a graduate degree (usually Ph.D.) and wish to expand their quantitative capabilities.

 

 

Linkages to Existing Efforts and Strengths on Campus:

 

Expertise in fields related to genomics exists in many places on campus, including L&S, CALS, Pharmacy, and the Medical School. The Genome Center of Wisconsin has been a focus for a number of hires in this area. Computer Sciences and Biostatistics & Medical Informatics have added several bioinformaticists to their faculties. Numerous biological scientists on campus, including many whose names are attached to this proposal, are attempting to extend their understanding of biological problems by drawing deeper inferences from their data and by modeling their systems. The campus, however, lacks a core of sufficient size in the areas of the mathematical sciences related to molecular biometry. The current proposal addresses this need.

 

 

 

We envision the cluster of new faculty serving as a focus and catalyst for molecular biometry.  We see the new hires joining with biologists, bioinformaticists, and other scientists already on campus to foster a community of scholars in the new biology.  It is highly likely that this community will serve as a magnet for future recruitment of outstanding new faculty and postdocs in a range of areas for which the expertise of this community is central.

 

 

Structure of Cluster:

 

We propose a cluster with four positions.  We would like to hold open the opportunity to fill one or two of these at the senior level, should outstanding senior candidates be available, but anticipate that most will be at the junior level.  These four individuals will have their major training in various areas within the mathematical sciences with strong interest and demonstrated experience in a biological area dealing with genomics or molecular biology. We aim for a mix of biological interests spanning agriculture, medicine, and basic science.  We believe that it is particularly important to find a mathematician at the senior level to play a leading role in strengthening the connections and cultural dialog between the Mathematics Department and the biological community.

 

Ideally, our cluster would contain two mathematicians and two statisticians, all with substantial biological background and computational skills. However, with the growth of new programs, many of them highly interdisciplinary, the exact background requirements cannot be precisely stated. We expect that one of the mathematicians and one of the statisticians will have major involvement with the Medical School, and that the other two will have similar involvement with CALS. We expect the Medical School faculty to be jointly appointed in Biostatistics and Medical Informatics and in either Mathematics (the mathematician) or Statistics (the statistician). We anticipate that the CALS faculty will be jointly appointed in a specific CALS department (e.g. Animal Science or Genetics) and in Mathematics or Statistics, respectively, with a strong tie-in to the current Biometry group. These appointments will carry responsibilities for consulting and collaborative research similar to those of current faculty in biostatistics and biometry.

 

We expect to create a 'horizontal' structure to foster the community of scholars that will coalesce around this cluster. We leave open the exact form of this structure, although a 'center' is one possibility. We will coordinate closely with other groups on campus, in particular the Department of Biostatistics and Medical Informatics, the Biometry Program, and the Genome Center of Wisconsin.

 

 

 

Space needs, salaries, and start-up costs:

 

The faculty in this cluster will require adequate office space; each should have an office in both of their home departments. In addition, it is important to have an area where faculty, postdocs, and students in the molecular biometry community can work and exchange ideas. Thus, modest group space, perhaps on the order of 1400 square feet, will be very helpful in fostering a cohesive community. Ideally, this space should be as centrally located as possible so that it is accessible to all community members. All space planning must take into account the computational needs of the cluster hires.

 

 

 

Salaries for junior faculty will be at competitive rates for new hires in the departments that will serve as homes for the faculty in our cluster. The salary for a potential senior hire will depend on the level of the candidate. We believe that 12-month appointments should be made available to all of our hires, although 9-month appointments may be appropriate in some circumstances.

 

Faculty in this cluster will require modern networked computers that would form part of a semi-private computing cluster (e.g. using Condor, developed by Computer Sciences). Such a computer cluster might cost $250,000 for the team; additional ongoing resources will be needed to maintain hardware and software, as well as, potentially, extra power supply and storage space. We anticipate that some seed funds will be required for the first few years, until collaborative grants emerge to cover ongoing costs. The computational needs of this cluster complement those of other campus efforts, notably the existing Genome Center SUN and any Super Computing Facility that may be developed in conjunction with prospective hires under the ongoing Computational Sciences Cluster Hiring. Some economies might be achieved by sharing resources.

 

 

Diversity:

 

We will emphasize diversity in our hiring efforts. For example, within the rapidly growing field of statistical genomics, we know that a substantial fraction of recent graduate students, postdocs, and new faculty members are women. Strong programs such as those at UC-Berkeley, the University of Washington, North Carolina State, and elsewhere all list a substantial number of women on their web pages. The table below is incomplete, but it shows that many women are being trained in these areas.

 

                                  Faculty         Students
Program                           men   women     total   women     URL
NCSU Bioinformatics               75    19        ?       several   http://genomics.ncsu.edu/bioinfo.html
Berkeley Statistics               1     0         8       7         http://www.stat.Berkeley.EDU/users/terry/zarray/Html/group.html
VA Tech Bioinformatics            9     4         ?       ?         http://www.bioinformatics.vt.edu/faculty.html
Purdue U Genomics                 12    7         ?       ?         http://www.genomics.purdue.edu/
U WA Statistical Genetics         9     5         several 1+        http://www.stat.washington.edu/thompson/Statgen/
USC Genomics                      8     0         22      several   http://www-hto.usc.edu/
U MI Statistical Genetics         9     4         10      5+        http://www.sph.umich.edu/statgen/
Harvard U Computational Biology   1     0         5       ?         http://www.biostat.harvard.edu/complab/

 

 

Potential Search Committee and approximate timeline:

 

Tentatively, we anticipate that Prof. Tom Kurtz, jointly appointed in Mathematics and Statistics, will chair the Search Committee. We anticipate that a member of Biostatistics and Medical Informatics (probably Michael Newton), a member of the Biometry Program (Rick Nordheim or Brian Yandell), an additional faculty member from Mathematics (possibly Paul Milewski or David Griffeath), and 2 or 3 of the biologists active in the preparation of this document (e.g. Michael Gould, Mike Culbertson, Brian Kirkpatrick, John Yin) will complete a committee of about 6 or 7 individuals.

 

For most searches in Mathematics and Statistics, there is a main 'hiring season', which requires scheduling interviews during the months of November through March. For senior faculty, the season is less well defined. We anticipate beginning our search to coincide with the 2002 hiring season, allowing the first successful hires to begin in Fall 2002. We hope to complete our hiring within two hiring seasons.

 

 

Contributors to the Proposal:

 

The following played a role in the preparation of this document:

 

Kurtz, Thomas           Mathematics; Statistics           L&S
Newton, Michael         Biostat & Med Info; Statistics    Med School; L&S
Nordheim, Rick          Statistics; Forest Ecol & Mgmt    L&S; CALS
Yandell, Brian          Statistics; Horticulture          L&S; CALS
Blattner, Fred          Genetics; Med Genetics            CALS; Med School
Culbertson, Mike        Genetics; Med Genetics            CALS; Med School
DeMets, David           Biostat & Med Info; Statistics    Med School; L&S
Gould, Michael          Oncology; Medical Physics         Med School
Kirkpatrick, Brian      Animal Sci; Dairy Sci             CALS
Milewski, Paul          Mathematics                       L&S
Moss, Richard           Physiology                        Med School
Yin, John               Chemical Engineering              Eng

 

 

Other faculty who have expressed interest and provided support:

 

Adem, Alejandro         Mathematics                       L&S
Anantharaman, Thomas    Biostat & Med Info                Med School
Attie, Alan             Biochemistry                      CALS
Bownds, Deric           Zoology                           L&S
Bradfield, Chris        Oncology                          Med School
Craven, Mark            Biostat & Med Info; Comp Sci      Med School; L&S
Forest, Katrina         Bacteriology                      CALS
Gianola, Daniel         Animal Sci; Dairy Sci             CALS
Goodman, Robert         Plant Pathology                   CALS
Goodrich-Blair, Heidi   Bacteriology                      CALS
Griffeath, David        Mathematics                       L&S
Lee, Carol Eunmi        Zoology                           L&S
Osborn, Tom             Agronomy                          CALS
Palmenberg, Ann         Biochemistry; Virology            CALS
Phillips, George        Biochemistry                      CALS
Porter, Warren          Zoology                           L&S
Reinsel, Greg           Statistics                        L&S
Schwartz, David         Chemistry; Genetics               L&S; CALS
Sytsma, Ken             Botany                            L&S
Triplett, Eric          Agronomy                          CALS
Turner, Robert          Mathematics                       L&S
Waller, Don             Botany                            L&S


Appendix: Letters of Support

 

Short biographies of external experts:

 

Simon Tavare holds the George and Louise Kawamoto Chair in Biological Sciences at the University of Southern California and holds joint appointments in the Departments of Mathematics and Preventive Medicine. With Michael Waterman, he has led an NSF-funded program directed at developing postdoctoral researchers in computational genetics and molecular biology. His research interests include population genetics, molecular evolution, and analysis of expression array data.

http://www-hto.usc.edu/people/Tavare.html

 

Richard Durrett is Professor of Mathematics at Cornell University.  He has been a leader in the development of interdisciplinary training programs in the mathematical and biological sciences.  His research interests include application of stochastic models to complex biological systems, in particular, spatial models in ecology and modeling and analysis of genetic data.

http://math.cornell.edu/~durrett/

 

Terence Speed is Professor in the Department of Statistics and the Interdepartmental Group in Biostatistics at the University of California, Berkeley. He is a member of the Program in Mathematics and Molecular Biology, a national multi-university interdisciplinary research and training consortium funded by the Burroughs Wellcome Fund (BWF) Interfaces Program.  Speed has played a leading role in the application of statistics to problems in genetics and molecular biology. His major interests within this area are in the mapping of genes in mice and humans, including disease genes and genes contributing to the variation of quantitative traits, and the statistics of production DNA sequencing. He is currently on the editorial board of the Journal of Computational Biology.

http://www.stat.Berkeley.edu/users/terry/

 

Richard Simon is Chief, Biometric Research Branch, and Head, Molecular Statistics & Bioinformatics Section, at the Division of Cancer Treatment & Diagnosis, National Cancer Institute. His current research interests include Bayesian methods in clinical trial design and analysis, and the development of methods for the analysis of genome sequence and expression data to identify cancer-related genes, elucidate their functions, determine the steps of tumor development, identify molecular targets, and develop genome-based approaches to the prevention, detection, diagnosis, and treatment of cancer.

http://linus.nci.nih.gov/~brb/rsimon.htm