Knowledge-Based Data Classification, Approximation and Optimization
Project Supported by NSF under Grant IIS-0511905
Project Period: September 1, 2005 – August 31, 2009
Supported Graduate Students
Michael Thompson
Ted Wild
Geng Deng
Qian Li
Project Summary
Massive datasets occur in all types of settings, ranging from the
highly scientific to the ubiquitous internet. Making sense of this
massive data requires sophisticated computer science techniques such
as data classification, approximation and optimization. All of these
techniques can be improved substantially by making effective use of
prior knowledge that is often readily available. For example, doctors'
experience can be utilized in obtaining improved classifiers for
various types of important problems, such as medical diagnosis and
prognosis. Since the most powerful state-of-the-art classifiers are
based on support vector machines, which in turn are formulated as
constrained or unconstrained optimization problems, our aim is to
incorporate prior knowledge into optimization-based applications
such as classification and approximation problems, as well as
into the theory of optimization itself.
To a large degree, this proposal is motivated by the investigators'
extensive collaborative work with oncologists, surgeons and medical
physicists, and by the investigators' desire to make full use of the
expertise of such practitioners by incorporating it into rigorous,
computable models.
The intellectual merit of the proposed work lies in the use of
rigorous theory and problem analysis techniques that incorporate
domain-specific information into general optimization problems.
The research will first
incorporate knowledge into a linear or nonlinear support vector
machine classifier and show that such incorporation is possible by
appending additional constraints to the original problem. This does
not seem to have been attempted before, and preliminary tests indicate
improvements in classifier accuracy. Second, prior knowledge will
be introduced into approximation problems.
That is, in addition to the given
discrete data that is normally used to generate an approximation to an unknown
function, prior knowledge in the form of inequalities on polyhedral
sets will also be taken into account. Finally, prior knowledge will be
incorporated into general constrained or unconstrained optimization
problems, wherein the prior knowledge consists of new constraints to
be imposed on the behavior of the objective function on various
regions. The generality of these new techniques will facilitate the
integration of information from disparate sources, since the theory
allows multiple sets of prior information to be included concurrently.
Specific application to radiotherapy treatment planning problems will
demonstrate that these computer science advances are useful in a
particular problem domain.
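As a hedged illustration of how knowledge can enter by appending constraints, the following sketch assumes that each piece of prior knowledge is a nonempty polyhedral set {x | Bx <= b} and that the classifier is linear, with separating plane x'w = gamma. The symbols B, b, w, gamma, h, beta and u are introduced here for illustration only and are not defined in this summary. Requiring every point of the knowledge set to lie on the positive side of the margin is an implication over infinitely many points, but by linear programming duality it is equivalent to finitely many linear constraints in a new nonnegative variable u, which can be appended to the original problem. The second line applies the same argument to a linear approximation f(x) = x'w + gamma with knowledge f(x) >= h'x + beta on the set.

```latex
% Classification (knowledge set assumed nonempty):
\{x \mid Bx \le b\} \subseteq \{x \mid x^\top w \ge \gamma + 1\}
\iff
\exists\, u \ge 0 :\quad B^\top u + w = 0,\quad b^\top u + \gamma + 1 \le 0

% Approximation with linear f(x) = x^\top w + \gamma (same assumption):
\bigl(Bx \le b \;\Rightarrow\; x^\top w + \gamma \ge h^\top x + \beta\bigr)
\iff
\exists\, u \ge 0 :\quad B^\top u + w - h = 0,\quad b^\top u - \gamma + \beta \le 0
```

The right-to-left direction follows from weak duality even when the set is empty; nonemptiness is needed only for the left-to-right direction. In practice one would likely relax these constraints with nonnegative slack variables penalized in the objective, since prior knowledge may be imprecise or conflict with the data; that choice is an assumption of this sketch, not something the summary specifies.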
The work will have broader impacts in other areas of medical
science, public health and health care delivery.
The optimization, modeling, and
computational techniques will help advance
cancer diagnosis and prognosis, chemotherapy, and other treatment regimes.
The knowledge-based approach encompasses a broad
spectrum of important classification and approximation problems that
have wide applicability in science and engineering.
The work
will also raise the profile of data mining techniques in other areas
such as
surgery, pharmacology, and medical research, by demonstrating how
our methodologies can be utilized to incorporate prior knowledge into
both planning and design issues, and how they can improve both the efficiency of
delivery and the effectiveness of treatment in many clinical settings.
By coupling
the education of several computer science and engineering
students with the proposed work, a new
group of multidisciplinary
researchers will be trained who will help ensure that the technical
advances are applied to further application domains.
