Knowledge-Based Data Classification, Approximation and Optimization
Project Supported by NSF under Grant IIS-0511905
Project Period: September 1, 2005 -- August 31, 2009


Principal Investigator: Olvi L. Mangasarian

  Office:       4385 Comp Sci & Stat
  Phone:    	(608) 262-6593, (608) 262-1204
  E-mail:       olvi@cs.wisc.edu

Co-Principal Investigator: Michael C. Ferris

  Office:       4381 Comp Sci & Stat
  Phone:    	(608) 262-4281
  E-mail:       ferris@cs.wisc.edu

Supported Graduate Students

  • Michael Thompson
  • Ted Wild
  • Geng Deng
  • Qian Li

    Project Summary


    Massive datasets occur in all types of settings ranging from the highly scientific to the ubiquitous internet. Making sense of this massive data requires sophisticated computer sciences techniques such as data classification, approximation and optimization. All of these techniques can be improved substantially by making effective use of prior knowledge that is often readily available. For example doctors' experience can be utilized in obtaining improved classifiers for various types of important problems, such as medical diagnosis and prognosis. Since the most powerful state-of the-art classifiers are based on support vector machines, which in turn are formulated as constrained or unconstrained optimization problems, it is our aim that prior knowledge be incorporated into various optimization-based applications such as classification and approximation problems as well into the theory of optimization itself. To a large degree, this proposal is motivated by the investigators' extensive collaborative work with oncologists, surgeons and medical physicists and the investigators' desire to make full use of the expertise of such practitioners by incorporating it into computable but rigorous models.

    The intellectual merit of the proposed work lies in the use of rigorous theory and problem analysis techniques that incorporate domain specific information into general optimization problems. The research will first incorporate knowledge into a linear or nonlinear support vector machine classifier and show that such incorporation is possible by appending additional constraints to the original problem. This does not seem to have been attempted before, and preliminary tests indicate improvements in classifier accuracy. Secondly, prior knowledge will be introduced into approximation problems. Thus, in addition to given discrete data that is normally used to generate an approximation to an unknown function, prior knowledge in the form of inequalities on polyhedral sets is also taken into account. Finally, prior knowledge will be incorporated into general constrained or unconstrained optimization problems, wherein the prior knowledge consists of new constraints to be imposed on the behavior of the objective function on various regions. The generality of these new techniques will facilitate the integration of information from disparate sources, since the theory allows multiple sets of prior information to be included concurrently. Specific application to radiotherapy treatment planning problems will ensure the computer science advancements are demonstrably useful in a particular problem domain.

    The work will have broader impacts in other areas of medical science, public health and health care delivery. The optimization, modeling, and computational techniques will provide a boost to advances in cancer diagnosis and prognosis, chemotherapy, and other treatment regimes. The knowledge-based approach encompasses a broad spectrum of important classification and approximation problems that have wide applicability in science and engineering. The work will also raise the profile of data mining techniques in other areas such as surgery, pharmacology, and medical research, by demonstrating how our methodologies can be utilized to incorporate prior knowledge into both planning and design issues, and improving both efficiency of delivery and effectiveness of treatment in many clinical settings. By coupling the education of several computer science and engineering students with the proposed work, a new group of multidisciplinary researchers will be trained that will ensure the technical advances are applied to further application domains.

    Publications Supported by Grant IIS-0511905

    O. L. Mangasarian amd M. C. Ferris
    Uniqueness of Integer Solution of Linear Equations
    PDF Version
    Data Mining Institute Technical Report 09-01, July 2009.
    M.C. Ferris, S.P. Dirkse, J.-H. Jagla, and A. Meeraus
    Extending modeling systems: Structure and solution
    X. Ban, H.X. Liu, M.C. Ferris, and B. Ran
    A link-node complementarity model and solution algorithm for dynamic user equilibria with exact flow propagations
    In Transportation Research Part B, 42(9):823--842, 2008.
    X. Ban, S. Lu, M. C. Ferris and H. Liu
    Risk-Averse Second Best Toll Pricing
    X. Ban, M.C. Ferris, and H.X. Liu
    Numerical studies on reformulation techniques for continuous network design with asymmetric user equilibrium
    International Journal of Operations Research and Information Systems, vol 1, forthcoming 2010
    X. Ban, M.C. Ferris, and L. Tang (Grad student)
    Risk neutral second best toll pricing
    M. C. Ferris, S. P. Dirkse, J.-H. Jagla and A. Meeraus
    An extended mathematical programming framework
    E. Bartholomew Fisher, K. Hedman, R. O'Neill , M. C. Ferris and S. S. Oren
    Optimal Transmission Switching in Electrical Networks for Improved Economic Operations
    Technical Report, Federal Energy Regulatory Commission
    O. L. Mangasarian
    Knapsack Feasibility as an Absolute Value Equation Solvable by Successive Linear Programming
    PDF Version
    Data Mining Institute Technical Report 08-03, September 2008. Optimization Letters, to appear. Online Version
    O. L. Mangasarian and E. W. Wild
    Privacy-Preserving Random Kernel Classification of Checkerboard Partitioned Data
    PDF Version
    Data Mining Institute Technical Report 08-02, September 2008.Annals of Information Systems to appear.
    O. L. Mangasarian
    A Generlaized Newton Method for Absolute Value Equations
    PDF Version
    Data Mining Institute Technical Report 08-01, May 2008. Optimization Letters 3(1), January 2009, 101-108.
    O. L. Mangasarian and E. W. Wild
    Privacy-Preserving Classification of Horizontally Partitioned Data via Random Kernels
    PDF Version
    Data Mining Institute Technical Report 07-03, November 2007. The 2008 4th International Conference on Data Mining - DMIN'08,July 14-17, Las Vegas, Nevada. Proceedings of the 2008 International Conference on Data Mining, DMIN08, Volume II, 473-479, R. Stahlbock, S.V. Crone and S. Lessman, Editors. 2008 Best Academic Research Paper Award DMIN'08.
    O. L. Mangasarian, E. W. Wild and G. M. Fung
    Privacy-Preserving Classification of Vertically Partitioned Data via Random Kernels
    PDF Version
    Data Mining Institute Technical Report 07-02, September 2007. ACM Transactions on Knowledge Discovery from Data (TKDD), to appear.
    O. L. Mangasarian and E. W. Wild
    Exactness Conditions for a Convex Differentiable Exterior Penalty for Linear Programming
    PDF Version
    Data Mining Institute Technical Report 07-01, July 2007. Optimization, to appear.
    O. L. Mangasarian and M. E. Thompson
    Chunking for Massive Nonlinear Kernel Classification
    PDF Version
    Data Mining Institute Technical Report 06-07, December 2006. Optimization Methods and Software 23, 2008, 365-274.
    O. L. Mangasarian and E. W. Wild
    Nonlinear Knowledge in Kernel Machines
    PDF Version
    Data Mining Institute Technical Report 06-06, November 2006. CRM Proceedings \& Lecture Notes, Volume 45, American Mathematical Society and Centre de Recherches Math\'{e}matiques at the Universit\'{e} de Montr\'{e}al, 2008, 181-198.
    O. L. Mangasarian, E. W. Wild and G. M. Fung
    Proximal Knowledge-Based Classification
    PDF Version
    Data Mining Institute Technical Report 06-05, November 2006. Statistical Analysis and Data Mining, to appear.
    O. L. Mangasarian & E. W. Wild
    Nonlinear Knowledge-Based Classification
    PDF Version
    Data Mining Institute Technical Report 06-04, August 2006.
    O. L. Mangasarian & E. W. Wild
    Feature Selection for Nonlinear Kernel Support Vector Machines
    PDF Version
    Data Mining Institute Technical Report 06-03, July 2006. IEEE Seventh International Conference on Data Mining (ICDM'07) October 28, 2007, Omaha, NE, Workshop Proceedings 231-236.
    O. L. Mangasarian
    Absolute Value Equation Solution via Concave Minimization
    PDF Version
    Data Mining Institute Technical Report 06-02, March 2006. Optimization Letters 1(1), 2007, 3-8.
    O. L. Mangasarian and M. E. Thompson
    Massive Data Classification via Unconstrained Support Vector Machines
    PDF Version
    Data Mining Institute Technical Report 06-01, March 2006. Journal of Optimization Theory and Applications 131(3), December 2006, 315-325.
    O. L. Mangasarian and R. R. Meyer
    Absolute Value Equations
    PDF Version
    Data Mining Institute Technical Report 05-06, December 2005. Linear Algebra and Its Applications 419 (2006) 359-367.
    O. L. Mangasarian and E. W. Wild
    Nonlinear Knowledge in Kernel Approximation
    PDF Version
    Data Mining Institute Technical Report 05-05, October 2005. Revised June 2006. IEEE Transactions on Neural Networks, to appear.
    O. L. Mangasarian
    Absolute Value Programming
    PDF Version
    Data Mining Institute Technical Report 05-04, September 2005. Computational Optimization and Applications 36(1), January 2007, 43-53.
    O. L. Mangasarian
    Exact 1-Norm Support Vector Machines via Unconstrained Convex Differentiable Minimization
    PDF Version
    Data Mining Institute Technical Report 05-03, August 2005. Revised January 2006. Journal of Machine Learning Research 7, 2006, 1517-1530.
    O. L. Mangasarian and E. W. Wild
    Multiple Instance Classification via Successive Linear Programming
    PDF Version
    Data Mining Institute Technical Report 05-02, May 2005. Journal of Optimization Theory and Applications 137(1), 2008, to appear.
    M. C. Ferris, P. F. Brennan, L. Tang, J. Marquard, S. M. Robinson, and S. J. Wright
    Creating operations research models to guide RHIO decision making
    In American Medical Informatics Association 2007 Symposium Proceedings, 2007.
    G. Deng and M. C. Ferris.
    Extension of the DIRECT optimization algorithm for noisy functions
    In B. Biller, S. Henderson, M. Hsieh, and J. Shortle, editors, Proceedings of the 2007 Winter Simulation Conference, 2007.
    P. Prakash, G. Deng, M. C. Converse, J. G. Webster, , D. M. Mahvi, and M. C. Ferris
    Design optimization of a robust sleeve antenna for hepatic microwave ablation
    Physics in Medicine and Biology, 53:1057-1069, 2008.
    G. Deng and M. C. Ferris
    Variable-number sample-path optimization
    Mathematical Programming, forthcoming, 2008
    J. Wallace, A. Philpott, M. O'Sullivan, and M. Ferris
    Optimal rig design using mathematical programming
    In 2nd High Performance Yacht Design Conference, Auckland, 14-16 February, 2006, pages 185-192, 2006.
    G. Deng and M. C. Ferris
    Adaptation of the UOBQYA algorithm for noisy functions
    In B. Lawson, J. Liu, F. Perrone, and F. Wieland, editors, Proceedings of the 2006 Winter Simulation Conference, pages 312-319, Orlando, Florida, 2006. Omnipress.
    G. Deng and M. C. Ferris
    Neuro-dynamic programming for fractionated radiotherapy planning
    In C.J.S. Alves, P.M. Pardalos, and L.N. Vicente, editors, Optimization in Medicine, International Center for Mathematics, Springer Optimization and Its Applications, pages 49-74. Springer-Verlag, 2007.
    X. Ban, H. X. Liu, M. C. Ferris, and B. Ran.
    A general MPCC model and its solution algorithm for continuous network design problem.
    Mathematical And Computer Modelling, 43:493-505, 2006.
    M. C. Ferris and G. Deng.
    Classification-based global search: An application to a simulation for breast cancer
    In Proceedings of the NSF CMMI Engineering Research and Innovation Conference, 2008.
    M. C. Ferris, A. Sundaramoorthy, and C. T. Maravelias.
    Simultaneous batching and scheduling using dynamic decomposition on a grid
    Technical report, Computer Sciences Department, University of Wisconsin, 2007.
    M. R. Bussieck, M. C. Ferris, and A. Meeraus.
    Grid enabled optimization with GAMS
    Technical report, Computer Sciences Department, University of Wisconsin, 2007.
    E. Bartholomew Fisher, R. O'Neill, and M. C. Ferris.
    Optimal transmission switching
    IEEE Transactions on Power Systems, forthcoming, 2008.
    M. C. Ferris, C. T. Maravelias, and A. Sundaramoorthy.
    Using grid computing to solve hard planning and scheduling problems
    In Proceedings of 18th European Symposium on Computer-Aided Process Engineering, Lyon, France, June 1-4 2008.

    Viewable Math Programming Papers and Reports

    Viewable Data Mining Papers and Reports

    Mathematical Programming at UW

    Data Mining at UW


    This page is updated periodically. Last updated on August 1, 2008.