Machine Learning for Cancer Diagnosis and Prognosis

This page describes various linear-programming-based machine learning approaches which have been applied to the diagnosis and prognosis of breast cancer. This work is the result of a collaboration at the University of Wisconsin-Madison between Prof. Olvi L. Mangasarian of the Computer Sciences Department and Dr. William H. Wolberg of the departments of Surgery and Human Oncology.




    Diagnosis

    This work grew out of Dr. Wolberg's desire to diagnose breast masses accurately based solely on a Fine Needle Aspiration (FNA). He identified nine visually assessed characteristics of an FNA sample that he considered relevant to diagnosis. In collaboration with Prof. Mangasarian and two of his graduate students, Rudy Setiono and Kristin Bennett, a classifier was constructed on these nine features using the multisurface method (MSM) of pattern separation. It correctly diagnosed 97% of new cases. The resulting data set is widely known as the Wisconsin Breast Cancer Data.
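
    This page does not give the MSM formulation itself. The linear program below is offered only as a rough illustration of pattern separation by linear programming. It seeks a single plane x'w = gamma that separates malignant samples (rows of A, m x 9) from benign samples (rows of B, k x 9) as well as possible. Here e is a vector of ones, and y and z measure how far each sample falls on the wrong side. The formulation is an assumption: a generic single-plane robust-LP separation, not necessarily the exact program used in this work. Roughly speaking, MSM builds a piecewise-linear separator from a sequence of such planes.

```latex
\begin{aligned}
\min_{w,\gamma,y,z}\quad & \frac{1}{m}\,e^\top y + \frac{1}{k}\,e^\top z \\
\text{s.t.}\quad & A w - e\gamma + y \ge e, \\
& -B w + e\gamma + z \ge e, \\
& y \ge 0,\; z \ge 0.
\end{aligned}
```

    Under this sketch, a new sample x would be labeled malignant if x'w > gamma and benign otherwise. The weights 1/m and 1/k keep the larger class from dominating the objective.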

    The image analysis work began in 1990 with the addition of Nick Street to the research team. The goal was to diagnose the sample based on a digital image of a small section of the FNA slide. The results of this research have been consolidated into a software system known as Xcyt, which is currently used by Dr. Wolberg in his clinical practice. The diagnosis process is now performed as follows:

    To date, this system has correctly diagnosed 176 consecutive new patients (119 benign, 57 malignant). In only eight of those cases did Xcyt return a "suspicious" diagnosis (that is, an estimated probability of malignancy between 0.3 and 0.7).
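
    The three-way rule described above can be sketched as follows. This is a minimal illustration, not Xcyt's code. The function name is hypothetical. Treating exactly 0.3 and 0.7 as "suspicious" is also an assumption, since the text only says "between 0.3 and 0.7".

```python
def diagnosis_label(p_malignant: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map an estimated probability of malignancy to a three-way label.

    Hypothetical helper; the inclusive "suspicious" band [low, high] is an
    assumption, not a documented Xcyt behavior.
    """
    if not 0.0 <= p_malignant <= 1.0:
        raise ValueError("p_malignant must lie in [0, 1]")
    if p_malignant < low:
        return "benign"
    if p_malignant > high:
        return "malignant"
    return "suspicious"


if __name__ == "__main__":
    for p in (0.05, 0.3, 0.55, 0.7, 0.92):
        print(f"{p:.2f} -> {diagnosis_label(p)}")
```

    Widening the band flags more cases for closer review. Narrowing it makes more confident calls. The 0.3 and 0.7 defaults are the values quoted on this page.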

    A small subset of the source images used in this research can be found in images. These are very good test cases for image segmentation or object recognition algorithms. If your pet segmentation algorithm can automatically identify all of the nuclei in these images, please email me (street@cs.wisc.edu) and let's work together.


    Prognosis

    The second problem considered in this research is that of prognosis, the prediction of the long-term behavior of the disease. We have approached prognosis as a function-approximation problem, using input features -- including those computed by Xcyt -- to predict a time of recurrence in malignant patients, using right-censored data. Our solution is termed the Recurrence Surface Approximation method (RSA), and utilizes a linear program to construct a surface which predicts time of recurrence for new patients. By examining the actual recurrence of those training cases with similar predicted recurrence times, we can plot the probability of disease-free survival for various times (out to 10 years) for an individual patient. This capability has been incorporated into Xcyt and an example is shown here. These survival curves plot the probability of disease-free survival versus time (in years). The black disease-free survival curve represents all patients in our original study; the red curve represents the probability of disease-free survival for the sample case. This particular case therefore has an above-average prognosis, with a probability of being disease-free after 10 years equal to about 80%.
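
    The page does not say exactly how the curve is computed from the training cases with similar predicted recurrence times. The sketch below makes two assumptions. First, the k training cases whose RSA-predicted times lie closest to the new patient's prediction count as "similar". Second, a Kaplan-Meier estimator is applied to their actual, right-censored follow-up. The names, the neighborhood size k, and the toy data are all hypothetical.

```python
from typing import List, Sequence, Tuple

# One training case: (predicted_time, observed_time, recurred).
# observed_time is the recurrence time if recurred is True; otherwise it is
# the last disease-free follow-up time (a right-censored observation).
Case = Tuple[float, float, bool]


def nearest_by_prediction(train: List[Case], predicted: float, k: int) -> List[Case]:
    """Return the k training cases whose predicted times are closest to `predicted`."""
    return sorted(train, key=lambda c: abs(c[0] - predicted))[:k]


def kaplan_meier(cases: List[Case], t: float) -> float:
    """Kaplan-Meier estimate of P(disease-free at time t) from right-censored cases."""
    survival = 1.0
    # Distinct recurrence times up to t, in increasing order.
    event_times = sorted({obs for _, obs, rec in cases if rec and obs <= t})
    for et in event_times:
        at_risk = sum(1 for _, obs, _ in cases if obs >= et)
        events = sum(1 for _, obs, rec in cases if rec and obs == et)
        survival *= 1.0 - events / at_risk
    return survival


def disease_free_curve(train: List[Case], predicted: float, k: int,
                       times: Sequence[float]) -> List[float]:
    """Disease-free probability at each time, using only the k most similar cases."""
    neighbors = nearest_by_prediction(train, predicted, k)
    return [kaplan_meier(neighbors, t) for t in times]


if __name__ == "__main__":
    # Invented toy data, in years.
    toy = [(2.0, 1.5, True), (3.0, 2.5, True), (8.0, 9.0, False),
           (9.0, 6.0, True), (10.0, 10.0, False), (11.0, 12.0, False)]
    print(disease_free_curve(toy, predicted=9.5, k=4, times=[5.0, 10.0]))  # [1.0, 0.75]
```

    Reading the curve at t = 10 gives the kind of 10-year disease-free probability quoted above. Choosing k trades smoothness of the curve against how closely the neighbors resemble the new patient.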

    The RSA procedure can also be used to compare the predictive power of various prognostic factors. Our results indicate that precise, detailed cytological information of the type provided by Xcyt gives better prognostic accuracy than the traditional factors Tumor Size and Lymph Node Status. If corroborated by other researchers, this result could remove the need for the often painful axillary lymph node surgery.
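
    One way such a comparison could be scored is sketched below. The idea is to fit separate RSA models on different factor sets (not shown) and then compare a prediction error that respects right-censoring. The scoring rule is an assumption. For a patient who has not recurred, the true recurrence time is only known to exceed the follow-up time, so only under-prediction is penalized. The function name and all numbers are hypothetical.

```python
from typing import Sequence, Tuple

# (predicted_time, observed_time, recurred); observed_time is the last
# disease-free follow-up when recurred is False (right-censored).
Case = Tuple[float, float, bool]


def censored_error(cases: Sequence[Case]) -> float:
    """Mean prediction error that respects right-censoring (hypothetical rule)."""
    if not cases:
        raise ValueError("need at least one case")
    total = 0.0
    for predicted, observed, recurred in cases:
        if recurred:
            # Recurrence time is known exactly: absolute error.
            total += abs(predicted - observed)
        else:
            # True time is at least `observed`, so only under-prediction counts.
            total += max(0.0, observed - predicted)
    return total / len(cases)


if __name__ == "__main__":
    # Invented predictions for the same three patients from two factor sets.
    cytological = [(2.0, 2.5, True), (7.0, 6.0, True), (12.0, 10.0, False)]
    traditional = [(5.0, 2.5, True), (4.0, 6.0, True), (8.0, 10.0, False)]
    print("cytological:", censored_error(cytological))  # 0.5
    print("traditional:", round(censored_error(traditional), 3))  # 2.167
```

    A lower score means a factor set predicts recurrence times more accurately on the same patients. A real comparison would compute it on held-out cases rather than training data.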


    Chronological Bibliography

    Linked papers are provided in PostScript format; if you don't have a PostScript viewer, you can download the file (e.g., shift-click in Netscape) and print it. Abstracts are plain ASCII text. To obtain papers that are not linked, please contact the first author.
    O.L. Mangasarian, R. Setiono and W.H. Wolberg.
    Pattern Recognition via Linear Programming: Theory and Application to Medical Diagnosis. In Proceedings of the Workshop on Large-Scale Numerical Optimization, 1989, pages 22-31, Philadelphia, PA. SIAM.
    O.L. Mangasarian and W.H. Wolberg.
    Cancer Diagnosis via Linear Programming. SIAM News, Vol. 23, 1990, pages 1 & 18.
    W.H. Wolberg and O.L. Mangasarian.
    Multisurface Method of Pattern Separation for Medical Diagnosis Applied to Breast Cytology. Proceedings of the National Academy of Sciences, U.S.A., Vol. 87, 1990, pages 9193-9196.
    W.N. Street.
    Toward Automated Cancer Diagnosis: An Interactive System for Cell Feature Extraction. Technical Report 1052, Computer Sciences Department, University of Wisconsin, October 1991.
    W.H. Wolberg, K.P. Bennett and O.L. Mangasarian.
    Breast Cancer Diagnosis and Prognostic Determination from Cell Analysis. Manuscript, 1992, Departments of Surgery and Human Oncology and Computer Sciences, University of Wisconsin, Madison, WI 53706.
    W.H. Wolberg, W.N. Street, and O.L. Mangasarian.
    Breast Cytology Diagnosis via Digital Image Analysis. Analytical and Quantitative Cytology and Histology, Vol. 15 No. 6, pages 396-404, December 1993.
    W.N. Street, W.H. Wolberg and O.L. Mangasarian.
    Nuclear Feature Extraction For Breast Tumor Diagnosis. In IS&T/SPIE 1993 International Symposium on Electronic Imaging: Science and Technology, volume 1905, pages 861-870, San Jose, CA, 1993.
    W.H. Wolberg, W.N. Street, and O.L. Mangasarian.
    Machine learning techniques to diagnose breast cancer from fine-needle aspirates. Cancer Letters Vol. 77, pages 163-171, 1994.
    W.N. Street.
    Cancer Diagnosis and Prognosis via Linear-Programming-Based Machine Learning. Ph.D. Dissertation, University of Wisconsin-Madison, August 1994. Available as UW Mathematical Programming Technical Report 94-14. (abstract)
    W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian.
    Computerized breast cancer diagnosis and prognosis from fine needle aspirates. Archives of Surgery, Vol. 130, pages 511-516, 1995.
    W.H. Wolberg, W.N. Street, and O.L. Mangasarian.
    Image analysis and machine learning applied to breast cancer diagnosis and prognosis. Analytical and Quantitative Cytology and Histology, Vol. 17 No. 2, pages 77-87, April 1995.
    W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian.
    Computer-derived Nuclear Features Distinguish Malignant from Benign Breast Cytology. Human Pathology, Vol. 26, pages 792-796, 1995.
    W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian.
    Computer-derived Nuclear "Grade" and Breast Cancer Prognosis. Analytical and Quantitative Cytology and Histology, Vol. 17 No. 4, pages 257-264, August 1995.
    O.L. Mangasarian, W.N. Street and W.H. Wolberg.
    Breast cancer diagnosis and prognosis via linear programming. Operations Research, 43(4), pages 570-577, July-August 1995. Available as UW Mathematical Programming Technical Report 94-10. (abstract)
    W.N. Street, O.L. Mangasarian, and W.H. Wolberg.
    An inductive learning approach to prognostic prediction. Proceedings of the Twelfth International Conference on Machine Learning, A. Prieditis and S. Russell, eds., pages 522-530, Morgan Kaufmann, 1995.
    M.W. Teague, W.H. Wolberg, W.N. Street, O.L. Mangasarian, S.C. Call and D.L. Page.
    Indeterminate Fine Needle Aspiration of the Breast: Image Analysis Aided Diagnosis. Cancer Cytopathology, Vol. 81 No. 2, pages 129-135, 1997.
    W.N. Street, O.L. Mangasarian, and W.H. Wolberg.
    Individual and collective prognostic prediction. Technical Report 96-01, Computer Sciences Department, University of Wisconsin, Madison, WI, January 1996. Submitted to ICML and AAAI conferences. (abstract)

    paulb@cs.wisc.edu