This page describes various linear-programming-based machine learning
approaches which have been applied to the diagnosis and prognosis of
breast cancer. This work is the result of a collaboration at the
University of Wisconsin-Madison between
Prof. Olvi L. Mangasarian
of the Computer Sciences Department and
Dr. William H. Wolberg
of the departments of Surgery and Human Oncology.
Breast Cancer Datasets
Table of Contents
Diagnosis
This work grew out of the desire by Dr. Wolberg to accurately diagnose
breast masses based solely on a Fine Needle Aspiration (FNA). He
identified nine visually assessed characteristics of an FNA sample which
he considered
relevant to diagnosis. In collaboration with Prof. Mangasarian and
two of his graduate students, Rudy Setiono and
Kristin Bennett, a
classifier was constructed using the multisurface method (MSM) of pattern
separation on these nine features that
successfully diagnosed 97% of new cases. The resulting data set is
well-known as the
Wisconsin Breast Cancer Data.
The image analysis work began in 1990 with the addition of
Nick Street
to the research team. The goal was to diagnose the sample based on a
digital image of a small section of the FNA slide. The results of
this research have been consolidated into a software system known as
Xcyt, which is currently used by Dr. Wolberg in his clinical
practice. The diagnosis process is now performed as follows:
- An FNA is taken from the breast mass. This material is then
mounted on a microscope slide and stained to highlight the cellular
nuclei. A portion of the slide in which the cells are
well-differentiated is then scanned using a digital camera and a
frame-grabber board.
- The user then isolates the individual nuclei using Xcyt.
Using a mouse pointer, the user draws the approximate boundary of
each nucleus. Using a computer vision approach known as "snakes",
these approximations then converge to the exact nuclear boundaries.
This interactive process takes between two and five minutes per slide.
Here is an image showing
Xcyt in use.
- Once all (or most) of the nuclei have been isolated in this
fasion, the program computes values for each of ten characteristics of
each nuclei, measuring size, shape and texture. The mean, standard
error and extreme values of these features are computed, resulting in
a total of 30 nuclear features for each sample.
- Based on a training set of 569 cases, a linear classifier was
constructed to differentiate benign from malignant samples. This
classifier consists of a single separating plane in the space of three
of the features: Extreme Value of Area, Extreme Value of Smoothness,
and Mean Value of Texture. By projecting all the cases onto the
normal of this separating plane, approximate
probability densities of
the benign and
malignant points were constructed. These allow a simple Bayesian
computation of probability of malignancy for new patients. These
densities are shown to the patient, allowing her to judge the
"confidence" of her diagnosis by comparison to hundreds of previous samples.
To date, this system has correctly diagnosed 176 consecutive new
patients (119 benign, 57 malignant). In only eight of those cases did
Xcyt return a "suspicious" diagnosis (that is, an estimated
probability of malignancy between 0.3 and 0.7).
A small subset of the source images used in this research can be found in
images. These are very good
test cases for
image segmentation or object recognition algorithms. If your pet
segmentation algorithm can automatically identify all of the nuclei in
these images, please email me (street@cs.wisc.edu) and let's work together.
Prognosis
The second problem considered in this research is that of prognosis,
the prediction of the long-term behavior of the disease. We have
approached prognosis as a function-approximation problem, using input
features -- including those computed by Xcyt
-- to predict a
time of recurrence in malignant patients, using right-censored data.
Our solution is termed
the Recurrence Surface Approximation method (RSA), and utilizes a linear
program to construct a surface which predicts time of recurrence for
new patients. By examining the actual recurrence of those training cases
with similar predicted recurrence times, we can plot the probability of
disease-free survival for various times (out to 10 years) for an
individual patient. This capability has been incorporated into
Xcyt and an example is shown
here.
These survival curves plot the probability of disease-free survival versus
time (in years).
The black disease-free survival curve represents all patients in our
original study; the red curve represents the probability of
disease-free survival for the sample case. This particular case therefore
has an above-average prognosis, with a probability of being disease-free
after 10 years equal to about 80%.
The RSA procedure can also be used to compare the predictive power of
various prognostic factors. Our results indicate that precise,
detailed cytological information of the type provided by Xcyt
gives better prognostic accuracy than the traditional factors Tumor
Size and Lymph Node Status. If corroborated by other researchers,
this result could remove the need for the often painful axillary lymph
node surgery.
Chronological Bibliography
Linked papers are provided in postscript format; if you don't have a
postscript viewer, you can download the file (e.g., shift-click in Netscape)
and print it. Abstracts are ASCII text. To obtain papers which are not
linked, please contact the first author.
- O.L. Mangasarian, R. Setiono and W.H. Wolberg.
- Pattern Recognition via Linear Programming: Theory and
Application to Medical Diagnosis.
In
Proceedings of the Workshop on Large-Scale Numerical
Optimization,
1989, pages 22-31, Philadelphia, PA. SIAM.
- O.L. Mangasarian and W. H. Wolberg.
- Cancer Diagnosis via Linear Programming. SIAM News,
Vol. 23, 1990, pages 1 & 18.
- W.H. Wolberg and O.L. Mangasarian.
- Multisurface Method of Pattern Separation for Medical
Diagnosis Applied to Breast Cytology.
Proceedings of the National Academy of Sciences, U.S.A.,
Vol. 87, 1990, pages 9193-9196.
- W.N. Street.
- Toward Automated Cancer Diagnosis: An Interactive
System for Cell Feature Extraction.
Technical Report 1052, Computer Sciences Department,
University of Wisconsin, October 1991.
- W.H. Wolberg, K.P. Bennett and O.L. Mangasarian.
- Brast Cancer Diagnosis and Prognostic Determination
from Cell Analysis.
Manuscript, 1992,
Departments of Surgery and Human Oncology and
Computer Sciences, University of Wisconsin, Madison, WI 53706.
- W.H. Wolberg, W.N. Street, and O.L. Mangasarian.
- Breast Cytology
Diagnosis via Digital Image Analysis.
Analytical and Quantitative Cytology and Histology,
Vol. 15 No. 6, pages 396-404, December 1993.
- W.N. Street, W.H. Wolberg and O.L. Mangasarian.
-
Nuclear Feature Extraction For Breast Tumor Diagnosis.
In
IS&T/SPIE 1993 International Symposium on Electronic Imaging:
Science and Technology,
volume 1905, pages 861-870, San Jose, CA, 1993.
- W.H. Wolberg, W.N. Street, and O.L. Mangasarian.
- Machine learning
techniques to diagnose breast cancer from fine-needle aspirates.
Cancer Letters
Vol. 77, pages 163-171, 1994.
- W. N. Street
-
Cancer Diagnosis and Prognosis via
Linear-Programming-Based Machine Learning.
Ph.D. Dissertation, University of Wisconsin-Madison, August
1994.
Available as UW Mathematical Programming Technical Report 94-14.
(abstract)
- W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian.
-
Computerized breast cancer diagnosis and prognosis from fine needle
aspirates.
Archives of Surgery 1995; 130:511-516.
- W.H. Wolberg, W.N. Street, and O.L. Mangasarian.
-
Image analysis and machine learning applied to breast cancer
diagnosis and prognosis.
Analytical and Quantitative Cytology and Histology,
Vol. 17 No. 2, pages 77-87, April 1995.
- W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian.
-
Computer-derived Nuclear Features Distinguish Malignant from Benign
Breast Cytology.
Human Pathology,
Vol. 26, pages 792-796, 1995.
- W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian.
-
Computer-derived Nuclear ``Grade'' and Breast Cancer Prognosis.
Analytical and Quantitative Cytology and Histology,
Vol. 17 No. 4, pages 257-264, August 1995.
- O.L. Mangasarian, W.N. Street and W.H. Wolberg.
-
Breast cancer diagnosis and prognosis via linear programming.
Operations Research,
43(4), pages 570-577, July-August 1995.
Available as UW Mathematical Programming Technical Report 94-10.
(abstract)
- W. N. Street, O. L. Mangasarian, and W.H. Wolberg.
-
An inductive learning approach to prognostic prediction.
Proceedings of the Twelfth International Conference on
Machine Learning,
A. Prieditis and S. Russell, eds., pages 522-530,
Morgan Kaufmann, 1995.
- M. W. Teague, W. H. Wolberg, W. N. Street, O. L. Mangasarian,
S. C. Call and D. L. Page.
- Indeterminate Fine Needle Aspiration of the Breast:
Image Analysis Aided Diagnosis.
Cancer Cytopathology 81(2), 1997, 129-135.,
- W. N. Street, O. L. Mangasarian, and W. H. Wolberg.
-
Individual and collective prognostic prediction.
Technical Report 96-01, Computer Sciences Department, University of
Wisconsin, Madison, WI, January 1996. Submitted to ICML and AAAI conferences.
(abstract)
Citation in the Medical and Popular Press
- News from Medicine segment,
CNN Prime News, March 12, 1994.
- Breast Biopsy Without Surgery.
Tim Friend,
USA Today,
March 24, 1994.
- Cancer Detection Imitates Oil Prospecting.
Joe Manning,
Milwaukee Sentinel,
March 24, 1994.
- Analyzing Breast Cancer.
Detroit News,
March 28, 1994.
- A High-tech Cancer Hunt.
Marilynn Marchione,
Milwaukee Journal,
March 28, 1994.
- Computerized Interpretation of Breast FNA Biopsies: Progress Reported,
Oncology Times,
April 1994.
- Computer Program Hunts Breast Cancer,
Ruth SoRelle,
Houston Chronicle,
April 22, 1994.
- Computer Program May Improve Interpretation of Aspirate,
Oncology News International,
May 1994.
- New Data Suggest Needle Biopsies Could Replace Surgical Biopsy
for Diagnosing Breast Cancer.
Journal of the American Medical Association,
Medical News & Perspectives column, June 9, 1994, Vol. 271, No. 22.
- Diagnosis Via Image Analysis and Machine Learning,
Cope,
September/October 1994.
- Computer Seeks Out Breast Cancer,
Madison Capital Times,
January 17, 1995.
- Computer-Aided Cancer Prediction,
Los Angeles Times,
January 25, 1995.