Microsoft grant establishes UW Data Mining Institute
The almost infinite capacity of computers to collect and store information poses a practical dilemma: How does one find the gems in this mountain of raw data?
Research on that question at the University of Wisconsin-Madison received a boost this month from Microsoft Corp. The research division of the Redmond, Wash.-based company awarded the computer sciences department a four-year grant, valued at approximately $720,000, to establish a Data Mining Institute to study the hidden potential of huge databases.
"This grant is part of our overall commitment to collaborating with major academic institutions and fostering the growth of important new cross-disciplinary areas like data mining," said Jim Gray, senior researcher for scalable servers at Microsoft Research. "UW-Madison is one of the top academic database research groups in the world, and we're delighted to be working with this premier group of scientists."
Raghu Ramakrishnan, a UW-Madison computer sciences professor and co-director of the institute, defines data mining as finding things in huge data sets that hadn't been identified before. "As opposed to a conventional search, data mining looks for trends and patterns," he said.
Numerous applications exist today, from medical prognosis to credit-card security, but the field is only beginning, he said. The institute will explore fundamental new methods of advancing the field, and also work on specific applications in science, medicine and industry.
Microsoft is recognized as an industry leader in data mining research and applications, and the grant will build on UW-Madison's national prominence in database research and mathematical programming. Ramakrishnan said the university is especially well equipped to work on real-world applications with enormous databases.
"That's one of our strengths," he said. "For us, 10,000 data points is nothing. You talk a million points, we start getting mildly interested."
Applications on this scale already exist. Companies specializing in credit card fraud reduction have programs that can analyze millions of daily credit transactions. The search programs are trained to flag peculiarities that might suggest theft, such as changes in location or types of purchase.
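As a rough illustration of that kind of flagging, the short Python sketch below marks a transaction as unusual when its city or purchase type has never appeared in that card's history. It is a toy rule written for this article, not the method any fraud-detection company actually uses; the function name `flag_unusual`, the card IDs and the sample data are all hypothetical, and real systems learn far subtler patterns from millions of records.

```python
from collections import defaultdict


def flag_unusual(history, new_transactions):
    """Return new transactions whose city or purchase category
    has never been seen before for that card (hypothetical rule)."""
    seen_cities = defaultdict(set)
    seen_categories = defaultdict(set)

    # Build a simple profile of each card from its past transactions.
    for card, city, category in history:
        seen_cities[card].add(city)
        seen_categories[card].add(category)

    flagged = []
    for card, city, category in new_transactions:
        reasons = []
        if city not in seen_cities[card]:
            reasons.append("new location")
        if category not in seen_categories[card]:
            reasons.append("new purchase type")
        if reasons:
            flagged.append(((card, city, category), reasons))
    return flagged


# Made-up sample data: (card id, city, purchase category).
history = [
    ("card-1", "Madison", "groceries"),
    ("card-1", "Madison", "fuel"),
    ("card-2", "Seattle", "books"),
]
new_transactions = [
    ("card-1", "Madison", "groceries"),    # matches the card's history
    ("card-1", "Miami", "jewelry"),        # new city and new category
    ("card-2", "Seattle", "electronics"),  # familiar city, new category
]

for transaction, reasons in flag_unusual(history, new_transactions):
    print(transaction, "->", ", ".join(reasons))
```

A card with no history at all would have every transaction flagged under this rule, one reason production systems rely on trained statistical models rather than a fixed checklist.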
The World Wide Web will be the catalyst for much broader data mining applications, Ramakrishnan said, since it provides the ultimate publicly accessible database. Mining the Web is different from conventional keyword searching, since the programs are designed to find patterns or trends across different subjects.
One exciting example is a breast cancer diagnosis and prognosis tool developed by UW-Madison computer sciences professor Olvi Mangasarian and Medical School colleagues. The data mining program analyzes tumor size and fine-needle aspirate samples to estimate cancer-free periods for patients.
The program recently mined a National Cancer Institute database of more than 40,000 breast cancer patients, which helped it achieve more reliable results. The goal is to provide a non-invasive option for prognosis. Patients currently have to undergo a painful removal of lymph nodes from under the arm to receive an accurate prognosis, said Mangasarian, co-director of the institute.
Other members of the institute research team are computer sciences professors Michael Ferris and Jeffrey Naughton. The group is exploring seven research areas, including the development of parallel computing techniques for data mining, finding ways to exploit data that is dynamic and evolving, and increasing compatibility between different technologies.
Microsoft has donated two powerful multiprocessors and other data mining hardware to the group. The grant will help support a team of graduate students and will help the department attract a new faculty member with expertise in databases and mathematical programming.
"Data mining has received a great deal of attention among large corporations and industrial research labs for years," said Usama Fayyad, senior researcher in data mining and exploration for Microsoft Research. "I'm excited to see one of the top universities in database systems form a Data Mining Institute bridging several disciplines. I hope to see many more computer science departments establish formal academic programs in data mining."
Ramakrishnan said the computer sciences department and Microsoft have had a fruitful collaboration for years. Microsoft is one of the department's formal corporate affiliates and employs many of its graduates.