Privacy Preserving Clustering
The freedom and transparency of information flow on
the Internet has heightened concerns of privacy. Given a set of data
items, clustering algorithms group similar items
together. Clustering has many applications, such as customer-behavior
analysis, targeted marketing, forensics, and bioinformatics. In this
paper, we present the design and analysis of a privacy-preserving
$k$-means clustering algorithm, where only the cluster means at the
various steps of the algorithm are revealed to the participating
parties. The crucial step in our privacy-preserving $k$-means is
privacy-preserving computation of cluster means. We present two
protocols (one based on oblivious polynomial evaluation and the second
based on homomorphic encryption) for privacy-preserving computation of
cluster means. We have a JAVA implementation of our algorithm. Using
our implementation, we have performed a thorough evaluation of our
privacy-preserving clustering algorithm on three data sets. Our evaluation
demonstrates that privacy-preserving clustering is feasible, i.e.,
our homomorphic-encryption based algorithm finished clustering a
large data set in approximately $66$ seconds.
Download:[PS,PDF]
Somesh Jha
Last modified: Thu Mar 23 14:06:34 CST 2006