Approximate
Clustering without the Approximation. With Avrim Blum and
Anupam Gupta. SODA 2009. [view
abstract]
Approximation
algorithms
for clustering points in metric spaces is a flourishing area
of research, with much research effort spent on getting a
better understanding of the approximation guarantees possible
for many objective functions such as k-median, k-means, and
min-sum clustering. This quest for better approximation
algorithms is further fueled by the implicit hope that these
better approximations also give us better clusterings. E.g.,
for many problems such as clustering proteins by function, or
clustering images by subject, there is some unknown “correct”
target clustering—one that clusters proteins by function, or
that clusters images by content—and there is an implicit hope
is that approximately optimizing these objective functions
will in fact produce a clustering that is close (in symmetric
difference) to the truth.
In this paper, we show that if we make this implicit
assumption explicit—that is, if we assume that any
c-approximation to the given clustering objective F is
ε^O-close to the target—then we can produce clusterings that
are O(ε)-close to the target, even for values c for which
obtaining a c-approximation is NP-hard. In particular, for
k-median and k-means objectives, we show that we can achieve
this guarantee for any constant c > 1, and for min-sum
objective we can do this for any constant c > 2. Our
results also highlight a somewhat surprising conceptual
difference between assuming that the optimal solution to, say,
the k-median objective is ε-close to the target, and assuming
that any approximately optimal solution is ε-close to the
target, even for approximation factor say c = 1.01. In the
former case, the problem of finding a solution that is
O(ε)close to the target remains computationally hard, and yet
for the latter we have an efficient algorithm.
An extended
version to appear in Journal of the ACM.