In the kernel clustering problem we are given a large n × n positive-semidefinite matrix $A = (a_{ij})$ with $\sum_{i,j=1}^{n} a_{ij} = 0$
and a small k × k positive-semidefinite matrix $B = (b_{ij})$. The goal is to find a partition $S_1, \dots, S_k$ of $\{1, \dots, n\}$ which maximizes the quantity
$$\sum_{i=1}^{k}\sum_{j=1}^{k}\Big(\sum_{(p,q)\in S_i\times S_j} a_{pq}\Big)\, b_{ij}.$$
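To make the objective concrete, here is a minimal Python/NumPy sketch. The function names are hypothetical. It encodes a partition as a label vector and uses the fact that the inner sums are the entries of XᵀAX, where X is the n × k indicator matrix of the partition. It then finds the exact optimum by exhaustive search over all kⁿ labelings. This brute force is feasible only for tiny n and is not the constant-factor approximation algorithm described in this paper. It also assumes that empty parts are allowed. The toy instance builds A as the Gram matrix of centered vectors, which makes A positive semidefinite with entries summing to zero.

```python
import itertools
import numpy as np

def clustering_objective(A, B, labels):
    """Return sum_{i,j} (sum_{p in S_i, q in S_j} a_pq) * b_ij.

    labels[p] is the index (0..k-1) of the part containing point p.
    """
    n, k = A.shape[0], B.shape[0]
    # Indicator matrix: X[p, i] = 1 exactly when p is in S_i.
    X = np.zeros((n, k))
    X[np.arange(n), labels] = 1.0
    # (X^T A X)[i, j] = sum of a_pq over p in S_i, q in S_j.
    M = X.T @ A @ X
    # Entrywise product with B, summed: sum_{i,j} M_ij * b_ij.
    return float(np.sum(M * B))

def brute_force_max(A, B):
    """Exact maximum by trying all k**n labelings (tiny n only)."""
    n, k = A.shape[0], B.shape[0]
    best_val, best_labels = -np.inf, None
    for combo in itertools.product(range(k), repeat=n):
        labels = np.array(combo)
        val = clustering_objective(A, B, labels)
        if val > best_val:
            best_val, best_labels = val, labels
    return best_val, best_labels

# Toy instance: A is the Gram matrix of centered vectors v_1..v_6,
# so A is positive semidefinite and its entries sum to zero.
rng = np.random.default_rng(0)
V = rng.standard_normal((6, 2))
V -= V.mean(axis=0)
A = V @ V.T
B = np.eye(3)  # the k = 3 identity case

val, labels = brute_force_max(A, B)
print(val, labels)
```

When B is the identity and A = VVᵀ, the objective reduces to Σᵢ ‖Σ_{p∈Sᵢ} v_p‖². For this centered A, placing every point in a single part therefore scores exactly zero.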
We study the computational complexity of this generic clustering problem, which originates in the theory of machine learning. We design a constant-factor polynomial-time approximation algorithm for this problem, answering a question posed by Song et al. In some cases we compute the sharp approximation threshold for this problem assuming the unique games conjecture (UGC). In particular, when B is the 3 × 3 identity matrix the UGC hardness threshold of this problem is exactly 16π/27. We present and study a geometric conjecture of independent interest, and we show that it would imply that the UGC threshold when B is the k × k identity matrix is (8π/9)(1 − 1/k) for every k ≥ 3.
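The two thresholds stated above are consistent. At k = 3 the conjectured formula (8π/9)(1 − 1/k) gives (8π/9)(2/3) = 16π/27 ≈ 1.862, which is the value proved for the 3 × 3 identity. The short Python check below evaluates the formula, which increases with k toward 8π/9 ≈ 2.793. The function name is hypothetical. The code only tabulates the conjectured expression and does not compute any hardness threshold.

```python
import math

def conjectured_ugc_threshold(k):
    """(8*pi/9) * (1 - 1/k): the UGC threshold for B = I_k that the
    geometric conjecture would imply (stated for k >= 3)."""
    if k < 3:
        raise ValueError("formula is stated for k >= 3")
    return (8 * math.pi / 9) * (1 - 1 / k)

# At k = 3 the formula reduces to the proven value 16*pi/27.
assert math.isclose(conjectured_ugc_threshold(3), 16 * math.pi / 27)

# The values increase with k and stay below the limit 8*pi/9.
for k in (3, 4, 5, 10, 100):
    print(k, round(conjectured_ugc_threshold(k), 4))
```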