Revision

Definition

See the CS229 course on K-means.

Algorithm

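K-means is based on the EM algorithm. It alternates between two steps until the assignments stop changing: assign each point to its nearest center (the E-like step), then move each center to the mean of the points assigned to it (the M-like step).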
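As a rough illustration of this alternating structure, here is a minimal pure-Python sketch of standard K-means (Lloyd's algorithm). The names `kmeans`, `squared_distance`, `max_iters` and `seed` are made up for this example, and the handling of empty clusters (keep the previous center) is an assumption, not something taken from the course notes.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iters=100, seed=0):
    """Plain K-means with random initialisation (illustrative sketch)."""
    rng = random.Random(seed)
    # Initialisation: pick k distinct data points as the starting centers.
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iters):
        # Assignment step (E-like): attach each point to its nearest center.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step (M-like): move each center to the mean of its points.
        new_centers = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                dim = len(members[0])
                new_centers.append(
                    tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
                )
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # Centers stopped moving, so the assignments are stable.
        centers = new_centers
    return centers, labels
```

Calling `kmeans(points, 2)` on a list of coordinate tuples returns the final centers and one cluster index per point. Different seeds can give different results on harder data, which is the sensitivity to initialisation discussed next.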

K-means++

K-means is very sensitive to initialisation: a bad choice of initial centers can lead to poor results. In standard K-means, the initial centers are chosen at random.

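K-means++ only changes the initialisation step. It works as follows:

1. Choose one center uniformly at random among the data points.
2. For each data point x not chosen yet, compute D(x), the distance between x and the nearest center that has already been chosen.
3. Choose one new data point at random as a new center, using a weighted probability distribution where a point x is chosen with probability proportional to D(x)^2.
4. Repeat steps 2 and 3 until k centers have been chosen.
5. Now that the initial centers have been chosen, proceed using standard k-means clustering.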
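The K-means++ seeding can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: the function name `kmeans_pp_init` and the `seed` parameter are invented for this example, points are coordinate tuples, and the data is assumed to contain at least k distinct points (otherwise all weights become zero and `random.choices` raises an error).

```python
import random


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers with K-means++ seeding (illustrative sketch)."""
    rng = random.Random(seed)
    # The first center is a data point chosen uniformly at random.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # D(x)^2: squared distance from each point to its nearest chosen center.
        weights = [
            min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
            for p in points
        ]
        # Draw the next center with probability proportional to D(x)^2.
        # Points already chosen have weight 0, so they cannot be drawn again.
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers
```

The returned centers would then replace the random initialisation of standard K-means. Because far-away points get larger weights, the initial centers tend to be spread across the data rather than clumped together.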

The K-means++ part was extracted from the Wikipedia page on K-means++.

Choosing K

Pros and Cons

Pros

Cons

Resources

See the CS229 course on K-means.