# COSC345 Assignment 4 — k-Means P

## Your name:

## Student number:

Suppose we have measured a collection of “cases”
that we would like to divide into `k` groups, so that
each group is pretty much the same, and the groups are different.
Also suppose that we have measured some numeric “features”
for each of thes cases, and that we define two cases to be
“similar” if the Euclidean distance between their
feature vectors is small.

The `k`-means algorithm is a famous algorithm for
doing this. You have to find your cases and measure them, and
choose `k`, but the algorithm takes it from there.

An outer iteration goes

- choose
`k` cases at random to be cluster centres
- improve these centres

and we repeat that several times and take the best result.

An inner iteration goes

- associate each case with the cluster centre it is closest to
- replace each cluster centre by the mean of the feature vectors
now associated with it.

and we repeat that at most some number of times or until it
stops improving.

- Is this the first, second, or third program you looked at?
(Put a circle around one.)
- Write the time:

- Read the k-Means listing until you feel that you have a rough
idea of what is going on.
- Write the time:

- Take a break if you like.

- Write the time:

- What is the input to this program?

- The random assignment procedure uses a well known algorithm
to choose a random subset of cases. Give a big-oh formula for
the cost of this step as a function of the integer inputs to the
program.

- Is the whole program certain to terminate in a finite time?
If so, give a simple bound in terms of the integer inputs.
If not, explain why not.

- Write the time:

- Take a break if you want one.

- Write the time:

- The program contains a deliberate mistake. What is it?

- Write the time: