COSC 348 assignment 2, worth 14 marks, due 25 Sept 2015

Inference of a phylogenetic tree by hierarchical clustering

The assignment is to derive a phylogenetic tree for the Caminalcules from the vectors of Caminalcule data, in which each row is one OTU (operational taxonomic unit) and each column corresponds to a particular character.
Note: work with the first 29 rows and ignore row 30, which corresponds to an outgroup that would be used to turn the unrooted tree into a rooted one.
The characters are described here.

Apply an agglomerative (bottom-up) clustering method, which builds the hierarchy from the individual elements by progressively merging them into clusters. The method is described here in the notes of Dr. Richard O'Keefe and on Wikipedia. You can develop your own program in whichever scripting language you want.
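
As a rough illustration of the bottom-up merging described above, here is a minimal Python sketch. It assumes a precomputed symmetric distance matrix and, purely for brevity, uses single linkage; the function name agglomerate, the labels and the toy distances are made up for the example and are not the Caminalcule data.

```python
from itertools import combinations

def agglomerate(dist, labels):
    """Naive bottom-up clustering over a symmetric distance matrix.

    Returns nested tuples recording the merge order.
    Single linkage (minimum pairwise distance) is an assumption made here.
    """
    # Each cluster is a pair: (subtree built so far, indices of its members).
    clusters = [(label, [i]) for i, label in enumerate(labels)]
    while len(clusters) > 1:
        best = None
        # Examine every pair of current clusters and keep the closest pair.
        for a, b in combinations(range(len(clusters)), 2):
            d = min(dist[i][j] for i in clusters[a][1] for j in clusters[b][1])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        merged = ((clusters[a][0], clusters[b][0]),
                  clusters[a][1] + clusters[b][1])
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0][0]

# Toy distances between four hypothetical OTUs.
toy_dist = [
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
]
tree = agglomerate(toy_dist, ["A", "B", "C", "D"])
print(tree)
```

This version rescans all cluster pairs on every pass, which is simple but slow; for 29 OTUs that should not matter.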

Here is a data class in Java, which you might or might not find useful:

You have to use a distance metric to measure how different two OTUs are, and a linkage criterion to define the distance between two clusters, so that the clusters can be organised hierarchically into a tree.
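
To make the distinction between a metric and a linkage criterion concrete, here is a small Python sketch. The Hamming distance and the three linkage functions are only common choices, not a recommendation from this assignment; the vectors are invented, and missing character values are not handled.

```python
def hamming(u, v):
    """Metric: number of characters on which two OTU vectors differ."""
    return sum(1 for a, b in zip(u, v) if a != b)

def single_linkage(c1, c2, metric):
    """Cluster distance = distance between the closest pair of members."""
    return min(metric(u, v) for u in c1 for v in c2)

def complete_linkage(c1, c2, metric):
    """Cluster distance = distance between the farthest pair of members."""
    return max(metric(u, v) for u in c1 for v in c2)

def average_linkage(c1, c2, metric):
    """Cluster distance = mean distance over all cross-cluster pairs."""
    total = sum(metric(u, v) for u in c1 for v in c2)
    return total / (len(c1) * len(c2))

# Invented character vectors for three hypothetical OTUs.
x = [0, 1, 1, 0]
y = [0, 1, 0, 0]
z = [1, 0, 0, 1]

left, right = [x, y], [z]
print(hamming(x, y))                           # 1
print(single_linkage(left, right, hamming))    # 3
print(complete_linkage(left, right, hamming))  # 4
print(average_linkage(left, right, hamming))   # 3.5
```

The same pair of clusters gets a different distance under each criterion, which is why the choice can change the shape of the resulting tree.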

Which metrics and linkage criteria to use?

Don't forget to print your trees in Newick format. To turn a Newick string into a drawing of the tree, you can use tree2dot.c, which requires 'dot' from GraphViz. Alternatively, you can write your own implementation or draw the tree by hand.
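
For illustration, a tree held as nested tuples can be written out in Newick format with a short recursive function. The sketch below assumes leaves are plain labels without spaces, commas, colons, semicolons or parentheses, and it omits branch lengths; the names to_newick and newick_string are made up for the example.

```python
def to_newick(tree):
    """Recursively turn nested tuples into a parenthesised Newick subtree."""
    if isinstance(tree, tuple):
        return "(" + ",".join(to_newick(child) for child in tree) + ")"
    return str(tree)  # a leaf: just its label

def newick_string(tree):
    """A complete Newick tree is the root subtree followed by a semicolon."""
    return to_newick(tree) + ";"

example = (("A", "B"), ("C", "D"))
print(newick_string(example))  # ((A,B),(C,D));
```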

Submit your source code, the compiled executables, and your written report. In the report, describe the details of your implementation and your choice of metric and linkage criteria, and attach a drawing or printout of the resulting phylogenetic tree.

Some other stuff for drawing phylogenies: