COSC 348: Lab07

Overview: Phylogeny inference lab 1/3

This is practice building parts that you will need for the assignment.

IMPORTANT: Submit the code you've developed for this lab (worth 1% of your final mark).


Binary Tree classes

In your preferred programming language, create classes suitable for phylogenetic trees. In these trees, the leaves represent observed taxonomic units (OTUs). There will always be at least one leaf. Let's say there are n leaves. The internal nodes represent hypothetical taxonomic units (HTUs). There will always be exactly n-1 of them. Each internal node has exactly two children, although the order of the children does not matter.

You will need:

  1. An abstract TU_Tree class.
    You will find that each node except the very top one will need a parent.
    If n=1 the top node will be an OTU, otherwise it will be an HTU.
    If a node has a parent, the parent will be an HTU.
  2. An HTU_Node class,
    having two children, each of which might be either an OTU_Node or an HTU_Node.
  3. An OTU_Node class,
    which needs something to record exactly which of many observed taxonomic units it represents. That might as well be some kind of integer.

You will need some way to print a tree.
The Newick format might as well be the definition of the usual Java toString() method.


Definition (Newick format)

We can accurately represent the structure of a phylogeny by means of a string in the Newick format:

  1. a leaf labelled XXX is represented by the string XXX
  2. an internal node with children A and B is represented by the string
    "(" ++ rep A ++ "," ++ rep B ++ ")".

The order of the children of a node is not significant, so there are many strings that represent the same tree. One way to pick a unique string is to order the children A B of a node so that the child which has the smallest left should go first. Having unique representations makes it easier to see if we all find the same tree.

The Newick string we get for our example is

(((((((1,17),24),(16,27)),((((2,(3,22)),12),4),((5,18),23))),
  (((19,26),29),20)),(((((7,15),13),8),(14,(25,28))),30)),
  (((6,10),(11,21)),9))


You will want to be able to create trees, maintaining the invariant that:

if p has a left or right child c   then p is c's parent.

The full thing

We need more than just a tree. We need to be able to choose a node of a tree at random. So we want something that contains both a TU_Tree and an array (or perhaps an ArrayList) of references to the nodes of the tree. So we need

A Phylogeny class, having a TU_Tree member and an array or ArrayList member.

The operations we want on these objects are

Use these classes in a simple program

Use these classes in a program that generates random trees with 10 OTUs and prints them in a Newick format.


Cosc348 home
Cosc348 labs