COSC345 Assignment 4 — Gen U

Your name:

Student number:

I have a text collection with over 4GB of text. I want to put it through a program that computes several readability metrics. Unfortunately, the author of that program thought that linear search was really wonderful, so the program spends nearly all its time chugging down long lists of words saying “are we there yet?”

A ternary tree is a search three that splits three ways. For example, given {"ANCHOR", "DEED", "DOG", "GAME"}, we'd make a node that split on D: 'D':({"ANCHOR"}, {"EED", "OG"}, "GAME"}. Looking for a word of k letters in a ternary search tree is O(k.w) where w is roughly log(alphabet size).

A ternary search tree can be expressed as executable code:

    c = s[0];
    if (c < 'D') {
	// code for {ANCHOR} with c already loaded
    } else
    if (c > 'D') {
	// code for {GAME} with c already loaded
    } else {
	s++;
	// code for {EED,OG}
    }

This is nice and fast, but you would have to be mad to want to write code like this by hand. In the good old-fashioned spirit of “I'd rather write programs to help me write programs to help me write programs than write programs to help me write programs”, I wrote a program to turn a list of words into C source code for a function, just like this.