# COSC345 Assignment 4 — Gen S

## Student number:

I have a text collection with over 4GB of text. I want to put it through a program that computes several readability metrics. Unfortunately, the author of that program thought that linear search was really wonderful, so the program spends nearly all its time chugging down long lists of words saying “are we there yet?”

A ternary tree is a search three that splits three ways. For example, given {`"ANCHOR"`, `"DEED"`, `"DOG"`, `"GAME"`}, we'd make a node that split on D: `'D'`:({`"ANCHOR"`}, {`"EED"`, `"OG"`}, `"GAME"`}. Looking for a word of k letters in a ternary search tree is O(k.w) where w is roughly log(alphabet size).

A ternary search tree can be expressed as executable code:

```    c = s[0];
if (c < 'D') {
// code for {ANCHOR} with c already loaded
} else
if (c > 'D') {
// code for {GAME} with c already loaded
} else {
s++;
// code for {EED,OG}
}
```

This is nice and fast, but you would have to be mad to want to write code like this by hand. In the good old-fashioned spirit of “I'd rather write programs to help me write programs to help me write programs than write programs to help me write programs”, I wrote a program to turn a list of words into C source code for a function, just like this.

• Is this the first, second, or third program you looked at? (Put a circle around one.)
• Write the time:

• Read the Gen listing until you feel that you have a rough idea of what is going on.
• Write the time:

• Take a break if you like.
• Write the time:

• What sorting algorithm does this program use?

• Why does it sort the input?

• Is this program certain to terminate in a finite time? If so, can you give a simple bound in terms of the sizes of the inputs. If not, can you explain why?

• Write the time:

• Take a break if you want one.
• Write the time:

• The real program is written in AWK. It has been tested, and works. Its output has been tested, and works. But I made two deliberate mistakes in the Chatterton version. What are they?

• Write the time: