Collatz


Preface

In the following sections I am going to develop some Prolog code designed to collect some data regarding properties of a certain function on the natural numbers called the Collatz function. The development won't be as disorganized as in a number of the other examples, since I had already designed this code in Haskell for COSC 459, and just lifted most of the ideas across. The main logic programming technique it is meant to illustrate is the creation and use of data structures (in this case a key-value pair binary search tree).
 

Introduction


The Collatz function is a very simply described function from the natural numbers to the natural numbers:

collatz(2k) = k
collatz(2k+1) = 3(2k+1) + 1 = 6k+4,

that is, it halves even numbers, and multiplies odd numbers by three then adds one. Here is how we define the collatz function in Prolog:
collatz(N, CN) :-
    0 is N mod 2,
    CN is (N/2).

collatz(N, CN) :-
    1 is N mod 2,
    CN is (3*N + 1).

The use of is in the modulo conditions is essential. If we wrote:

1 = N mod 2

instead, we would be asking "can (the atom) 1 be unified with the expression mod(N,2)?" and the answer would always be No. In the given context we are asking "can 1 be unified with the result of evaluating N mod 2?" and the answer will be Yes or No, depending on whether N is even or odd.

What makes the collatz function interesting is its behaviour when iterated. In the following examples, each successive number is obtained from the previous one by "collatzing":

1 4 2 1
3 10 5 16 8 4 2 1
6 3 ...
7 22 11 34 17 52 26 13 40 20 10 ...

We see that there is a "basic cycle" (1 4 2 1), and that, so far at least, if we iteratively collatz any natural number we eventually reach that cycle, though, as can be seen in the case of 7 this may take some while.

Problem: Is it true that for any natural number n, sufficiently many iterations of collatz beginning from n will eventually lead to 1?

The answer to this problem is unknown, despite the efforts of a legion of amateur, and a regiment of professional mathematicians and computer scientists in trying to resolve it. I won't provide references, but Google can provide you with a multitude of them if you are interested.

In the context of this problem the question of how many iterations are required is also of interest. That is what we are going to attack. At the end of the day we would like to specify a natural number n, and then learn how many steps are needed to get from any number k to 1 by iterations of collatz, for every k less than or equal to n. Obviously if we just wanted the answer for a single value of k the simplest thing would be just to compute the iterations of collatz starting from k until we reached 1. Though we won't actually be using it, here's how that could be done in Prolog:

collatzSeq(1, [1]) :- !.

collatzSeq(N, [N | L]) :-
    collatz(N, CN),
    collatzSeq(CN, L).

And we could use it to find a sequence and its length:

?- collatzSeq(7, L), length(L, N).

L = [7, 22, 11, 34, 17, 52, 26, 13, 40|...]
N = 17 

Yes

(Note the short form output for L, this is a default behaviour). 

All the values

If we are interested in all the values of the lengths of the collatz sequence for numbers up to n, then computing them individually is silly. This is because once we know the path length from some number (say 7) to 1, then we also know it for any number that reaches 7 after a few steps. For example we have the sequence:

9 28 14 7

and so the path length for 9 is 20, three more than that for 7. 

So our plan should be to store values for the path length as we compute them (including those for intermediate steps), and when trying to compute a new value, first look to see whether we have computed it already, and if not, then compute its collatz sequence only until we reach an already computed value, then update our storage.

There are various sophisticated ways that we could manage our storage, look up, and updates, but we will choose a fairly simple one -- using a binary tree whose nodes contain keys and values (path lengths) where the ordering is by the key (since this is our look up criterion). We will not worry about balancing or rebalancing the tree when we do insertions.

So our data structure is a collatz tree, whose general node is:

ct(k, v, left, right)

where k is the key, v its value, and left and right the left and right subtrees. We alos need a nilTree object to close trees off. Insertion and look up are straight modifications of the code for ordinary binary search trees:
 

insert(N, V, nilTree, ct(N, V, nilTree, nilTree)).

insert(N, V, ct(K, VK, L, R), ct(K, VK, L1, R)) :-
    N < K,
    insert(N, V, L, L1).

insert(N, V, ct(K, VK, L, R), ct(K, VK, L, R1)) :-
    N > K,
    insert(N, V, R, R1).

Note that insertion fails if we attempt to insert an existing key..

lookup(K, ct(K, V, _, _), V).

lookup(K, ct(K1, _, L, _), V) :-
    K < K1,
    lookup(K, L, V).

lookup(K, ct(K1, _, _, R), V) :-
    K > K1,
    lookup(K, R, V).

Our "top level" predicate is designed to "construct a collatz tree including at least the keys 1 through n"

collatzTree(1, ct(1, 0, nilTree, nilTree)).

collatzTree(N, Tree) :-
    N1 is N-1,
    N1 > 0,
    collatzTree(N1, Tree1),
    addPathFrom(N1, Tree1, Tree, _).

As usual we have one predicate to add, namely addPathFrom. There are two cases. If we have a successful lookup we don't need to do anything:

addPathFrom(N, TOld, TOld, V) :-
    lookup(N, TOld, V),
    !.

Otherwise, we need to add a path from the value of collatz(n) to the tree, and then add n itself.

addPathFrom(N, TOld, TNew, V) :-
    collatz(N, N1),
    addPathFrom(N1, TOld, TInt, V1),
    V is V1+1,
    insert(N, V, TInt, TNew).

And that's all.
 

Making use of it.


As it is, we can create the data structure, but have no good way to use it. We can do some things at the top level. Some "utility functions" are added below:

maxCKey(N, M) :-
    collatzTree(N, T),
    maxKey(T, M).

maxCValue(N, M) :-
    collatzTree(N, T),
    maxValue(T, M).

maxValue(nilTree, 0).

maxValue(ct(_, V, L, R), M) :-
    maxValue(L, ML),
    maxValue(R, MR),
    M is max(max(ML, MR), V).

maxKey(ct(K, _, _, nilTree), K) :- !.

maxKey(ct(_, _, _, R), K) :-
    maxKey(R, K).

Now we can do things like:

?- maxCKey(100, N).

and find out that just to get 1 through 100 into the tree we need to compute chains running through values up to 9232. If we look up the maxCValue in this tree we get 118, indicating the longest path is 118 steps long.

Probably more interesting would be to have printed out to the screen, or to a file, lines of data of the form

key value

preferably sorted by increasing key, say up to a certain bound. These could then be cut and pasted into a graphing program or whatever. A first attempt:

printTree(nilTree).

printTree(ct(K,V,L,R)) :-
    printTree(L),
    writef('%6l%6l\n', [K,V]),
    printTree(R).

Now we try:

?- collatzTree(10, T), printTree(T).

That works well except for two things: the keys larger than 10 are printed (since they belong to the three), and the tree itself is shown when the goal finally succeeds. The latter effect is mediated by a standard trick, we put a !, fail. at the end of the printTree. Oops, that's no good. Now we don't get side effects, since we can't get to the writef until a printTree succeeds. So, create a new top level predicate showTree:

showTree(T) :- printTree(T), !, fail.

That fixes the display of T itself. Now how do we limit the output values. Well, go back to printTree and add a parameter for the largest key we want printed:

printBoundedTree(ct(K,V,L,R), Bound) :-
    printBoundedTree(L,Bound),
    K <= Bound,
    writef('%6l%6l\n', [K,V]),
    printBoundedTree(R, Bound).

We could, and probably should, provide a corresponding showBoundedTree, but so long as the bound we supply is smaller than the maximum key, then we don't need it.

Finally, how do we get all this to a file? How about:

printBTreeToFile(Fn, T, B) :-
   open(Fn, write, Fd),
   tell(Fd),
   \+ printBoundedTree(T, B),
   close(Fd),
   !,
   fail.

Some funny business here. The first two lines of the clause open a file whose name we supply as Fn for writing, and set it to be the standard output stream. But then instead of doing printBoundedTree ... we do "not printBoundedTree ..." Eh? Well remember we set printBoundedTree up to always fail, so in order to get to the close line we need to make it succeed again, hence the not. Now we close the output file and restore the fail so that we don't get bothered by seeing the whole tree.

As it has been presented here this code is very unsafe -- printBoundedTree succeeds if the bound is larger than the biggest key in the tree, and this will leave the output file open. This defect is repaired in the actual code.