Garbage Retention

All of you in this class have learned Java and/or Python. Both of those language use garbage collection to manage memory. From time to time, the runtime system of your programming language figures out which objects will not be needed any more, and reclaims the memory used to hold them, so that it can be used for other things.

Sadly, it is impossible for the runtime system, or any algorithm, to know for sure which objects will be needed in the future. Imagine a program doing

    T x = new T();
    final int n = "read it from somewhere";
    assert n > 2;
    boolean found = false;
    for (int k = 1; !found; k++) {
        assert k > 0;
	for "1 <= a <= b <= c <= k" {
                 n    n     n
	    if (a  + b  == c ) {
                found = true;
                break;
	    }
	}
    }
    use(x);

If Fermat's Last Theorem is false, the loop will eventually stop, and x will be needed. If Fermat's Last Theorem is true, or if the the smallest value of c falsifying it won't fit in an int, the loop will eventually crash because the assertion about k will fail.

So whether x really is garbage or not depends on the status of Fermat's Last Theorem.

It took over three and a half centuries for human mathematicians to finally prove that theorem true. This is not something programming language runtime systems are going to be doing routinely.

So we have to use some approximation to “will this ever be needed again”.

The usual approximation is “can this be reached from some live variable by following pointers?”. (Note that a live variable is basically a named variable in your program and that “live” means the compiler thinks the current value of that variable might be used, so there's an approximation there too.)

This has an important practical consequence.

If X has an instance variable that points to Y, and X is reachable, then Y is reachable, even if the program will (or can!) never use it.

Here is a very simple case, a stack of objects.

public class Stack<E> {
    int depth;
    Object[] array; // E[] is not allowed, sigh.    

    public Stack(int capacity) {
	array = new Object[capacity];
	depth = 0;
    }

    public boolean isEmpty() {
	return depth == 0;
    }

    public void push(E x) {
	array[depth] = x;
	depth++;
    }

    public E pop() {
	assert depth > 0;
	depth--;
	return (E)array[depth];
    }
}

Now suppose we do this:

    Stack<char[]> s = new Stack<char[]>(10);
    s.push(new char[1000000]);
    ...
    char[] a = s.pop();
    ...
    a = null;

It all seems straightforward enough. Once we've executed the a = null statement, a doesn't point to anything, and only elements s.array[0..s.depth-1] count, and there aren't any of those, so it should be possible to reclaim that million-char array, right?

Not so fast. We know that only elements 0..depth-1 of array can ever be used again, but the garbage collector doesn't! There's no reason in principle why we couldn't tell the garbage collector what part of the array is still live (allowing it to replace other elements by null), but it so happens that Java and Python don't give us any way to do that.

The only way, in fact, for us to ensure that the garbage collector knows that array[depth] doesn't point to anything useful any more is for us to tell it by setting array[depth] to null. So a gc-correct implementation of pop() goes like this:

    public E pop() {
	assert depth > 0;
	depth--;
	E result = (E)array[depth];
	array[depth] = null; // inform the GC
	return result;;
    }

Now Java already has stacks and queues and stretchy arrays and it already does this. The point is not to show you how to make a Stack, but to remind you that if you know that the current value of some instance variable or array element can't be used again, you had better make sure the garbage collector knows that, otherwise you will have what's called a Memory Leak.