Arrays considered harmless

In February 2017 I encountered a web article called “Arrays considered somewhat harmful by Eric Lippert. The article is dated September 22, 2008.

Eric Lippert is a experienced and capable programmer with a depth of knowledge about C# that I could never hope to equal. Anything he writes is worth reading. But even Homer nods. In this case, there is a fact about C# which he appears to take for granted as The Way Things Must Be. But it isn't so.

Context

Eric Lippert wrote, “I got a moral question from an author of programming language textbooks the other day requesting my opinions on whether or not beginner programmers should be taught how to use arrays.”

This is a seriously weird question. Should beginning carpenters be taught about screws? Should beginning lawyers be taught about contracts? Should beginning pilots be taught how to operate radios? We don't need to teach beginning programmers about classes or assignment statements, and arguably shouldn't. And just as we teach children the alphabet before we teach the concept of a verb, so it makes sense to teach about numbers and functions before we teach about arrays. Based on the kind of code I see 3rd-year programmers writing, I would say that it is deeply unethical to teach beginning programmers about ArrayList<T> before they have mastered arrays.

Lippert's claims

You probably should not return an array as the value of a public method or property, particularly when the information content of the array is logically immutable.
Returning an array means that you have to make a fresh copy of the array every time you return it.
An array is a collection of variables. ... Variables vary.
tools providers need to create ways for people to program effectively without causing observable side effects. [In order to support parallelism.] Of all the sort of “basic” types, arrays most strongly work against this goal. An array's whole purpose is to be a mass of mutable state.

Every single one of these claims is true for C#, but false for some other languages, which shows us that the problems are not problems with the array concept but with C#. Java has the same gross defect. Ironically, C and C++ don't.

The presupposition

Arrays in general provide

Close to minimal storage cost per element.
O(1) size.
O(1) access to an element given an index.
Ideally, syntax for a sequence of compatible values.

Mutable arrays provide

O(1) time to replace an element given an index.
Zero space cost per replacement.

Stretchy arrays provide the benefits of mutable arrays plus

O(n) cost to grow
O(1) amortised cost to grow
at the price of abandoning the “close to minimal storage cost per element” property of plain arrays.

A read-only array is an array that is not mutable.

Lippert's presupposition is that read-only arrays do not or cannot exist, that all arrays are at least mutable.

That is true in Java and C#, bad cess to them. Ironically, it isn't true in C or C++.

The Joy of Read-Only Arrays

Let's look at read-only arrays in Standard ML.

#[e0,...,en] evaluates to a read-only array of n+1 elements.
tabulate (n, f) makes a read-only array of n elements holding f(0), ..., f(n-1), taking constant time per element plus whatever time f takes.
length v returns the number of elements in v, taking constant time.
sub (v, i) returns the ith element of v in constant time, where 0 ≤ i < length(v).
map (f, v) makes a new read-only array w where sub(w, i) = f(sub(v, i)), taking constant time per element plus whatever time f takes. There's a bunch of other functions for iterating over vectors.
concat (v, w) concatenates vectors v and w taking time and space proportional to length (v) + length (w).
update (v, i, e) returns a new read-only array w such that sub (w, j) = if j == i then e else sub (v, j). This takes time and space proportional to length (v); the original vector is unchanged afterwards.

There are no mutating operators.

Is it safe to return a read-only array as the value of a public method or property? Certainly it is!
Do you have to make a fresh copy of a read-only array every time you return it? Certainly not!
Is a read-only array a collection of variables? Certainly not!
Do read-arrays work strongly, or even at all, against the goal of programming without observable side effects? Certainly not!
Is the whole purpose of a read-only array to be a mass of mutable state? Certainly not! A read-only array is not mutable.

Read-only arrays provide compact storage, fast size and element access, and can support a rich collection of operations. (I have not begun to show a complete list here.)

My Smalltalk system has a ReadOnlyArray class (amongst many read-only collection classes), supporting 975 operations. The Array class adds only 73 more. The joy and freedom, moving from Java or C# to a language where there's a sequence data structure with minimal overheads that can be freely passed around without fears of inadvertent mutation is inexpressible. (Maybe Oh what a beautiful morning can express it.)

It isn't just arrays that are the problem

You probably should not return any mutable object as the value of a public method or property, or if you do, you should return a new copy. It's not the arrayness that's the problem, it's the mutability. Consider the following classes:

class Point {
    public int x;
    public int y;
    Point(int x_, int y_) { x = x_; y = y_; }
}
class Rectangle {
    public Point topLeft;
    public Point bottomRight;
    Rectangle(Point tl, Point br) { topLeft = tl; bottomRight = br; }
}

where you can imagine getter and setter methods if you like.

    Point aPoint = aRectangle.topLeft;
    aPoint.x = 0;

changes the position of the top left corner of aRectangle without aRectangle being involved in any way. Instead you have to write

class Rectangle {
    private int top, left, bottom, right;
    public Point topLeft() { return new Point(left, top); }
    public void topLeft(Point tl) { top = tl.y; bottom = tl.x; }
    public Point bottomRight() { return new Point(right, bottom); }
    public void bottomRight(Point br) { bottom = br.y; right = br.x; }
    public Rectangle(Point tl, Point br) { topLeft(tl); bottomRight(br); }
}

Now

    Point aPoint = aRectangle.topLeft();
    aPoint.x = 0;

does nothing to the rectangle. It isn't topLeft() returning an array that's the problem. It doesn't. It returns a Point. The problem is that topLeft() returns a mutable object. If Points were read-only, a Rectangle would not have to generate new Points every time it was asked for a corner.

A more deeply hidden belief

Why would anyone imagine that an array must be mutable? The obvious answer is “because they think that's how arrays have to be initialised”. That's not true. The building block we need is something like tabulate. For example, to implement map we can write

fun map (f, v) =
    tabulate (length v, fn i ⇒ f (sub (v, i)));