Why I never used IMP

I was a PhD student at Edinburgh at a time when the Edinburgh systems programming language IMP was flourishing. And I was very keen on programming languages.

Why should my opinion count for anything? At the risk of sounding either boastful or insane (or both), I need to establish some credentials as a programming language critic.

By 1979, I had written programs in Algol 60, APL, BASIC, Burroughs Algol, COBOL, CSMP (continuous systems modelling program), ESPOL (another Algol dialect), Euler (think of it as JavaScript without hash tables), Fortran, GLIM, Lisp, Pascal, PL/I, SNOBOL, and WFL (a sort of compiled shell).

I had a reading knowledge of Ada (the 1979 draft), ALEPH (from CWI), Algol 68, Algol W, BCPL, C, EAST, ESPL, JOSS, Marseilles Prolog, Modula 1, PL-11, PL/360, PL516, SETL, Simula 67, Snibbol (now MINT, rather FORTHish), and XPL. This is defined as ability to translate working code from one of these languages to a language I had access to a compiler for.

While at Edinburgh, I used Algol 68, AWK, BLISS-10, Bourne shell, Edinburgh Prolog, Hope, Lisp (Franz Lisp, InterLisp, MacLisp, Rutgers/UCI), M4, ML (predecessor of OCAML and SML), POP-2, and Simula 67.

I also learned to read Coral 66, IMP 77, Logo, Planner, RTL/2, SAIL, Scheme, and Smalltalk.

Clearly, these are not the tastes of a normal person, and the fact that I have tried to learn a new programming language every year since proves it. An academic who is willing to admit to having written COBOL (still less to having a COBOL compiler on his computer now) and still using SNOBOL occasionally is not likely to be deterred from using a language by its lack of popularity amongst his peers. Indeed, the way I think about programming has been influenced by languages I never got a chance to use. Cognoscenti will notice several “system implementation languages” in the lists above, so those were also one of my abnormal tastes.

And yet, I never took to IMP. I recently came across some material from Edinburgh mourning the demise of IMP, and attributing it to the lack of marketing and the failure of any manufacturer to take it up. That’s not what kept me from IMP, and I think it may be worth recording what was.

Acknowledgement

IMP was used for decades at Edinburgh. A lot of good work was done on it and in it, not least the innovative and useful operating system EMAS. IMP was available for a wide range of machines.

I still have copies of

The IMP Language and Compiler, by P. D. Stephens, EMAS Report 6, Reprinted 1977
Notes on IMP Programming, by P. D. Schofield, Revised: October 1977
The IMP-77 Language, by Peter S. Robertson, CSR 19-77, Revised May, 1979.

Lexical issues

Keywords and identifiers

IMP is an old language and must be judged as an old language.

Old character sets did not include lower case letters. Fortran and Algol 60 ignored blanks except inside string literals. The problem arose: how can we tell what is a keyword and what is not?

Burroughs Algol’s designers chose to take space between tokens as significant, to use the low line character to separate the parts of multi-part identifiers, and to define some words to be reserved words that could not be used as identifiers. For example,

FOR THE_INDEX := 1 STEP 1 UNTIL ARRAY_SIZE
   DO THE_ARRAY[THE_INDEX] := 0;

IMP’s designers chose to retain insignificant blanks, and not to allow low lines, which meant that the example about would have been written as

FOR THE INDEX := 1 STEP 1 UNTIL ARRAY SIZE
   DO THE ARRAY[THE INDEX] := 0;

and read as

{FORTHEINDEX} {:=} {1} {STEP1UNTILARRAYSIZE}
   {DOTHEARRAY{[}{THEINDEX}{]} {:=} {0}{;}

How then were the keywords to be distinguished? Algol had used several methods. For publication, bold or underlining were used. For actual computer entry, CASE stropping, 'quote' stropping, and %percent stropping were used by different systems. Case stropping was out, because you could not rely on having two cases. Quote stropping was out, because they wanted quotes for strings, and “The "" (double quote) character is not used within the language and is used by Emacs Director as a delete character when accepting input from a terminal.” (This was later changed.) That left percent stropping. So the example statement would have been written as

%FOR THE INDEX := 1 %STEP 1 %UNTIL ARRAY SIZE
   %DO THE ARRAY[THE INDEX] := 0;

except that that’s Algol, and IMP’s not. It would have been

%CYCLE THE INDEX = 1, 1, ARRAY SIZE
   THE ARRAY(THE INDEX) = 0
%REPEAT

So far, so good. But what happens if you have something like

%EXTERNAL %ROUTINE %SPEC FLAGAND FOLD(%RECORD (TRIPF) %ARRAY %NAME T)

which is roughly equivalent to C’s

extern void flagand_fold(struct tripf t[]);

Think of the percent sign as a case shift key. It shifts into a keyword case, and any non-letter shifts out again. The longest prefix of a sequence of letters in keyword case that corresponds to a known keyword is taken as a keyword. Because people didn’t like typing lots of percentage signs, what one inevitably sees is

%EXTERNALROUTINESPEC FLAGAND FOLD(%RECORD(TRIPF)%ARRAYNAME T)

Or even horrors like

%EXTERNALLONGREALFNSPEC COS(%LONGREAL THETA)

I am a bear of very little brain and long “words” confuse me. IFYOUFINDLONGRUNSOFCAPITALSEASYTOREADWEAREDIFFERENT.

What about spaces in identifiers? I like that. I like it a lot. But only if it’s consistent. For example, the design could have said that a run of one or more space characters inside an identifier was equivalent to a single space, and that identifiers had always to be spelled the same way. But following the Fortran tradition, IMP said that spaces were ignored. It's not just possible to have READSYMBOL, READ SYMBOL, READS YMBOL all in the same routine, this actually happens.

This decision meant that whatever the semantic merits of IMP, it was harder to read than it needed to be, even compared with contemporaries like BCPL and BLISS.

For an application programmer, BCPL was a far worse language than IMP. I cannot think of any ordinary programming problem for which I should prefer BCPL to IMP. Yet BCPL had a certain minimalist elegance which made it bearable to read and almost compensated for its lack of types.

Shriek

The exclamation mark character had four different uses within the IMP language.

!expr! is the absolute value of expr, as if the exclamation mark were the vertical bar. This feature was removed from the language in about 1977, being replaced by IMOD(i) and MOD(r) functions. The old syntax can still be found in archives.
e₁!e₂ is the bitwise inclusive OR of e₁ and e₁.
e₁!!e₂ is the bitwise exclusive OR of e₁ and e₁.
! introduces end-of-line comments.

No, you’re not seeing things. You did just read that. Imagine what it would be like in C or Java or JavaScript if // were a division operator as well as a comment introducer.

The resolution of this appears to be that the exclamation mark began a comment only at a place where a declaration or a statement could begin.

I was used to languages with end-of-line comments. Burroughs Algol and Prolog had them, using the percent sign. Lisp had them, using the semicolon. Pop-2 and Bliss had them, using the exclamation mark. None of them restricted comments to statement/declaration beginning positions.

It used to be possible to write f(!x+y!, !x-y!), and in any case, a statement cannot begin after a comma. None of the documentation ever hinted that a comment could appear after a comma. Indeed, a common practice was to use curly brace comments because of this:

%integer %c
  x,    {comment about x}
  y,    {comment about y}
  z     {comment about z}

Yet some IMP code does use

%integer %c
  x,    !comment about x
  y,    !comment about y
  z    ;!comment about z

While we're at it, continued comments

    ! This is a comment,
and it is continued %c
onto two more lines

are explicitly illegal in IMP77 but legal in IMP80 and do occur (too often) in actual code.

And and and

Consider this statement.

%IF X = 0 %AND Y = 0 %THEN X = 1 %AND Y = 1

The first two occurrences of = are equality tests. The last two are assignment symbols. (PL/I pulled the same trick.) Not only that, the first %AND is a short-circuit conjunction operator like C’s && while the second is a statement separator like C’s comma.

Why isn’t this read as X = (1 %AND Y = 1)? Because the right hand side of an assignment is an expression, and a conjunction is not an expression.

A conjunction is not an expression? It was in Fortran and Algol, but not in a number of old languages. Even in Pop-2, as described in The Pop-2 Papers and Programming in Pop-2, conditions could have conjunction and disjunction, but conditions were not expressions. This was one more mental burden for an Algol and Lisp programmer to boggle at (and eventually fixed in Pop-2).

Strings

Algol 60 had a string type but literally nothing that you could do with it. The I/O addendum added the ability to print a string.

Since IMP was a systems implementation language (as well as a tolerable applications language), it did not support garbage collection. XPL, a much smaller language, had a garbage collector specifically for strings. AWK, also a small language, typically uses reference counted garbage collection for strings and strings only. This works very well indeed. The Small Machines Algol-Like Language developed at Auckland University at about the time of IMP77 also provided special memory management for strings. The official subset of Algol 68 adopted the same policy: automatic memory management for strings only. With hindsight, that policy probably could have been adopted for IMP. A good AWK implementation is surprisingly efficient.

That was the path not taken. Instead, IMP adopted the approach found in some BASICs and in PL/I (CHARACTER VARYING).

%STRING (size) variable

declares variable to be a string of 0 to size characters. This is basically implemented as

%BYTE %INTEGER %ARRAY variable(0:size)

with element 0 holding the current size. (This is nowadays known as the “Pascal” convention, but BCPL, PL/I, and IMP were there first.)

What’s wrong with this?

String assignment is expensive.
String length is limited by the size of a character; limiting strings to 255 characters on machine with a 32-bit address space is just ridiculous. (Or would be if string assignment weren’t so expensive.)
Substrings cannot be shared. C’s NUL-termination convention means that an array of n characters terminated by a NUL can represent n+1 strings at the same time: the initial string and all its substrings. A (length,pointer) convention means that a single n character array can store the characters of all its 1+½n(n+1) substrings just once. (This is why SML has a string type and also a substring type.) This makes splitting a string into pieces costly in IMP.
The size being built into the type means that functions intended to work on strings are either limited and expensive or syntactically limited and tricky. Let’s take opening a file as an example.
```
%EXTERNAL %INTEGER %FN %SPEC DEFINE INPUT(
   %INTEGER CHANNEL, %STRING (255) TITLE)
```
In order to work with whatever title you have, this has an argument of the maximum possible size (which has not been big enough for Unix file names since the 1980s). Calling the function implies copying the string you have to a new 256-byte array. (Presumably only n+1 bytes are filled in when the source has n characters.) What if you don’t want this copying?
How about converting a string of hex digits to integer?
```
%EXTERNAL %STRING (*) %FN SUBSTRING(
   %STRING (*)%NAME S, %INTEGER F, T)
```
%NAME is how IMP spells “pointer”. This passes the address of a variable to the function, which is cheap. The problem with this is that it is syntactically limited: the string parameter must be a variable. Since IMP uses an infix dot for concatenation, it seems to be illegal to write
```
    X = SUBSTRING(" ".X, 1, LENGTH(X))
```

There is a special syntactic form called string resolution. Let’s see what it might have been. I’ll use Ada syntax.

procedure Find(
   Source: in  String;
   Target: in  String;
   Offset: out Natural;
   Length: out Natural;
   Found : out Boolean);
-- ...
Find(Source, "Target", Offset, Length, Found);
if Found then
   Prefix := Head(Source, Offset);
   Suffix := Tail(Source, Offset+Length);
else
   -- ...
end if;

The IMP equivalent is

%IF SOURCE -> PREFIX.("Target").SUFFIX %THEN ...

This is admittedly more concise. But that is as good as it gets. The Ada version generalises nicely to seeking the last occurrence instead of the first or to doing a search that ignores alphabetic case and/or accents. The IMP version does not.

It is precisely this kind of sharp division between special-case string operations that the compiler understands and those you have to program yourself that made my old friend’s life so painful when using PL/I strings. Nor is the kind of efficiency you get by building this specific match into the language the kind you really want. What you want is

...
Prepare("Target", Search_Info);
...
Find(Source, Search_Info, Offset, Length, Found);
if Found then ...

allowing you to precompute the tables needed by the Knuth-Morris-Pratt or Boyer-Moore algorithm, and get really fast searching by using a better algorithm. While the IMP compilers generated good code, they weren’t that smart.

Note again that the IMP approach implies copying. Suppose I want to split a string at spaces.

%CONSTANT %INTEGER MAX LEN = 80
%STRING (MAX LEN) %ARRAY PIECES(1:MAX LEN)
%INTEGER PIECE COUNT
%STRING (MAX LEN) LINE

PIECE COUNT = 1
PIECE COUNT = PIECE COUNT + 1 %C
   %WHILE LINE -> PIECES(PIECE COUNT).(" ").LINE
PIECES(PIECE COUNT) = LINE

(Here stmt %WHILE cond is the same as %WHILE cond %CYCLE stmt %REPEAT, which Perl programmers should find familiar, and %C is a continuation mark.)

The searching will be linear in the size of LINE. The string assignments will be quadratic. If LINE is initially the integers 0 to 29 separated by single spaces, 1340 bytes will be moved. If LINE is initially the integers 0 to 59, 5435 bytes will be moved. Doubling the size of the input quadrupled the cost.

It’s easy to make this linear with the Ada kind of interface. Add a Skip parameter that says how many characters from the beginning to skip before you start looking. Then

Skip := 0;
Piece_Count := 0;
loop
   Find(Source, " ", Skip, Offset, Length, Found);
   exit when not Found;
   Piece(Piece_Count) := Source(Offset+1 .. Offset + Length);
      -- I’m skating over some details there.
   Skip := Offset + Length;
end loop;

There is no Boolean type

The IMP Language and Compiler says “Purists have criticised IMP for not having a variable of type logical, and it is true that allowing bit operations on integer variables presumes a conversion between integers and bit representations.” That mis-states the nature of the criticism. It’s not that IMP doesn’t have a bit-string data type (as PL/I and Common Lisp do); it’s that it doesn’t have a Boolean type. The problem is not that X&3 is an expression, but that X < 3 is not an expression.

IMP has two separate syntactic categories: expressions and conditions. Expressions have integer, floating point, or string values. These values can be printed, stored in variables, read, passed as arguments to procedures, and returned from functions. Conditions have true or false outcomes. Those outcomes can’t be printed, stored in variables, read, passed as arguments to procedures, or returned as values from functions.

Contrast

%INTEGER %FUNCTION PARITY(%INTEGER X)
   %RESULT = (-1)\\REM(X, 2)
%END

which returns either +1 if X is even or -1 if it is odd, with

%PREDICATE IS EVEN(%INTEGER X)
   %IF REM(x, 2) = 0 %THEN %TRUE %ELSE %FALSE
%END

%TRUE and %FALSE are not Boolean values, they are a kind of return statement.

There is no conditional expression

All the Algols, all the functional languages, and most of the system implementation languages, have an expression that means “if e_c is true then the value of e_t otherwise the value of e_f”. Not IMP. This is no more than an annoyance, but it is an annoyance.

There are no enumerations

Pascal introduced enumeration types. Functional languages extend this to sum-of-product types, providing a safe replacement for variant records as well. However, the improved type checking offered by enumerations is well worth having. Even C eventually adopted them, in a somewhat crippled and confused fashion.

IMP didn’t have them. In no version of IMP for which I have been able to locate documentation or examples did it ever acquire them. And it needed them. IMP code is littered with

%CONSTANT %INTEGER %c
   foo = 0,
   bar = 1,
   ...,
   ugh = n

declarations that give you everything except convenience and safety.

There is no typedef

PL/I had showed us that you could let programmers ask for the integer size they wanted:

DECLARE X BINARY FIXED (15), Y BINARY FIXED (31);

Pascal had showed us that you could let programmers ask for the integer size they wanted in a human-oriented way::

var X: 0..20000; Y: 0..1000000000;

IMP offers %BYTE %INTEGER, %SHORT %INTEGER, %INTEGER, and on some but not all machines, %LONG INTEGER. What these actually mean is up to the compiler. If that sounds unpleasantly like C, that’s because it is. If that sounds like laying out bit fields in a machine-dependent record is impossible, that’s because it is.

Suppose we want to write a program that is portable between a 16-bit machine and a 32-bit machine. The obvious thing to do would be to do something like

%IF WORD SIZE = 16 %THEN %START
   %TYPE %FORMAT INT16 (%INTEGER)
   %TYPE %FORMAT INT32 (%LONG %INTEGER)
%FINISH %ELSE %START
   %TYPE %FORMAT INT16 (%SHORT %INTEGER)
   %TYPE %FORMAT INT32 (%INTEGER)
%FINISH
%TYPE(INT16) LITTLE
%TYPE(INT32) BIG

Impressively, the %IF part of this is perfectly OK. IMP allows conditionals that select between groups of declarations. This is not a preprocessor feature; IMP has none.

The problem is that there is no way of introducing your own type names. You can introduce your own record types, but not give them aliases, and you cannot perform arithmetic operations on records. The %TYPE syntax above is not real IMP; it's modelled on %RECORD %FORMAT, used to introduce a new record type, and %RECORD(name), used to refer to a record type.

Pointers

You can declare a variable that points to a number, a string, an array, a record, or a procedure. But you cannot declare a pointer that points to a pointer. Algol 68 and C programmers knew the value of this.

You can declare an array whose elements are pointers to numbers, strings, or records, but not an array whose elements are pointers to arrays. For example, in Pascal you can write

type vec = array [1..6] of real;
var  mat : array [1..5] of ↑vec;
...mat[i]↑[j]...

but you cannot do this in IMP. The closest you can come is

%RECORD %FORMAT vec(%REAL %ARRAY vec(1:6))
%RECORD (vec) %NAME %ARRAY mat(1:5)
...mat(i)_vec(j)

As the example shows, some of the quirks of the IMP type system can be worked around by introducing records. The problem is the frustrating irregularity of the language.

There is no case statement

Instead, IMP uses the SWITCH keyword from Algol 60 with the semantics of label arrays from PL/I. It sort of worked, but Algol 68, Algol W, Burroughs Algol, and BCPL, to name just four, had all picked up a case statement by the time IMP77 was written up.

Specification

The IMP compiler used some sort of table driven recursive descent. (Odd: I thought one of the major reasons for using recursive descent was to avoid the space and time overheads of using tables.) The tables are generated from some machine readable grammar.

No complete grammar appears in the documentation of the language. Trying to construct one so that I could write a parser in Yacc has proven amazingly painful.

Here’s one way in which it’s painful.

%IF c1 %THEN %START
   b1
%FINISH %ELSE
%IF c2 %THEN %START
   b2
%FINISH %ELSE %START
   b3
%FINISH

can be abbreviated as

%IF c1 %THEN %START
   b1
%ELSE %IF c2 %THEN
   b2
%ELSE
   b3
%FINISH

But what exactly are the rules here? Thinking about how to parse this, it seemed to me that the easiest hack was to patch around it in the lexical analyser:

If the current token is %ELSE and we’re inside %IF %THEN %START, hallucinate a %FINISH before the %ELSE. If the next token is not %IF then hallucinate a %START after it.
If the current token is %THEN and we’re inside %IF %THEN %START, hallucinate a %START after then %THEN.

More frustrating was the lack of a clear semantics for IMP. For example, there are two kinds of assignment.

var = expr
If the value of expr will not fit in var, an event may be signalled.
var <- expr
If the value of expr will not fit in var, it will be truncated.

So what happens if you assign a %LONG %REAL value to a %REAL variable? Must <- truncate, or may it round? Presumably if conversion to a shorter format overflows, = may signal an event, but what if the conversion is just inexact?

For that matter, when is a floating point literal of type %LONG %REAL and when is it of type %REAL? Does X*2.0 involve mixed precision when X is of type %REAL?

Then there is the assignment of records. “The right-hand record is copied bit by bit into the left-hand record. The formats of the two records must be equivalent.” When are the formats of two records equivalent? The language report is silent.

You might object that it is unfair to expect the design of an old language to settle more recent questions, but the distinction between “name equivalence” and “structural equivalence” was very much a live issue in the ’60s and ’70s. Algol 68 took structural equivalence, which proved to be a brute not only to implement but even to define, while Pascal took name equivalence, which was easier for people to understand as well as compilers.

Was the assignment of arrays forbidden, or simply overlooked in the manual? Either choice can be defended, the problem is not saying.

Separate compilation model

The namespace model offered by IMP is the traditional “assembler” model where there is one “public” name space and each load unit has a private name space as well. There is no way of carving up the public name space into portions as there is in Turbo Pascal, Modula (any Modula), or Ada. That is no reflection on IMP. To this day, C limps along with the same model, and while C++ has namespaces, they are just a prefixing hack. A genuine module system is proposed for the next version of C++. If C and C++ don’t have modules in 2015, IMP cannot be faulted for lacking them in 1977.

No version of IMP that I’ve found documentation for ever had type safe separate compilation. C doesn’t have it either, and while C++ claims to have it, that only applies to functions, not to variables. If all we had to go by was the C family, IMP would not look too bad.

However, by the time IMP came along, Burroughs had demonstrated that it was possible to have multi-language type-safe separate compilation. That is, you could have a system where the linker guaranteed that an identifier defined in one component and used in another had the same type in both places. Simula 67 had demonstrated that you could provide this guarantee even when the host operating system’s linker did not support it, by providing a pre-linker.

This was known to be an issue at the time. There were two grounds for the Minority Report that recommended rejecting Algol 68. The first was the way the language was defined. The second was that the language did not provide adequate facilities for composing large structured programs reliably. This is a problem that IMP never solved, any more than C did. The Algol 68 community did solve it: RSRE Algol 68 and the Berlin Algol 68 compiler both provided type-safe separate compilation. Both are of the same vintage as IMP.

Having said that, it was no worse than C in that respect. But not being better meant that an opportunity to make it adoption-worthy was lost.

But wait, there’s more

Only don’t wait too long, because I don’t have time to write about it now. Hints:

No overloading
Pointers to records are in the language, but there are no new/malloc or dispose/free analogues in the language or the standard predefined procedures. Some libraries include GET VEC and FREE VEC, which are remarkably unpleasant to use for dynamically allocated records.
Input/output streams are represented by small integers, not by a pointer to a record. The same integer may be used for an input channel and an unrelated output channel. Channel number management is the programmer's problem. This makes it very hard to write a library package that uses files.
Array limitations. Only one-dimensional arrays can be initialised. You can’t fix this by having an array of records because records cannot be initialised. Record fields can only be one-dimensional. Own (static) arrays can only be one-dimensional. The syntax for declaring a pointer to a multi-dimensional array is different from the syntax for declaring a pointer to a one-dimensional array.