Notes for COSC345 Lecture 17, 2016

Names

Grammar of names

Separate your words

Alongtimeagowhenpeoplestartedtowritetheyrantheirwordstogether. Suchtextisnoteasytoread. ItIsNotMuchBetterWhenYouUseInternalCaps, TheBaStudlyCapsStyle, BecauseWhatYouGetIsStillLongBlackBlobs. DoNotBeSoRudeToYourReadersIfYouCanPossiblyAvoidIt.

People have to be able to decode identifiers. Distinguish a workplace that is unionised from a chemical that is un_ionised. Distinguish a man who works for the UN, a UN_man, from a mutant who isn't quite human, an unman. You may want to distinguish things that have capital letters in ordinary text (like proper names and acronyms) from things that do not, so it is unwise to use capitals for separation. For example, George Boring presumably isn't boring. According to a family name site, Superman was drawn in the 1940s and 1950s by a Boring artist (but not a boring one).

Compare

If you are not already familiar with Romberg integration, in which of these languages is it obvious that Romberg is a proper noun? In which might Romberg be confused with a colour space?
Make the breakdown of your names into separate words obviously unambiguous.
Do not confuse Boring people with boring people.
Use keyword arguments if you have them.

Completely self-documenting code is impossible.

Look at the examples above. Three of the arguments are numbers. There are six possible ways they might be ordered. The Java version cannot be self-documenting.
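C has no keyword arguments either, but a parameter struct with designated initialisers gets close. The following is only a sketch under assumptions: the struct romberg_options, its field names, the limit ROMBERG_MAX_LEVELS and the integrand square are all invented for illustration, and the choice of "lower bound, upper bound, number of levels" as the three numbers is a guess at what such an interface might take.

```c
#include <assert.h>
#include <stddef.h>

enum { ROMBERG_MAX_LEVELS = 16 };   /* hypothetical upper limit on levels */

/* Hypothetical parameter block: each number is named at the call site. */
typedef struct romberg_options {
    double lower_bound;   /* left end of the integration interval    */
    double upper_bound;   /* right end of the integration interval   */
    int    levels;        /* rows of the Romberg table, 1..MAX       */
} romberg_options;

/* Romberg integration: trapezoid estimates with repeatedly halved
   step size, improved by Richardson extrapolation. */
double Romberg_integral(double (*f)(double), romberg_options opts)
{
    double prev[ROMBERG_MAX_LEVELS], cur[ROMBERG_MAX_LEVELS];
    double h = opts.upper_bound - opts.lower_bound;

    assert(f != NULL);
    assert(opts.levels >= 1 && opts.levels <= ROMBERG_MAX_LEVELS);

    /* Row 0: the trapezoid rule with a single panel. */
    prev[0] = 0.5 * h * (f(opts.lower_bound) + f(opts.upper_bound));

    for (int i = 1; i < opts.levels; i++) {
        long   new_points = 1L << (i - 1);   /* midpoints added in this row */
        double sum = 0.0;
        double factor = 1.0;

        h *= 0.5;
        for (long k = 0; k < new_points; k++) {
            sum += f(opts.lower_bound + (double)(2 * k + 1) * h);
        }
        cur[0] = 0.5 * prev[0] + h * sum;    /* refined trapezoid estimate */

        /* Richardson extrapolation along the row. */
        for (int j = 1; j <= i; j++) {
            factor *= 4.0;
            cur[j] = cur[j - 1] + (cur[j - 1] - prev[j - 1]) / (factor - 1.0);
        }
        for (int j = 0; j <= i; j++) {
            prev[j] = cur[j];
        }
    }
    return prev[opts.levels - 1];
}

/* Example integrand, for illustration only. */
double square(double x)
{
    return x * x;
}
```

A call then reads Romberg_integral(square, (romberg_options){ .lower_bound = 0.0, .upper_bound = 3.0, .levels = 5 }), and the reader no longer has to guess which of the six orders applies. The name still says nothing about what Romberg integration is; that needs a comment.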

If you are not familiar with Romberg integration, what does the name Romberg_integral tell you, all by itself?

The staff member who wrote the original version of these slides exhorting you to "Aim for ... completely self-documenting code" has an average of one comment line for every 12 SLOC in his magnum opus. And in my view, because I have struggled to understand it, it does not have enough comments! And he was aiming for self-documenting code.
Nobody struggling to understand an unfamiliar body of code ever said “I wish this had fewer helpful comments.”

Distinguishing constants from variables.

Some people are adamant that you should WRITE CONSTANTS IN UPPER CASE so that you can tell them from constants. (Sorry, tell them from variables. It doesn't really make sense either way.) In antique C, where constants were normally declared as macros, the const keyword not having been adopted, that made sense, because mutable variables and #defined constants followed different scope rules.

Other people are adamant that variables and constants should be named exactly the same way. After all, if a variable isn't changed in some region of code, why do you even care? (Java 8 has the notion of “effectively final” variables.) We have better things to spend notational capital on.

I suggest a meta-guideline. If a programming language is such that you routinely need to know whether a name is a constant or a variable, then by all means use capitalisation to distinguish them. If, however, they are mostly interchangeable, then don't.
Don't use case style to distinguish constants from variables unless the reader of your code needs to know.
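The different scope rules that made upper case worthwhile in antique C can be seen in a small sketch. The names BUFFER_LIMIT, buffer_limit, first and second are invented for illustration.

```c
#include <assert.h>

int first(void)
{
#define BUFFER_LIMIT 64        /* written inside first(), but not local to it */
    int buffer_limit = 32;     /* an ordinary variable: local to this block   */

    return BUFFER_LIMIT + buffer_limit;
}

int second(void)
{
    /* buffer_limit is out of scope here and could not be mentioned.
       The macro is still defined, because the preprocessor knows
       nothing about blocks; it lasts until #undef or end of file. */
    static_assert(BUFFER_LIMIT == 64, "macro outlives the block it was written in");

    return BUFFER_LIMIT;
}
```

In code like this the reader does need to know which kind of name they are looking at, so the meta-guideline says to mark it. With a const declaration, which obeys the usual block scope rules, that reason goes away.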

On the importance of comments

Consider the problem of specifying a point near the Earth's surface. We clearly need at least 3 numbers, and in order to recognise the same point if we see it again, it's clear that we'd like these co-ordinates to be referred to a reference frame that rotates rigidly with the Earth.

Centuries of tradition tell us that the answer is latitude (North/South), longitude (East/West), and height above mean sea level.

So now we have

typedef struct geo {
    double lat, lon, hgt;
} geo;

What do we have?

So now we have reached

typedef struct geo {
    double lat; //  -90 (S) to  +90 (N) degrees, 0=equator
    double lon; // -180 (W) to +180 (E) degrees, 0=Greenwich
    double hgt; // -100 to +200 km, 0 = mean sea level
} geo;

Are we done yet? And did we really need those comments? Couldn't we have used

typedef struct geographical_location_3d {
    double latitude_in_degrees_north_of_equator;
    double longitude_in_degrees_east_of_Greenwich;
    float  height_in_metres_above_mean_sea_level;
} geographical_location;

and had self-documenting code?

Just how long do you think you would be willing to type those names?

At this point, someone is bound to say “but my IDE offers completion based on the first few letters, so I don't have to type much”. That works fine until you need

in the same program.

Above all, those names do not tell us everything we desperately need to know!

Some programming languages let you say more than others. For example, we can express range and precision information precisely in Ada, where the compiler can see them, check them, and take advantage of them. Here's what it looks like.

-- ISO 6709:2008 geographical point representation.
-- The Coordinate Reference System (CRS) WGS_84
-- (World Geodetic System 1984, as revised in 2004) is always used.
-- Latitude and longitude are measured in degrees.
-- +ve latitude is north; +ve longitude is east (0 = prime meridian).
-- Height is measured in metres.
-- The deltas are chosen for about 10 cm resolution.

type Latitude_Range
  is delta 0.000_001 digits 8 range    -90.0 ..    90.0;
type Longitude_Range
  is delta 0.000_001 digits 9 range   -180.0 ..   180.0;
type Height_Range
  is delta 0.1       digits 7 range -100_000.0 .. 200_000.0;
type Geographic_Location
  is record
     Latitude  : Latitude_Range;
     Longitude : Longitude_Range;
     Height    : Height_Range;
  end record;

The Novopay system is implemented in Oracle's PL/SQL, which lets you write this:

-- ISO 6709:2008 geographical point representation.
-- The Coordinate Reference System (CRS) WGS_84
-- (World Geodetic System 1984, as revised in 2004) is always used.
-- Latitude and longitude are measured in degrees.
-- +ve latitude is north; +ve longitude is east (0 = prime meridian).
-- Height is measured in metres.
-- The deltas are chosen for about 10 cm resolution.

DECLARE
  SUBTYPE Latitude_Range      IS NUMERIC(8,6);
  SUBTYPE Longitude_Range     IS NUMERIC(9,6);
  SUBTYPE Height_Range        IS NUMERIC(7,1);
  TYPE    Geographic_Location IS RECORD (
             Latitude  Latitude_Range,
             Longitude Longitude_Range,
             Height    Height_Range);

where we can state the precision but not the true range.

We can't even do that in C. The best we can do is

/*  ISO 6709:2008 geographical point representation.
    The Coordinate Reference System (CRS) WGS_84
    (World Geodetic System 1984, as revised in 2004) is always used.
    Latitude and longitude are measured in degrees.
    +ve latitude is north; +ve longitude is east (0 = prime meridian).
    Height is measured in metres.
    We want to have about 10 cm resolution, so single precision
    floats are NOT adequate for latitude & longitude.
*/
typedef double latitude_range;  // -90 to +90
typedef double longitude_range; // -180 to +180
typedef double height_range;    // -100 km to +200 km in m.
typedef struct geographic_location {
    latitude_range   latitude;
    longitude_range  longitude;
    height_range     height;
} geographic_location;
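Since the C compiler never sees the ranges, the only way to enforce them is a run-time check. A minimal sketch follows, repeating the declarations above so that it stands alone; the function names geographic_location_is_valid and make_geographic_location are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef double latitude_range;  // -90 to +90
typedef double longitude_range; // -180 to +180
typedef double height_range;    // -100 km to +200 km in m.
typedef struct geographic_location {
    latitude_range   latitude;
    longitude_range  longitude;
    height_range     height;
} geographic_location;

/* True if every field is inside the range its comment promises.
   A NaN in any field fails every comparison, so it is rejected. */
bool geographic_location_is_valid(const geographic_location *loc)
{
    return loc != NULL
        && loc->latitude  >=     -90.0 && loc->latitude  <=     90.0
        && loc->longitude >=    -180.0 && loc->longitude <=    180.0
        && loc->height    >= -100000.0 && loc->height    <= 200000.0;
}

/* Build a location, failing loudly (in debug builds) on bad input. */
geographic_location make_geographic_location(
    double latitude, double longitude, double height)
{
    geographic_location loc = {
        .latitude = latitude, .longitude = longitude, .height = height
    };

    assert(geographic_location_is_valid(&loc));
    return loc;
}
```

This is what Ada gives you from the type declarations alone. In C it is extra code that somebody has to remember to call, and the comments are still needed to say what the numbers mean.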

Rules about comments.

If there is important information about something in your program which cannot be expressed in your programming language, it must be expressed in the name or in a comment.
If the information should be expressed every time the thing is mentioned and can be expressed in the name, put the information in the name. If the information is too bulky to go in the name, it must go in a comment.
Interfaces must be commented adequately, and it is the client of the interface who determines what counts as adequate, not the author of the implementation.
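Here is one way these rules might play out for a small interface. It is a sketch only; the function longitude_difference_deg and its parameter names are invented for illustration. The unit matters at every use, so it goes in the names. The precondition and the wrap-around behaviour are too bulky for a name, so they go in the comment that a client would read.

```c
#include <assert.h>

/*  Signed change in longitude going from from_deg to to_deg,
    taking the short way round the Earth.
    Both arguments are in degrees, positive east, and must lie
    in the range -180 to +180.
    The result is in degrees, positive eastwards, and lies in the
    range -180 (inclusive) to +180 (exclusive), so two points on
    opposite sides of the 180 degree meridian come out as being
    close together, not nearly 360 degrees apart.
*/
double longitude_difference_deg(double from_deg, double to_deg)
{
    double difference_deg;

    assert(from_deg >= -180.0 && from_deg <= 180.0);
    assert(to_deg   >= -180.0 && to_deg   <= 180.0);

    difference_deg = to_deg - from_deg;       /* between -360 and +360 */
    if (difference_deg >= 180.0) {
        difference_deg -= 360.0;
    } else if (difference_deg < -180.0) {
        difference_deg += 360.0;
    }
    return difference_deg;
}
```

Whether that comment is adequate is for the person calling the function to decide, not the person who wrote it.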

Comments do not have to be in-line

There are annotation editors that let you keep a file of annotations in parallel with a source file. This is especially useful if you want to make notes about a source file that you are not able for some practical or legal reason to edit.

The Eclipse IDE supports such annotations.

Review Board allows annotations as part of its support for code review. You should visit the Review Board web site and take a look.

There is a package 'annot.el' for Emacs which offers incredibly simple annotation. There are several ports of Emacs for Mac OS X, but Aquamacs would be top of the list. Annotations are stored in files in your ~/.annot directory. As long as you don't change the file, or only change it through Emacs with annot.el loaded, no problems. Change it with another editor, the MD5 hash changes, and annot.el doesn't recognise the file. Still some work to do there. But it is perfect for annotating things you don't change.

Annotations are also great for tools. Compiler error messages can be converted to annotations. Intel have a tool that can generate parallelism annotations. Seeing “comments” from a static checker in situ without changing the file is handy.

What Lind and Vairavan really found

One of the slides says “Highly commented parts of code have the highest error rate (Lind and Vairavan 1989)”. What did Lind and Vairavan actually find?

They studied one program. It was a big one, with thousands of procedures, some in (old) Fortran and some in Pascal.

There is a dogma that short functions are best, that anything over one page is bad. They actually found that short functions (1-50 lines) had nearly twice the error rate of medium ones (51-100 lines).

They found that comments are correlated with errors for two reasons.

They also found that the slide has causality backwards. The slide seems to say “if you put lots of comments in your code, it will then turn out to have lots of errors”. In fact they suggested the opposite: code that had lots of errors ended up with lots of comments.

That's why the next line on the slide says “Don't comment tricky code, re-write it!”. Of course, that assumes that it is possible to eliminate the trickiness by rewriting. If only that were true.

Indentation

First we get world peace. Then we convert everyone to the same religion. Then we eliminate poverty. And then we get people to agree on a layout style.

Some things are not a matter of taste.
The purpose of indentation is to reveal structure.
To reveal structure, indentation must be consistent.
Using the TAB key to indent is stupid because nobody agrees about where tab stops are. The UNIX standard is every 8 columns. Xcode has Cmd-] to indent and Cmd-[ to outdent. Vi has >> to indent by shiftwidth and << to outdent by shiftwidth. Emacs has more indentation support than you can shake a directory tree at, starting with Ctrl-X TAB.
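To see why disagreement about tab stops matters, here is a small sketch that works out where the first visible character of a line lands for a given tab width. The function indent_columns is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Column (counting from 0) of the first character of line that is
   neither a space nor a tab, when tab stops are every tab_width
   columns. */
int indent_columns(const char *line, int tab_width)
{
    int column = 0;

    assert(line != NULL && tab_width > 0);
    for (; *line == ' ' || *line == '\t'; line++) {
        if (*line == '\t') {
            column += tab_width - column % tab_width;  /* jump to next tab stop */
        } else {
            column++;
        }
    }
    return column;
}
```

A line indented with one tab and a line indented with eight spaces both start at column 8 when tab stops are every 8 columns, so they look like the same nesting level. With tab stops every 4 columns the tabbed line moves to column 4 while the other stays at column 8, and the structure that the indentation was supposed to reveal is gone.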

See also Rob Pike's Notes on Programming in C.