taken_pieces count: [:each | each is_pawn] reads quite naturally. If you are going to refer to single elements,
int pawn_count = 0; for (int i = 0; i < taken_piece_count; i++) if (taken_piece[i].is_pawn()) pawn_count++; reads better. The same person writing the same algorithm in C and Smalltalk might reasonably use a singular name in C and a plural name in Smalltalk.
Alongtimeagowhenpeoplestartedtowritetheyrantheirwordstogether. Suchtextisnoteasytoread. ItIsNotMuchBetterWhenYouUseInternalCaps, TheBaStudlyCapsStyle, BecauseWhatYouGetIsStillLongBlackBlobs. DoNotBeSoRudeToYourReadersIfYouCanPossiblyAvoidIt.
People have to be able to decode identifiers.
Distinguish a workplace that is unionised from a chemical that is un_ionised.
Distinguish a man who works for the UN, a UN_man, from a mutant who isn't quite human, an unman.
You may want to distinguish things that have capital letters in ordinary text (like proper names and acronyms) from things that do not, so it is unwise to use capitals for separation. For example, George Boring presumably isn't boring. According to a family name site, Superman was drawn in the 1940s and 1950s by a Boring artist (but not a boring one).
Compare
Romberg integral of(f) from:(0) to:(1) epsilon:(0.01)
— Algol 60
RBGINT(F, 0, 1, 0.01)
— Fortran 66
(Romberg-integral :fun f :from 0 :to 1 :epsilon 0.01)
— Lisp
Romberg.integral(FUN=f, from=0, to=1, epsilon=0.01)
— R
Romberg_Integral(Fun => F, Lower_Bound => 0, Upper_Bound => 1,
Epsilon => 0.01)
— Ada
RombergIntegral of: f from: 0 to: 1 epsilon: 0.01
— Smalltalk-80
Romberg_Integral of: f from: 0 to: 1 epsilon: 0.01
— modern Smalltalk
rombergIntegral(f, 0, 1, 0.01)
— Java
If you are not already familiar with Romberg integration, in which of these languages is it obvious that Romberg is a proper noun? In which might Romberg be confused with a colour space?
Make the breakdown of your names into separate words obviously unambiguous. |
Do not confuse Boring people with boring people. |
Use keyword arguments if you have them. |
Look at the examples above. Three of the arguments are numbers. There are six possible ways they might be ordered. The Java version cannot be self-documenting.
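C is in the same boat as Java: it has no keyword arguments. One workaround, offered only as a sketch, is to pass a struct and fill it in with C99 designated initialisers, so that each number is labelled at the point of call. The names romberg_args, romberg_integral, ROMBERG_MAX_ROWS, and square are invented for this illustration, and the stopping rule (compare the newest estimate with the best one from the previous row) is just one simple choice.

```c
/* Hypothetical parameter block standing in for keyword arguments. */
typedef struct romberg_args {
    double (*fun)(double);  /* integrand */
    double from;            /* lower bound */
    double to;              /* upper bound */
    double epsilon;         /* tolerance on successive estimates */
} romberg_args;

#define ROMBERG_MAX_ROWS 20

double romberg_integral(romberg_args a)
{
    double prev[ROMBERG_MAX_ROWS], curr[ROMBERG_MAX_ROWS];
    double h = a.to - a.from;
    long panels = 1;

    /* Row 0: trapezoid rule with a single panel. */
    prev[0] = 0.5 * h * (a.fun(a.from) + a.fun(a.to));

    for (int row = 1; row < ROMBERG_MAX_ROWS; row++) {
        /* Halve the step by adding the midpoint of every old panel. */
        double sum = 0.0;
        h *= 0.5;
        for (long k = 0; k < panels; k++)
            sum += a.fun(a.from + (double)(2 * k + 1) * h);
        panels *= 2;
        curr[0] = 0.5 * prev[0] + h * sum;

        /* Richardson extrapolation along the row. */
        double factor = 4.0;
        for (int col = 1; col <= row; col++) {
            curr[col] = curr[col - 1]
                      + (curr[col - 1] - prev[col - 1]) / (factor - 1.0);
            factor *= 4.0;
        }

        /* Stop when the best estimate has settled down. */
        double change = curr[row] - prev[row - 1];
        if (change < 0.0)
            change = -change;
        if (change < a.epsilon)
            return curr[row];

        for (int col = 0; col <= row; col++)
            prev[col] = curr[col];
    }
    return prev[ROMBERG_MAX_ROWS - 1];  /* best effort if never settled */
}

/* Example integrand and call: every argument is labelled. */
double square(double x) { return x * x; }

double integral_of_square_on_unit_interval(void)
{
    return romberg_integral((romberg_args){
        .fun = square, .from = 0.0, .to = 1.0, .epsilon = 0.01
    });
}
```

The call site now says which number is which, and the three numbers can be written in any order. The price is an extra type that exists only to carry arguments.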
If you are not familiar with Romberg integration, what does the name Romberg_integral tell you, all by itself?
The staff member who wrote the original version of these slides exhorting you to "Aim for ... completely self-documenting code" has an average of one comment line for every 12 SLOC in his magnum opus. And in my view, because I have struggled to understand it, it does not have enough comments! And he was aiming for self-documenting code.
Nobody struggling to understand an unfamiliar body of code ever said “I wish this had fewer helpful comments.” |
Some people are adamant that you should WRITE CONSTANTS IN UPPER
CASE so that you can tell them from constants. (Sorry, tell them from
variables. It doesn't really make sense either way.)
In antique C, where constants were normally declared as macros, the const keyword not having been adopted, that made sense, because mutable variables and #defined constants followed different scope rules.
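A small sketch of that scope difference, in C as it is now. The names LIMIT, limit, and twice_the_limit are invented for the illustration.

```c
/* Hypothetical example: the same value declared twice inside one block. */
int twice_the_limit(void)
{
    {
#define LIMIT 10               /* preprocessor: knows nothing about braces */
        const int limit = 10;  /* language: visible only inside this block */
        (void)limit;           /* silence "unused variable" warnings */
    }
    /* `limit` is out of scope here, so mentioning it would not compile.
       LIMIT is still defined, and stays defined to the end of the file
       (or until an #undef). */
    return 2 * LIMIT;
}
```

When a name can leak out of the block that appears to own it, a reader does want a visual warning, and upper case provides one.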
Other people are adamant that variables and constants should be named exactly the same way. After all, if a variable isn't changed in some region of code, why do you even care? (Java 8 has the notion of “effectively final” variables.) We have better things to spend notational capital on.
I suggest a meta-guideline. If a programming language is such that you routinely need to know whether a name is a constant or a variable, then by all means use capitalisation to distinguish them. If, however, they are mostly interchangeable, then don't.
Don't use case style to distinguish constants from variables unless the reader of your code needs to know. |
Consider the problem of specifying a point near the Earth's surface. We clearly need at least 3 numbers, and in order to recognise the same point if we see it again, it's clear that we'd like these co-ordinates to be referred to a reference frame that rotates rigidly with the Earth.
Centuries of tradition tell us that the answer is latitude (North/South), longitude (East/West), and height above mean sea level.
So now we have
typedef struct geo { double lat, lon, hgt; } geo;
What do we have?
So now we have reached
    typedef struct geo {
        double lat; // -90 (S) to +90 (N) degrees, 0=equator
        double lon; // -180 (W) to +180 (E) degrees, 0=Greenwich
        double hgt; // -100 to +200 km, 0 = mean sea level
    } geo;
Are we done yet? And did we really need those comments? Couldn't we have used
    typedef struct geographical_location_3d {
        double latitude_in_degrees_north_of_equator;
        double longitude_in_degrees_east_of_Greenwich;
        float height_in_metres_above_mean_sea_level;
    } geographical_location;
and had self-documenting code?
Just how long do you think you would be willing to type those names?
At this point, someone is bound to say “but my IDE offers completion based on the first few letters, so I don't have to type much”. That works fine until you need
latitude_in_degrees_north_of_equator
latitude_in_seconds_north_of_equator
longitude_in_degrees_east_of_Greenwich
longitude_in_degrees_east_of_Paris
in the same program.
Above all, those names do not tell us everything we desperately need to know!
Some programming languages let you say more than others. For example, we can express range and precision information precisely in Ada, where the compiler can see them, check them, and take advantage of them. Here's what it looks like.
    -- ISO 6709:2008 geographical point representation.
    -- The Coordinate Reference System (CRS) WGS_84
    -- (World Geodetic System 1984, as revised in 2004) is always used.
    -- Latitude and longitude are measured in degrees.
    -- +ve latitude is north; +ve longitude is east (0 = prime meridian).
    -- Height is measured in metres.
    -- The deltas are chosen for about 10 cm resolution.

    type Latitude_Range  is delta 0.000_001 digits 8 range  -90.0 ..  90.0;
    type Longitude_Range is delta 0.000_001 digits 9 range -180.0 .. 180.0;
    type Height_Range    is delta 0.1 digits 7 range -100_000.0 .. 200_000.0;

    type Geographic_Location is
       record
          Latitude  : Latitude_Range;
          Longitude : Longitude_Range;
          Height    : Height_Range;
       end record;
The Novopay system is implemented in Oracle's PL/SQL, which lets you write this:
    -- ISO 6709:2008 geographical point representation.
    -- The Coordinate Reference System (CRS) WGS_84
    -- (World Geodetic System 1984, as revised in 2004) is always used.
    -- Latitude and longitude are measured in degrees.
    -- +ve latitude is north; +ve longitude is east (0 = prime meridian).
    -- Height is measured in metres.
    -- The deltas are chosen for about 10 cm resolution.

    DECLARE
       SUBTYPE Latitude_Range  IS NUMERIC(8,6);
       SUBTYPE Longitude_Range IS NUMERIC(9,6);
       SUBTYPE Height_Range    IS NUMERIC(7,1);

       TYPE Geographic_Location IS RECORD (
          Latitude  Latitude_Range,
          Longitude Longitude_Range,
          Height    Height_Range);
where we can state the precision but not the true range.
We can't even do that in C. The best we can do is
    /* ISO 6709:2008 geographical point representation.
       The Coordinate Reference System (CRS) WGS_84
       (World Geodetic System 1984, as revised in 2004) is always used.
       Latitude and longitude are measured in degrees.
       +ve latitude is north; +ve longitude is east (0 = prime meridian).
       Height is measured in metres.
       We want to have about 10 cm resolution, so single precision
       floats are NOT adequate for latitude & longitude.
    */
    typedef double latitude_range;  // -90 to +90
    typedef double longitude_range; // -180 to +180
    typedef double height_range;    // -100 km to +200 km in m.

    typedef struct geographic_location {
        latitude_range  latitude;
        longitude_range longitude;
        height_range    height;
    } geographic_location;
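What the C compiler will not check for us, we can at least check at run time. The following is a minimal sketch, assuming the ranges stated in the comments above; the function name geographic_location_is_valid is invented for the illustration, and the struct is repeated (with plain double fields) only so that the fragment stands on its own.

```c
#include <stdbool.h>

typedef struct geographic_location {
    double latitude;   /* degrees, -90 (S) to +90 (N) */
    double longitude;  /* degrees, -180 (W) to +180 (E) */
    double height;     /* metres, -100 000 to +200 000 */
} geographic_location;

/* True if every field lies in its documented range.
   A NaN fails every comparison, so it is rejected too. */
bool geographic_location_is_valid(geographic_location p)
{
    return p.latitude  >=     -90.0 && p.latitude  <=     90.0
        && p.longitude >=    -180.0 && p.longitude <=    180.0
        && p.height    >= -100000.0 && p.height    <= 200000.0;
}
```

This only moves the ranges from a comment into code that somebody has to remember to call. Unlike the Ada version, nothing stops a caller from building an out-of-range value and never checking it, so the comments are still needed.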
If there is important information about something in your program which cannot be expressed in your programming language, it must be expressed in the name or in a comment. |
If the information should be expressed every time the thing is mentioned and can be expressed in the name, put the information in the name. If the information is too bulky to go in the name, it must go in a comment. |
Interfaces must be commented adequately, and it is the client of the interface who determines what counts as adequate, not the author of the implementation. |
There are annotation editors that let you keep a file of annotations in parallel with a source file. This is especially useful if you want to make notes about a source file that you are not able for some practical or legal reason to edit.
The Eclipse IDE supports such annotations.
Review Board allows annotations as part of its support for code review. You should visit the Review Board web site and take a look.
There is a package 'annot.el' for Emacs which offers incredibly simple annotation. There are several ports of Emacs for Mac OS X, but AquaMacs would be top of the list. Annotations are stored in files in your ~/.annot directory. As long as you don't change the file, or only change it through Emacs with annot.el loaded, no problems. Change it with another editor, the MD5 hash changes, and annot.el doesn't recognise the file. Still some work to do there. But it is perfect for annotating things you don't change.
Annotations are also great for tools. Compiler error messages can be converted to annotations. Intel have a tool that can generate parallelism annotations. Seeing “comments” from a static checker in situ without changing the file is handy.
One of the slides says “Highly commented parts of code have the highest error rate (Lind and Vairavan 1989)”. What did Lind and Vairavan actually find?
They studied one program. It was a big one, with thousands of procedures, some in (old) Fortran and some in Pascal.
There is a dogma that short functions are best, that anything over one page is bad. They actually found that short functions (1-50 lines) had nearly twice the error rate of medium ones (51-100 lines).
They found that comments are correlated with errors for two reasons.
They also found that the slide has causality backwards. The slide seems to say “if you put lots of comments in your code, it will then turn out to have lots of errors”. In fact they suggested the opposite: code that had lots of errors ended up with lots of comments.
That's why the next line on the slide says “Don't comment tricky code, re-write it!”. Of course, that assumes that it is possible to eliminate the trickiness by rewriting. If only that were true.
First we get world peace. Then we convert everyone to the same religion. Then we eliminate poverty. And then we get people to agree on a layout style.
Some things are not a matter of taste.
The purpose of indentation is to reveal structure. |
To reveal structure, indentation must be consistent. |
Using the TAB key to indent is stupid because nobody agrees about where tab stops are. The UNIX standard is every 8 columns. Xcode has Cmd-] to indent and Cmd-[ to outdent. Vi has >> to indent by shiftwidth and << to outdent by shiftwidth. Emacs has more indentation support than you can shake a directory tree at, starting with Ctrl-X TAB. |
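To see why disagreement about tab stops matters, here is a small sketch that works out where the first visible character of a line lands for a given tab width. The function name indent_column is invented for the illustration.

```c
/* Column (counting from 0) of the first non-blank character of `line`
   when tab stops fall every `tab_width` columns. */
int indent_column(const char *line, int tab_width)
{
    int column = 0;
    for (; *line == ' ' || *line == '\t'; line++) {
        if (*line == '\t')
            column += tab_width - column % tab_width;  /* jump to next stop */
        else
            column += 1;                               /* a space is one column */
    }
    return column;
}
```

With tab stops every 8 columns, a line indented by one TAB and a line indented by eight spaces both start in column 8, so they appear to be at the same level. Set the tab width to 4 and the TAB line moves to column 4 while the spaces line stays at column 8. The file is the same, but the structure it seems to show has changed.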
See also Rob Pike's Notes on Programming in C.