Free software by Richard A. O'Keefe

Lenses for Erlang

lens.erl is a crude lens library for Erlang. It represents a lens as a triple {Get,Put,Upd}, where Get extracts the field, Put stores a new value into it, and Upd applies a function to it.

Think of a lens as a way of accessing a B field from an A value.

lens1.erl is an earlier draft that used a pair {Get,Put} — this works but is less efficient for updates. This whole thing really needs cross-module inlining to work well, but given that, it would solve the "how do I update nested fields" problem.
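The shape of the triple, and why composed lenses solve the nested-field problem, can be sketched in a few lines. The following Python rendering is only an illustration of the idea; all names here are mine, not lens.erl's API.

```python
# A lens as a (get, put, upd) triple, mirroring the {Get,Put,Upd} tuples
# of lens.erl.  This sketch focuses on tuple fields for concreteness.

def pair_first():
    """Lens focusing on the first element of a 2-tuple."""
    get = lambda a: a[0]
    put = lambda a, b: (b, a[1])
    upd = lambda f, a: (f(a[0]), a[1])
    return (get, put, upd)

def compose(outer, inner):
    """Compose two lenses: focus on inner's field within outer's field."""
    og, op, ou = outer
    ig, ip, iu = inner
    get = lambda a: ig(og(a))
    put = lambda a, b: op(a, ip(og(a), b))
    # Carrying Upd separately lets an update traverse the structure once;
    # with only {Get,Put} (as in lens1.erl) an update needs a Get pass
    # followed by a Put pass, which is why that draft is slower.
    upd = lambda f, a: ou(lambda x: iu(f, x), a)
    return (get, put, upd)
```

For example, composing the first-element lens with itself gives a lens onto the first element of the first element: `get(((1, 2), 3))` yields 1, and `upd` of an increment yields `((2, 2), 3)` without any hand-written nesting code.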

A faster random number generator for Java

Random48.java uses the same 48-bit linear congruential generator as java.util.Random, but it is designed not to be shared between threads (sharing doesn't really make sense for a random number generator anyway), which makes it 5 times faster. It also provides a useful .toString() and a constructor that takes a String argument; the text representation looks like a Java identifier, so you can easily save and restore Random48 states using plain text files. RandomTime.java is the program I used to measure the generator's speed.

Linear congruential generators have known limitations, so if you seriously want the best available generator, this isn't it. This is for you if you were happy with the results from java.util.Random but wished it were not so sluggish.
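The recurrence itself is simple enough to sketch. The Python below shows the 48-bit LCG that java.util.Random specifies (state' = (state * 0x5DEECE66D + 0xB) mod 2^48), plus a plain-text save/restore in the spirit of the toString()/String-constructor feature described above. The textual format and all names here are my own, not Random48's.

```python
# The 48-bit LCG used by java.util.Random (and hence by Random48).
MULT = 0x5DEECE66D
INC = 0xB
MASK = (1 << 48) - 1

class LCG48:
    def __init__(self, seed=0):
        # java.util.Random scrambles the initial seed like this.
        self.state = (seed ^ MULT) & MASK

    def next_bits(self, bits):
        """Advance the state and return its top `bits` bits."""
        self.state = (self.state * MULT + INC) & MASK
        return self.state >> (48 - bits)

    def save(self):
        # A plain-text snapshot of the 48-bit state (format is mine,
        # not Random48's), restorable later from a text file.
        return "S%012x" % self.state

    @classmethod
    def restore(cls, text):
        g = cls()
        g.state = int(text[1:], 16)
        return g
```

Saving the state and restoring it later reproduces exactly the same sequence, which is the point of the text representation: generator states can round-trip through ordinary text files.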

Lists with O(1) concatenation and reversal

The Reverse-Append-Unit-eMpty structure is list-like, with constant-time append and reverse and linear-time batch operations. Head and tail are not constant time; they are not even logarithmic in the worst case. This is not a very sophisticated data structure, but that's the point: to show that quite an unsophisticated data structure can do surprisingly well. This version is in SML and works with SML/NJ 110.70 and MLton (June 15, 2009 release).
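The trick is to represent a list as a tree of Append, Reverse, Unit, and eMpty nodes, so that append and reverse each allocate one node in O(1), and a batch pass flattens the whole thing in linear time. The original is SML; this Python sketch (with constructor names of my choosing) just shows the idea.

```python
# Reverse-Append-Unit-eMpty, sketched with tagged tuples.
EMPTY = ("E",)

def unit(x):      return ("U", x)
def append(a, b): return ("A", a, b)   # O(1): just build a node
def reverse(a):   return ("R", a)      # O(1): just build a node

def to_list(t):
    """Linear-time batch conversion, with an explicit work stack.
    Each stack item is (node, reversed?); a Reverse node merely flips
    the flag, and a reversed Append emits its children right-to-left."""
    out = []
    stack = [(t, False)]
    while stack:
        node, rev = stack.pop()
        tag = node[0]
        if tag == "E":
            pass
        elif tag == "U":
            out.append(node[1])
        elif tag == "R":
            stack.append((node[1], not rev))
        else:  # "A"
            l, r = node[1], node[2]
            if rev:   # emit r (reversed) then l (reversed)
                stack.append((l, True)); stack.append((r, True))
            else:     # emit l then r
                stack.append((r, False)); stack.append((l, False))
    return out
```

Note that head and tail are absent from the sketch, matching the caveat above: finding the first element of a deeply nested tree of Reverse and Append nodes can take time proportional to the depth.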

ANSI Smalltalk Compiler and library

For some years I have been building a static compiler for ANSI Smalltalk. This compiles a variant of ANSI Smalltalk -- all the ANSI classes and methods in ansi.st, plus a lot more in other files -- to ANSI C89, which can then be compiled by gcc or Sun's C compiler. It is routinely tested on SPARC/Solaris and Intel/OSX; it is also periodically built and tested on Intel/OpenSolaris, Intel/Linux, Intel/OpenBSD (versions from 4.7 to 5.6), and Intel/Windows 7 + SUA and Cygwin. I have acquired tiny ARM and MIPS boards with the intention of testing on those, but have so far been too busy.

Some things are still missing or incomplete. The main thing that's woefully incomplete is the documentation, although what does exist comes to about 274 pages.

Source code in Smalltalk comes to:

  Raw lines   SLOC   Area
  183k        106k   library
   24k         18k   test files
   26k         17k   examples
   30k         20k   RosettaCode solutions

Perhaps the main departure from common Smalltalk practice is that I will not use (though I do provide) #shouldNotImplement. Of course, having a batch compiler instead of an IDE, and allowing embedded C instead of calling primitives, aren't traditional either.

While it's not finished, it is already useful, and all of the code is completely free to anyone who wants it, for any purpose, as long as they don't claim credit for it. In particular, the Smalltalk code in the libraries may be used freely by the maintainers of any Smalltalk system, including commercial ones.

The code is in astc-1711.tar.

One thing I was particularly interested in was just how well a fairly naive compile-via-C strategy would go, especially considering that dynamic dispatch is done by binary search -- an idea swiped from SmallEiffel/SmartEiffel. The answer is that it goes very well: about as well as VisualWorks non-commercial, sometimes better, sometimes worse. One test, processing a 180 MB XML file, took 12 times as long as the corresponding C code, but considering all the work it takes to stuff a character into a string, that will be hard to improve much without flow-sensitive type inference, which is not currently attempted. The project was, after all, started to provide a naive reference point. Perhaps the weakest spot performance-wise is the boxing and unboxing of floating-point numbers; I have ideas about that, but have not yet begun to work on them.
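Dispatch-by-binary-search is worth a small sketch: per selector, the compiler can emit a table of (class-id, method) pairs sorted by class id, and the generated code binary-searches on the receiver's class id. The Python below only illustrates the technique; the names and the failure behaviour are my assumptions, not details of this compiler.

```python
import bisect

def make_dispatcher(impls):
    """Build a dispatcher for one selector.
    impls: dict mapping class-id -> function implementing the selector."""
    ids = sorted(impls)               # sorted class ids for binary search
    funcs = [impls[i] for i in ids]   # methods in the same order
    def dispatch(class_id, *args):
        k = bisect.bisect_left(ids, class_id)
        if k == len(ids) or ids[k] != class_id:
            # In Smalltalk this would be a doesNotUnderstand: message.
            raise LookupError("doesNotUnderstand:")
        return funcs[k](*args)
    return dispatch
```

The appeal of the scheme, as in SmallEiffel/SmartEiffel, is that it needs no per-object method tables and no caches: just a sorted, statically known table per call site and O(log n) comparisons over the handful of classes that actually implement the selector.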

The major practical problem is that the C file it produces runs to hundreds of thousands of lines.

This Smalltalk system is also serving as an education in Unicode. Currently, the compiler only accepts Latin 1, but the run time system handles UTF-8 and a tolerably wide range of 8-bit encodings. By wrapping byte streams, you can also use 16- and 32-bit Unicode, and even SCSU. The next major Unicode task will be handling character classification.

Comparing files against model answers

pcfpcmp was written for programming contest judges, so that problems whose output contains floating-point numbers can be set. Details are in the file README.pcfpcmp; the source code and an example are in an uncompressed `tar' file, pcfpcmp.tar (33kB).
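The general idea of such a comparison can be sketched: compare the two outputs token by token, and treat tokens that parse as numbers as equal when they agree within a tolerance. The Python below is purely illustrative; pcfpcmp's actual rules and tolerances are in README.pcfpcmp, and the parameters here are my assumptions.

```python
def tokens_match(expected, actual, rel_tol=1e-6, abs_tol=1e-9):
    """One token: numeric tokens compare with a tolerance,
    everything else compares exactly."""
    try:
        e, a = float(expected), float(actual)
    except ValueError:
        return expected == actual
    return abs(e - a) <= max(abs_tol, rel_tol * max(abs(e), abs(a)))

def outputs_match(expected_text, actual_text, **kw):
    """Whole outputs: same number of whitespace-separated tokens,
    each pair matching."""
    e, a = expected_text.split(), actual_text.split()
    return len(e) == len(a) and all(
        tokens_match(x, y, **kw) for x, y in zip(e, a))
```

This is why such a tool matters for judging: a contestant who prints 3.14159270 where the model answer says 3.14159265 is almost certainly right, but a byte-for-byte diff would reject the submission.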

Token styling and extracting for several programming languages

m2h is a program that can tokenize, after a fashion, most of the programming languages I use. It can be used for at least five purposes.

The sources are provided as a gzipped tar file, m2h.tgz. Documentation is in the README file and in core.c.

Beware: this is not polished code, to put it kindly. A couple of features are not implemented yet. It's fairly easy to add new languages, except for a minor issue (which has held up support for Lisp and Haskell) and a major one (which hasn't been a problem for me yet, but will be). The minor issue is that the tokenising framework doesn't handle nested comments. The major issue is that there is no support at all for wide characters or for encodings other than ISO Latin 1; as part of this, C/C++/Java Universal Character Names (\uxxxx and \Uxxxxxxxx) are not processed correctly. It's free software and worth what you paid for it, but I have found it very useful and you may too.

Beware: this is not a pretty-printer, just a token styler. It does not add or remove line breaks or indentation for any language. Although I've had some ideas about adjusting inter-token spacing, they have NOT been fully implemented. All this does is filter tokens and maybe add some markup.

UnRolled Strict Lists

Unrolled strict lists are spine-strict lists that have been unrolled, in this case by a factor of four, so that each "box" contains 4 elements rather than 1. This saves space, and should also save time. I've provided versions for Haskell (Ursl.hs) and Clean (Ursl.dcl and Ursl.icl). The Clean versions were written for Clean 1.3 and have not been tried with Clean 2.x. If anyone is interested, I'll be happy to whip up an SML version. An Erlang version also exists, but has not yet been tested.
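The representation is easy to show: instead of one element plus one tail pointer per cell, each box holds one to four elements plus a tail pointer, so a list of n elements needs about n/4 boxes. The real versions are Haskell and Clean; this Python sketch (tagged tuples, my own names) only demonstrates the shape.

```python
# Boxes are tuples (k, e1, ..., ek, tail) with 1 <= k <= 4.
NIL = None

def cons(x, t):
    """Prepend x, filling the front box up to 4 elements before
    starting a new one."""
    if t is NIL or t[0] == 4:
        return (1, x, t)
    if t[0] == 1:
        return (2, x, t[1], t[2])
    if t[0] == 2:
        return (3, x, t[1], t[2], t[3])
    return (4, x, t[1], t[2], t[3], t[4])

def to_list(t):
    out = []
    while t is not NIL:
        k = t[0]
        out.extend(t[1:1 + k])
        t = t[1 + k]
    return out

def box_count(t):
    """Number of boxes -- roughly a quarter of the element count."""
    n = 0
    while t is not NIL:
        n += 1
        t = t[1 + t[0]]
    return n
```

Consing 10 elements this way produces just 3 boxes instead of 10 cons cells, which is where the space saving comes from; fewer pointers to chase is also where the expected time saving comes from.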

Four XML tools:

The files are