Linear congruential generators have known limitations, so if you seriously want the best available generator, this isn't it. This is for you if you were happy with the results from java.util.Random but wished it were not so sluggish.
The Reverse-Append-Unit-eMpty data structure is a list-like data structure that has constant time append and reverse, and linear time batch operations. Head and tail are not constant time; they are not even logarithmic time in the worst case. This is not a very sophisticated data structure, but that's the point: to show that quite an unsophisticated data structure can do surprisingly well. This version is in SML and works with SML/NJ 110.70 and MLton June 15, 2009.
For some years I have been building a static compiler for ANSI Smalltalk. This compiles a variant of ANSI Smalltalk -- all the ANSI classes and methods in ansi.st, plus a lot more in other files -- to ANSI C89, which can then be compiled by gcc or Sun's C compiler. It is routinely tested on SPARC/Solaris and Intel/OSX; it is also periodically built and tested on Intel/OpenSolaris, Intel/Linux, and Intel/Windows 7 + SUA & Cygwin. Some things are still missing or incomplete:
The main thing that's woefully incomplete is the documentation.
Perhaps the main departure from Smalltalk common practice is that I will not use (though I do provide) #shouldNotImplement. Of course, having a batch compiler instead of an IDE, and allowing embedded C instead of calling primitives, aren't traditional either.
While it's not finished it is already useful and all of the code is completely free to anyone who wants it for any purpose as long as they don't claim credit for it. In particular, Smalltalk code in the libraries may be used freely by the maintainers of any Smalltalk system, including commercial ones.
The code is in astc-1411.tar.
One thing I was particularly interested in was just how well a fairly naive compilation-via-C strategy would go, especially considering that dynamic dispatch is done by binary search -- an idea swiped from SmallEiffel/SmartEiffel. The answer is that it goes very well, about as well as VisualWorks non-commercial, sometimes better, sometimes worse. The latest test, processing a 180 MB XML file, took 12 times as long as the corresponding C code, but considering all the work it takes to stuff a character into a string, that's going to be hard to improve much without flow sensitive type inference, which currently isn't tried. This was after all started to provide a naive reference point.
The major practical problem is that the C file it produces is hundreds of thousands of lines of C.
This Smalltalk system is also serving as an education in Unicode. Currently, the compiler only accepts Latin 1, but the run time system handles UTF8 and a tolerably wide range of 8-bit encodings. By wrapping byte streams, you can also use 16- and 32-bit Unicode, and even SCSU. The next major Unicode task will be handling character classification.
pcfpcmp was written for Programming Contest judges
to use, so that problems with floating point numbers in their output
could be used. Details are in the file
and source code and an example are in an uncompressed `tar' file
m2h is a program that can tokenize most of the programming
languages I use, after a fashion. It can be used for at least five
m2h -fs sourcefile...
m2h -ex sourcefile...
m2h -iv sourcefile | spell -b
m2h -fl foo.c >foo-c.tex
m2h -fr foo.adb > foo-adb.r
m2h -fh foo.erl > foo-erl.htm
m2h -fcemits ANSI terminal control sequences for the colours;
m2h -ffemits HTML with <font> tags.
The sources are provided as a gzipped tar file, m2h.tgz. Documentation is in the README file, plus in core.c.
Beware: this is not polished code, to put it kindly. There are a couple of features that are not implemented yet. It's fairly easy to add new languages except for a minor issue (which has held up support for Lisp and Haskell) and a major one (which hasn't been a problem for me yet, but will be). The minor issue is that the tokenising framework doesn't handle nesting comments. The major issue is that there is no support at all for wide characters or for encodings other than ISO Latin 1, and as part of this, C/C++/Java Universal Character Names (\uxxxx \Uxxxxxxxx) are not processed correctly. It's free software and worth what you paid for it, but I have found it very useful and you may too.
Beware: this is not a pretty-printer, just a token styler. It does not add or remove line breaks or indentation for any language. Although I've had some ideas about adjusting inter-token spacing, they have NOT been fully implemented. All this does is filter tokens and maybe add some markup.
Unrolled strict lists are spine-strict lists which have been unrolled, in this case by a factor of four, so that each "box" contains 4 elements, not 1. This saves space, and should also save time. I've provided versions for Haskell (Ursl.hs) and Clean (Ursl.dcl and Ursl.icl). The Clean versions were for Clean 1.3 and have not been tried in Clean 2.x. If anyone is interested I'll be happy to whip up an SML version. An Erlang version also exists, but has not yet been tested.
The files are