Delenda Est Preprocessor!

Since the publication of the Erlang book, Erlang has acquired a preprocessor. My task is to consider what changes must be made to Erlang if it is to be suitable for developing large (10 to 100 million SLOC) programs. This includes additions, like -claim and abstract patterns, and it also includes deletions. At the very top of the list of things that need deletion and should on no account be part of the ``Standard Erlang Specification'' stands the preprocessor, head and shoulders above all the other changes. In order to explain why, I must first review what the preprocessor does, what that is useful for, and what is wrong with the way it does it. I must then demonstrate that there are better alternatives.

It is noteworthy that several ``declarative programming'' communities seem to suffer from what I can only describe as ``C envy''. For example, the BSI Prolog committee ignored the fact that DEC-10 Prolog had a perfectly good syntax for non-decimal integer literals, which had been successfully copied and extended in other Prolog systems, and were hell-bent on conforming Prolog integer syntax to the notably inferior syntax of C. One Prolog company radically changed their operator precedence table between two releases, throwing compatibility with other Prologs (and previous versions of their own product) to the winds in order to match C syntax, which they actually failed to accomplish.

There are two glaring examples of C envy in the Erlang specifications. The first is the formalism used to describe the syntax. They could have used any of several variants of EBNF that have found favour in international standards. There is even a standard for EBNF that they could have followed. They didn't. That's understandable: there is a great deal of advantage to be obtained by using YACC, or ML-YACC, or YECC syntax, not the least of them being a machine check that every symbol is appropriately declared and that the grammar is LALR(1).
But they didn't do that either. Instead, they used the formalism invented for C, which is clumsier than EBNF. Let's take just one example to prove that point, from section 8.2.

EBNF:

    module_declaration
        = file_attribute*
          module_attribute
          (header_attribute | anywhere_attribute)*
          function_declaration
          (function_declaration | anywhere_attribute)*

YACC:

    module_declaration =
        file_attributes module_and_attributes declarations;
    file_attributes =
        /*empty*/
      | file_attributes file_attribute;
    module_and_attributes =
        module_attribute
      | module_and_attributes header_attribute
      | module_and_attributes anywhere_attribute;
    declarations =
        function_declaration
      | declarations function_declaration
      | declarations anywhere_attribute;

What's actually offered:

    ModuleDeclaration:
        FileAttributesopt ModuleAttribute HeaderFormsopt ProgramForms
    FileAttributes:
        FileAttribute
        FileAttributes FileAttribute
    HeaderForms:
        HeaderForm
        HeaderForms HeaderForm
    HeaderForm:
        HeaderAttribute
        AnywhereAttribute
    ProgramForms:
        FunctionDeclaration
        ProgramForms FunctionDeclaration
        ProgramForms AnywhereAttribute

Why does this matter? Using EBNF, the thing can be said in 6 lines, which can be understood at a glance. No superfluous names are introduced. Using YACC, the thing can be said in 13 lines. It's no longer easy to check manually, but the benefit is that it can easily be checked mechanically (as, of course, can the EBNF, using a tool like PCCTS); since Erlang comes with a YACC clone (more C envy: why not a PCCTS clone?) called YECC, the Erlang specification grammar could then have been checked by Erlang itself. The notation used in the Erlang specification takes 15 lines and requires the use of subscripts. Not only that, it cannot express NT* easily; most of the Xopt occurrences are really X* not X?. It's easily the bulkiest of the three, but where is the automatic checker and where the parser generator? The only benefit the chosen notation has is its familiarity to C programmers who don't use YACC.
I personally find it confusing that Erlang uses the same symbols as Prolog for equality tests, =:= and ==, but switches their interpretations. Can it be an accident that this switch makes Erlang's "arithmetic equality" symbol the same as C's? I don't want to make too much of this point, because where it would have made sense to use the same symbols as either Prolog (/\, \/, ><, \) or C (&, |, ^, ~) Erlang has gone its own way, so == may be a happy accident.

The instance of C envy that really matters is the preprocessor. There have been many preprocessors in the history of computing, from character oriented ones like TRAC, GPM, and M4, to coroutine-in-the-compiler ones like PL/I, Burroughs Algol, and SAIL. There are Lisp macros, Scheme hygienic macros, and Prolog's term_expansion/2. There's the Pop-2 "a macro is a user-defined procedure invoked by the compiler to transform the input token stream" approach and the WPop "forms" approach built on top of it (very nice, that was, using nonterminals as macro argument types). There's the new macro facility for Dylan. There is a vast range of ways to do preprocessing, including slurping an entire file into a SNOBOL program and repeatedly hitting it with grammar match-and-replace operations.

The Erlang preprocessor is unabashedly a pastiche of the C one. It inherits many of the C preprocessor limitations:

- macros can only be `variable-like' or `function-like'
- you cannot declare an infix or distfix operator as a macro, so language extensions like 'fun end' are beyond the power of the preprocessor
- there is no macro iteration
- there is no macro conditional
- macros cannot query the properties of identifiers
- there are no varargs macros (there will be in C9x)
- macros cannot do arithmetic (e.g. there is absolutely no way to do

      -define(bar, 42).
      -define(ugh, 137).
      -define(foo(), (?bar+?ugh)).

  so that ?foo() can be used in a pattern)
- a macro cannot have discontiguous consequences (e.g.
  a macro cannot act as a declaration that will have consequences in later clauses)
- there is no way to temporarily redefine a macro and then restore the original definition (like pushdef and popdef in M4)

and adds a few of its own:

- a macro cannot expand to more than one function
- conditional compilation cannot do arithmetic tests
- conditional compilation cannot govern part of a declaration, only entire declarations
- there is no -elif
- shockingly, there is no macro expansion in an -include directive. One of the most useful features of the C preprocessor is that you can do

      /* filenames.h */
      #define foo "dir/foo.h"
      #define bar "sub/bar.h"
      /* end filenames.h */

      /* somefile.c */
      #include "filenames.h"
      #include foo
      #include bar
      ...

  and thereby keep operating-system and installation-specific header names out of every source file save one. For a programming language to be used under Windows NT, UNIX, and QNX, amongst others, this is a serious issue.
- in a macro argument, all sorts of brackets must be balanced (no bad idea)

A summary of the preprocessor is

    -define(Macro, Replacement).
    -undef(Macro).
    -ifdef(Name).  or  -ifndef(Name).
    -else.
    -endif.
    -include(FileName).
    -include_lib(FileName).

It's not a lot, is it? What is it good for?

Conditional compilation

There are two levels of conditional compilation. C preprocessor syntax is almost completely separate from C language syntax, so the C preprocessor can both select alternatives within a function and select alternative blocks of complete declarations. The Erlang preprocessor can only select blocks of complete declarations, so to get the effect of

    int main(int argc, char **argv)
    {
    #ifdef MacOS
        ccommand(&argc, &argv);
    #endif
        ...
    }

Erlang would have to do

    -ifdef(MacOS).
    -define(if_macos(X,Y), X).
    -else.
    -define(if_macos(X,Y), Y).
    -endif.

    foo(Argc, Argv) ->
        {X,Y} = ?if_macos(ccommand(Argc, Argv), {Argc, Argv}),
        ...
But why should we use such a clumsy dodge when we have a whole range of perfectly good conditional constructions in the language? Because we can only use those conditional constructions when the alternatives to select from are expressions to be evaluated. This example could be done as

    -ifdef(MacOS).
    #is_macos()     -> true.
    #is_not_macos() -> false.
    -else.
    #is_macos()     -> false.
    #is_not_macos() -> true.
    -endif.

    foo(Argc, Argv) ->
        {X,Y} = if #is_macos()     -> ccommand(Argc, Argv)
                 ; #is_not_macos() -> {Argc,Argv}
                end,
        ...

We cannot use these conditional constructions in patterns:

    foo([])                    -> [];
    foo([?if_macos($:,$/)|Cs]) -> Cs;
    foo([_|Cs])                -> foo(Cs).

would work, but there appears to be nothing we could replace it with. Indeed, even the abstract patterns I have proposed elsewhere could not do this job, because the arguments of a pattern represent outputs. But we could do it in the guard. Assume yet another proposed extension:

    SimpleGuard --> '(' Guard {';' Guard}... ')'

and we can write

    foo([])     -> [];
    foo([C|Cs]) when (#is_macos(), $: = C ; #is_not_macos(), $/ = C) -> Cs;
    foo([_|Cs]) -> foo(Cs).

The most interesting point here is that the preprocessor has to be an extra-linguistic hunk of thaumatronics because it doesn't work with terms, only with token lists, while the alternative using abstract patterns is built entirely out of normal computational elements. Abstract patterns don't have to be expanded out at compile time, although they may be. Whether the guard in this little example is simplified at compile time or executed in full at run time makes no detectable difference to the observable behaviour. Of course, the normal way to handle this particular example using abstract patterns would be to have

    -ifdef(MacOS).
    ...as before...
    #directory_separator() -> $: .
    -else.
    -ifdef(Windows).
    ...similar...
    #directory_separator() -> $\\ .
    -else.
    ...
    #directory_separator() -> $/ .
    -endif.
    -endif.

    foo([])                          -> [];
    foo([#directory_separator()|Cs]) -> Cs;
    foo([_|Cs])                      -> foo(Cs).
which is of course a possibility open to macros as well, but this way can be done at run time, by an interpreter, staying close to the source form.

We can use abstract patterns to include or exclude clauses from functions by adding guards:

    bar(...) when #feature_wanted(1) -> ...;
    bar(...) when #feature_wanted(2) -> ...;
    bar(...) when #feature_wanted(3) -> ...

Oddly enough, this cannot easily be done with the Erlang preprocessor, because (a) the preprocessor can only include or exclude entire declarations, not clauses, and (b) Erlang macro expansions must be uniform; there is no equivalent of

    #feature_wanted(1) -> true;
    #feature_wanted(2) -> false;
    #feature_wanted(3) -> true.

for macros.

I have checked every occurrence of -ifdef or -ifndef in the Erlang 4.4 Development Environment. There aren't many.

erl_parse
    -ifdef(JAM). -compile(...)
    Pulls in a special transformer for JAM-specific optimisation. Cannot be handled by abstract patterns or inlining.

disk_log
    -ifdef(DEBUG). verbose(...) ->
    Can be done by normal functions and inlining.

filename
    -ifndef(nt40). -define(win32, true). -ifdef(win32).
    Select directory/drive separators and file name parsers. If we want to choose between alternative definitions of a function, provided all alternatives are syntactically legal, we can just do

        f(X1,...,Xn) when win32 = #os() -> f_win32(X1,...,Xn);
        f(X1,...,Xn) when unix  = #os() -> f_unix( X1,...,Xn);
        f(X1,...,Xn) when macos = #os() -> f_macos(X1,...,Xn);
        f(X1,...,Xn) when rtems = #os() -> f_rtems(X1,...,Xn).

    We don't even need abstract patterns; a `case' will do:

        f(X1,...,Xn) ->
            case os()
            of win32 -> f_win32(X1,...,Xn)
             ; unix  -> f_unix( X1,...,Xn)
             ; macos -> f_macos(X1,...,Xn)
             ; rtems -> f_rtems(X1,...,Xn)
            end.
Yes, this means the compiler has to know what the target OS is (so bye-bye cross-platform .jam files), but so does the use of -define and -ifdef, and there the compiler doesn't even know it has happened, so it cannot tag the .jam file to indicate what system it was specialised for (as it should). Alternatively, and this is something that macros don't buy us, the decision can be left to run time.

os
    -ifdef(nt40). -ifdef(win32).
    Selects the OS type and version. The UNIX version of this code is wrong; the information I get is wrong on this machine. It should be built on top of uname(2) or sysinfo(2), which should be emulated in the C runtime on systems that do not support the underlying system enquiry directly.
    Selects the string to transmogrify a command for passing to system(). Do I need to point out that it gets UNIX commands wrong? If you doubt me, try

        X = os:cmd("echo 'foo * bar'").

    The result is X = "foo\n" when it should be X = "foo * bar\n". unix:cmd/1 is similarly broken.

udp
    -ifndef(no_ets)
    Selects one of two database implementations for pids and fds. It selects a group of functions. This is one which I feel is better handled by configuration parameters on modules passed on to submodules. In this case, however, the code that is selected appears to be independent of the rest of the module. It could just as well have been an independent module with two different implementations, selected at installation time. That would also have made it possible to replace that submodule. The thing _can_ be done with ordinary functions, `case', and inlining. I would have structured the code rather differently.

gtk_port_handler
    -ifdef(nt40).
    Selects one of two function implementations. Can be done with ordinary functions, `case', and inlining.

igsemantic
    -ifdef(debug). -define(io(...), ...).
    There is one thing that this approach to conditional debugging output can give you: ?io(X,Y) doesn't evaluate X,Y when output is not wanted.
On the other hand, this is precisely what is confusing about it: it is notoriously the case that novices put side-effective code in C assert() calls and then wonder why their programs break when NDEBUG is enabled. As it happens, none of the occurrences of ?io(...) in this file has any function calls among its arguments, so this could have been

    io(X, Y) ->
        case debug()
        of true  -> io:format(X, Y)
         ; false -> ok
        end.

without any special machinery at all. Note that different files use different debug flags without any apparent pattern. Note also that debug() needs to be module-specific.

ig.hrl
    -ifdef(silent).
    Is output to be suppressed or not?
    -ifdef(debug).
    Is debugging output to be produced or not?

The one thing that really appears to need -ifdef is the example of plugging in a JAM-specific optimiser. I am in general proposing machinery that strictly speaking takes effect at run time but _can_ be processed at compile time by a compiler of very moderate intelligence. This is the only one that really needs compile-time machinery. But it is a compiler option, and could just as well go in a Makefile-like script. Better still, it shouldn't be needed!

The thing we cannot do with abstract patterns is the thing we can do with Erlang's -ifdef, and that is select entire declarations and/or directives. Except for its dependence on -define, Erlang's conditional inclusion features seem pretty harmless. However, given their limited power, the things we can do without them, and the better approach that is possible (giving modules pattern parameters to express variant and version configuration information, using that to select other modules and submodules), they also seem pretty pointless.

Without in any way explaining, this is how I would really like to do the directory separator example.

    -module(path_name(unix, [1,0])).
    -export([#directory_separator/0, ...]).

    #directory_separator() -> $/ .
    -----------------------------------------------------------
    -module(path_name(dos, [1,0])).
    -export([#directory_separator/0, ...]).

    #directory_separator() -> $\\ .
    -----------------------------------------------------------
    -module(path_name(macos, [1,0])).
    -export([#directory_separator/0, ...]).

    #directory_separator() -> $: .
    -----------------------------------------------------------
    -module(foo(OS)).
    -use_early(path_name(OS,[1,0])).
    -import(path_name, [#directory_separator/0, ...]).

    foo([])                          -> [];
    foo([#directory_separator()|Cs]) -> Cs;
    foo([_|Cs])                      -> foo(Cs).

The configuration process basically ``proves that a consistent configuration exists'' by finding one and then pulls the pieces in; this approach, which I mean to describe more fully later, basically lets the compiler into the secret of what the different variants and versions are about and how they connect. (Hey, when you asked me to look at Erlang, you knew Prolog was going to make a comeback somewhere!)

Source file inclusion

Source file inclusion is a popular concept. The Fortran 90 designers thought they had a better approach, and they did: modules and interfaces. But due to popular demand, Fortran 90 included INCLUDE as well, presumably to deal with legacy code that had used the feature. I have used the feature often and have been reasonably fond of it. However, -include has a number of rather horrible aspects, which are particularly bad news for Erlang.

The first horrible thing is that Erlang is supposed to be pretty much machine-independent. Is the machine big-endian or little-endian? Erlang can't tell and has no reason to care. Is the machine's native integer arithmetic 16 bits, 32 bits, or 64 bits? It doesn't matter: you're guaranteed at least 60 bits. Floating point arithmetic properties can show through, alas, but Erlang isn't intended to support heavy floating point calculations. Are pointers 16 bits, 32, or 48? Erlang doesn't know and doesn't care. Can all pointers be cast to integer and back, and if so, what sort of integer? Erlang doesn't know and doesn't care.
Is the windowing system Mac Toolbox, Display Postscript, X (R5? R6?), Windows, OS/2 Presentation Manager, or something else? The implementation of GS cares, but nothing that uses it knows or cares. Does the implementation environment prefer to work with EBCDIC, ASCII, XNS, JIS, Big Five, or what? Erlang uses Unicode, so it doesn't need to care.

The only area where operating-system issues force themselves on an Erlang programmer's attention concerns files and devices, especially file names. Putting file names in Erlang source files ties them to a particular file system, quite easily to a specific installation. Not the object files: the source files are unnecessarily file-system-specific. C has the same problem, but C uses the preprocessor to patch around the problem. If you don't know whether the header you want will be "/usr/local/include/frobboz/snark.h" or "C:\FROBBOZ\SNARK.H" or "Frob 98:Boz Includes/Snark.h" then you can put a suitable macro definition in a header, include that header, and then do

    #include snark_header

which leaves you with precisely one literal include to fix in each file. Perversely, Erlang forbids this.

It was to avoid this problem that Quintus invented `library(_)', which was supposed to be one member of a framework of user-defined logical paths, not altogether unlike TOPS-10 logical paths (surprise surprise). It was to avoid a very similar problem with installation dependency that Macintosh Common Lisp had a form of logical path names well before ANSI Common Lisp put them in, and it was for this precise reason that ANSI Common Lisp has pathname objects and logical pathnames.

Erlang could work around the problem by defining its own logical file name representation for use in -include and -include_lib. A tuple will do:

    portable_source_file_name =
        '{' [ path_name ',' ]
            [ '[' [directory_name {',' directory_name}...] ']' ',' ]
            file_name [',' extension] '}'.
    path_name = directory_name = file_name = extension = atom.
If the path_name is missing, the current directory is used as the base. If the directory list is missing or empty, the directory is the base directory, otherwise it is a list of subdirectories. Whether it is the path_name or the directory list that is missing can be determined by the type of the first element. A missing extension indicates no extension at all, not even a dot. For example, instead of

    -include("thingy.hrl").
    -include("previous/thingy.hrl").
    -include("/home/users/okeefe_r/demo.jnk").
    -include("/home/users/okeefe_r/erlang/demo.jnk").

one might have

    -include({thingy,hrl}).
    -include({[previous],thingy,hrl}).
    -include({home,demo,jnk}).
    -include({home,[erlang],demo,jnk}).

Aside from the choice of punctuation marks, this looks a lot like VMS. It's no surprise that one file name syntax designed to provide a degree of location independence via logical path names resembles another. If -include and -include_lib allowed this syntax for file names (in addition to or in place of the existing strings) and if macros could be expanded in these forms, one of my major objections to the preprocessor would disappear. But only one. I note that using such tuples to open files would provide a welcome degree of file system and installation independence at run time as well.

What else is horrible about -include? There are functional languages with concurrency extensions a-plenty: Concurrent ML, LCS, Scheme with `engines', even some concurrency extensions to Haskell. The distinguishing feature of Erlang is that it is supposed to be useful for building large long-lived high-availability systems that cannot be shut down to modify the software. The ability to move from one consistent configuration (in the `configuration management' sense) to another consistent configuration with a minimum of disruption to vital services is what Erlang is all about. We want to avoid inconsistent configurations.
As always, there are two sides to such avoidance: early detection and prevention, so that we don't even TRY to move to an inconsistent configuration, and late detection and backout, so that we can notice a bad move just before it is too late and back out of a change before doing any serious damage.

There is a major flaw with the System Application Service Library (check name), which is that it begins to address this issue, but puts the vital information in entirely the wrong place. Since modules are the units of loading and replacement in Erlang, the dependency information about a module belongs IN the module, not in some other file entirely that might not even be checked if a module is loaded some other way. Heartbreakingly, Erlang allows version control information to be attached inside modules, but for lack of any enforcement of the contents, the version information in the Erlang Development Environment 4.4 that I examined was a dog's breakfast of different ways of saying the same things. Version information needs not merely to be machine READABLE but machine TRACTABLE and machine CHECKED for plausibility. For example, if we are going to have date information in a file, it should be in a form such as

    -date_created(1997,11,11,15,20,21).
    -date_revised(1998, 3, 8,10, 5,53).

May I suggest that modifying RCS to check for -xxx at the beginning of a line instead of $Magic anywhere in a line, and to replace the following material by suitable machine-tractable forms, would be an excellent thing for Erlang?

The two central things about modules are encapsulation and controlled access between modules. By controlled I mean that the possibility of access is enforced by the machine and the actuality of access is scrutable by the machine. Erlang is actually somewhat weak here. -export does a weak job of controlling what goes OUT of a module (we need -export_to as described elsewhere), but there is no control over what may come INTO a module.
The presence of an -import directive governing a function does not mean that the module does use that function, nor does the absence of an -import directive mean that the module does not use that function. Not only that, the Erlang style guide recommends not using the -import directive at all, thus removing what little information there is about what a module writer intended to do. It would be a good example of that ``controlled use of redundancy'' that characterises engineered software if there were a

    -use_only(Other_Module, [Functors]).

directive which said that the only functors in Other_Module that this module should be allowed to call were those listed in such a directive. That would not imply any restrictions on other modules, nor would it imply any restriction on calling closures returned by functions in Other_Module. In the absence of such a directive, an Erlang run-time system can determine a lower bound on the set of remote functions called by a module, and thus a lower bound on the set of modules needed by it.

What has this to do with -include? Simply that -include breaks both these central aspects of modules. A file that is -included by two modules creates a strong form of COUPLING between them. Not only that, it is a form of coupling to which the run-time system is necessarily completely blind. The run-time system knows nothing about -included files; even the compiler proper knows nothing about them, the -file directive notwithstanding. (If you run a source code generator on machine A, the -file directives it produces refer to files on machine A. Now copy the result to machine B and compile it. What does the compiler know about these files? Nothing. It doesn't even have any reason to believe that there ever were such files anywhere; the generator might have been lying or mistaken.)
For concreteness, suppose that we have

    ugh.hrl
    foo.erl    includes ugh.hrl
    foo.jam    is the result of compiling foo.erl
    bar.erl    includes ugh.hrl
    bar.jam    is the result of compiling bar.erl

Now an Erlang system loads the two .jam files. Next, we need to change the .hrl file for some reason. Can we reload it? No, there is no .jam file that corresponds to that file, nor can there be. The effects of the .hrl file on foo and on bar might be disjoint. The only way to ``reload the .hrl file'' is to hunt down every single module that includes it, recompile them, and reload them. More accurately, we do not have to worry about modules that have not been loaded, so we only have to check every loaded module to see if its source file included the file in question. If we have a useful set of constants, or a popular record type, then a change to the header that defines it may very easily force a massive recompilation and reload. Some compiler sophistication to ignore changes without real consequences will be needed. Do we NEED massive recompilations to get the effect of shared constants and records? No, abstract patterns provide a workable alternative.

How does -include violate encapsulation? Roughly, an -include imports information into a module at compile time, but the only way to find out WHAT information is imported is to inspect the .hrl source file. -included files ``don't have interfaces''. This is bad because to the extent that the lines following the -include use macros, they are totally vulnerable to the -include file. For example, consider

    -module(foo).
    -export([ick/1,ack/2]).

    -define(debug,true).
    -ifdef(debug).
    ick(X) -> show(ick,X), ack(X, X).
    -else.
    ick(X) -> ack(X,X).
    -endif.

    -include("ugh.hrl").

    -ifdef(debug).
    show(L, X) -> ....
    -endif.

Look ok? How do you know "ugh.hrl" doesn't -undef(debug)? You don't. File inclusion is about sharing information between modules.
The right way to share information between modules is to have one module `own' it and export it, and for the others to explicitly import it. All such sharing needs to be properly declared in interfaces where people and machines can see it. I'm about to say harsh things about -define, but the baneful effects of -define are at least local. -include warps the module structure and interferes with Erlang's fundamental way of doing things.

Unconditional macro replacement

So what's wrong with -define? There are several different semantic levels at which one could define some kind of macro system.

(1) The level of characters. This is where we find the classical systems like TRAC, GPM, and M4. The Reiser preprocessor for C was at this level. These tools are amazingly powerful, with some pretensions to being general-purpose tools. I've used M4 to good effect.

(2) The level of tokens. This is where we find the Burroughs Algol macros, and to some extent the PL/I preprocessor. The ANSI C preprocessor is at this level. Such a preprocessor is language-specific. ANSI C, for example, has weird and wonderful lexical structure, unlike any other language (except C++, of course), which means that its preprocessor is going to misbehave in very weird ways if you feed it Fortran or Prolog. However, the preprocessor doesn't actually get much benefit from it; notoriously, cpp can't evaluate sizeof (int).

(3) The level of interpreted tokens. If I have understood correctly, this is where we find the SAIL preprocessor and a number of older macro assemblers. A preprocessor of this type can make decisions based on the semantic properties of identifiers. The Pop-2 macro facility is of this sort: a macro is a function that consumes a lazy token list and has full access to syntactic properties and global value (if any).

(4) The level of pure parse trees. Lisp macros have basically been of this form.
It is literally impossible for a Lisp macro to receive ill-formed arguments or return ill-formed results. Dylan macros are, I believe, of this kind, and the language specification spells out what must be in the trees. Prolog's term_expansion/2 is of this type, and so is Erlang's program transformation facility. I find it perverse that the Erlang manuals claim that `no sane programmer' would use a facility at this level of cleanliness when Erlang has something far more horrible that is used freely.

(5) The level of interpreted parse trees. Some of the Scheme macro systems are of this form, where the arguments of a macro contain `identifiers' not `symbols', i.e. they receive symbol table entries rather than strings. Scheme is noteworthy for its `hygienic' macros which solve the problems of inadvertent variable capture. I must admit that I've never quite got my head around the low-level hygienic macro stuff, but I appreciate the problem it addresses. This used not to be of concern to Erlang, but with the addition of closures, all the problems of nested functions (but not all of the benefits) have come to haunt Erlang. This level may be compiler-specific; in Scheme it happens not to be, except in the sense that older Scheme systems do not all support it.

(6) The level of full access to compiler data structures. In some older Lisp compilers, so-called `compiler macros' approximated this: special macros that were called by the compiler and could call back into the compiler and directly affect code generation. This level is definitely compiler-specific.

(7) The level of semantics. Darlington-style program transformations can be seen as macros of this type. Profiling transformations that detect events with particular meanings, however expressed, and add code to record, count, or trap them could be thought of as such macros.
Some uses of Prolog's expand_term/2 facility are of this kind: taking a new language with Prolog-like syntax but not necessarily Prolog-like semantics and transforming it into Prolog code that does the right thing. There are, for example, several transformers around that turn functional syntax into Prolog, including `let' and nested functions. There are several DCG transformers around, offering recursive descent parsing, left corner parsing, chart parsing, and extensions such as feature unification.

Another way to think about macro systems is whether their effects must be local or may be global in effect. For example, Quintus Prolog offers only first-argument indexing. There is a means to provide (independent) indices on any number of arguments, but it is a transformation that takes a declaration and stores it, and then when it sees a clause for the declared predicate, emits clauses for several other predicates. This effect is non-local in that it is not and cannot be confined to the scope of a single clause.

In a similar way, I once had a package of M4 macros that added tracing code to Pascal procedures and functions. If you wrote

    Proc(id)(args); decls; Begin stmts End;

or

    Func(id)(args):type; decls; Begin stmts End;

then Proc and Func stored information away (on a stack, to allow for nested routines) which Begin used and End used and popped. Here again the point is information being saved in one macro call and used in a later one. M4 can do this. Prolog can do this. Lisp can do this. PL/I can do this. But C cannot. This leads to the curiosity of some things in C that should have been macro calls being implemented as

    #define ARG1
    #define ARG2
    #include MACRO

which has occasionally annoyed me enough to go back to M4. Erlang's macro system is even more constrained than C's: at least C's macros can generate more than one function and more than one declaration, and in C9x they can generate any number of pragmas as well.
While Lisp's and Prolog's and Scheme's and Dylan's macros do not have that constraint, they do have the constraint that vaguely sensible parse trees must be transformed into vaguely sensible parse trees. In particular, by virtue of being defined as tree-to-tree transformations, they cannot be made to accept ill-bracketed inputs nor to produce ill-bracketed outputs. This has the very pleasant consequence that syntax-aware editors can never be confused by macros in those languages. On the other hand, consider

    #define paste(x,y) x##y
    #define foo(x,y) paste(x,2) = 0
    foo(for,});
    foo(goto,{);

Editors that provide ``syntax colouring'' will typically give `for' and `goto' here the colour for keywords, although no keyword is actually involved. Editors that try to do bracket matching for you will be extremely confused. It isn't wise to confuse your tools. It isn't wise to make it possible for your tools to be confused, because when things turn weird, you need to rely on something that isn't. A syntax colouring editor for Erlang will not be confusable in quite the same way, because the Erlang preprocessor does not support stringizing or token pasting (##). This is wise, given the dreadfulness of pp_numbers in C, the problems of stringizing when there are two kinds of strings, and the truly horrendous problems of stringizing and token pasting combined with Universal Character Names. However, the absence of token pasting, when combined with the failure of variable tokens to scope over macro definitions, means that there is no way for a macro expansion to safely introduce a new variable into a clause (one form of the `macro hygiene' problem). However, it is perfectly easy to confuse a syntax colouring editor for Erlang by working from the other end.

    -define(lambda, fun ().
    foo(X) -> ?lambda Y ) -> X+Y end.

Will the editor know to colour ?lambda the way it would have coloured fun? If the editor counts all brackets, is it going to think there is one end too many?
If the editor is more like vi or QUED/M, is it going to think you have a mismatched right parenthesis? Never mind whether this is likely, do we really want it to be possible? Because C macros rely on parentheses and commas to delimit their arguments, it is tricky to get unbalanced parentheses into their arguments, unlike brackets and braces which are all too easy to unbalance. It is tricky. It is not, however, impossible, as the well known hack for printf-like macros shows. Suppose we want a db_printf() macro that can be used like printf but does nothing if NDEBUG is defined.

    #define _ ,
    #ifdef NDEBUG
    #define db_printf(args) (void)(0)
    #else
    #define db_printf(args) fprintf(stderr, args)
    #endif
    ...
    db_printf("File '%s' line %d\n" _ __FILE__ _ __LINE__)

In the same way, we can do

    #define _l (
    #define _r )
    #define group(l,x,r) l x r
    ...
    group({,a group([,1,]) = group(_l,x+y,_r)*z;,})

Or can we? Try it! Erlang macro calls are somewhat different. This is a double-edged sword. One cut is that while macro expansions in Erlang may be wildly unbalanced, macro invocations cannot be, so bracket matching editors will get the calls right, but miss the big picture. This is helpful? The other cut is that C programmers (that is, programmers already indoctrinated with the long obsolete notion that C macros are a Neat Idea) know that macro arguments do not have to be bracket or brace balanced, so Erlang macros turn out to be faux amis. C macros and Erlang macros both need to know which commas separate arguments and which commas do not. There was a very easy option open to the Erlang designers. C macro calls look like C function calls, but Erlang macro calls do not look like Erlang function calls, and there is no reason why macro calls and function calls in Erlang should have to have the same argument separator. Suppose the argument separator were a token that was not legal outside macros? Doubled commas, perhaps. Then

    -define(l, ().
    -define(r, )).
    -define(group(l,,x,,r), l x r).
    ...
    ?group({,,a ?group([,,1,,]) = ?group(?l,,x+y,,?r)*z;,,})

would have been usable. As it is, the preprocessor knows both too much Erlang syntax (it knows all matching left/right bracket pairs) and too little (it doesn't know about operator precedence). Why do I say too much? Suppose a new bracketed construct were introduced into Erlang. For argument's sake, consider

    let Body within Body end

be a new form where the variables defined in the first Body are visible inside the second but not outside the form. Then not only does the parser have to be modified, so does the preprocessor, and not only does the preprocessor have to be modified as well as the parser, it has to be modified differently. Coupling, anyone? Not a hypothetical case: `try' and `fun' were new not so long ago, and `cond' is new now. There's strike one (the macros are too restricted to do much that is interesting or useful) and strike two (the macros are too unrestricted for compatibility with useful bug-avoidance tools like bracket matching and syntax colouring editors). Is there a strike three? Debugging. It is notoriously the case that the C preprocessor interferes with debugging. This is not the case in Lisp, or at least, not to the same extent. Setting a breakpoint on a Lisp macro, if it accomplishes anything, results in a break point happening once for a calling site, not once for each call. But at least an expression involving a macro can be called from within the debugger in exactly the same way it can be called in the program; at least the debugger knows the thing is a macro. What is the difference between

    #define C1 137          /*a*/
    enum {C2 = 137};        /*b*/
    int const C3 = 137;     /*c*/

in C? The first two can be used in array declarations, the third cannot, but that will be fixed in C9x. The last two follow normal scope rules, the first does not. The debugger knows about the last two, but not about the first. The preprocessor knows about the first one, but not the last two.
So if you want to use a constant name in a #if test, you must use (a). If you want to use a constant name when debugging, you must use (b) or (c). You can't have it both ways. If you want a constant of integral type other than int, you have to use (c). If you want to use the constant in an array size, in C89 (but not C9x) you have to use (b). If you want a constant of type 'unsigned short int' that you can use in array sizes and #if tests, you are out of luck: there is no such animal. Erlang isn't quite this bad. Given

    -define(c1(), 137).
    c2() -> 137.

then ?c1() can be used anywhere that c2() could (but it is not true that ?c1 can be used anywhere that c2 could; try passing a macro where a closure is expected and be ready for an unpleasant surprise). But what about debugging? After the preprocessor has run, does it leave behind in the compiled form of the module complete information about macros so that they can be `called' from the debugger? How can it? For unlike a function, a macro can have many declarations.

    -define(c1, 137).  ... use c1 ...  -undef(c1).
    -define(c1, 42).   ... use c1 ...  -undef(c1).
    -define(c1, 616).  ... use c1 ...  -undef(c1).

Which of these definitions will be stored with the module? When I am debugging, how is the debugger to know which of them I intend to be used? To be as useful as possible for debugging, whatever we use instead of macros has to have exactly one definition visible in a module where it is used, and that definition must be available to the debugger. Lisp and Scheme macros have this property; C and Erlang macros do not. There is a fourth strike against macros, which C programmers are acutely aware of. Because they are defined in terms of token sequences, not trees or anything more `semantic', they are extremely vulnerable to errors like this:

    -define(c1(), 1+1).
    c2() -> 1+1.

    f(1) -> ?c1()*?c1();
    f(2) -> c2()* c2().

The two clauses for f/1 look similar, but f(1) is 3, while f(2) is 4.
The problems with identifier scoping in C macros (another notorious cause of bugs) can perhaps be assimilated to this: the things C (and Erlang) macros deal with are uninterpreted symbols, not resolved identifiers. In Erlang, this relates to variable names. Apparently,

    -define(ugh(), X = 2).

    f(Y) ->
        ?ugh(),
        X = "bar",
        zic(X, Y).

is legal, and the expansion of ?ugh() will refer to a variable that was not in scope when ugh was declared. Now that's nasty. Changing every visible occurrence of X in f/1 to Z, giving

    f(Y) ->
        ?ugh(),
        Z = "bar",
        zic(Z, Y).

ought not to change the behaviour of f/1 in a sensible language, but if I have understood the preprocessor documentation correctly, it will. This is not the stuff that reliable software is made from, not when there are better alternatives. Further proof that even the best programmers find C-style macros hard to get their heads around in anything resembling a reliable sort of way, if such proof could be imagined to be necessary, can be found in the 0.6 draft of the ``Standard Erlang'' specification. Let's start with definitions. It's simple enough. Erlang directives look vaguely like function calls, but they are absolutely delimited by full stops, and the inventors presumably didn't wish for any particular restriction on the replacement sequence for a macro. So it's a doddle to specify macro definitions:

    macro_definition = '-' 'define' '(' macro_head ',' token* ')' '.'
    macro_head = (symbol | variable) ['(' [variable {',' variable}...] ')']

No muss, no fuss. Parse a macro definition by stripping '-define(' off the beginning of a list of tokens and ').' off the end, a little parsing of the macro head, and the rest is the replacement sequence. What do we find instead of this?

    MacroDefinition:
        - define ( MacroName MacroParamsopt , MacroBodyRpar FullStop
    MacroName:
        AtomLiteral
        Variable
    MacroParams:
        ( Variablesopt )
    Variables:
        Variable
        Variables , Variable

So far so good. Long-winded, but clearly talking about the same thing.
    MacroBodyRpar:
        NonemptyMacroBodyopt )
        MacroBodyRpar )
    NonemptyMacroBody:
        TokenNotRpar
        NonemptyMacroBody TokenNotRpar
        MacroBodyRpar TokenNotRpar
    TokenNotRpar:
        any Token except )

Let's simplify this last block to see what we get.

    MacroBodyRpar: NonemptyMacroBodyopt ')'+
    NonemptyMacroBody: NonemptyMacroBodyopt ')'* TokenNotRpar+

which finally boils down to

    MacroBodyRpar: Token* )

So why do we have to talk about NonemptyMacroBodies and TokenNotRpars at all? It's as if the presence of the parentheses caused the specifier to think that there must be some sort of counting and some kind of exception for parentheses, when in fact nothing of the kind is necessary. Why all this ``left parentheses are allowed anywhere but right parentheses may only occur here and here'', which turns out to be no restriction at all? Nothing like that kind of confusion is there in function definitions. There isn't any similar confusion in the specification of macro invocations, although you have to read the fine print very very carefully to see that the comma is one of the ParenLike characters. One also finds that BalancedExprs is used in two different senses: in one sense it is a sequence of separate balanced exprs, and in another sense it is a single sequence of tokens within which a comma will not be taken as a separator. It would be better to have

    MacroApplication:
        ? MacroName
        ? MacroName ( )
        ? MacroName ( MacroArguments )
    MacroArguments:
        BalancedExpr
        MacroArguments , BalancedExpr

WHOOPS! My claim that macros are confusing has just been clinched beyond any possible argument, by my discovery that certain arguments are not possible. Read the grammar on page 124 (section 7.2.3, Macro expansion) very carefully, and you will see that ?foo(bar(1)) is not a legal macro call. Why not? Because `bar(1)' is not a BalancedExpr.
The symbol `bar' is (being an instance of NotParenLikes-opt) and the group `( 1 )' is (being an instance of ( BalancedExprs )), but there is no rule that allows two BalancedExprs to be concatenated to make a large one. The change would be simple:

    MacroArguments:
        BalancedExpr-opt
        MacroArguments , BalancedExpr-opt
    BalancedExpr:
        ( BalancedExprs-opt )
        [ BalancedExprs-opt ]
        ...
        fun BalancedExprs-opt end
        NotParenLike
        BalancedExpr BalancedExpr       % added
    NotParenLike:
        not any of ( ) [ ] ... , ; | || ->
    BalancedExprs:
        BalancedExpr
        BalancedExprs Comma BalancedExpr
    Comma:
        one of , ; | || ->

This is quite a staggering gap in the definition of macro invocation. It is a gap that slipped by an able specifier and several commentators, and that only drew my attention because I was trying to attack the very notion of a preprocessor in Erlang. (Delenda est preprocessor!) Do I need to point out that ``implicit fun expressions'' (fun Name/Arity) also derail the definition of balanced exprs? They begin with a ``left bracket'' (fun) but do not end with the corresponding ``right bracket'' (end). Accordingly, ?no_fun(fun self/0) is not a legal macro invocation. Fun till it hurts: ?no_fun(fun self/0 end) is a legal macro invocation, although the argument is imbalanced. There's another even more staggering gap in this specification: a macro invocation may not contain another macro invocation! That's right, according to section 7.2.3, this program fragment

    -define(foo(X), (X*2)).
    f(X) -> ?foo(?foo(X)).

is not legal. That's a novel way to work around some of the problems in C (does the inner macro call get expanded eagerly as in M4 or lazily as in C?) but it will seriously confuse anyone who is used to C. Yes, `?foo' is a sequence of NotParenLike tokens, but as with ordinary function calls, there is currently no provision for both `?foo' and `(X)' in the one BalancedExpr. There is another gap concerning strings.
The C standard has a fairly complex notion of `translation phases', not the least function of which is to sort out all the hairy lexical transformations that have accreted. The C standard does carefully define the interaction between string pasting and macro expansion; the Erlang specification does not. As things stand, there are two reasons why the following example from page 126 will not work:

    -define(SRCDIR(FN), "/usr/local/src/myproj/" FN ".erl").
    -include(?SRCDIR("bliss")).

The first is the obvious one, that Erlang perversely fails to expand macros in directives (where they could do some real good). But the second is that it wouldn't be legal Erlang syntax if it were allowed. Why not? Because the definition of a macro is a sequence of tokens (instances of the non-terminal Token), and the arguments of macros are basically sequences of Tokens. So the definition of ?SRCDIR contains two AtomicLiterals, and the argument in the call contains one AtomicLiteral, and the expansion of that directive would be

    -                           Operator
    include                     AtomicLiteral
    (                           Separator
    "/usr/local/src/myproj/"    AtomicLiteral
    "bliss"                     AtomicLiteral
    ".erl"                      AtomicLiteral
    )                           Separator
    .                           FullStop

There is no rule that a sequence of AtomicLiterals can be turned into a StringLiteral. There isn't even any rule that a sequence of StringLiterals can be turned into a StringLiteral. A remedy is possible: define MacroTokens to be Separators, Keywords, Operators, IntegerLiterals, FloatLiterals, CharLiterals, OneStringLiterals, AtomLiterals, Variables, and Universal Patterns, and make it clear in the accompanying commentary that this is so that string pasting will happen after macro expansion. As far as I can see, in the presence of the macro processor, Erlang needs at least the following translation phases, in the following order.
1. Replace Universal Character Names by Unicode characters.
2. Pretokenise (everything except pasting).
3. Handle preprocessor directives (without macro expansion).
4.
Expand macros in all other token sequences.
5. Do string pasting (and why not atom pasting as well, for quoted atoms).
6. Parse.

(5 and 6 can be folded together, as they had to be in C9x when __FUNCTION__ was added.) Note that C constrains the use of UCNs so that characters of significance to C may not be written as UCNs. That means C doesn't need Erlang's translation phase 1: \u0022xy\u0022 is not a possible spelling of "xy". The Erlang specification does not so constrain the use of UCNs, so it does require a separate translation phase 1 to be carried out before pretokenising. At the moment, the specification cannot be taken literally, on at least these three grounds (no function calls in macro arguments, no macro calls in macro arguments, no string pasting in macro results). If we cannot take the specification literally, we cannot take it seriously. And if we cannot take it seriously, what use is it? Let's be clear about this: we are not talking about a 2nd-year student's first fumbling exercises in BNF, or a 3rd-year student's first encounter with how macro processors really work. We are talking about a 20+ year old design, extensively overhauled 10+ years ago, overhauled again recently, in a major language, where the design and its flaws have been debated publicly in an open forum (comp.std.c); we are talking about a specification done by an experienced and capable person with the actual code and its designers available to him, and we are presumably talking about the 6th draft. More than that, it is labelled the final draft. If a mechanism is this hard to describe correctly, who do we trust to implement it correctly and who do we trust to use it safely? Nobody, and no-one.
At an absolute minimum, Erlang macros
- ought to accept parse trees as arguments,
- ought to deliver a parse tree as result, and
- ought to have variables occurring in their definition systematically renamed on application to keep the variables in macro definitions scrupulously apart from the variables that visibly appear in clauses.

In fact, it does not seem that any of the uses of Erlang macros in the 4.4 EDE would be limited by these restrictions. On the contrary, one would be able to use them more confidently. The problem is not that Erlang macros necessarily always lead to evils; it is that we are never certain that they haven't until we carefully manually check each use of each macro. The most telling point against Erlang macros is that we don't need them. The one important thing they could do that inlined functions and abstract patterns cannot do is build directives, but perversely, Erlang forbids that.

    -define(MODULE, fred).
    -module(?MODULE).
    ...

is illegal. Too weak to be useful, too dangerous to be trusted, and superfluous in a language with inlined functions and abstract patterns. Macros? Who needs them! Delenda est preprocessor! There are a number of things I would be sorry to see go. ?MODULE should not be necessary, but it is. However, there is not the slightest reason why it has to be a macro. It could perfectly well be an automatically defined abstract pattern. Simply specify that -module(xxx). causes #module() -> xxx. to be automatically defined, and all uses of ?MODULE can then be changed to uses of #module(). Indeed, #module() is safer than ?MODULE, because there is no way to undefine a function or pattern, while macros can be undefined. Nothing in sections 7.2.2 or 7.2.4 says that the predefined macros cannot be undefined and subsequently redefined. If I do

    -module(xxx).
    -include("common.hrl").

    f(X) -> ?MODULE:g(X).

I have no right whatsoever to assume that ?MODULE expands to xxx.
I would like to be able to trust ?MODULE, because I would like to be able to write

    -module(fred).
    ...
    -import(?MODULE, [foo/1]).

Obviously, at the moment that isn't allowed because Erlang perversely refuses to macro-expand directives, but supposing that puzzling restriction were lifted, would such an import directive be allowed? It does make perfect sense. Since in general -import(M, [F/N]). means that a call to F(E1,...,EN) is to be interpreted as a call to M:F(E1,...,EN), a self-import would mean that every call to the function in question was a remote call, even though it could have been a local call. With the idea that long-lived stuff should be done by tail recursive loops using remote calls, it could be very sensible to declare in one place that all calls to such a function would be remote calls and eliminate the possibility of accidentally leaving out the self prefix. ?FILE is useful for debugging. Thanks to the -file directive, this one does have to be a macro, because it can take different values at different points in the module. Without the -include and -include_lib directives, and Erlang should be without them, -file would be the only reason it would change. On the other hand, the specification unambiguously says that ?FILE is ``a full path to the file that is being compiled'' and there is only one file that satisfies that description. An included file does not satisfy that description. The original file from which some other tool generated Erlang code including -file directives does not satisfy that description. Only the file whose name was initially passed to the compiler satisfies that description. Again a two-edged sword. One cut is that we could implement that specification using an abstract pattern. The other cut is that that specification is pretty much useless and would make the -file directive almost entirely pointless.
As it happens, -file is pretty much pointless, because it is only useful in conjunction with ?LINE, and that is useless. ?LINE ought to be useful for debugging. Its prototype, __LINE__ in C and __line__ in M4, certainly is. However, the specification is quite unambiguous and quite useless. Consider

    -define(assert(X),                                  % line 20
        cond X -> ok; true -> error(?FILE, ?LINE) end   % line 21
    ).                                                  % line 22
    ...
    f(X) -> ?assert(X > 10), ...                        % line 99

The equivalent in C would mention line 99. But the Erlang specification is clear: ``?LINE is expanded to a single token which is an integer literal that is the number of the line on which the LINE token appears.'' There is only one number that satisfies that description, and it's 21, not 99. By far the main point of __LINE__ in C is to have something you can put in a macro to tell you at run time where the call was; what Erlang gives you is something that says where the definition was. Of course, without macros, the entire thing becomes pretty pointless anyway, but with macros, the definition of ?LINE has to be something along the lines of ``the line where the MacroName of the outermost macro invocation dynamically enclosing this macro invocation was''. (If ?LINE itself is the outermost invocation, the two definitions coincide, but not otherwise.) Do I need to explain what is wrong with -file(F,N)? ?DATE would be useful, but Erlang has no analogue of __DATE__. ?TIME would be useful, but Erlang has no analogue of __TIME__. ?FUNCTION would be especially useful. The general C convention for reporting run-time errors has been to cite the file name and line number, but C9x has added __FUNCTION__. (The pressure for this comes, I believe, from an AT&T tool that adds preconditions and postconditions to C.) That predefined macro expands to the name of the enclosing function, as a string.
The Erlang convention for reporting errors is generally to build a tuple containing the module name (?MODULE), the function name (?FUNCTION would be this, if it existed), and the arguments (?ARGUMENTS would be this, if it existed). Presumably ?FUNCTION doesn't exist because the Erlang preprocessor doesn't know enough about Erlang syntax; however, it would be quite easy to add, because the value would simply be the first token of the current token sequence. Of course, when you are writing a function, you know what its name is, but a macro that you call inside it does not know this and has no other way to find out what to put in an error tuple. ?ARGUMENTS would chiefly be useful so that one could have

    -define(CALL_INFO, {?MODULE,?FUNCTION,?ARGUMENTS}).

and then use ?CALL_INFO in a macro that wanted to generate run time error reporting code, but it could be useful in some kinds of profiling and soft type checking code. Of the built-in macros that exist, the only one that is intrinsically useful is ?MODULE, which could be handled another way. There are, of course, other macros that are used in module bodies but not defined in them, and those are the ``configuration'' flags like `debug', `DEBUG', and `silent'. Those are defined when the compiler is invoked. Here again, we can allow ourselves one predefined abstract pattern, and I suggest

    #option(FlagName) -> AtomicLiteral.

as a starting point for discussion. Instead of

    -ifdef(debug).
    -define(verbose(X, Y), io:format(X, Y)).
    -else.
    -define(verbose(X, Y), ok).
    -endif.

we would have

    verbose(X, Y) when true  == #option(debug) -> io:format(X, Y);
    verbose(X, Y) when false == #option(debug) -> ok.

I consider it important that there would be precisely two compiler-defined abstract patterns, and that there would be no depending on something not being defined.
I also consider it important that while these two abstract patterns would be defined by the compiler, they would be defined, and would be available in a debugger, and could be exported by the module. (This is a very small first step in the direction of making configuration information about a module available.) In any case, I hope that the trickiness of defining ?LINE correctly will be one more nail in the preprocessor's coffin. For the very last nail, let's see how -define is actually used in the Erlang 4.4 Development Environment. I did

    find $Erlang -name '*.[eh]rl' \
         -exec grep -q '^[ \t]*-[ \t]*define' "{}" ";" \
         -exec view "{}" ";"

to make sure I saw every occurrence of -define. Quite tedious. This is what I found.

    Module/header       Use of -define
    pman_shell          Named constants
    canvasbutton        Named constants
    toolbar_graphics    Named constants
    erl_parse           #module()
    dets                Named constants, inlined function, conditional output
    disk_log            Named constants, conditional output
    filename            Named constants
    gen                 Named constants
    gen_event           Unhygienic: -define(reply(X), From!{Tag,X}).
                        It makes handle_msg/5 *LESS* clear by hiding the
                        message sends, and *much* harder to check by hiding
                        the variable threading.
    lib                 Inlined function
    os                  Named constants
    supervisor          Guard function
    erl_prim_loader     Named constants
    error_logger        Named constants
    file                Named constants
    heart               Named constants
    socket              Named constants
    gtk_font            Named constants
    gtk_port_handler    Named constants
    gtk_window          Named constants
    bonk                Named constants
    cols                Named constants
    mandel              Named constants
    othello_board       Named constants
    file_dial           Named constants
    graph_placer        Named constants
    rb_mod              Named constants
    xref                Named constants
    xref_gp             Named constants
    appmon              Named constants
    appmon_a            Named constants
    appmon_info         Inlined functions
    appmon_txt          Named constants
    igserver            lib/ig-1.7/src/igserver.erl; fascinating
    igsemantic          Conditional output
    ig                  Named constants, conditional output, and_then or_else
    duplex_main         Named constants
    yecc_parser         #module()
    ig.hrl              Named constants, conditional output, and_then or_else plus ->

Named constants speak for themselves. Most of them could perfectly well have been plain functions, the rest can be abstract patterns. Inlined functions were trivial. Conditional output has been discussed above in the section on conditional compilation. The and_then and or_else macros will presumably be replaced by the staggeringly ugly 'all_true' and 'some_true' constructs in Erlang 5.0 (why oh why aren't they operators like and_then and or_else in the current Pascal standard?). There's one case of abbreviation which could have been a local function. There is an inlined 'guard function' that could be an abstract pattern. That leaves only two interesting cases. There are only two files that contain uses of -define that don't convert quite directly into plain functions or abstract patterns. Both of them are unhygienic, and both of them can be done quite simply and cleanly without any macros at all. The first is gen_event.

    -module(gen_event).

    -define(reply(X), From ! {Tag, X}).
    handle_msg(Msg, Parent, ServerName, MSL, Debug) ->
        case Msg of
            {notify, Event} ->
                MSL1 = server_notify(Event, handle_event, MSL),
                loop(Parent, ServerName, MSL1, Debug);
            {From, Tag, {call, Mod, Query}} ->
                {Reply, MSL1} = server_call(Mod, Query, MSL),
                ?reply(Reply),
                loop(Parent, ServerName, MSL1, Debug);
            {From, Tag, {add_handler, Mod, Args}} ->
                {Reply, MSL1} = server_add_handler(Mod, Args, MSL),
                ?reply(Reply),
                loop(Parent, ServerName, MSL1, Debug);
            {From, Tag, {delete_handler, Mod, Args}} ->
                {Reply, MSL1} = server_delete_handler(Mod, Args, MSL),
                ?reply(Reply),
                loop(Parent, ServerName, MSL1, Debug);
            {From, Tag, {swap_handler, Mod1, Args1, Mod2, Args2}} ->
                {Reply, MSL1} = server_swap_handler(Mod1, Args1, Mod2, Args2, MSL),
                ?reply(Reply),
                loop(Parent, ServerName, MSL1, Debug);
            {From, Tag, stop} ->
                stop_handlers(MSL),
                ?reply(ok);
            {From, Tag, which_handlers} ->
                ?reply(the_handlers(MSL)),
                loop(Parent, ServerName, MSL, Debug);
            {get_modules, From} ->
                From ! {modules, the_handlers(MSL)},
                loop(Parent, ServerName, MSL, Debug);
            Other ->
                MSL1 = server_notify(Other, handle_info, MSL),
                loop(Parent, ServerName, MSL1, Debug)
        end.

This could just as well be

    handle_msg(Msg, Parent, ServerName, MSL, Debug) ->
        case Msg of
            {notify, Event} ->
                MSL1 = server_notify(Event, handle_event, MSL),
                loop(Parent, ServerName, MSL1, Debug);
            {From, Tag, {call, Mod, Query}} ->
                {Reply, MSL1} = server_call(Mod, Query, MSL),
                From ! {Tag, Reply},
                loop(Parent, ServerName, MSL1, Debug);
            {From, Tag, {add_handler, Mod, Args}} ->
                {Reply, MSL1} = server_add_handler(Mod, Args, MSL),
                From ! {Tag, Reply},
                loop(Parent, ServerName, MSL1, Debug);
            {From, Tag, {delete_handler, Mod, Args}} ->
                {Reply, MSL1} = server_delete_handler(Mod, Args, MSL),
                From ! {Tag, Reply},
                loop(Parent, ServerName, MSL1, Debug);
            {From, Tag, {swap_handler, Mod1, Args1, Mod2, Args2}} ->
                {Reply, MSL1} = server_swap_handler(Mod1, Args1, Mod2, Args2, MSL),
                From !
                {Tag, Reply},
                loop(Parent, ServerName, MSL1, Debug);
            {From, Tag, stop} ->
                stop_handlers(MSL),
                From ! {Tag, ok};
            {From, Tag, which_handlers} ->
                From ! {Tag, the_handlers(MSL)},
                loop(Parent, ServerName, MSL, Debug);
            {get_modules, From} ->
                From ! {modules, the_handlers(MSL)},
                loop(Parent, ServerName, MSL, Debug);
            Other ->
                MSL1 = server_notify(Other, handle_info, MSL),
                loop(Parent, ServerName, MSL1, Debug)
        end.

which I find far more readable. First, I can see at a glance that there are a lot of "From ! " actions, and I can also see at a glance that they all send {Tag, } except for one case which sends {modules, }. Previously the similarity and the difference were hidden. Second, I can now check variable threading _locally_, without having to search for the definition of ?reply. A better abstraction would be

    reply(From, Tag, Parent, ServerName, Debug, MSL, Reply) ->
        From ! {Tag, Reply},
        loop(Parent, ServerName, MSL, Debug).

    reply(From, Tag, Parent, ServerName, Debug, {Reply,MSL}) ->
        reply(From, Tag, Parent, ServerName, Debug, MSL, Reply).

    handle_msg(Msg, Parent, ServerName, MSL, Debug) ->
        case Msg of
            {notify, Event} ->
                MSL1 = server_notify(Event, handle_event, MSL),
                loop(Parent, ServerName, MSL1, Debug);
            {From, Tag, stop} ->
                stop_handlers(MSL),
                From !
            {Tag, ok};
            {From, Tag, {call, Mod, Query}} ->
                reply(From, Tag, Parent, ServerName, Debug,
                      server_call(Mod, Query, MSL));
            {From, Tag, {add_handler, Mod, Args}} ->
                reply(From, Tag, Parent, ServerName, Debug,
                      server_add_handler(Mod, Args, MSL));
            {From, Tag, {delete_handler, Mod, Args}} ->
                reply(From, Tag, Parent, ServerName, Debug,
                      server_delete_handler(Mod, Args, MSL));
            {From, Tag, {swap_handler, Mod1, Args1, Mod2, Args2}} ->
                reply(From, Tag, Parent, ServerName, Debug,
                      server_swap_handler(Mod1, Args1, Mod2, Args2, MSL));
            {From, Tag, which_handlers} ->
                reply(From, Tag, Parent, ServerName, Debug, MSL,
                      the_handlers(MSL));
            {get_modules, From} ->
                reply(From, modules, Parent, ServerName, Debug, MSL,
                      the_handlers(MSL));
            Other ->
                MSL1 = server_notify(Other, handle_info, MSL),
                loop(Parent, ServerName, MSL1, Debug)
        end.

Both the original version and this version come to 35 SLOC, but this version expresses the commonality more clearly without any macros at all. This version could be cleaned up somewhat. The second is igserver, which uses macros to generate receive cases.

    -module(igserver).

    %% Really bad macros to handle reporting of all received messages
    %% while still matching out stuff from within messages. Note also that
    %% the variable _NewDebug_ is used in a bad, bad way at some of the
    %% calls of this macro.
    -define(rec_case(Debug,Case,Res),
            Case when Debug==[] ->
                _NewDebug_ = [],
                Res;
            Case ->
                _NewDebug_ = sys:handle_debug(Debug,{igserver,print_event},
                                              self(),{in,Case}),
                Res).

    -define(wait_for(Debug,Case,Result),
            receive ?rec_case(Debug,Case,Result) end).

    -define(wait_for2(Debug,Case1,Result1,Case2,Result2),
            receive ?rec_case(Debug,Case1,Result1);
                    ?rec_case(Debug,Case2,Result2) end).

    -define(wait_for3(Debug,Case1,Result1,Case2,Result2,Case3,Result3),
            receive ?rec_case(Debug,Case1,Result1);
                    ?rec_case(Debug,Case2,Result2);
                    ?rec_case(Debug,Case3,Result3) end).
    %% Must return {data, Result, NewDebug}

    get_fun_msg({port,Port},Debug) ->
        %% Note extremely ugly use of macro variable _NewDebug_
        ?wait_for(Debug,{Port,{data,Res}},{data,Res,_NewDebug_});
    get_fun_msg({server,Socket},Debug) ->
        %% Note extremely ugly use of macro variable _NewDebug_
        ?wait_for(Debug,{Socket,{fromsocket,Res}},{data,Res,_NewDebug_});
    get_fun_msg({client,Socket},Debug) ->
        %% Note extremely ugly use of macro variable _NewDebug_
        ?wait_for(Debug,{Socket,{fromsocket,Res}},{data,Res,_NewDebug_}).

    get_cback_msg({port,Port},Debug) ->
        ?wait_for2(Debug,{Port,{data,Res}},Res,
                   {'EXIT',Port,Reason},{'EXIT',Reason});
    get_cback_msg({client,Socket},Debug) ->
        ?wait_for3(Debug,{Socket,{fromsocket,Res}},Res,
                   {Socket,{socket_closed,Reason}},{'EXIT',Reason},
                   {'EXIT',Socket,Reason},{'EXIT',Reason});
    get_cback_msg({server,Socket},Debug) ->
        ?wait_for3(Debug,{Socket,{fromsocket,Res}},Res,
                   {Socket,{socket_closed,Reason}},{'EXIT',Reason},
                   {'EXIT',Socket,Reason},{'EXIT',Reason}).

    close_conn(...) -> ...;
    close_conn({port,Port}) ->
        cast_hook({port,Port},terminate,{}),
        ?wait_for([],{'EXIT',Port,Reason},Reason).    %use [] for Debug

Let's get rid of the last one first. Since Debug is known to be [], it might just as well be written

    close_conn(...) -> ...;
    close_conn({port,Port}) ->
        cast_hook({port,Port},terminate,{}),
        receive
            {'EXIT',Port,Reason} -> Reason
        end.

Now let's note that sys:handle_debug(DEBUG, {igserver,print_event}, self(), {in,STUFF}) occurs elsewhere, and give it a name:

    sys_debug([], _) -> [];
    sys_debug(Debug, Stuff) ->
        sys:handle_debug(Debug, {igserver,print_event}, self(), {in,Stuff}).

Now let's rewrite the macros a bit.

    -define(wait_for(Debug, NewDebug, Case, Result),
        receive
            Case ->
                NewDebug = sys_debug(Debug, Case),
                Result
        end).

Only one function calls ?wait_for(), get_fun_msg/2. Let's expand that out.
    get_fun_msg({port,Port}, Debug) ->
        receive
            Msg = {Port,{data,Res}} ->
                {data,Res,sys_debug(Debug,Msg)}
        end;
    get_fun_msg({server,Socket}, Debug) ->
        receive
            Msg = {Socket,{fromsocket,Res}} ->
                {data,Res,sys_debug(Debug,Msg)}
        end;
    get_fun_msg({client,Socket}, Debug) ->
        receive
            Msg = {Socket,{fromsocket,Res}} ->
                {data,Res,sys_debug(Debug,Msg)}
        end.

where I have used "AS-patterns" in the receive commands. Erlang scoping being what it is, the whole thing can be rewritten as

    get_fun_msg({Type,Source}, Debug) ->
        case Type of
            port   -> receive Msg = {Source,{data,Res}}       -> true end;
            server -> receive Msg = {Source,{fromsocket,Res}} -> true end;
            client -> receive Msg = {Source,{fromsocket,Res}} -> true end
        end,
        {data,Res,sys_debug(Debug,Msg)}.

which I think is vastly clearer than the original.

    -define(wait_for2(Debug, NewDebug, Case1, Res1, Case2, Res2),
        receive
            Case1 ->
                NewDebug = sys_debug(Debug, Case1),
                Res1;
            Case2 ->
                NewDebug = sys_debug(Debug, Case2),
                Res2
        end).

    -define(wait_for3(Debug, NewDebug, Case1, Res1, Case2, Res2, Case3, Res3),
        receive
            Case1 ->
                NewDebug = sys_debug(Debug, Case1),
                Res1;
            Case2 ->
                NewDebug = sys_debug(Debug, Case2),
                Res2;
            Case3 ->
                NewDebug = sys_debug(Debug, Case3),
                Res3
        end).

Only one place calls ?wait_for2 or ?wait_for3. Let's expand that out.

    get_cback_msg({port,Port}, Debug) ->
        receive
            Msg = {Port,{data,Res}} ->
                sys_debug(Debug, Msg),
                Res;
            Msg = {'EXIT',Port,Reason} ->
                sys_debug(Debug, Msg),
                {'EXIT',Reason}
        end;
    get_cback_msg({client,Socket}, Debug) ->
        receive
            Msg = {Socket,{fromsocket,Res}} ->
                sys_debug(Debug, Msg),
                Res;
            Msg = {'EXIT',Reason} ->
                sys_debug(Debug, Msg),
                {'EXIT',Reason};
            Msg = {'EXIT',Socket,Reason} ->
                sys_debug(Debug, Msg),
                {'EXIT',Reason}
        end;
    get_cback_msg({server,Socket}, Debug) ->
        receive
            Msg = {Socket,{fromsocket,Res}} ->
                sys_debug(Debug, Msg),
                Res;
            Msg = {Socket,{socket_closed,Reason}} ->
                sys_debug(Debug, Msg),
                {'EXIT',Reason};
            Msg = {'EXIT',Socket,Reason} ->
                sys_debug(Debug, Msg),
                {'EXIT',Reason}
        end.
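The case-based rewrite of get_fun_msg/2 above depends on an Erlang scoping rule worth stating explicitly: a variable bound in every branch of a case (or receive) is still bound after the closing "end". A minimal sketch of the rule, with a hypothetical module and function of my own invention:

    -module(scope_demo).
    -export([classify/1]).

    %% X is bound in both branches of the case, so it remains in
    %% scope after "end" and may be used in the final expression.
    classify(N) ->
        case N >= 0 of
            true  -> X = nonneg;
            false -> X = negative
        end,
        {N, X}.

This is the same mechanism that lets Msg and Res escape the receive expressions above; no macro is needed to thread them out.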
Now we see that the calls to sys_debug/2 can be moved out, getting

    get_cback_msg({port,Port}, Debug) ->
        receive
            Msg = {Port,{data,Res}} -> true;
            Msg = {'EXIT',Port,Reason} -> Res = {'EXIT',Reason}
        end,
        sys_debug(Debug, Msg),
        Res;
    get_cback_msg({client,Socket}, Debug) ->
        receive
            Msg = {Socket,{fromsocket,Res}} -> true;
            Msg = {'EXIT',Reason} -> Res = {'EXIT',Reason};
            Msg = {'EXIT',Socket,Reason} -> Res = {'EXIT',Reason}
        end,
        sys_debug(Debug, Msg),
        Res;
    get_cback_msg({server,Socket}, Debug) ->
        receive
            Msg = {Socket,{fromsocket,Res}} -> true;
            Msg = {Socket,{socket_closed,Reason}} -> Res = {'EXIT',Reason};
            Msg = {'EXIT',Socket,Reason} -> Res = {'EXIT',Reason}
        end,
        sys_debug(Debug, Msg),
        Res.

This can be simplified further:

    get_cback_msg({Type,Source}, Debug) ->
        case Type of
            port ->
                receive
                    Msg = {Source,{data,Res}} -> true;
                    Msg = {'EXIT',Source,Why} -> Res = {'EXIT',Why}
                end;
            client ->
                receive
                    Msg = {Source,{fromsocket,Res}} -> true;
                    Msg = {'EXIT',Why} -> Res = {'EXIT',Why};
                    Msg = {'EXIT',Source,Why} -> Res = {'EXIT',Why}
                end;
            server ->
                receive
                    Msg = {Source,{fromsocket,Res}} -> true;
                    Msg = {Source,{socket_closed,Why}} -> Res = {'EXIT',Why};
                    Msg = {'EXIT',Source,Why} -> Res = {'EXIT',Why}
                end
        end,
        sys_debug(Debug, Msg),
        Res.

Let's put these pieces together:

    sys_debug([], _) -> [];
    sys_debug(Debug, Stuff) ->
        sys:handle_debug(Debug, {igserver,print_event}, self(), {in,Stuff}).

    get_fun_msg({Type,Source}, Debug) ->
        case Type of
            port   -> receive Msg = {Source,{data,Res}}       -> true end;
            server -> receive Msg = {Source,{fromsocket,Res}} -> true end;
            client -> receive Msg = {Source,{fromsocket,Res}} -> true end
        end,
        {data,Res,sys_debug(Debug,Msg)}.
    get_cback_msg({Type,Source}, Debug) ->
        case Type of
            port ->
                receive
                    Msg = {Source,{data,Res}} -> true;
                    Msg = {'EXIT',Source,Why} -> Res = {'EXIT',Why}
                end;
            client ->
                receive
                    Msg = {Source,{fromsocket,Res}} -> true;
                    Msg = {'EXIT',Why} -> Res = {'EXIT',Why};
                    Msg = {'EXIT',Source,Why} -> Res = {'EXIT',Why}
                end;
            server ->
                receive
                    Msg = {Source,{fromsocket,Res}} -> true;
                    Msg = {Source,{socket_closed,Why}} -> Res = {'EXIT',Why};
                    Msg = {'EXIT',Source,Why} -> Res = {'EXIT',Why}
                end
        end,
        sys_debug(Debug, Msg),
        Res.

That's 29 non-blank lines. The original, using macros, required 40 lines. I see no benefit from macros here. In fact, I think this is the kind of experience we can expect from the preprocessor.

DELENDA EST PREPROCESSOR!