Macros in Racket, part two

:: computer, lisp

The second part of my notes on writing macros in Racket.


This is the second part of at least three: the first part is here, and the third part is here. This won’t make much sense unless you’ve read that. As before I make no claims to be an expert in Racket’s macro system although I am familiar with Lisp macros in general: this is just some more notes I wrote while learning it.

The unwashed Lisp hacker’s version of collecting

So, we can write clet: can we write collecting? Yes, we can:

(require (for-syntax racket/list))

(define-syntax (collecting stx)
  (datum->syntax
   (quote-syntax collecting)
   `(let ([r '()])
      (define (,(datum->syntax stx 'collect) it)
        (set! r (cons it r)) it)
      ,@(rest (syntax->list stx))
      (reverse r))))

This works because, in the internal definition of collect, we’ve intentionally given it a name which uses the context of the syntax object we’re transforming, not the context of the macro. It’s easy to confirm that this works the way you would expect, and in particular that it’s safe in both directions: for instance

> (let ((reverse (λ (x) x)))
    (collecting (collect 1) (collect 2)))
'(1 2)

shows that the binding of reverse when the macro is called has not ‘infected’ the macro definition.

It seems as if that should be all you need: so long as you are careful about which context you choose, and you make sure that the ‘default’ context is the one from the macro not from where it is used. In fact it isn’t, quite: see below. However even if it were, it’s clearly a pain to write macros this way.

Pattern matching

Pretty much all macros do two things:

  1. deconstruct their arguments in some more-or-less complicated way, but almost always in a way which is significantly more complicated than anything that needs to be done for the arguments of a function;
  2. construct a form which is the result of the macro and which, again, may be complicated.

The beauty of traditional Lisp macros is that since the arguments and results of the macro were just what the reader spat out — lists and symbols and so on — and since Lisp was kind of good at doing things to these structures as it was designed for that, and finally since the whole power of the language was available in the macro, this was not horrible even without special tools, although it was not particularly pleasant for complicated macros.

Hygienic macros make this much less pleasant because the objects that need to be deconstructed and constructed are now opaque syntax objects, and there is additional worrying about context to do. The answer to this is to provide special tools which do the boring bits for you: this makes everything simpler, at the cost of making it still more opaque what is actually happening. In almost all cases that’s a tradeoff worth making. Pattern matching is also a fashionable thing amongst the young and hip, of course.

The way this is done in Racket is via syntax-case, its slightly simpler friend syntax-rules, and by syntax and variants on it.

syntax-case takes a bit of syntax and matches it against patterns, binding matches, which can then be used in syntax forms lexically within it to return syntax objects, whose context is that of the syntax-case form (so hygienic). There is syntactic sugar for syntax: (syntax ...) can be written #'... in the same way that (quote ...) can be written '.... There is also quasisyntax which works the same way as quasiquote, except that the various unquoting things are preceeded with #. quasisyntax, unsurprisingly also has syntactic sugar coating: (quasisyntax ...) can be written #`....

I’m not going to describe the patterns in any detail, largely because I only understand the simple cases. However the simple cases are relatively easy to understand and pleasant to use.

Once a case has matched in syntax-case the corresponding expression is evaluated, and its value is the value of the form. Generally that wants to be a bit of syntax.

The first important thing to understand is that syntax is not quote-for-syntax: it interpolates things which matched in a lexically surrounding syntax-case, if there is one (if there isn’t, then I think it is quote-for-syntax).

The second important thing to understand is that syntax-case and syntax turn Racket into a sort of bodged Lisp–2: the things matched by syntax-case can be used only in syntax forms. But it’s not actually a separate namespace, because if you refer to them outwith such a form you get a compile-time error. I don’t know why this is — perhaps to avoid accidentally naming matches outside a syntax form — but it is certainly annoying.

So, here are some examples.

A simple while form:

(define-syntax (while stx)
  (syntax-case stx ()
    [(_ test body ...)
     #'(let loop ()
         (when test
           body ...
           (loop)))]))

A simple implementation of let, leaving out the named-let case, which shows how good the pattern matching is:

(define-syntax (with stx)
  (syntax-case stx ()
    [(_ ([var val] ...) body ...)
     #'((λ (var ...) body ...) val ...)]))

A better implementation which deals with the empty body case ((λ (...)) is illegal in Racket) and also optimises a simple case:

(define-syntax (with stx)
  (syntax-case stx ()
    [(_ () body ...)
     ;; no vars: trivial case
     #'(begin body ...)]
    [(_ ([var val] ...))
     ;; null body: make sure vars are evaluated
     #'(begin val ... (void))]
    [(_ ([var val] ...) body ...)
     #'((λ (var ...) body ...) val ...)]))

One thing which syntax-case allows is the notion of literal names which must occur in the source. So for instance let’s say I wanted to write some mutant loop macro whose syntax was (loop for x in y do ...): where for, in, do are literals. Well, I can write something to match this:

> (define-syntax (loop stx)
    (syntax-case stx (for in do)
    [(_ for v in l do body ...)
     #'(for ([v (in-list l)]) body ...)]))
> (loop for x in '(1 2 3) do (print x))
123
> (loop with x in '(1 2 3) do (print x))
loop: bad syntax in: (loop with x in (quote (1 2 3)) do (print x))

The syntax object that corresponds to stx here is the whole form: the equivalent to CL’s &WHOLE. It’s almost never necessary to worry about the car of this since it will obviously be loop. However I’m always tempted to provide it as a literal.

syntax-rules is (almost: there is some complexity I think) a wrapper around syntax-case which provides the function wrapper for it and which implicitly wraps the right hand side of the cases, which must be just one form, in a syntax form. So the above definition of with could be written:

(define-syntax with
  (syntax-rules ()
    [(_ () body ...)
     ;; no vars: trivial case
     (begin body ...)]
    [(_ ([var val] ...))
     ;; null body: make sure vars are evaluated
     (begin val ... (void))]
    [(_ ([var val] ...) body ...)
     ((λ (var ...) body ...) val ...)]))

syntax-rules can be defined something like this (this is due to bmastenbrook):

(require (for-syntax 
          (rename-in racket 
                     [syntax-rules racket:syntax-rules])))

(begin-for-syntax
  (define-syntax syntax-rules
    (racket:syntax-rules ()
      [(_ literals (pattern expansion) ...)
       (lambda (s)
         (syntax-case s literals
           (pattern #'expansion) ...))])))

define-syntax-rule combines define-syntax and a single rule for syntax-rules. I think it might be equivalent to this:

(define-syntax define-syntax-rule
  (syntax-rules ()
    [(_ (name pat ...) expansion)
     (define-syntax name
       (syntax-rules ()
         [(name pat ...) expansion]))]))

although I am probably missing some complexity here.

There is a useful variant on syntax-case called with-syntax: it looks more like let-style thing, and all the patterns in the clauses must match, when all the pattern variables will be bound.

So, what about our desirable macros?

collect is pretty easy. Here are two different versions. The first uses quasisyntax:

(define-syntax (collecting stx)
  (syntax-case stx ()
    [(_) #'(void)]
    [(_ body ...)
     #`(let ([r '()])
         (define (#,(datum->syntax stx 'collect) it)
           (set! r (cons it r)) it)
         body ...
         (reverse r))]))

The second uses with-syntax:

(define-syntax (collecting stx)
  (syntax-case stx ()
    [(_) #'(void)]
    [(_ body ...)
     (with-syntax ([collect (datum->syntax stx 'collect)])
       #'(let ([r '()])
         (define (collect it)
           (set! r (cons it r)) it)
           body ...
           (reverse r)))]))

This is pretty nice, I think. Note that you could not do this with syntax-rules, or at least I can’t see how to do it: syntax-rules is quite a lot less general than syntax-case.

clet is harder, because each element of the binding list may be either an identifier or a two-element list. If we insisted on a two-element list it would be easy (see above). Here is the best I can do:

(require racket/undefined)        

(define-syntax (clet stx)
  (syntax-case stx ()
    [(_ ()) #'(void)]
    [(_ () body ...) #'(begin body ...)]
    [(_ (b ...) body ...)
     (let-values ([(vars vals)
                   (for/lists (as vs) ([binding (syntax->list #'(b ...))])
                     (syntax-case binding ()
                       [(var val) 
                        (identifier? #'var)
                        (values #'var #'val)]
                       [var
                        (identifier? #'var)
                        (values #'var #'undefined)]
                       [_ (raise-syntax-error #f "bad binding" stx)]))])
       #`((λ #,vars body ...) #,@vals))]))

Well, this is still quite hairy, but almost all of the hair involves processing the binding list, which is done using syntax-case again, using an additional feature of it whereby it can use a ‘guard’ expression to decide whether a clause matches: identifer? returnt true if a syntax object refers to an identifier. I think there must be a way of using with-syntax to avoid the quasisyntax form.

Even with all this hair, this version of clet is far easier to read than the previous one, and not harder to read than the CL equivalent.

A better version of clet would, I think, need a proper parser for syntax. I think that is what syntax-parse is, although I have not investigated that.

Macro composition

As mentioned above, we don’t yet have quite all the tools we need to write some kinds of macros: specifically macros which are intentionally slightly unygienic, such as collecting. As an example, let’s suppose we wanted a general purpose, intentionally-unhygenic, with-abort macro which provided an abort function which would, well, abort. Without thinking too hard about the implications of call/cc we could write this as:

(define-syntax (with-abort stx)
  (syntax-case stx ()
    [(_ body ...)
     #`(call/cc (λ (#,(datum->syntax stx 'abort))
                  body ...))]))

So now (with-abort (abort 2) (end-the-world)) returns 2 and does not end the world.

Well, we might want to use this macro in another macro:

(define-syntax-rule (while/abort test body ...)
  (with-abort
    (let loop ([r test])
      (when r
        body ...
        (loop test)))))

Now something like the following will work:

> (let ([x 0])
    (while/abort (< x 10) (set! x (+ x 1)) (print x)))
12345678910

But the whole point was to be able to use abort in the body, and that doesn’t work:

> (let ([x 0])
    (while/abort (< x 10) (set! x (+ x 1)) (when (> x 1) (abort 'done))))
abort: undefined;
 cannot reference an identifier before its definition

Oh, dear. The problem here is that while/abort is hygenic, so the abort binding that is introduced by with-abort is not visible in the body.

We could fix this by better design:

(define-syntax-rule (with-named-abort (abort) body ...)
  ;; a better macro
  (call/cc (λ (abort) body ...)))

(define-syntax (with-abort stx)
  ;; backwards compatible
  (syntax-case stx ()
    [(_ body ...)
     #`(with-abort (#,(datum->syntax stx 'abort)) body ...)]))

(define-syntax (while/abort stx)
  ;; the end result
  (syntax-case stx ()
    [(_ test body ...)
     #`(with-named-abort (#,(datum->syntax stx 'abort))
         (let loop ([r test])
           (when r
             body ...
             (loop test))))]))

But that’s not the solution we’re after.

Racket’s answer to this is syntax parameters. I don’t completely understand these, but they are at least close to dynamic variables, except at macro-expansion time. What you do is to define a syntax parameter, and then rebind it during the expansion: the rebound value is visible to macros which are expanded dynamically within the rebinding form. As with Racket’s ordinary special variables these look like functions (yet another namespace in disguise).

So we can define a syntax parameter called abort using define-syntax-parameter:

(require racket/stxparam)

(define-syntax-parameter abort
  (λ (stx)
    (raise-syntax-error #f "not available" stx)))

So now any reference to abort will result in a syntax error:

> (abort)
abort: not available in: (abort)
> abort
abort: not available in: abort

And we can now try to use syntax-parameterize, to rebind abort as a macro:

(define-syntax with-abort
  (syntax-rules (with-abort)
    [(with-abort) (void)]
    [(with-abort body ...)
     (call/cc
      (λ (a)
        (syntax-parameterize ([abort
                               (syntax-rules ()
                                 [(_ ...) (a ...)])])
          body ...)))]))

And this fails horribly, because the outer syntax-rules thinks it owns the patterns and sees ...s that it does not expect. So much for that.

Well, we could at least check this works with a specific number of arguments:

(define-syntax with-abort
  (syntax-rules (with-abort)
    [(with-abort) (void)]
    [(with-abort body ...)
     (call/cc
      (λ (a)
        (syntax-parameterize ([abort
                               (λ (stx)
                                 (syntax-case stx (abort)
                                   [(abort) #'(a)]
                                   [(abort x) #'(a x)]
                                   [_ (raise-syntax-error #f "I give up" stx)]))])
          body ...)))]))

But this is obviously just a rubbish answer.

Well, there is an answer to this: all we really need to do is to make the abort macro attach itself to a, and there is a special hack, make-rename-transformer, to do this:

(define-syntax with-abort
  (syntax-rules (with-abort)
    [(with-abort) (begin)]
    [(with-abort body ...)
     (call/cc
      (λ (a)
        (syntax-parameterize ([abort (make-rename-transformer #'a)])
          body ...)))]))

And this now works:

> (with-abort (abort 1 2 3))

1
2
3

And we can use this to write a really robust version of collecting

(require racket/stxparam)

(define-syntax-parameter collect
  (λ (stx)
    (raise-syntax-error #f "not collecting" stx)))

(define-syntax collecting
  (syntax-rules ()
    [(_) (void)]
    [(_ body ...)
     (let ([r '()])
       (define (clct it)
         (set! r (cons it r)) it)
       (syntax-parameterize ([collect (make-rename-transformer #'clct)])
         body ...
         (reverse r)))]))

As far as I can see there is still a problem, however: it is very hard to write macros which expand to other macros which themselves do pattern-matching, since the patterns get acquired by the outer macros. There must be some answer to this, but I can’t see what it is.

On the other hand, this is also extremely painful in CL: here is a version of collecting where collect is a local macro:

(defmacro collecting (&body forms)
  ;; collect lists forwards using a tail pointer
  ;; local macro version
  (let ((rn (make-symbol "R"))
        (rtn (make-symbol "RT"))
        (itn (make-symbol "IT")))
    `(let ((,rn '())
           (,rtn nil))
       (macrolet ((collect (form)
                    `(let ((,',itn ,form))
                       (if (not (null ,',rn))
                           (setf (cdr ,',rtn) (cons ,',itn nil)
                                 ,',rtn (cdr ,',rtn))
                         (setf ,',rn (cons ,',itn nil)
                               ,',rtn ,',rn))
                       ,',itn)))
         ,@forms)
       ,rn)))

This is not easy to understand.

Additionally, the problem almost always comes from ellipses, and in many interesting cases they can be avoided by using dotted pairs as patterns — here is yet another version of with-abort that does this:

(require racket/stxparam)

(define-syntax-parameter abort
  (λ (stx)
    (raise-syntax-error #f "not available" stx)))

(define-syntax with-abort
  (syntax-rules (with-abort)
    [(with-abort) (void)]
    [(with-abort body ...)
     (call/ec
      (λ (a)
        (syntax-parameterize ([abort
                               (syntax-rules (abort)
                                 [(abort . args) (a . args)])])


          body ...)))]))

This is clearly better than the CL version.

Summary

Well, I think I now know enough about Racket’s macros to be going on with: I can certainly write the macros I need to be able to write now without it just being cargo-cult programming. There are still things I don’t understand, and the whole system smells to me as if, by trying remain ideologically pure, it has become vast and essentially incomprehensible. This seems to be a common problem with Scheme, unfortunately.

Small notes

Macro definitions scope properly, so you can define a local macro the same way you can define a local function, so this works:

(define (foo ...)
  (define-syntax-rule (while test body ...)
    (let loop ()
      (when test
        body ...
        (loop))))
  ... (while ... ...) ...)

This makes the equivalent of CL’s MACROLET easy to do.

For fun, here is a version of with which can deal with named-let: There must be a way of implementing this without assignment, but I can never work out what it is.

(define-syntax (with stx)
  (syntax-case stx ()
    [(_ ())
     ;; all null
     #'(void)]
    [(_ () body ...)
     ;; no vars: trivial case
     #'(begin body ...)]
    [(_ ([var val] ...))
     ;; null body: make sure vars are evaluated
     #'(begin val ... (void))]
    [(_ ([var val] ...) body ...)
     ;; normal let
     #'((λ (var ...) body ...) val ...)]
    [(_ n ())
     (identifier? #'n)
     ;; named null
     #'(void)]
    [(_ n ([var val] ...))
     (identifier? #'n)
     ;; named null body
     #'(begin val ... (void))]
    [(_ n ([var val] ...) body ...)
     ;; named let with arguments
     ;; (is there an implementation without assignment?
     (identifier? #'n)
     #'((λ (n)
          ((λ (l)
             (set! n l)
             (l val ...))
           (λ (var ...) body ...)))
        #f)]
    [_ (raise-syntax-error #f "bad syntax" stx)]))

Things I still do not know or understand

At this point I’m mostly comfortable writing macros in Racket, but there are things I still do not understand:

  • protecting and arming syntax objects — I just don’t understand what this is about at all;
  • syntax-parse is, I think, not difficult but I have not bothered to learn about it as it seems to add yet another layer.
  • there are probably other things that I don’t even know I don’t know.

At some point I might write a further part of this series on some of that.


Pointers

Eli Barilay’s paper on syntax-parameterize.

Fear of Macros, again.

Macros in Racket, part one

:: computer, lisp

I’ve written in Lisp for a long time, but I’ve never used a hygienic macro system in any way other than the most simple. Here are some initial notes on my experiences learning Racket’s macro system.


This is the first part of several: see part two and part three. I’m not completely fluent with Racket macros yet: there are almost certainly mistakes and confusions here. Despite appearances, I also have no axe to grind: I’m learning Racket because I want to and I have time. Finally this is not a tutorial: look at Greg Hendershott’s Fear of Macros for something closer to that. This is just some notes which were useful to me, and might be useful to other CL people.

Macros in Common Lisp

Common Lisp’s macro system is, in essence, simple: it’s what you’d end up writing if you had to write a macro system for a Lisp. That’s not surprising because it is the descendent of the first macro systems people wrote for Lisp. In CL what happens is this:

  1. the reader ingests the source text and produces data structures which represent the source of the program;
  2. these structures are possibly transformed by macros, which are simply Lisp functions which are given the Lisp representation of the source and return some other representation;
  3. once all macros are expanded, then the code is compiled, evaluated or both.

(I have missed out some subtleties here, but they don’t matter for my purposes.)

In CL, what the reader produces is exactly what you would expect. If it reads "(defun foo (a) a)" then, with standard settings, it returns a list whose car is the symbol DEFUN (in the CL package) and so on. It is this structure that macros transform.

CL provides relatively limited support for writing macros: there is backquote, which is critical to being able to write macros which are even slightly readable, limited pattern matching in the form of destructuring, and there are mechanisms to generate unique names as well a few other things. There is a semi-standard way of enquiring about bindings in the environment at macro expansion time, although this is not in the standard.

In practice, CL’s macro system has turned out to work very well; in theory it has all sorts of problems, the most important being that the programmer is entirely responsible for making sure that macros don’t introduce or accidentally use names they should not. Consider this:

(defmacro collecting (&body forms)
  ;; collect lists forwards using a tail pointer
  ;; polluting version
  `(let ((r '())
         (rt nil))
     (flet ((collect (form)
              (if (not (null r))
                  (setf (cdr rt) (cons form nil)
                        rt (cdr rt))
                (setf r (cons form nil)
                      rt r))
              form))
       ,@forms)
     r))

This intentionally introduces a function binding, collect, but also accidentally introduces bindings for r and rt.

(let ((r 2))
  (collecting
    (+ r r)))

Does not do what it should. One right way to write the collecting macro is like this:

(defmacro collecting (&body forms)
  ;; collect lists forwards using a tail pointer
  ;; non-polluting version
  (let ((rn (make-symbol "R"))
        (rtn (make-symbol "RT")))
    `(let ((,rn '())
           (,rtn nil))
       (flet ((collect (form)
                (if (not (null ,rn))
                    (setf (cdr ,rtn) (cons form nil)
                          ,rtn (cdr ,rtn))
                  (setf ,rn (cons form nil)
                        ,rtn ,rn))
                form))
         ,@forms)
       ,rn)))

And now the above form does not signal an error and correctly returns ().

Note that the problem is with names and not just bindings. Consider this CL code:

(defvar *stashes* '())
(defvar *mark* nil)

(defun stash (name thing)
  ;; Stash something under a name
  (setf *stashes* (acons name thing *stashes*))
  (values name thing))

(defun retrieve (name)
  ;; Retrieve the value of a name, dropping everything stashed more
  ;; recently, and stopping at the mark, if any.
  (let ((mark *mark*))
    (labels ((rl (tail)
               (if (or (null tail)
                       (eq (first tail) mark))
                   (values nil nil)
                 (destructuring-bind ((n . v) . r) tail
                   (if (eql n name)
                       (progn
                         (setf *stashes* r)
                         (values v t))
                     (rl r))))))
      (rl *stashes*))))

(defmacro with-marked-stash (&body forms)
  ;; mark the stack of stashes for the dynamic extent of FORMS
  (let ((mn (make-symbol "MARK")))
    `(let ((*stashes* (cons ',mn *stashes*))
           (*mark* ',mn))
       ,@forms)))

In this code the marks on the stack of stashes established by with-marked-stash are not bound anywhere: they are just names. But it’s important to the correct functioning of the code that they are unique names. (There are better ways of doing this such as using a fresh cons for the mark: I just wanted an example where a name mattered other than as the name of a variable.)

The politically correct way of saying that we’re talking about names is to talk about ‘lexical context’ or ‘lexical information’: it’s the same thing but more confusing to those not initiated into the cult, which is always good.

The disadvantages of the CL macro system are this problem with hygiene and the lack of any clever tools to do pattern matching on macro forms. The second of these is easily overcome by using any of a number of tools, while the first is generally not a problem in practice: CL being a Lisp–2 (separate namespaces for functions and variables) helps here.

The advantage of the CL macro system is that there is no magic: macros get passed the things that the source code looks like — generally a structure whose interesting parts are lists and symbols — which you process using the normal list-processing tools to produce some other structure which is the expansion of the macro. It’s easy enough that you could write it yourself: there are no special opaque objects being handed around.

That being said, having a standard set of tools for pattern matching in macros and a way of dealing with the hygiene problems which is less ugly than in CL might well be worth the cost in transparency.

Macros in Scheme

I am not a native Scheme person, but it has clearly taken the whole hygiene thing very seriously: Scheme, as a set of languages, treats purity as much more than CL, which revels in being a fairly grungy language, does. However these posts are not about Scheme: the only reason I am mentioning it is to say that I have not cared at all whether anything here applies generally to Scheme or is specific to Racket.

Macros in Racket: baby steps

For a long time the only kind of macros that I’ve really been able to define in Racket are annoyingly trivial ones using define-syntax-rule, things like:

(define-syntax-rule (while test body ...)
  (let loop ()
    (when test
      body ...
      (loop))))

That’s all very well, but the ‘obvious’ (and obviously wrong) definition of collect then looks like this:

(define-syntax-rule (collecting body ...)
  ;; horribly wrong	
  (let ([s '()])
    (define (collect it)
      (set! s (cons it s))
      it)
    body ...
    (reverse s)))

(There’s no obvious way to build lists backwards in Racket: reversing the list is probably as cheap as anything). This is either introducing a spurious binding for s or not introducing a deliberate one for collect, and in fact, of course, it’s the latter.

Quite apart from this, define-syntax-rule gives the strong impression that it lets you write only the sort of macros that would give people who write C++ great pride: simple ones. (Actually you can do reasonably hairy things even with this because the pattern matching is very competent:

(define-syntax-rule (mlet ([var val] ...) body ...)
  ((λ (var ...) body ...) val ...))

is an implementation of simple let, for instance. Indeed we can defined named let as well:

(define-syntax-rule (nlet label ([var val] ...) body ...)
  (mlet ()
    (define (label var ...) body ...)
    (label val ...)))

What I can’t work out how to do is to make mlet do both things: I think this is too hard for define-syntax-rule although I might be wrong.)

But for a long time I was stuck with that: whenever I looked at Racket macros in more detail I walked into a wall of opaque terminology and just decided that I had better things to do that year. This year, I don’t.

Two desirable macros

There are many ways people use macros in Lisp: some of them are good. I decided that if I could write two macros and understand them then I would be well on my way.

  • collecting / collect. This is the macro given above in CL. It’s interesting not for what it does — the tail-pointer stuff is less interesting now than it once was and is hard to implement in Racket anyway — but because it introduces a binding: it is intentionally not completely hygienic, while having an essentially trivial expansion: no complicated destructuring is needed.
  • CL’s let, which I’ll call clet. This is interesting because it requires destructuring of arguments which is not completely simple, but it does not present problems of hygiene. The reason it’s not just a subset of Racket’s let is that CL allows variables with no initial value, which get bound to nil and should, I think, become undefined in Racket. So (clet ((x 1) y) body ...) should expand to (let ([x 1] [y undefined]) body ...) or something equivalent to that.

Here is a simple implementation of clet in CL, missing any error checking:

(defmacro clet (bindings &body forms)
  (multiple-value-bind (args vals)
      (loop for binding in bindings
            for consp = (consp binding)
            collect (if consp (first binding) binding) into as
            collect (if consp (second binding) nil) into vs
            finally (return (values as vs)))
    `((lambda (,@args) ,@forms) ,@vals)))

Like most macros in CL it’s not particularly pretty but it is reasonably clear what it does.

I will use these two macros as examples below.

Phases

To understand macros in any Lisp you need to develop a strong idea of the various ‘times’ that things happen and the relationships between them: for CL these are things like read time, macro expansion time, compilation time (compiler-macro expansion time), load time, run time and so on. Racket has formalised the parts of this after read time into a notion of ‘phase’:

  • phase 0 is run-time;
  • phase 1 is macro expansion time;
  • phase 2 would, I think, be macros used in macro expansion;
  • and so on.

However I am not sure how this ties in to read time: is that phase 1? For CL read time is before macro expansion time although the two are, or may be, interleaved at the granularity of forms (rather than a per-file or per-compilation-unit). Also there are negative phases which I don’t understand, although I think they must be to do with code which exists at macro expansion time (phase 1) wanting to make things available at run time (phase 0). All of this is integrated into the module system (and CL gets away without it mostly because it does not have a formalised module system).

Bindings exist at a phase, and the same name can have different bindings at different phases.

Modules can say what they provide at which phase, and, importantly, the racket module does indeed provide different things at different phases: if you look at it you’ll find:

(provide ...
         (for-syntax (all-from-out racket/base)))

Which means that, at phase 1, what is available is racket/base: a significantly smaller language than racket itself. If you need things in macros which are in racket but not racket/base you need to require them:

(require (for-syntax ...))

An example of this is first & rest, both of which are provided at phase 0 by racket but not at phase one: if you want them you need to say (require (for-syntax racket/list)).

Syntax objects

As in CL, Racket macros are source-to-source functions. The difference is that in Racket the source is represented by a syntax object and a macro needs to produce another syntax object, while in CL source is represented as it looks: usually as nested lists.

So then a Racket macro is simply a function which maps from syntax objects to other syntax objects. The reason for having an opaque syntax object is that it can carry around all sorts of information around with it, and in particular it can carry information about names, which help the system maintain hygiene. (There is also information about source location and so on, but this isn’t so important.)

So the Racket macro system needs tools to transform syntax objects into other syntax objects, ultimately by digging around inside them to find out what the source code actually was. This is necessarily more complicated than it is in CL both because the objects are opaque and because they contain information which is not present at all in the objects CL macros get.

Additionally, and mostly independently, there is a layer on top of this which does not exist in CL (without libraries) at all: pattern matching and template filling. This means that for many purposes you can write macros in Racket simply by specifying patterns that the source must match and filling templates with the results of those matches. This is a very nice way of writing macros, although it renders what is actually going on even more opaque. For a CL person, used to feeling the bits between their toes, this can be quite disconcerting at first since what is actually happening can become entirely obscure.

Syntax objects for the unwashed Lisp hacker

Well, of course it is possible to ignore all this terrifyingly modern pattern matching stuff and write macros almost the way you do in CL, and it’s worth doing that at least once, perhaps. So here is clet:

(require (for-syntax racket/list)
         racket/undefined)

(define-syntax clet
  (λ (stx)
    (define ctx (quote-syntax clet))
    (define top-level (syntax->list stx))
    (define bindings (second top-level))
    (define body (rest (rest top-level)))
    (define-values (args vals)
      (for/lists (as vs) ([binding (syntax->list bindings)])
        (define it (syntax->list binding))
        (if it
            (values (first it) (second it))
            (values binding (datum->syntax ctx 'undefined)))))
    (datum->syntax 
     ctx
     `((λ (,@args) ,@body) ,@vals))))

So how does this work? Well, it uses some functions provided by Racket to look inside the syntax object (getting the ‘datum’ in the syntax object) and in turn to construct a new one:

  • syntax->list takes a syntax object which wraps a proper list and unpacks one level of it, returning a list of syntax objects, or #f if it does not wrap a proper list;
  • datum->syntax takes a context object and a datum and wraps it into a syntax object, leaving any syntax objects in the datum as they are;
  • quote-syntax is like quote but it creates a syntax object, and this object contains the lexical information present in the source.

So the macro pulls apart the syntax object in a fairly straightforward way: making it into a list, extracting the second element and all the remaining elements, which will be the binding specifications, and then grinding over the binding specifications, using syntax->list both to work out if the bindings are a list or not and to extract the variable and value if it is, and then reassembles everything as a call to an anonymous function.

The critical trick is that the context that datum->syntax needs is a syntax object and you need to pick the right one: you can use the syntax object you got given, which provides the context of the place where the macro was expanded, or you can use a syntax object of your own devising which provides that object’s context. And in this case we want our own context, not the context of place where the macro was expanded. This is what ctx is for: providing a suitable context.

Notice the require:

  • we need racket/list at phase 1 (macro expansion time) because the macro uses first and so on;
  • we need racket/undefined at phase 0 (run time) as the expansion of the macro uses undefined.

So we can try this:

(clet ((x 12) y) (values x y))
12
#<undefined>
> (let ((undefined 'hello)) (clet (x) x))
#<undefined>
> (clet ((undefined 'hello)) (clet (x) x))
#<undefined>
> (clet ((x 1)))
λ: bad syntax in: (λ (x))
> (clet (1) 1)
λ: not an identifier, identifier with default, or keyword in: 1

The second and third examples show why we need the macro context: we don’t want a binding of undefined to alter what the clet picks as the undefined value. The fourth and fifth examples show that the macro isn’t very robust, and has terrible error reporting.

Some notes:

  • I’ve deliberately written (define-syntax clet (λ (stx) ...) rather than the more pleasant (define-syntax (clet stx) ...) to make it clear that clet is a function which transforms a syntax object;
  • but I’ve used internal define where in CL there would be let* or nested lets — I’m not sure why other than reducing indentation;
  • the destructuring of the syntax object is done in a way which is primitive even by the standards of CL;
  • it should be evident that the macro is not very robust — something like (clet ((x 1) 2) ...) will fail horribly;
  • it’s not much less clear than the CL version, although I think it is a bit less clear.

I am fairly but not completely sure that this macro is right: I am slightly confused by the handling of undefined: although it is easy to check, by wrapping clet into a module, that clients of that module don’t themselves need to import racket/undefined and do get the right initial values in forms like (clet (x) ...) I am still a bit queasy about what it’s doing.

What is very clear is that this macro is just horrible: even by the standards of CL macros it’s horrible, because there is so much explcit unpacking and repacking going on. Things would be even worse if there was any significant error checking. Something better than this is needed to deal with syntax objects, in a way that it isn’t needed for CL macros. In next week’s exciting episode I’ll look at ways of making this better.


Pointers

Writing ‘syntax-case’ Macros by Eli Barzilay. This was the article that first helped me understand what was going on.

Fear of Macros by Greg Greg Hendershott. This is an introduction to macros, and macros in Racket in particular, by the author of Frog.

The cult of programming

:: computer, lisp

Programming is not meant to be easy and it’s important to make sure that it is as cryptic as possible otherwise people other than cult members might be able to understand it. Of course, you also need to make sure it’s pure, because otherwise cult members will laughingly throw you into a pit full of spikes and the rotting remains of other heretics.


For instance, you can’t be writing this sort of thing:

(defun ss (n)
  (let ((s 0) (i 0))
    (tagbody
     loop
     (when (> i n) (go done))
     (setf s (+ s (* i i))
           i (+ i 1))
     (go loop)
     done
     (return-from ss s))))

This is just terrible code. Non cult members may well be able to understand it, and the cultists will have you in the pit before you know it.

You might think this was better

(defun ss (n)
  (loop for i from 0 to n
    summing (* i i)))

But in fact it’s far worse. Fellow cultists will definitely still be at the laughing and pit-throwing, and the others will certainly understand it and laugh at you because you don’t know the closed form.

Instead, you must write this:

(define (ss n)
  (let-values ([(a i l) (call/cc (λ (c) (values 0 0 c)))])
    (l (+ a (* i i))
       (+ i 1)
       (if (< i (- n 1))
           l
           (λ (a i l) a)))))

This is almost a perfect solution. It’s so achingly pure and cryptic that you will be immediately appointed king of the cult and be able to do your own laughing, and throw other members into pits you have first made them dig, for which they will thank you as they slide down the spikes. Non cult members stand essentially no chance of understanding what it does and sniping about the whole silly closed-form thing: certainly the only way they will be able to learn what it does is by first joining the cult, at which point, as king, you can just throw them straight into the pit.

It’s important you understand this.

The end of the world

:: stories

Investment bankers are often called ‘sharks’ and this is, in fact, a good description. There is nothing wrong with sharks: they are beautiful animals designed by billions of years of natural selection to do one thing extremely well. You can not expect a shark to be other than a shark: rather you must understand how sharks behave and arrange matters so as not to be eaten. Governments can do this for banks: they did it in 1933, after all, and it served us well for nearly 70 years. However governments entirely failed to do this after the events of 2008 for reasons of stupidity and corruption.

Rerooting Frog

:: computer

Frog wants to create blogs which hang directly under /. I want mine to live under a subdirectory, and to have all its data living under that directory. I’ve made some changes to Frog to support that. As of 20150702 these changes have been merged to the main frog repo: you no longer need to refer to mine, which is obsolete.

First

:: meta

I often find myself writing fairly substantial mail messages, posts or comments to posts, which inevitably get lost. This is a way to keep some of them, I hope.