Two sides to hygiene

:: lisp

It’s tempting to think that by being sufficiently careful about names bound by traditional Lisp macros you can write macros which are hygienic. This is not true: it’s much harder than that.

Hygienic macros

I do not fully understand all the problems which Scheme-style hygienic macros try to solve, and the implementation of the solutions is usually sufficiently difficult to understand that I have always been put off doing so, especially as the details of the implementation in Racket, the Scheme-related language I use most, seems to change every few years. I’m happy enough that I am mostly competent to write the macros I need in Racket, without understanding the details of the implementation.

Traditional Lisp macros are, to me, far more appealing because they work in such an explicit and simple way: you could pretty easily write a macroexpander which did most of what the Common Lisp macroexpander does, for instance. I have written several toy versions of such a thing: I’m sure most Lisp people have. Traditional Lisp macros are just functions between bits of language expressed explicitly as s-expressions: what could be simpler?

In fact I am reasonably confident that, if I had to choose one, I’d choose CL’s macros over Racket’s: writing macros in raw CL is a bit annoying because you need explicit gensyms and you need to do pattern matching yourself. But you can write, and I have written tools to make most of this go away. With these, writing macros in CL can often be very pleasant. And it’s easy to understand what is going on.

What is far harder though, is to make it completely hygienic. Here’s one reason why.

Several versions of a macro in Common Lisp

Let’s imagine I want a macro which allows you to select actions based on the interval a real number is in. It might look like this:

(interval-case x
  ((0 1) ...)
  ((1) 2) ...)
  (otherwise ...))

Here intervals are specified the way they are in type specifiers for reals:

  • (a b) where a and b are reals means \([a,b]\);
  • ((a) b) where a and b are reals means \((a,b]\);
  • and so on.

There can be only one interval per clause, for simplicity.

I will write several versions of this macro. For all of them I will use dsm and, later, metatronic macros to make things better.

First of all here’s a function1 which, given an interval specification, returns a form which will match numbers in that interval:

(defun compute-interval-form (v iv)
  (destructuring-match iv
    (((l) (h))
     (:when (and (realp l) (realp h)))
     `(< ,l ,v ,h))
    ((l (h))
     (:when (and (realp l) (realp h)))
     `(and (<= ,l ,v) (< ,v ,h)))
    (((l) h)
     (:when (and (realp l) (realp h)))
     `(and (< ,l ,v) (<= v ,h)))
    ((l h)
     (:when (and (realp l) (realp h)))
     `(<= ,l ,v ,h))
    (default
     (:when (member default '(t otherwise)))
     t)
    (otherwise
     (error "~S is not an interval designator" iv))))

A hopeless version

Here is a version of this macro which is entirely hopeless:

(defmacro interval-case (n &body clauses)
  ;; Hopeless
  `(cond
    ,@(mapcar (lambda (clause)
                (destructuring-bind (iv &body forms) clause
                  `(,(compute-interval-form n iv) ,@forms)))
              clauses)))

It’s hopeless because of this:

> (let ((x 1))
    (interval-case (incf x)
      ((1 (2)) '(1 (2)))
      ((2 (3)) '(2 (3)))))

So (incf x) where x is initially 1 is apparently neither in \([1,2)\) nor \([2,3)\) which is strange. This is happening, of course, because the macro is multiply-evaluating its argument, which it should not do.

An obviously unhygienic repair

So let’s try to fix that:

(defmacro interval-case (n &body clauses)
  ;; Unhygenic
  `(let ((v ,n))
     (cond
      ,@(mapcar (lambda (clause)
                  (destructuring-bind (iv &body forms) clause
                    `(,(compute-interval-form 'v iv) ,@forms)))
                clauses))))

Well, this is better:

> (let ((x 1))
    (interval-case (incf x)
      ((1 (2)) '(1 (2)))
      ((2 (3)) '(2 (3)))))
((2) (3))

but … not much better:

> (let ((x 1) (v 10))
    (interval-case (incf x)
      ((1 (2)) nil)
      ((2 (3)) v)))
2

The macro binds v, which shadows the outer binding of v and breaks everything.

A repair which might be hygienic

Here is the normal way to fix that:

(defmacro interval-case (n &body clauses)
  ;; OK
  (let ((vn (make-symbol "V")))
    `(let ((,vn ,n))
       (cond
        ,@(mapcar (lambda (clause)
                    (destructuring-bind (iv &body forms) clause
                      `(,(compute-interval-form vn iv) ,@forms)))
                  clauses)))))

And now

> (let ((x 1) (v 10))
    (interval-case (incf x)
      ((1 (2)) nil)
      ((2 (3)) v)))

Good. I think it is possible to argue that this version of the macro is hygienic, at least in terms of names.

A simpler repair using metatronic macros

Here is the previous macro written using metatronic macros:

(defmacro/m interval-case (n &body clauses)
  ;; OK, easier
  `(let ((<v> ,n))
       (cond
        ,@(mapcar (lambda (clause)
                    (destructuring-bind (iv &body forms) clause
                      `(,(compute-interval-form '<v> iv) ,@forms)))
                  clauses))))

This is simpler to read and should be as good.

An alternative approach …

Although it is not entirely natural in the case of this macro, many macros can be written by having the macro expand into a call to a function, passing another function whose body is the body of the macro as an argument. These things often exist as pairs of with-* (the macro) and call-with-* (the function).

We can persuade interval-case to work like that: it’s not a natural macro to write that way and writing it that way will end up with something almost certainly less efficient as (at least the way I’ve written it) as it needs to interpret the interval specifications at runtime rather than compile them2. But I wanted to have just one example.

Here is call/intervals, the function layer:

(defun call/intervals (n ivs/thunks)
  ;; Given a real n and a list of (interval-spec thunk ...), find the
  ;; first spec that n matches and call its thunk.
  (if (null ivs/thunks)
      nil
    (destructuring-bind (iv thunk . more) ivs/thunks
      (if (destructuring-match iv
            (((l) (h))
             (:when (and (realp l) (realp h)))
             (< l n h))
            ((l (h))
             (:when (and (realp l) (realp h)))
             (and (<= l n) (< n h)))
            (((l) h)
             (:when (and (realp l) (realp h)))
             (and (< l n) (<= n h)))
            ((l h)
             (:when (and (realp l) (realp h)))
             (<= l n h))
            (default
             (:when (member default '(t otherwise)))
             t)
            (otherwise
             (error "~S is not an interval designator" iv)))
          (funcall thunk)
        (call/intervals n more)))))

As well, here is a ‘nospread’ variation on call/intervals which serves as an impedence matcher:

(defun call/intervals* (n &rest ivs/thunks)
  ;; Impedence matcher
  (declare (dynamic-extent ivs/thunks))
  (call/intervals n ivs/thunks))

Now here’s the macro layer:

(defmacro interval-case (n &body clauses)
  ;; Purports to be hygienic
  `(call/intervals*
    ,n
    ,@(mapcan (lambda (clause)
                `(',(car clause)
                  (lambda () ,@(cdr clause))))
              clauses)))

So we can test this:

> (let ((x 1) (v 10))
    (interval-case (incf x)
      ((1 (2)) nil)
      ((2 (3)) v)))
10

So, OK, that’s good, right? This is another hygienic macro. Not so fast.

… which is not hygienic

> (flet ((call/intervals* (&rest junk)
           (declare (ignore junk))
           86))
    (interval-case 2
      ((1 2) 'two)))
86

Not so hygienic, then.

The alternative approach in Racket

Here is a similar alternative approach implemented in Racket:

(define (call/intervals n ivs/thunks)
  ;; Here ivs/thunks is a list of (iv thunk) pairs, which is not the same
  ;; as the CL version: that's because I can't work out how to do the
  ;; syntax rule otherwise.
  (match ivs/thunks
    ['() #f]
    [(list (list iv thunk) more ...)
     (if
      (match iv
        [(list (list (? real? l))
               (list (? real? h)))
         (< l n h)]
        [(list (? real? l)
               (list (? real? h)))
         (and (<= l n) (< n h))]
        [(list (list (? real? l))
               (? real? h))
         (and (< l n) (<= n h))]
        [(list (? real? l) (? real? h))
         (<= l n h)]
        [(or 'otherwise #t)
         #t]
        [_
         (error 'call/intervals "~S is not an interval designator" iv)])
      (thunk)
      (call/intervals n more))]))

(define (call/intervals* n  . ivs/thunks)
  ;; impedence matcher (not so useful here)
  (call/intervals n ivs/thunks))

(define-syntax-rule (interval-case n (key body ...) ...)
  (call/intervals* n (list 'key (thunk body ...)) ...))

And now:

> (call/intervals* 1 (list '(0 1) (thunk 3)))
3
> (interval-case 2
    ((1 2) 'two))
'two
> (let ([call/intervals* (thunk* 86)])
    (interval-case 2
      ((1 2) 'two)))
'two
> (let ([call/intervals* (thunk* 86)])
    (call/intervals* 2))
86

In Racket this macro is hygienic.

Two sides to hygiene

So the problem here is that there are at least two sides to hygiene for macros:

  • names they use, usually by binding variables but also in other ways, must not interfere with names used in the program where the macro is used;
  • the program where the macro is used must not be able to alter what names the macro refers to mean.

In both cases, of course, there need to be exceptions which are part of the macro’s contract with its users: with-standard-io-syntax is allowed (and indeed required) to bind *print-case* and many other variables.

I think almost everyone understands the first of these problems, but the second is much less often thought about.

Dealing with this problem in Common Lisp

I think a full solution to this problem in CL would be very difficult: macros would have to refer to the names they rely on by names which were somewhow unutterable by the programs that used them. Short of actually writing a fully-fledged hygienic macro system for CL this sounds impractical.

In practice the solution is to essentially extend what CL already does. For symbols (so, names) in the CL package there are strong restrictions on what conforming programs may do. This program is not legal CL3 for instance:

(flet ((car (x) x))
  ... (car ...))

So the best answer is then, I think, to:

  • use packages with well-defined interfaces in the form of exported symbols;
  • disallow or strongly discourage the use of internal symbols of packages by programs which are not part of the implementation of the package;
  • and finally place restrictions similar to those placed on the CL package on exported symbols of your packages.

Note that package locks don’t answer this problem: they usually forbid the modification of various attributes of symbols and the creation or deletion of symbols, but what is needed is considerably stronger than that: it needs to be the case that you can’t establish any kind of binding, even a lexical one, for symbols in the package.

Is this a problem in practice? Probably not often. Do I still prefer traditional Lisp macros? Yes, I think so.


  1. This function what you would want to make more complicated to allow multiple intervals per clause. 

  2. This interpretation could be avoided by havig the compiler turn the interval specifications into one-argument functions. I think it’s still not a natural way to write this macro. 

  3. Assuming that car means ‘the symbol whose name is "CAR" in the "COMMON-LISP" package’.