Macros in Racket, part three: checking boolean operators

2015-12-12 :: computer, lisp

I wanted to see if I could write a mildly complicated macro in Racket without becoming too confused. I can, although I am not sure it is terribly idiomatic.

This is the third part of a series on writing macros in Racket for someone used to Common Lisp, although it is mostly independent of the previous parts. The previous parts are part one & part two.

One of the nice things about Lisp-family languages is that you can write your own control constructs, and it’s essentially easy to do so: if when did not exist then you could write it:

(define-syntax-rule (when test form ...)
  (and test
       (begin form ...)))

This kind of extensibility is one of the wonders of Lisp and Scheme: it’s tempting to say that it makes them better than programming languages which can’t do this but that’s not correct: it makes them incomparable to such languages: Lisp¹ programs can reason about themselves and often do². Everything about Lisp really leads to this ability.

When I taught (Common) Lisp to people one of the things I would try to get across was this ability of macros to extend the control constructs in the language: people often thought of macros as a way of essentially inlining code³, but that’s not what they’re actually good for. If you can add control constructs to your language, then you can make a new language, and that’s what Lisp macros are about, and therefore what Lisp is about.

A good way to get this across to people is to pretend that Lisp doesn’t have some control construct, and write it as a macro. This is easier than inventing new control constructs both because it doesn’t require thinking of a domain where they might be useful and because the existing control constructs have clear semantics. Reimplementing existing control constructs also demonstrates how the language is already built up from a more primitive language by macros and how the approach to solving problems in Lisp is to design and implement a language in which to talk about the problem, where that language is seamlessly built on the underlying Lisp, and can inherit all of its power and flexibiliy, including the ability to extend the language.

An advantage of reimplementing existing control constructs for teaching Lisp is that you can compare the new construct to the existing one, and with some small constraints you can do this exhaustively, so you can know whether you have actually implemented it right. This is, obviously, not possible in general, but if the operator has trivial syntax (so not cond) and if you limit the arguments of the operator to booleans then you can enumerate all the possible arguments in the obvious way, and so long as it returns a result for all combinations of arguments (does not fail to halt in other words) and is deterministic then there are only two things you need to check:

does the operator produce the same result for all combinations of arguments (\(2^n\) possibilities for \(n\) arguments) as the existing one?
does the operator evaluate its arguments the same number of times as the existing one for all these combinations?

So, for instance, if takes three arguments (in Racket) and should evaluate the first exactly once, and the others at most once, as well as returning the correct value.

Obviously such a check is not a full check of the operator — it does not tell you what it does with non-boolean arguments for instance. But I was interested in writing the check largely because it’s clearly a reasonably hairy macro which I know how to write in CL and wanted to see if I could write in Racket (I’m not very likely to teach people Lisp again).

What the macro needs to do

The idea is that to compare two boolean operators o1 and o2 which take n arguments you need to generate code which looks like this:

(for/and ([c (expt 2 n)])
  (let ([a1 (bitwise-bit-set? c 0)] ...)
    (let ([o1c1 0] ...)
      (let ([o2c1 0] ...)
        (and (eq? (o1 (begin (set! o1c1 (+ o1c1 1)) a1) ...)
                  (o2 (begin (set! o2c1 (+ o2c1 1)) a1) ...))
             (= o1c1 o2c1) ...)))))

So a1 is the first argument, o1c1 counts how many times o1 evaluates it, and o2c1 counts how many times o2 evaluates it, and so on. I decided to compare the operators with eq? rather than eqv? for no very good reason except that it works for operators whose results are booleans, which is what I was interested in. I should almost certainly use eqv? I think — certainly the -equivalent in the name would imply that — but I’m not.

It’s clear that a loop like that checks all of the \(2^n\) possibilities for the arguments, where each argument can be either #f or #t only. So this does an exhaustive check of all the possibilities, and provided o1 and o2 are deterministic and halt on all their arguments it will tell you whether they are equivalent.

And finally, this must be written as a macro, because the operators it is testing are themselves not generally functions: in particular things like if and or are obviously themselves not functions.

Things I did not know how to do

The big thing I didn’t know how to do here was to make up new identifiers: all the counters need to be created, and possibly also the argument names. In CL you’d do this with make-symbol or gensym or something like that. Assuming I want to use syntax-case rather than writing a CL-style construct-the-form-with-backquote-and-use-datum->syntax macro (which I very much do want to do) then there are two problems:

constructing the names of the counters;
making them available as pattern variables.

Well, (2) is easy: you can use nested syntax-cases, or equivalently but much more prettily, with-syntax to bind the pattern variables. And it turns out that with-syntax is willing to do a lot of work on your behalf: if you give it something which is not a syntax object it will massage it into one for you. So, in particular, this works:

(with-syntax ([(o1c ...) (list ...)])
  ...)

It takes the list it is given, turns it into a syntax object (with datum->syntax I suppose) and then does the matching. So you can be really lazy here: all you need to invent is a list of identifier syntax objects, and with-syntax will do the rest, making the program a lot less noisy. This is a really neat feature, although it might lead you to get confused about what is, and what is not, a syntax object I suppose. Anyway, I used it ruthlessly.

So this leaves (1). You could obviously do this with something like (datum->syntax ctx (string->symbol (format ...))), but Racket provides a nice shorthand for that in the form of format-id: (format-id ctx "~a-count" v) will construct an identifier syntax object from v using ctx as lexical context. And it will do the appropriate magic if v is an identifier syntax object: extract the symbol from it and use it as the argument to format in the appropriate way.

So it looks pretty straightforward to construct lists of identifiers and bind them to pattern variables. The final thing that confuses me is what lexical context to use for the identifiers. The macro should be hygenic, which means they can’t have the context of the syntax object it is working on, but I think can have more-or-less any other context where they have no existing meaning: I just invented an object for them, which I think is safe, although I am a bit confused about this.

What users see

I spent a really long time stuck on what the syntax of the macro should be: this is entirely stupid because it just does not matter that much. The reason I got stuck is that it would matter if this was a real library and I am constitutionally incapable of writing things without worrying about that kind of thing. Eventually I decided that it would be best if the user provided the argument names as a list, because they generally make sense to users and because I didn’t want to get into something which looked as if you could pass it an integer when in fact what it needs is a literal integer. So I decided on a syntax like this:

(boolean-operators-equivalent? o1 o2 (a1 ...))

So, for instance:

(boolean-operators-equivalent? if my-if (test then else))

I still don’t really like this; but I’m just playing so, well, it will do.

Additional cleverness

I wanted to report syntax errors in a reasonable way: apparently the proper way to do this is using syntax-parse but I am not ready to understand that yet, so I used wrong-syntax and the current-syntax-context parameter to get reasonable-looking errors.

I thought it would be nice to be able to report failures of equivalence, so there is a parameter which controls that and the expansion of the macro includes a check for the parameter and prints the failed cases if it’s true. All this happens at run time (phase 0) of course.

The macro itself

So, finally, here it is.

(require (for-syntax (only-in racket/syntax format-id
                              current-syntax-context wrong-syntax)))

(define boe-report-failure? (make-parameter #f))

(define-syntax (boolean-operators-equivalent? stx)
  ;; Given the names of two boolean operators and a list of argument
  ;; names, expand to a form which tests that they are equivalent, by
  ;; evaluating the with arguments bound to all the combinations of #t
  ;; and #f, and also checking that they evaluate the same arguments
  ;; in each case.
  ;;
  (parameterize ([current-syntax-context stx])
    (syntax-case stx ()
      [(_ o1 o2 (v ...))
       (let* ([vars (syntax->list #'(v ...))]
              [nvars (length vars)])
         ;; This check could be a guard, but we need the bindings
         ;; anyway, so.
         (for ([var vars])
           (unless (identifier? var)
             (wrong-syntax var "not an identifier")))
         ;; vars is now a list of identifiers, and nvars is how many
         ;; there are.  We need to construct syntax for check
         ;; variables for each var and and operator, as well as
         ;; construct 2^n and a list of bit numbers.]  This is being
         ;; fairly fast and loose: it turns out that various things
         ;; get automagically converted into syntax objects, and I
         ;; have not cared about the context for numbers (what is
         ;; it?).  In general I am a bit confused about what the
         ;; context should be here, but it clearly should *not* be
         ;; stx.
         ;;
         (with-syntax ([(o1c ...) (for/list ([v vars])
                                    (format-id #'boe "~a-1-eval-count" v))]
                       [(o2c ...) (for/list ([v vars])
                                    (format-id #'boe "~a-2-eval-count" v))]
                       [2^n (expt 2 nvars)]
                       [(b ...) (for/list ([i nvars]) i)])
           ;; And now just write the pattern we want.  '...' is pretty
           ;; clever, it turns out
           #'(for/and ([c 2^n])
               (let ([v (bitwise-bit-set? c b)] ...)
                 (let ([o1c 0] ...)
                   (let ([o2c 0] ...)
                     (or (and (eq? (o1 (begin (set! o1c (+ o1c 1)) v) ...)
                                   (o2 (begin (set! o2c (+ o2c 1)) v) ...))
                              (= o1c o2c) ...)
                         (begin
                           (when (boe-report-failure?)
                             (eprintf "Not equivalent:~% ~a~% ~a~%"
                                      (list 'o1 `(,v ,o1c) ...)
                                      (list 'o2 `(,v ,o2c) ...)))
                           #f))))))))]
      [else
       (wrong-syntax #'else "expecting o1 o2 (a1 ...)")])))

To my astonishment, this worked pretty much first time (it did not initially have the wrong-syntax stuff, but this was easy compared to the rest of it):

> (define-syntax-rule (if/broken test then else)
    (or (and test then) else))
> (boe-report-failure? #t)
> (boolean-operators-equivalent? if if/broken (test then else))
Not equivalent:
 (if (#t 1) (#f 1) (#f 0))
 (if/broken (#t 1) (#f 1) (#f 1))
#f

The macro, complete with some tests and other infrastructure can be found here⁴.

Notes and queries

I still don’t know whether this is really idiomatic Racket, although I am reasonably happy that I understand what is going on. There are a couple of things I am not sure about:

is the context for the count variables right? I think it is, but I am not sure;
the macro relies heavily on Racket’s extremely smart behaviour with ... — I am still unclear just how smart this is and whether I am relying on things which are not actually specified to happen;
similarly it relies on with-syntax being willing to convert things to syntax objects for you, which I am not sure is safe.

However, even with these worries, I think it’s pretty clear that Racket macros are significantly nicer than CL macros, if also significantly more opaque.

I am going to use ‘Lisp’ to mean ‘Lisp-family’ from now on. This is not meant to denigrate Scheme — this post is about Racket, after all — I just need a term which is not too clumsy. ↩
Of course, programs in other languages often do end up reasoning about themselves: people end up writing little languages all the time. But you only have to look at most examples of this sort of thing to realise how far ahead Lisp is: I’m currently having to deal with a system whose configuration files are in a mutant version of Windows ini file syntax, with a preprocessor which is entirely unaware of that syntax, and an entire other language which lives in strings in the base language. The preprocessor does not know about the string syntax so it pokes down into this inner language as well. I’d like to say that Greenspun’s tenth law applies, but that would imply a level of sophistication entirely missing in this horrible thing: all I want to do is leave this job and never think about it again. ↩
Macros were often used to inline code in the days of primitive compilers of course, but that’s a long time ago now. ↩
I may move it somewhere more permanent in due course, so bookmark this at your peril. ↩