Two understandable deficiencies in Common Lisp

2022-03-22 :: lisp, programming

Common Lisp is, I think, a remarkably pleasant language, despite what some people like to say. Here are two small deficiencies, both of which are understandable in terms of the history of CL, and both of which ultimately hurt naïve programmers working in CL.

The default floating-point type is `single-float`

There are two things that make this true:

*read-default-float-format* is initially single-float, which means that, unless it is changed, 1.0 reads as 1.0f0, a single float¹;
The float function will convert to a single float unless it is given a prototype which is not a single float: (float 1) is 1.0f0, while to get a double float you would need (float 1 1.0d0).

In addition things like with-standard-io-syntax bind *read-default-float-format* to single-float, so you have to do a little more work to make doubles the default.

I think there are probably several historical reasons why this default was chosen:

a long time ago memory was very expensive and single floats take, usually, half the memory of double floats, thus pushing people towards single floats;
a long time ago, perhaps, on some machines, single float operations were significantly faster than double float operations even before possible float consing was taken into account;
Lisp hardware companies with significant influence on the standard, notably Symbolics, made hardware which allowed single (32 bit) floats to be immediate objects, while double floats were not, and had simple-minded compilers which were not capable of optimizing double float operations, thus making double float arithmetic extremely slow compared to single float arithmetic, and these companies wanted their machines to seem fast (they never, really, were) for naïve users;
it was not clear that implementations would choose single-float to mean ‘single precision IEEE 754 float’ and double-float to mean ‘double precision IEEE 754 float’, for instance it’s perfectly legal to have the short-float type mean single precision IEEE 754 and all of the single-float, double-float and long-float types mean double precision IEEE 754;
it wasn’t even even clear that IEEE 754 would come to dominate how machines implement floating-point: VAXes didn’t, and other machines of interest at the time also did not.

So there are good historical reasons for this. However all implementations I’m aware of now translate short-float to mean single-float, single-float to mean IEEE 754 single precision, double-float to mean IEEE 754 double precision and long-float to be the same as double-float.

So what is the problem with the default float type being single-float in the modern world? The answer is

> (log (/ 1 single-float-epsilon) 10)
7.22472

In other words, single precision IEEE 754 arithmetic has about 7 significant figures of precision. For many purposes, and especially for naïvely-written code that’s at best marginal and at worst less than that. On the other hand

> (log (/ 1 double-float-epsilon) 10)
15.954589770191001D0

which is almost 16 significant figures of precision, more than twice that of single precision.

That’s why the default should have been double precision: it makes naïve code more likely to work, and people who are writing non-naïve code can use single precision if they need it.

The `CL-USER` package is defined in an implementation-dependent way

From the spec:

The COMMON-LISP-USER package is the current package when a Common Lisp system starts up. This package uses the COMMON-LISP package. The COMMON-LISP-USER package has the nickname CL-USER. The COMMON-LISP-USER package can have additional symbols interned within it; it can use other implementation-defined packages.

(My emphasis.)

What this means is that when you start a CL environment, the current package may have all sorts of implementation-dependent symbols visible in it. You can see why this happened: if you’re implementing Super-Whizz-Bang CL which has all sorts of magic extra features, you want at least some of those features to be immediately available to users, rather than requiring them to pore over boring manuals to find them.

But for users, and especially for naïve users, it’s a terrible choice: naïve users don’t know about packages so they write their programs in CL-USER. And they also don’t really know which symbols available in CL-USER come from CL and are thus standard parts of the language, and which come from one of Super-Whizz-Bang CL’s implementation packages, and are not standard parts of the language. So their programs turn into a mess where the portable parts are not distinct from the non-portable parts. The way the CL-USER package is defined thus makes it harder for to write programs whose non-portable parts are well-isolated, and ultimately hurts the language.

This is a direct conflict between implementors and users: implementors both want their extra features immediately available so their implementation is shinier and want to encourage users to use these extra features in a way which makes it hard to move their programs to other implementations; users, when they think about it, generally don’t want this second thing, at least.

Instead, the language should have defined CL-USER as a package which only used CL, and perhaps have defined another standard package, perhaps IMPL-USER, which was defined the way CL-USER is today.

Can these be fixed?

While both of these problems could be fixed without changing the standard, I don’t think either can realistically be fixed.

For the single-float problem there is nothing to stop implementations simply defining short-float to mean IEEE 754 single precision and all the other types to mean IEEE 754 double precision. But all the existing code which assumes otherwise will then probably break in exciting ways. So this is unlikely to happen I expect.

The CL-USER problem could be fixed if implementations agree to define CL-USER to use only CL as it is allowed to do, and perhaps to define an IMPL-USER package as above. Of course that will make implementations slightly less convenient to use, so the chances of it happening would be small, even if implementors actually talked to each other in any useful way which I suspect they no longer do. Worse than that, this change will break many programs written by naïve users which live in CL-USER, and there are almost certainly lots of those.

A moment of convenience, a lifetime of regret, as the old saying goes.

An earlier version of this article had single floats written as, for instance 1.0s0: that’s wrong, those are short floats, single floats are 1.0f0 for instance. These are almost certainly the same type on any current implementation (and I think on any implementation I have ever used, hence the mistake) but they don’t have to be. Thanks to Prem Nirved for finding this stupidity. ↩

The default floating-point type is single-float

The CL-USER package is defined in an implementation-dependent way

Can these be fixed?

The default floating-point type is `single-float`

The `CL-USER` package is defined in an implementation-dependent way