:: computer, photography

The UK keeps its laws on vellum. This seems to be a ludicrously archaic thing to do: is it?

Don’t preserve physical artifacts: preserve information

People who deal with archives are used to dealing with physical objects and worrying about their longevity. So they worry about how long paper and vellum last, what their decay mechanisms are and how they can be minimised. Everything is kept in controlled conditions so that the physical objects last as long as they can. Thus it is tempting to think that preserving information is the same thing as preserving the physical objects in which it resides: to preserve digital information you must preserve the media (tape, disks and so on) on which it resides. But we know that these media have rather short lifetimes, perhaps a few tens of years at the outside, and even when the media survive, there may be no way of reading them since the infrastructure on which they relied has gone.

This is, of course, confused: to preserve information you do not need to preserve the media on which it resides for any length of time. Since digital information can be copied without loss (or with a very low chance of loss), what you do instead is repeatedly copy the information onto current media. Preserving information is not the same as preserving physical artifacts: rather than a sacred disk rotting in a vault you keep the data spinning all the time on many copies of current media. I have files which originated on Fujitsu Eagles: I doubt there are very many Eagles still spinning or machines which can use them, but the information isn’t in any danger of being lost.
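As a minimal sketch of what one step of 'copy the information onto current media' might look like, here is a Python fragment which copies a file and checks that the copy is bit-for-bit identical by comparing SHA-256 digests. The function names `sha256_of` and `migrate` are my own inventions for illustration, and a real archive would do a great deal more (many copies, many sites, periodic re-checking): this only shows the copy-then-verify idea.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(source: Path, destination: Path) -> str:
    """Copy source to destination and verify the copy (hypothetical helper).

    Returns the digest of the verified copy, which can be recorded
    and checked again at the next migration.
    """
    expected = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    actual = sha256_of(destination)
    if actual != expected:
        raise OSError(f"copy of {source} is corrupt: {actual} != {expected}")
    return actual
```

Keeping the returned digest alongside the data means the next migration, years later, can tell whether the old copy had already decayed before it was copied.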

Don’t preserve information: preserve physical artifacts

Everything above is wrong, because it makes a critical assumption which is not true:

You can always keep information on current media.

This is true only if you are continually working on the system: in order to keep information spinning you need to be willing to buy new systems, transfer the information to the new systems, and keep the power on. But there is no evidence that we can keep the power on for any length of time, and plenty of evidence that we can’t.

This isn’t just about a possible collapse of advanced civilisation, although archivists should worry about that: it’s happened before, and there is no reason to believe it won’t happen again. If we go through a period of several hundred years where our society retreats to some preindustrial (or just pre-1970) level, how much of our digitally-stored information will survive? My guess is that almost none will. And such a collapse is likely.

But much less than that is needed for information to be lost. Consider some large scientific data set — climate data for instance. What happens if political power gets into the hands of people for whom that data is inconvenient, and who remove funding from the organisations which look after that data? It may persist for a while, on ageing disk arrays and tapes, until enough of the redundancy goes away; it may persist for a while even after the power is removed from the systems which hold it. But it will not persist when the rent isn’t paid on the buildings in which those systems live. Within quite a short time that information will be irretrievably lost.

The archivists turn out to be right: if you want to preserve information it needs to live on media which remain readable for long periods of time with minimal requirements. In particular there must be no requirement for frequent replacement of hardware, for human intervention, or for power. Choosing a medium, samples of which have already survived for long periods, is a good idea as well. Vellum is not such a bad choice if you only need to preserve a small amount of information. Large scientific data sets present a different problem, but ‘just keep the data spinning’ is probably not a very good solution.