5.0207 Archiving Etexts (1/72)

Elaine Brennan & Allen Renear (EDITORS@BROWNVM.BITNET)
Tue, 2 Jul 91 15:26:14 EDT

Humanist Discussion Group, Vol. 5, No. 0207. Tuesday, 2 Jul 1991.

Date: Mon, 01 Jul 91 18:02:20 EDT
From: Richard Ristow <AP430001@BROWNVM>
Subject: Re: More prima facie objections to ejournals

In Humanist 5.0178, Stevan Harnad <harnad@Princeton.EDU>
wrote on Thu, 20 Jun 91 19:58:23 EDT,
( ... )
>> 4. Librarians worry a great deal about archiving and permanence. Who
>> can say, they say, that the files of PMC or PSYCOLOQUY will remain
>> available for a great deal of time, if the editors decide to fold up
>> shop. Who will undertake to make sure this stuff exists? Further,
>> who will ensure that it exists in a readable format? The world is
>> full of examples of "high-tech" media of 10 and 20 years ago which
>> are unreadable, because the equipment which was used at the time the
>> information was created, is no longer maintained. This suggests a
>> systematic re-copying or updating program. NASA, for example, copies
>> all its satellite data rotationally every 3 years or so onto new and
>> compressed storage media (state of the art) so that the data can be
>> retrieved. This is a federally budgeted, planned, and highly
>> expensive process.
> Backups, backups, backups. Trivial.

OUCH! I agree that this is a situational rather than a *prima facie*
objection, but it is NOT a trivial one. Is Harnad falling into the
old trap of equating "there are no conceptual problems" with "trivial"?
Perhaps some librarians would like to post about how well their
collections would be preserved if preservation required inserting each
book, once every ten years, into a machine costing a few thousand
dollars and taking 5 to 10 minutes to 'preserve' each book. Now,
what if each such operation required also supplying enough paper for
a new copy of the book; and the 'copy' came out bound, but with the
cover and spine of the book blank? What if the machine itself had to
be replaced every ten years by a new model; deciding *WHICH* new model
took a review by technical specialists; and the old model had to be
retained for ten more years, because preservation required inserting
each book into the model of machine in which it had last been 'preserved',
but the 'copy' came out of the new model? Finally (here's the real
trap), what if putting off 'preserving' a book caused no harm except
a chance of, say, 0.5% per month that all pages in the book would
become completely blank?
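
To put rough numbers on that last trap, here is a minimal Python sketch.
It assumes the 0.5% monthly chance applies independently in each month a
book goes unpreserved; the function names and the 1000-book collection
are illustrative only, not taken from any real system.

```python
# Assumption: every month of delay independently carries the 0.5% chance
# of total loss from the thought experiment above.
MONTHLY_LOSS = 0.005


def survival_probability(months_deferred, monthly_loss=MONTHLY_LOSS):
    """Chance that a book is still readable after being left
    unpreserved for the given number of months."""
    return (1.0 - monthly_loss) ** months_deferred


def expected_losses(collection_size, months_deferred):
    """Expected number of books gone blank in a collection of the
    given size after the same delay."""
    return collection_size * (1.0 - survival_probability(months_deferred))


if __name__ == "__main__":
    for months in (1, 12, 60, 120):
        print(months, "months:", round(survival_probability(months), 3))
    print("expected losses per 1000 books after 120 months:",
          round(expected_losses(1000, 120)))
```

Under these assumptions a book left alone for one year survives with a
probability of about 94%, and one left alone for ten years with a
probability of about 55%. No single month of delay looks dangerous,
which is exactly why the trap works.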

A ten-year lifetime is considered long for machine-readable media; a
100-year life is considered minimal for archival preservation. It can
become impossible to find a machine that will read an old medium
(has anybody seen a 7-track half-inch magnetic tape drive recently?).
We should not give up on e-journals because archiving is a problem,
but if we wave off archiving with words like 'trivial', scholars in
fifty years will be cursing the day our networks went on line.

I'm open to correction, but I suggest that the best present machine-
readable archive is a clean printout on acid-free paper stored using
standard preservation techniques. Preservation of paper is far better
understood than preservation of machine-readable media will be for
quite a while yet; and since print will continue to be used for human-
readable information, scanners that can read it are not likely to
become obsolete soon. Tagged text is a better archival medium than is
formatted text, because structure is preserved; and all text, including
tags, must be restricted to an agreed-upon displayable archival character
set. (By the way, the machine-internal representation of this set --
ASCII, EBCDIC, Unicode, ISO10646, or whatever -- does NOT need to be
agreed upon or fixed.) Details of the printed representation must be
established thoughtfully; for example, if it is desired to distinguish
between an umlaut and a diaeresis on the letter 'o', both must be
represented by entity references.
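
As a rough illustration of the entity-reference idea, here is a minimal
Python sketch. Everything in it is an assumption made for the example:
the archival set is taken to be printable ASCII, the entity names
"o.umlaut" and "o.diaeresis" are hypothetical, and the text is assumed
to arrive with each marked letter already identified by whoever
prepared it, since the distinction cannot be recovered from the glyph
alone.

```python
import string

# Assumption: the agreed archival set is printable ASCII plus space
# and newline. A real agreement might choose a different set.
ARCHIVAL_CHARS = set(
    string.ascii_letters + string.digits + string.punctuation + " \n"
)

# Hypothetical entity names, invented for this sketch.
ENTITY_NAMES = {
    ("o", "umlaut"): "o.umlaut",
    ("o", "diaeresis"): "o.diaeresis",
}


def encode_for_archive(pieces):
    """Build archival text from a list of pieces. A piece is either a
    plain string or a (letter, mark) tuple naming a marked letter."""
    out = []
    for piece in pieces:
        if isinstance(piece, tuple):
            # Marked letters are always written as entity references.
            out.append("&" + ENTITY_NAMES[piece] + ";")
            continue
        for ch in piece:
            if ch not in ARCHIVAL_CHARS:
                raise ValueError("not in archival set: %r" % ch)
        # A literal ampersand is escaped so it cannot be mistaken for
        # the start of an entity reference.
        out.append(piece.replace("&", "&amp;"))
    return "".join(out)


if __name__ == "__main__":
    print(encode_for_archive(["sch", ("o", "umlaut"), "n"]))
    print(encode_for_archive(["co", ("o", "diaeresis"), "perate"]))
```

The two words print with different entity references even though the
marked letter looks the same on the page, and any character outside
the agreed set is rejected rather than passed through silently.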

Preservation is easy to do badly, because the penalty for bad work comes
long after the crucial omission or mistake. As the primary representation
of more text, and more important text, becomes computer-based, archiving
becomes correspondingly more important. Habits based on print (where
text is embodied in physical objects) will not be sufficient. Scorning
the necessary careful work as 'trivial' or otherwise guarantees a bad

Richard Ristow AP430001@BROWNVM.BROWN.EDU Bitnet: AP430001@BROWNVM