4.1072 Scanning Microfilm (1/75)

Elaine Brennan & Allen Renear (EDITORS@BROWNVM.BITNET)
Fri, 22 Feb 91 17:41:44 EST

Humanist Discussion Group, Vol. 4, No. 1072. Friday, 22 Feb 1991.

Date: Thu, 21 Feb 91 17:52 CST
From: Terrence Erdt <ERDTT@PUCAL>
Subject: Scanning microfilm (83 lines)

During past discussions of optical character recognition, several
Humanists asked about scanning microfilm. The medium is so ornery to
use, of course, that even many computer-phobic scholars would prefer
machine-readable copy.

If microfilm were scanned and converted to digital form, then
character recognition might become possible, along with automatic
indexing and the other conveniences that computers offer.

Recently I learned of a company near Philadelphia, Pennsylvania (USA)
that had scanned microfilm copies of the _Pennsylvania Gazette_, a
newspaper that Benjamin Franklin published in the eighteenth century. I
visited the company's offices and found some rather impressive equipment.

The hardware consisted of a box about fifteen inches high and thirty-six
inches long, which held the reel of microfilm. At the operator's
command, the film ran past a lens, and an image appeared on an oversized,
high-resolution monitor attached to a 386 workstation. A smaller box,
called an image "optimizer," sat on top of the scanner. I was told that
the optimizer was a by-product of the technology used to enhance
satellite photos, and that the scanner was produced by a company known
for its bombsights, the irony of which might have given pause to Emerson
and Thoreau.

The image of the newspaper could be stored in any of several formats
(TIFF, PCX, and so forth) on either magnetic or optical drives.
Operators entered keywords, subjects, and other access information into
a file that linked to the image database, so that segments of it could be
displayed as needed. The database contained, I was told, about 50,000
scanned images; the name index alone contained over a million items. I
have no idea what the charges are for such a remarkable feat.
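The post does not say how the keyword file was organized. As a rough
illustration only, the Python sketch below assumes a simple inverted
index in which each hand-entered keyword points to the scanned image
files it describes. The class and field names (ImageRecord, ImageIndex,
issue_date) and the sample file name are hypothetical, not taken from the
system that was demonstrated.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    """One scanned frame. All field names here are hypothetical."""
    path: str        # image file, e.g. TIFF or PCX, on magnetic or optical storage
    issue_date: str  # issue of the newspaper the frame came from (YYYY-MM-DD)
    page: int


class ImageIndex:
    """Keyword index that points to stored images rather than to text."""

    def __init__(self):
        # keyword (lowercased) -> set of image records tagged with it
        self._postings = defaultdict(set)

    def add(self, record, keywords):
        # Operators supply the access terms by hand; no OCR is involved.
        for word in keywords:
            self._postings[word.lower()].add(record)

    def lookup(self, *words):
        # Return the images tagged with every requested keyword,
        # ordered by issue date and then page number.
        if not words:
            return []
        sets = [self._postings.get(w.lower(), set()) for w in words]
        hits = set.intersection(*sets)
        return sorted(hits, key=lambda r: (r.issue_date, r.page))


if __name__ == "__main__":
    index = ImageIndex()
    # Hypothetical file name and date, for illustration only.
    front = ImageRecord("gazette_frame_0001.tif", "1730-01-01", 1)
    index.add(front, ["Franklin", "advertisements"])
    print(index.lookup("franklin"))
```

The point of the design is that the searchable text is only the index;
the newspaper itself stays an image and is fetched for display once a
lookup finds it.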

The optimizer automatically adjusted the image derived from the
microfilm, eliminating many of the defects one typically encounters:
scratches on the film, smeared ink, print-through. The operator could
override the "improvements" the device imposed and manually edit the
image using a "paint" program.

A representative of a company that sells the microfilm scanner and other
OCR products was present during the demonstration; he told me that he is
preparing to offer an optical character recognition service to
complement the image-capture service. We agreed, however, that
Franklin's newspaper (whether or not on microfilm) would likely defy the
current state of the art in OCR; the broken typefaces and uneven lines
would outmatch today's software.

Short of rudely turning machines around, or upside down, to look for
manufacturers' logos, I could not determine the name of every piece of
equipment. The core equipment was a Mekel M400 Roll Film Scanner. The
representative of the company that sells the hardware told me that a
complete system, including a 386 workstation, sells for about
$70,000 (US).

The Commission on Preservation and Access has urged that microfilm
remain, for the present, the primary medium for preserving books and
other paper publications, until computer-based preservation systems
become reasonably standardized. The Commission's newsletter, which is
free, frequently reports on the preservation research the organization
is funding. For further information, contact the Commission at
CPA@GWUVM or:

The Commission on Preservation and Access
1785 Massachusetts Ave., NW Suite 313
Washington, DC 20036-2117 USA

ph.: (202) 483-7474

Humanists wanting the addresses and phone numbers of the service for
creating the image database or of the distributor for the hardware are
invited to write me directly.

Terrence Erdt e-mail: erdtt@pucal ph: (219) 989-2659

Purdue University Calumet
Hammond, IN USA