3.121 Kurzweil 5100 scanner (70)

Willard McCarty (MCCARTY@VM.EPAS.UTORONTO.CA)
Wed, 14 Jun 89 21:20:10 EDT


Humanist Discussion Group, Vol. 3, No. 121. Wednesday, 14 Jun 1989.

Date: Wed, 14 Jun 89 12:53 EST
From: <ERDT@VUVAXCOM> (Terrence Erdt)
Subject: Scanning, OCR, New Products

More on the Kurzweil 5100, and
the Emerging Scenario for
Document Representation
and OCR Conversion

The new model 5100 should be arriving at U.S. dealers in
two to three weeks. It will cost $17,950, and as reported
previously on Humanist, it will offer the Kurzweil "Verifier,"
which allows for trainability. Unlike the model 5000, the new
model will also have the capacity to read image files.

As I noted in the panel discussion on OCR and scanning at
the Dynamic Text conference, the power to read "tif" and
other image file formats such as "pcx" constitutes a
significant difference between earlier OCR efforts with Kurzweil
machines such as the 4000, and newer approaches that
incorporate the peripherals associated with desktop
publishing. Earlier efforts resulted in a final, or
"finished," product: a machine-readable, often frustratingly
inaccurate version of a book. Now, because the OCR
application works from an image file, it is possible to
scan the book once and in time apply newer and presumably
improved OCR programs to the same file, producing better
and more accurate machine-readable versions. The 5100, like
the Calera Truescan, will support this approach.
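
The approach described above, keeping the scanned image and
re-applying OCR to it as programs improve, can be sketched
roughly as follows. This is only an illustration under stated
assumptions: the class names, the file name "page001.tif", and
the two stand-in recognizers are hypothetical and do not
correspond to any Kurzweil or Calera interface.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class OcrVersion:
    engine: str  # hypothetical label for the OCR program used
    text: str    # machine-readable text that program produced


@dataclass
class ScannedPage:
    image_file: str  # the archived scan, e.g. a "tif" or "pcx" file
    versions: List[OcrVersion] = field(default_factory=list)

    def apply_ocr(self, engine: str, recognize: Callable[[str], str]) -> OcrVersion:
        """Run an OCR routine on the stored image and keep its output
        alongside any earlier versions; the image itself is untouched."""
        version = OcrVersion(engine=engine, text=recognize(self.image_file))
        self.versions.append(version)
        return version

    def latest_text(self) -> str:
        """Return the most recent machine-readable version, if any."""
        return self.versions[-1].text if self.versions else ""


# Stand-in recognizers (assumptions): a real OCR program would read the image.
def early_ocr(image_file: str) -> str:
    return "Tbe quick hrown fox"


def later_ocr(image_file: str) -> str:
    return "The quick brown fox"


page = ScannedPage("page001.tif")
page.apply_ocr("early program", early_ocr)
page.apply_ocr("later program", later_ocr)
print(page.latest_text())
```

The point of the design is that the image file is the permanent
record, and each OCR result is just one more version attached to it.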

One of the programs that I mentioned at the session on
OCR was Ibase's "Irecognize," essentially an enhanced editor
for Truescan, that allows for comparing the bitmapped image
of the original scanned document with the version resulting
from OCR. It seems to me that such a program may signify
the direction that OCR and scanning documents will take. As
the capacity for inexpensive data storage increases, as
through optical disks, and as data compression techniques
improve, we will probably be able to compare a very
detailed image of an original document with its
machine-readable counterpart. There is already a sizeable
industry devoted to scanning documents (without OCR) for
companies wanting "paperless" offices; these firms generally
link image files to conventional database programs that
contain index terms and so forth. The next step, linking the
machine-readable version of a document to the image of the
original, appears not to be a great one technically, given
sufficient storage.
Given such a scenario, the tedious and herculean efforts
now being planned by the Text Encoding Initiative may be misplaced
or misdirected.

With that shot fired, let me just add, returning to
the ever-changing area of product information, that Kurzweil
has just introduced Accutext, a program intended for
Macintoshes with 5 MB of memory. I hope soon to have
more information on the 5100, Accutext, and
another new product, Innovative Software from Inovatic,
which I shall bring to the attention of Humanists.

--Terry Erdt