3.1191 super scanning (77)

Willard McCarty (MCCARTY@vm.epas.utoronto.ca)
Tue, 20 Mar 90 19:23:20 EST

Humanist Discussion Group, Vol. 3, No. 1191. Tuesday, 20 Mar 1990.


(1) Date: Monday, 19 March 1990 2212-EST (23 lines)
From: KRAFT@PENNDRLS
Subject: Super Scanning

(2) Date: Mon, 19 Mar 90 23:54:43 EST (10 lines)
From: Robert Hollander <bobh@phoenix.Princeton.EDU>
Subject: Re: 3.1187 paper vs. e-documents (74)

(3) Date: Tue, 20 Mar 90 03:56:10 EST (19 lines)
From: David.A.Bantz@mac.dartmouth.edu
Subject: Re: 3.1187 paper vs. e-documents (74)

(1) --------------------------------------------------------------------
Date: Monday, 19 March 1990 2212-EST
From: KRAFT@PENNDRLS
Subject: Super Scanning

(1) If Steve DeRose has information about a reliable
scanner that does 10 pages per minute, I'd like to know
about it. I would have said something more like one page
per minute, especially if pages need turning. And what about
verification? Cummon Steve, that budget is awfully low, unless
you have a similar budget to xerox everything (and resolve
problems of multiple columns, footnotes, etc.) in advance
so that a fantastically swift sheet feeder can be used!

(2) On turning microform into electronic materials, I have
not seen anyone yet who offers appropriately detailed
information on getting from the GRAPHICS form (like a photo)
that is scanned to usable CHARACTER RECOGNITION form (text). Is
that an obstacle? What I want is text files of the microform
data, not graphics files. Can current OCR software be run
successfully on the graphics representation of microform?
Theoretically yes, but has anyone actually tried it?

Bob Kraft (CCAT)
(2) --------------------------------------------------------------------
Date: Mon, 19 Mar 90 23:54:43 EST
From: Robert Hollander <bobh@phoenix.Princeton.EDU>
Subject: Re: 3.1187 paper vs. e-documents (74)

Let's assume Mr. DeRose's colored glasses are not totally opaque and
the entire Library of Congress can be made m-readable for a mere $41M;
let's now ask Mr. DeRose who's going to check the text after his 200
monkeys and 1000 machines turn the Library of Congress into the British
Museum _sub specie simiarum_? Has he ever seen scanned text? Has he
ever seen scanned text that is not supervised closely? One shudders.
(3) --------------------------------------------------------------------
Date: Tue, 20 Mar 90 03:56:10 EST
From: David.A.Bantz@mac.dartmouth.edu
Subject: Re: 3.1187 paper vs. e-documents (74)

Fascinating guesstimate. Even at a more reasonable 1 page/minute, the cost
comes to roughly $500M, or about the cost of a single high-tech military plane.

--- Steven J. DeRose (IR400011@BROWNVM) wrote:
Even scanning "everything" would be surprisingly inexpensive.
The US Library of Congress, as a start, holds about 20 Terabytes.
At reasonable OCR rates of 10 pages/minute, 80 hours/week, allow
5,000 machine years. If we get machines that can flip pages,
one human can probably manage at least 10 machines at a time.
Thus, 1000 person-years (two shifts). Thus to finish in only 5 years:
    1000 scanners at $10K              $ 10M
    200 operators at $25K/year         $ 25M
    40K optical disks, 500 drives      $  6M
                                       $ 41M total.
--- end of quoted material ---
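
[Editorial sketch, not part of the original postings.] The arithmetic in the quoted estimate can be checked with a few lines of Python. The function name `scanning_cost` and its parameter names are hypothetical. The figures are those quoted above: 5,000 machine-years at 10 pages/minute, a five-year schedule, two shifts, ten machines per operator, $10K per scanner, $25K per operator-year, and $6M for storage. Two assumptions are mine: machine-years scale inversely with scanning speed, and the storage cost stays fixed.

```python
def scanning_cost(pages_per_minute,
                  machine_years_at_10ppm=5000,  # figure quoted in the estimate
                  years=5,                      # target schedule
                  shifts=2,                     # 80 hours/week = two 40-hour shifts
                  machines_per_operator=10,
                  scanner_price=10_000,
                  operator_salary=25_000,       # per operator per year
                  storage_cost=6_000_000):      # disks and drives, assumed fixed
    """Return scanner count, operator count, and total cost in dollars."""
    # Assumption: halving the scan rate doubles the machine-years needed.
    machine_years = machine_years_at_10ppm * 10 / pages_per_minute
    scanners = machine_years / years
    operators = scanners * shifts / machines_per_operator
    total = (scanners * scanner_price
             + operators * operator_salary * years
             + storage_cost)
    return {"scanners": scanners, "operators": operators, "total": total}


if __name__ == "__main__":
    for rate in (10, 1):
        result = scanning_cost(rate)
        print(f"{rate:>2} pages/min: ${result['total']:,.0f}")
    # 10 pages/min: $41,000,000
    #  1 pages/min: $356,000,000
```

At 10 pages/minute this reproduces the $41M total. At 1 page/minute, under the same assumptions, it gives $356M, the same order of magnitude as the ~$500M suggested in message (3). The gap may reflect different assumptions about staffing or verification, which the postings do not spell out.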