3.1270 scanning and correcting (109)

Willard McCarty (MCCARTY@vm.epas.utoronto.ca)
Thu, 5 Apr 90 22:55:54 EDT

Humanist Discussion Group, Vol. 3, No. 1270. Thursday, 5 Apr 1990.


(1) Date: Wed, 4 Apr 90 23:21:53 EDT (22 lines)
From: Robert Hollander <bobh@phoenix.Princeton.EDU>
Subject: Re: 3.1259 scanning and encoding texts (204)

(2) Date: Thu, 5 Apr 90 00:26:52 -0400 (20 lines)
From: amsler@flash.bellcore.com (Robert A Amsler)
Subject: Yet more re: encoding texts and costs

(3) Date: Thu, 5 Apr 90 01:18:00 EDT (17 lines)
From: Michel LENOBLE <LENOBLEM@CC.UMONTREAL.CA>
Subject: RE: 3.1259 scanning and encoding texts (204)

(4) Date: Wed, 04 Apr 90 13:04:01 EDT (19 lines)
From: Tzvee Zahavy <MAIC@UMINN1>
Subject: Scanning

(1) --------------------------------------------------------------------
Date: Wed, 4 Apr 90 23:21:53 EDT
From: Robert Hollander <bobh@phoenix.Princeton.EDU>
Subject: Re: 3.1259 scanning and encoding texts (204)

In reply to Michael S. Hart's two observations about the Dartmouth Dante
Project: (1) The database consists of the text of the _Commedia_ in
Italian and (eventually) sixty commentaries. Six of these are in
English, four of which are not yet edited. If he knows of someone who
would like to take on one of these, we'd be glad to have the assistance;
and I'll be glad to serve as tutor and look over that person's shoulder.
(2) Error correction by users _is_ a feature of our database. I still
find typos when I consult it. One must remember that our editing
consists not only of checking for accuracy (and our work has ranged from
one to sixty errors per page when we got it back from various sources of
data entry), but of formatting, regularizing line numbers and
conventions for bolding and italics. In short, the editorial task is not
trivial. To put these texts up for scholarly use in raw form would be an
unconscionable act. Likewise, not to have the capacity to let users
help us maintain the integrity of data would be foolish. Thus we follow
both methods and will continue to do so.

Robert Hollander
bobh@phoenix.princeton.edu
(2) --------------------------------------------------------------34----
Date: Thu, 5 Apr 90 00:26:52 -0400
From: amsler@flash.bellcore.com (Robert A Amsler)
Subject: Yet more re: encoding texts and costs

I think the whole issue of how much it should cost to get data
entered and even the discussion of what should be marked up in texts
is suffering from a lack of statement of premises and goals.

Goals come forward when one writes proposals to specific funding
sources. The first questions to be answered would then be `what
works?' and for `what audience?'

THEN, the issue of `what encoding would be adequate?' could be raised
and from that the issue of `how much will it cost?'

While there seems to be much well-reasoned argument in the current
debate, there is no way it can be resolved unless people are talking
about the same thing.


(3) --------------------------------------------------------------24----
Date: Thu, 5 Apr 90 01:18:00 EDT
From: Michel LENOBLE <LENOBLEM@CC.UMONTREAL.CA>
Subject: RE: 3.1259 scanning and encoding texts (204)


Answer to Elli Mylonas - accuracy.

An accuracy rate of 99% for scanning is far too low because,
to me, it means ten to twenty errors a page (1000-2000 characters).
We are all trying to asymptotically reach the 100% accuracy ideal,
but I think that we should only be satisfied with a 99.99% rate, which
means 1 error per 5-10 page section.
Asymptotically yours
Michel Lenoble
Litterature Comparee
Universite de Montreal
E-mail: lenoblem@cc.umontreal.ca
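
The figures in message (3) follow from simple arithmetic, sketched here
under the message's own assumption of 1000 to 2000 characters per page.
The function names are hypothetical, chosen only for illustration.

```python
def errors_per_page(accuracy, chars_per_page):
    """Expected number of wrong characters on one page."""
    return (1.0 - accuracy) * chars_per_page


def pages_per_error(accuracy, chars_per_page):
    """Expected number of pages between consecutive errors."""
    return 1.0 / errors_per_page(accuracy, chars_per_page)


if __name__ == "__main__":
    # Page sizes (1000 and 2000 characters) are taken from the message.
    for accuracy in (0.99, 0.9999):
        for chars in (1000, 2000):
            print(f"{accuracy:.2%} at {chars} chars/page: "
                  f"{errors_per_page(accuracy, chars):.1f} errors/page, "
                  f"{pages_per_error(accuracy, chars):.1f} pages/error")
```

At 99% this gives 10 to 20 errors per page; at 99.99% it gives one error
every 5 to 10 pages.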
(4) --------------------------------------------------------------23----
Date: Wed, 04 Apr 90 13:04:01 EDT
From: Tzvee Zahavy <MAIC@UMINN1>
Subject: Scanning



I deeply regret getting started with scanning. In order to have an
accessible copy of my first book on computer I submitted the printed
text for Kurzweil scanning at our University computer center. The
scanned text of 365 pages is such a mess that it is taking about twice
as much time for me to reformat it as it would have taken had I given
the book to a secretary to type it directly into the computer.

I suggest that only a closely monitored interactive scanning process
could approximate in effectiveness and speed the skills of an
accomplished typist working with a powerful wordprocessor.

That may change or I may be wrong. For now I vote against any major
investment in scanning the Library of Congress or even one book.