5.0478 Rs: Keyboarding Summary; Icelandic (2/66)

Elaine Brennan & Allen Renear (EDITORS@BROWNVM.BITNET)
Tue, 26 Nov 1991 16:12:43 EST

Humanist Discussion Group, Vol. 5, No. 0478. Tuesday, 26 Nov 1991.


(1) Date: 25 Nov 91 17:53:28 EST (50 lines)
From: Malcolm.Brown@Dartmouth.EDU
Subject: keyboarding

(2) Date: Mon, 25 Nov 1991 14:52 EST (16 lines)
From: <NEUMAN@GUVAX>
Subject: Icelandic

(1) --------------------------------------------------------------------
Date: 25 Nov 91 17:53:28 EST
From: Malcolm.Brown@Dartmouth.EDU
Subject: keyboarding

I have corresponded with folks both in Europe
and the US. It seems that most roads do in fact
lead to the Input Center. I know that they have
done work for Elli Mylonas at Harvard, keyboarding
Greek. Perhaps she can comment on whether the
Input Center truly delivers as advertised.

The price quoted to me was $1500 per megabyte
at the 99.95% accuracy level. Ted Brunner told
me that such a price is competitive. The question is
whether this represents an advantage over OCR.

If the source text is anything but pristine in
the quality of its print, then OCR, in my experience,
is no longer an option. For anything but the very
best texts, accuracy quickly plummets.

If my math is correct, at 99.95% accuracy, this means
500 errors per megabyte. At 99% accuracy -- and most
OCR runs don't scan at this level of accuracy for
typeset texts -- it becomes 10 thousand errors per
megabyte. And at 95% accuracy, it becomes a staggering
50 thousand errors per megabyte.
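
The arithmetic above can be checked with a short Python sketch. It
assumes a megabyte of 1,000,000 characters at one character per byte,
which is what the round figures imply; the names CHARS_PER_MEGABYTE and
expected_errors are only illustrative.

```python
from fractions import Fraction

# Assumption: a decimal megabyte, one character per byte.
CHARS_PER_MEGABYTE = 1_000_000


def expected_errors(accuracy_percent: str, chars: int = CHARS_PER_MEGABYTE) -> int:
    """Expected number of wrong characters at a given accuracy level.

    The accuracy is passed as a string (e.g. "99.95") so that Fraction
    can represent it exactly, avoiding floating-point rounding.
    """
    error_rate = 1 - Fraction(accuracy_percent) / 100
    return round(error_rate * chars)


for level in ("99.95", "99", "95"):
    print(f"{level}% accuracy: {expected_errors(level):,} errors per megabyte")
```

With a binary megabyte of 1,048,576 characters the first figure comes
out at roughly 524 rather than 500, so the comparison is unchanged.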

So the question is which is more cost effective:
to pay for keyboarding and its cleanup or to pay for
OCR operators and its cleanup. While $1500 buys a fair
amount of student labor, the expertise needed
to clean up a text often requires something more
than undergraduate-level assistance.

I recall Mark Olsen, at the 1989 Toronto conference,
presented a paper that effectively argued against
OCR. I am doubtful that OCR performance has
improved much since then, but it may have.

The key factor, it seems, is the source material. If you're
using new and/or well printed editions, then OCR may
be viable. Otherwise one should think carefully.

My own approach is: if there isn't a company like InteLex
selling the etext at a reasonable price, I'm going to try to
raise funds in order to purchase as much keyboarding
as I can.


(2) --------------------------------------------------------------21----
Date: Mon, 25 Nov 1991 14:52 EST
From: <NEUMAN@GUVAX>
Subject: Icelandic

Diane Olsen recently asked about instructional materials in Icelandic.
I'm told that various Icelandic texts -- including Laxness, the Poetic
Edda, and the Bible -- are available in electronic form from the following
source:
Dr. Baldur Jonsson
Facultas Philosophica
Haskoli Islands
Sudurgata
Reykjavik, Iceland

Mike Neuman
Georgetown Center for Text and Technology