17.625 OCR for Greek? sources on the paradigm shift?

From: Humanist Discussion Group (by way of Willard McCarty willard.mccarty@kcl.ac.uk)
Date: Tue Feb 10 2004 - 03:58:30 EST

               Humanist Discussion Group, Vol. 17, No. 625.
       Centre for Computing in the Humanities, King's College London
                     Submit to: humanist@princeton.edu

   [1] From: Willard McCarty <willard.mccarty@kcl.ac.uk> (35)
         Subject: Query about OCR for Ancient Greek

   [2] From: n.thieberger@LINGUISTICS.UNIMELB.EDU.AU (32)
         Subject: theoretical shift

         Date: Tue, 10 Feb 2004 08:35:32 +0000
         From: Willard McCarty <willard.mccarty@kcl.ac.uk>
         Subject: Query about OCR for Ancient Greek

Dear colleagues:

Following is a query from someone faced with scanning a mixture of Italian and
ancient Greek. She mentions software packages I have not heard of, which I
take to be a sign of some progress at the technical end of things, or at
least more variety of choice. If you have advice for Ms Gritti, please send
it directly to her as well as to Humanist. It is time we revisited this topic!


>From: "Gritti Elena" <e.gritti@libero.it>
>Date: Sun, 8 Feb 2004 23:53:49 +0100
>Dear Prof. McCarty,
>my name is Elena Gritti and in January I submitted a Ph.D. dissertation
>about Exegesis and dialectic in Proclus' philosophical thought and works
>to the Faculty of Philosophy at the State University of Milan, Italy.
>Since I know the Humanist Discussion Group, if you don't mind I would like
>to ask you for some technical advice. Recently I joined a project which
>involves scanning pages that are written in Italian mixed with
>ancient Greek; there are also Greek fragments, often corrupted, and
>critical apparatus. I've found that there is some OCR software, such as
>FineReader 7.0, Anagnostis 4.0 and the latest releases of Omnipage Pro.
>What is, in your opinion, the best and most effective way to solve the
>I would be very grateful to you for your advice.
>Yours sincerely,
>Elena Gritti

Dr Willard McCarty | Senior Lecturer | Centre for Computing in the
Humanities | King's College London | Strand | London WC2R 2LS || +44 (0)20
7848-2784 fax: -2980 || willard.mccarty@kcl.ac.uk

         Date: Tue, 10 Feb 2004 08:46:56 +0000
         From: n.thieberger@LINGUISTICS.UNIMELB.EDU.AU
         Subject: theoretical shift

I am a field linguist and my recently submitted PhD dissertation was a
grammar of a previously undescribed language which included an audio corpus
of some 18 hours of linked text and media. Most of the 840-odd example
sentences are playable, as are seven example texts. This work arises out of
the discussion on language documentation that has been receiving some
coverage in the past few years.

Reflecting on the process of presenting this data I consider that there is
a distinct shift in the authority of a grammar written with citable audio
sources compared to a grammar in which sentences are provided with no
source. When a corpus is citable it can be used as the basis for any claims
made about the grammar of the language. The data is given in a form that
can be accessed by others and so can be used to test claims and to provide
additional analysis that may not have been considered in the analytical work.

This may appear to be a common scientific method, but it is not one that
has been followed by many linguists.

I am writing to this list to ask about sources on the (paradigm?) shift
associated with new technological advances, especially the presentation of
primary data for testing claims, and the shift in authority towards the
data rather than the analyst.


Nick Thieberger


Project Manager
PARADISEC
Pacific And Regional Archive for DIgital Sources in Endangered Cultures
http://www.paradisec.org.au

nickt@paradisec.org.au
Department of Linguistics and Applied Linguistics
University of Melbourne
Vic 3010
Australia

Ph 61 (0)3 8344 5185

This archive was generated by hypermail 2b30 : Fri Mar 26 2004 - 11:19:41 EST