17.631 OCR and ancient Greek

From: Humanist Discussion Group (by way of Willard McCarty willard.mccarty@kcl.ac.uk)
Date: Wed Feb 11 2004 - 03:19:38 EST

<x-flowed>
               Humanist Discussion Group, Vol. 17, No. 631.
       Centre for Computing in the Humanities, King's College London
                   www.kcl.ac.uk/humanities/cch/humanist/
                        www.princeton.edu/humanist/
                     Submit to: humanist@princeton.edu

   [1] From: Patrick Durusau <patrick.durusau@sbl-site.org> (33)
         Subject: OCR and Ancient Greek

   [2] From: "Dominique GONNET" <dominique.gonnet@mom.fr> (141)
         Subject: RE: 17.625 OCR for Greek? sources on the paradigm
                 shift?

--[1]------------------------------------------------------------------
         Date: Wed, 11 Feb 2004 08:12:12 +0000
         From: Patrick Durusau <patrick.durusau@sbl-site.org>
         Subject: OCR and Ancient Greek

Willard,

Willard McCarty wrote:

>Following is a query from someone faced with scanning a mixture of Italian and
>ancient Greek. She mentions software packages I have not heard of, which I
>take to be a sign of some progress at the technical end of things, or at
>least more variety of choice. If you have advice for Ms Gritti, please send
>it directly to her as well as to Humanist. It is time we revisited this
>topic!

On the issue of text recognition, Humanist readers may be interested in
"Telling Humans and Computers Apart Automatically: How Lazy Cryptographers
Do AI" by Luis von Ahn, Manuel Blum, and John Langford, which appeared in
Communications of the ACM, Volume 47, Number 2, pp. 57-60.

The article describes, with examples, CAPTCHAs (Completely Automated Public
Turing Test to Tell Computers and Humans Apart): images containing
deliberately distorted text that a human can read but a computer cannot
easily recognize. Solving the problem of recognizing distorted text is
directly relevant to OCR of damaged or poorly reproduced texts.

One attempted solution cited by the authors is: "Recognizing objects in
adversarial clutter: Breaking a Visual CAPTCHA," Mori, G. and Malik, J.,
Proceedings of the Conference on Computer Vision and Pattern Recognition,
2003. (Also available at:
www.cs.berkeley.edu/~mori/research/papers/mori_cvpr03.pdf.)

I like the phrase "adversarial clutter." It certainly describes how I feel
about the stray marks and distortions that seem to trouble OCR programs.

Hope you are having a great day!

Patrick

Patrick Durusau
Director of Research and Development
Society of Biblical Literature
Patrick.Durusau@sbl-site.org
Chair, V1 - Text Processing: Office and Publishing Systems Interface
Co-Editor, ISO 13250, Topic Maps -- Reference Model

Topic Maps: Human, not artificial, intelligence at work!

--[2]------------------------------------------------------------------
         Date: Wed, 11 Feb 2004 08:11:21 +0000
         From: "Dominique GONNET" <dominique.gonnet@mom.fr>
         Subject: RE: 17.625 OCR for Greek? sources on the paradigm shift?

Thanks to Elena Gritti for her pointers. They are an invitation to take a look at "FineReader 7.0, and the last releases of Omnipage Pro". As for Anagnostis 4.0, we managed to use the trial version at Sources Chrétiennes with interesting results, whereas at the MOM, for reasons I do not know, it had not worked. We hope that the MOM will be able to acquire Anagnostis or another, better OCR program for ancient Greek. Best regards.

Dominique Gonnet - 06 15 11 12 36
dominique.gonnet@mom.fr
CNRS UMR 5189 « Histoire et Sources des Mondes antiques »
Institut des Sources Chrétiennes
29 Rue du Plat - 69002 Lyon
Tel. 04 72 77 73 53 - Fax 04 78 92 90 11

-----Original Message-----
From: Marjorie Burghart [mailto:marjorie.burghart@online.fr]
Sent: Tue, 10 February 2004 10:54
To: Pierre.Philippe@mom.fr; Laurence Darmezin; dominique.gonnet@mom.fr
Subject: Fw: 17.625 OCR for Greek? sources on the paradigm shift?
</x-flowed>

This archive was generated by hypermail 2b30 : Fri Mar 26 2004 - 11:19:41 EST