Willard McCarty wrote:

>Following a query from someone faced with scanning a mixture of Italian and
>ancient Greek. She mentions software packages I have not heard of, which I
>take to be a sign of some progress at the technical end of things, or at
>least more variety of choice. If you have advice for Ms Gritti, please send
>it directly to her as well as to Humanist. It is time we revisited this

On the issue of text recognition, "Telling Humans and Computers Apart
Automatically: How Lazy Cryptographers Do AI" which appeared in
Communications of the ACM, Volume 47, Number 2, pp. 57-60, by Luis von Ahn,
Manuel Blum, and John Langford, should be of interest to Humanist readers.

The article describes, with examples, CAPTCHAs (Completely Automated Public
Turing Test to Tell Computers and Humans Apart): images containing
deliberately distorted text that a human reader can read but a computer
cannot easily read. Solving the problem of recognizing distorted text is
directly relevant to OCR of damaged or poorly reproduced texts.

One effort at a solution cited by the authors is: "Recognizing objects in
adversarial clutter: Breaking a Visual CAPTCHA." Mori, G. and Malik, J.,
Proceedings of the Conference on Computer Vision and Pattern Recognition,
2003. (Also available at: www.cs.berkeley.edu/~mori/research/)

Like the phrase "adversarial clutter." Certainly describes how I feel about
the stray marks and distortions that seem to trouble OCR programs.

Hope you are having a great day!



Thanks to Elena Gritti for her pointers. They are an invitation to take a look at "FineReader 7.0, and the last releases of Omnipage Pro". As for Anagnostis 4.0, we managed to use the trial version at Sources Chrétiennes with interesting results, whereas at the MOM, for reasons I do not know, it had not worked. We hope that the MOM will be able to acquire Anagnostis or another, better OCR package for ancient Greek. Best regards.

