[tei-council] Fwd: [tei-board] Report from Google engineer about progress with TEI

Julianne Nyhan julianne.nyhan at gmail.com
Fri Aug 19 04:17:25 EDT 2011


Sorry if I'm being a bit daft, but can you please tell me where I can view
the files referred to in the message?
Thanks,
Julianne

On Fri, Aug 19, 2011 at 8:18 AM, Laurent Romary <laurent.romary at inria.fr> wrote:

> Council: see the message below, which is a follow-up to some technical
> feedback from Google that we already discussed. Please provide your views on
> this, and possibly volunteer if you want to be the council contact for this
> collaboration.
> Laurent
>
>
> Begin forwarded message:
>
> > From: Martin Mueller <martinmueller at northwestern.edu>
> > Date: 11 August 2011 04:04:59 CEST
> > To: "tei-board at lists.village.Virginia.EDU" <
> tei-board at lists.village.Virginia.EDU>
> > Subject: [tei-board] Report from Google engineer about progress with TEI
> > Reply-To: tei-board at lists.village.Virginia.EDU
> >
> >
> >
> > From: Ranjith Unnikrishnan <ranjith at google.com>
> > Date: Wed, 10 Aug 2011 18:54:20 -0700
> > To: <google-library-quality at googlegroups.com>, Jeff Breidenbach <
> jbreiden at google.com>, Martin Mueller <martinmueller at northwestern.edu>
> > Subject: TEI samples and open questions
> >
> > Hello everyone,
> >
> > To follow up on our discussion yesterday, I've attached the following
> generated sample TEI files for your feedback. They are loosely in order of
> decreasing OCR text quality. The variation comes from a number of factors
> like image quality, complexity of the book structure, as well as the recency
> and extent of processing. But I'd like to draw your attention to the
> generated format rather than the text quality at this stage as there are
> possibilities for exporting our estimates of text quality that we can
> discuss separately.
> >
> > dickens.tei  (Google books ID i8_u_-YmG4MC)
> > gullivers_travels.tei (Google books ID srVbAAAAQAAJ)
> > shamela_andrews.tei (Google books ID zNsNAAAAQAAJ)
> > scandal.tei (Google books ID i3lbAAAAQAAJ)
> > dunciad.tei (Google books ID gA8UAAAAQAAJ)
> >
> > The files were validated using the latest candidate release RNC schema
> files that follow the TEI best practices guide for libraries at the "Level
> 3" encoding. Our intention is to supply generated TEI files for our
> processed volumes via GRIN or some other interface so that you can then
> disseminate them as you wish to interested humanities scholars. The TEI
> users and members of the TEI standards body that we've been corresponding
> with over the past months seem pleased with the samples they've seen, and
> from the quality of generated output feel they would make a decent starting
> point for further manual annotation and enrichment.
> >
> > I'd like to get your feedback on:
> > (i) whether and how to restrict the set of volumes for which we generate
> TEI files, e.g. restriction by language, a quality threshold over the
> document using something like Ashok's text scorer, only public domain books,
> etc. Or maybe this should be library-specific?
> > (ii) whether to use GRIN as the interface to provide these files, and
> > (iii) whether and how to make an entry in the METS XML file for the
> generated TEI file to accompany the GRIN package, and what other conventions
> (e.g. file naming) should be followed for that.
> >
> > Thanks,
> > Ranjith
> >
>
>
>
>
>
> > _______________________________________________
> > tei-board mailing list
> > tei-board at lists.village.Virginia.EDU
> > http://lists.village.Virginia.EDU/mailman/listinfo/tei-board
>
> Laurent Romary
> INRIA & HUB-IDSL
> laurent.romary at inria.fr
>
>
>
> _______________________________________________
> tei-council mailing list
> tei-council at lists.village.Virginia.EDU
> http://lists.village.Virginia.EDU/mailman/listinfo/tei-council
>
> PLEASE NOTE: postings to this list are publicly archived
>



-- 
Dr Julianne Nyhan,
(UCL & Universitaet Trier)

*Direct Line:* +44 (0)20 7679 7206
*Fax:*  +44 (0)20 7383 0557
*Office:* G15a, Department of Information Studies, Foster Court, University
College London, WC1E 6BT, U.K.

http://www.ucl.ac.uk/infostudies/julianne-nyhan/
http://germazope.uni-trier.de/Projects/KoZe2/
http://epu.ucc.ie/theses/jnyhan/
http://maney.co.uk/index.php/journals/isr/

