[tei-council] Fwd: [tei-board] Report from Google engineer about progress with TEI

Julianne Nyhan julianne.nyhan at gmail.com
Fri Aug 19 04:40:22 EDT 2011


Super, thanks!
Julianne

On Fri, Aug 19, 2011 at 9:37 AM, Laurent Romary <laurent.romary at inria.fr> wrote:

> You're right, the list did not allow them through. I'll try a zip instead.
> Laurent
>
>
> On 19 August 2011 at 10:17, Julianne Nyhan wrote:
>
> Sorry if I'm being a bit daft, but can you please tell me where I can view
> the files referred to in the message?
> Thanks,
> Julianne
>
> On Fri, Aug 19, 2011 at 8:18 AM, Laurent Romary <laurent.romary at inria.fr> wrote:
>
>> Council. See the message below, which is a follow-up on some technical
>> feedback from Google that we already discussed. Please provide your views on
>> this and possibly volunteer if you want to be the council contact for this
>> collaboration.
>> Laurent
>>
>>
>> Begin forwarded message:
>>
>> > From: Martin Mueller <martinmueller at northwestern.edu>
>> > Date: 11 August 2011 04:04:59 CEST
>> > To: "tei-board at lists.village.Virginia.EDU" <
>> tei-board at lists.village.Virginia.EDU>
>> > Subject: [tei-board] Report from Google engineer about progress with TEI
>> > Reply-To: tei-board at lists.village.Virginia.EDU
>> >
>> >
>> >
>> > From: Ranjith Unnikrishnan <ranjith at google.com>
>> > Date: Wed, 10 Aug 2011 18:54:20 -0700
>> > To: <google-library-quality at googlegroups.com>, Jeff Breidenbach <
>> jbreiden at google.com>, Martin Mueller <martinmueller at northwestern.edu>
>> > Subject: TEI samples and open questions
>> >
>> > Hello everyone,
>> >
>> > To follow up on our discussion yesterday, I've attached the following
>> generated sample TEI files for your feedback. They are loosely in order of
>> decreasing OCR text quality. The variation comes from a number of factors
>> like image quality, complexity of the book structure, as well as the recency
>> and extent of processing. But I'd like to draw your attention to the
>> generated format rather than the text quality at this stage as there are
>> possibilities for exporting our estimates of text quality that we can
>> discuss separately.
>> >
>> > dickens.tei  (Google books ID i8_u_-YmG4MC)
>> > gullivers_travels.tei (Google books ID srVbAAAAQAAJ)
>> > shamela_andrews.tei (Google books ID zNsNAAAAQAAJ)
>> > scandal.tei (Google books ID i3lbAAAAQAAJ)
>> > dunciad.tei (Google books ID gA8UAAAAQAAJ)
>> >
>> > The files were validated using the latest candidate release RNC schema
>> files that follow the TEI best practices guide for libraries at the "Level
>> 3" encoding. Our intention is to supply generated TEI files for our
>> processed volumes via GRIN or some other interface so that you can then
>> disseminate them as you wish to interested humanities scholars. The TEI
>> users and members of the TEI standards body that we've been corresponding
>> with over the past months seem pleased with the samples they've seen, and
>> from the quality of generated output feel they would make a decent starting
>> point for further manual annotation and enrichment.
>> >
>> > I'd like to get your feedback on:
>> > (i) whether and how to restrict the set of volumes for which we generate
>> TEI files, e.g. restriction by language, a quality threshold over the
>> document using something like Ashok's text scorer, only public domain books,
>> etc. Or maybe this should be library-specific?
>> > (ii) whether to use GRIN as the interface to provide these files, and
>> > (iii) whether and how to make an entry in the METS XML file for the
>> generated TEI file to accompany the GRIN package, and what other conventions
>> (e.g. file naming) should be followed for that.
>> >
>> > Thanks,
>> > Ranjith
>> >
>>
>>
>>
>>
>>
>> > _______________________________________________
>> > tei-board mailing list
>> > tei-board at lists.village.Virginia.EDU
>> > http://lists.village.Virginia.EDU/mailman/listinfo/tei-board
>>
>> Laurent Romary
>> INRIA & HUB-IDSL
>> laurent.romary at inria.fr
>>
>>
>>
>> _______________________________________________
>> tei-council mailing list
>> tei-council at lists.village.Virginia.EDU
>> http://lists.village.Virginia.EDU/mailman/listinfo/tei-council
>>
>> PLEASE NOTE: postings to this list are publicly archived
>>
>
>
>
> --
> Dr Julianne Nyhan,
> (UCL & Universitaet Trier)
>
> *Direct Line:* +44 (0)20 7679 7206
> *Fax:*  +44 (0)20 7383 0557
> *Office:* G15a, Department of Information Studies, Foster Court,
> University College London, WC1E 6BT, U.K.
>
> http://www.ucl.ac.uk/infostudies/julianne-nyhan/
> http://germazope.uni-trier.de/Projects/KoZe2/
> http://epu.ucc.ie/theses/jnyhan/
> http://maney.co.uk/index.php/journals/isr/
>
>
>
> Laurent Romary
> INRIA & HUB-IDSL
> laurent.romary at inria.fr
>
>
>
>
>


-- 
Dr Julianne Nyhan,
(UCL & Universitaet Trier)

*Direct Line:* +44 (0)20 7679 7206
*Fax:*  +44 (0)20 7383 0557
*Office:* G15a, Department of Information Studies, Foster Court, University
College London, WC1E 6BT, U.K.

http://www.ucl.ac.uk/infostudies/julianne-nyhan/
http://germazope.uni-trier.de/Projects/KoZe2/
http://epu.ucc.ie/theses/jnyhan/
http://maney.co.uk/index.php/journals/isr/

