[tei-council] Fwd: [tei-board] Report from Google engineer about progress with TEI

Julianne Nyhan julianne.nyhan at gmail.com
Fri Aug 19 04:43:32 EDT 2011


I just got the attachment.
J

On Fri, Aug 19, 2011 at 9:42 AM, Lou Burnard
<lou.burnard at retired.ox.ac.uk> wrote:

>  The list doesn't allow attachments. Also, these probably shouldn't be
> publicly available. I suggest sharing them as a Google doc.
>
> Sent from my HTC
>
>
> ----- Reply message -----
> From: "Julianne Nyhan" <julianne.nyhan at gmail.com>
> To: "Laurent Romary" <laurent.romary at inria.fr>
> Cc: "TEI Council" <tei-council at lists.village.virginia.edu>, "John
> Unsworth" <unsworth at illinois.edu>
> Subject: [tei-council] Fwd: [tei-board] Report from Google engineer about
> progress with TEI
> Date: Fri, Aug 19, 2011 09:40
>
>
>
> Super, thanks!
> Julianne
>
> On Fri, Aug 19, 2011 at 9:37 AM, Laurent Romary
> <laurent.romary at inria.fr> wrote:
>
> > You're right, the list did not allow them through. I'll try a zip instead.
> > Laurent
> >
> >
> > On 19 August 2011 at 10:17, Julianne Nyhan wrote:
> >
> > Sorry if I'm being a bit daft, but can you please tell me where I can view
> > the files referred to in the message?
> > Thanks,
> > Julianne
> >
> > On Fri, Aug 19, 2011 at 8:18 AM, Laurent Romary
> > <laurent.romary at inria.fr> wrote:
> >
> >> Council, see the message below, which is a follow-up on some technical
> >> feedback from Google that we already discussed. Please provide your
> >> views on this and possibly volunteer if you want to be the council
> >> contact for this collaboration.
> >> Laurent
> >>
> >>
> >> Begin forwarded message:
> >>
> >> > From: Martin Mueller <martinmueller at northwestern.edu>
> >> > Date: 11 August 2011 04:04:59 CEST
> >> > To: "tei-board at lists.village.Virginia.EDU"
> >> > <tei-board at lists.village.Virginia.EDU>
> >> > Subject: [tei-board] Report from Google engineer about progress with TEI
> >> > Reply-To: tei-board at lists.village.Virginia.EDU
> >> >
> >> >
> >> >
> >> > From: Ranjith Unnikrishnan <ranjith at google.com>
> >> > Date: Wed, 10 Aug 2011 18:54:20 -0700
> >> > To: <google-library-quality at googlegroups.com>,
> >> > Jeff Breidenbach <jbreiden at google.com>,
> >> > Martin Mueller <martinmueller at northwestern.edu>
> >> > Subject: TEI samples and open questions
> >> >
> >> > Hello everyone,
> >> >
> >> > To follow up on our discussion yesterday, I've attached the following
> >> > generated sample TEI files for your feedback. They are loosely in order
> >> > of decreasing OCR text quality. The variation comes from a number of
> >> > factors like image quality, complexity of the book structure, as well as
> >> > the recency and extent of processing. But I'd like to draw your attention
> >> > to the generated format rather than the text quality at this stage as
> >> > there are possibilities for exporting our estimates of text quality that
> >> > we can discuss separately.
> >> >
> >> > dickens.tei  (Google books ID i8_u_-YmG4MC)
> >> > gullivers_travels.tei (Google books ID srVbAAAAQAAJ)
> >> > shamela_andrews.tei (Google books ID zNsNAAAAQAAJ)
> >> > scandal.tei (Google books ID i3lbAAAAQAAJ)
> >> > dunciad.tei (Google books ID gA8UAAAAQAAJ)
> >> >
> >> > The files were validated using the latest candidate release RNC schema
> >> > files that follow the TEI best practices guide for libraries at the
> >> > "Level 3" encoding. Our intention is to supply generated TEI files for
> >> > our processed volumes via GRIN or some other interface so that you can
> >> > then disseminate them as you wish to interested humanities scholars. The
> >> > TEI users and members of the TEI standards body that we've been
> >> > corresponding with over the past months seem pleased with the samples
> >> > they've seen, and from the quality of generated output feel they would
> >> > make a decent starting point for further manual annotation and
> >> > enrichment.
> >> >
> >> > I'd like to get your feedback on:
> >> > (i) whether and how to restrict the set of volumes for which we
> >> > generate TEI files, e.g. restriction by language, a quality threshold
> >> > over the document using something like Ashok's text scorer, only public
> >> > domain books, etc. Or maybe this should be library specific?
> >> > (ii) whether to use GRIN as the interface to provide these files, and
> >> > (iii) whether and how to make an entry in the METS XML file for the
> >> > generated TEI file to accompany the GRIN package, and what other
> >> > conventions (e.g. file naming) should be followed for that.
> >> >
> >> > Thanks,
> >> > Ranjith
> >> >
> >>
> >>
> >>
> >>
> >>
> >> > _______________________________________________
> >> > tei-board mailing list
> >> > tei-board at lists.village.Virginia.EDU
> >> > http://lists.village.Virginia.EDU/mailman/listinfo/tei-board
> >>
> >> Laurent Romary
> >> INRIA & HUB-IDSL
> >> laurent.romary at inria.fr
> >>
> >>
> >>
> >> _______________________________________________
> >> tei-council mailing list
> >> tei-council at lists.village.Virginia.EDU
> >> http://lists.village.Virginia.EDU/mailman/listinfo/tei-council
> >>
> >> PLEASE NOTE: postings to this list are publicly archived
> >>
> >
> >
> >
> > --
> > Dr Julianne Nyhan,
> > (UCL & Universitaet Trier)
> >
> > *Direct Line:* +44 (0)20 7679 7206
> > *Fax:*  +44 (0)20 7383 0557
> > *Office:* G15a, Department of Information Studies, Foster Court,
> > University College London, WC1E 6BT, U.K.
> >
> > http://www.ucl.ac.uk/infostudies/julianne-nyhan/
> > http://germazope.uni-trier.de/Projects/KoZe2/
> > http://epu.ucc.ie/theses/jnyhan/
> > http://maney.co.uk/index.php/journals/isr/
> >
> >
> >
>
>
>




