[tei-council] Fwd: Re: Google Books > TEI

Kevin Hawkins kevin.s.hawkins at ultraslavonic.info
Thu Jun 28 00:15:01 EDT 2012


Pardon me: I forgot to include the IDs for these books so you can 
compare them against the page images:

dickens.tei (Google Books ID i8_u_-YmG4MC)
gullivers_travels.tei (Google Books ID srVbAAAAQAAJ)

On 6/28/12 12:12 AM, Kevin Hawkins wrote:
> All,
>
> Now that the latest release is behind us, I'd like to follow up on a few
> things I promised you but which wouldn't have helped us get through
> bug fixes and feature requests in time for the release.
>
> First of all, in Ann Arbor we agreed that we would ask Ranjith, our
> contact at Google, for the latest samples so that we can calculate some
> accuracy statistics and encourage Google to make this format
> public.  See the two attachments and the correspondence below.
>
> Our agenda is a bit vague on responsibility here.  I've asked for
> samples, but I think others will want to check the accuracy of the encoding.
>
> Kevin
>
> -------- Original Message --------
> Subject:     Re: Google Books > TEI
> Date:     Mon, 23 Apr 2012 15:30:38 -0700
> From:     Ranjith Unnikrishnan <ranjith at google.com>
> To:     Kevin Hawkins <kevin.s.hawkins at ultraslavonic.info>
> CC:     James.Cummings at oucs.ox.ac.uk, mholmes at uvic.ca,
> laurent.romary at inria.fr
>
>
>
> Yes, the last round of feedback I got was around that time frame, and
> came both from your group and from another working group that included
> some of our library partners. I incorporated the two sets of feedback
> into some improvements to the algorithm, but these were mostly related
> to style and did not involve producing new output tags or the like.
> The comments that were not addressed required improvements to our
> existing text-structure analysis algorithms, which depend at least
> partly on the quality of the OCR text. Both structure analysis and OCR
> quality are active research topics that are always on our agenda, but
> neither is a quick fix.
> I've attached the latest Dickens and Gulliver's Travels files, and can
> generate TEI files for others if you can send me links to their
> corresponding pages on Google Books.
>
>
>
> On Mon, Apr 23, 2012 at 2:53 PM, Kevin Hawkins
> <kevin.s.hawkins at ultraslavonic.info> wrote:
>
>     The last round of reviews I have is a sample of Dickens from July
>     27, 2011.  I have earlier versions of other titles, but they aren't
>     worth consulting at this point since you've improved other things.
>     Has the algorithm changed since then?  It would be nice to have
>     the latest version of not only Dickens but also of some other works
>     in the public domain: perhaps an old bound volume of a journal and a
>     non-fiction book?  Thanks.
>
>
>     On 4/23/2012 12:50 PM, Ranjith Unnikrishnan wrote:
>
>         Hi Kevin,
>
>         I have not made any changes to the TEI generation algorithm
>         since our last round of reviews within the group. I've since
>         diverted my energy towards getting buy-in and making progress
>         toward making the TEI files available on the Books site. I've
>         had some success, but it's still early days. Until this is
>         launched, I don't plan to work on increasing the TEI markup
>         depth or anything else related to the process apart from
>         fixing any bugs that may arise during large-scale testing.
>
>         ~R
>
>
>         On Thu, Apr 19, 2012 at 6:05 PM, Kevin Hawkins
>         <kevin.s.hawkins at ultraslavonic.info> wrote:
>
>             Hi Ranjith,
>
>             While we wait on your colleagues to integrate the code for
>             generating TEI into your production pipeline (we hope our
>             Google Docs brainstorming has helped you make the case for
>             this), the TEI Technical Council is thinking about how we
>             might publicize the availability of TEI documents in Google
>             Books -- when that day comes -- and what it might mean for
>             our community.  The sort of message we would promote
>             depends on the depth and consistency of the markup that
>             Google is able to create.  Would you be able to provide us
>             with some samples generated by the latest version of your
>             code?  (We saw early drafts, but I'm not sure that I have
>             any of the final versions.)  I'd like to share them with
>             the Technical Council.
>
>             Thanks,
>
>             Kevin
>
>
>             On 2/6/12 12:24 PM, Ranjith Unnikrishnan wrote:
>
>                 Hi Kevin,
>
>                 The code has been reviewed and checked in, but we're
>                 still working on some questions related to integrating
>                 the code into our production pipeline. There's probably
>                 not much you can help with at this point, but I might
>                 need some input from you guys as we get closer to
>                 deploying.
>
>                 Sorry this is not moving as fast as I'd like; this
>                 effort is one of my independent "20%" projects, and
>                 those tend to get pushed down the priority list in
>                 favor of more urgent tasks. So don't hesitate to check
>                 up once in a while.
>
>                 ~Ranjith
>
>


