[tei-council] Fwd: Re: Google Books > TEI
Kevin Hawkins
kevin.s.hawkins at ultraslavonic.info
Thu Jun 28 00:15:01 EDT 2012
Pardon me: I forgot to include the IDs for these books so you can
compare against page images:
dickens.tei (Google books ID i8_u_-YmG4MC)
gullivers_travels.tei (Google books ID srVbAAAAQAAJ)
On 6/28/12 12:12 AM, Kevin Hawkins wrote:
> All,
>
> Now that the latest release is behind us, I'd like to follow up on a few
> things I promised you but which wouldn't have contributed to
> getting through bug fixes and feature requests in time for the release.
>
> First of all, in Ann Arbor we agreed that we would ask Ranjith, our
> contact at Google, for the latest samples so we can calculate some
> statistics on accuracy and encourage Google towards making this format
> public. See the two attachments and the correspondence below.
>
> Our agenda is a bit vague on responsibility here. I've asked for
> samples, but I think others will want to check for accuracy of encoding.
>
> Kevin
>
> -------- Original Message --------
> Subject: Re: Google Books > TEI
> Date: Mon, 23 Apr 2012 15:30:38 -0700
> From: Ranjith Unnikrishnan <ranjith at google.com>
> To: Kevin Hawkins <kevin.s.hawkins at ultraslavonic.info>
> CC: James.Cummings at oucs.ox.ac.uk, mholmes at uvic.ca,
> laurent.romary at inria.fr
>
>
>
> Yes, the last round of feedback I got was around that time frame, and
> came both from your group as well as another working group that included
> some of our library partners. I had incorporated the two sets of
> feedback into some improvements to the algorithm, but they were mostly
> related to style and had nothing to do with producing new output tags or
> such. Comments that were not addressed required improvements to existing
> text structure analysis algorithms that were at least partly based on
> the quality of obtained OCR text. Both of these are active research
> topics that are always on our agenda but are not quick fixes.
> I've attached the latest Dickens and Gulliver's Travels files, and can
> generate TEI files for others if you can send me links to their
> corresponding pages on Google Books.
>
>
>
> On Mon, Apr 23, 2012 at 2:53 PM, Kevin Hawkins
> <kevin.s.hawkins at ultraslavonic.info> wrote:
>
> The last round of reviews I have is a sample of Dickens from July
> 27, 2011. I have earlier versions of other titles, but they aren't
> worth consulting at this point since you've improved other things.
> Has the algorithm changed since then? It would be nice to have
> the latest version of not only Dickens but also of some other works
> in the public domain: perhaps an old bound volume of a journal and a
> non-fiction book? Thanks.
>
>
> On 4/23/2012 12:50 PM, Ranjith Unnikrishnan wrote:
>
> Hi Kevin,
>
> I have not made any changes to the TEI generation algorithm
> since our
> last round of reviews within the group. I've since diverted my
> energy
> towards getting buy-in and making progress on making the TEI files
> available on the Books site. I've had some success but it's
> still early
> days. Until this is launched, I don't plan to work on increasing
> the TEI
> markup depth or anything else related to the process apart from
> fixing
> any bugs that may arise during large-scale testing.
>
> ~R
>
>
> On Thu, Apr 19, 2012 at 6:05 PM, Kevin Hawkins
> <kevin.s.hawkins at ultraslavonic.info> wrote:
>
> Hi Ranjith,
>
> While we wait on your colleagues to integrate the code for
> generating TEI into your production pipeline (we hope our
> Google Docs brainstorming has helped you make that case),
> the TEI
> Technical Council is thinking about how we might publicize the
> availability of TEI documents in Google Books -- when that
> day comes
> -- and what it might mean for our community. The sort of
> message we
> would promote depends on the depth and consistency of the
> markup
> that Google is able to create. Would you be able to provide
> us with
> some samples generated by the latest version of your
> code? (We
> saw early drafts, but I'm not sure that I have any of the final
> versions.) I'd like to share them with the Technical Council.
>
> Thanks,
>
> Kevin
>
>
> On 2/6/12 12:24 PM, Ranjith Unnikrishnan wrote:
>
> Hi Kevin,
>
> The code has been reviewed and checked in but we're
> still working on
> some questions related to integrating the code in our
> production
> pipeline. There's probably not much you can help with at
> this
> point, but
> I might need some input from you guys as we get closer
> to deploying.
>
> Sorry this is not moving as fast as I'd like; this
> effort is one
> of my
> independent "20%" projects, and those have a general
> tendency of
> getting
> pushed down the priority list in light of more urgent
> tasks. So
> don't
> hesitate to check up once in a while.
>
> ~Ranjith
>
>