[tei-council] Fwd: Re: update on inclusion of TEI output in Google Books
Kevin Hawkins
kevin.s.hawkins at ultraslavonic.info
Mon Dec 16 18:57:24 EST 2013
Forwarding since Laurent probably was not allowed to post to tei-council
as a non-member of that list ...
-------- Original Message --------
Subject: Re: update on inclusion of TEI output in Google Books
Date: Mon, 16 Dec 2013 10:28:20 +0100
From: Laurent Romary <laurent.romary at inria.fr>
To: Kevin Hawkins <kevin.s.hawkins at ULTRASLAVONIC.INFO>
CC: TEI Council <tei-council at lists.village.Virginia.EDU>, Majewski
Stefan <stefan.majewski at onb.ac.at>
Thanks a lot Kevin for this update and I am happy with what you suggest.
I think it is very important that the community is informed on a regular
basis about the progress within Google so that we can reflect on
how to behave with regard to Google Books content.
Have a good week,
Laurent
On 15 Dec 2013, at 22:40, Kevin Hawkins
<kevin.s.hawkins at ULTRASLAVONIC.INFO> wrote:
> Fellow Technical Council members (cc'ing Laurent Romary and Stefan Majewski),
>
> As those of you who have been on Council for a few years know, Google has expressed interest in providing TEI as a download format in Google Books -- I believe just for public-domain titles that were scanned at one of Google's Library Partners. An engineer at Google began work on this in 2011 in response to a request from the Google Library Quality working group, which is composed of staff from Google's various Library Partners and deals mostly with questions concerning the quality of scans and OCR. I believe this group occasionally holds conference calls.
>
> The way we know about this is that Peter Gorman (Wisconsin) is a member of that group, and he looped in Martin Mueller, then Chair of the TEI's Board, who then looped in Council. We were provided with a few sample encoded texts, and several of us sent feedback on those samples and on more general matters. In particular I'll mention that since Ranjith was interested in using the Best Practices for TEI in Libraries (at Peter Gorman's suggestion, I believe), I urged the engineer to aim for a level of encoding between Level 3 and Level 4 (rather than throwing out structural data he has beyond Level 3) and not worry about validation to the schemas provided with the BP. We reviewed some early samples and heard from the engineer that he needs to lobby further for its inclusion in order to get it deployed to the public.
>
> Council created an ad-hoc committee on TEI for Google Books in September 2011 to provide further guidance. The group included James, Martin Holmes, Laurent Romary, and me. We began work on a document intended to address the questions from others at Google:
>
> https://docs.google.com/document/d/1PWBt_y-svn8ESAFDz1KxZinKXxc9dfn6kj5sbsIbBR0/edit
>
> Unfortunately, the engineer has been slow to respond in general, having been pulled to other projects. In February 2013 Peter Gorman tried to restart the work on this by writing to a mix of people interested in the question, not all of whom were on the Google Library Quality working group or Council's ad-hoc committee. The engineer responded that he would come back to this project in March 2013.
>
> At some point the Google Library Quality working group formed a TEI sub-group to deal just with suggesting improvements on Google's use of TEI markup. In March a colleague of mine at Michigan asked that Paul Schaffner and I (both at Michigan as well) be added to the TEI sub-group of the Google Library Quality working group, though I, and I assume Paul, are not on the main Google Library Quality working group. At about the same time, Stefan Majewski was added to the group (representing the Austrian National Library) and tried to kick-start the process of evaluating samples.
>
> Some additional samples were provided by Google:
>
> https://drive.google.com/folderview?id=0B_I1dv3x62jERUVqSFktZ3RKeXc&usp=sharing
>
> for the following items in Google Books (using Google Books' identifiers, based on barcodes stuck in the copy in the library from which it was scanned):
>
> +Z137414909
> +Z136964409
> +Z169495603
> +Z156881802
> +Z156987604
> +Z170360609
> +Z155001508
> +Z150106808
> +Z159009101
> +Z119545503
> +Z152825208
> +Z156332508
>
> Stefan looked at them recently and responded to the group with some ways that they might be involved.
>
> I have asked the TEI sub-group of the Google Library Quality working group and Ranjith whether they want the Technical Council to continue to have a role in this, but no one has responded. My suggestion is that Council decide that since TEI expertise and Council representation are now being provided through folks like Paul Schaffner (who was recently reelected to Council for two more years), Stefan Majewski, and me, the ad-hoc committee can officially disband (not that it was ever official in much of any way!) and cede this work to the TEI sub-group. I will, of course, continue to urge the group and the Google engineer in particular to seek input from the broader TEI community.
>
> Regardless, I will shortly share https://docs.google.com/document/d/1PWBt_y-svn8ESAFDz1KxZinKXxc9dfn6kj5sbsIbBR0/edit with the TEI sub-group since I don't believe they ever saw this document.
>
> --Kevin
Laurent Romary
INRIA & HUB-IDSL
laurent.romary at inria.fr