[tei-council] update on inclusion of TEI output in Google Books

Kevin Hawkins kevin.s.hawkins at ultraslavonic.info
Sun Dec 15 16:40:42 EST 2013


Fellow Technical Council members (cc'ing Laurent Romary and Stefan 
Majewski),

As those of you who have been on Council for a few years know, Google 
has expressed interest in providing TEI as a download format in Google 
Books -- I believe just for public-domain titles that were scanned at 
one of Google's Library Partners.  An engineer at Google began work on 
this in 2011 in response to a request from the Google Library Quality 
working group, which is composed of staff from Google's various Library 
Partners and deals mostly with questions concerning the quality of scans 
and OCR.  I believe this group occasionally holds conference calls.

The way we know about this is that Peter Gorman (Wisconsin) is a member 
of that group, and he looped in Martin Mueller, then Chair of the TEI's 
Board, who then looped in Council.  We were provided with a few sample 
encoded texts, and several of us sent feedback on those samples and more 
general matters.  In particular I'll mention that since Ranjith was 
interested in using the Best Practices for TEI in Libraries (at Peter 
Gorman's suggestion, I believe), I urged the engineer to aim for a level 
of encoding between Level 3 and Level 4 (rather than throwing out 
structural data he has beyond Level 3) and not to worry about validation 
to the schemas provided with the BP.  We reviewed some early samples and 
heard from the engineer that he needs to lobby further for its inclusion 
in order to get it deployed to the public.

Council created an ad-hoc committee on TEI for Google Books in September 
2011 to provide further guidance. The group included James, Martin 
Holmes, Laurent Romary, and me.  We began work on a document intended to 
address the questions from others at Google:

https://docs.google.com/document/d/1PWBt_y-svn8ESAFDz1KxZinKXxc9dfn6kj5sbsIbBR0/edit

Unfortunately, the engineer has been slow to respond in general, having 
been pulled to other projects.  Peter Gorman tried to restart the work 
on this in February 2013, writing to a mix of people interested in the 
question, not all of whom were on the Google Library Quality working 
group or Council's ad-hoc committee.  The engineer responded that he 
would come back to this project in March 2013.

At some point the Google Library Quality working group formed a TEI 
sub-group to deal just with suggesting improvements on Google's use of 
TEI markup.  In March a colleague of mine at Michigan asked that Paul 
Schaffner and I (both at Michigan as well) be added to the TEI sub-group 
of the Google Library Quality working group, though I, and I assume 
Paul, are not on the main Google Library Quality working group.  At 
about the same time, Stefan Majewski was added to the group 
(representing the Austrian National Library) and tried to kick-start the 
process of evaluating samples.

Some additional samples were provided by Google:

https://drive.google.com/folderview?id=0B_I1dv3x62jERUVqSFktZ3RKeXc&usp=sharing

for the following items in Google Books (using Google Books' 
identifiers, based on barcodes stuck in the copy in the library from 
which it was scanned):

+Z137414909
+Z136964409
+Z169495603
+Z156881802
+Z156987604
+Z170360609
+Z155001508
+Z150106808
+Z159009101
+Z119545503
+Z152825208
+Z156332508

Stefan looked at them recently and responded to the group with some ways 
that they might be involved.

I have asked the TEI sub-group of the Google Library Quality working 
group and Ranjith whether they want the Technical Council to continue to 
have a role in this, but no one has responded.  My suggestion is that 
Council decide that since TEI expertise and Council representation are 
now being provided through folks like Paul Schaffner (who was recently 
reelected to Council for two more years), Stefan Majewski, and me, the 
ad-hoc committee can officially disband (not that it was ever official 
in much of any way!) and cede this work to the TEI sub-group.  I will, 
of course, continue to urge the group and the Google engineer in 
particular to seek input from the broader TEI community.

Regardless, I will shortly share 
https://docs.google.com/document/d/1PWBt_y-svn8ESAFDz1KxZinKXxc9dfn6kj5sbsIbBR0/edit 
with the TEI sub-group since I don't believe they ever saw this document.

--Kevin

