[tei-council] update on inclusion of TEI output in Google Books
Kevin Hawkins
kevin.s.hawkins at ultraslavonic.info
Sun Dec 15 16:40:42 EST 2013
Fellow Technical Council members (cc'ing Laurent Romary and Stefan
Majewski),
As those of you who have been on Council for a few years know, Google
has expressed interest in providing TEI as a download format in Google
Books -- I believe just for public-domain titles that were scanned at
one of Google's Library Partners. An engineer at Google began work on
this in 2011 in response to a request from the Google Library Quality
working group, which is composed of staff from Google's various Library
Partners and deals mostly with questions concerning the quality of scans
and OCR. I believe this group occasionally holds conference calls.
The way we know about this is that Peter Gorman (Wisconsin) is a member
of that group, and he looped in Martin Mueller, then Chair of the TEI's
Board, who then looped in Council. We were provided with a few sample
encoded texts, and various of us sent feedback on those samples and more
general matters. In particular I'll mention that since Ranjith was
interested in using the Best Practices for TEI in Libraries (at Peter
Gorman's suggestion, I believe), I urged the engineer to aim for a level
of encoding between Level 3 and Level 4 (rather than throwing out
structural data he has beyond Level 3) and not to worry about validation
against the schemas provided with the BP. We reviewed some early samples and
heard from the engineer that he needs to lobby further for its inclusion
in order to get it deployed to the public.
Council created an ad-hoc committee on TEI for Google Books in September
2011 to provide further guidance. The group included James, Martin
Holmes, Laurent Romary, and me. We began work on a document intended to
address the questions from others at Google:
https://docs.google.com/document/d/1PWBt_y-svn8ESAFDz1KxZinKXxc9dfn6kj5sbsIbBR0/edit
Unfortunately, the engineer has been slow to respond in general, having
been pulled to other projects. Peter Gorman tried to restart the work
on this in February 2013, writing to a mix of people interested
in the question, not all of whom were on the Google Library Quality
working group or Council's ad-hoc committee. The
engineer responded that he would come back to this project in March 2013.
At some point the Google Library Quality working group formed a TEI
sub-group to deal just with suggesting improvements on Google's use of
TEI markup. In March a colleague of mine at Michigan asked that Paul
Schaffner and I (both at Michigan as well) be added to the TEI sub-group
of the Google Library Quality working group, though I (and, I assume,
Paul) am not on the main Google Library Quality working group. At
about the same time, Stefan Majewski was added to the group
(representing the Austrian National Library) and tried to kick-start the
process of evaluating samples.
Some additional samples were provided by Google:
https://drive.google.com/folderview?id=0B_I1dv3x62jERUVqSFktZ3RKeXc&usp=sharing
for the following items in Google Books (using Google Books'
identifiers, based on barcodes stuck in the copy in the library from
which it was scanned):
+Z137414909
+Z136964409
+Z169495603
+Z156881802
+Z156987604
+Z170360609
+Z155001508
+Z150106808
+Z159009101
+Z119545503
+Z152825208
+Z156332508
Stefan looked at them recently and responded to the group with some ways
that they might be involved.
I have asked the TEI sub-group of the Google Library Quality working
group and Ranjith whether they want the Technical Council to continue to
have a role in this, but no one has responded. My suggestion is that
Council decide that since TEI expertise and Council representation are
now being provided through folks like Paul Schaffner (who was recently
reelected to Council for two more years), Stefan Majewski, and me, the
ad-hoc committee can officially disband (not that it was ever official
in much of any way!) and cede this work to the TEI sub-group. I will,
of course, continue to urge the group and the Google engineer in
particular to seek input from the broader TEI community.
Regardless, I will shortly share
https://docs.google.com/document/d/1PWBt_y-svn8ESAFDz1KxZinKXxc9dfn6kj5sbsIbBR0/edit
with the TEI sub-group since I don't believe they ever saw this document.
--Kevin