[tei-council] TEI versions of Gutenberg and Google
martinmueller at northwestern.edu
Mon Apr 18 09:52:09 EDT 2011
Has there ever been any TEI version of a Gutenberg text produced in this
manner? I've written to Matt Jockers to see what they've done with their
On a related matter, has anybody tried to convert Google ebooks to TEI?
This would be a sort of reverse engineering and unnecessary if Google has
a TEI output format. On the other hand, demonstration projects from within
the TEI community might spur them.
I recently read Hofmannsthal's Der Schwierige in a Google epub version and
looked at the encoding and OCR. There have been some interesting
experiments at UIUC about transforming the "white space XML" of OCR output
into TEI. It takes only minimal forms of human intervention to correct
basic structural errors. One of the attractive aspects of TEI texts that
originate in OCR is that you end up with a text that is a chain of digital
surrogates stretching from the page image through the transcription to the
data derivatives and aggregates that you can construct from a corpus of
Laura Mandell and the 18thConnect Project are very active in that field.
I don't know enough about what goes on inside an epub text to figure out
how to transform it. But it looks like it's possible, and if we did a few
proof-of-concept conversions, it might be a way of nudging Google.
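
A rough sketch of what such a proof-of-concept conversion might look like, not a tested pipeline. It assumes an EPUB is a zip archive whose content documents are well-formed XHTML files, and it maps only headings and paragraphs into a bare TEI P5 skeleton. The function names (epub_to_tei, add_div) and the placeholder header text are hypothetical. It also reads files in name order rather than following the reading order declared in the package file, and it will fail on XHTML that uses named entities such as &nbsp;, so real Google output would need more care.

```python
import zipfile
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def tei(tag):
    """Return a namespace-qualified TEI element name for ElementTree."""
    return f"{{{TEI_NS}}}{tag}"


def add_div(body, xhtml):
    """Append one TEI <div> built from the headings and paragraphs of an XHTML file."""
    root = ET.fromstring(xhtml)
    div = ET.SubElement(body, tei("div"))
    for el in root.iter():
        local = el.tag.rsplit("}", 1)[-1]  # drop the XHTML namespace, if any
        # Flatten inline markup (<i>, <span>, ...) and normalise white space.
        content = " ".join("".join(el.itertext()).split())
        if not content:
            continue
        if local in HEADINGS:
            ET.SubElement(div, tei("head")).text = content
        elif local == "p":
            ET.SubElement(div, tei("p")).text = content
    return div


def epub_to_tei(epub_file, title="Untitled"):
    """Build a minimal TEI tree with one <div> per XHTML file in the EPUB."""
    root = ET.Element(tei("TEI"))
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    ET.SubElement(ET.SubElement(file_desc, tei("titleStmt")), tei("title")).text = title
    # Placeholder header content; a real project would record provenance here.
    ET.SubElement(ET.SubElement(file_desc, tei("publicationStmt")), tei("p")).text = (
        "Unpublished conversion sketch."
    )
    ET.SubElement(ET.SubElement(file_desc, tei("sourceDesc")), tei("p")).text = (
        "Converted from an EPUB file."
    )
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    with zipfile.ZipFile(epub_file) as archive:
        # Simplification: name order, not the order given in the package file.
        for name in sorted(archive.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                add_div(body, archive.read(name))
    return root
```

Calling ET.register_namespace("", TEI_NS) before ET.tostring(epub_to_tei("book.epub"), encoding="unicode") would serialise the result with TEI as the default namespace. The output is not guaranteed to validate: inline markup is discarded, and a heading that follows paragraphs within one file would put a <head> in the wrong place.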
On 4/18/11 12:04 AM, "Laurent Romary" <laurent.romary at inria.fr> wrote:
>The idea would be to have a way to make those projects maintain their
>schema in a way which is closer to what the TEI itself does. They could
>even maintain these on SF. By helping them go this way, we would go in
>the direction indicated by Martin M. in his talk last week.
>On 18 Apr 2011, at 00:25, Lou Burnard wrote:
>> We did have quite a bit of discussion with Marcello back in 2006 or
>> so, but I haven't heard much from them lately.
>> Re-expressing their subset of TEI as an ODD would be a fairly trivial
>> exercise, but I'm not sure what it would achieve.
>> On 17/04/11 17:01, Laurent Romary wrote:
>>> Would someone already in close contact with PG be willing to make
>>>informal contact with the guy maintaining the page and see whether he
>>>would like having his schema as a P5 ODD, which would also allow him to
>>>update some tiny features here and there (like using xml:lang)?
>>> On 17 Apr 2011, at 17:48, Piotr Bański wrote:
>>>> Regarding the need to secure the choice of TEI as the format of choice
>>>> in (among others) digitization of literary works, should we not pay
>>>> special attention to developments such as PGTEI? Project Gutenberg is
>>>> a very serious and popular initiative, and providing support to it will
>>>> benefit both sides.
>>>> I can't recall PG being mentioned at TEI-MMs or on TEI-L (may have
>>>> missed something obvious though). PGTEI appears to be a derived format
>>>> instead of being an ODD customization -- perhaps all is not lost and
>>>> we can provide support for PG, in return enlarging the community
>>>> with all the related benefits.
>>>> * http://pgtei.pglaf.org/marcello/0.4/doc/20000-h.html
>>>> * https://www.stanford.edu/~mjockers/cgi-bin/drupal/node/49
>>>> tei-council mailing list
>>>> tei-council at lists.village.Virginia.EDU
>>>> PLEASE NOTE: postings to this list are publicly archived
>>> Laurent Romary
>>> INRIA & HUB-IDSL
>>> laurent.romary at inria.fr