[tei-council] Project Gutenberg

Sebastian Rahtz sebastian.rahtz at oucs.ox.ac.uk
Mon Apr 25 06:37:42 EDT 2011

Catching up late on this: I looked at Matt J's script and at the Gutenberg texts, and I am slightly puzzled as to why he started from the
plain text, when Gutenberg themselves produce transformable HTML by some means (and epub, for that matter, I assume
to the same standard). A quick check with "Nada The Lily" shows that the chapter structure is present in the HTML. There is so
much ad hoc markup, though, that one's heart sinks a little.

But perhaps a lot of the Gutenberg material is only in plain text?

It is tempting to take apart a Google epub and convert it to TEI. Is this worth doing, to show Google how we might like it?
It is a bit like flailing in the dark, though, as there is information in there that I, for one, do not grok.

For example, in:

the only mental provision she was making for the evening of life, was the collecting and transcribing all the 
riddles of every<a id="GBS.PA145.w."></a> sort that she could meet with, into a thin quarto of 
hot-pressed<a id="ORIG-GBS.PA146.w.1.0.0"></a>
<span class="gtxt_body" id="para.161.1.0.box.242.302.808.925.q.60"><a id="GBS.PA146"></a><a id="GBS.PA146.w.0.0.0"></a><a id="GBS.PA146.w.1.0.0"></a>
paper, made up by her friend, and ornamented

What is the function of the sequence of three empty <a> elements?
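
One way to get a feel for the pattern is to list the empty <a> anchors and split their ids into parts. The following is a minimal sketch in Python using only the standard library; it does not answer the question, it just tabulates what is there. The names AnchorCollector and split_id are made up for illustration, and reading "PA146" as a page label is a guess, not something confirmed by Google.

```python
from collections import defaultdict
from html.parser import HTMLParser

# The fragment quoted above, verbatim.
SAMPLE = (
    'the only mental provision she was making for the evening of life, '
    'was the collecting and transcribing all the \n'
    'riddles of every<a id="GBS.PA145.w."></a> sort that she could meet '
    'with, into a thin quarto of \n'
    'hot-pressed<a id="ORIG-GBS.PA146.w.1.0.0"></a>\n'
    '<span class="gtxt_body" id="para.161.1.0.box.242.302.808.925.q.60">'
    '<a id="GBS.PA146"></a><a id="GBS.PA146.w.0.0.0"></a>'
    '<a id="GBS.PA146.w.1.0.0"></a>\n'
    'paper, made up by her friend, and ornamented'
)


class AnchorCollector(HTMLParser):
    """Collect the ids of <a> elements that contain no text (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.empty_anchor_ids = []
        self._pending = None  # id of an <a> that has had no content so far

    def handle_starttag(self, tag, attrs):
        # Any start tag other than <a> cancels a pending empty anchor.
        self._pending = dict(attrs).get("id") if tag == "a" else None

    def handle_data(self, data):
        if data.strip():
            self._pending = None  # the anchor has text, so it is not empty

    def handle_endtag(self, tag):
        if tag == "a" and self._pending is not None:
            self.empty_anchor_ids.append(self._pending)
        self._pending = None


def split_id(anchor_id):
    """Split 'GBS.PA146.w.1.0.0' into ('GBS', 'PA146', 'w.1.0.0').

    Assumption: the second dot-separated field is a page label.
    """
    prefix, _, rest = anchor_id.partition(".")
    page, _, suffix = rest.partition(".")
    return prefix, page, suffix


collector = AnchorCollector()
collector.feed(SAMPLE)
collector.close()
ids = collector.empty_anchor_ids

# Group the (prefix, suffix) pairs by the presumed page label.
by_page = defaultdict(list)
for anchor_id in ids:
    prefix, page, suffix = split_id(anchor_id)
    by_page[page].append((prefix, suffix))

for page, parts in by_page.items():
    print(page, parts)
```

Run over a whole epub rather than one fragment, a table like this might show whether the three-anchor sequence recurs at every presumed page boundary.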

I have only looked at one Google epub, so it is very hard to tell what the patterns are.

Sebastian Rahtz
Head of Information and Support Group, Oxford University Computing Services
13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431

I only ask of God
that the future not be indifferent to me
