[tei-council] FW: first stab at Google > TEI

Martin Mueller martinmueller at northwestern.edu
Wed Jun 22 20:56:48 EDT 2011


It would be worth following up on Stuart Yeates' suggestion and picking a set
of texts that have different types of problems at the paragraph-like
level.

What about verse? It is characteristically indented. Can it be
distinguished from indented prose?
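One way to make the verse question concrete is a minimal sketch like the one below. It assumes the OCR output keeps leading whitespace and that the page's full line width (`measure`, in characters) is known. The function name `looks_like_verse` and its 0.8 and 0.7 thresholds are hypothetical and untuned. The idea is that indented prose runs close to the full measure on every line but the last, while verse is ragged-right and usually starts each line with a capital.

```python
def looks_like_verse(lines: list[str], measure: int) -> bool:
    """Guess whether an indented block of OCR lines is verse rather than prose.

    Hypothetical heuristic: `measure` is the page's full line width in
    characters; both thresholds below are illustrative guesses.
    """
    # Keep leading indentation, drop trailing spaces and blank lines.
    text_lines = [ln.rstrip() for ln in lines if ln.strip()]
    if len(text_lines) < 2:
        return False  # too little evidence either way

    # Prose fills the measure on every line except the last of a paragraph,
    # so count how many non-final lines stop well short of it.
    body = text_lines[:-1]
    short = sum(1 for ln in body if len(ln) < 0.8 * measure)

    # Verse lines conventionally begin with a capital letter.
    caps = sum(1 for ln in text_lines if ln.lstrip()[0].isupper())

    return short / len(body) >= 0.5 and caps / len(text_lines) >= 0.7
```

Run-on verse with long lines, or prose set in a narrow column, would fool this, which is exactly why a test set with varied problems would be useful.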

If you know that a text is a play or contains only plays, can you guess
speaker changes? 
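For plays, a similarly rough sketch could work if you assume that speech prefixes are printed as all-caps names followed by a period or colon at the start of a line (e.g. `HAMLET.`). The regex `SPEAKER_RE` and the function `split_speeches` are hypothetical. Abbreviated or mixed-case prefixes such as `Ham.` would need further patterns, and headings such as `ACT I.` would also match and have to be filtered out.

```python
import re
from typing import Optional

# Hypothetical pattern: an all-caps name (letters, spaces, apostrophes) at the
# start of a line, then a period or colon, then optionally the speech itself.
SPEAKER_RE = re.compile(r"^\s*([A-Z][A-Z' ]*[A-Z])[.:](?:\s+(.*))?$")


def split_speeches(lines: list[str]) -> list[tuple[Optional[str], str]]:
    """Group OCR lines into (speaker, text) pairs by guessing speech prefixes."""
    speeches: list[list] = []
    for line in lines:
        m = SPEAKER_RE.match(line)
        if m:
            # A new speech prefix starts a new speech (text may be empty
            # if the prefix stands on its own line).
            speeches.append([m.group(1), (m.group(2) or "").strip()])
        elif line.strip():
            if not speeches:
                # Text before any prefix (e.g. a stage direction) has no speaker.
                speeches.append([None, line.strip()])
            else:
                # Continuation line: append it to the current speech.
                prev = speeches[-1]
                prev[1] = (prev[1] + " " + line.strip()).strip()
    return [(who, text) for who, text in speeches]
```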

And so forth, not to speak of pursuing all this in different languages and
alphabets.

On 6/22/11 6:57 PM, "Sebastian Rahtz" <sebastian.rahtz at oucs.ox.ac.uk>
wrote:

>The revised Gulliver looks fine, within its limitations. Until/unless
>chapter detection is available,
>probably not much point doing any more work there.
>
>Perhaps try something a bit more complex? E.g.
>http://books.google.com/books?id=ud-_YWJwCzAC&dq=gentleman's%20magazine&pg=PA4#v=onepage&q&f=false
>
>A bigger challenge would be
>http://books.google.com/books?id=Wo5HAAAAYAAJ&dq=greek&pg=PP10#v=onepage&q&f=false .....
>--
>Sebastian Rahtz   
>Head of Information and Support Group, Oxford University Computing Services
>13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431
>
>I only ask of God
>that I not be indifferent to the future