[tei-council] Fwd: Reconciling EEBO and P5 dtdas

Laurent Romary laurent.romary at INRIA.FR
Tue Nov 8 05:00:22 EST 2011

For the discussion this afternoon.

Begin forwarded message:

> From: Martin Mueller <martinmueller at northwestern.edu>
> Date: 26 October 2011 22:39:12 CEST
> To: Laurent Romary <laurent.romary at INRIA.FR>, James Cummings <James.Cummings at oucs.ox.ac.uk>
> Cc: Sebastian Rahtz <sebastian.rahtz at OUCS.OX.AC.UK>, "Paul F. Schaffner" <PFSchaffner at umich.edu>, Brian L Pytlik Zillig <bpytlikz at unlnotes.unl.edu>, Lou Burnard <lou.burnard at RETIRED.OX.AC.UK>
> Subject: Reconciling EEBO and P5 dtdas
> James and Laurent,
> I address this memo to the two of you as the outgoing and incoming chairs of the TEI Council.  The topic is whether we can reconcile the EEBO and P5 dtds and arrive at a version of P5 that would make it possible to distribute the TCP texts in a pure subset of P5 when they go into the public domain in 2015. Some 2,000 ECCO-TCP texts are already in the public domain. 
> It may or may not be possible to resolve all the issues at the Paris meeting of the Council, but it will be helpful to discuss the problems and get some sense of the meeting about best ways to proceed.  There has been an informal and self-appointed work group on this issue.  Sebastian, Lou, Brian Pytlik Zillig at Nebraska, and Paul Schaffner have commented on various aspects of it.  In the likely event that things will not conclude in Paris, there may be some virtue in having a meeting of this group to come up with a detailed proposal for discussion and ratification. 
> There are two Google docs that contain fairly detailed information about this matter. The first is a spreadsheet that tabulates problems and suggested solutions, including comments from Lou and Sebastian.  The second is an informal essay about various problems:
> https://docs.google.com/spreadsheet/ccc?key=0AudTBrdzdsHLdEZvRy02dGhfeFJfeC1EeTVzZ2FrSFE
> https://docs.google.com/document/d/1aYLAK575-jZmmYcKrT-CECckgA0twMSZjlmR_0oXq1A/edit
> The following is a summary of the problems and possible solutions. It does NOT represent a consensus of the informal workgroup.
> The EEBO dtd started out as a lightweight version of TEI-Lite. It is quite light, using about 70 elements, not counting two dozen header elements and the numbered divs it uses. Some 30,000 texts from the first three centuries of print culture have been encoded in it. An equal number are scheduled to be encoded over the course of the next few years, and a very large library of early modern English texts will pass into the public domain starting in 2015.
> The EEBO-dtd has diverged from pure TEI in various ways over the years, in response to problems the encoders encountered. From the perspective of end users, there are substantial benefits in the form of reduced transaction costs if the texts can be manipulated in a pure subset of TEI.  It is possible to get there through some combination of three things:
> 1. accommodate EEBO practices by incorporating them into P5
> 2. transform EEBO into different but equivalent P5 elements
> 3. change the encoding of EEBO texts
> The third is obviously a matter of last resort, but the cases where it may be necessary seem in fact to be numbered in the dozens rather than hundreds or thousands. 
> As for the first, the following would remove most discrepancies:
> accept the EEBO model for <figure> wholly or in part
> add <opener> and <closer> to the P5 <postscript> model
> permit <l> as a direct child of <head>
> permit <stage> as a direct child of <lg>
> permit <floatingText> and <table> as direct children of <sp> (which would also align <sp> with <said>, which does allow those elements)
> With regard to the second:
> <headnote> and <tailnote> can be expressed as <note> elements with type attributes
> single <p> elements inside <cell> can be deleted, making the content of <p> the immediate content of <cell>
> The EEBO <letter> element can be expressed as <floatingText type="letter">
> The rare <above> and <below> can be expressed as <hi rend="above"> and <hi rend="below">, or perhaps in some other way
> <postscript> elements as last children of <closer> can be turned into right siblings of <closer>
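The transformations in this second group are mechanical enough to sketch in code. What follows is a minimal illustration in Python using only the standard library, not a tested conversion of TCP files. It assumes lowercase element names with no namespace, as written in the memo, and that the EEBO letter element is named `letter`. The function names, the `RENAMES` table, and the sample fragment are hypothetical. Attributes on an unwrapped <p> are simply dropped here, which a real conversion would need to handle.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: EEBO tag -> (P5 tag, attribute name, attribute value).
RENAMES = {
    "headnote": ("note", "type", "headnote"),
    "tailnote": ("note", "type", "tailnote"),
    "letter": ("floatingText", "type", "letter"),
    "above": ("hi", "rend", "above"),
    "below": ("hi", "rend", "below"),
}


def rename_elements(root):
    """Rename EEBO-only elements to P5 equivalents carrying a type/rend value."""
    for el in root.iter():
        if el.tag in RENAMES:
            tag, attr, value = RENAMES[el.tag]
            el.tag = tag
            el.set(attr, value)


def unwrap_single_p(cell):
    """If a cell holds exactly one <p> and no other text, lift the <p> content."""
    children = list(cell)
    if len(children) != 1 or children[0].tag != "p":
        return
    p = children[0]
    if (cell.text or "").strip() or (p.tail or "").strip():
        return  # mixed content around the <p>: leave the cell alone
    cell.remove(p)
    cell.text = p.text
    cell.extend(list(p))


def hoist_postscripts(root):
    """Turn a <postscript> that is the last child of <closer> into its right sibling."""
    for parent in list(root.iter()):
        for closer in [c for c in parent if c.tag == "closer"]:
            kids = list(closer)
            if not kids or kids[-1].tag != "postscript":
                continue
            ps = kids[-1]
            if (ps.tail or "").strip():
                continue  # text follows the postscript inside <closer>: skip
            closer.remove(ps)
            ps.tail = closer.tail  # text after <closer> now follows <postscript>
            closer.tail = None
            parent.insert(list(parent).index(closer) + 1, ps)


def eebo_to_p5(xml_text):
    """Apply the three transformations to an XML string and return the root."""
    root = ET.fromstring(xml_text)
    rename_elements(root)
    for cell in list(root.iter("cell")):
        unwrap_single_p(cell)
    hoist_postscripts(root)
    return root


# Invented sample fragment, for illustration only.
SAMPLE = (
    "<div><headnote>Argument</headnote>"
    "<table><row><cell><p>one <above>a</above></p></cell></row></table>"
    "<letter><body><p>Sir</p>"
    "<closer>Yours<postscript><p>PS</p></postscript></closer>"
    "</body></letter></div>"
)

if __name__ == "__main__":
    print(ET.tostring(eebo_to_p5(SAMPLE), encoding="unicode"))
```

The guards on surrounding text are deliberately conservative: a cell or closer with mixed content is left untouched, so such cases can be reviewed by hand rather than silently rewritten.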
> With regard to the third:
> Cases in which <list> appears as child of <label> in EEBO texts should be remodeled as two-column tables or expressed in some other way.
> The few instances in which <cell> has a complex content model along the lines of <item> should be rethought.
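Since these cases call for editorial judgment rather than automatic conversion, a first step could be simply to locate and count them. The sketch below does that under stated assumptions: lowercase, un-namespaced element names as in the memo, and a working definition of a "complex" cell as one holding more than one block-level child, or a single block-level child other than a lone <p>. That definition and the `BLOCK_TAGS` set are illustrative guesses, not taken from the EEBO dtd.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: an illustrative set of block-level elements, not the dtd's own list.
BLOCK_TAGS = {"p", "list", "table", "lg", "q", "figure"}


def find_remodel_candidates(root):
    """Count constructs that the memo says would need re-encoding by hand."""
    counts = Counter()
    for label in root.iter("label"):
        # find() with a bare tag name matches direct children only
        if label.find("list") is not None:
            counts["list-in-label"] += 1
    for cell in root.iter("cell"):
        blocks = [c for c in cell if c.tag in BLOCK_TAGS]
        # Per the memo, a lone <p> can be unwrapped, so it is not counted.
        if len(blocks) > 1 or (blocks and blocks[0].tag != "p"):
            counts["complex-cell"] += 1
    return counts
```

Run over a whole corpus, the totals would show whether the affected texts really are numbered in the dozens.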
> In addition to these problems, the encoding of signatures in early modern texts may require additional discussion.
> Some other points are discussed in the longer Google documents, but well over 95% of parsing problems with EEBO texts under P5 can be resolved by some combination of the features discussed above. There are probably some important points I did not raise, but I trust that Paul will spot them, and I hope that this memo will have some use in stimulating discussion. 

Laurent Romary
laurent.romary at inria.fr
