[tei-council] update from the TEI Tite task force: comments on 2 tickets by October 1

James Cummings James.Cummings at oucs.ox.ac.uk
Mon Sep 19 13:36:00 EDT 2011

On 19/09/11 17:25, Lou Burnard wrote:
> Hear hear. Not least because of its lack of a TEI Header -- which makes
> it arguably non-TEI-conformant anyway.

Is there any argument about that? If it breaks the TEI Abstract 
Model, it is not TEI-conformant.  The argument might be whether 
it was 'Conformable' or not, but I don't think it can be since it 
includes no metadata and the TEI Abstract Model requires metadata.

> Fascinating. You could also have just turned each element name into a
> single Unicode character of course! Is the ODD online?

Sure, and I could have turned every distinct-value 
element/attribute combination into a single Unicode character... 
there are enough of them ;-) And I could have made it a binary 
format and ... etc.

But the point is not to invent a new complicated toolchain, 
just to use abbreviated element/attribute names, a fairly small 
subset of elements, etc. Doing this on the sample provided saves 
40% compared to the expanded form of the markup. I'm less 
certain whether this will carry over to a larger sample size of 
thousands of pages. In this case we want the vendor to basically 
capture presentational aspects because it is an edition that has 
been very careful to represent different aspects of data with 
different font-changes. (So the @r (@rend) attribute has a large 
number of items in a closed valList.) Also we want to minimize 
their interpretative input as much as possible. :-)
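
As a rough illustration of the abbreviation idea described above, here 
is a minimal Python sketch that expands abbreviated names back to full 
TEI names and compares byte counts. Only the r -> rend mapping comes 
from this message; every other entry in the tables, the sample 
fragment, and the function names are hypothetical, and this is not the 
actual ODD or toolchain.

```python
import xml.etree.ElementTree as ET

# Hypothetical abbreviation tables. Only "r" -> "rend" is mentioned in
# the message; all other entries are invented for illustration.
ELEMENTS = {"d": "div", "pg": "p", "h": "hi", "l": "lb"}
ATTRIBUTES = {"r": "rend"}

# Hypothetical abbreviated fragment, not taken from the real sample.
SAMPLE = '<d><pg>Some <h r="i">italic</h> text<l/></pg></d>'


def expand(abbreviated_xml: str) -> str:
    """Rename abbreviated elements and attributes to their full names."""
    root = ET.fromstring(abbreviated_xml)
    for el in root.iter():
        # Unknown names are passed through unchanged.
        el.tag = ELEMENTS.get(el.tag, el.tag)
        renamed = {ATTRIBUTES.get(k, k): v for k, v in el.attrib.items()}
        el.attrib.clear()
        el.attrib.update(renamed)
    return ET.tostring(root, encoding="unicode")


def saving(abbreviated_xml: str) -> float:
    """Fraction of bytes saved by the abbreviated form (UTF-8)."""
    short = len(abbreviated_xml.encode("utf-8"))
    full = len(expand(abbreviated_xml).encode("utf-8"))
    return 1 - short / full
```

Calling saving(SAMPLE) gives the fraction saved on this toy fragment 
only; it says nothing about the 40% figure, which comes from the real 
sample and the real schema.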

The ODD isn't public yet, but when it has been finally agreed 
(with our friends in the Bodley), and a set of sample materials 
has been encoded by the vendor as a test, it will indeed be made 
available under a CC BY license.  I would, additionally, be 
willing to donate it to the TEI-C as an additional exemplar if it 
were decided that it might be useful.  I think it is an unusual 
situation, though, because most vendors charge by 
word-count/page-count/input-size/complexity or something, rather 
than by output-byte-count. Or, if they do, they wouldn't accept a 
byte-reduced schema. :-)


Dr James Cummings, InfoDev,
Computing Services, University of Oxford
