[tei-council] update from the TEI Tite task force: comments on 2 tickets by October 1
James Cummings
James.Cummings at oucs.ox.ac.uk
Mon Sep 19 13:36:00 EDT 2011
On 19/09/11 17:25, Lou Burnard wrote:
> Hear hear. Not least because of its lack of a TEI Header -- which makes
> it arguably non-TEI-conformant anyway.
Is there any argument about that? If it breaks the TEI Abstract
Model, it is not TEI-conformant. The argument might be whether
it was 'Conformable' or not, but I don't think it can be since it
includes no metadata and the TEI Abstract Model requires metadata.
> Fascinating. You could also have just turned each element name into a
> single Unicode character of course! Is the ODD online?
Sure, and I could have turned every distinct-value
element/attribute combination into a single Unicode character...
there are enough of them ;-) And I could have made it a binary
format and ..... etc.
But the point is not to invent a new complicated toolchain,
just to use abbreviated element/attribute names, a fairly small
subset of elements, etc. Doing this on the sample provided saves
40% compared to the expanded form of the markup. I'm less
certain whether this will carry over to a larger sample size of
thousands of pages. In this case we want the vendor to basically
capture presentational aspects because it is an edition that has
been very careful to represent different aspects of data with
different font-changes. (So the @r (@rend) attribute has a large
number of items in a closed valList.) Also we want to minimize
their interpretative input as much as possible. :-)
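
As a rough illustration of the abbreviation idea above, here is a minimal Python sketch that expands abbreviated names back to their full forms and compares byte counts. Only the @r to @rend pairing comes from this message; the element abbreviations (d, h), the sample string, and the function names are hypothetical, and namespaces are ignored for simplicity, so this is not the actual ODD or toolchain.

```python
import xml.etree.ElementTree as ET

# Hypothetical abbreviation tables. Only r -> rend is taken from the message;
# the element abbreviations are invented for illustration.
ELEMENT_NAMES = {"d": "div", "h": "hi"}
ATTRIBUTE_NAMES = {"r": "rend"}


def expand(xml_text):
    """Return the markup with abbreviated names replaced by full names."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        el.tag = ELEMENT_NAMES.get(el.tag, el.tag)
        # Rebuild the attributes under their expanded names.
        old_attrs = list(el.attrib.items())
        el.attrib.clear()
        for name, value in old_attrs:
            el.set(ATTRIBUTE_NAMES.get(name, name), value)
    return ET.tostring(root, encoding="unicode")


def byte_saving(short_text, full_text):
    """Fraction of bytes saved by the short form relative to the full form."""
    short_len = len(short_text.encode("utf-8"))
    full_len = len(full_text.encode("utf-8"))
    return 1 - short_len / full_len


short = '<d><h r="i">word</h> text</d>'
full = expand(short)
print(full)
print(round(byte_saving(short, full), 2))
```

The saving on a toy string like this says nothing about a real sample; the ratio depends on how much of the file is markup rather than text.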
The ODD isn't public yet, but when it has been finally agreed
(with our friends in the Bodley), and a set of sample materials
encoded by the vendor as a test, it will indeed be made
available under a CC+BY license. I would, additionally, be
willing to donate it to the TEI-C as an additional exemplar if it
was decided it might be useful. I think it is an unusual
situation though, because most vendors charge by
word-count/page-count/input-size/complexity or something, rather
than by output-byte-count. Or, if they do, they wouldn't accept a
byte-reduced schema. :-)
-James
--
Dr James Cummings, InfoDev,
Computing Services, University of Oxford