[tei-council] comments on edw90

Lou Burnard lou.burnard at computing-services.oxford.ac.uk
Mon Aug 8 11:16:05 EDT 2005


Here are some (rather belated) comments on some substantive issues 
raised by Syd's paper EDW90.

1. Syd's "Challenges"

1.1 The TYPE attribute.

Sebastian and I have tested the feasibility of local over-rides as 
described here. We conclude that they are not technically possible, and 
probably a less good idea than we initially thought in any case. What, 
after all, does it mean to say that "being typed" is a class? Is there 
any semantics one can associate with that which is *not* immediately 
over-ridden by the specific typology defined locally? (If there is, then 
maybe that would indicate that some class other than "typed" is 
appropriate.)

Our recommendation is that
- the tei.typed class should be removed
- elements bearing a type attribute (and former members of the typed 
class) should be checked to see whether their valLists constitute an 
open or closed list
- for closed lists, the datatype will be an alternation of the possible 
values
- for open (or semi-open) lists, the datatype will be tei.enumerated, 
i.e. a single token not containing whitespace
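As a rough illustration of the difference between the two cases, here is a minimal Python sketch. It assumes a closed valList is checked as a simple alternation of its values, and approximates tei.enumerated as "one or more non-whitespace characters". The function names and the sample valList are hypothetical; in a real schema the RELAX NG validator would do this work.

```python
import re

# Hypothetical helpers illustrating the recommendation; the names are
# illustrative only and are not part of any TEI or RELAX NG API.

def closed_list_validator(values):
    """Closed valList: the datatype is an alternation of the listed values."""
    allowed = frozenset(values)
    return lambda value: value in allowed

# Open or semi-open valList: tei.enumerated, approximated here as a single
# non-empty token containing no whitespace.
_SINGLE_TOKEN = re.compile(r"\S+")

def is_enumerated(value):
    return _SINGLE_TOKEN.fullmatch(value) is not None

if __name__ == "__main__":
    # A made-up closed list of values for a type attribute.
    check_type = closed_list_validator(["chapter", "section", "appendix"])
    print(check_type("chapter"))          # True
    print(check_type("preface"))          # False: not in the closed list
    print(is_enumerated("preface"))       # True: acceptable in an open list
    print(is_enumerated("front matter"))  # False: contains whitespace
```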

1.2 Over-riding of attributes

I think we can actually make explicit some of what Syd is describing 
here by using the RELAX NG method of defining facets. For example, we 
could say that a datatype is basically a non-negative integer, with the 
added constraint that its value is less than 43, using a construct such as:

      <datatype>
        <rng:data type="nonNegativeInteger">
          <rng:param name="maxInclusive">42</rng:param>
        </rng:data>
      </datatype>
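For readers less familiar with RELAX NG facets, the check this construct asks the validator to make can be spelled out in a few lines of Python. This is a sketch only: it follows the XML Schema lexical rules for nonNegativeInteger (an optional sign followed by digits, so that "-0" is accepted because its value is 0), and the function name is hypothetical.

```python
import re

# Lexical form inherited from xsd:integer: optional sign, then one or more digits.
_INTEGER = re.compile(r"[-+]?[0-9]+")

def valid_bounded_count(value, max_inclusive=42):
    """Mimic <rng:data type="nonNegativeInteger"> with a maxInclusive facet."""
    # Integer types use the "collapse" whitespace facet, so leading and
    # trailing XML whitespace (space, tab, CR, LF) is ignored.
    value = value.strip(" \t\r\n")
    if _INTEGER.fullmatch(value) is None:
        return False
    number = int(value)
    # nonNegativeInteger supplies the lower bound, the facet the upper one.
    return 0 <= number <= max_inclusive

if __name__ == "__main__":
    for candidate in ["42", "43", "007", "-1", " 12 ", "4 2"]:
        print(repr(candidate), valid_bounded_count(candidate))
```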

If (and only if) there is a 1:1 mapping between a TEI datatype and an 
RNG datatype, we could presumably also write

      <datatype>
        <rng:data>
          <rng:ref name="tei.nonNegativeInteger"/>
          <rng:param name="maxInclusive">42</rng:param>
        </rng:data>
      </datatype>

but it is not clear what this would mean for TEI datatypes which map 
to more than one RNG datatype, or to an expression.

In the general case, however, we think that all datatype definitions 
should be complete and appropriate. In practice, we think the vast 
majority of TEI attributes are already catered for by a very small 
number of datatypes. (Of the 500+ attributes listed by Syd, about 400 
are covered by derivations of tei.data.token, tei.data.pointer and 
tei.data.uboolean.)


We conclude that

- datatypes should be expressed as <rng:data> expressions
- for commonly occurring cases (see below) we should define a small 
number of macros, which will be named in the way Syd proposes for datatypes
- it should be possible to map all datatypes to W3C XML Schema built-in 
datatypes, possibly with additional constraints
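One way to picture the proposed macros is as a table from a TEI datatype name to a W3C base type plus optional facets, from which the corresponding <rng:data> expression is generated. The Python sketch below assumes exactly that. The table entries (in particular the hypothetical tei.data.smallCount) are illustrative guesses, not agreed definitions, and the datatypeLibrary attribute is assumed to be declared on an ancestor element.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"
ET.register_namespace("rng", RNG_NS)

# Hypothetical macro table: TEI datatype name -> (W3C base type, facets).
# These entries are illustrative assumptions, not agreed TEI definitions.
MACROS = {
    "tei.data.token": ("token", {}),
    "tei.data.pointer": ("anyURI", {}),
    "tei.data.smallCount": ("nonNegativeInteger", {"maxInclusive": "42"}),
}

def to_rng_data(macro_name):
    """Expand a macro into the <rng:data> expression it stands for.

    datatypeLibrary (the XML Schema datatypes URI) is assumed to be
    declared on an ancestor element, so it is not repeated here.
    """
    base_type, facets = MACROS[macro_name]
    data = ET.Element(f"{{{RNG_NS}}}data", {"type": base_type})
    for name, value in facets.items():
        param = ET.SubElement(data, f"{{{RNG_NS}}}param", {"name": name})
        param.text = value
    return data

if __name__ == "__main__":
    for name in MACROS:
        print(name, ET.tostring(to_rng_data(name), encoding="unicode"))
```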


1.3 Constraints

The TEI has always expressed additional constraints informally, in 
remarks, valDesc, and descriptive prose. We think we should use the 
Schematron language to express some of these formally, the primary use 
case being constraints on acceptable GIs as targets for various pointing 
attributes.

We are not sure where these constraints go in ODD-world, but probably 
not in the <datatype>. We recommend using Schematron for them because 
(a) we know it does the job, and (b) it is a candidate ISO standard.
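The kind of rule meant here ("this pointer must point at one of these elements") is simple to state. Schematron would express it as an XPath assertion; purely to show the logic, here is a Python sketch using the standard ElementTree module. It only handles same-document pointers of the form #id resolved against xml:id, and the function name, sample markup and allowed-GI set are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def check_pointer_targets(root, attr, allowed_gis):
    """Report same-document pointers (#id) whose target GI is not allowed."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID) is not None}
    problems = []
    for el in root.iter():
        value = el.get(attr)
        if value is None:
            continue
        for ptr in value.split():  # a pointers-style attribute may hold several
            if not ptr.startswith("#"):
                continue  # external URIs are out of scope for this sketch
            target = by_id.get(ptr[1:])
            if target is None:
                problems.append(f"{el.tag}/@{attr}: no element with id {ptr[1:]}")
            elif target.tag not in allowed_gis:
                problems.append(f"{el.tag}/@{attr}: {ptr} points at <{target.tag}>")
    return problems

# Made-up sample: the rule being tested is "ptr/@target must point at a note".
SAMPLE = """<text>
  <p xml:id="p1">See <ptr target="#n1"/> and <ptr target="#p1"/>.</p>
  <note xml:id="n1">A note.</note>
</text>"""

if __name__ == "__main__":
    for message in check_pointer_targets(ET.fromstring(SAMPLE), "target", {"note"}):
        print(message)  # ptr/@target: #p1 points at <p>
```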


2. Syd's comments on tokenization

I think anything we can do to reduce the complications consequent on the 
whitespace rules of XML is an unalloyed Good Thing, and I propose to be 
even more draconian than Syd suggests: we should allow only token, 
nmtoken, and tei.data.token. While sympathising with the motivation for 
it, I feel that the distinction Syd proposes between "tei.data.string" 
and "tei.data.tokens" will only confuse people. If the value of a 
sequence of tokens is to be interpreted as a single string, then it 
probably shouldn't be an attribute at all. (That said, there's a strong 
case to be made for using "tei.data.tokens" as its name!)
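To make the whitespace issue concrete: an xsd:token value has its whitespace collapsed but may still contain internal spaces, whereas an NMTOKEN may not. Below is a minimal Python sketch of the collapse rule, a deliberately simplified NMTOKEN check (ASCII name characters only, an assumption for brevity), and a reading of a tokens-style value as a whitespace-separated list (also an assumption). All function names are hypothetical.

```python
import re

# XML whitespace characters: space, tab, CR, LF.
_XML_WS = re.compile(r"[ \t\r\n]+")

def collapse(value):
    """XSD 'collapse' whitespace facet, as applied to xsd:token values."""
    return _XML_WS.sub(" ", value).strip(" ")

# Rough ASCII-only approximation of the NameChar production; the real
# production also admits many non-ASCII characters.
_NMTOKEN = re.compile(r"[A-Za-z0-9._:\-]+")

def is_nmtoken(value):
    return _NMTOKEN.fullmatch(collapse(value)) is not None

def split_tokens(value):
    """Read a tokens-style value as a whitespace-separated list."""
    collapsed = collapse(value)
    return collapsed.split(" ") if collapsed else []

if __name__ == "__main__":
    print(repr(collapse("  New\n  York ")))  # 'New York': a valid token
    print(is_nmtoken("New York"))            # False: internal space
    print(is_nmtoken(" en-GB "))             # True after collapsing
    print(split_tokens("a  b\tc"))           # ['a', 'b', 'c']
```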


3. Proposed datatypes

These I like:
tei.data.token, tei.data.tokens, tei.data.pointer, tei.data.pointers

Constraining tei.data.token/s further as NMTOKEN/S, NCName, QName, etc. is 
possible, but I am not sure how many attributes would benefit from it.


Names I would prefer:
for tei.data.uboolean -> tei.data.truthValue

Names I'm not sure about:
tei.data.temporalExpression: how does this map to ISO 8601?
(I assume it doesn't include dateRanges, for example.)

tei.data.duration
  We should adopt a consistent policy as to whether quantities like this 
include their units, or whether the units are supplied as a separate 
attribute. I think I prefer the second option, as being more flexible.

tei.data.probability
  Not convinced we need this. There are very few candidates in the EDW90 
table (I find 1, to be exact!)

tei.data.numeric
  I'm now coming round to the view that we also need a tei.data.integer.

tei.data.language
  I agree that we need to document exactly what this means somewhere and 
providing a TEI name for it is a good way of doing so.

tei.data.code and tei.data.key

   I have a different understanding of these.

Syd defines tei.data.code as a version of tei.data.pointer, with the 
extra constraint that the target must be in the present document. I am 
not sure what benefit there might be to adding this constraint.

More seriously, however, tei.data.key is defined as its complement, i.e. 
a tei.data.pointer with the constraint that its target must *not* be in 
the current document. I am even less clear what benefit there is in 
that, and find the naming rather confusing, since we are currently using 
"key=" attributes for things which are explicitly not pointers and which 
might even be in the same document (e.g. in TD).

I think we should stick to tei.data.pointer (by all means add 
tei.data.pointer.local, if need be) for the URI case, and keep 
tei.data.code (or key) for use when all we can say of the attribute is 
that its value is a magic token which might get you somewhere in some 
external system, e.g. a database key, but which cannot be validated. It 
might also be useful for cases where an association is made by 
co-reference, as with the ident/key pair in TD, or in whatever variety 
of HORSE we finally decide to back.
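The co-reference case is the one where some checking is still possible even though the value is not a pointer. A minimal Python sketch follows, assuming simplified TD-like markup in which key values are meant to match ident values elsewhere in the document; the function name and sample are hypothetical. Leftover keys are reported rather than rejected, since under this proposal a code may legitimately refer to an external system.

```python
import xml.etree.ElementTree as ET

def unresolved_keys(root, key_attr="key", ident_attr="ident"):
    """Return key values that match no ident value in the same document.

    Under the proposal a tei.data.code value is an opaque token, so these
    are reported for information only: they may belong to an external
    system such as a database.
    """
    idents = {el.get(ident_attr) for el in root.iter() if el.get(ident_attr) is not None}
    keys = {el.get(key_attr) for el in root.iter() if el.get(key_attr) is not None}
    return sorted(keys - idents)

# Simplified, made-up markup loosely modelled on the ident/key pairing in TD.
SAMPLE = """<schemaSpec>
  <classSpec ident="model.pLike"/>
  <elementSpec ident="p"><memberOf key="model.pLike"/></elementSpec>
  <elementSpec ident="ab"><memberOf key="model.abLike"/></elementSpec>
</schemaSpec>"""

if __name__ == "__main__":
    print(unresolved_keys(ET.fromstring(SAMPLE)))  # ['model.abLike']
```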


That's probably enough to be going on with for the moment...


Lou





