[tei-council] comments on edw90
Lou Burnard
lou.burnard at computing-services.oxford.ac.uk
Mon Aug 8 11:16:05 EDT 2005
Here are some (rather belated) comments on some substantive issues
raised by Syd's paper EDW90.
1. Syd's "Challenges"
1.1 The TYPE attribute.
Sebastian and I have done some testing of the feasibility of doing local
over-rides as described here. Our conclusions are that this is not
possible on technical grounds, and probably less good an idea than we
initially thought in any case. What after all does it mean to say that
"being typed" is a class? Is there any semantic one can associate with
that which is *not* immediately over-ridden by the specific typology
defined locally? (If there is, then maybe that would indicate that some
class other than "typed" is appropriate).
Our recommendation is that
- the tei.typed class should be removed
- elements bearing a type attribute (and former members of the typed
class) should be checked to see whether their valLists constitute an
open or closed list
- for closed lists, the datatype will be an alternation of the possible
values
- for open (or semi) lists, the datatype will be tei.enumerated, i.e. a
single token not containing whitespace
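
As a rough illustration of the last two points, the two cases might be
declared along the following lines. This is only a sketch: the values
"prose", "verse" and "drama" are invented for the example, and the
pattern facet is one assumed way of spelling out "a single token not
containing whitespace", not an agreed definition of tei.enumerated. The
rng: prefix is taken to be bound to the RELAX NG namespace, as in the
other fragments in this note.

```xml
<!-- closed list: the datatype is an alternation of the permitted values
     (the three values are hypothetical) -->
<datatype>
  <rng:choice>
    <rng:value>prose</rng:value>
    <rng:value>verse</rng:value>
    <rng:value>drama</rng:value>
  </rng:choice>
</datatype>

<!-- open or semi-open list: any single token with no internal whitespace
     (an assumed rendering of tei.enumerated) -->
<datatype>
  <rng:data type="token">
    <rng:param name="pattern">\S+</rng:param>
  </rng:data>
</datatype>
```

In the second case any suggested values would live only in the
documentation (the valList), not in the schema.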
1.2 Over-riding of attributes
I think we can actually make explicit some of what Syd is describing
here by using the RelaxNG method of defining facets. So we could for
example say that a datatype was basically a non-negative integer, but with
the added constraint that its value is less than 43, by a construct such as
<datatype>
  <rng:data type="nonNegativeInteger">
    <rng:param name="maxInclusive">42</rng:param>
  </rng:data>
</datatype>
If (and only if) there is a 1:1 mapping between a TEI datatype and an
RNG datatype we could presumably also do
<datatype>
  <rng:data>
    <rng:ref name="tei.nonNegativeInteger"/>
    <rng:param name="maxInclusive">42</rng:param>
  </rng:data>
</datatype>
but it's not clear what this would mean for TEI datatypes which mapped
to more than one RNG datatype or an expression.
In the general case, however, we think that all datatype definitions
should be complete and appropriate. In practice, we think the vast
majority of TEI attributes are already catered for by a very small
number of datatypes. (Of the 500+ attributes listed by Syd, about 400
are covered by derivations of tei.data.token, tei.data.pointer and
tei.data.uboolean.)
We conclude that
- datatypes should be expressed as <rng:data> expressions
- for commonly occurring cases (see below) we should define a small
number of macros, which will be named in the way Syd proposes for datatypes
- it should be possible to map all datatypes to W3C basic datatypes,
possibly with additional constraints
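
A minimal sketch of what such macros might look like, assuming that
macroSpec is the vehicle for them. The mappings shown (tei.data.pointer
to anyURI, tei.data.probability to a double between 0 and 1) are
assumptions made for the sake of the example, not agreed definitions.

```xml
<!-- a macro that wraps a single W3C datatype (assumed mapping) -->
<macroSpec ident="tei.data.pointer" type="dt">
  <content>
    <rng:data type="anyURI"/>
  </content>
</macroSpec>

<!-- a macro that adds constraints to a W3C datatype (assumed mapping) -->
<macroSpec ident="tei.data.probability" type="dt">
  <content>
    <rng:data type="double">
      <rng:param name="minInclusive">0</rng:param>
      <rng:param name="maxInclusive">1</rng:param>
    </rng:data>
  </content>
</macroSpec>
```

An attribute definition would then refer to the macro by name inside its
<datatype>, rather than repeating the <rng:data> expression.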
1.3 Constraints
The TEI has always proposed additional constraints in remarks, valDesc,
and descriptive prose. We think we should use the Schematron language to
express some of these: the primary use case being constraints on
acceptable GIs as targets for various pointing attributes.
We are not sure where these constraints go in ODD-world, but probably
not in the <datatype>. We recommend using Schematron for them because
(a) we know it does the job and (b) it is a candidate ISO recommendation.
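
To give an idea of the primary use case, a constraint on the GI of a
pointer's target might be sketched as below. Everything specific here
is an assumption: noteRef and note are hypothetical element names, the
elements are taken to be in no namespace, the pointer is assumed to be a
local one of the form "#n1", and the ISO Schematron namespace is used.

```xml
<sch:pattern xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <!-- noteRef and note are hypothetical element names -->
  <sch:rule context="noteRef[@target]">
    <!-- take the fragment identifier from a local pointer such as "#n1" -->
    <sch:let name="id" value="substring-after(@target, '#')"/>
    <!-- the element carrying that xml:id must be a note -->
    <sch:assert test="//note[@xml:id = $id]">
      The target of a noteRef must be a note element in this document.
    </sch:assert>
  </sch:rule>
</sch:pattern>
```

A rule of this kind cannot be stated in the <datatype> itself, since the
datatype only sees the attribute value, not the element it points to.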
2. Syd's comments on tokenization
I think anything we can do to reduce the complications consequent on the
whitespace rules of XML is an unalloyed Good Thing, and propose to be
even more draconian than Syd suggests. My suggestion is that we allow
only token, nmtoken, and tei.data.token. While sympathising with the
motivation for it, I feel that the distinction Syd proposes between
"tei.data.string" and "tei.data.tokens" will only confuse people. If the
value of a sequence of tokens is to be interpreted as a single string,
then it probably shouldn't be an attribute at all. (That said, there's a
strong case to be made for using "tei.data.tokens" as its name!)
3. Proposed datatypes
These I like:
tei.data.token, tei.data.tokens, tei.data.pointer, tei.data.pointers
Constraining tei.data.token/s further as NMTOKEN/S/NCName/QNAME etc. is
possible, but I am not sure how many elements would benefit from it.
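
For comparison, the plural forms and the tighter variants could be
expressed roughly as follows. This is a sketch only; in particular the
choice of anyURI for pointers is an assumption.

```xml
<!-- tei.data.tokens: a whitespace-separated list of one or more tokens -->
<rng:list>
  <rng:oneOrMore>
    <rng:data type="token"/>
  </rng:oneOrMore>
</rng:list>

<!-- tei.data.pointers: a whitespace-separated list of URI references
     (anyURI is an assumed mapping) -->
<rng:list>
  <rng:oneOrMore>
    <rng:data type="anyURI"/>
  </rng:oneOrMore>
</rng:list>

<!-- tighter variants of the single token -->
<rng:data type="NMTOKEN"/>
<rng:data type="NCName"/>
<rng:data type="QName"/>
```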
Names I would prefer:
for tei.data.uboolean -> tei.data.truthValue
Names I'm not sure about:
tei.data.temporalExpression: how does this map to ISO 8601?
(I assume it doesn't include dateRanges, for example)
tei.data.duration
We should adopt a consistent policy as to whether quantities like this
include their units, or whether the units are supplied as a separate
attribute. I think I prefer the second option, as being more flexible.
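
The two policies would look something like this in a document. The
element and attribute names (event, dur, durUnit) are hypothetical and
serve only to show the difference.

```xml
<!-- option 1: the unit is part of the value (here an ISO 8601 duration) -->
<event dur="PT90M"/>

<!-- option 2: a bare quantity, with the unit carried by a separate attribute -->
<event dur="90" durUnit="min"/>
```

Under the first option the datatype has to understand the unit syntax;
under the second it can be a plain numeric type, with the unit governed
by its own value list.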
tei.data.probability
Not convinced we need this. There are very few candidates in the EDW90
table (I find 1, to be exact!)
tei.data.numeric
I'm now coming round to the view that we also need a tei.data.integer.
tei.data.language
I agree that we need to document exactly what this means somewhere and
providing a TEI name for it is a good way of doing so.
tei.data.code and tei.data.key
I have a different understanding of these.
Syd defines tei.data.code as a version of tei.data.pointer, with the
extra constraint that the target must be in the present document. I am
not sure what benefit there might be to adding this constraint.
More seriously, however, tei.data.key is defined as its complement, i.e.
a tei.data.pointer with the constraint that its target must *not* be in
the current document. I am even less clear what benefit there is in
that, and find the naming rather confusing, since we are currently using
"key=" attributes for things which are explicitly not pointers and which
might even be in the same document (e.g. in TD).
I think we should stick to tei.data.pointer (by all means add
tei.data.pointer.local, if need be) for the URI case, and keep
tei.data.code (or key) for use when all we can say of the attribute is
that its value is a magic token which might get you somewhere in some
external system, e.g. a database key, but which cannot be validated. It
might also be useful for cases where an association is made by
co-reference, as with the ident/key pair in TD, or in whatever variety
of HORSE we finally decide to back.
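
By way of illustration, the two uses of such a magic token might look
like this. The fragment assumes the ident/key pairing works roughly as
in the current tag documentation elements, and the database key value
is invented.

```xml
<!-- declaration: ident names the thing being documented -->
<elementSpec ident="ptr">
  <!-- ... -->
</elementSpec>

<!-- elsewhere: key picks it up by co-reference; it is not a URI, and
     the datatype alone cannot check that a matching ident exists -->
<specDesc key="ptr"/>

<!-- a key into some external system; the value is hypothetical -->
<name key="PERS0042">Lou</name>
```

In neither case is the value a pointer in the tei.data.pointer sense,
which is the reason for keeping the two datatypes distinct.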
That's probably enough to be going on with for the moment...
Lou