[tei-council] datatype issues (part 1)
Lou Burnard
lou.burnard at computing-services.oxford.ac.uk
Sat Sep 10 14:12:04 EDT 2005
Working through the datatype definitions, I have so far identified the
following issues. Comments and corrections on these would be much
appreciated, especially if they come in the next few days.
1. <specDesc ident="tei.data.xxxx"/> will extract the <desc> from the
referenced macroSpec. It would be nice also to be able to extract the
<content> or <stringVal> part for display.
2. The definition of <macroSpec> allows it to contain multiple <content>
or <stringVal> children. Why?
3. tei.data.certainty is defined as either an enumeration ("high",
"low", "medium", "unknown") or a reference to tei.data.probability,
which is a real value between 0 and 1 or an integer between 0 and 100.
I wonder if it wouldn't be less confusing to restrict the values for
tei.data.certainty to the literals only, since any attribute for which
we want to allow either kind of value can do so by giving an alternation
of datatypes (I think).
4. tei.datatype.language is isomorphic with xsd:language: do we need it?
5. tei.data.regexp is used only in two rather obscure places: do we
need it? If we do, is the reference to Appendix F of the XSD spec really
the canonical place to define what sort of regexp we mean?
6. tei.data.sex defines four alphabetic values (m f x u) which
correspond to ISO 5218 numeric codes 1 2 0 and 9. Should we not rather
use the ISO codes?
7. Furthermore, where (as with sex) the datatype is a closed
enumeration, it makes sense to represent this in the macroSpec as a
<rng:choice> containing several <rng:value>s. But there is currently no
scope to provide a gloss for what each value means, since <valList> is
not allowed within <macroSpec>.
8. In earlier discussion I had proposed that tei.data.token should
differ from rng:token in that the former should not permit included
whitespace. Thinking about this again, I think I might have been wrong:
it might be less confusing to use <rng:token> directly wherever we want
a "tei.data.token", thus allowing people to use XML whitespace
normalization in attribute values in the same way as they can in
content. If we do define tei.data.token as proposed (i.e. as an
xsd:token with a facet saying that whitespace is not allowed), we should
really give it a different name, or expect to spend the rest of eternity
explaining why our usage differs from W3C and RNG's (ok, we were there
first, but still). Same applies, mutatis mutandis, to tei.data.tokens:
it might indeed be simpler to define that as xsd:token rather than as a
list of our weird tei.data.tokens. On the other hand....
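To make points 6 and 8 a little more concrete, here is a rough Python
sketch. It is an illustration only, not part of any TEI tooling: the
names TEI_SEX_TO_ISO_5218, collapse_whitespace and is_strict_token are
invented for this example. The code pairing simply follows the order
given in point 6, and the "strict" check assumes that a token with no
internal whitespace must also be non-empty once collapsed.

```python
import re

# Pairing of the TEI letter codes with the ISO 5218 numeric codes,
# taken in the order listed in point 6 (m f x u -> 1 2 0 9).
TEI_SEX_TO_ISO_5218 = {"m": 1, "f": 2, "x": 0, "u": 9}


def collapse_whitespace(value: str) -> str:
    """Collapse whitespace in the manner of xsd:token.

    Runs of space, tab, LF and CR become a single space, and leading
    and trailing spaces are dropped.
    """
    return re.sub(r"[ \t\n\r]+", " ", value).strip(" ")


def is_strict_token(value: str) -> bool:
    """Check the stricter reading discussed in point 8.

    Assumption: the value is acceptable only if, after collapsing,
    it is non-empty and contains no space at all.
    """
    collapsed = collapse_whitespace(value)
    return collapsed != "" and " " not in collapsed
```

Under the plain xsd:token reading, "  two   words " is acceptable and
collapses to "two words"; under the stricter reading it is rejected,
which is exactly the difference we would have to keep explaining.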
That's something to be going on with. I'm going to have a cup of tea AND
A BISCUIT now.
Lou