[tei-council] datatype issues (part 1)

Christian Wittern wittern at kanji.zinbun.kyoto-u.ac.jp
Sat Sep 10 22:41:15 EDT 2005


Lou Burnard <lou.burnard at computing-services.oxford.ac.uk> writes:

> Working through the datatype definitions, I have so far identified the
> following issues. Comments and corrections on these would be much
> appreciated, especially if they come in the next few days.
>
> 1. <specDesc ident="tei.data.xxxx"/> will extract the <desc> from the
>    referenced macroSpec. It would be nice also to be able to extract
>    the <content> or <stringVal> part for display.

In that case maybe <getTarget type="content|stringVal|desc" ident=""/>
would do the trick.

> 2. The definition of <macroSpec> allows it to contain multiple
>    <content> or <stringVal> children. Why?
>
> 3. tei.data.certainty is defined as either an enumeration ("high",
>    "low", "medium", "unknown") or a reference to
>    tei.data.probability, which is a real value between 0 and 1 or an
>    integer between 0 and 100. I wonder if it wouldn't be less confusing
>    to restrict the values for tei.data.certainty to the literal only,
>    since any attribute for which we want to allow either kind of value
>    can do so by giving an alternation of datatypes (I think)

This sounds reasonable to me.
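
A minimal sketch of the alternation described in point 3, in Python purely
for illustration. The function names are hypothetical and the lexical rules
for numbers are an assumption; the thread only gives the ranges.

```python
# The four literals listed for tei.data.certainty in point 3.
CERTAINTY_LITERALS = {"high", "low", "medium", "unknown"}


def is_certainty(value: str) -> bool:
    """Literal-only reading of tei.data.certainty."""
    return value in CERTAINTY_LITERALS


def is_probability(value: str) -> bool:
    """tei.data.probability: an integer in 0..100 or a real in 0..1.

    Assumption: integers are plain ASCII digit strings and reals are
    whatever Python's float() accepts.
    """
    if value.isascii() and value.isdigit():
        return 0 <= int(value) <= 100
    try:
        return 0.0 <= float(value) <= 1.0
    except ValueError:
        return False


def accepts_either(value: str) -> bool:
    """An attribute that wants both kinds declares the alternation itself."""
    return is_certainty(value) or is_probability(value)
```

With the literal-only definition, an attribute that needs numbers as well
opts in explicitly through the alternation, instead of every user of
tei.data.certainty inheriting the numeric forms.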

> 4. tei.datatype.language is isomorphic with xsd:language: do we
>    need it?

I have asked this before.  We have to think about how this fits in with
@xml:lang and <language>.

> 5. tei.data.regexp is used only in two rather obscure places: do we
>    need it? If we do, is the reference to Appendix F of the xsd spec
>    really the canonical place to define what sort of regexp we mean?
>
> 6. tei.data.sex defines four alphabetic values (m f x u) which
>    correspond to ISO 5218 numeric codes 1 2 0 and 9. Should we not
>    rather use the ISO codes?

Hmm.  This raises the general question of how far we want to go in
pulling in the relevant standards and keeping our descriptions in
sync. Also, I would rather have some layer of human-readability here,
which could then be mapped under the hood to the relevant codes. mfxu
is just so much more intuitive than 1209.  The same issue also came up
with durations, P35Y vs. 35yrs.  At some point, we planned to have the
tei.* stuff provide this layer -- is this what Syd says is the
underlying assumption that turned out to be not workable?  In that
case, we have to rethink the whole strategy, I am afraid.

> 7. Furthermore, where (as with sex) the datatype is a closed
>    enumeration, it makes sense to represent this in the macroSpec as a
>    <rng:choice> containing several <rng:value>s. But there is
>    currently no scope to provide a gloss for what each value means,
>    since <valList> is not allowed within <macroSpec>.

Maybe that should be changed then?  

>
> 8. In earlier discussion I had proposed that tei.data.token should
>    differ from rng:token in that the former should not permit included
>    whitespace.  Thinking about this again, I think I might have been
>    wrong: it might be less confusing to use <rng:token> directly
>    wherever we want a "tei.data.token", thus allowing people to use
>    XML whitespace normalization in attribute values in the same way as
>    they can in content. If we do define tei.data.token as proposed
>    (i.e. as an xsd:token with a facet saying that whitespace is not
>    allowed), we should really give it a different name, or expect to
>    spend the rest of eternity explaining why our usage differs from
>    W3C and RNG's (ok, we were there first, but still). Same applies,
>    mutatis mutandis, to tei.data.tokens: it might indeed be simpler to
>    define that as xsd:token rather than as a list of our weird
>    tei.data.tokens. On the other hand....

Yeah.  A token is a token is a token.  We won't be able to change
that.
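
For reference, the difference at stake in point 8 can be sketched as
follows (Python, illustrative only; the function names are made up). The
first function approximates the whitespace collapsing applied to a token
value; the second adds the proposed facet that no whitespace may remain.

```python
import re

# XML whitespace characters: space, tab, carriage return, line feed.
_XML_WS = re.compile(r"[ \t\r\n]+")


def normalize_token(value: str) -> str:
    """Collapse runs of XML whitespace to one space and trim both ends."""
    return _XML_WS.sub(" ", value).strip(" ")


def is_strict_token(value: str) -> bool:
    """The stricter proposal: no whitespace left after normalization."""
    return " " not in normalize_token(value)
```

Under the plain definition "  two \n words " is a perfectly good token
whose value is "two words"; under the stricter one it is rejected, which
is exactly the divergence that would need explaining.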

All the best,

Christian


-- 

 Christian Wittern 
 Institute for Research in Humanities, Kyoto University
 47 Higashiogura-cho, Kitashirakawa, Sakyo-ku, Kyoto 606-8265, JAPAN