on spec grp 4, coded values (was "Re: [tei-council] datatypes")

Sat Oct 1 01:29:12 EDT 2005

Lou Burnard wrote:

> >   - People want to know where things point. ...
> True, but irrelevant.

I think considering what users want "irrelevant" is a bit rough.

> We can specify the target element type, and maybe we should do so
> in some cases, but I don't see any advantage at all to trying to
> specify "where" the element instance is. It's all out there in
> cyberspace, man!

The advantage, again, is just to make life easier on users. They
don't want to work out a good place to put something; they just
want to know where it goes. (Although it's nicer if it's not
*required* to be there.)

> > * tei.data.enumerated: the change permits whitespace in the values.
> I am open minded (or open-minded) on this question. I finally came
> down on the side of allowing *normalised* whitespace within the
> value because ...

This has a *major* consistency problem, though. If you have a closed
value list, the values cannot have spaces (because they must match
ident= of <valItem>). If you have an open list, they can (because the
only constraint is they must match the pattern "token").

For consistency, this *must* be declared to match the declaration of
ident= of <valItem>, which is tei.data.ident.
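
A minimal sketch of that inconsistency in Python, for illustration only.
It assumes, as stated above, that an identifier may not contain
whitespace and that an open list accepts any whitespace-normalised
token. The function names are hypothetical and are not part of any TEI
tool.

```python
import re

# Assumption: an identifier (ident= of <valItem>) may not contain whitespace.
IDENT = re.compile(r"\S+")

def normalise(value):
    """Collapse runs of XML whitespace to one space and trim both ends."""
    return re.sub(r"[ \t\n\r]+", " ", value).strip(" ")

def declarable(ident):
    """Could this string be written as the ident= of a <valItem>?"""
    return IDENT.fullmatch(ident) is not None

def valid_in_open_list(value):
    # Open list: any non-empty token is accepted, internal spaces included.
    return normalise(value) != ""

def valid_in_closed_list(value, idents):
    # Closed list: the normalised value must equal a declared ident.
    return normalise(value) in idents
```

Under these assumptions "very high" is accepted by an open list, yet
it can never be declared in a closed one, so closing the list would
invalidate data that was valid the day before.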

> > * tei.data.key: the change is from 'xsd:token' to 'rng:string'. I
> >   think it should be left as 'xsd:token' just for consistency.
> >   There is no difference in validation at all (since there are no
> >   enumerated values).
> There is all the difference in the world. key is *explicitly*
> defined as something that is to be validated externally and over
> which the TEI should therefore place no (additional) syntactic
> constraints. We cannot possibly second guess what syntactic
> constraints every database system in the world might impose, ergo
> our only choice is to not impose any at all.

And what syntactic constraints do you believe xsd:token would impose?
In RELAX NG, AFAIK, the only constraint is that all characters be
from Unicode: PUA, combining, whatever. (XML itself additionally
imposes constraints, e.g. that unescaped "<" and "&" are not
permitted, but even that's not imposed by the datatype.) If you're
worried about the constraints in W3C Schema land, who knows what
rng:string will do? Probably the right thing, I suppose, but still,
if what you really want is preserved whitespace, why not be explicit
and use xsd:string?
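
To illustrate the point, here is a rough Python sketch of the two
whitespace treatments. It assumes the usual behaviour: a "string"
value is compared exactly as written, while a "token" value has tabs
and line breaks turned into spaces, runs of spaces collapsed, and both
ends trimmed. The helper names as_string and as_token are made up for
this example.

```python
import re

def as_string(value):
    # "string"-like treatment: the value is kept exactly as written.
    return value

def as_token(value):
    # "token"-like treatment: tab, LF and CR become spaces, runs of
    # spaces collapse to one, and leading/trailing spaces are removed.
    spaced = re.sub(r"[\t\n\r]", " ", value)
    return re.sub(r" +", " ", spaced).strip(" ")

def same_key(a, b, treatment):
    # Two keys are "the same" if they are equal after the treatment.
    return treatment(a) == treatment(b)
```

In this sketch neither treatment rejects any character. The only
observable difference is whether "  P  1234 " and "P 1234" count as
the same key.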

> > * tei.data.name: the change is from any string that does not
> >   contain whitespace (but may include, e.g., punctuation marks,
> >   currency symbols, math symbols, etc.) to an XML NMTOKEN. I am
> >   not sure why we'd want to exclude the non-letter, non-digit
> >   characters (other than .-_:, which are permitted in NMTOKEN).
> >   Why shouldn't the Tibetan Paluta character be allowed?
> I assume the latter is not a serious suggestion. I thought the
> reason was fairly obviously to do with ease of (XML) processing.

It was a 100% completely serious suggestion, and if I may remind you
Lou, it was *your* suggestion. I'm just sticking with it after you
have abandoned it.

What part of XML processing do you think is so much easier to do with
an attribute that matches xsd:NMTOKEN versus one that permits other
punctuation characters, etc.?
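
For comparison, a small Python sketch of the two candidate patterns.
The NMTOKEN pattern here is deliberately simplified to ASCII (letters,
digits and .-_:); the real production admits many non-ASCII name
characters as well. Both pattern names and the function accepted_by
are hypothetical.

```python
import re

# Simplified assumption: ASCII-only stand-in for xsd:NMTOKEN.
NMTOKEN_ASCII = re.compile(r"[A-Za-z0-9._:-]+")
# The earlier definition: any non-empty string without whitespace.
NO_WHITESPACE = re.compile(r"\S+")

def accepted_by(value):
    """Report which of the two patterns accepts the whole value."""
    return {
        "nmtoken": NMTOKEN_ASCII.fullmatch(value) is not None,
        "no_whitespace": NO_WHITESPACE.fullmatch(value) is not None,
    }
```

The patterns differ only in which characters they admit: a value such
as "$5" passes the second and fails the first. In both cases the code
that consumes the value sees a single whitespace-free string.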

> We did discuss this a bit on the list and nobody came up with a
> better suggestion. I think using "token" would really be asking for
> confusion -- precisely because we do mean something different from
> a datatype which the W3C calls "token" -- whether it's in caps or
> not. I also considered "ident" and "label". The key thing about it,
> surely though, is that it is a way of naming something, even if
> it's not a proper name?

You lost me at the bakery. How are the values of
  age= of <person> 
  from= and to= of <locus>
  value= of <metSym>
  where= of <move>
  scope= of <handNote>
  extent= and reason= of <gap>, <supplied>, and <unclear>
  loc= of <app>
naming something? (It does apply in some circumstances, of course,
e.g. name= of <equiv> or lang= of <code>.) While I agree with you that
neither "ident" nor "label" will do, I would much prefer to live with
the mild confusion of "token" than the misleading lie of "name". But
the suggestion "word" has been put forth, and I think that would be
fine.