[tei-council] datatype issues (part 1) continued,,,
Lou Burnard
lou.burnard at computing-services.oxford.ac.uk
Tue Sep 13 05:49:19 EDT 2005
Syd Bauman wrote:
>>5. tei.data.regexp is used only in two rather obscure places: do we
>>need it?
>
>
> I don't think we need it, although again, it may be a useful place to
> put an explanation.
>
I propose to change this to tei.data.formula, which maps to xsd:token.
At present it is used only for attributes like metDecl, which have as
their value a string of gobbledegook in some special syntax defined by
the TEI. This seems a useful category of information. Calling it regexp
suggests to me that it is a (Unix-style) regular expression, which it
isn't necessarily -- though obviously one could define a regular
expression that matched any such formula!
Whether or not an attribute the value of which *was* a Unix regular
expression should use this datatype is a bridge I would prefer to cross
when we actually have such an attribute. I would have thought it would
be much better as content anyway.
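To make the xsd:token mapping a little more concrete: a value of that type may contain spaces, but its whitespace is collapsed before it is compared with anything. A rough sketch of that collapsing in Python follows; the function name and the sample string are invented for illustration and are not a real metDecl value.

```python
import re

def collapse_whitespace(value: str) -> str:
    """Approximate the whiteSpace="collapse" facet that xsd:token uses.

    Tabs, line feeds and carriage returns count as spaces, runs of
    spaces become a single space, and leading and trailing spaces
    are dropped.
    """
    return re.sub(r"[ \t\n\r]+", " ", value).strip(" ")

# Invented sample: a formula-like string with untidy whitespace.
sample = "  [AB]  \t [CD]\n"
print(repr(collapse_whitespace(sample)))
```

Note that internal spaces survive, so a datatype with no internal whitespace would need a further restriction on top of this.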
>
>>8. In earlier discussion I had proposed that tei.data.token should
>>differ from rng:token in that the former should not permit included
>>whitespace. Thinking about this again, I think I might have been
>>wrong: it might be less confusing to use <rng:token> directly
>>wherever we want a "tei.data.token", thus allowing people to use
>>XML whitespace normalization in attribute values in the same way as
>>they can in content.
>
>
> There is no XML whitespace normalization of any content in TEI, yet,
> is there? When we're done straightening out the classes and stuff,
> there may be one or two obscure places where it is useful.
>
>
>>If we do define tei.data.token as proposed (i.e. as an xsd:token
>>with a facet saying that whitespace is not allowed), we should
>>really give it a different name, or expect to spend the rest of
>>eternity explaining why our usage differs from W3C and RNG's (ok,
>>we were there first, but still).
>
>
> I think a "no internal whitespace" restriction is a really good thing
> to have[1]. But I think you are absolutely right, we should change
> the name. It's not our fault that W3C and RelaxNG deliberately use
> the term "token" in a manner that is counter-intuitive to end users
> (although perhaps makes sense to those writing validators).
> Nonetheless, if we use the same term in the more normal way, we are
> dooming users to even more confusion. Problem is, it's hard to come
> up with an alternative. How about tei.data.term?
>
Not bad, but we do use "term" rather a lot in slightly different ways
elsewhere in the Guidelines, and, crucially, in my book a "term" is
taken from a human language, not an artificial one. So my proposal is now:
1. rename tei.data.token as tei.data.ident, mapping it to NMTOKEN.
2. tei.data.tokens is a list of tei.data.ident
3. tei.data.enumerated is a ref to a tei.data.ident
4. tei.data.key is mapped to NCName
I hesitated a long time over NMTOKEN and NCName. Both allow hyphens,
underscores and full stops; the difference is that an NMTOKEN may begin
with a digit, hyphen or full stop and may contain colons, whereas an
NCName must begin with a letter or underscore and may not contain a
colon. Syd's proposed pattern allows hyphen and underscore and also
comma. I am open to persuasion that both tei.data.key and tei.data.ident
should have the same mapping; less so to defining something which is
neither NMTOKEN nor NCName.
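For anyone wanting to try sample values against the two candidates, here is a rough Python sketch. It is ASCII-only and therefore only an approximation: the real productions in XML 1.0 and Namespaces in XML also admit a large range of non-ASCII letters. The function names and sample values are mine, not anything in the Guidelines.

```python
import re

# ASCII-only approximations (an assumption of this sketch).
# Nmtoken: one or more name characters, in any position.
NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")
# NCName: starts with a letter or underscore, and never contains a colon.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def is_nmtoken(value: str) -> bool:
    return NMTOKEN.fullmatch(value) is not None

def is_ncname(value: str) -> bool:
    return NCNAME.fullmatch(value) is not None

for sample in ["first-line", "first_line", "2nd", "tei:ident", "a,b", "a b"]:
    print(f"{sample!r}: NMTOKEN={is_nmtoken(sample)} NCName={is_ncname(sample)}")
```

Neither pattern accepts a comma or a space, so a datatype that allowed commas would have to be defined separately from both.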
The distinction between tei.data.key and tei.data.ident is that the
latter need not actually map onto anything anywhere; it's just a name.
So, in order of descending tightness of constraint, we have:
tei.data.enumerated : the value is defined by a valList (type=closed)
tei.data.code : the value is defined by a pointer to something which
  must exist
tei.data.key : the value is defined by an enumeration elsewhere, e.g. a
  database key
tei.data.ident : the value is a name or identifier of some kind but not
  necessarily enumerated or enumerable
Does that make sense?
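Put another way, the four differ in how much can be checked from the schema and the document alone. A rough Python sketch of that reading follows. The function name, its parameters and the sample values are all hypothetical, and it ignores the lexical constraints (NMTOKEN or NCName) entirely.

```python
def check_value(datatype, value, val_list=(), document_ids=()):
    """Return True or False where schema plus document settle the
    question, or None where nothing can be verified locally."""
    if datatype == "tei.data.enumerated":
        # Closed valList: the schema itself lists every legal value.
        return value in val_list
    if datatype == "tei.data.code":
        # Pointer: the thing pointed at must exist in the document.
        return value in document_ids
    if datatype in ("tei.data.key", "tei.data.ident"):
        # key: enumerated elsewhere (e.g. in a database);
        # ident: just a name. Neither can be checked from here.
        return None
    raise ValueError(f"unknown datatype: {datatype}")

# Hypothetical values, for illustration only.
print(check_value("tei.data.enumerated", "closed", val_list={"open", "closed"}))
print(check_value("tei.data.code", "p1", document_ids={"p2", "p3"}))
print(check_value("tei.data.key", "K1234"))
```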