[tei-council] datatype issues (part 1) continued,,,

Lou Burnard lou.burnard at computing-services.oxford.ac.uk
Tue Sep 20 13:52:33 EDT 2005


  Syd Bauman wrote:

>>The only place [tei.data.regexp] is used at present is for
>>attributes like metDecl which have as their value a string of
>>gobbledegook in some special syntax defined by the TEI.
>
>Huh?? Even in P3 the pattern= attribute of <metDecl> was defined as a
>regular expression. In P3 and P4 the regular expression language used
>was that created by the TEI for its extended pointer syntax. In P5 we
>have made the move to using the W3C regular expression language
>instead. I think using a regexp here is and was a good idea,
>switching to W3C regexps is a good move, and this attribute should
>most certainly remain as is.
>
OK, I am perfectly happy to restrict the kind of gobbledegook permitted 
for this datatype to a W3C-defined regular expression. That ought to fit 
with most kinds of gobbledegook we can come up with!

Is everyone happy with "regexp" as the name for it? How about 
"tei.data.pattern"?


....


>>I hesitated a long time over NMTOKEN and NCName. The former allows
>>hyphens but not underscore; the latter allows underscore but not
>>hyphen.
>
>This is simply untrue. xsd:NMTOKEN maps to an XML NMTOKEN; xsd:NCName
>maps to an XML Namespaces NCName, which maps to an XML name except
>that colon is not allowed. Both allow hyphen, both allow underscore.
>xsd:NCName does not permit the string to *start* with a digit or
>punctuation character other than underscore.
>
....
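
To make the NMTOKEN/NCName comparison concrete, here is a rough sketch of 
the two checks. It is ASCII-only, which is a simplification: the real XML 
productions also admit a large range of non-ASCII letters and combining 
characters. The function names are made up for illustration.

```python
import re

# ASCII-only approximation of the XML name characters (an assumption made
# for brevity; the real productions cover many more Unicode characters).
NAME_START = r"[A-Za-z_]"        # NCName may start with a letter or underscore
NAME_CHAR = r"[A-Za-z0-9._-]"    # letters, digits, full stop, underscore, hyphen

# NCName: a start character followed by any number of name characters, no colon.
NCNAME = re.compile(rf"{NAME_START}{NAME_CHAR}*")
# NMTOKEN: one or more name characters (colon allowed), no start restriction.
NMTOKEN = re.compile(rf"(?:{NAME_CHAR}|:)+")


def is_ncname(value: str) -> bool:
    """True if value looks like an NCName under the ASCII approximation."""
    return NCNAME.fullmatch(value) is not None


def is_nmtoken(value: str) -> bool:
    """True if value looks like an NMTOKEN under the ASCII approximation."""
    return NMTOKEN.fullmatch(value) is not None


if __name__ == "__main__":
    for sample in ["damaged-deliberate", "damaged_deliberate",
                   "2nd-hand", "damaged/deliberate"]:
        print(sample, is_nmtoken(sample), is_ncname(sample))
```

Under this approximation both types accept hyphen and underscore, only 
NCName rejects a leading digit, and neither accepts a slash.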

>>I am open to persuasion that both tei.data.key and tei.data.ident
>>should have the same mapping; less to defining something which is
>>not either NMTOKEN or NCName.
>
>In P4 most of these things were CDATA. I think making them a single
>token (normal sense of the word) is a really good idea -- if we could
>do so in P4 we should (we can't). I even think forbidding some kinds
>of non-whitespace characters would be a really good idea (e.g.,
>control characters, PUA characters, etc. See [1] for list). However,
>I'm not really sure of the advantage of telling people who would like
>to have things like "damaged/deliberate" and "damaged/accidental" as
>their values for reason= of <gap> that they cannot, just because W3C
>says that names of elements etc. can't have a slash. 

This is really a tricky problem. It's kind of like the discussion about 
temporal expressions -- because the W3C datatypes don't exactly fit in 
with feelings about what a "name/ident/term" ought to be, it's tempting 
to invent our own definition as an alternative.

We can probably distinguish cases where the name *must* conform to W3C 
rules about identifiers -- if it's the name of an element it obviously 
has to follow the constraints on what an XML name can be. If however it's 
a name we've made up this is less obviously the case, and the only 
constraint we'd probably want to put on it is that it shouldn't contain 
whitespace (else we can't have tei.data.names). But do we really want 
two kinds of tei.data.name?
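
The whitespace constraint matters because a list of names is only 
recoverable if whitespace is reserved as the separator. A minimal sketch 
of that looser kind of name, with hypothetical function names and no 
claim that this is how the datatype would finally be defined:

```python
from typing import List


def is_name(value: str) -> bool:
    """A made-up name: non-empty and free of whitespace, nothing more."""
    return bool(value) and not any(ch.isspace() for ch in value)


def split_names(value: str) -> List[str]:
    """Split a whitespace-separated list of names into its members."""
    return value.split()


if __name__ == "__main__":
    reasons = "damaged/deliberate damaged/accidental"
    for name in split_names(reasons):
        print(name, is_name(name))
```

A value such as "damaged/deliberate" passes this test although it is 
neither an NMTOKEN nor an NCName; a value containing a space does not, 
since it would be read as two names in a list.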


>>So in order of ascending tightness of constraint, we have
>>
>>tei.data.enumerated : the value is defined by a valList (type=closed)
>>tei.data.code : the value is defined by a pointer to something which 
>>                must exist
>>tei.data.key : the value is defined by an enumeration elsewhere e.g. a 
>>               database key
>>tei.data.ident : the value is a name or identifier of some kind but not 
>>                 necessarily enumerated or enumeratable
>
>Where in your scheme do <valList>s of type= "open" and "semi" fit?
>I.e., are they still assigned the datatype tei.data.enumerated?

My proposal is yes, they still have tei.data.enumerated.

>[1] A first cut at Unicode categories that should and should not be
>    permitted in tei.data.[token,ident,name,term]. 
>

Useful table. If we decide to invent our own kind of name, then it will 
be eminently sensible to base it on Unicode character categories.
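
As a rough sketch of what a category-based definition could look like: 
the set of excluded categories below is an assumption for illustration 
only (control, private use, surrogate, unassigned, and the separator 
categories), not the contents of the table at [1], and the function name 
is made up.

```python
import unicodedata

# Assumed exclusions, for illustration only; the actual list would come
# from the agreed table of Unicode categories.
FORBIDDEN_CATEGORIES = {
    "Cc",  # control characters (includes tab and newline)
    "Co",  # private use (PUA)
    "Cs",  # surrogates
    "Cn",  # unassigned code points
    "Zs",  # space separators
    "Zl",  # line separator
    "Zp",  # paragraph separator
}


def is_tei_name(value: str) -> bool:
    """True if value is non-empty and uses no forbidden category."""
    return bool(value) and all(
        unicodedata.category(ch) not in FORBIDDEN_CATEGORIES for ch in value
    )


if __name__ == "__main__":
    for sample in ["damaged/deliberate", "two words", "bell\u0007", "\ue000"]:
        print(repr(sample), is_tei_name(sample))
```

Defined this way, the slash in "damaged/deliberate" is acceptable 
(category Po), while whitespace, control characters and PUA characters 
are rejected.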




