[tei-council] Datatype : roundup

Lou Burnard lou.burnard at computing-services.oxford.ac.uk
Thu Sep 22 09:35:06 EDT 2005


Syd Bauman wrote:

>>>As for tei.data.name:
>>>* the name is really bad; I'd prefer to live with the confusion of
>>>  tei.data.token. (Remember, the string "xsd:token" will only appear
>>>  a few times in all of P5; in the declarations of at most half a
>>>  dozen datatypes.)
>>>      
>>>
>>That's irrelevant to the issue here: we want people to use the TEI
>>name and not be confused when they talk to others about it. No-one
>>has yet proposed a better name than name.
>>    
>>
>
>I disagree; I think *you* proposed a better name than 'name', when
>you proposed 'token'. The latter is mildly confusing due to W3C's
>misuse; the former is outright misleading. The attributes of this
>type (type=, subtype=; to= & from= of <locus>; scope= of <handNote>,
>etc.) don't really have names as their values in any sense of the
>word.
>  
>

Which attributes get which datatype is a moveable feast, of course. But 
I stand by the assertion that no-one has yet proposed a better name for 
this one. Token is not acceptable because of the confusion already noted.

>  
>
>>>* I think we should probably be more permissive than NMTOKEN.
>>>      
>>>
>>We can tweak the definition if you like, but I don't understand why
>>you would want to.
>>    
>>
>
>Because I don't think we should be limiting users to letters, digits,
>dot, hyphen, underscore, and colon for the values of these
>attributes, e.g. cref=, extent=, reason=, where= of <move>, real=,
>met=, rhyme=. Although for some it does make sense (name= of <equiv>,
>included= of <witness>); perhaps these should be moved to
>tei.data.ident?
>
>  
>

My proposal is to offer a range of choices: name, ident, enumerated. I 
have yet to see any argument for a fourth category, so that's progress. 
I still think that NMTOKEN is a good choice as representation for the 
first of these: yes, we are constraining people not to include weird 
punctuation characters in their names. Why is that a bad idea? We are 
also getting some sensible validation for free!
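
To make the "validation for free" point concrete, here is a rough Python 
sketch of the check that an NMTOKEN-based datatype implies. It is only an 
ASCII approximation: the real NMTOKEN production also admits non-ASCII 
letters, combining characters and extenders. The function name is_nmtoken 
is invented for illustration.

```python
import re

# ASCII-only approximation of XML NMTOKEN: one or more name characters.
# Assumption: non-ASCII name characters, which real NMTOKEN allows, are ignored.
_NMTOKEN_ASCII = re.compile(r"[A-Za-z0-9._:-]+")


def is_nmtoken(value: str) -> bool:
    """Return True if value looks like an NMTOKEN under the ASCII approximation."""
    return _NMTOKEN_ASCII.fullmatch(value) is not None
```

Under this approximation a value such as "illegible" passes, while 
"faded ink" (internal space) and "what?!" (punctuation) are rejected.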


>
>  
>
>>>I really don't see why not permit percentages. Users had the
>>>choice in P4, when we couldn't even validate it. Now we can.
>>>
>>>      
>>>
>>simplicity, clarity, precision...
>>    
>>
>
>While I suppose it is simpler for software writers to have 1 system
>rather than 2, there is nothing more clear nor more precise about
>"0.824" than "82.4%". While I have some sympathy with the idea of
>reducing choices for users, this is one place where I think users
>like the choice.
>  
>

I see no evidence for this assertion at all. Since both representations 
mean exactly the same thing, and are exactly inter-convertible, I think 
it just looks silly not to come down on one side of the fence or the 
other.
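
As a small illustration of the inter-convertibility, a sketch in Python 
using exact decimal arithmetic. The function names are hypothetical, and 
the only percentage syntax assumed is a number followed by "%".

```python
from decimal import Decimal


def percent_to_fraction(text: str) -> str:
    """Convert a string such as '82.4%' to the equivalent fraction '0.824'."""
    if not text.endswith("%"):
        raise ValueError("expected a value ending in '%'")
    value = Decimal(text[:-1]) / Decimal(100)
    # normalize() drops trailing zeros; the 'f' format avoids exponent notation.
    return format(value.normalize(), "f")


def fraction_to_percent(text: str) -> str:
    """Convert a string such as '0.824' to the equivalent percentage '82.4%'."""
    value = Decimal(text) * Decimal(100)
    return format(value.normalize(), "f") + "%"
```

Because Decimal works in base ten, the conversion loses nothing in either 
direction for values like these.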

>
>  
>
>>I think this is a mistake, actually. Decimal was a better choice,
>>since it can represent any number, real or integer, no matter how
>>big. It means you can't use scientific notation, which some
>>folks on TEI-L suddenly woke up and asked for.
>>    
>>
>
>So you think we have more users who really want to represent numbers
>with greater than 16 (decimal) digits of precision than users who
>want to represent numbers in scientific notation? As I had hoped my
>example would demonstrate, that much precision is not something we
>humans generally deal with.
>
>  
>

No, I think that if there are 10 people in the world who want to use a 
numeric datatype, 8 of them might want to use what one might call 
unscientific notation, and 9.9 of them will want to represent values 
that need fewer than 8 decimal digits of precision!
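
For anyone who wants to see where the precision limit under discussion 
actually bites, a minimal Python sketch. It assumes the floating-point 
type in question behaves like an IEEE 754 double, which is what Python's 
float is; the function name is made up for the example.

```python
from decimal import Decimal


def round_trips_through_double(text: str) -> bool:
    """Return True if the decimal numeral in text survives conversion to a
    binary double and back to its shortest decimal representation."""
    as_double = float(text)
    # repr() gives the shortest numeral that identifies this double;
    # Decimal comparison is exact, so any lost digits show up here.
    return Decimal(repr(as_double)) == Decimal(text)
```

A value with eight significant digits, such as "0.12345678", survives the 
round trip; a nineteen-digit value such as "0.1234567890123456789" does 
not, whereas an arbitrary-precision decimal keeps every digit.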



>>I now think maybe we should have a different datatype for
>>[scientific notation].
>>    
>>
>
>If we split scientific notation out to a different datatype, won't we
>need a disjunction of the two datatypes in most if not all instances
>anyway? And the disjunction (whether of two separate TEI datatypes or
>of two xsd: datatypes inside a TEI datatype) might be a bit confusing
>for implementers. But it shouldn't be impossible to deal with (could
>always just assume that if it's not in scientific notation, it is an
>xsd:decimal).
>  
>

I don't think that sort of "assumption" is something an XML validator can 
do, is it? But we could certainly say the numeric datatype maps onto 
xsd:decimal|xsd:float if that would make you happy.
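
A rough sketch, in Python, of what checking a value against such a 
disjunction might look like, trying the decimal form first and the 
floating-point form second. The regular expressions only approximate the 
XSD 1.0 lexical forms, and the function name is hypothetical; this is not 
a description of how any particular validator works.

```python
import re
from typing import Optional

# Approximate lexical form of xsd:decimal: optional sign, digits, optional point.
_DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")
# Approximate lexical form of xsd:float: the same, plus an optional exponent,
# or one of the special values INF, -INF, NaN.
_FLOAT = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?|-?INF|NaN")


def classify_numeric(value: str) -> Optional[str]:
    """Return 'decimal', 'float', or None for a candidate attribute value."""
    text = value.strip()
    if _DECIMAL.fullmatch(text):
        return "decimal"  # plain notation is claimed by the first alternative
    if _FLOAT.fullmatch(text):
        return "float"    # only scientific notation and special values get here
    return None
```

With these patterns "0.824" is classified as decimal, "8.24E-1" as float, 
and "82.4%" matches neither alternative.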

>
>  
>
>>Credit card numbers, by the way, are tei.data.ident, clearly.
>>    
>>
>
>I thought you wanted tei.data.ident to be xsd:Name -- credit card
>numbers start with a digit, and thus are invalid xsd:Names. They'd be
>fine as tei.data.[name|token], but once again the name "name" doesn't
>make sense. (Alternatively, one could insist that they start with a "V"
>or "M" or whatever :-)
>

Ah yes, good point.
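
For the record, the difference can be sketched in a few lines of Python. 
Both patterns are ASCII-only approximations of the XML Name and NMTOKEN 
productions, the function names are invented, and the card number in the 
note below is just a sample string of digits.

```python
import re

# ASCII-only approximation of XML Name: must start with a letter, '_' or ':'.
_NAME_ASCII = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")
# ASCII-only approximation of XML NMTOKEN: any name character may come first.
_NMTOKEN_ASCII = re.compile(r"[A-Za-z0-9._:-]+")


def is_name(value: str) -> bool:
    """Return True if value looks like an XML Name (ASCII approximation)."""
    return _NAME_ASCII.fullmatch(value) is not None


def is_nmtoken(value: str) -> bool:
    """Return True if value looks like an XML NMTOKEN (ASCII approximation)."""
    return _NMTOKEN_ASCII.fullmatch(value) is not None
```

A digit string such as "4111111111111111" fails is_name but passes 
is_nmtoken; prefixing it with "V" makes it a valid Name under this 
approximation.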


