[tei-council] comments on edw90

Thu Sep 1 16:07:01 EDT 2005

Syd Bauman wrote:

>>Sebastian and I have done some testing of the feasibility of doing
>>local over-rides as described here. Our conclusions are that this
>>is not possible on technical grounds,
>>    
>>
>
>This is quite a shame, as such a capability could provide the means
>for significant simplification of the TEI scheme, and much of the work
>put into ED W 90 was predicated on this possibility. Such is life.
>
>
>  
>

Yes, I agree it's a nuisance. The problem is in the way that ODDs are
"flattened" during the schema generation phase. Sebastian and I have
discussed it at length several times, and I have understood it at least
once. Don't forget, of course, that the inability to do a local override
*only* applies to generation of P5 canonical sources -- a user of the
system is still at liberty, e.g., to replace an unconstrained list with a
constrained one in their application-specific ODD.

>>Our recommendation is that
>>- the tei.typed class should be removed
>>- elements bearing a type attribute (and former members of the
>>  typed class) should be checked to see whether their valLists
>>  constitute an open or closed list
>>- for closed list, the datatype will be an alternation of the
>>  possible values
>>    
>>
>
>I presume you mean "set of permissible values" or some such, not
>"datatype". (Up until now we've been using the term "datatype" to
>express an abstract grouping of values with similar syntactic
>constraints and similar semantics, presumably implemented by some form
>of indirection. However, there is the potential for lots of confusion,
>because the child of <attDef> which is used to abstractly declare the
>attribute type is called <datatype>. This is quite an unfortunate
>situation, for which I must take the blame -- Sebastian specifically
>asked me if any of the names he chose for these elements were
>problematic before he cemented them into our processing chain, and I
>missed this obvious thorn.)
>
>  
>
Yes, sorry, I was using the term loosely to mean "what it says is legal
for this attribute in the DTD/schema".

>>- for open (or semi) lists, the datatype will be tei.data.enumerated,
>>  i.e. a single token not containing whitespace
>>    
>>
>
>I'm confused. What then is the difference between tei.data.enumerated
>and tei.data.token?
>  
>

No difference for one sense of "datatype", some difference for the 
other! The suggestion is to use tei.data.enumerated for cases where 
there is a <valList>, open or closed. But I am happy to go with 
tei.data.token for open/semi lists as well if this is felt to be clearer.
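
To make the two senses concrete, here is a minimal sketch in Python. It
is not part of any ODD processor, and the function names are invented
for illustration. It assumes that both names validate as "a single
token with no whitespace", and that a <valList> tightens validation
only when its type is "closed".

```python
def is_single_token(value: str) -> bool:
    """True if value is non-empty and contains no whitespace."""
    return bool(value) and not any(ch.isspace() for ch in value)


def is_valid_enumerated(value: str, listed: set, list_type: str) -> bool:
    """Hypothetical check for an attribute that carries a <valList>.

    list_type is "open", "semi" or "closed"; only "closed" restricts
    the value to the listed ones. Otherwise this is the same test as
    is_single_token().
    """
    if not is_single_token(value):
        return False
    if list_type == "closed":
        return value in listed
    return True
```

With an open or semi list the second function accepts exactly what the
first does, which is the sense in which there is "no difference".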

>In some ways this may feel like a big step backwards, as it is
>essentially the system we were using before Paris. But I think this
>works almost as well as the system Council sketched out in Paris. If
>a TEI-schema-designer wants to restrict users to the values provided
>in an open (or semi) list, all she needs to do is change the type of
><valList> from "open" to "closed". (Hmmm... I just realized that this
>is actually currently not true. I think she'd have to copy-and-paste
>the entire <valList> from the tagdoc into her ODD, which isn't nearly
>as nice. Sebastian, could we arrange the process so that just
>specifying, e.g.
>          <elementSpec module="core" ident="note" mode="change">
>            <attList>
>              <attDef ident="place">
>                <valList type="closed"/>
>              </attDef>
>            </attList>
>          </elementSpec>
>would work? It's not ambiguous that the user is trying to remove all
>the possible values, because it makes no sense to have an empty
><valList>, especially if the type is "closed"; of course it also
>isn't valid à la the current TD.)
>  
>

I would like to hear Sebastian's view on this -- he's back at work next 
week -- but I suspect he will say that this looks like rather recondite 
special-casing. After all, chances are that if you want to close the set 
of values you'll probably want to modify the TEI suggestions anyway.
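
For concreteness, the special case under discussion would amount to
something like the following Python sketch. The dictionary model of a
<valList> and the function name are assumptions made up for this
example; the real flattening happens in the ODD stylesheets and may
look nothing like this.

```python
def merge_val_list(canonical: dict, override: dict) -> dict:
    """Merge a customization's valList into the canonical one.

    Both are modelled as {"type": str, "values": list}. An override
    with no values inherits the canonical values and changes only the
    type (the proposed shorthand); any other override replaces the
    canonical list wholesale.
    """
    if not override["values"]:
        return {"type": override["type"],
                "values": list(canonical["values"])}
    return {"type": override["type"],
            "values": list(override["values"])}
```

The first branch is the extra rule a processor would have to carry; the
second is the ordinary behaviour of supplying a complete replacement
list.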

>
>  
>
>>1.2 Over-riding of attributes
>>
>>I think we can actually make explicit some of what Syd is
>>describing here by using the RelaxNG method of defining facets. So
>>we could for example say that a datatype was basically a non-negative
>>integer, but with the added constraint that it has a value less
>>than 43 by a construct such as
>>    <datatype>
>>      <rng:data type="nonNegativeInteger">
>>        <rng:param name="maxInclusive">42</rng:param>
>>      </rng:data>
>>    </datatype>
>>    
>>
>
>Yes, I think this is doable, but it really only addresses a small
>subset of what I think we set forth in Paris to do. (And note that
>quite a few of the proposed datatypes make use of this.)
>

Could we see a list of these?
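
In the meantime, to fix ideas: the facet mechanism quoted above is a
base-type check plus an extra bound. A rough Python sketch follows. The
function name is invented, and the lexical check is deliberately
simplified rather than a full implementation of the W3C rules.

```python
import re


def valid_non_negative_integer(lexical: str, max_inclusive=None) -> bool:
    """Check a value against a non-negative integer type, optionally
    narrowed by a maxInclusive-style facet."""
    text = lexical.strip()
    # Simplified lexical check: optional "+", then ASCII digits.
    if re.fullmatch(r"\+?[0-9]+", text) is None:
        return False
    value = int(text)
    # The facet only adds an upper bound on top of the base type.
    return max_inclusive is None or value <= max_inclusive
```

Note that the facet can only narrow the one base type it is attached
to; it cannot pick a different base type.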

> In
>particular, since the restriction is a facet, we can't use this
>feature to have one TEI datatype for numeric representations
>(tei.data.numeric), and say "this attribute is one of
>tei.data.numeric, but must be a non-negative integer".
>
>
>  
>

>>If (and only if) there is a 1:1 mapping between a TEI datatype and
>>[a W3C Schema] datatype we could presumably also do
>>    
>>
>>   <datatype>
>>     <rng:data>
>>       <rng:ref name="tei.nonNegativeInteger"/>
>>       <rng:param name="maxInclusive">42</rng:param>
>>     </rng:data>
>>   </datatype>
>>    
>>
>
>I tried a quick test, and both nxml-mode and trang objected to such a
>schema.
>
>
>  
>
>>In the general case, however, we think that all datatype definitions
>>should be complete and appropriate. In practice, we think the vast
>>majority of TEI attributes are already catered for by a very small
>>number of datatypes. (of the 500+ attributes listed by Syd, about
>>400 are covered by derivations of tei.data.token, tei.data.pointer
>>and tei.data.uboolean)
>>    
>>
>
>I'm not exactly sure what you mean by "derivations" here, but if it's
>the adding restrictions to a generic datatype that we can't do, it
>means we have a problem for lots of those attributes, no?
>
>
>  
>

Again, I think we need to look at cases here. Suppose we started by just 
using the simplest (most generic) datatypes wherever possible, how many 
attributes would be seriously discommoded?

>>We conclude that
>>
>>- datatypes should be expressed as <rng:data> expressions
>>    
>>
>
>I'm not sure I see why we want to impose such a restriction. E.g., for
>the TEI sex datatype, why would we prefer
>
>     <rng:data type="token">
>       <rng:param name="pattern">\s*(f|m|u|x)\s*</rng:param>
>     </rng:data>
>
>to either
>
>    <rng:choice>
>      <rng:value>f</rng:value>
>      <rng:value>m</rng:value>
>      <rng:value>u</rng:value>
>      <rng:value>x</rng:value>
>    </rng:choice>
>
>or far better
>
>    <valList type="closed">
>      <val>f</val> <desc>female</desc>
>      <val>m</val> <desc>male</desc>
>      <val>u</val> <desc>unknown or undetermined</desc>
>      <val>x</val> <desc>not applicable or indeterminable</desc>
>    </valList>
>
>
>  
>

I don't quite see what you're getting at here. We need to decide whether
sex is an enumeration of possible values (possibly allowing people to 
use their own values), or a hardwired datatype like date values. My 
vote, fwiw, is for the former, but I'm open to discussion.
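
A small Python sketch of the two options may help. It assumes the four
codes from the quoted examples; the function names and the "extra"
parameter are inventions for illustration. The point is that both
accept the same values until someone wants to add their own, which only
the enumeration can accommodate.

```python
import re

SEX_VALUES = {"f", "m", "u", "x"}  # the four codes from the quoted examples


def valid_by_pattern(value: str) -> bool:
    """Hardwired formulation: one code, optional surrounding whitespace."""
    return re.fullmatch(r"\s*(f|m|u|x)\s*", value) is not None


def valid_by_enumeration(value: str, extra: frozenset = frozenset()) -> bool:
    """Enumeration formulation: membership after trimming whitespace.

    `extra` models values a user adds in a customization, which the
    hardwired pattern has no way to take into account.
    """
    return value.strip() in SEX_VALUES | extra
```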

>>- for commonly occurring cases (see below) we should define a small 
>>  number of macros, which will be named in the way Syd proposes for
>>  datatypes
>>    
>>
>
>I think I may be confused: for commonly occurring cases of what? In the
>previous bullet point did "datatype" mean "the declaration of the
>allowed values of a particular attribute" or "an abstract constraint
>which can be applied to the allowed values of any given attribute"?
>
>
>  
>

That's OK, I am not sure that I remember which I meant myself. I was
probably hankering after comprehensible names for certain frequently
occurring ranges of values.

>>- it should be possible to map all datatypes to W3C basic datatypes, 
>>  possibly with additional constraints
>>    
>>
>
>If I understand this correctly, I don't think there's any immediate
>problem with it. (I'm presuming that anything that is expressed as a
>list of values is something that can be mapped.) I think it is
>probably fine as a goal for the current short-term project.
>
>However, as a long-term principle I think it is a very bad idea to tie
>TEI to W3C datatypes. While I am far from a computer scientist who has
>studied these issues, it's clear that W3C datatypes leave a lot to be
>desired. It is quite reasonable to expect that other datatype
>libraries will be published (e.g., OASIS DSDL part 5), or that we
>would want to create a datatype library ourselves, perhaps using DTLL
>if & when it becomes fully worked out.
>  
>

This is simply reiterating a decision we took at the Council meeting in 
Oxford, if I remember aright.

>
>  
>
>>The TEI has always proposed additional constraints in remarks,
>>valDesc, and descriptive prose. We think we should use the
>>Schematron language to express some of these: the primary use case
>>being constraints on acceptable GIs as targets for various pointing
>>attributes.
>>
>>We are not sure where these constraints go in ODD-world, but
>>probably not in the <datatype>. We recommend using Schematron for
>>them because (a) we know it does the job (b) it is a candidate ISO
>>recommendation.
>>    
>>
>
>I agree with all of the above. Although I think perhaps we should
>avoid features of ISO Schematron that are not available in Schematron
>1.x, as processors for the former are hard to come by. (That may
>change by the time this becomes an issue, of course.)
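
As an illustration of the kind of constraint meant here (what may be
the target of a pointing attribute), the following Python sketch does
by hand what a Schematron rule would assert. It is not Schematron, it
only looks at same-document "#id" pointers, and the function name and
the sample rule ("<ptr> must point at a <note>") are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def bad_targets(doc: ET.Element, attr: str, allowed_gis: set) -> list:
    """Return the values of `attr` that point (as "#id") at a missing
    element, or at an element whose name is not in allowed_gis."""
    by_id = {el.get(XML_ID): el.tag for el in doc.iter() if el.get(XML_ID)}
    problems = []
    for el in doc.iter():
        ref = el.get(attr)
        # Pointers outside the current document are not checked here.
        if ref is None or not ref.startswith("#"):
            continue
        if by_id.get(ref[1:]) not in allowed_gis:
            problems.append(ref)
    return problems


sample = ET.fromstring(
    '<text><note xml:id="n1"/><p xml:id="p1"/>'
    '<ptr target="#n1"/><ptr target="#p1"/><ptr target="#zz"/></text>'
)
violations = bad_targets(sample, "target", {"note"})
```

Here `violations` holds the two pointers that break the sample rule,
"#p1" (wrong element) and "#zz" (no such identifier).
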
>
>
>  
>
>>I think anything we can do to reduce the complications consequent on
>>the whitespace rules of XML is an unalloyed Good Thing, and propose
>>to be even more draconian than Syd suggests.
>>    
>>
>
>Hear hear!
>
>
>  
>
>>My suggestion is that we allow only token, nmtoken, and
>>tei.data.token.
>>    
>>
>
>I'm not sure *where* this restriction occurs, since in the next
>paragraph you propose we keep "tei.data.tokens".
>  
>
I meant to include the plural forms; sorry for the carelessness.

>- token: do you mean "rng:token" or "xsd:token"?
>  
>

No idea. Please spell out the difference, and express your preference if 
any.

>- xsd:NMTOKEN: interesting; where would you want to use it? I have
>               found no good use for this datatype for any of the 541
>               attributes I looked at. In every case that NMTOKEN is
>               currently used, I think we should be using xsd:Name (or
>               perhaps xsd:NCName), except for the 1 oddball case of
>               unit= on <timeline>, which should be an enumerated list
>               or folded into interval=.
>
>  
>

I'm happy to stick with NCName

>  
>
>>While sympathising with the motivation for it, I feel that the
>>distinction Syd proposes between "tei.data.string" and
>>"tei.data.tokens" will only confuse people. If the value of a
>>sequence of tokens is to be interpreted as a single string, then it
>>probably shouldn't be an attribute at all.
>>    
>>
>
>Really good point. As I said, I'm very back-and-forth on this issue,
>and Lou's argument tips me back. The only attribute that I can think
>of that does not fit the "tei.data.tokens" semantic and should remain
>an attribute is the value= of <metSym>. And since it's a single
>attribute, IMHO it doesn't have to be declared as a "datatype" (i.e.,
>with indirection), and even if Council thinks it does, we could just
>use tei.data.tokens and live with it. (Remember, the validation would
>be exactly the same, it's only that the prose explanation might not
>fit perfectly well. Does anyone use this attribute, anyway?)
>
>
>  
>
I think it should be tei.data.token -- it can't have white space. It 
should be a single character in fact for any sensible kind of notation.

>>These I like:
>>tei.data.token, tei.data.tokens, tei.data.pointer, tei.data.pointers
>>    
>>
>
>Me too.
>
>
>  
>
>>Constraining tei.data.token/s further as NMTOKEN/S/NCName/QName etc. is
>>possible, but I am not sure how many elements would benefit from it.
>>    
>>
>
>Between half a dozen and a dozen attributes, I suspect. Most, if not
>all, of which should be xsd:Name. 
>
>  
>
Then let's go with that.

>  
>
>>Names I would prefer:
>>for tei.data.uboolean -> tei.data.truthValue
>>    
>>
>
>I *like* it. Unless there are rousing objections, I'll plan to change
>this in EDW90 and the corresponding database later this week.
>
>
>  
>
Done?

>>Names I'm not sure about
>>tei.data.temporalExpression: how does this map to ISO 8601?
>>(I assume it doesn't include dateRanges, for example)
>>    
>>
>
>Hey, you thought of this name! See separate thread James started for
>ISO 8601 alignment discussion.
>
>
>  
>

It's not so much the name that worries me as the significance! We need 
some straw person proposals in view of the points James raises.

>>tei.data.duration
>>  We should adopt a consistent policy as to whether quantities like this 
>>include their units, or whether the units are supplied as a separate 
>>attribute. I think I prefer the second option, as being more
>>flexible.
>>    
>>
>
>If we're going to use W3C datatypes, then at least in those cases
>where W3C puts the unit in with the quantity (xsd:duration explicitly,
>and the various date and time formats implicitly) we'd have to do the
>same. 
>
>
>  
>

Yes, but only if we choose to. We could say we will NEVER have
attributes which combine in their value quantity+unit, or we could say
we have some which do and some which don't, or we could say all
quantities have the potential to include units. Again, a list of
specific cases might help sharpen discussion and reach a decision.
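
To show what is at stake for a processor, here is a Python sketch of
the first two policies applied to a duration. It handles only a
simplified subset of the W3C/ISO 8601 notation (whole hours, minutes
and seconds after "PT"), and the unit codes for the separate-attribute
case are hypothetical.

```python
import re

# Simplified: only the time part of a duration, e.g. "PT1H30M".
_DURATION = re.compile(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?")

# Hypothetical unit codes for a separate unit attribute.
_UNIT_SECONDS = {"s": 1, "min": 60, "h": 3600}


def seconds_from_combined(value: str) -> int:
    """Policy 1: quantity and unit together in one attribute value."""
    match = _DURATION.fullmatch(value)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def seconds_from_separate(quantity: str, unit: str) -> int:
    """Policy 2: bare number in one attribute, unit in another."""
    return int(quantity) * _UNIT_SECONDS[unit]
```

Under the first policy the parsing rules come with the datatype; under
the second they are ours to define, along with the list of units.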

>>tei.data.probability
>>  Not convinced we need this. There are very few candidates in the
>>EDW90 table (I find 1, to be exact!)
>>    
>>
>
>My fault, table had typos. This one is for expressing a range from 0
>to 1 (or 0% to 100% or none to all). Currently only 3 attributes make
>use of it (scope= of <handNote>, usage= of <language>, weights= of
><alt>). Since (IIRC) Council agreed in Paris that whenever 2 or more
>attributes share the same constraint, a datatype should be abstracted
>out, I did so. (There was even some discussion that there should be a
>datatype even if only 1 attribute has a particular constraint, IIRC.)
>
>
>  
>
Well, 3 is almost enough to warrant having it. Presumably if we define
it as a macro, the user who only wants ever to express probability by 
means of a percentage can say so by redefining the macro.
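
A sketch of what such a redefinition would mean in practice, in Python.
The function and its two flags are invented for illustration; the flags
stand in for a user redefining the macro to allow only one of the two
notations mentioned above (0 to 1, or 0% to 100%).

```python
def parse_probability(value: str, allow_fraction: bool = True,
                      allow_percent: bool = True) -> float:
    """Return a probability in [0, 1] from "0.25" or "25%"."""
    text = value.strip()
    if text.endswith("%"):
        if not allow_percent:
            raise ValueError("percentages not permitted")
        result = float(text[:-1]) / 100
    else:
        if not allow_fraction:
            raise ValueError("only percentages permitted")
        result = float(text)
    # Rejects out-of-range values (and NaN, which fails both comparisons).
    if not 0.0 <= result <= 1.0:
        raise ValueError(f"out of range: {value!r}")
    return result
```

The default call accepts both notations; calling it with
allow_fraction=False models the user who only ever wants percentages.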

>>tei.data.numeric
>>  I'm now coming round to the view that we also need a
>>  tei.data.integer
>>    
>>
>
>I'm wondering if the concept of "positive integer or 0" is simple
>enough that we don't need to bother creating a TEI datatype for it,
>and could just use xsd:nonNegativeInteger directly when needed.
>
>
>  
>

Fine by me.

>>tei.data.language
>>  I agree that we need to document exactly what this means somewhere and 
>>providing a TEI name for it is a good way of doing so.
>>    
>>
>
>JC> Just to make sure I'm understanding this...would that datatype
>JC> then be used for validation of @xml:lang's format? It seems
>JC> strange to me to be using a tei.datatype to validate and/or
>JC> document use of a non-TEI element/attribute.
>
>I agree with Lou. There are 3 reasons to make a datatype
>(tei.data.language) that maps directly to xsd:language.
>
>* It occurs more than twice (I'm not sure this is a compelling
>  argument on its own): langKey= & otherLangs= of <textLang>, ident=
>  of <language>, mainLang= of <hand>, and xml:lang= of everything.
>
>* Although it maps directly to xsd:language, the explanation of
>  xsd:language is both hard to find and hard to read & understand.
>  (Quite unlike the explanation of xsd:nonNegativeInteger, which is
>  easy to find, and not all that hard to read & understand -- besides,
>  it's obvious enough that almost no one bothers.)
>
>* As Christian pointed out, it would be nice to have someplace in the
>  reference documentation to plunk the explanation of how xml:lang= is
>  related to ident= of <language>.
>
>
>I will post a reply on tei.data.code and tei.data.key issue separately.
>