[tei-council] comments on edw90
Lou Burnard
lou.burnard at computing-services.oxford.ac.uk
Thu Sep 1 16:07:01 EDT 2005
Syd Bauman wrote:
>>Sebastian and I have done some testing of the feasibility of doing
>>local over-rides as described here. Our conclusions are that this
>>is not possible on technical grounds,
>>
>>
>
>This is quite a shame, as such a capability could provide the means
>for significant simplification of the TEI scheme, and much of the work
>put into EDW90 was predicated on this possibility. Such is life.
>
Yes, I agree it's a nuisance. The problem is in the way that ODDs are
"flattened" during the schema generation phase. Sebastian and I have
discussed it at length several times, and I have understood it at least
once. Don't forget, of course, that the inability to do a local override
*only* applies to generation of the P5 canonical sources: a user of the
system is still at liberty, e.g., to replace an unconstrained list with a
constrained one in their application-specific ODD.
>>Our recommendation is that
>>- the tei.typed class should be removed
>>- elements bearing a type attribute (and former members of the
>> typed class) should be checked to see whether their valLists
>> constitute an open or closed list
>>- for closed list, the datatype will be an alternation of the
>> possible values
>>
>>
>
>I presume you mean "set of permissible values" or some such, not
>"datatype". (Up until now we've been using the term "datatype" to
>express an abstract grouping of values with similar syntactic
>constraints and similar semantics, presumably implemented by some form
>of indirection. However, there is the potential for lots of confusion,
>because the child of <attDef> which is used to abstractly declare the
>attribute type is called <datatype>. This is quite an unfortunate
>situation, for which I must take the blame -- Sebastian specifically
>asked me if any of the names he chose for these elements were
>problematic before he cemented them into our processing chain, and I
>missed this obvious thorn.)
>
Yes, sorry, I was using the term loosely to mean "what it says is legal
for this attribute in the DTD/schema"
>>- for open (or semi) lists, the datatype will be tei.enumerated,
>> i.e. a single token not containing whitespace
>>
>>
>
>I'm confused. What then is the difference between tei.data.enumerated
>and tei.data.token?
>
>
No difference for one sense of "datatype", some difference for the
other! The suggestion is to use tei.data.enumerated for cases where
there is a <valList>, open or closed. But I am happy to go with
tei.data.token for open/semi lists as well if this is felt to be clearer.
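To make the open/closed distinction concrete, here is a rough Python sketch of how a validator might treat a value under each kind of <valList>. The function is_valid and the sample values are hypothetical; "semi" is treated like "open" on the assumption that, for validation purposes, the listed values are only advisory.

```python
import re

# A run of one or more non-whitespace characters, i.e. a single token.
SINGLE_TOKEN = re.compile(r"\S+")

def is_valid(value: str, listed: set[str], list_type: str) -> bool:
    """Return True if `value` is acceptable for an attribute with this valList."""
    token = value.strip()  # treat surrounding whitespace as insignificant
    if not SINGLE_TOKEN.fullmatch(token):
        return False  # empty, or contains internal whitespace
    if list_type == "closed":
        return token in listed  # only the listed values are legal
    if list_type in ("open", "semi"):
        return True  # listed values are suggestions; any single token passes
    raise ValueError(f"unknown valList type: {list_type!r}")

suggested = {"foot", "end"}  # hypothetical suggested values
print(is_valid("margin", suggested, "open"), is_valid("margin", suggested, "closed"))
```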
>In some ways this may feel like a big step backwards, as it is
>essentially the system we were using before Paris. But I think this
>works almost as well as the system Council sketched out in Paris. If
>a TEI-schema-designer wants to restrict users to the values provided
>in an open (or semi) list, all she needs to do is change the type of
><valList> from "open" to "closed". (Hmmm... I just realized that this
>is actually currently not true. I think she'd have to copy-and-paste
>the entire <valList> from the tagdoc into her ODD, which isn't nearly
>as nice. Sebastian, could we arrange the process so that just
>specifying, e.g.
>   <elementSpec module="core" ident="note" mode="change">
>     <attList>
>       <attDef ident="place">
>         <valList type="closed"/>
>       </attDef>
>     </attList>
>   </elementSpec>
>would work? It's not ambiguous that the user is trying to remove all
>the possible values, because it makes no sense to have an empty
><valList>, especially if the type is "closed"; of course it also
>isn't valid à la the current TD.)
>
>
I would like to hear Sebastian's view on this -- he's back at work next
week -- but I suspect he will say that this looks like rather recondite
special-casing. After all, if you want to close the set of values,
chances are you'll want to modify the TEI suggestions anyway.
>
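A minimal Python sketch of the merge rule Syd is asking for, purely to pin down the semantics; it is not how the current processing chain works. The function merge_val_list is hypothetical, and the <valItem ident="..."> markup inside the list is a placeholder, not a claim about the element names the tagdocs actually use.

```python
import copy
import xml.etree.ElementTree as ET

def merge_val_list(original: ET.Element, change: ET.Element) -> ET.Element:
    """Hypothetical merge rule for <valList> under mode="change".

    An empty <valList> in the customization keeps the original values but
    takes the new type; a non-empty one simply replaces the original.
    """
    if len(change) > 0:
        return change
    merged = copy.deepcopy(original)  # never mutate the source tagdoc
    new_type = change.get("type")
    if new_type is not None:
        merged.set("type", new_type)
    return merged

# Placeholder markup: the item element and its attribute are illustrative only.
tagdoc = ET.fromstring(
    '<valList type="open"><valItem ident="a"/><valItem ident="b"/></valList>'
)
custom = ET.fromstring('<valList type="closed"/>')
result = merge_val_list(tagdoc, custom)
print(result.get("type"), [item.get("ident") for item in result])  # prints: closed ['a', 'b']
```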
>>1.2 Over-riding of attributes
>>
>>I think we can actually make explicit some of what Syd is
>>describing here by using the RelaxNG method of defining facets. So
>>we could for example say that a datatype was basically a non-negative
>>integer, but with the added constraint that it has a value less
>>than 43 by a construct such as
>>  <datatype>
>>    <rng:data type="nonNegativeInteger">
>>      <rng:param name="maxInclusive">42</rng:param>
>>    </rng:data>
>>  </datatype>
>>
>>
>
>Yes, I think this is doable, but it really only addresses a small
>subset of what I think we set forth in Paris to do. (And note that
>quite a few of the proposed datatypes make use of this.)
>
Could we see a list of these?
> In
>particular, since the restriction is a facet, we can't use this
>feature to have one TEI datatype for numeric representations
>(tei.data.numeric), and say "this attribute is one of
>tei.data.numeric, but must be a non-negative integer".
>
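As a sanity check on what the facet does, a rough Python sketch of the nonNegativeInteger-plus-maxInclusive example. The lexical check is simplified (it ignores the "-0" form that XSD also permits), and valid_with_max_inclusive is a made-up name. It also shows why a facet is tied to its base type: the base type's check has to pass before the facet's bound means anything.

```python
import re

# Simplified lexical form of xsd:nonNegativeInteger: optional "+" then digits.
# (XSD also allows "-0"; that edge case is ignored in this sketch.)
NON_NEG_INT = re.compile(r"\+?[0-9]+")

def valid_with_max_inclusive(value: str, max_inclusive: int = 42) -> bool:
    """Sketch of <rng:data type="nonNegativeInteger"> with a maxInclusive param."""
    lexical = value.strip()  # this XSD type collapses whitespace before checking
    if not NON_NEG_INT.fullmatch(lexical):
        return False  # fails the base type
    return int(lexical) <= max_inclusive  # then apply the facet

print(valid_with_max_inclusive("42"), valid_with_max_inclusive("43"))
```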
>>If (and only if) there is a 1:1 mapping between a TEI datatype and
>>[a W3C Schema] datatype we could presumably also do
>>
>>
>>  <datatype>
>>    <rng:data>
>>      <rng:ref name="tei.nonNegativeInteger"/>
>>      <rng:param name="maxInclusive">42</rng:param>
>>    </rng:data>
>>  </datatype>
>>
>>
>
>I tried a quick test, and both nxml-mode and trang objected to such a
>schema.
>
>>In the general case, however, we think that all datatype definitions
>>should be complete and appropriate. In practice, we think the vast
>>majority of TEI attributes are already catered for by a very small
>>number of datatypes. (Of the 500+ attributes listed by Syd, about
>>400 are covered by derivations of tei.data.token, tei.data.pointer
>>and tei.data.uboolean)
>>
>>
>
>I'm not exactly sure what you mean by "derivations" here, but if it's
>the adding of restrictions to a generic datatype that we can't do, it
>means we have a problem for lots of those attributes, no?
>
Again, I think we need to look at cases here. Suppose we started by just
using the simplest (most generic) datatypes wherever possible, how many
attributes would be seriously discommoded?
>>We conclude that
>>
>>- datatypes should be expressed as <rng:data> expressions
>>
>>
>
>I'm not sure I see why we want to impose such a restriction. E.g., for
>the TEI sex datatype, why would we prefer
>
>  <rng:data type="string">
>    <rng:param name="pattern">\s*(f|m|u|x)\s*</rng:param>
>  </rng:data>
>
>to either
>
>  <rng:choice>
>    <rng:value>f</rng:value>
>    <rng:value>m</rng:value>
>    <rng:value>u</rng:value>
>    <rng:value>x</rng:value>
>  </rng:choice>
>
>or far better
>
>  <valList type="closed">
>    <val>f</val> <desc>female</desc>
>    <val>m</val> <desc>male</desc>
>    <val>u</val> <desc>unknown or undetermined</desc>
>    <val>x</val> <desc>not applicable or indeterminable</desc>
>  </valList>
>
I don't quite see what you're getting at here. We need to decide whether
sex is an enumeration of possible values (possibly allowing people to
use their own values), or a hardwired datatype like date values. My
vote, fwiw, is for the former, but I'm open to discussion.
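Provided the pattern is written without ^ and $ anchors (XSD patterns are implicitly anchored to the whole value), the pattern and <rng:choice> forms accept the same values, so the decision really is about documentation and customizability rather than validation. A rough Python sketch comparing a whitespace-tolerant pattern such as \s*(f|m|u|x)\s* with the choice form; both functions are illustrative approximations that use Python's notion of whitespace rather than XML's.

```python
import re

SEX_VALUES = {"f", "m", "u", "x"}

def by_pattern(value: str) -> bool:
    # XSD patterns must match the entire value, so re.fullmatch is the
    # closest Python analogue; \s* on each side allows surrounding padding.
    return re.fullmatch(r"\s*(f|m|u|x)\s*", value) is not None

def by_choice(value: str) -> bool:
    # <rng:value> without a type uses the built-in token type, which compares
    # values after normalizing whitespace; " ".join(split()) approximates that.
    return " ".join(value.split()) in SEX_VALUES

for sample in ["f", " m ", "\tu\n", "F", "f m", "female", ""]:
    assert by_pattern(sample) == by_choice(sample)
```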
>>- for commonly occurring cases (see below) we should define a small
>> number of macros, which will be named in the way Syd proposes for
>> datatypes
>>
>>
>
>I think I may be confused: for commonly occurring cases of what? In the
>previous bullet point did "datatype" mean "the declaration of the
>allowed values of a particular attribute" or "an abstract constraint
>which can be applied to the allowed values of any given attribute"?
>
That's OK, I am not sure that I remember which I meant myself. I was
probably hankering after comprehensible names for certain frequently
occurring ranges of values.
>>- it should be possible to map all datatypes to W3C basic datatypes,
>> possibly with additional constraints
>>
>>
>
>If I understand this correctly, I don't think there's any immediate
>problem with it. (I'm presuming that anything that is expressed as a
>list of values is something that can be mapped.) I think it is
>probably fine as a goal for the current short-term project.
>
>However, as a long-term principle I think it is a very bad idea to tie
>TEI to W3C datatypes. While I am far from a computer scientist who has
>studied these issues, it's clear that W3C datatypes leave a lot to be
>desired. It is quite reasonable to expect that other datatype
>libraries will be published (e.g., OASIS DSDL part 5), or that we
>would want to create a datatype library ourselves, perhaps using DTLL
>if & when it becomes fully worked out.
>
>
This is simply reiterating a decision we took at the Council meeting in
Oxford, if I remember aright.
>
>>The TEI has always proposed additional constraints in remarks,
>>valDesc, and descriptive prose. We think we should use the
>>Schematron language to express some of these: the primary use case
>>being constraints on acceptable GIs as targets for various pointing
>>attributes.
>>
>>We are not sure where these constraints go in ODD-world, but
>>probably not in the <datatype>. We recommend using Schematron for
>>them because (a) we know it does the job (b) it is a candidate ISO
>>recommendation.
>>
>>
>
>I agree with all of the above. Although I think perhaps we should
>avoid features of ISO Schematron that are not available in Schematron
>1.x, as processors for the former are hard to come by. (That may
>change by the time this becomes an issue, of course.)
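Schematron would express such a constraint as an XPath assertion; the rough Python sketch below only shows the logic of the check. The rule "a <ptr> may point only at a <p>" and the function bad_targets are hypothetical examples, not proposed TEI constraints.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def bad_targets(doc: ET.Element, pointer_tag: str, allowed: set[str]) -> list[str]:
    """Return each target= reference on <pointer_tag> elements that does not
    resolve to an element whose name (GI) is in `allowed`."""
    by_id = {el.get(XML_ID): el for el in doc.iter() if el.get(XML_ID)}
    problems = []
    for pointer in doc.iter(pointer_tag):
        for ref in pointer.get("target", "").split():
            target = by_id.get(ref.removeprefix("#"))
            if target is None or target.tag not in allowed:
                problems.append(ref)
    return problems

doc = ET.fromstring(
    '<text><p xml:id="p1">x</p><note xml:id="n1">y</note>'
    '<ptr target="#p1"/><ptr target="#n1 #zz"/></text>'
)
print(bad_targets(doc, "ptr", {"p"}))  # prints: ['#n1', '#zz']
```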
>
>>I think anything we can do to reduce the complications consequent on
>>the whitespace rules of XML is an unalloyed Good Thing, and propose
>>to be even more draconian than Syd suggests.
>>
>>
>
>Hear hear!
>
>>My suggestion is that we allow only token, nmtoken, and
>>tei.data.token.
>>
>>
>
>I'm not sure *where* this restriction occurs, since in the next
>paragraph you propose we keep "tei.data.tokens".
>
>
I meant to include the plural forms; sorry for the carelessness.
>- token: do you mean "rng:token" or "xsd:token"?
>
>
No idea. Please spell out the difference, and express your preference if
any.
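As we read the specs: rng:token is the RELAX NG built-in type, which compares values after normalizing whitespace and takes no parameters, while xsd:token comes from W3C XML Schema, collapses whitespace, and accepts facets such as pattern. Neither forbids a single internal space, so neither on its own gives "a single token not containing whitespace". A rough Python sketch of the XSD collapse rule and of the stricter single-token check (both function names are made up):

```python
import re

def collapse(value: str) -> str:
    """XSD whiteSpace="collapse": tabs, newlines and carriage returns become
    spaces, runs of spaces shrink to one, and leading/trailing spaces go."""
    replaced = re.sub(r"[\t\n\r]", " ", value)
    return re.sub(r" +", " ", replaced).strip(" ")

def is_single_token(value: str) -> bool:
    # Stricter than xsd:token: after collapsing, no internal space may remain.
    collapsed = collapse(value)
    return collapsed != "" and " " not in collapsed

print(repr(collapse("  a\t\tb \n")), is_single_token("a b"))
```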
>- xsd:NMTOKEN: interesting; where would you want to use it? I have
> found no good use for this datatype for any of the 541
> attributes I looked at. In every case that NMTOKEN is
> currently used, I think we should be using xsd:Name (or
> perhaps xsd:NCName), except for the 1 oddball case of
> unit= on <timeline>, which should be an enumerated list
> or folded into interval=.
>
I'm happy to stick with NCName
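For reference, a rough ASCII-only Python approximation of the three productions; the real XML rules admit many more Unicode characters. In short, an NMTOKEN may start with a digit, hyphen or period, a Name may not, and an NCName is a Name without colons. The helper classify is made up for illustration.

```python
import re

# ASCII-only approximations of the XML productions.
NAME_START = "[A-Za-z_:]"
NAME_CHAR = "[A-Za-z0-9._:-]"
PATTERNS = {
    "NMTOKEN": re.compile(NAME_CHAR + "+"),
    "Name": re.compile(NAME_START + NAME_CHAR + "*"),
    "NCName": re.compile("[A-Za-z_][A-Za-z0-9._-]*"),  # a Name minus colons
}

def classify(value: str) -> dict[str, bool]:
    """Report which of the three datatypes would accept `value`."""
    return {kind: bool(p.fullmatch(value)) for kind, p in PATTERNS.items()}

print(classify("2005"), classify("tei:note"))
```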
>
>
>>While sympathising with the motivation for it, I feel that the
>>distinction Syd proposes between "tei.data.string" and
>>"tei.data.tokens" will only confuse people. If the value of a
>>sequence of tokens is to be interpreted as a single string, then it
>>probably shouldn't be an attribute at all.
>>
>>
>
>Really good point. As I said, I'm very back-and-forth on this issue,
>and Lou's argument tips me back. The only attribute that I can think
>of that does not fit the "tei.data.tokens" semantic and should remain
>an attribute is the value= of <metSym>. And since it's a single
>attribute, IMHO it doesn't have to be declared as a "datatype" (i.e.,
>with indirection), and even if Council thinks it does, we could just
>use tei.data.tokens and live with it. (Remember, the validation would
>be exactly the same, it's only that the prose explanation might not
>fit perfectly well. Does anyone use this attribute, anyway?)
>
I think it should be tei.data.token -- it can't have white space. It
should be a single character in fact for any sensible kind of notation.
>>These I like:
>>tei.data.token, tei.data.tokens, tei.data.pointer, tei.data.pointers
>>
>>
>
>Me too.
>
>>Constraining tei.data.token/s further as NMTOKEN/S/NCName/QName etc. is
>>possible, but I am not sure how many attributes would benefit from it
>>
>>
>
>Between half a dozen and a dozen attributes, I suspect. Most, if not
>all, of which should be xsd:Name.
>
Then let's go with that.
>
>
>>Names I would prefer:
>>for tei.data.uboolean -> tei.data.truthValue
>>
>>
>
>I *like* it. Unless there are rousing objections, I'll plan to change
>this in EDW90 and the corresponding database later this week.
>
Done?
>>Names I'm not sure about
>>tei.data.temporalExpression: how does this map to ISO 8601?
>>(I assume it doesn't include dateRanges, for example)
>>
>>
>
>Hey, you thought of this name! See separate thread James started for
>ISO 8601 alignment discussion.
>
It's not so much the name that worries me as the significance! We need
some straw person proposals in view of the points James raises.
>>tei.data.duration
>> We should adopt a consistent policy as to whether quantities like this
>>include their units, or whether the units are supplied as a separate
>>attribute. I think I prefer the second option, as being more
>>flexible.
>>
>>
>
>If we're going to use W3C datatypes, then at least in those cases
>where W3C puts the unit in with the quantity (xsd:duration explicitly,
>and the various date and time formats implicitly) we'd have to do the
>same.
>
Yes, but only if we choose to. We could say we will NEVER have
attributes whose values combine a quantity and its unit, or we could say
we have some which do and some which don't, or we could say all
quantities have the potential to include units. Again, a list of
specific cases might help sharpen discussion and reach a decision.
>>tei.data.probability
>> Not convinced we need this. There are very few candidates in the
>>EDW90 table (I find 1, to be exact!)
>>
>>
>
>My fault, table had typos. This one is for expressing a range from 0
>to 1 (or 0% to 100% or none to all). Currently only 3 attributes make
>use of it (scope= of <handNote>, usage= of <language>, weights= of
><alt>). Since (IIRC) Council agreed in Paris that whenever 2 or more
>attributes share the same constraint, a datatype should be abstracted
>out, I did so. (There was even some discussion that there should be a
>datatype even if only 1 attribute has a particular constraint, IIRC.)
>
Well 3 is almost enough to warrant having it. Presumably if we define
it as a macro, the user who only wants ever to express probability by
means of a percentage can say so by redefining the macro.
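A rough Python sketch of that idea, with a plain dict standing in for macro redefinition. The 0-to-1 decimal default and the percentage alternative are assumptions for illustration, not agreed definitions, and all names are made up.

```python
import re
from decimal import Decimal

DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")
PERCENT = re.compile(r"([0-9]+(\.[0-9]*)?|\.[0-9]+)%")

def probability_decimal(value: str) -> bool:
    """Hypothetical default: a decimal number from 0 to 1 inclusive."""
    v = value.strip()
    return DECIMAL.fullmatch(v) is not None and 0 <= Decimal(v) <= 1

def probability_percent(value: str) -> bool:
    """Hypothetical project redefinition: a percentage from 0% to 100%."""
    m = PERCENT.fullmatch(value.strip())
    return m is not None and Decimal(m.group(1)) <= 100

# The "macro" is just a name bound to a check; a customization rebinds it.
MACROS = {"tei.data.probability": probability_decimal}
project = {**MACROS, "tei.data.probability": probability_percent}
print(MACROS["tei.data.probability"]("0.75"), project["tei.data.probability"]("75%"))
```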
>>tei.data.numeric
>> I'm now coming round to the view that we also need a
>> tei.data.integer
>>
>>
>
>I'm wondering if the concept of "positive integer or 0" is simple
>enough that we don't need to bother creating a TEI datatype for it,
>and could just use xsd:nonNegativeInteger directly when needed.
>
Fine by me.
>>tei.data.language
>> I agree that we need to document exactly what this means somewhere and
>>providing a TEI name for it is a good way of doing so.
>>
>>
>
>JC> Just to make sure I'm understanding this...would that datatype
>JC> then be used for validation of @xml:lang's format? It seems
>JC> strange to me to be using a tei.datatype to validate and/or
>JC> document use of a non-TEI element/attribute.
>
>I agree with Lou. There are 3 reasons to make a datatype
>(tei.data.language) that maps directly to xsd:language.
>
>* It occurs more than twice (I'm not sure this is a compelling
> argument on its own): langKey= & otherLangs= of <textLang>, ident=
> of <language>, mainLang= of <hand>, and xml:lang= of everything.
>
>* Although it maps directly to xsd:language, the explanation of
> xsd:language is both hard to find and hard to read & understand.
> (Quite unlike the explanation of xsd:nonNegativeInteger, which is
> easy to find, and not all that hard to read & understand -- besides,
> it's obvious enough that almost no one bothers.)
>
>* As Christian pointed out, it would be nice to have someplace in the
> reference documentation to plunk the explanation of how xml:lang= is
> related to ident= of <language>.
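On the second point, the core of xsd:language is a short lexical pattern, [a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*, as given in XML Schema 1.0 Part 2. A rough Python sketch (is_xsd_language is a made-up name); note that it checks form only, so it says nothing about whether a code is actually registered, and a string like "xx-nonsense" passes too.

```python
import re

# Lexical pattern for xsd:language from XML Schema 1.0 Part 2.
LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")

def is_xsd_language(value: str) -> bool:
    """Check the form of a language tag; registration is not checked."""
    return LANGUAGE.fullmatch(value.strip()) is not None

print(is_xsd_language("en-GB"), is_xsd_language("english language"))
```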
>
>
>I will post a reply on tei.data.code and tei.data.key issue separately.
>
>_______________________________________________
>tei-council mailing list
>tei-council at lists.village.Virginia.EDU
>http://lists.village.Virginia.EDU/mailman/listinfo/tei-council