[tei-council] comments on edw90

Thu Sep 1 09:32:10 EDT 2005

> tei.data.code and tei.data.key
> 
>    I have a different understanding of these.
> 
> Syd defines tei.data.code as a version of tei.data.pointer, with the
> extra constraint that the target must be in the present document. I
> am not sure what benefit there might be to adding this constraint.
> 
> More seriously, however, tei.data.key is defined as its complement,
> i.e. a tei.data.pointer with the constraint that its target must
> *not* be in the current document. I am even less clear what benefit
> there is in that, and find the naming rather confusing, since we are
> currently using "key=" attributes for things which are explicitly
> not pointers and which might even be in the same document (e.g. in
> TD)
> 
> I think we should stick to tei.data.pointer (by all means add
> tei.data.pointer.local, if need be) for the URI case, and keep
> tei.data.code (or key) for use when all we can say of the attribute
> is that its value is a magic token which might get you somewhere in
> some external system, e.g. a database key, but which cannot be
> validated. It might also be useful for cases where an association is
> made by co-reference, as with the ident/key pair in TD, or in
> whatever variety of HORSE we finally decide to back.

P4 provided (at least) 3 ways for an attribute to refer to a
controlled vocabulary:

* the value is from a list specified in the DTD (e.g., status= of
  <availability>)

* the value is an IDREF to something (which has an id=) specified in
  the document instance (e.g., who= of <sp> or resp= of <add>)

* the value is a key into some (ostensibly external, but Lou correctly
  points out that, especially in the P5 world, it may well be in the
  document instance) resource (e.g., key= of <name>)

I am not claiming that this is the only or best way to think of these;
but I do think these are useful distinctions. It is my (very
unscientific) observation that users crave the use of IDREF (or its
modern equivalent) as a mechanism for controlling vocabularies. Even
those I would think of as "power users" of TEI are sometimes very
hesitant to change the DTD, and very few want to go to the trouble of
building a database and creating software that actually uses a TEI key
to look things up in it. However, users often really want the
capability to control the values of a particular attribute, and
occasionally even want to say something about what the values mean.

There are problems porting this analysis to the P5 world, though. In
P5, the pointers that replace IDREFs can point anywhere, even into the
schema. Furthermore, in P4 the value of key= was just a key. But
in the P5 world, it might make sense for the key= to indicate the
resource as well as the key. E.g., the P4 <name key="929041"> perhaps
could become <name key="http://my.server.org/name-database?929041"> or
some such.

Thus, in EDW90 I suggested three mechanisms for controlling vocabulary:

* tei.data.enumerated: the control is via a closed <valList> in the
  ODD, which boils down to a token list in the schema

* tei.data.code (perhaps a bad name): a local pointer (although an
  argument could be made for making this a generic pointer, or for
  making it an xsd:Name and using co-reference)

* tei.data.key: for database keys; it could be declared as a URI or as
  an xsd:Name, and I don't claim to know which is better

Note that, if I understand correctly, the key= of <attRef>,
<moduleRef>, <specDesc>, and <memberOf> is currently defined as being
a name, and the processing chain depends on that. I.e., using a URI
here would break things.

As I sit here and think about it, in my pre-breakfast hypoglycemic
state, I am beginning to lean towards leaving tei.data.key an
xsd:Name, and telling users who really want to specify the resource as
well as the key to use xml:base=. E.g. <name key="929041"
xml:base="http://my.server.org/name-database">. Is that feasible? I
don't claim to have thought this through at all.
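
A rough sketch of one way to test the idea, in Python. It assumes (and
this is only an assumption) that a processor would combine xml:base=
and key= by ordinary relative-URI resolution; nothing in xml:base
itself says that key= values are to be treated this way. The function
name resolve is made up for illustration, and the server name is the
example one from above.

```python
from urllib.parse import urljoin

# Hypothetical values, taken from the examples in this message.
BASE = "http://my.server.org/name-database"
KEY = "929041"


def resolve(base, key):
    """Resolve key against base as a relative URI reference (RFC 3986)."""
    return urljoin(base, key)


# Three variants: bare key, base with a trailing slash, key as a query.
print(resolve(BASE, KEY))
print(resolve(BASE + "/", KEY))
print(resolve(BASE, "?" + KEY))
```

If I have this right, under plain URI resolution the bare key replaces
the last path segment of the base rather than being appended to it. So
the base would need a trailing slash, or the key would have to carry
the "?" itself, to end up at the intended resource.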

One final thought. As mentioned before, in the P5 world as currently
instantiated, a document can't say to which schema it is supposed to
conform. So suppose a project has 2 somewhat similar schemas that
serve different purposes lying around, each of which imposes a closed
value list on the bar= of <foo>, as follows:
   A        B
   ---      -------
   fee      fee
   fi       fi
   fo       fiddlie
   fum      aye
            oh
Now suppose you, the encoder, start working on an instance. Unless
said instance is already valid against A and invalid against B, you
have no way of knowing that "fum" is a legal value but "aye" is not.
Thus I think (or am I worried?) that a validatable method of
constraining values from within the instance will be even more popular
in P5 than in P4. I.e., it would be useful to have
  <definition-of-foo-values>
    <valList>
      <valItem ident="fee">
        <gloss>The file entropy evaluation as reported by blort</gloss>
      </valItem>
      <valItem ident="fi">
        <gloss>The file input, should contain only ASCII characters</gloss>
      </valItem>
      <valItem ident="fo">
   ...
or some such somewhere in the TEI Header.
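
To make that concrete, here is a minimal Python sketch of the kind of
check a processor could run against such a list. The wrapper element
<definition-of-foo-values> is the placeholder used above. The sample
document, the gloss-less "fo" and "fum" items, and the function names
are invented for illustration; none of this is defined by the TEI.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: the value list follows schema A from the table.
SAMPLE = """\
<doc>
  <header>
    <definition-of-foo-values>
      <valList>
        <valItem ident="fee">
          <gloss>The file entropy evaluation as reported by blort</gloss>
        </valItem>
        <valItem ident="fi">
          <gloss>The file input, should contain only ASCII characters</gloss>
        </valItem>
        <valItem ident="fo"/>
        <valItem ident="fum"/>
      </valList>
    </definition-of-foo-values>
  </header>
  <body>
    <foo bar="fum"/>
    <foo bar="aye"/>
  </body>
</doc>
"""


def allowed_values(root):
    """Collect the ident= of every <valItem> declared in the instance."""
    holder = root.find(".//definition-of-foo-values")
    if holder is None:
        return set()
    return {item.get("ident") for item in holder.iter("valItem")}


def invalid_bars(root):
    """Return the bar= values of <foo> that are not in the declared list."""
    allowed = allowed_values(root)
    return [foo.get("bar") for foo in root.iter("foo")
            if foo.get("bar") not in allowed]


root = ET.fromstring(SAMPLE)
print(invalid_bars(root))
```

With the list in the instance itself, the encoder (or the software)
can tell that "fum" is acceptable and "aye" is not without knowing
which schema the document is meant to conform to.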