[tei-council] attributes/datatypes action item progress report

Syd Bauman Syd_Bauman at Brown.edu
Wed Jun 1 23:54:32 EDT 2005


In Paris y'all tasked me with going through all TEI attributes and
attempting to classify each as a particular datatype, noting those
that are singular or for some other reason don't fit well into any
datatype, and considering what datatypes are needed, what extra
constraints are needed, etc.

I had hoped to finish going through them by today. I'm almost there.
This turns out to be an enormous task, for which I've put off several
other things (e.g., shopping for a digital camera, meeting with David
Durand about SA).

There are 541 separate attributes to be considered. Of those, I've
examined *all* 298 non-"text" attributes, and along the way an
additional 17 of the "text" ones. I had put the "text" ones off until
last because for the most part these are going to be textual
attributes that we've already considered in both EDW79 and EDW86, and
thus should be pretty easy. (Although, as I say it, I'm worried those
may be "famous last words" :-)

(BTW, I put "text" in quotes because I think that the keyword "text"
should occur in the TEI schemas only once: in the declaration of the
macro 'tei.text' (or whatever we decide to call it). In attribute
declarations the keyword "token" or "xsd:token" should be used
instead.[1] In element declarations, tei.text should be used.[2])
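For anyone less familiar with the built-in "token" type, here is a rough Python sketch of the whitespace handling it implies. TAB, LF, and CR become spaces, runs of spaces collapse to one, and leading/trailing spaces are dropped, so " alias " and "alias" compare equal. This is not code from any TEI tool, and all function names are made up for illustration. The fits_max_length() helper assumes that an xsd:token length facet such as maxLength is checked against the collapsed value, which is how I understand XML Schema facets to work.

```python
import re


def collapse(value: str) -> str:
    """Whitespace 'collapse' as applied to token values: map TAB, LF and CR
    to spaces, squeeze runs of spaces to one, trim leading/trailing spaces."""
    return re.sub(r"[ \t\n\r]+", " ", value).strip(" ")


def token_equal(a: str, b: str) -> bool:
    """Two attribute values match as tokens if their collapsed forms match."""
    return collapse(a) == collapse(b)


def fits_max_length(value: str, max_length: int) -> bool:
    """Hypothetical helper mirroring xsd:token { maxLength = "N" }:
    the length is counted on the collapsed value, not the raw one."""
    return len(collapse(value)) <= max_length


print(collapse("  folder\n\t alias "))   # folder alias
print(token_equal(" alias ", "alias"))   # True
```

In RELAX NG a bare string literal such as "alias" in an attribute pattern is also matched as a token by default, so the enumerated values in note [1] tolerate stray surrounding whitespace too.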

I plan to finish up the "text" attributes in the next day or two, and
hope to write up a report shortly thereafter.

Notes
-----
[1] The only difference between "token" and "xsd:token" is that param
    patterns can be used to restrict the latter but not the former.
    E.g.,
    element file {
       attribute name { xsd:token { maxLength = "32" } },
       attribute type { "alias" | "folder" | "plain" },
       empty
    }
    is a valid declaration, but the same thing without "xsd:" causes
    an error.
[2] tei.text = mixed { g* }
    This is because wherever what we used to call PCDATA is allowed,
    the <g> element must be permitted, in case a non-Unicode
    character is required. There are some exceptions, I suppose.
    E.g., it's hard to imagine the content of <postCode> requiring
    characters outside of Unicode.
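To make note [2] concrete, here is a rough Python sketch of what the tei.text content model permits: character data interleaved with zero or more <g> elements and nothing else. It is not a schema validator, because a real check would run the RELAX NG schema through a validator. The function names are hypothetical, and it only looks at immediate children, ignoring the content model of <g> itself. It compares local names so that it works with or without a namespace.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix, if any, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def matches_tei_text(fragment: str) -> bool:
    """Return True if the root element's content fits mixed { g* }:
    any amount of text, plus zero or more <g> child elements only."""
    root = ET.fromstring(fragment)
    return all(local_name(child.tag) == "g" for child in root)


print(matches_tei_text('<name>Caf<g ref="#glyph1"/> Bauman</name>'))  # True
print(matches_tei_text("<name>a <hi>b</hi></name>"))                  # False
```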