[tei-council] attribute datatypes

Lou Burnard lou.burnard at retired.ox.ac.uk
Sun Feb 24 13:43:47 EST 2013


Just a word (well, several) on attribute datatypes.

I believe our policy to be that attributes define their datatype
indirectly by referring to one of the existing data.xxx macros, which
are then mapped to a RELAX NG expression. This additional layer of
indirection allows us to say something about the datatype
independently of how it's currently constrained by the RELAX NG
schemas, and is thus, I contend, A Good Thing, which should be applied
consistently. As, for the most part, it is. However, I have
discovered the following, which appear to me to be anomalous:

1. source@readFrom is defined directly as "anyURI", which is a datatype,
not a macro: it should be corrected to data.pointer

2. att.lexicographic@opt and fDecl@optional are both defined as
boolean, but should probably be data.truthValue

3. timeline@interval and when@interval are both defined using ad
hoc RNG constructs (but not identical ones); we should define a
single data.interval macro and use it consistently for both.

4. language@usage uses an ad hoc RNG datatype directly; we should
define a new "data.percentage" macro for it (and look for other cases
where this might be used)

5. application@version uses an ad hoc RNG expression, which surely ought
to be replaced by a "data.versionNum" macro

6. Several attributes [moduleRef@except and @include;
att.identified@module; @key, on classRef, elementRef, macroRef, and
moduleRef; moduleRef@prefix] use the built-in RNG datatype "NCName". It
might be more consistent to define our own macro "data.xmlName" vel sim.

7. That old bugbear rng:text is still used on most of the attributes
delivered by att.lexicographic (expand, norm, orig, split, and
value); on two attributes which hold regexp values
(att.patternReplacement@replacementPattern and att.scoping@match);
and also on refState@delim and valItem@ident. I think we should have
just one datatype for "string of words to be treated as a single entity" and
use that for some of these; for the regexps surely we should have
data.regexp.


8. Many, many attributes currently include <valList>s of various levels
of closure. I felt too faint to check (and I think someone else
already has), but they should all also have a datatype of data.enumerated.

Yes, I know this should be a ticket or several such. But I thought it
might be a good idea to check that we're all agreed on the principle
before setting in motion its consequences.
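
To make the principle concrete, here is a minimal sketch in Python of the kind
of check the list above amounts to. It is not part of any TEI tooling. The
names MACROS, ATTRIBUTES, find_anomalies and suggest_macro are hypothetical,
the macro-to-datatype mapping is a simplified assumption, and the attribute
list is a hand-copied sample from the items above plus one invented conforming
entry for contrast.

```python
# Illustrative only: a toy model of "attributes should reference a data.*
# macro rather than a raw RELAX NG / XSD datatype".

# Assumed, simplified mapping from macro name to the datatype it wraps.
MACROS = {
    "data.pointer": "anyURI",
    "data.truthValue": "boolean",
}

# (owner, attribute, declared datatype); the first five come from the list
# in the message, the last is an invented conforming example.
ATTRIBUTES = [
    ("source", "readFrom", "anyURI"),
    ("att.lexicographic", "opt", "boolean"),
    ("fDecl", "optional", "boolean"),
    ("moduleRef", "prefix", "NCName"),
    ("refState", "delim", "text"),
    ("someElement", "someAttr", "data.pointer"),  # hypothetical
]


def find_anomalies(attributes, macros):
    """Return attributes whose declared datatype is not a known macro."""
    return [
        (owner, name, dtype)
        for owner, name, dtype in attributes
        if dtype not in macros
    ]


def suggest_macro(dtype, macros):
    """Return the existing macros that already wrap this raw datatype."""
    return sorted(m for m, target in macros.items() if target == dtype)


if __name__ == "__main__":
    for owner, name, dtype in find_anomalies(ATTRIBUTES, MACROS):
        candidates = suggest_macro(dtype, MACROS)
        hint = ", ".join(candidates) if candidates else "no macro yet"
        print(f"{owner}@{name}: {dtype} ({hint})")
```

Where suggest_macro finds nothing (NCName and text in this sample), that is
the case in which a new macro such as the proposed data.xmlName would have to
be defined first.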


