[tei-council] date attributes: summary, problems, and some suggestions

Thu Feb 15 11:03:31 EST 2007

> > Problems
> > --------
> > * Some are distressed by the fact that attributes that are of the
> >   same datatype (data.temporal) and serve similar functions have
> >   different names, in particular:
> >       value=  of  <date>, <time>, <distance>, and <docDate>
> >       date=   of  <birth>, <change>, and <death>
> >   I am not bothered by this in the least, because I think the
> >   semantics are clearer with these names, and the combined
> >   alternative (dateValue=) is at least cumbersome if not misleading
> >   (i.e., on <time>).
> >   Suggestion: leave names as they are. 

CW> fine

SR> OK. It's a minor annoyance/distress, but not worth arguing about.

That was easy.

> >   - Put <docDate> into att.datePart. This has the disadvantage of
> >     giving <docDate> a dur= attribute, but I'm not sure it is worth
> >     making another class just for this one case. Thoughts?

CW> we should not carry the class economy too far, I think. Having
CW> attributes that do not make sense for a certain element should
CW> automatically recommend it for a separate class.

SR> no worries. I am sure someone will find an amusing use for @dur

I'm with Christian on this one. (In part because I think Sebastian is
right, and the amusing use of dur= worries me.) 

> >   - Create a new attribute class for the date= of <birth>, <death>,
> >     and <change>. (Any suggestions for the name?) 

CW> att.datePart.date?

SR> att.dateValue

Hmmm... how about att.normalizedDate?

> >   - If we keep <distance>[1] we may wish to reconsider its class
> >     membership, as value= is a bit silly on <distance>. It needs only
> >     dur= from att.datePart, making two cases that benefit from
> >     splitting att.datePart. (See <docDate>, above.)

> kill distance

Is anyone in favor of keeping <distance>? Is anyone besides me, Lou,
and Sebastian in favor of deleting it?

> > * The precision= attribute is superfluous, as the precision is ...
> >   Suggestion: delete it.

SR> OK

Going once, ...
going twice ...

> > * Users want a method of expressing [known day & month, unknown
> >   year, or specific possible dates within a range]
> >   Suggestion: I haven't got one, thus defer to P5 1.1.

SR> I agree.

CW> What's the point of putting this in an attribute anyway?

I suppose that's a good question, Christian. In any case, it's not
one we're going to directly solve for P5 1.0, it seems. However, when
we instantiate a "user-defined" system, users will, of course, be
permitted to whip up their own systems.

> > The basic idea is to provide two capabilities: 
> > * simple date format: conform to W3C spec, easily validatable, software
> >   support in the world-at-large
> > * complex date format: should conform to ISO 8601 if possible

SR> The principle is fine, yes.

> > [Including shifting representation of imprecise times to 'complex
> > date' format.]

CW> This seems to be quite desirable to me.

So, we're agreed on the basic idea, that's good.

> > attribute level: each of the dating attrs is split into two
> > datatype level: we provide one datatype for each date format, user
> >                 chooses which for each attribute
> > class level: for each attribute set, we provide two (or more) classes,
> >              one for each format, user chooses which for each element
> > all-in-one: syntax of attribute value differentiates

SB> Up front I can only say that I don't like the 'attribute level'
SB> solution. It's just too confusing for the average user, most of
SB> whom have little or nothing to gain.

SR> I would rule out the attribute level solution, just too confusing
SR> for day to day use.

OK, attribute level is out. One down.

CW> At the moment, I am inclined to go with [datatype level] solution

SR> The multiple datatypes are interesting, but "the user chooses
SR> which datatype to use for any given attribute" is a bit clumsy.
SR> If you switch to an ISO or USER model, would you do it piecemeal?
SR> Wouldn't it always be a global decision?

Not at all! Most obviously, I probably want the date= of <change> to
be a simple format date (W3C) even if I need to use the complex date
format for the value= of <date> that is inside my transcription of a
17th century diary.

SR> More importantly, this means that when I see <date value="foo bar">
SR> coming at me, I cannot easily deal with it without parsing the ODD
SR> in relatively complex ways. I really don't like the idea of @from
SR> and @to changing their meaning under my feet constantly, or having
SR> to refer back to ad hoc notations in the header.

I think this is a really compelling argument against the datatype
level solution, even though it's not as bad as Sebastian makes it
sound. (And I don't know what these ad hoc notations in the header
are.) To ascertain what kind of date it is processing, software need
only 
* follow the link in the yet-to-be-agreed-upon-let-alone-specified
  mechanism in the instance that points to the schema;
* find the declaration for the attribute in question;
* see what kind of datatype it is declared as.
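To make those three steps concrete, here is a toy Python sketch. It assumes the schema has already been located (the linking mechanism is not yet specified, so that step is skipped) and that it is a RELAX NG schema in XML syntax. The tiny schema and the `attribute_datatype` helper are hypothetical. It only finds attributes declared inline inside an `<element>` pattern; a real TEI schema reaches its attributes through `<define>`/`<ref>` indirection via classes, which this sketch does not resolve.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

# Hypothetical schema: same attribute name, different datatype per element.
SCHEMA = """<grammar xmlns="http://relaxng.org/ns/structure/1.0"
    datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start><choice><ref name="date"/><ref name="time"/></choice></start>
  <define name="date">
    <element name="date">
      <optional><attribute name="value"><data type="date"/></attribute></optional>
      <text/>
    </element>
  </define>
  <define name="time">
    <element name="time">
      <optional><attribute name="value"><data type="token"/></attribute></optional>
      <text/>
    </element>
  </define>
</grammar>"""


def _own_attributes(element_node):
    """Yield <attribute> patterns of this element, skipping nested elements."""
    for child in element_node:
        if child.tag == RNG + "attribute":
            yield child
        elif child.tag != RNG + "element":
            yield from _own_attributes(child)  # e.g. inside <optional>


def attribute_datatype(schema_text, element, attribute):
    """Return the declared datatype name, or None if not found/untyped."""
    root = ET.fromstring(schema_text)
    for el in root.iter(RNG + "element"):
        if el.get("name") != element:
            continue
        for att in _own_attributes(el):
            if att.get("name") == attribute:
                data = att.find(RNG + "data")
                return None if data is None else data.get("type")
    return None
```

Even in this stripped-down form, knowing what `value=` means requires a trip to the schema, and the answer differs between `<date>` and `<time>`.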

So while it doesn't require parsing the ODD, it is still way too
complex. Besides, it boils down to a situation in which the schema is
adding information to the instance (much the way a DTD can add a
default attribute value), which if not morally abhorrent, is at least
something that should be avoided.

So unless Christian or someone else comes up with a good counter-
argument, I think we should stop considering the datatype-level
solution.

SR> The class solution is sort of elegant, but falls down somewhat in
SR> implementation problems.

I'm confused -- you suggest a quite reasonable implementation below. 

SR> The all-in-one solution is cute, but presumably not supported by
SR> validating software or other standards?

I've thought about it a bit more, and I think the all-in-one solution
is more than cute; it has a lot to say for it.

A reminder as to how it would work:

One attribute, syntax of value determines what kind of date format: 

  no prefix   = entire value is a simple format (i.e. W3C) date
  prefix 'i-' = remainder of value is an ISO 8601 date
  prefix 'x-' = remainder of value is a user-defined format date

Thus:

  <date value="2007-02-14"/>
  <date value="i-2007-W07-3"/>
  <date value="x-1385-11-25"/>   <!-- Persian calendar -->
  <date value="x-hebrew-5767-11-27"/>
  <date value="x-islamic-1428-01-27"/> 
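Dispatching on the prefix is straightforward. The Python sketch below assumes only the three cases listed above; `classify`, `iso_week_date`, and `DateValue` are hypothetical names, and the only ISO 8601 form it actually decodes is the week date. The split is unambiguous because W3C (XML Schema) date and time values begin with a digit or a hyphen, and durations with "P" or "-P", so none can be mistaken for an "i-" or "x-" prefix.

```python
import re
from dataclasses import dataclass
from datetime import date

# Prefixes from the proposal; a value with no prefix is a W3C date.
PREFIXES = {"i-": "iso", "x-": "user"}

# ISO 8601 week date, e.g. "2007-W07-3" (year, week, day of week).
WEEK_DATE = re.compile(r"(\d{4})-W(\d{2})-([1-7])")


@dataclass(frozen=True)
class DateValue:
    kind: str  # "w3c", "iso", or "user"
    body: str  # the attribute value with any prefix removed


def classify(value: str) -> DateValue:
    """Decide which date format an all-in-one value= uses."""
    for prefix, kind in PREFIXES.items():
        if value.startswith(prefix):
            return DateValue(kind, value[len(prefix):])
    return DateValue("w3c", value)


def iso_week_date(body: str) -> date:
    """Decode one ISO 8601 form (the week date); other forms are not handled."""
    match = WEEK_DATE.fullmatch(body)
    if match is None:
        raise ValueError(f"not an ISO week date: {body!r}")
    year, week, day = map(int, match.groups())
    return date.fromisocalendar(year, week, day)  # Python 3.8+


for v in ["2007-02-14", "i-2007-W07-3", "x-hebrew-5767-11-27"]:
    print(v, "->", classify(v))
```

Here `i-2007-W07-3` decodes to 14 February 2007, the same day as the first, plain example. What to do with the body of an "x-" value is left to whoever defines the user format.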

Yes, this system is not supported by validating software or other
standards (although it bears a nice resemblance to RFC 3066 tags, the
two are not related in any way except that TEI might use them as
attribute values), but I think that's a red herring. None of the other
solutions are supported in any way, either.

None of the user-created formats are ever going to have standard
software suite support, so I don't think we should fret much over
that. So the question is, do we think that someone is going to
develop a datatype library for ISO 8601? Because if the answer is
"yes", then we should probably use one of the more complicated
solutions in anticipation that someday we could make use of that
datatype. If the answer is "no", then value="i-2007-02-10T17:58"
seems to be just as useful as value-iso="2007-02-10T17:58", and it
makes the whole system a lot easier on the user, and not particularly
difficult on software.

As to whether or not someone is likely to develop an ISO 8601 or other
useful datatype library, I don't know. I know there has been some
discussion about datatype libraries in general, and that Jenni
Tennison has gone so far as to prototype a declarative language for
defining them[1], but except for Elliotte Rusty Harold's
prime-number-checking example[2], I don't know of any actual datatype
libraries that have been built in the more than five years that Relax NG has been
around. Does anyone else?

So, IMHO, we are down to either class-level or all-in-one. 

SR> I think we should basically go with the class solution, and accept
SR> that elements will gain extra attributes of the form value.XXX
SR> 
SR> We implement it as follows:

So, if I understand you correctly, you are suggesting something like
the following (which is different, because I've accommodated renaming
and the need for separate classes for value= and dur=):

When "ISOdate" and "UserDate" modules are loaded:

att.dateTime.w3c = @value
att.duration.w3c = @dur
att.datable.w3c  = @notBefore, @notAfter, @from, @to
att.dateTime.iso = @iso-value
att.duration.iso = @iso-dur
att.datable.iso  = @iso-notBefore, @iso-notAfter, @iso-from, @iso-to
att.dateTime.usr = @usr-value
att.duration.usr = @usr-dur
att.datable.usr  = @usr-notBefore, @usr-notAfter, @usr-from, @usr-to

When those modules are not loaded, all of the .iso and .usr classes
are empty. (BTW, we may well decide that the "UserDate" module, and thus
all the .usr classes, are not needed.)
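The naming scheme in that listing is regular enough to generate mechanically. A rough Python model follows. It assumes each optional module simply prefixes the base attribute names, and that an unloaded module leaves its classes declared but empty. `build_classes` and the module table are illustrative only; this is not how an ODD processor works.

```python
from typing import Dict, Iterable, List

# Base attributes of each class family, in their plain (W3C) form.
BASE: Dict[str, List[str]] = {
    "att.dateTime": ["value"],
    "att.duration": ["dur"],
    "att.datable": ["notBefore", "notAfter", "from", "to"],
}

# Optional modules: module name -> (class suffix, attribute-name prefix).
OPTIONAL = {"ISOdate": ("iso", "iso-"), "UserDate": ("usr", "usr-")}


def build_classes(loaded_modules: Iterable[str]) -> Dict[str, List[str]]:
    """Map each class name to the attributes it declares.

    The .w3c classes are always populated; the .iso and .usr classes
    exist either way but are empty unless their module is loaded.
    """
    loaded = set(loaded_modules)
    classes: Dict[str, List[str]] = {}
    for base, attrs in BASE.items():
        classes[f"{base}.w3c"] = list(attrs)
        for module, (suffix, prefix) in OPTIONAL.items():
            classes[f"{base}.{suffix}"] = (
                [prefix + a for a in attrs] if module in loaded else []
            )
    return classes
```

With both modules loaded, `att.datable.iso` declares `iso-notBefore`, `iso-notAfter`, `iso-from`, and `iso-to`. With neither loaded, every .iso and .usr class is present but empty.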

att.dateTime is a member of all three att.dateTime.[iso,usr,w3c]
att.duration is a member of all three att.duration.[iso,usr,w3c]
att.datable is a member of all three att.datable.[iso,usr,w3c]

By default, the various elements are members of the superclass. I.e.,
<date> and <time> are members of att.dateTime and att.duration. If a
user really does not want the plain W3C flavor attributes, she can
delete att.dateTime.w3c and att.duration.w3c.

One thought on this last bit: why doesn't she just make <time> and
<date> direct members of att.dateTime.iso and att.duration.iso? My gut
instinct is that this provides more flexibility: I can have <date> be
a member of att.dateTime.w3c, <time> be a member of att.dateTime.iso,
and my new <period> a member of att.dateTime, thus getting both
attributes.
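The inheritance at work can be modelled in a few lines: anything that is a member of a class picks up all of that class's attributes, transitively. This Python sketch assumes the ISOdate module is loaded, leaves out the .usr and duration classes, and encodes only the <date>/<time>/<period> example just given. `attributes_of` and the tables are hypothetical illustrations, not TEI's actual class machinery.

```python
from typing import Dict, List

# Attributes declared directly by each class (ISOdate module loaded).
DECLARES: Dict[str, List[str]] = {
    "att.dateTime.w3c": ["value"],
    "att.dateTime.iso": ["iso-value"],
}

# "X is a member of Y" means X inherits every attribute Y provides.
MEMBER_OF: Dict[str, List[str]] = {
    "att.dateTime": ["att.dateTime.w3c", "att.dateTime.iso"],
    # Element memberships from the example in the text.
    "date": ["att.dateTime.w3c"],
    "time": ["att.dateTime.iso"],
    "period": ["att.dateTime"],
}


def attributes_of(name: str) -> List[str]:
    """Collect attributes declared by `name` and by every class it joins."""
    result: List[str] = []
    for attr in DECLARES.get(name, []):
        if attr not in result:
            result.append(attr)
    for cls in MEMBER_OF.get(name, []):
        for attr in attributes_of(cls):
            if attr not in result:
                result.append(attr)
    return result


for element in ("date", "time", "period"):
    print(element, attributes_of(element))
```

<date> ends up with only value=, <time> with only iso-value=, and <period> with both. Deleting att.dateTime.w3c from att.dateTime's memberships would leave <period> with iso-value= alone.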

SR>  * people who want to define att.datable.Etruscan as a new class simply
SR>    do so, and change att.datable to be member of that new class
SR>  * if desired, individual elements can be made members of 
SR>    att.datable.Etruscan

Right, which is why the "UserDate" module and various .usr classes may
not be needed.

SR>  * people can redefine att.datable.w3c to use ISO if they want
SR>    to freeze in hell forever and ever

Not sure they'd freeze in hell, but I hope we can craft a definition
of TEI conformance that leaves them out in the cold of non-
conformance.

SR> I believe that the complexity of adding an extra set of attributes
SR> is worth it for the simplicity and extensibility.

Can you elaborate on how this is more simple or more extensible than
the "all-in-one" solution? I think "all-in-one" is simpler for the end
user, and not particularly more complicated for the programmer. And
both seem equally extensible. However, I think the class solution is
cleaner, and permits more flexibility. E.g., a user can choose to have
element <a> have only a plain ("simple date format", aka W3C) value=
attribute, element <b> to have only a "complex date format" (aka ISO)
iso-value= attribute, and element <c> to have both. With the
"all-in-one" the user has no such flexibility: any <a>, <b>, or <c>
has one simple value= attribute, the value of which may be a simple or
complex date format.

SR> The model I am following here is that of att.global, which picks
SR> up extra meaning if you load new modules

Right, makes sense.

SR> If you buy the "module" route, it means that we could add
SR> att.datable.iso at 1.1 if we wanted.

Indeed. Although personally, I'd be inclined to add the .iso class at
1.0, and then work on a datatype to actually constrain it well for
1.1. 

Notes
-----
[1] http://www.jenitennison.com/datatypes/
[2] http://www-128.ibm.com/developerworks/xml/library/x-custyp/