[tei-council] ISO 8601 date representations and TEI attributes (fwd)

David Sewell dsewell at virginia.edu
Tue Nov 11 12:59:51 EST 2008


As this came from Syd when everyone was busy preparing for the Members
Meeting, I've waited to forward it until now, when we may have more time
to consider the issues he raises.

Syd has put a lot of work into considering the relation between the
formal ISO 8601 mechanisms for representing dates/times/durations and
the W3C Schema subset that is our primary datatype for representing them
in the Guidelines. One question we probably need to answer before
proceeding any further is: how high a priority is it to sort out
(simplify, regularize, extend, or whatever) our usage for attributes
that take date/time information?  Genuinely understanding ISO 8601
syntax and semantics is challenging and time-consuming. Syd is our
resident expert on the topic (to my knowledge, unless others in the TEI
community have similar expertise), and he is willing to put more time
into this, but is it an optimal use of his time? (Or of mine, to the
extent that I work with him.)

David

---------- Forwarded message ----------
Date: Sat, 1 Nov 2008 16:09:50 -0400
From: Syd Bauman <Syd_Bauman at Brown.edu>
To: David Sewell <dsewell at virginia.edu>
Subject: Re: ISO 8601 date representations and TEI attributes

[David, could you forward this to the Council? Thanks.]

A somewhat caricatured summary of the conversation so far is "do we
really need all those *-iso= attributes?" and "maybe, but in the
meantime we should do a better job of constraining their values".

See SF FR 2055864.[1]

To help answer the first question, I have created a table of the
features of ISO 8601:2004 and, for most, recorded whether or not TEI
P5 could represent the information, if not the exact syntax, without
the *-iso= attributes.

To help us along the path to the second, I have written a program
that generates a RELAX NG fragment for checking ISO 8601 temporal
expressions.

Both of these have been uploaded to the SF FR as attached files.

I have also made them available on the web temporarily[2], along with
the output of the pattern writing program and a small test schema
that uses it[3].

Note that the patterns generated by the pattern writer are not yet
ready to be included in the Guidelines, for several reasons:
* not sufficiently tested
* some patterns may be *very* slow with some cases of input
* excludes recurring intervals because I don't understand them (not
  really a good reason, as the syntax is clear, even if the semantics
  aren't :-)
* further profiling of 8601 is in order, IMHO

On that last issue, we haven't discussed what profiling of 8601 we
should do. Some cases are obvious, e.g. that we should require the
extended format wherever there is one. ("The basic format should be
avoided in plain text." -- ISO 8601:2004 2.3.3.) Some are pretty
clear, if not outright obvious (e.g., that we should avoid the
representation of midnight with "24:00:00").
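
To make those two restrictions concrete, here is a minimal sketch in
Python. It is an illustration only, not the uploaded pattern writer or
its output, and the names EXTENDED_DATE, EXTENDED_TIME and in_profile
are hypothetical. It assumes a profile covering only calendar dates
with an optional time of day, and it leaves out time zone designators,
leap seconds, week and ordinal dates, durations and intervals.

```python
import re

# Hypothetical profile: extended format only (hyphens and colons required).
# Note: day-of-month is not checked against the month, so "2008-02-31" passes.
EXTENDED_DATE = r"[0-9]{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12][0-9]|3[01])"

# Hours run 00-23, so the "24:00:00" form of midnight is rejected.
EXTENDED_TIME = r"(?:[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]"

PROFILE = re.compile(rf"{EXTENDED_DATE}(?:T{EXTENDED_TIME})?")


def in_profile(value: str) -> bool:
    """Return True if value is a date or date-time in the sketched profile."""
    return PROFILE.fullmatch(value) is not None
```

Under these assumptions, "2008-11-01T16:09:50" is accepted, while the
basic-format "20081101" and the end-of-day "2008-11-01T24:00:00" are
both rejected; midnight would be written "2008-11-02T00:00:00".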

But I think there is at least one more case that ISO 8601 permits
that we may wish to consider disallowing: representing a year with
more than 4 digits. This capability requires that the parties involved
negotiate and agree on how many digits the year representation will
be expanded by, lest the result be ambiguous. Now I'm the biggest
nay-sayer against the idea of limiting TEI or misrepresenting text in
order to help achieve blind interchange. But in this case, it seems
hard to come up with use-cases of humanities texts where specifying a
year both requires more than 4 digits (and an optional sign) and
wouldn't be better served by the period= attribute.
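
A check for that restriction could be as small as the following Python
sketch. It assumes one possible reading of the proposal, namely that a
year is exactly four digits with an optional leading sign; the names
YEAR and year_in_profile are hypothetical and do not come from the
pattern writer.

```python
import re

# Assumption: exactly four digits, optionally preceded by "+" or "-".
# Longer (expanded) year representations are rejected outright, so no
# prior agreement on the number of extra digits is ever needed.
YEAR = re.compile(r"[+-]?[0-9]{4}")


def year_in_profile(value: str) -> bool:
    """Return True if value is a year allowed by the sketched profile."""
    return YEAR.fullmatch(value) is not None
```

With this reading, "2008" and "-0044" pass, while "12008" and "+12008"
do not.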

Notes
-----
[1] https://sourceforge.net/tracker2/?func=detail&aid=2055864&group_id=106328&atid=644065
[2] http://bauman.zapto.org/~syd/temp/8601_feature_table.xhtml and
    http://bauman.zapto.org/~syd/temp/8601_pattern_writer.perl
[3] http://bauman.zapto.org/~syd/temp/iso8601pattern.rng and
    http://bauman.zapto.org/~syd/temp/iso8601test.rng


