[tei-council] Datatype : roundup

Thu Sep 22 06:35:11 EDT 2005

> I am not sure who the "we" in that sentence is -- possibly the SO
> work group?

Yup, sorry; I should have been more explicit. SO decided on the W3C
regular expression language, and Council approved.

> Well, as one who has done a lot of programming in various
> pattern-matching languages, I think the characterization is not
> VERY misleading. But it hardly matters... I am quite happy for us
> to stick with the W3C regexp language if others agree, for the good
> pragmatic reason given above, provided that we make explicit what
> its shortcomings are.

What shortcomings do you have in mind? (Implicit anchoring, while a
difference, can hardly be called a shortcoming.)
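
To make the anchoring difference concrete, here is a minimal sketch in Python. It only emulates the anchoring behaviour: Python's re module is not the W3C regular expression language, and the pattern is a made-up example.

```python
import re

PATTERN = r"[0-9]+"  # hypothetical pattern, for illustration only

def matches_unanchored(value: str) -> bool:
    # Perl-style behaviour: succeed if the pattern occurs anywhere in the value.
    return re.search(PATTERN, value) is not None

def matches_implicitly_anchored(value: str) -> bool:
    # W3C-style behaviour: the pattern must account for the entire value.
    return re.fullmatch(PATTERN, value) is not None

for sample in ["123", "abc123"]:
    print(sample, matches_unanchored(sample), matches_implicitly_anchored(sample))
```

The same expression accepts "abc123" in the first reading and rejects it in the second, which is a difference in convention rather than in expressive power.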

> Aaargh, so it's not even a clean subset!

It is not a subset at all (at least, not of any language I'm aware
of).

> > As for tei.data.name:
> > * the name is really bad; I'd prefer to live with the confusion of
> >   tei.data.token. (Remember, the string "xsd:token" will only appear
> >   a few times in all of P5; in the declarations of at most half a
> >   dozen datatypes.)
> 
> That's irrelevant to the issue here: we want people to use the TEI
> name and not be confused when they talk to others about it. No-one
> has yet proposed a better name than name.

I disagree; I think *you* proposed a better name than 'name', when
you proposed 'token'. The latter is mildly confusing due to W3C's
misuse; the former is outright misleading. The attributes of this
type (type=, subtype=; to= & from= of <locus>; scope= of <handNote>,
etc.) don't really have names as their values in any sense of the
word.

> > * I think we should probably be more permissive than NMTOKEN.
> 
> We can tweak the definition if you like, but I don't understand why
> you would want to.

Because I don't think we should be limiting users to letters, digits,
dot, hyphen, underscore, and colon for the values of these
attributes, e.g. cref=, extent=, reason=, where= of <move>, real=,
met=, rhyme=. For some, though, the restriction does make sense
(name= of <equiv>, included= of <witness>); perhaps these should be
moved to tei.data.ident?
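
A rough sketch of what the NMTOKEN restriction rules out, in Python. The pattern is an ASCII-only approximation (the real NMTOKEN production also admits non-ASCII letters, combining characters and extenders), and the sample values are invented for illustration.

```python
import re

# ASCII-only approximation of NMTOKEN: letters, digits, dot, hyphen,
# underscore and colon. This is an assumption, not the full XML production.
ASCII_NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")

def is_ascii_nmtoken(value: str) -> bool:
    # The whole value must consist of permitted characters.
    return ASCII_NMTOKEN.fullmatch(value) is not None

# Made-up sample values of the kind one might want in extent= or met=.
for sample in ["iambic", "2 leaves", "-+|-+|"]:
    print(repr(sample), is_ascii_nmtoken(sample))
```

A value containing a space, a plus sign or a vertical bar fails the check, which is the kind of limit being objected to here.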

> (a) difficulties in implementation

Why is it so much harder to implement? I haven't completely worked
through the algorithms for comparison of dateTimes and addition of
duration to dateTime, but neither seems to depend on precision to the
second. Did I miss something?

> (b) confusion caused by lack of timezone information

Good point. We should make timezone information possible on
right-truncated times, of course.
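
One possible lexical shape for such values, sketched in Python. The pattern and the function name are hypothetical, not anything the work group has agreed on; the check is purely lexical and does not validate the range of the timezone offset.

```python
import re

# Hypothetical pattern: hours, then optional minutes, then optional seconds
# (with optional fraction), followed by an optional timezone ("Z" or +hh:mm).
PARTIAL_TIME = re.compile(
    r"([01][0-9]|2[0-3])"                     # hh
    r"(:[0-5][0-9](:[0-5][0-9](\.[0-9]+)?)?)?"  # optional :mm, optional :ss(.fff)
    r"(Z|[+-][0-9]{2}:[0-9]{2})?"             # optional timezone
)

def is_partial_time(value: str) -> bool:
    # The whole value must match; a truncated minute such as "14:3" is rejected.
    return PARTIAL_TIME.fullmatch(value) is not None

for sample in ["14", "14:30Z", "14+01:00", "14:30:05.5-05:00", "14:3"]:
    print(sample, is_partial_time(sample))
```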

> > I really don't see why not permit percentages. Users had the
> > choice in P4, when we couldn't even validate it. Now we can.
> > 
> simplicity, clarity, precision...

While I suppose it is simpler for software writers to have one system
rather than two, there is nothing clearer or more precise about
"0.824" than "82.4%". While I have some sympathy with the idea of
reducing choices for users, this is one place where I think users
like the choice.
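
For what it is worth, the cost to software of accepting both forms is small. A minimal sketch in Python, where the function name is hypothetical and exact decimal arithmetic is assumed so that neither form loses precision:

```python
from decimal import Decimal

def parse_fraction(text: str) -> Decimal:
    """Accept either a plain decimal ("0.824") or a percentage ("82.4%")."""
    text = text.strip()
    if text.endswith("%"):
        # Drop the percent sign and scale down by 100.
        return Decimal(text[:-1]) / Decimal(100)
    return Decimal(text)

print(parse_fraction("0.824"), parse_fraction("82.4%"))
```

Both spellings normalize to the same value, so downstream processing need only ever see one system.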

> I think this is a mistake, actually. Decimal was a better choice,
> since it can represent any number, real or integer, no matter how
> big. It means you can't use scientific notation, which some
> folks on TEI-L suddenly woke up and asked for.

So you think we have more users who really want to represent numbers
with greater than 16 (decimal) digits of precision than users who
want to represent numbers in scientific notation? As I had hoped my
example would demonstrate, that much precision is not something we
humans generally deal with.
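
The trade-off can be sketched in Python, assuming the alternative to xsd:decimal is an IEEE double, which is what Python's float is. Python's Decimal stands in here for arbitrary-precision decimals; it is not a model of the xsd:decimal lexical space.

```python
from decimal import Decimal

# Two integers that differ in the 16th significant digit.
a = "9007199254740992"  # 2**53
b = "9007199254740993"  # 2**53 + 1

# As doubles the two values collapse into one.
print(float(a) == float(b))

# As arbitrary-precision decimals they stay distinct.
print(Decimal(a) == Decimal(b))

# A double, on the other hand, reads scientific notation directly.
print(float("6.02e23"))
```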

> I now think maybe we should have a different datatype for
> [scientific notation].

If we split scientific notation out to a different datatype, won't we
need a disjunction of the two datatypes in most if not all instances
anyway? And the disjunction (whether of two separate TEI datatypes or
of two xsd: datatypes inside a TEI datatype) might be a bit confusing
for implementers. But it shouldn't be impossible to deal with (one
could always just assume that anything not in scientific notation is
an xsd:decimal).
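
A sketch of that fallback rule in Python. The two patterns are simplified assumptions about the lexical forms (no INF or NaN, for instance), and the function name is hypothetical.

```python
import re
from decimal import Decimal

# Simplified lexical forms; both patterns are assumptions for illustration.
MANTISSA = r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)"
DECIMAL = re.compile(MANTISSA)
SCIENTIFIC = re.compile(MANTISSA + r"[eE][+-]?[0-9]+")

def parse_number(text: str):
    # Scientific notation is read as a double; anything else that looks
    # numeric is assumed to be a decimal.
    if SCIENTIFIC.fullmatch(text):
        return float(text)
    if DECIMAL.fullmatch(text):
        return Decimal(text)
    raise ValueError(f"not a recognised number: {text!r}")

for sample in ["3.14", "6.02e23"]:
    print(sample, type(parse_number(sample)).__name__)
```

The implementer has to branch once on the lexical form, which is the confusion mentioned above, but the branch itself is a single test.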

> Credit card numbers, by the way, are tei.data.ident, clearly.

I thought you wanted tei.data.ident to be xsd:Name -- credit card
numbers start with a digit, and thus are invalid xsd:Names. They'd be
fine as tei.data.[name|token], but once again the name "name" doesn't
make sense. (Alternatively, one could insist that they start with a "V"
or "M" or whatever :-)
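
The leading-digit problem is easy to check. A minimal sketch in Python, using an ASCII-only approximation of the XML Name production (the real one admits many non-ASCII characters) and an invented 16-digit number:

```python
import re

# ASCII-only approximation of an XML Name: first character must be a
# letter, underscore or colon; digits, dot and hyphen may only follow.
ASCII_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")

def is_ascii_name(value: str) -> bool:
    return ASCII_NAME.fullmatch(value) is not None

card = "1234567890123456"  # made-up number, for illustration only
print(is_ascii_name(card))        # starts with a digit
print(is_ascii_name("V" + card))  # with a letter prefix
```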