[tei-council] datatypes -- syd's comments

Tue Sep 20 18:32:25 EDT 2005

Lou wrote:
> ... tei.data.numeric maps to xsd:decimal which does support
> floating point numbers.

Right, you have instantiated tei.data.numeric as xsd:decimal, quite
in opposition to the recommendation, which was to use xsd:long |
xsd:double. I don't recall anyone presenting any argument as to why
xsd:decimal should be preferred. Not only does it not support
scientific notation, but I believe it is also harder for vendors to
implement. Did I miss something? (Always a possibility :-)

Arguments favoring xsd:decimal over xsd:double could be made (e.g.,
based on the approximation required of xsd:double), and I'd be happy
to entertain them. But as of now, I don't see why you made this
change.
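
To make the difference concrete, here is a minimal Python sketch of
which lexical forms each choice would accept. The regular expressions
are simplified assumptions written for this illustration, not the
normative XML Schema grammar; the helper names (is_decimal, is_long,
is_double, is_long_or_double) are hypothetical; and whitespace is
assumed to have been normalized already.

```python
import re

# Simplified lexical patterns (assumptions for illustration only,
# not the normative XML Schema definitions).
DECIMAL_RE = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")
DOUBLE_RE = re.compile(
    r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?|-?INF|NaN"
)
LONG_RE = re.compile(r"[+-]?[0-9]+")

# Range of a signed 64-bit integer, as quoted for xsd:long.
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1


def is_decimal(s: str) -> bool:
    """True if s looks like an xsd:decimal: digits, optional sign and dot."""
    return DECIMAL_RE.fullmatch(s) is not None


def is_long(s: str) -> bool:
    """True if s is an integer literal that fits in 64 bits."""
    return LONG_RE.fullmatch(s) is not None and LONG_MIN <= int(s) <= LONG_MAX


def is_double(s: str) -> bool:
    """True if s looks like an xsd:double, including exponents, INF, NaN."""
    return DOUBLE_RE.fullmatch(s) is not None


def is_long_or_double(s: str) -> bool:
    """The recommended alternative: xsd:long | xsd:double."""
    return is_long(s) or is_double(s)


if __name__ == "__main__":
    for sample in ["3.14", "6.02E23", "9223372036854775807", "NaN"]:
        print(sample, is_decimal(sample), is_long_or_double(sample))
```

Under these assumptions both choices accept "3.14", but only
xsd:long | xsd:double accepts "6.02E23" or "NaN".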

---------

For those still wrapping their minds around these datatypes, I've
copied the following explanations from Eric van der Vlist's book [1].

xsd:decimal --
    This datatype represents decimal numbers. The number of digits
    can be arbitrarily long (the datatype doesn't impose any
    restrictions), ... Leading and
    trailing zeros aren't considered significant and may be trimmed.
    The decimal separator is always a dot (.), and a leading sign (+
    or -) may be used, but any characters other than the 10 digits
    zero through nine are forbidden, including whitespace inside the
    value. ...

xsd:long --
    Contains an integer between -9223372036854775808 and
    9223372036854775807; i.e., the values that can be stored in a
    64-bit word.

xsd:double --
    ... represents IEEE ... double (64 bits) precision
    floating-point types. These store the values in the form of a
    mantissa and an exponent of a power of 2 (m x 2^e), allowing a large
    scale of numbers in a storage that has a fixed length.
    Fortunately, the lexical space doesn't require powers of 2 (in
    fact, it doesn't accept powers of 2), but instead uses a
    traditional scientific notation based on integer powers of 10.
    Because the value spaces (powers of 2) don't exactly match the
    values from the lexical space (powers of 10), the recommendation
    specifies that the closest value is taken. The consequence of
    this approximate matching is that float datatypes are the domain
    of approximation; most of the float values can't be considered
    exact and are approximate.
    These datatypes accept several special values: positive zero (0),
    negative zero (-0) (which is less than positive 0 but greater
    than any negative value); infinity (INF), which is greater than
    any value; negative infinity (-INF), which is less than any
    value; and "not a number" (NaN).
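
The approximation described above is easy to see in practice. The
following is only a sketch, assuming that Python's float is an IEEE
double (true on mainstream platforms) and using Python's
decimal.Decimal as a stand-in for an exact decimal type; neither is an
XML Schema implementation.

```python
from decimal import Decimal
import math

# Binary doubles: 0.1 and 0.2 are each stored as the nearest double,
# so their sum is not the double nearest to 0.3.
double_sum = 0.1 + 0.2

# Exact decimal arithmetic on the same lexical values.
decimal_sum = Decimal("0.1") + Decimal("0.2")

# Converting a float to Decimal exposes the exact value actually stored.
stored_tenth = Decimal(0.1)

# Python's float() happens to accept the same spellings for the
# special values as the lexical forms quoted above.
pos_inf = float("INF")
neg_inf = float("-INF")
nan = float("NaN")

print(double_sum == 0.3)              # False
print(decimal_sum == Decimal("0.3"))  # True
print(stored_tenth)                   # not exactly 0.1
print(neg_inf < -1e308 < 1e308 < pos_inf, math.isnan(nan))
```

This is the trade-off between the two types in miniature: the decimal
type keeps values like 0.1 exact, while the double buys scientific
notation and the special values at the cost of approximation.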

Note
----
[1] http://books.xmlschemata.org/relaxng/relax-CHP-8-SECT-1.html