[tei-council] oh no, it's datatypes again

Syd Bauman Syd_Bauman at Brown.edu
Thu Jan 5 11:11:05 EST 2006

> In the upcoming release of TEI P5, the global attributes rend and n
> have a declared datatype of data.words, which in turn maps to
> list{data.word}, with data.word being defined as
>    xsd:token { pattern = "(\p{L}|\p{N}|\p{P}|\p{S})+" }
> The effect of this definition seems to be that dots are not
> permitted, which seems more than a little strange, particularly for
> attributes like n, which may quite reasonably be expected to have
> values such as "1.3.4"

I suspect that the dots are in fact permitted, and that this is just a
bug in nxml-mode. Period (U+002E) is a member of Unicode class "Po"
(punctuation, other), and therefore should be matched by the "\p{P}"
part of the pattern above. A quick test shows that jing, rnv, and
xmllint all think that period ("."), comma (","), and semicolon (";")
match the pattern "\p{P}". nxml-mode thinks they do not, however.
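
For anyone who wants to check the Unicode classes independently of a
validator, here is a minimal sketch in Python. It is not the test I ran
with jing, rnv, and xmllint. Python's standard "re" module does not
support "\p{...}" escapes, so the sketch looks up each character's
general category with unicodedata instead. The function name
matches_data_word is made up for illustration, and the sketch ignores
the whitespace normalization that xsd:token performs before the pattern
is applied.

```python
import unicodedata

# Major Unicode classes allowed by the data.word pattern
# (\p{L}|\p{N}|\p{P}|\p{S})+ : letters, numbers, punctuation, symbols.
ALLOWED_MAJOR_CLASSES = {"L", "N", "P", "S"}


def matches_data_word(token: str) -> bool:
    """Approximate the data.word pattern (hypothetical helper).

    True if the token is non-empty and every character's general
    category starts with L, N, P, or S.
    """
    return len(token) > 0 and all(
        unicodedata.category(ch)[0] in ALLOWED_MAJOR_CLASSES for ch in token
    )


# Print the general category of the three characters in question.
for ch in ".,;":
    print(repr(ch), unicodedata.category(ch))

# A typical value for the n attribute.
print(matches_data_word("1.3.4"))
```

If the three characters are reported as "Po" and "1.3.4" is accepted,
that agrees with the validators and points at nxml-mode as the odd one
out.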

