[tei-council] xml-colon-thing

Thu Nov 11 21:06:46 EST 2004

> * ID/IDREF would stay exactly the same

Yes, it would, but that doesn't change my point. Software being
written to process P5 files as they are currently envisioned would
not have to know how to process ID/IDREF. If the "id= or xml:id="
proposal is implemented, such software would have to know how to do
this. Furthermore, software would have to be able to process
XPointers pointing to id=, XPointers pointing to xml:id=, IDREFs
pointing to id=, and IDREFs pointing to xml:id=. (The last of these
may present a bit of a problem when there's no DTD available.)
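
To make the burden concrete, here is a minimal sketch (Python, standard-library ElementTree) of the identifier index a processor would need under the "id= OR xml:id=" proposal. The helper name and the <ptr target=...> usage are illustrative assumptions. The sketch ignores XPointer and DTD-declared IDREF typing entirely; it only shows that every lookup has to consult two attributes instead of one.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:id under its expanded XML-namespace name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def build_id_index(root):
    """Hypothetical helper: map identifier values to elements, accepting
    either xml:id= or plain id= (both must be checked on every element)."""
    index = {}
    for elem in root.iter():
        for attr in (XML_ID, "id"):
            value = elem.get(attr)
            if value is not None:
                if value in index:
                    raise ValueError(f"duplicate identifier: {value}")
                index[value] = elem
    return index

# One paragraph identified with xml:id=, the other with id=.
doc = ET.fromstring(
    '<text><p xml:id="p1">One</p><p id="p2">Two</p>'
    '<ptr target="p1"/><ptr target="p2"/></text>'
)
index = build_id_index(doc)
for ptr in doc.iter("ptr"):
    print(ptr.get("target"), "->", index[ptr.get("target")].text)
```

With xml:id= alone, the inner loop would collapse to a single attribute lookup.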

> * xml:id doesn't supply an xpointer so why would processors need to
> handle one?

Because that's how you point at one. Yes, I know xml:id= merely
supplies an ID and thus can be pointed at by IDREF, but you weren't
suggesting that, were you?

> * @lang doesn't point to <lang> but to <language> and would
> continue to do so

Right, sorry for the slip. But I think you may be missing my point:
you say "would continue to do so" as if that were a good thing,
whereas my point was that it would continue to do so, and that's a
bad thing because software vendors would have to support it alongside
xml:lang=.

> * @xml:lang if supplied can only provide an RFC 3066 code, so
>   doesn't need to point to anything -- my guess is that many people
>   who are fed up with having to supply a <language> element would
>   regard this as a big step forward!

Right. But again, my point is that software reading a P5 file would
need to be prepared to handle both lang= and xml:lang=, even in the
same document. The big step forward (not needing a <language>) is
available to P5 users whether or not we implement the "lang= OR
xml:lang=" proposal.
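
A minimal sketch of what "handle both lang= and xml:lang=, even in the same document" means for a processor, using Python's ElementTree. The language_map argument (local lang= IDREF values resolved to tags via <language>) and the rule that xml:lang= wins if an element carries both are assumptions made for illustration. Both attributes are inherited, so the walk carries the current value down to descendants.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def effective_languages(root, language_map):
    """Yield (element, tag) pairs, honouring both xml:lang= and lang=.

    language_map is a hypothetical lookup from lang= IDREF values
    (ids of <language> elements) to language tags.
    """
    def walk(elem, inherited):
        if elem.get(XML_LANG) is not None:
            current = elem.get(XML_LANG)              # already a language tag
        elif elem.get("lang") is not None:
            current = language_map[elem.get("lang")]  # IDREF: resolve via <language>
        else:
            current = inherited                       # inherit from the parent
        yield elem, current
        for child in elem:
            yield from walk(child, current)
    yield from walk(root, None)

doc = ET.fromstring(
    '<text xml:lang="en"><p>Hello</p>'
    '<p lang="ger">Guten Tag</p><p xml:lang="fr">Bonjour</p></text>'
)
for elem, tag in effective_languages(doc, {"ger": "de"}):
    if elem.tag == "p":
        print(tag, elem.text)
```

Only the lang= branch needs the <language> table at all; a document using xml:lang= alone would never reach it.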

> depends on what you think the right thing is. I think that handling
> id/idref is all that's involved, and there are hundreds of
> processors that do that

I don't understand. First, what software other than validators (which
don't actually *do* anything, really) and XSLT style-sheets (which, if
I understand correctly, will be quite easy to convert) handles the
ID/IDREF link between lang= and <language>? In my mind, the right
thing is going to be processing the content in a manner that depends
on its language: e.g., speaking German passages in German, using
different coloring in printed output, or looking up a word in a
language-specific dictionary when it's clicked.
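
The dictionary case might look like the sketch below: processing keys off the language tag itself, which xml:lang= supplies directly. The DICTIONARIES table and the function names are hypothetical, and the fallback from "de-AT" to "de" by dropping trailing subtags is one reasonable policy, not something the TEI specifies.

```python
# Hypothetical per-language dictionaries; real ones would be loaded from files.
DICTIONARIES = {
    "de": {"Haus": "house"},
    "fr": {"maison": "house"},
}

def pick_dictionary(tag):
    """Return the dictionary for a language tag, falling back from
    'de-AT' to 'de' by dropping trailing subtags; None if nothing fits."""
    subtags = tag.lower().split("-")   # tags are case-insensitive
    while subtags:
        candidate = "-".join(subtags)
        if candidate in DICTIONARIES:
            return DICTIONARIES[candidate]
        subtags.pop()
    return None

def look_up(word, tag):
    dictionary = pick_dictionary(tag)
    return None if dictionary is None else dictionary.get(word)

print(look_up("Haus", "de-AT"))
print(look_up("maison", "fr"))
print(look_up("casa", "it"))
```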

> > For xml:id=, I can't imagine that there won't be software to
> > perform instance conversion.
> Data conversion is always possible, of course. The aim here is to
> reduce the number of times it is *essential*.

I suspect you've misspoken here, in that this aim doesn't support
your position -- data conversion is essential in 100% of cases,
whether or not we go with the "id= OR xml:id=" proposal.

Perhaps what you meant was that the aim here is to reduce the number
of things that have to be changed or the number of steps in the
conversion process?

> > However, if this "choice of lang= or xml:lang= but not both"
> > proposal is not adopted, then those converting legacy data will
> > have to go about creating a look-up table that will convert their
> > local lang= values (which are by definition arbitrary IDREFs)
> > into xml:lang= values (which will be by definition RFC 3066 or
> > its successor language tags).
> No they won't. My proposal is exactly that they don't have to
> convert their instance data at all.

Right. The passage quoted above starts with "... is *not* adopted
..." (emphasis added). That said, 100% of all P4 instances will
have to be converted. None will be valid P5 without some conversion.
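
For the case the quoted passage describes, the look-up-table conversion might be sketched as follows. This assumes a P4-style header in which each <language> element has an id= and an attribute carrying its ISO 639 code; that attribute is named iso639= here purely as an assumption, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def build_lang_table(root):
    """Map each <language> id= to its code (hypothetical iso639= attribute)."""
    table = {}
    for language in root.iter("language"):
        ident, code = language.get("id"), language.get("iso639")
        if code is None:
            raise ValueError(f"<language id={ident!r}> has no iso639= code")
        table[ident] = code
    return table

def convert_lang_to_xml_lang(root):
    """Replace every lang= IDREF with the equivalent xml:lang= tag, in place."""
    table = build_lang_table(root)
    for elem in root.iter():
        if "lang" in elem.attrib:
            elem.set(XML_LANG, table[elem.attrib.pop("lang")])
    return root

doc = ET.fromstring(
    '<TEI.2><teiHeader><profileDesc><langUsage>'
    '<language id="eng" iso639="en">English</language>'
    '<language id="ger" iso639="de">German</language>'
    '</langUsage></profileDesc></teiHeader>'
    '<text lang="eng"><p>Hello</p><p lang="ger">Guten Tag</p></text></TEI.2>'
)
convert_lang_to_xml_lang(doc)
for elem in doc.iter():
    if XML_LANG in elem.attrib:
        print(elem.tag, elem.get(XML_LANG))
```

The error branch is where the hard cases surface: any <language> entry with no standard code has to be resolved by hand.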

> > The main problem I foresee pertains to people who have stretched
> > the meaning of "language" a wee bit, and have things like
> >   <eg lang="xml">
> > and
> >   <formula lang="TeX">
> > in their files.
> 
> This problem will raise its head in the <language> element
> indicated (of which there will, by definition, be only one
> instance). That has an ISO639 attribute which should give the
> intended correspondence, of course.

There is no ISO639 code for either XML or TeX. But now that I think
about it, if such people are willing to stretch the meaning of
"language", they probably won't mind (ab)using the private use tags
("x-") of RFC 3066, and won't have much of a problem.
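
A small sketch of that private-use route. The regular expression follows the RFC 3066 tag syntax (a primary subtag of 1 to 8 letters, then hyphen-separated subtags of 1 to 8 letters or digits) and checks syntax only, not registration. Note that a bare "xml" is syntactically well formed, so a syntax check alone won't flag it; the "x-" prefix is what marks it as private use. The PRIVATE_USE table is a hypothetical example of what a converter might adopt.

```python
import re

# RFC 3066 syntax only: primary subtag 1-8 letters, further subtags
# 1-8 letters or digits, separated by "-". Registration is not checked.
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

def is_rfc3066_syntax(tag):
    return RFC3066_TAG.fullmatch(tag) is not None

def is_private_use(tag):
    # A primary subtag of "x" marks a private-use tag (case-insensitive).
    return tag.lower().startswith("x-")

# Hypothetical mapping for lang= values that were never natural languages.
PRIVATE_USE = {"xml": "x-xml", "TeX": "x-tex"}

for old, new in PRIVATE_USE.items():
    assert is_rfc3066_syntax(new) and is_private_use(new), new
    print(f'lang="{old}" -> xml:lang="{new}"')
```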