[tei-council] xml-colon-thing

Lou Burnard lou.burnard at computing-services.oxford.ac.uk
Thu Nov 11 19:24:45 EST 2004


Syd Bauman wrote:

> 
> This seems like it could be quite problematic for anyone trying to
> write software that could do something intelligent with unmodified
> TEI P5 files. With this proposal authors of such software would need
> to be able to process 
> * ID/IDREF using id=, and
> * XPointers using xml:id=, and
> * lang= pointing to <lang> via ID/IDREF, and
> * xml:lang= being an RFC 3066 (or successor) language tag &
>   potentially pointing to <lang> via ident=

Taking these in turn:

* ID/IDREF would stay exactly the same
* xml:id doesn't supply an xpointer so why would processors need to 
handle one?
* @lang doesn't point to <lang> but to <language> and would continue to 
do so
* @xml:lang if supplied can only provide an RFC 3066 code, so doesn't 
need to point to anything -- my guess is that many people who are fed up 
with having to supply a <language> element would regard this as a big 
step forward!
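
To make the four points above concrete, here is a minimal sketch of how a 
processor might handle both attributes side by side. It is an illustration 
only, not part of the proposal: the function name language_of, the sample 
document, and its simplified header are all assumptions, and inheritance of 
language from ancestor elements is deliberately left out.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_of(elem, root):
    """Return (kind, value) for the language declared on one element.

    xml:lang is taken as a language tag and needs no lookup; lang= is
    treated as an IDREF and resolved against <language id="...">.
    """
    tag = elem.get(XML_LANG)
    if tag is not None:
        return ("rfc3066", tag)
    ref = elem.get("lang")
    if ref is not None:
        for language in root.iter("language"):
            if language.get("id") == ref:
                return ("declared", (language.text or "").strip())
        return ("unresolved", ref)
    return ("none", None)


# Hypothetical, heavily simplified document used only for this sketch.
SAMPLE = """<TEI>
  <teiHeader><langUsage>
    <language id="fr">French</language>
  </langUsage></teiHeader>
  <text>
    <p lang="fr">Bonjour</p>
    <p xml:lang="en-GB">Hello</p>
  </text>
</TEI>"""

root = ET.fromstring(SAMPLE)
results = [language_of(p, root) for p in root.iter("p")]
```

The only extra work for xml:lang is reading a differently named attribute; 
the ID/IDREF branch is the same lookup processors already perform.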

> 
> I'm not entirely sure why. It's not as if there are dozens of
> processing systems out there that can do the right thing with lang=,
> but won't be able to handle xml:lang=. Same for id=, really, with the
> obvious exception of validators.

Depends what you think the right thing is. I think that handling 
id/idref is all that's involved, and there are hundreds of processors 
that do that.

> My first reaction was "no it won't", but that's not true. It could,
> at least for the case of xml:lang=.

Good. Your second reaction is an improvement!

> 
> For xml:id=, I can't imagine that there won't be software to perform
> instance conversion. 

Data conversion is always possible, of course. The aim here is to reduce 
the number of times it is *essential*.


> However, if this "choice of lang= or xml:lang= but not both" proposal
> is not adopted, then those converting legacy data will have to go
> about creating a look-up table that will convert their local lang=
> values (which are by definition arbitrary IDREFs) into xml:lang=
> values (which will be by definition RFC 3066 or its successor
> language tags).

No they won't. My proposal is exactly that they don't have to convert 
their instance data at all.
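
For comparison, the look-up-table conversion described in the quoted 
paragraph might be sketched as follows. This is the step the proposal is 
meant to make optional, shown only to indicate its size. The table 
LOCAL_TO_RFC3066, the function convert_lang, and the sample values are 
hypothetical; a real project would have to supply its own mapping.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical project-specific table: local lang= values -> RFC 3066 tags.
LOCAL_TO_RFC3066 = {"FRA": "fr", "ENG": "en", "LAT": "la"}


def convert_lang(root, table):
    """Rewrite lang= as xml:lang= wherever the table has an entry.

    Values with no entry are left untouched and returned, so that a
    human can decide what to do with them.
    """
    unmapped = []
    for elem in root.iter():
        local = elem.get("lang")
        if local is None:
            continue
        if local in table:
            elem.set(XML_LANG, table[local])
            del elem.attrib["lang"]
        else:
            unmapped.append(local)
    return unmapped


doc = ET.fromstring(
    '<text><p lang="FRA">Bonjour</p><eg lang="xml">&lt;p/&gt;</eg></text>'
)
unmapped = convert_lang(doc, LOCAL_TO_RFC3066)
```

Anything the table cannot map stays as lang= and is reported back rather 
than guessed at.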


> Of course, many projects actually deliberately choose to use RFC 3066
> language tags as the value of id= of <language> (and thus of lang=)
> already, and thus wouldn't have to do anything special.

Quite.


> The main problem I foresee pertains to people who have stretched the
> meaning of "language" a wee bit, and have things like
>   <eg lang="xml">
> and
>   <formula lang="TeX">
> in their files.

This problem will raise its head in the <language> element indicated (of 
which there will, by definition, be only one instance). That has an 
ISO639 attribute which should give the intended correspondence, of course.

>> 
>>Not least for us!
> 
> 
> Who's us? While the TEI-C certainly has lots of data lying around in
> P4 format, is there a strong reason to migrate it all to P5? It
> appears we have a history of leaving things in older formats -- there
> are still files on the website in Waterloo GML and LaTeX. (Although
> to somebody's credit (probably Lou's), plain-text equivalents are
> usually provided as well.)
> 

I was thinking more of our current tool chain.
