[tei-council] xml-colon-thing
Lou Burnard
lou.burnard at computing-services.oxford.ac.uk
Thu Nov 11 19:24:45 EST 2004
Syd Bauman wrote:
>
> This seems like it could be quite problematic for anyone trying to
> write software that could do something intelligent with unmodified
> TEI P5 files. With this proposal authors of such software would need
> to be able to process
> * ID/IDREF using id=, and
> * XPointers using xml:id=, and
> * lang= pointing to <lang> via ID/IDREF, and
> * xml:lang= being an RFC 3066 (or successor) language tag &
> potentially pointing to <lang> via ident=
Taking these in turn:
* ID/IDREF would stay exactly the same
* xml:id doesn't supply an XPointer so why would processors need to
handle one?
* @lang doesn't point to <lang> but to <language> and would continue to
do so
* @xml:lang if supplied can only provide an RFC 3066 code, so doesn't
need to point to anything -- my guess is that many people who are fed up
with having to supply a <language> element would regard this as a big
step forward!
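
To make the last two cases concrete, here is a minimal sketch (Python,
standard library only) of how a processor might accept either attribute.
It is an illustration under stated assumptions, not part of any TEI tool:
the sample document, the id value "fra", and the function name
describe_language are all invented for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the fixed XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: one paragraph uses lang= (an IDREF to a <language>
# element), the other uses xml:lang= (a language tag, nothing to look up).
SAMPLE = """<TEI>
  <langUsage>
    <language id="fra">French</language>
  </langUsage>
  <text>
    <p lang="fra">Bonjour</p>
    <p xml:lang="de">Guten Tag</p>
  </text>
</TEI>"""


def describe_language(elem, languages):
    """Return a (kind, value) pair for the language declared on elem."""
    if XML_LANG in elem.attrib:
        # xml:lang carries the language tag itself; no dereferencing needed.
        return ("tag", elem.attrib[XML_LANG])
    if "lang" in elem.attrib:
        # lang is an IDREF: follow it to the matching <language> element.
        target = languages.get(elem.attrib["lang"])
        return ("declared", target.text if target is not None else None)
    return ("none", None)


root = ET.fromstring(SAMPLE)
# Index the <language> elements by id so that IDREFs can be resolved.
languages = {lang.get("id"): lang for lang in root.iter("language")}
results = [describe_language(p, languages) for p in root.iter("p")]
```

The lang= branch is plain ID/IDREF handling, which existing processors
already do; the xml:lang= branch needs no lookup at all.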
>
> I'm not entirely sure why. It's not as if there are dozens of
> processing systems out there that can do the right thing with lang=,
> but won't be able to handle xml:lang=. Same for id=, really, with the
> obvious exception of validators.
Depends what you think the right thing is. I think that handling
ID/IDREF is all that's involved, and there are hundreds of processors
that do that.
> My first reaction was "no it won't", but that's not true. It could,
> at least for the case of xml:lang=.
Good. Your second reaction is an improvement!
>
> For xml:id=, I can't imagine that there won't be software to perform
> instance conversion.
Data conversion is always possible, of course. The aim here is to reduce
the number of times it is *essential*.
> However, if this "choice of lang= or xml:lang= but not both" proposal
> is not adopted, then those converting legacy data will have to go
> about creating a look-up table that will convert their local lang=
> values (which are by definition arbitrary IDREFs) into xml:lang=
> values (which will be by definition RFC 3066 or its successor
> language tags).
No they won't. My proposal is exactly that they don't have to convert
their instance data at all.
> Of course, many projects actually deliberately choose to use RFC 3066
> language tags as the value of id= of <language> (and thus of lang=)
> already, and thus wouldn't have to do anything special.
Quite.
> The main problem I foresee pertains to people who have stretched the
> meaning of "language" a wee bit, and have things like
> <eg lang="xml">
> and
> <formula lang="TeX">
> in their files.
This problem will raise its head in the <language> element indicated (of
which there will, by definition, be only one instance). That has an
ISO639 attribute which should give the intended correspondence, of course.
>>
>>Not least for us!
>
>
> Who's us? While the TEI-C certainly has lots of data lying around in
> P4 format, is there a strong reason to migrate it all to P5? It
> appears we have a history of leaving things in older formats -- there
> are still files on the website in Waterloo GML and LaTeX. (Although
> to somebody's credit (probably Lou's), plain-text equivalents are
> usually provided as well.)
>
I was thinking more of our current tool chain.
>