[tei-council] ODD processing and renaming

Sat Apr 7 09:03:25 EDT 2007

Perhaps because I'm used to the current system, or perhaps because
it's still morning here, I don't think I understand what the
advantage of a useMe="yes" on <altIdent> would be.

Current encoding lets an ODD processor: 

* use *Spec/@ident, ignoring <altIdent>s

* use altIdent[xml:lang="xx"] if present, otherwise altIdent[xml:lang="en"]
  if present, otherwise *Spec/@ident (what I would call the
  "expected" behavior, but as I said, perhaps just because that's
  what I think ours does)

* use altIdent[xml:lang="xx"] if present, *Spec/@ident
  otherwise (ignoring altIdent[xml:lang="en"])

* arrange the possible languages in a desired order (likely to be
  user-specified, of course), i.e.:
          use altIdent[xml:lang="x1"] if present,
  if not, use altIdent[xml:lang="x2"] if present,
  if not, use altIdent[xml:lang="x3"] if present,
  if not, use altIdent[xml:lang="x4"] if present,
  if not, use altIdent[xml:lang="en"] if present,
  if not, use *Spec/@ident.
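
For what it's worth, all four behaviors listed above reduce to one
lookup with a different preference list. Here is a minimal sketch in
Python, not a description of what any real ODD processor does. The
function name choose_name, the sample <elementSpec>, and the names
"division" and "abschnitt" are made up for illustration, and the
sample leaves out the TEI namespace to keep the lookup short.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def choose_name(spec, preferred_langs=()):
    """Return the altIdent for the first preferred language that has one,
    falling back to the spec's @ident (hypothetical helper)."""
    by_lang = {}
    for alt in spec.findall("altIdent"):
        lang = alt.get(XML_LANG)
        # Keep only the first altIdent seen for each language tag.
        if lang is not None and lang not in by_lang:
            by_lang[lang] = (alt.text or "").strip()
    for lang in preferred_langs:
        if lang in by_lang:
            return by_lang[lang]
    return spec.get("ident")

# Made-up sample; a real ODD would be in the TEI namespace.
SAMPLE = """<elementSpec ident="div">
  <altIdent xml:lang="en">division</altIdent>
  <altIdent xml:lang="de">abschnitt</altIdent>
</elementSpec>"""
spec = ET.fromstring(SAMPLE)

name_ident_only = choose_name(spec)                      # ignore <altIdent>s
name_fr_then_en = choose_name(spec, ["fr", "en"])        # xx, then en, then @ident
name_fr_only = choose_name(spec, ["fr"])                 # xx, then @ident
name_ordered = choose_name(spec, ["fr", "de", "en"])     # user-specified order
```

An empty list gives the first behavior, ["xx", "en"] the second,
["xx"] the third, and any longer list the fourth, so the processor
only needs one code path and a user-supplied ordering.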

> Is overloading xml:lang in this way preferable to having an
> explicit attribute on the <altIdent> which says "use ME" ? (Note
> that the current scheme gives us no way of saying "here's another
> name for this element in some language but don't use it")

I'm having some trouble wrapping my brain around that. Why would we
want to put a name for an element/attribute/class/macro in the ODD
and then tell the ODD processor not to use it?

Maybe what you're objecting to is that there can be only one
identifier per language, except for English, which (because it is the
canonical language) has two possibilities: *Spec/@ident and
altIdent[xml:lang="en"]?

On quick thought, this doesn't bother me too much. Nothing stops a
user from using RFC 3066 tags in a more fine-grained manner: 
  <altIdent xml:lang="en-long">hypertextDivision</altIdent>
  <altIdent xml:lang="en-short">hyperDiv</altIdent>
  <altIdent xml:lang="en-cute">ldb</altIdent>

Notes
-----
* Remember that these names are not *in* a natural language -- they
  are arbitrary identifiers chosen to be easy for native speakers of
  a particular natural language to remember. This, IMHO, is an
  argument for not using xml:lang= here.

* IIRC, the only special restriction RFC 3066 places on the second
  subtag is that it may not be two letters long unless it is a
  country code from ISO 3166 or whatever (in which case it is
  recommended, but not required, that it be in upper case). Also,
  1-letter codes may not be registered with IANA, but I don't think
  that's an issue here :-)