[tei-council] MD chapter revised: namespace rules

Daniel O'Donnell daniel.odonnell at uleth.ca
Fri Apr 13 11:44:07 EDT 2007


On Fri, 2007-04-13 at 10:45 +0100, Lou Burnard wrote:

> Let me try to reformulate the issues here:
> 
> 1. "Conformance" means, fundamentally, conformance to the TEI abstract 
> model. The markup scheme defined by or applied in a TEI conformant 
> document marks up abstract concepts which are also present in the TEI 
> abstract model, and with the same meaning.

This is a definition of identity, not conformance, IMO. For me
conformance means the same as what I've always understood "clean" to
mean: not necessarily identity, but any changes that have been made 
a) can be undone using canonical tools that must be provided with the
document (i.e. <equiv> and the like and ODDs)
b) are made using provided mechanisms for making such changes.

a) is fundamental; b) is slightly less so since the end of SGML days.

> 
> 2. The particular part of the TEI abstract model which a set of documents 
> uses can be expressed in three ways, only two of which can be 
> automatically checked:
> 
> a) the application of the elements is something that a human reader 
> recognizes as plausible i.e. the thing tagged <l> does actually contain 
> a line of verse
> 
> b) particular things are called by particular names i.e. the thing 
> containing a line of verse is called <l> (in the TEI namespace) and not 
> <line>.  Namespaces are a useful way of asserting whether the "l" here 
> is the TEI "<l>" or some other one.
> 
> c) TEI syntactic rules are respected e.g. <l>s appear inside <div>s but 
> not vice versa. Validating this is what schemas are for.

Or you could have renamed <l> <line> and 
i) properly documented this with <equiv> (a local renaming) and/or
ii) built it into an ODD (as in Tite, as proposed in this discussion)
or 
iii) changed it back before shipping it to somebody else (interchange).

All three of these methods are essentially methods of achieving your
states b) and/or c) and are open to automatic checking.

I hasten to note, moreover, that I am saying these should be allowed ONLY
on the condition that they can be reversed on a 1:1 basis and reversed
using supplied information. I.e. if you don't use <equiv>, if you
introduce a conflict with the existing TEI namespace you are using,
and/or if you don't supply the ODD, the document is not conformant.
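
To make the 1:1 condition concrete, here is a minimal sketch of the kind
of automatic check I have in mind. It is an illustration only: the
function name, the shape of the renaming table, and the tiny set of
canonical names are all assumptions made for the example, not anything
defined by the TEI or by the ODD tools.

```python
# Hypothetical subset of canonical TEI element names, for illustration only.
CANONICAL = {"l", "lg", "div", "p"}


def is_reversible(renamings, canonical=CANONICAL):
    """Return True if a renaming table can be undone on a 1:1 basis.

    `renamings` maps a canonical element name to its local name, as it
    might be read out of the supplied ODD (assumed data shape).
    """
    local_names = list(renamings.values())
    # 1:1: no two canonical elements may share the same local name.
    one_to_one = len(set(local_names)) == len(local_names)
    # No conflict: a local name must not reuse a canonical name
    # that belongs to a different element.
    no_conflict = all(
        local not in canonical or local == canon
        for canon, local in renamings.items()
    )
    return one_to_one and no_conflict
```

Under these assumptions, renaming <l> to <line> passes, while renaming
both <l> and <p> to <line>, or renaming <l> to <div>, does not.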

> 
> 3. Modification/customization/personalization is something we do in 
> order to generate and document a schema, so it is about both (a) and 
> (c). A modification which replaces the <desc> of <l> to say that it is 
> for typographic lines is an unclean one, as is one which modifies its 
> content model to permit <div>s within it -- in the same way and for 
> the same reasons. Only "clean"[1] modifications are conformant (this 
> doesn't mean that unclean modifications are evil, just that they are 
> different. Most improvements to the TEI scheme start off life as unclean 
> modifications.)
> 
> 4. We are disagreeing about whether or not a modification which is a 
> simple renaming is unclean, i.e. about whether 2(b) is somehow different 
> from the other kinds of conformance constraint. I can identify the 
> following reasons for this disagreement:
> 
>   i/ We are so used to seeing different names for the same thing that it 
> just doesn't seem important to insist on using a specific name, or to 
> insist that names declare their namespace. (cf the pain of going from 
> case-insensitive to case-sensitive identifiers which some of us old'uns 
> are still suffering from).

Not a great reason, but not a bad one either for projects that have
invested an immense amount of time in legacy training and/or data. And
not by any means a fatal error--it allows more things to be acceptable
P5 without polluting the namespace, because it is automatically undoable
if the document is conformant in the way I've described it.

> 
> ii/ We think many people will find namespace technology a step too far 
> and that this fear will lead them (or us) into all manner of folly

Not a good reason in my view. I agree with whoever it was who pointed to
the parallel with SGMLers' attitude towards the question of throwing
errors in XML.

> 
> iii/ We know how to easily convert document instances which use renaming 
> into ones which don't (whereas conversion of other kinds of uncleanly 
> modified documents is in general problematic or impossible)

This is THE crucial distinction in my view. I still think "conformant
documents are identical to canonical TEI or cleanly and recoverably
modified from it" is the principle we should have.

> 
> iv/ We (or me at least) are not quite sure how to support extra 
> namespaces for renaming modifications with the current tool kit

Not such a good reason in my view.

> 
> v/ Namespaces are just not the same kind of constraint as schemas -- in 
> particular they are open-ended: any element can assert that it belongs 
> to a given namespace and the assertion cannot be validated


Not really an issue.
> 
> 
> Of these I think all but iii are fairly weak. If we take iii seriously 
> though, maybe what it shows is that we do need an extra concept. We have 
> previously spoken about "conformance" and "interchange format" and 
> havered somewhat about the distinction between the two. Maybe what we 
> need instead is a concept of "canonical representation".
> 
> How about this as a compromise:
> 
> A conformant TEI document can take either of two forms
> 
> * local form -- in which ODD-documented renamings are permitted to join 
> the TEI namespace
> * canonical form -- in which ODD-documented renamings must either be 
> converted to their equivalent TEI form, or must be assigned to some 
> other namespace

Basically this looks like "conformant" (in my non-"identical" sense
above) and "identical" (in your "conformant" sense, above).  I think
"local" is a misnomer, however, since that implies that it is for local
use only, whereas your tagging it as "conformant" and the provisos you
add make it clear that it is useful for interchange. If you revise your
definition of local to read "in which ODD-documented renamings THAT DO
NOT CONFLICT WITH CANONICAL FORMS are permitted to join the TEI
namespace" (capitals = <hi> not <shout>), then you have a form that can
be pretty freely circulated as long as the ODD is included--which, given
the big deal we are making of it, seems not unreasonable.

As I read this, what is striking to me is that the difference between
what you call "local" (if you edit it as I suggest) and "canonical"
forms really involves where the processing is done: on my computer or on
yours. This suggests to me that, from an XML point of view, the
difference between them may be more a matter of more-or-less-ness.
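
To show how mechanical the step from one form to the other is, here is
a rough sketch of the conversion in Python. It assumes the renamings
have already been read out of the ODD into a simple table; the function
name and the table are hypothetical, and only the TEI namespace URI is
real.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def to_canonical(xml_text, renamings):
    """Rename local element names back to their canonical TEI names.

    `renamings` maps canonical name -> local name (assumed to come from
    the ODD supplied with the document).
    """
    # Invert the table: local name -> canonical name.
    inverse = {local: canon for canon, local in renamings.items()}
    prefix = "{" + TEI_NS + "}"
    root = ET.fromstring(xml_text)
    for el in root.iter():
        # Only touch elements that claim to be in the TEI namespace.
        if el.tag.startswith(prefix):
            local = el.tag[len(prefix):]
            if local in inverse:
                el.tag = prefix + inverse[local]
    return root


doc = (
    '<div xmlns="http://www.tei-c.org/ns/1.0">'
    "<line>Whan that Aprille</line>"
    "</div>"
)
canonical_root = to_canonical(doc, {"l": "line"})
```

Whether this runs on the sender's machine before shipping or on the
recipient's machine after receipt, the result is the same tree, which
is the point.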

> 
> Probably enough to be going on with, but I would appreciate some 
> indication of where in this diatribe you stopped saying "yeah yeah we 
> know that"

Hopefully the annotations will help ;)

> 
> 
> L
> 
> 
> [1] "clean" as defined in the current draft for MD
> 
> 
> 
-- 
Daniel Paul O'Donnell, PhD
Department Chair and Associate Professor of English
Director, Digital Medievalist Project http://www.digitalmedievalist.org/
Chair, Text Encoding Initiative http://www.tei-c.org/

Department of English
University of Lethbridge
Lethbridge AB T1K 3M4
Vox +1 403 329-2377
Fax +1 403 382-7191
Email: daniel.odonnell at uleth.ca
WWW: http://people.uleth.ca/~daniel.odonnell/



