[tei-council] More thoughts about examples.

Thu Apr 30 06:40:24 EDT 2009

In Lyon we had a discussion about examples and we've been moving forward 
on some of the results of that discussion (thanks especially to 
Sebastian and David).

One of the things that I suggested was "decoupling examples from the 
Guidelines and making them referenceable" (according to the minutes, I'm 
sure I didn't put it so cogently). I just wanted to go on record about 
what I meant by this. The idea was that, as with the element/class/macro 
specifications, the underlying code for the examples should be stored 
separately and then pointed to by the Guidelines in some manner (a 
tei:ptr that is resolved when generating the Guidelines, an XInclude, 
whatever.)
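
To make the pointer idea a little more concrete, here is a minimal 
sketch in Python of what "resolving" might look like at build time. It 
is only an illustration under assumptions of my own: the @type value 
"example", the id "eg-cb-1", and the function name resolve_examples are 
all made up for this sketch and are not part of the current build.

```python
import copy
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
EG = "http://www.tei-c.org/ns/Examples"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical examples corpus: every example carries an xml:id.
CORPUS = f"""<div xmlns="{TEI}">
  <egXML xmlns="{EG}" xml:id="eg-cb-1" xml:lang="en"><cb n="2"/></egXML>
</div>"""

# Hypothetical chapter prose that points at the example instead of
# containing it.
CHAPTER = f"""<div xmlns="{TEI}">
  <p>A column break is marked like this:</p>
  <ptr type="example" target="#eg-cb-1"/>
</div>"""


def resolve_examples(chapter, corpus):
    """Replace each <ptr type="example"> with a copy of its target."""
    by_id = {eg.get(XML_ID): eg for eg in corpus.iter(f"{{{EG}}}egXML")}
    for parent in list(chapter.iter()):
        for i, child in enumerate(list(parent)):
            if child.tag == f"{{{TEI}}}ptr" and child.get("type") == "example":
                target = child.get("target", "").lstrip("#")
                example = copy.deepcopy(by_id[target])  # KeyError if missing
                example.tail = child.tail  # keep surrounding whitespace
                parent[i] = example
    return chapter


resolved = resolve_examples(ET.fromstring(CHAPTER), ET.fromstring(CORPUS))
print(ET.tostring(resolved, encoding="unicode"))
```

A pointer to an id that does not exist raises a KeyError here, which is 
probably what one would want a build to do rather than silently drop an 
example.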

The arguments against this (according to the minutes) were that a) this 
is not our most important task right now and b) decontextualised 
examples don't always make sense.  I cannot help but agree with a), but 
don't think that is a reason not to think about it, just a reason to 
give its implementation a low priority.  I also agree with b), but the 
point is not to decontextualise the examples in the rendered version of 
the Guidelines; it is to give the underlying infrastructure a more 
resilient and flexible structure for re-use and maintenance of the 
examples in an internationalised context.

One of my arguments for decoupling the examples was that they are 
currently stored in multiple places.  If memory serves, my learned 
colleagues here in Oxford argued against that during the meeting, 
saying that the Guidelines are a 'single resource' and that, although 
the element specifications are stored separately from the prose, they 
are really all part of one big document.

That is, indeed, true. Well, to a point, and only as long as we have a 
single English source.  As soon as we have multiple translations of the 
Guidelines prose, then this logic falls down.  A quick review for those 
who haven't looked at the P5/Source/ directory in subversion for a while:

There is a master file (P5/Source/guidelines.xml) which is a symbolic 
link to P5/Source/Guidelines/en/guidelines-en.xml. (i.e. the English 
version is the current source of the Guidelines.) This file contains a 
whole bunch of entity references to each of the chapters, the 
appendices, and every element/class/macro specification as sub-files. 
(That including things by entity reference makes me feel icky is a 
matter for another discussion...)  In the separate files which make up 
the chapters, these entities can then be used to virtually include the 
specifications at the correct place.

However, my reasons for decoupling the examples are brought about mainly 
by our attempts towards internationalisation.  We have a good system of 
separation for the element/class/macro specifications which helps to 
enable the internationalisation of the contents of those files.  In most 
cases (e.g. element descriptions) the specifications are the only source 
for that kind of thing in the Guidelines.  This is not the case with 
examples: they appear both in the specifications and in the prose of 
the Guidelines.

As noted above, the source of the guidelines.xml file is currently the 
English one.  While there is an 'fr' directory for a French version of 
the Guidelines, its files are all symbolic links back to the English 
version.  (There is an HD-Header.xml with a now dated translation of the 
Header chapter, but it doesn't appear to be used in generation of the 
French version of the Guidelines online.) But the point is that our 
underlying infrastructure should accommodate the possibility of 
translations of the Guidelines in an efficient manner.

Hypothetically, let's say the French/Chinese produce a full translation 
of the Guidelines.  For some of the examples they will have substituted 
better French/Chinese ones; for others they may have just translated 
them into French/Chinese because their textual content is less 
significant than the structure of the elements being demonstrated.  In 
other cases they might not have translated the example and just used the 
English one, for whatever reason.  Let's say they are showing an example 
of the 'cb' element.  They might produce a new example which shows 
something interesting and peculiar to French/Chinese texts, and in our 
revision of the Guidelines we might want to use it.  Sebastian's work at 
producing element example reference pages means that someone reading the 
Guidelines will now be able to see this, but if we want to use that 
French example in our Chinese translation (or the English one!) then we 
have to duplicate the content.  This seems to go against the XML and TEI 
doctrines of storing the information once and using it in multiple 
places/manners.

To sum up, I think that we should decouple the examples from both the 
prose and element/class/macro specifications and store them in a single 
place, a corpus of examples.  These should at the very least all have 
@xml:id's to make them referenceable and @xml:lang's to indicate their 
language.  Any place they existed before should point to or otherwise 
include these examples when rendering the Guidelines.  If just that were 
done then there would be no visible change to users.  However, it 
enables many more possibilities:
- examples which are duplicated in more than one place (multiple 
Chapters, multiple element specifications, etc.) only need to exist once 
and be pointed to multiple times
- examples which are translations can be stored next to their originals, 
making it clear that the one needs to change when the other does (as 
with element descriptions)
- examples which are used verbatim in different language versions of the 
Guidelines only need to be stored once
- it gives us a place to store new examples that people have created 
(for later inclusion in the Guidelines/specs)
- it could be implemented gradually in a modular way, with people 
choosing to use it or not
- but overall it seems a more flexible and modular system than storing 
them in multiple places.
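
As a rough illustration of the translation points in the list above, 
here is a small Python sketch of picking an example by language from 
such a corpus, falling back to English where no translation exists. The 
wrapper element 'exampleGroup', the ids, the sample text, and the 
function name pick_example are all hypothetical, invented purely for 
this sketch; only @xml:id and @xml:lang come from the proposal itself.

```python
import xml.etree.ElementTree as ET

EG = "http://www.tei-c.org/ns/Examples"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical corpus: an original and its translation stored side by
# side inside a made-up <exampleGroup> wrapper.
CORPUS = f"""<examples>
  <exampleGroup xml:id="eg-cb">
    <egXML xmlns="{EG}" xml:id="eg-cb-en" xml:lang="en"><cb n="1"/>First column</egXML>
    <egXML xmlns="{EG}" xml:id="eg-cb-fr" xml:lang="fr"><cb n="1"/>Première colonne</egXML>
  </exampleGroup>
</examples>"""


def pick_example(corpus, group_id, lang, fallback="en"):
    """Return the example in `lang` from a group, else the fallback one."""
    for group in corpus.iter("exampleGroup"):
        if group.get(f"{{{XML_NS}}}id") != group_id:
            continue
        by_lang = {
            eg.get(f"{{{XML_NS}}}lang"): eg
            for eg in group.findall(f"{{{EG}}}egXML")
        }
        return by_lang.get(lang, by_lang.get(fallback))
    return None  # no such group in the corpus


corpus = ET.fromstring(CORPUS)
french = pick_example(corpus, "eg-cb", "fr")
chinese = pick_example(corpus, "eg-cb", "zh")  # falls back to English
```

The point of the fallback is that a language version which uses the 
English example verbatim need not store its own copy.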

I'd argue that storing examples both in the prose of the Guidelines and 
in the element/class/macro specifications is inefficient and makes us 
prone to producing inconsistent examples.  While some may like to argue 
that theoretically these are all in one document because of those entity 
references, from a practical maintenance point of view they are in 
separate files.  I believe this impacts negatively on the way we produce 
and use examples.  There are all sorts of problems/discussions inherent 
to the implementation of this, and I've intentionally left them out 
because I think that is a separate matter from whether it is 
theoretically better to separate the examples from their multiple 
storage places.

Fielding, paraphrasing Seneca I believe, said it best:

"It is a trite but true observation, that examples work more forcibly on 
the mind than precepts."

Sorry for the length,
-James

-- 
Dr James Cummings, Research Technologies Service, University of Oxford
James dot Cummings at oucs dot ox dot ac dot uk