[tei-council] Schematron rules

Sebastian Rahtz sebastian.rahtz at oucs.ox.ac.uk
Tue Sep 9 05:54:01 EDT 2008


Schematron rules in ODD
-----------------------

Currently, we have no formal view on how constraints
must be expressed in ODD. We have a general element
<content> whose content model is "text" by default,
but which should be changed to "any XML", and that is
as far as we go. For the purposes of TEI P5, we
redefine <content> to be "<valList> or any RELAX NG element,
followed by any Schematron element", allowing us to
write Schematron rules to extend the RELAX NG rules. This
means that the TEI uses a conformant extension of itself.
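As a rough illustration of the current arrangement, here is an <elementSpec> whose redefined <content> holds a RELAX NG pattern followed by a Schematron pattern. The element name myElement and the rule are hypothetical. The sketch assumes the ISO Schematron namespace (a project might bind s: to an older Schematron namespace instead). It also assumes the tei: prefix is bound to the TEI namespace elsewhere, e.g. by an <s:ns> declaration.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:rng="http://relaxng.org/ns/structure/1.0"
             xmlns:s="http://purl.oclc.org/dsdl/schematron"
             ident="myElement" mode="add">
  <content>
    <!-- RELAX NG part: the (hypothetical) element contains text only -->
    <rng:text/>
    <!-- Schematron part, following the RELAX NG (hypothetical rule) -->
    <s:pattern>
      <s:rule context="tei:myElement">
        <s:assert test="normalize-space(.) != ''">
          myElement must not be empty.
        </s:assert>
      </s:rule>
    </s:pattern>
  </content>
</elementSpec>
```

Because both parts live in one <content>, a customization that only wants to add a rule still has to restate the RELAX NG part as well.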

Problems
--------
There are two problems:

  1. Schematron rules can only be expressed within the <content> of
  a particular element, which runs rather counter to the spirit of
  Schematron. Where, for example, do we say "all ID attributes must
  be at least 8 characters long"?

  2. A common requirement in a project ODD would be to
  *add* Schematron rules, but at present this means duplicating
  the whole of <content> in "replace" mode, since the RELAX NG
  and Schematron components of <content> cannot be identified
  and addressed individually.
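To make problem 1 concrete: the ID rule is easy to write in Schematron, but its context matches any element carrying an ID, so no single <elementSpec> is its natural home. This is a minimal sketch. It assumes ISO Schematron and takes "ID attributes" to mean xml:id. The xml: prefix is always bound, so no <s:ns> is needed.

```xml
<s:pattern xmlns:s="http://purl.oclc.org/dsdl/schematron">
  <!-- Fires on every element that has an xml:id, whatever its name -->
  <s:rule context="*[@xml:id]">
    <s:assert test="string-length(@xml:id) >= 8">
      IDs must be at least 8 characters long.
    </s:assert>
  </s:rule>
</s:pattern>
```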

Solutions
---------

There are three possibilities for fixing these problems:

   1. allow Schematron rules to occur at the end of any
   <classSpec>, <elementSpec> or <macroSpec> (and even
   <schemaSpec>), just sitting there in their own namespace

   2. allow Schematron rules in
   <classSpec>, <elementSpec> or <macroSpec> (and even
   <schemaSpec>) inside a new element <constraint>, which
   would sit alongside <content> in <elementSpec> and be
   added to the other *Spec elements.

   3. separate the Schematron entirely from the *Spec elements
   and say that the whole set of rules must be maintained
   separately, with no way of tying a rule to a particular
   element. It could be dropped in under <schemaSpec>.
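For comparison, options 1 and 2 might look as follows for a single hypothetical rule on a hypothetical element myElement. Namespace declarations are omitted; s: is assumed to be bound to Schematron and tei: to be declared via <s:ns>. Neither form is settled syntax; this only sketches the shape of each.

```xml
<!-- Option 1: the Schematron sits directly in the *Spec,
     recognised only by its namespace -->
<elementSpec ident="myElement" mode="change">
  <s:pattern>
    <s:rule context="tei:myElement">
      <s:assert test="@type">myElement must have a type attribute.</s:assert>
    </s:rule>
  </s:pattern>
</elementSpec>

<!-- Option 2: the same rule wrapped in the proposed <constraint> element -->
<elementSpec ident="myElement" mode="change">
  <constraint>
    <s:pattern>
      <s:rule context="tei:myElement">
        <s:assert test="@type">myElement must have a type attribute.</s:assert>
      </s:rule>
    </s:pattern>
  </constraint>
</elementSpec>
```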

The first choice is inelegant, though conformant
(since the added elements would be in their own namespace),
and relatively easy to implement (a small change to the
current setup). It would mean no
change to the ODD language. The ODD processing tools
we have would be adapted in an ad hoc way.

The second choice would mean a change to ODD, as it
would add a new element with no other purpose
at present, and no default content model other than
"any XML". It would be fairly easy to implement and
support in, e.g., Roma. The main argument against it
is that it is an ad hoc extension to ODD, with two
elements, <content> and <constraint>, doing substantially
the same thing.

The third choice is simple to implement, but it allows neither
for one ODD to build on the rules of another ODD, nor for any
granularity in the rules.


Conclusion
----------

The disadvantages of the first and third proposals
seem to me to outweigh the issues of the second.
I therefore propose that we add a <constraint> element, with
a content model of "any XML", in the following places:

 1. as a sibling of <content> in elementSpec
 2. as a sibling of <datatype> in attDef
 3. as a child of <classSpec>
 4. as a child of <schemaSpec>

For TEI itself, we would constrain the "any XML"
as follows:

 - if the parent is <elementSpec> or <schemaSpec> allow <s:pattern>
 - if the parent is <schemaSpec> allow <s:ns> as well
 - if the parent is <classSpec> allow <s:assert>,
   and generate <s:pattern> for each member of the class.
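Here is a sketch of how these pieces could fit together. It shows an <s:ns> under <schemaSpec> and a bare <s:assert> in a hypothetical attribute class att.myClass. It then shows the patterns a processor might generate for two hypothetical members, tei:foo and tei:bar. Namespace declarations are omitted, and s: is assumed to be bound to ISO Schematron. Since the generated contexts need the tei: prefix, a single schema-wide <s:ns> is one plausible reason for allowing <s:ns> only under <schemaSpec>.

```xml
<!-- In the ODD: the schema-wide namespace binding -->
<schemaSpec ident="mySchema">
  <constraint>
    <s:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  </constraint>
</schemaSpec>

<!-- In the ODD: a bare assert on the class, with no rule context -->
<classSpec ident="att.myClass" type="atts">
  <constraint>
    <s:assert test="not(@from and @when)">Use either from or when, not both.</s:assert>
  </constraint>
</classSpec>

<!-- Generated: one pattern per class member, each supplying the context -->
<s:pattern>
  <s:rule context="tei:foo">
    <s:assert test="not(@from and @when)">Use either from or when, not both.</s:assert>
  </s:rule>
</s:pattern>
<s:pattern>
  <s:rule context="tei:bar">
    <s:assert test="not(@from and @when)">Use either from or when, not both.</s:assert>
  </s:rule>
</s:pattern>
```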

-- 
Sebastian Rahtz      
Information Manager, Oxford University Computing Services
13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431


