Stand-Off Markup Workgroup

Mon Nov 25 18:58:46 EST 2002

Council members might want this at hand during the conference call 
tomorrow, Tuesday.

John

>X-Sender: dgd at mama.stg.brown.edu (Unverified)
>Date:         Mon, 25 Nov 2002 15:33:10 +0100
>Reply-To: "TEI Stand-Off Markup, XLink, XPointer WG" 
><TEI-SOM at listserv.brown.edu>
>Sender: "TEI Stand-Off Markup, XLink, XPointer WG" 
><TEI-SOM at listserv.brown.edu>
>From: "David G. Durand" <David_Durand at brown.edu>
>Subject:      Rough outline of topics and deliverables and outstanding issues
>To: TEI-SOM at listserv.brown.edu
>
>New copy of hastily converted file.
>
>[Note to Lou: these are notes, not a proper working document at this
>point. I understand that work group products must and will be proper
>TEI XML. This isn't even proper HTML (though the tags do match)]
>
>
>TEI-export temp.tbx
>    * TEI
>        * For call to TEI council!
>
>        * Collaboration between ISO TC4 and TEI.
>
>        * [not an issue for SOM group] Henry Thompson has explicitly 
> requested comments about SDATA, because the W3C is becoming aware that 
> the Private Use Area is not a legitimate solution to the non-Unicode 
> character problem.
>
>        * notes
>            * What we're doing
>
>            * XPointer will be used for everything, at least everything 
> that is hypertext-like. Because TEI extended pointers will no longer be 
> needed to address external elements, or to select ranges, Hypertext 
> linking attributes will no longer be IDREFs by default, but rather URI 
> references. This means that TEI linking elements can be standard XLinks, 
> which also increases the power and standards-reliance of the TEI. The TEI 
> DTD will need to be changed so that the DTD can indicate the fact that 
> the new, specialized CDATA attributes contain Uniform Resource Identifiers.
>            * This can perhaps be done by using the type system of a 
> schema language, or may be reflected by a special entity in the DTD. It 
> is still an open question for the committee whether we would like to 
> accommodate a standard option for reverting the URI references to IDREFs.
>            * A new parameter entity will be used to define an attribute 
> value type (TEI terminology?), that will declare it as a "pointing 
> attribute". An IDREF attribute will mutate into a URI Reference attribute 
> that will contain a single URI reference according to the proper IETF RFC 
> ????. An IDREFS attribute will contain multiple URI references, separated 
> by one or more spaces. The base URI used for the URI references will be 
> determined following the appropriate XML rules, including xml:base.
>            * Why XPointer
>
>            * Normalizes ranges
>            * Makes external file references simple
>        * Deliverables
>            * rewritten hypertext chapter (SA)
>
>            * This chapter as it stands will need to be re-done almost 
> completely. We have not discerned differences significant enough to keep 
> the old mechanisms in place.
>
>            * We have discovered some things that the TEI will 
> legitimately need to extend: e.g. the canonical reference mechanism. This 
> should be possible by means of a new scheme.
>
>            * The current treatment of ranges will change significantly 
> with the use of XPointer. The cascading pointer mechanism will be 
> eliminated for normal links entirely. The entity-reference technique may 
> well be worth abandoning, or at least deprecating.
>
>            * We recommend, though we cannot perform the entire revision 
> ourselves, that IDREFs generally be replaced by URI references (and 
> XPointer, as needed). In this case, the cascading or indirect pointer can 
> be completely eliminated from the TEI. This also simplifies TEI linking 
> applications.
>
>            * The biggest lack is regular expression selection of elements 
> and attributes (non-critical, we believe) and regular expression 
> selection within body text (critical, most of us think).
>                * New description of standoff markup for TEI
>
>                * This document will describe the way in which TEI 
> documents can create standoff markup of other documents, in a variety of 
> media types. This will be handled as a TEI document in which elements and 
> PCDATA chunks may be replaced by specialized linking elements specifying 
> their content. This description will be integrated into the Segmentation 
> and Alignment chapter.
>                * While a conref-like facility is a tempting idea, we will 
> not implement a global attribute that pulls remote data into its content. 
> This means that any external annotation will be done by means of creating 
> a special reference element as a child in the position where the remote 
> data would appear. In the case where the entire content of some element 
> is remotely stored, that element will simply have only one child -- the 
> standoff markup link.
>                * The standoff markup link differs from a standard link in 
> several ways:
>                * It has special semantics of establishing a parent-child 
> relationship between its parent element and the element or elements it 
> points to.
>                * It is an error (or not? I am no longer sure this condition 
> is needed) for a standoff markup element to indirectly refer to itself 
> in this way. While cyclic annotations are possible, they should be 
> expressed with links, not standoff markup. This condition corresponds to 
> the layering conditions typically imposed by linguistic markup. It also 
> enables the creation of "flattened" documents, by starting at one 
> document, and treating standoff links to other documents as inclusion 
> links (although some standoff links may remain in cases where children 
> are shared).
>                * The range of elements selected by a standoff markup 
> element must meet the same tag-balancing conditions as the XML standard 
> imposes on an external entity.
>            * Notes on media formats
>
>            * We will create a document (which might be an appendix) 
> explaining critical issues in media formats (and delimiting which ones 
> cannot be solved by the TEI, for instance, the interaction of varying 
> timebases with format conversions). This means that linking in and out of media 
> will be storage format dependent. We cannot speak for the ages as to the 
> best formats (in fact they change a lot), but should give some general 
> principles for such decisions, and "good practice" suggestions based on 
> those principles.
>            * Corpus notes
>
>            * These will take the form of a moderate rewrite of the 
> current chapter, keeping the basic ideas, but using the new mechanisms 
> (XPointer, XLink, etc).
>
>            * There will also be a good practice appendix associated with 
> this section, describing such issues as the recommendation for separate 
> storage of annotations and "base texts" in corpora, as well as the use of 
> stratified annotations where appropriate. We will endeavor to make all of 
> the mechanisms used here be compatible with XLink and XPointer.
>
>            * We should describe the use of URNs and xml:base in 
> controlling the resolution of URI references in practical processing, so 
> that people do not get trapped into constantly re-writing 
> inappropriately absolute URLs.
>
>            * We will have corpus examples from members of the workgroup 
> (in progress).
>            * software
>
>            * We have a preliminary TEI extended pointer translator already.
>
>            * We will want to have a TEI "link converter" that will 
> convert linking elements of P4 XML encoded texts to the new P5 format 
> linking elements.
>
>            * We want to have a working XPointer implementation, of at 
> least limited use, with some worked examples. (This is not yet started).
>
>            * brief note on graphs
>
>            * We will present alternatives using RDF and topic maps for 
> the representation of graphs. These will be in the form of a note to the 
> editors, rather than a complete specification, as it's really a sideline.
>
>            * At the minimum, we will clean up the RDF note already 
> written, as that may be of use.
>            * update of canonical reference chapter
>        * Issues
>
>        * A variety of issues drawn from little notes:
>            * Alternatives and Joins
>
>            * What do we do about this?
>            * Is the form of the deliverable just instructions to the 
> editors, or do we want to revise the chapter itself? At the least, it 
> seems that we will need to allow/require XPointer rather than IDREFs if 
> we want these elements to be maximally useful for linguistic annotation, 
> because we need to be able to transparently annotate external files.
>            * IDREFs
>
>            * ID/IDREF: complete unanimity on the hypertext area. Strong 
> recommendation for the rest of the TEI, wherever ID/IDREF come into play.
>            * IDREF fallback
>
>            * It is still an open question whether we should make it easy 
> for simple hypertext applications to fall back to the use of ID/IDREF.
>
>            * Advantages:
>            * This allows people to use very simple markup.
>            * ID/IDREF are checked by any validating parser.
>
>            * Disadvantages:
>            * The required # sign will be confusing to those already used 
> to ID/IDREF (if they are unfamiliar with HTML).
>            * Link checking software will be required for project 
> production personnel to check links, because the lost functionality of 
> ID/IDREF is very important.
>            * TEI Graphs
>
>            * Graphs: we recommend re-orientation to RDF or Topic Maps. The 
> choice is not yet laid out, but an initial study of options has been done 
> by Chris Catton.
>            * canonical references
>
>            * We need to consider chap 32: algorithm for canonical 
> references.
>            * working document
>            * Schema language
>
>            * This is a place where we could take a more relaxed approach 
> than our predecessors, perhaps aided by more modern schema technologies.
>            * At the TEI meeting, Sebastian explained a plan whereby the 
> guidelines would describe practice in a way that would allow automatic 
> generation of the relevant TEI schemas directly from the guidelines. This 
> would mean that the guidelines would be capable of supporting XML Schema, 
> RelaxNG, XML DTDs, and maybe even SGML DTDs, because the actual 
> declarations would not be part of the document.
>            * This makes our job easier, because the TEI guidelines could 
> define new attribute types: Pointer, which could be IDREF or XPointer 
> depending on the user's needs,
>            * Pointers, which could be IDREFS or a list of XPointers.
>            * Examples for TEI
>
>            * XLink examples from corpus-people.
>            * literary examples (markup of a CD-ROM?)
>
>            * Conversions of existing examples (which are decent), but 
> whose context may have changed radically.
>            * How do namespaces get declared in TEI document instances?
>            * If we use XLink, for instance, is there a standard prefix 
> declared? This is needed to use DTDs, for instance, so that attributes in 
> other namespaces can be included. We need this for XLink.
>            * Some data model issues
>
>            * relations, and entities:
>
>            * cascading of ordering schemes and annotations, vs. explicit 
> orderings of items
>            * linear orderings within which ranges are objects
>
>            * differences between various decisions on ordering of ranges.
>
>-- ------------------------ David Durand Adjunct Associate Professor 
>(research), Brown University. VP, Software Architecture, Ingenta Cell: +1 
>401-935-5317 FAX: +1 401-331-2015
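
A rough illustration of two points in the outline above: a "pointing attribute" holding one or more space-separated URI references that are resolved against the base URI (including xml:base), and the kind of link checking that would be needed once a validating parser no longer checks ID/IDREF. This is only a sketch under stated assumptions, not a workgroup product. The element name ptr, the attribute names target and id, the example.org location, and the function names are all chosen for illustration. Only bare-name fragments are checked; scheme-based XPointers such as xpointer(...) are skipped, and only Python's standard library is used.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag, urljoin

# Clark-notation name under which ElementTree exposes the xml:base attribute.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def resolve_pointers(root, doc_uri, attr="target"):
    """Yield (element, absolute URI) for each URI reference held in `attr`.

    `attr` is assumed to be a pointing attribute: one or more URI
    references separated by whitespace.
    """
    def walk(elem, base):
        # An xml:base value is itself resolved against the inherited base
        # and then applies to this element and its descendants.
        base = urljoin(base, elem.get(XML_BASE, ""))
        for ref in elem.get(attr, "").split():
            yield elem, urljoin(base, ref)
        for child in elem:
            yield from walk(child, base)

    yield from walk(root, doc_uri)


def dangling_local_refs(root, doc_uri, attr="target", id_attr="id"):
    """Return resolved URIs that point into this document at a missing ID."""
    ids = {e.get(id_attr) for e in root.iter() if e.get(id_attr)}
    missing = []
    for _, uri in resolve_pointers(root, doc_uri, attr):
        resource, fragment = urldefrag(uri)
        # Only bare names are compared with IDs; xpointer(...) is skipped.
        is_bare_name = bool(fragment) and "(" not in fragment
        if resource == doc_uri and is_bare_name and fragment not in ids:
            missing.append(uri)
    return missing


# Hypothetical sample document and location, for illustration only.
SAMPLE = """<text>
  <p id="n1">A note.</p>
  <ptr target="#n1 #n9"/>
  <div xml:base="annotations/">
    <ptr target="layer1.xml#w3 ../other.xml"/>
  </div>
</text>"""
DOC_URI = "http://example.org/corpus/text.xml"

root = ET.fromstring(SAMPLE)
resolved = [uri for _, uri in resolve_pointers(root, DOC_URI)]
missing = dangling_local_refs(root, DOC_URI)
print(resolved)
print(missing)
```

In this sketch the references stored in the document stay short and relative. Moving the corpus would change only DOC_URI or an xml:base value, which is one way to avoid rewriting absolute URLs. The check reports #n9 because the sample has no element carrying that ID.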