Stand-Off Markup Workgroup
John Unsworth
jmu2m at virginia.edu
Mon Nov 25 18:58:46 EST 2002
Council members might want this at hand during the conference call
tomorrow, Tuesday.
John
>X-Sender: dgd at mama.stg.brown.edu (Unverified)
>Date: Mon, 25 Nov 2002 15:33:10 +0100
>Reply-To: "TEI Stand-Off Markup, XLink, XPointer WG"
><TEI-SOM at listserv.brown.edu>
>Sender: "TEI Stand-Off Markup, XLink, XPointer WG"
><TEI-SOM at listserv.brown.edu>
>From: "David G. Durand" <David_Durand at brown.edu>
>Subject: Rough outline of topics and deliverables and outstanding issues
>To: TEI-SOM at listserv.brown.edu
>
>New copy of hastily converted file.
>
>[Note to Lou: these are notes, not a proper working document at this
>point. I understand that work group products must and will be proper
>TEI XML. This isn't even proper HTML (though the tags do match)]
>
>
>TEI-export temp.tbx
> * TEI
> * For call to TEI council!
>
> * Collaboration between ISO TC4 and TEI.
>
> * [not an issue for SOM group]Henry Thompson has explicitly
> requested comments about SDATA, because the W3C is becoming aware that
> the Private-use area is not a legitimate solution to the non-unicode
> character problem.
>
> * notes
> * What we're doing
>
> * XPointer will be used for everything, at least everything
> that is hypertext-like. Because TEI extended pointers will no longer be
> needed to address external elements, or to select ranges, hypertext
> linking attributes will no longer be IDREFs by default, but rather URI
> references. This means that TEI linking elements can be standard XLinks,
> which also increases the power and standards-reliance of the TEI. The TEI
> DTD will need to be changed so that the DTD can indicate the fact that
> the new, specialized CDATA attributes contain Uniform Resource Identifiers.
> * This can perhaps be done by using the type system of a
> schema language, or may be reflected by a special entity in the DTD.
> Whether we would like to accommodate a standard option for reverting
> the URI references to IDREFs is still an open issue for the committee.
> * a new parameter entity will be used to define an attribute
> value type (TEI terminology?), that will declare it as a "pointing
> attribute". An IDREF attribute will mutate into a URI Reference attribute
> that will contain a single URI reference according to the proper IETF RFC
> ????. An IDREFS attribute will contain multiple URI references, separated
> by one or more spaces. The base URI used for the URI references will be
> determined following the appropriate XML rules, including xml:base.
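
A minimal sketch of the IDREFS-style rule described above, in Python: a
pointing attribute holds one or more URI references separated by
whitespace, and each is resolved against a base URI. The function name,
the base URI, and the sample attribute value are assumptions made for
illustration; the notes do not prescribe an implementation, and a real
processor would have to compute the base from the XML rules (including
xml:base) rather than receive it as an argument.

```python
from typing import List
from urllib.parse import urljoin


def resolve_pointers(attr_value: str, base_uri: str) -> List[str]:
    """Split a whitespace-separated list of URI references and
    resolve each one against base_uri."""
    # str.split() with no argument treats any run of whitespace
    # as a single separator, matching "one or more spaces".
    return [urljoin(base_uri, ref) for ref in attr_value.split()]


# Hypothetical base URI, standing in for whatever xml:base supplies.
base = "http://example.org/corpus/texts/"
# Hypothetical attribute value: a same-document reference, a reference
# into a sibling file, and a relative path to another directory.
targets = "#p1 notes.xml#n3   ../lexicon/entries.xml"

for uri in resolve_pointers(targets, base):
    print(uri)
```

Keeping the references relative and letting the base do the work is what
allows a collection to be moved without rewriting every link.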
> * Why xpointer
>
> * Normalizes ranges
> * Makes external file references simple
> * Deliverables
> * rewritten hypertext chapter (SA)
>
> * This chapter as it stands will need to be re-done almost
> completely. We have not discerned differences significant enough to keep
> the old mechanisms in place.
>
> * We have discovered some things that the TEI will
> legitimately need to extend: e.g. the canonical reference mechanism. This
> should be possible by means of a new scheme.
>
> * The current treatment of ranges will change significantly
> with the use of XPointer. The cascading pointer mechanism will be
> eliminated for normal links entirely. The entity-reference technique may
> well be worth abandoning, or at least deprecating.
>
> * We recommend, though we cannot perform the entire revision
> ourselves, that IDREFs generally be replaced by URI references (and
> XPointer, as needed). In this case, the cascading or indirect pointer can
> be completely eliminated from the TEI. This also simplifies TEI linking
> applications.
>
> * The biggest lack is regular expression selection of elements
> and attributes (non-critical, we believe) and regular expression
> selection within body text (critical, most of us think).
> * New description of standoff markup for TEI
>
> * This document will describe the way in which TEI
> documents can create standoff markup of other documents, in a variety of
> media types. This will be handled as a TEI document in which elements and
> PCDATA chunks may be replaced by specialized linking elements specifying
> their content. This description will be integrated into the Segmentation
> and Alignment chapter.
> * While a conref-like facility is a tempting idea, we will
> not implement a global attribute that pulls remote data into its content.
> This means that any external annotation will be done by means of creating
> a special reference element as a child in the position where the remote
> data would appear. In the case where the entire content of some element
> is remotely stored, that element will simply have only one child -- the
> standoff markup link.
> * The standoff markup link differs from a standard link in
> several ways:
> * It has special semantics of establishing a parent-child
> relationship between its parent element and the element or elements it
> points to.
> * It is (not?? I am no longer sure this condition is
> needed) an error for a standoff markup element to indirectly refer to
> itself in this way. While cyclic annotations are possible, they should be
> expressed with links, not standoff markup. This condition corresponds to
> the layering conditions typically imposed by linguistic markup. It also
> enables the creation of "flattened" documents, by starting at one
> document, and treating standoff links to other documents as inclusion
> links (although some standoff links may remain in cases where children
> are shared).
> * The range of elements selected by a standoff markup
> element must meet the same tag-balancing conditions as the XML standard
> imposes on an external entity.
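
The flattening idea and the no-self-reference condition above can be
sketched in Python as follows. Everything named here is hypothetical:
the notes do not name the standoff element, so the sketch assumes an
element called standoff with a target attribute holding a bare "#id"
reference, and an id attribute on the targets. It handles only
single-element targets within one document; ranges, external files, and
the tag-balancing check are left out.

```python
import xml.etree.ElementTree as ET


def flatten(elem, by_id, active=()):
    """Return a copy of elem in which every <standoff target="#id"/>
    child is replaced by a flattened copy of the element it points to.
    Raises ValueError if a standoff link leads back to an element that
    is already being expanded."""
    own_id = elem.get("id")
    if own_id:
        active = active + (own_id,)
    out = ET.Element(elem.tag, dict(elem.attrib))
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        if child.tag == "standoff":
            target_id = child.get("target", "").lstrip("#")
            if target_id in active:
                raise ValueError("standoff cycle through #" + target_id)
            included = flatten(by_id[target_id], by_id, active)
            # The included element takes the place of the link, so it
            # inherits the link's trailing text.
            included.tail = child.tail
            out.append(included)
        else:
            out.append(flatten(child, by_id, active))
    return out


# Hypothetical sample: a sentence whose words are stored elsewhere.
SOURCE = """<doc>
  <s id="s1"><standoff target="#w1"/><standoff target="#w2"/></s>
  <w id="w1">stand-off</w>
  <w id="w2">markup</w>
</doc>"""

root = ET.fromstring(SOURCE)
by_id = {e.get("id"): e for e in root.iter() if e.get("id")}
flat = flatten(by_id["s1"], by_id)
print([w.text for w in flat])
```

The active tuple tracks only the chain of elements currently being
expanded, so two different parents may still share the same target
without triggering the error.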
> * Notes on media formats
>
> * We will create a document (which might be an appendix)
> explaining critical issues in media formats (and delimiting which ones
> cannot be solved by the TEI, for instance the interaction of varying
> timebases with format conversions). This means that linking in and out of media
> will be storage format dependent. We cannot speak for the ages as to the
> best formats (in fact they change a lot), but should give some general
> principles for such decisions, and "good practice" suggestions based on
> those principles.
> * Corpus notes
>
> * These will take the form of a moderate rewrite of the
> current chapter, keeping the basic ideas, but using the new mechanisms
> (XPointer, XLink, etc).
>
> * There will also be a good practice appendix associated with
> this section, describing such issues as the recommendation for separate
> storage of annotations and "base texts" in corpora, as well as the use of
> stratified annotations where appropriate. We will endeavor to make all of
> the mechanisms used here compatible with XLink and XPointer.
>
> * We should describe the use of URNs and xml:base in
> controlling the resolution of URI references in practical processing, so
> that people do not get trapped into constantly re-writing
> inappropriately absolute URLs.
>
> * We will have corpus examples from members of the workgroup
> (in progress).
> * software
>
> * We have a preliminary TEI extended pointer translator already.
>
> * We will want to have a TEI "link converter" that will
> convert linking elements of P4 XML encoded texts to the new P5 format
> linking elements.
>
> * We want to have a working XPointer implementation, of at
> least limited use, with some worked examples. (This is not yet started).
>
> * brief note on graphs
>
> * We will present alternatives using RDF and topic maps for
> the representation of graphs. These will be in the form of a note to the
> editors, rather than a complete specification, as it's really a sideline.
>
> * At the minimum, we will clean up the RDF note already
> written, as that may be of use.
> * update of canonical reference chapter
> * Issues
>
> * A variety of issues drawn from little notes:
> * Alternatives and Joins
>
> * What do we do about this?
> * Is the form of the deliverable just instructions to the
> editors, or do we want to revise the chapter itself? At the least, it
> seems that we will need to allow/require XPointer rather than IDREFs if
> we want these elements to be maximally useful for linguistic annotation,
> because we need to be able to transparently annotate external files.
> * Idrefs
>
> * ID/IDREF: complete unanimity on the hypertext area; strong
> recommendation for the rest of the TEI, where ID/IDREF come into play.
> * IDREF fallback
>
> * It is still an open question whether we should make it easy
> for simple hypertext applications to fall back to the use of ID/IDREF.
>
> * Advantages:
> * This allows people to use very simple markup.
> * ID/IDREF are checked by any validating parser.
>
> * Disadvantages:
> * The required # sign will be confusing to those already used
> to ID/IDREF (if they are unfamiliar with HTML).
> * Link-checking software will be required for project
> production personnel to check links, because the lost functionality of
> ID/IDREF is very important.
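
As a rough illustration of the link checking mentioned above, the
following Python sketch reports same-document "#id" references that have
no matching ID, which is the check a validating parser would otherwise
perform for ID/IDREF. The attribute names (target, id) and the sample
document are assumptions for illustration only; full URI references and
XPointer expressions are deliberately ignored.

```python
import xml.etree.ElementTree as ET


def dangling_local_links(root, attr="target"):
    """Return every '#id' reference found in the given attribute that
    does not match the id attribute of any element in the document."""
    ids = {e.get("id") for e in root.iter() if e.get("id")}
    missing = []
    for elem in root.iter():
        # The attribute may hold several references separated by spaces.
        for ref in elem.get(attr, "").split():
            if ref.startswith("#") and ref[1:] not in ids:
                missing.append(ref)
    return missing


# Hypothetical sample: one good reference and one broken one.
SAMPLE = """<text>
  <ptr target="#n1 #n9"/>
  <note id="n1">A note.</note>
</text>"""

print(dangling_local_links(ET.fromstring(SAMPLE)))  # prints ['#n9']
```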
> * TEI Graphs
>
> * Graphs: we recommend re-orientation to RDF or Topic Maps. The
> choice has not yet been laid out, but an initial study of options has
> been done by Chris Catton.
> * canonical references
>
> * We need to consider chap 32: algorithm for canonical
> references.
> * working document
> * Schema language
>
> * This is a place where we could take a more relaxed approach
> than our predecessors, perhaps aided by more modern schema technologies.
> * At the TEI meeting, Sebastian explained a plan whereby the
> guidelines would describe practice in a way that would allow automatic
> generation of the relevant TEI schemas directly from the guidelines. This
> would mean that the guidelines would be capable of supporting XML Schema,
> RelaxNG, XML DTDs, and maybe even SGML DTDs, because the actual
> declarations would not be part of the document.
> * This makes our job easier, because the TEI guidelines could
> define new attribute types: Pointer, which could be IDREF or XPointer
> depending on the user's needs, and Pointers, which could be IDREFS or
> a list of XPointers.
> * Examples for TEI
>
> * XLink examples from corpus-people.
> * literary examples (markup of a CD-ROM?)
>
> * Conversions of existing examples, which are decent but
> whose context may have changed radically.
> * How do namespaces get declared in TEI document instances
> * If we use XLink, for instance, is there a standard prefix
> declared? This is needed to use DTDs, for instance, so that attributes in
> other namespaces can be included. We need this for XLink.
> * Some data model issues
>
> * relations, and entities:
>
> * cascading of ordering schemes and annotations, vs. explicit
> orderings of items
> * linear orderings within which ranges are objects
>
> * differences between various decisions on ordering of ranges.
>
>-- ------------------------ David Durand Adjunct Associate Professor
>(research), Brown University. VP, Software Architecture, Ingenta Cell: +1
>401-935-5317 FAX: +1 401-331-2015