[tei-council] chapter proposal

Lou Burnard lou.burnard at oucs.ox.ac.uk
Wed Mar 5 05:10:28 EST 2008


Dear Brett

This is an interesting proposal, to say the least. The TEI Council is 
the appropriate forum to take it forward in any shape, so I am copying 
this and your message to them. As it happens there is a meeting of the 
Council early next month, and although the agenda is already rather 
full, I think this could be added to it. 

It is very rare indeed for a chapter to get into the Guidelines without 
being chewed over by a workgroup of some sort, and then submitted to the 
Council for approval. The Council has in the past chartered workgroups 
to address specific areas which it felt needed attention in the 
Guidelines: I cannot predict whether it would want to do the same for 
this topic, or whether it might instead feel it should become a free 
standing document (not having read your draft closely I am not even yet 
sure what I think myself). If the Council does decide to charter such a 
group, it would probably ask you (and other people) to suggest other 
people who you think could usefully contribute to the work, set a 
timetable and a budget for completing the draft, etc.

On the other hand, I should warn you that after the large effort in 
getting P5 to completion, I think the Council is hoping to devote more 
of its efforts to outreach and training activities than to fundamental 
changes or expansions in the Guidelines. But that need not stop this 
document becoming a useful adjunct to the TEI, if only as a recipe book 
or guide to good practice.

best wishes, and thanks again

Lou

p.s. I hope you will be considering standing for election to the TEI 
Council this year...


Brett Zamir wrote:
> Hello all,
>
> I wanted to be so bold as to offer the following for your 
> consideration in adding as a chapter or section to the TEI 
> documentation. I'm not attached to this idea, so if you think it might 
> only be useful at the wiki (if that), I'm completely resigned to your 
> decision. I have not marked up the document fully, in part because 
> this is only a draft and I have no idea if you'd like to have 
> something like this! I also realize that this could benefit from a 
> wider discussion (if you even wish to include such a chapter) and that 
> my prose can certainly be improved upon (looking for an excuse yet to 
> turn the red pen back on me?). :)
>
> I have cc'd Sebastian as this overlaps his work in particular. I 
> realize that the portion that I did not write for below, but which I'm 
> offering for your consideration, would require quite a bit of effort 
> to flesh out, so as to be both comprehensive and readable for an 
> audience unfamiliar with XSL (i.e., the proposal to list default 
> formatting behavior within the guidelines while making clear these 
> were in no way required).
>
> Again, if this is something which you feel is too 
> implementation-specific for TEI to publish as part of its guidelines, 
> I'll understand, though I attempted to implicitly make the case for 
> the existence of such a chapter within the rationales of the chapter 
> itself.
>
> I've also proposed some stylesheet changes to Sebastian which I think 
> might offer some assistance for this work as well, though I recognize 
> such work may take a pretty good amount of time if you were amenable 
> to attempting it.
>
> A first draft of the proposed document follows...
>
> best wishes,
> Brett
>
> --------------------------------------------------------------------------------------
>
>
> While TEI makes no claims about how a document is to be 
> rendered--except to the extent that it allows description of the 
> original formatting and to the extent that publishers wish to reflect 
> that formatting--it is a general expectation that the TEI documents a 
> project is creating can be rendered at some point in a more 
> human-readable manner. While the desired formatting should not prompt 
> one to violate the TEI Abstract model by altering the semantics of a 
> document in order to adjust output formatting, the process of creating 
> formatting output (or even well before doing so), may lead a project 
> to reconsider some aspects of its semantic markup, as well as the more 
> conventional means of adjusting a stylesheet, so the issues raised 
> here go beyond the work of a designer. This chapter will discuss 
> various issues related to formatting a TEI document.
>
> *Respecting the original formatting*
>
> In order to fully respect the original formatting of a document, it is 
> necessary to consider (and thoroughly use) the global @rendition 
> attribute and/or the @render attribute on <tagUsage> elements to point 
> to <rendition> elements and/or the global @rend attribute to provide 
> its own definition without referring elsewhere.
>
> While semantic hooks and reliance on the typical formatting of 
> specific tags for cases where @rend and @rendition are not needed to 
> override typical behaviors might be sufficient to create an output 
> document reflecting the formatting of the original document, if the 
> original rendering is important information to preserve, this can be 
> done more explicitly by ensuring that all elements are given a 
> <tagUsage> element which uses its @render attribute to point to a 
> <rendition> element which contains the styling details that can be 
> applied to all instances of the tag. In this way, there is no 
> ambiguity about how a tag is to be rendered. This is also recommended 
> practice when <tagUsage> elements are provided (see 
> http://tei.oucs.ox.ac.uk/P5/Guidelines-web/en/html/HD.html#HD57 ).
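>
> As a minimal sketch of that arrangement (the identifier "it", the 
> occurrence count, and the choice of CSS as the formatting language are 
> assumptions made purely for illustration, as is the availability of a 
> @scheme attribute on <rendition>), a header might contain something 
> like the following:
>
> ```xml
> <tagsDecl>
>   <!-- hypothetical rendition: every <hi> in the source was italic -->
>   <rendition xml:id="it" scheme="css">font-style: italic</rendition>
>   <namespace name="http://www.tei-c.org/ns/1.0">
>     <!-- @render points at the rendition that applies to all <hi> -->
>     <tagUsage gi="hi" occurs="28" render="#it"/>
>   </namespace>
> </tagsDecl>
> ```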
>
> However, note that <tagUsage> elements do not allow for the 
> specification of a default rendering behavior for element-attribute 
> combinations--only for specific elements regardless of any attributes 
> or attribute values. They also do not allow the specification of a 
> default behavior for an element based on its position (e.g., <quote> 
> within <cit>), so at this point, it may be inevitable to heavily rely 
> on @rend and @rendition if one must thoroughly indicate this original 
> formatting, independent of a stylesheet.
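>
> A fragment along the following lines illustrates the per-instance 
> approach. It is only a sketch: the identifier "block" is hypothetical 
> and assumes a matching <rendition xml:id="block"> in the header, and 
> CSS is assumed as the language for the @rend value.
>
> ```xml
> <cit>
>   <!-- @rendition points to a <rendition> declared elsewhere -->
>   <quote rendition="#block">Quoted text goes here.</quote>
>   <!-- @rend carries its own description without pointing anywhere -->
>   <bibl rend="font-style: italic">Source of the quotation</bibl>
> </cit>
> ```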
>
> For maximum specificity in encoding formatting details, the content 
> supplied within <rendition> and @rend ought to use a formal formatting 
> language. However, besides not specifying a particular formatting 
> language that ought to be used, at present TEI does not offer a 
> fail-safe means of translating @rend and <rendition> content into a 
> stylesheet (even though this information could potentially be used to 
> create @style attribute content or, by using XSLT 2.0, to create a CSS 
> stylesheet alongside the X/HTML output).
>
> While such content might be sufficient in most cases, there is no 
> formal means at present to express that a particular style rule should 
> target CSS pseudo-elements (like :before, :after, :first-letter), such 
> as one might wish to do in specifying the addition of distinct content 
> at the beginning or end of a tag (e.g., adding left and right curly 
> quotes to a <q> element), to say nothing of needing to target a 
> specific attribute, say xml:lang, to determine which type of quotation 
> marks to add. And even where the output does faithfully reflect the 
> original, it will commonly be necessary to optimize the resulting CSS 
> stylesheet, since it would likely be fairly large, there being no 
> means beyond <tagUsage> of indicating a default behavior, and none at 
> all for element-attribute combinations or elements in particular 
> positions.
>
>
> *Original vs. output rendering*
>
> One might find oneself tempted to force the rendition mechanisms in 
> TEI (@rend or @rendition or <tagUsage>'s @render with <rendition>) to 
> go beyond their intended use for describing the <emph>original</emph> 
> rendering of a document. However, it is important to keep in mind that 
> if one wishes to control how the formatting will be output, 
> independently of the original formatting (adding details of formatting 
> not expressed about the original with the TEI render mechanisms or 
> overriding these details), one must not subvert the semantics of a TEI 
> document (at the risk of introducing TEI non-conformance and 
> interoperability issues) for the sake of controlling formatting--that 
> should instead be handled by a stylesheet.
>
> This might necessitate a different stylesheet or, as is probable for 
> most cases, modifications to the default stylesheets provided by TEI, 
> if the parameter options (assuming XSL stylesheets are used) are 
> insufficient to express a project's output requirements. It is, for 
> example, possible to allow the stylesheets to recognize multiple 
> attributes, even at the same time.
>
> *Returning changes to the default stylesheets*
>
> The stylesheets from TEI are evolving with the TEI project, however, 
> so it may be possible that the TEI project might be open to certain 
> changes (whether optional or required) to its default stylesheets, if 
> the changes offered may be of interest to a wider audience and TEI has 
> the resources to implement the changes. Given that the code of these 
> stylesheets is open source, it may benefit both TEI and its users as 
> well as an individual project for stylesheet improvements (or other 
> TEI resources for that matter) to be returned back to the community, 
> as it precludes the individual project from needing to make 
> modifications each time an update occurs.
>
> The more standard (but not standardized) the styling expectations (and 
> stylesheets) become, the more likely it is that TEI processing 
> applications might be used to render TEI in a familiar format (such as 
> when obtained directly off the web, etc.), even while allowing 
> publishers full freedom to deviate from such common conventions if 
> they wish.
>
> *Influence of formatting (or accessibility) concerns on markup*
>
> While one should not subvert the semantics of a TEI document in order 
> to control formatting, besides customizing a stylesheet alone, the 
> viewing of a formatted document might prompt a project to consider 
> changes to the original TEI documents, such as:
>
> 1) giving a more detailed encoding of the original rendering (as that 
> information can be used to produce output rendering, assuming again 
> that the original document being represented indeed possesses this 
> rendering), using <rendition>, <tagUsage render>, @rendition, and @rend
> 2) adding more semantic "hooks" whether this is the use of hitherto 
> unused elements or attributes such as @n or @type (potentially with 
> the adding of generic elements like <seg>) which can provide more 
> semantic detail about certain text that can in turn be targeted by a 
> stylesheet to provide more granular control in output formatting. This 
> may also have the benefit of providing more semantic richness to the 
> document (ideally using the more specific elements already recommended 
> for this purpose). Such semantic 'hooks' can also be of the variety 
> that ensures that the output formatting includes sufficient 
> accessibility features such as to make available alternative text 
> along with any graphics or images that could not otherwise be 
> interpreted by a speech browser.
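>
> Both kinds of hook from the list above might look something like the 
> following. The @type value "refrain", the file name, and the sample 
> text are invented for illustration; a project would choose its own.
>
> ```xml
> <div>
>   <!-- a generic <seg> with a project-defined @type gives the
>        stylesheet something specific to target -->
>   <p>Text before <seg type="refrain">the repeated words</seg> and after.</p>
>   <!-- <figDesc> supplies text that a speech browser can read aloud -->
>   <figure>
>     <graphic url="frontispiece.png"/>
>     <figDesc>Engraved portrait of the author, seated at a desk.</figDesc>
>   </figure>
> </div>
> ```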
>
> *Semantic information and output formatting*
>
> Besides formatting concerns leading one to add additional semantic 
> distinctions into a TEI document, one may also wish to encode a 
> certain degree of semantic information (to the extent allowed in the 
> output formatting language) into one's formatting output and consider 
> the extent to which output formatting markup is separated from any 
> more generic output structural markup (e.g., creating CSS to hold 
> styles with XHTML used to present the structure or encoding structural 
> and formatting markup together). These are both discussed below.
>
> *Encoding semantic information within formatted output*
>
> While it may often be the case that TEI will be converted to a 
> formatted output in which semantic information is lost, certain output 
> formats allow some if not all semantic information to be retained in 
> some manner. For example, XHTML can use the approach of microformats 
> (http://microformats.org) to use the global and generic XHTML @class 
> attribute to contain information such as the original TEI tag name. 
> While it would likely be too cumbersome to originate documents in such 
> a format (assuming all TEI semantics could be encoded with such an 
> approach), it offers the advantage that one might, for example, use a 
> web browser to obtain a document already pre-rendered, yet use a 
> microformat processor within the browser (possibly available as a 
> browser extension) to search for semantic information.
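>
> To make this concrete, output for a TEI paragraph containing <emph> 
> and <persName> might, under one possible convention (assumed here, not 
> prescribed anywhere), carry the TEI element names in @class:
>
> ```xml
> <p class="p" xmlns="http://www.w3.org/1999/xhtml">
>   <!-- was <emph> in the TEI source -->
>   It was <em class="emph">very</em> late, according to
>   <!-- was <persName> in the TEI source -->
>   <span class="persName">Anna</span>.
> </p>
> ```
>
> A query for class="persName" could then recover names from the 
> rendered page without access to the TEI source.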
>
> *Encoding formatting within structural and semantic output*
>
> It has become a generally recommended practice for even XHTML 
> documents on the web to separate their formatting content (as with 
> CSS) into a separate file from the structural content (of paragraphs, 
> generic divisions, etc.). This offers various advantages such as speed 
> in downloading (by browser caching for repeatedly used stylesheet 
> files or by those using speech browsers being able to avoid 
> downloading visually-oriented stylesheets), or flexibility in 
> subsequent style changes. While one might define an XSL stylesheet to 
> create specific XHTML @class attribute values which are associated 
> with those classes targeted in a predefined CSS stylesheet, XSLT 2.0 
> might be employed to utilize information such as contained within 
> @rend or <rendition> elements to specify the creation of a CSS 
> document while also simultaneously creating the XHTML output document. 
> See the sections on preserving original formatting.
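>
> The simultaneous creation of both files might be sketched roughly as 
> follows. This is not the behavior of the TEI-provided stylesheets: it 
> assumes an XSLT 2.0 processor, assumes that every <rendition> contains 
> CSS declarations, and uses a hypothetical output file name 
> (renditions.css) and a simplistic mapping of rendition identifiers to 
> class names.
>
> ```xml
> <xsl:stylesheet version="2.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>     xmlns:tei="http://www.tei-c.org/ns/1.0"
>     xmlns="http://www.w3.org/1999/xhtml"
>     exclude-result-prefixes="tei">
>
>   <xsl:template match="/">
>     <!-- secondary output: one CSS rule per <rendition> in the header -->
>     <xsl:result-document href="renditions.css" method="text">
>       <xsl:for-each select="//tei:rendition[@xml:id]">
>         <xsl:value-of select="concat('.', @xml:id, ' { ',
>                               normalize-space(.), ' }&#10;')"/>
>       </xsl:for-each>
>     </xsl:result-document>
>     <!-- primary output: the XHTML document, linked to that CSS file -->
>     <html>
>       <head>
>         <title>
>           <xsl:value-of select="(//tei:titleStmt/tei:title)[1]"/>
>         </title>
>         <link rel="stylesheet" type="text/css" href="renditions.css"/>
>       </head>
>       <body>
>         <xsl:apply-templates select="//tei:text"/>
>       </body>
>     </html>
>   </xsl:template>
>
>   <xsl:template match="tei:p">
>     <p><xsl:apply-templates/></p>
>   </xsl:template>
>
>   <!-- an element with @rendition becomes a span whose class names the
>        rendition(s) it pointed to, with the leading '#' removed -->
>   <xsl:template match="tei:*[@rendition]">
>     <span class="{translate(@rendition, '#', '')}">
>       <xsl:apply-templates/>
>     </span>
>   </xsl:template>
> </xsl:stylesheet>
> ```
>
> Note that in this sketch a <p> carrying @rendition would be output as 
> a span, since the more specific template takes precedence; a fuller 
> stylesheet would need to combine the two.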
>
> Despite the generally recommended practice of separating styles from 
> structure and semantic information, given the present absence of a 
> means of making queries which utilize style information contained in 
> separate files, it may be conceivable for some to wish to have their 
> formatting output mixed in with structural output (the @style 
> attribute might be harder to parse in a query than if specific XHTML 
> formatting structures were used--even though these may be deprecated 
> in later versions)--just as one might prefer to encode say italic 
> emphasis using <emph n="italic" rend="font-style:italic"> rather than 
> use a more formal but correct syntax of <emph rend="font-style: 
> italic;"> since the former is easier to parse--so that queries can 
> take advantage of both styling information and/or semantic 
> information. For example, if one views a document and sees that italic 
> text is used for emphasis, one might wish to search for a certain 
> phrase contained within italic text, for example, because one recalled 
> the text occurring there, or because one identified a pattern 
> represented by italic text but where one did not know what the exact 
> name of the pattern was, and thus not knowing what specific tag one 
> must search for to find the desired text.
>
>
> *Consideration of default transformation behavior*
>
> While, as mentioned earlier, there is no required mapping of TEI 
> elements and attributes to specific output document structures (e.g., 
> XHTML/CSS, LaTeX, etc.), the fact that TEI provides a default set of 
> stylesheets to work with (albeit a parameterized one) and that these 
> are presumably well-used [by the number of downloads????] indicates 
> that there is a general set of expectations about how most TEI 
> structures will appear when output. The effort required to create 
> one's own stylesheets from scratch for such a large vocabulary as TEI 
> provides, or even to significantly modify existing stylesheets (no 
> less each time as improvements and adjustments are made to the default 
> files), also makes the understanding of how documents will be 
> transformed an imperative for many projects. Thus, it becomes 
> necessary to understand how TEI might commonly be transformed (or 
> understood to be transformed), even beyond the extent to which the 
> stylesheets themselves are documented and express (mostly in technical 
> language) the templates used to transform TEI into a formatting language.
>
> The default stylesheets provided by TEI serve as a good basis for 
> discussion on how formatting can be performed and are documented here 
> for the sake of those who wish to know how each structure they might 
> use in a TEI document might be rendered by default. Neither the 
> stylesheets nor this discussion should be taken as any kind of 
> requirement to use these stylesheets as a base, or even at all.
>
> *General categories of elements to consider in formatting (or not 
> formatting at all)*
>
> Before considering the usual rendering and default rendering options 
> for specific elements (along with any specific attributes), it is 
> worth considering some general issues pertaining to certain types of 
> elements.
>
> *Likelihood of printing out specific elements*
>
> Elements differ in the likelihood a project will wish to render them 
> in a formatted output document. They range from editorial information 
> which might never be printed out for common viewing, to elements which 
> will sometimes be printed out (such as a <choice> listing the original 
> text and a regularized or corrected form) to elements such as 
> paragraphs which almost certainly will be printed out.
>
> For the case of those which will always be printed out, one can make 
> their styling explicit by using <tagUsage> elements with a @render 
> attribute pointing to a <rendition> element with the styling details 
> (and optionally the code). One might optionally even indicate specific 
> elements which should not be displayed (though depending on the 
> stylesheet language, this might not strictly be necessary).
>
> Moreover, a stylesheet might wish to depend on element-specific or 
> global attributes (whether semantic or rendition-related) to target 
> elements with or without these attributes or with specific values to 
> display or not display them selectively.
>
> Elements which occur in the header will generally not be printed out, 
> though for some projects' purposes, display of this information (e.g., 
> bibliographic data) may be useful to include in the formatting output.
>
> While other elements that occur within the running text will generally 
> be printed out, it is important to understand that with TEI--which, as 
> cannot be emphasized enough, is not a formatting language--this will 
> not always be the case. If one has editorial information that should 
> not be printed out within the running text (or at least should not 
> appear alongside the running text), as a project might not wish the 
> encoder-added information to disrupt the flow of the text (e.g., of a 
> narrative) and for which it might even be considered irreverent by 
> some viewers (such as for scriptural works), it will be important to 
> be aware of all such tags that a project might not want printed out so 
> that the stylesheet (possibly in conjunction with special semantic TEI 
> markup if not markup indicating original rendering) does not display 
> those tags' content.
>
> Elements which are defined by the following macros are generally not 
> to be displayed:
>
> 1) macro.limitedContent 
> (http://www.tei-c.org/release/doc/tei-p5-doc/html/ref-macro.limitedContent.html 
> ): desc, fDescr, figDesc, fsDescr, meeting, rendition, tagUsage, witness
> 2) macro.phraseSeq.limited 
> (http://www.tei-c.org/release/doc/tei-p5-doc/html/ref-macro.phraseSeq.limited.html 
> ):  activity, age, authority, channel, classCode, constitution, 
> creation, derivation, domain, factuality, funder, interaction, interp, 
> langKnown, language, locale, metSym, preparedness, principal, purpose, 
> resp, span, sponsor, valDesc
>
> Moreover, there are some elements such as those in model.noteLike 
> (<note> and <witDetail>) which, while they might occur in the output 
> document, might in other cases or within some projects not be rendered 
> at all. Elements in model.global.meta (alt, altGrp, certainty, fLib, 
> fs, fvLib, index, interp, interpGrp, join, joinGrp, link, linkGrp, 
> respons, span, spanGrp, and timeline), as well as elements carrying 
> the global @exclude attribute, may or may not be output when included 
> within a document.
>
> Still others include items such as may be contained within <choice>: 
> abbr, am, corr, ex, expan, orig, reg, seg, sic, unclear. A project 
> may need to consider whether to output these elements with both 
> choices being shown in some manner (even as a mere tooltip that is 
> exposed when certain text is hovered over) or whether to only show one 
> of the choices (such as that reflecting the original or some 
> regularization, correction, expansion, or abbreviation).
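>
> One of these options, showing the corrected reading with the original 
> exposed as a tooltip, could be sketched in XSLT as follows. The use of 
> the XHTML @title attribute and the class name "corr" are assumptions 
> of this sketch, not the behavior of the TEI-provided stylesheets.
>
> ```xml
> <xsl:stylesheet version="1.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>     xmlns:tei="http://www.tei-c.org/ns/1.0"
>     xmlns="http://www.w3.org/1999/xhtml"
>     exclude-result-prefixes="tei">
>
>   <!-- show <corr>; keep the <sic> reading available on hover -->
>   <xsl:template match="tei:choice[tei:sic and tei:corr]">
>     <span class="corr" title="{normalize-space(tei:sic[1])}">
>       <xsl:apply-templates select="tei:corr[1]"/>
>     </span>
>   </xsl:template>
> </xsl:stylesheet>
> ```
>
> A project preferring the original reading would simply swap the two 
> elements in the template.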
>
> Likewise with elements belonging to model.pPart.transcriptional: add, 
> app, corr, damage, del, orig, reg, restore, sic, supplied, unclear. 
> One may or may not wish to indicate <supplied> text, for example, or 
> may choose to format damaged sections in some particular manner.
>
> (any others????)
>
> Since there is no way of knowing whether some of the elements 
> mentioned above such as <note> refer to text that should be printed 
> out or not, one must rely on other mechanisms to specify or glean this 
> information. One way would be to use attributes, such as @resp to 
> detect whether the note was the responsibility of a markup editor of 
> the document, or whether it was provided by an original annotator of 
> the document. However, as the detection of this might not always be 
> clear (especially if the markup annotator also served as the original 
> annotator), other attributes such as @type or, where @type is not 
> available, possibly @n or even xml:id might be used.
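>
> For instance, if a project adopted the convention (hypothetical here) 
> of marking the encoder's own notes with type="editorial", a stylesheet 
> could suppress just those notes while still rendering the others:
>
> ```xml
> <xsl:stylesheet version="1.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>     xmlns:tei="http://www.tei-c.org/ns/1.0"
>     xmlns="http://www.w3.org/1999/xhtml"
>     exclude-result-prefixes="tei">
>
>   <!-- empty template: notes added by the encoder produce no output -->
>   <xsl:template match="tei:note[@type='editorial']"/>
>
>   <!-- all other notes are kept; the class name is an assumption -->
>   <xsl:template match="tei:note">
>     <span class="note"><xsl:apply-templates/></span>
>   </xsl:template>
> </xsl:stylesheet>
> ```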
>
> Note that despite its being listed above, an element such as 
> <figDesc>, while it might not normally be displayed immediately to a 
> visual browser, might still nevertheless be important (or even 
> required in some formatting languages or in use with projects needing 
> by law to adhere to accessibility regulations) for the sake of being 
> accessible to those with visual disabilities who might depend on 
> speech browsers or tooltips to be able to get a sense of what a 
> particular graphic, photograph, etc. was displaying. It is certainly 
> good practice to follow such an encoding, both within TEI documents 
> and in the formatted output, where available.
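>
> In XHTML output, one way to carry this through (a sketch only, 
> assuming each <figure> holds one <graphic> and one <figDesc>) is to 
> copy the description into the image's @alt attribute:
>
> ```xml
> <xsl:stylesheet version="1.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>     xmlns:tei="http://www.tei-c.org/ns/1.0"
>     xmlns="http://www.w3.org/1999/xhtml"
>     exclude-result-prefixes="tei">
>
>   <!-- <figDesc> is not displayed as running text, but its content
>        becomes the alternative text read out by a speech browser -->
>   <xsl:template match="tei:figure">
>     <img src="{tei:graphic[1]/@url}"
>          alt="{normalize-space(tei:figDesc[1])}"/>
>   </xsl:template>
> </xsl:stylesheet>
> ```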
>
> *Items needing replication*
>
> Some elements or elements with certain attributes may need special 
> consideration for output such as @copyOf or <join>, as these might 
> indicate that certain formatting output might need to be created such 
> as might (as with other cases described earlier) not be evident by 
> simply stripping the markup out of the document.
>
> *Text attributes*
>
> Most attributes are used with coded values, as they are not meant to 
> represent human language or to be displayed. Text attributes represent 
> the exception to this, though it is commonly preferred for an XML 
> language to represent these attributes as elements so that further 
> nested subelements representing markup at the phrasal level, etc. can 
> be added within as needed.
>
> Text attributes have generally been removed from TEI, and some of the 
> ones that remain one might not wish to output in a formatted version 
> anyways, but if one wishes to include, for example, @reason in one's 
> output, one will be unable to add styling which depends on child 
> elements for more specific formatting since the information is 
> expressed within an attribute (but one can style differently depending 
> on the element's @xml:lang, as that does apply for text attributes, as 
> well as any other attributes on the element). Likewise for the 
> dictionary attributes, @expand, @norm, @split, @value, and @orig which 
> represent the remaining text attributes????.
>
> ((((Syd prepared a list of potential text attributes to review to see 
> if they were still text attributes--it'd be nice to be able to give 
> such an exhaustive list here.))))
>
>
> I think the element-specific details might be logically incorporated 
> as documentation elements within XSL that could be extracted for 
> automatic inclusion within the TEI reference pages, making clear that 
> the formatting discussed is only that of the default behavior used in 
> TEI-provided stylesheets (though also discussing the range of options 
> that the stylesheet makes available through parameters). I really 
> think giving awareness of these formatting issues in the context of 
> considering these elements would be more helpful than waiting for 
> people to discover them separately in the stylesheets.
>
>
> *Specific formatting for specific elements (and any attributes)*
>
> (to be displayed on reference pages?)
>
> *Specific formatting for specific categories of formatting (images, etc.)*
>
> (to be compiled after reference pages have their information fleshed out)


