Index: tei_tite.odd
===================================================================
--- tei_tite.odd (revision 5763)
+++ tei_tite.odd (working copy)
@@ -75,43 +75,46 @@
Introduction -

This document specifies how the Tite standard should be applied. Its + +

This document specifies how TEI Tite should be applied. Its organizing model is roughly the structure of a TEI document itself, and it proceeds from high-level features to low, starting with general requirements, text structure, directions on when to group texts, considerations about type of text (genre and format), continuing down to instructions on marking phrase-level features, reference systems, and so - forth. In its original - ODD - (one document does-it-all) format, this document can generate - everything necessary for working in Tite: both documentation (this - Tite-specific prose as well as the full TEI technical documentation for - relevant elements) and schemas in either W3C Schema, RelaxNG, or XML DTD. - The Roma web tool can - generate all of these.

-

Tite-encoded documents are TEI documents, and the Tite - standard, with the exception of convenience elements (b, - i, ul, sup, sub, - smcap, cols and ornament all of which - can be converted back to canonical TEI), is a pure subset of the - TEI. That is, it was created primarily by removing - elements and attributes from the TEI, and not from extensive - modification. Thus as part of the TEI family, Tite - inherits TEI semantics, and ambiguity in this specification should - be resolved with reference to the TEI Guidelines. What makes Tite - distinct is that where the TEI in general is famously tolerant of - multiple methods of encoding a given feature, Tite seeks - uniformity of encoding through constraint, constraint via its - stripped-down tag set and via this specification.

+ forth. In its original ODD (one document does-it-all) format, this + document can generate everything necessary for working in TEI Tite: both + documentation (this Tite-specific prose as well as the full technical + documentation for each of its elements) and schemas in either W3C Schema, + RELAX NG, or XML DTD. Software utilities, including the Roma web tool, can generate + these.

+

Tite-encoded documents are TEI documents, and TEI Tite, with the + exception of convenience elements (b, + i, ul, sup, sub, smcap, + cols and ornament, all of which can be converted back to + canonical TEI), is a pure subset of the TEI. That is, it was created + primarily by removing elements and attributes from the TEI, and + not from extensive modification. As a TEI customization, Tite + inherits TEI semantics, and ambiguity in this specification should be + resolved with reference to the TEI Guidelines. What makes + Tite distinct is that where the TEI in general is famously tolerant of + multiple methods of encoding a given feature, Tite seeks uniformity of + encoding through constraint, via its stripped-down tag set and via this + specification.

+

Tite can be used to encode printed prose, poetry, drama, newspapers, and anything else which can be described with the basic TEI building-blocks of divisions, paragraphs, line groups, and speeches.

-

A note on terms: I use document to refer generally to the + +

In this documentation, document refers generally to the item (book, pamphlet, newspaper, etc.) to be encoded and text - as either linguistic material (as opposed to graphic or imagistic) or a - logically distinct literary unit.

+ to either linguistic (as opposed to graphic) material or a logically + distinct literary unit.

+
General Requirements @@ -120,39 +123,45 @@

All printed material should be captured: all text (that is, printed characters) should be transcribed and the presence of graphical items or other non-transcribable elements should be indicated with markup. +

End-of-line Hyphens +

A distinction should be maintained in the electronic transcription between end-of-line or soft hyphens (an artifact of page layout) and hard hyphens (a linguistic feature). - In the rare case of coincidence of the two types, the hyphen should be - marked as hard (as it is not properly an end-of-line hyphen at all in this - case).

+ The former should be transcribed as the SOFT HYPHEN (U+00AD) character; + the latter, as the HYPHEN-MINUS (U+002D) character generally available on + Western keyboards. In the rare case of coincidence of the two types + (where a hyphenated word such as well-known is broken across a line + at its own hyphen), the hyphen should be considered hard, and + transcribed as the HYPHEN-MINUS.
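Recording the two hyphens as different characters lets later processing treat them differently without guesswork. For example, a plain-text or search view can drop every soft hyphen mechanically while leaving hard hyphens alone. The following is a minimal Python sketch that assumes the transcription is already available as a string. The function names are hypothetical and are not part of any Tite tooling.

```python
SOFT_HYPHEN = "\u00ad"  # end-of-line artifact of page layout
HARD_HYPHEN = "\u002d"  # HYPHEN-MINUS, a linguistic feature

def remove_soft_hyphens(text: str) -> str:
    """Return text with soft hyphens removed and hard hyphens kept.

    Assumes soft hyphens were transcribed as U+00AD and hard hyphens
    (including coincident ones) as U+002D.
    """
    return text.replace(SOFT_HYPHEN, "")

def count_hyphens(text: str) -> tuple:
    """Return (soft, hard) hyphen counts, e.g. for a quick QA report."""
    return text.count(SOFT_HYPHEN), text.count(HARD_HYPHEN)

sample = "a vir\u00adtuous and well-known man"
print(remove_soft_hyphens(sample))  # a virtuous and well-known man
print(count_hyphens(sample))        # (1, 1)
```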

+
Character Encoding -

Texts should be encoded in UTF-8. For non-keyboard characters, each - project should decide what kind of entities will serve best (mnemonic, - numerical, etc.).

+

Characters should be encoded in UTF-8. For characters not easily input + from the keyboard, use hexadecimal numeric character references (e.g. é, the small + Latin e with acute accent, is represented as &#xE9;).
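Producing the hexadecimal form is easy to automate. The Python sketch below rests on one loud assumption: it treats "not easily input from the keyboard" as "outside ASCII". A project may choose a narrower set. The helper names are hypothetical.

```python
import re

def to_hex_refs(text: str) -> str:
    """Replace each non-ASCII character with a hexadecimal numeric
    character reference, e.g. "é" (U+00E9) becomes "&#xE9;".

    Assumption: "not easily input from the keyboard" is approximated as
    "outside ASCII". Markup-significant ASCII such as & and < is left
    alone; escaping it is the XML serializer's job.
    """
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)

def from_hex_refs(text: str) -> str:
    """Expand &#x...; references back into characters (useful for checks)."""
    return re.sub(r"&#x([0-9A-Fa-f]+);", lambda m: chr(int(m.group(1), 16)), text)

print(to_hex_refs("café"))  # caf&#xE9;
```

Python's built-in `xmlcharrefreplace` error handler (`"é".encode("ascii", "xmlcharrefreplace")`) produces the decimal form `&#233;`. That is why the sketch formats the hexadecimal form itself.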

Accuracy and Verification -

The standard for accuracy of transcription should be 99.99% (1 error in - 10,000 characters). The sample size for verification should be 5% of the - total text.

+

The standard for accuracy of transcription should be at least 99.99% (1 + error in 10,000 characters). The sample size for verification should be 5% + of the total text.
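The accuracy threshold and the sample size combine into a simple pass/fail rule. The sketch below assumes that characters are counted on the transcribed text. It also rounds the 5% sample up, which is an assumption, because this specification does not state a rounding rule.

```python
SAMPLE_PERCENT = 5  # verify 5% of the total text

def sample_size(total_chars: int) -> int:
    """Number of characters to proofread: 5% of the text, rounded up."""
    return -(-total_chars * SAMPLE_PERCENT // 100)  # ceiling division

def passes(sample_chars: int, errors: int) -> bool:
    """True if the observed error rate is at most 1 in 10,000 characters."""
    # Integer comparison avoids floating-point rounding:
    # errors / sample_chars <= 1 / 10_000
    return errors * 10_000 <= sample_chars

n = sample_size(2_000_000)
print(n)              # 100000
print(passes(n, 10))  # True  (exactly 1 error per 10,000 characters)
print(passes(n, 11))  # False
```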

Documenting the Encoding Process

Almost unavoidably, difficult encoding situations will arise whose - resolution may not be covered by these guidelines, common - sense, or anything else close to hand. In any case like this - where there is doubt or difficulty, it is important to document the markup - choices that are made. To this end each encoded text should be accompanied + resolution may not be covered by this documentation or the TEI Guidelines. + In such cases, it is important to document the markup + choices that are made. To this end each encoded file should be accompanied by a document with such notes. These notes should reference both features of a document that seem remarkable to encoders (and how these were handled) and remarkable or non-obvious encoding decisions made by @@ -167,11 +176,11 @@ xmlns="http://www.tei-c.org/ns/Examples"> - [ TEI Header information ] + - [ front matter ... ] - [ body of text ... ] - [ back matter ... ] + + + @@ -190,11 +199,11 @@ the text element. For validation of header-less texts, if an XML DTD is being used as the schema, simply replace the TEI element with text as the root element in the document type - declaration. (I.e. <!DOCTYPE text PUBLIC . . . + declaration. (I.e. <!DOCTYPE text PUBLIC ... >.) If using RELAX NG, TEI and text are valid root elements.
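Both TEI and text are acceptable roots, so a quick pre-check can report which form a file uses before full validation. The minimal sketch below assumes the standard TEI P5 namespace `http://www.tei-c.org/ns/1.0`, and the name `root_ok` is hypothetical. RELAX NG (or DTD) validation against the Tite schema remains authoritative.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # standard TEI P5 namespace
ALLOWED_ROOTS = {f"{{{TEI_NS}}}TEI", f"{{{TEI_NS}}}text"}

def root_ok(xml_string: str) -> bool:
    """True if the root is TEI (with header) or text (header-less),
    both in the TEI namespace."""
    root = ET.fromstring(xml_string)
    return root.tag in ALLOWED_ROOTS
```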

-

As above, the content model for Tite is taken verbatim from the - TEI Tite, thus maintaining full flexibility for the client institution +

As above, the content model for Tite is taken verbatim from + TEI Lite, thus maintaining full flexibility for the client institution while at the same time not imposing undue burden: a valid header can include as little as a title (titleStmt), publication statement (publicationStmt), and a description of the source document @@ -213,23 +222,23 @@ texts, the basic TEI text structure is modified to look like: - [ header information for the group ] + - [ front matter for the group ] + - [ front matter of first text ] - [ body of first text ] - [ back matter of first text ] + + + - [ front matter of second text ] - [ body of second text ] - [ back matter of second text ] + + + - [ more texts or groups of texts here ] + - [ back matter for the group ] +

@@ -248,16 +257,17 @@ end; other cues can be blank pages, recurring typographical or ornamental features, or a numbering system ("Chapter 5" etc.). Also, the presence of a heading will often indicate the beginning of a division.

+

The type attribute should be used to express the type of - division being marked. Where present, use the unit name given in the - document itself. Though any constrained enumerated list of type - values will have to be determined on a job-by-job basis, some examples of - appropriate division types are: + division being marked. Where the document itself gives the division a + name, use that name. Though any constrained enumerated list of + type values will have to be determined on a job-by-job basis, + some examples of appropriate division types are: act - article + article book @@ -287,19 +297,10 @@ constrain possible types, see the University of Virginia Library's DLPS vendor specification.
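Once a job has agreed on its enumerated list, a short script can flag divisions whose type is missing or falls outside that list. In the sketch below, `PROJECT_DIV_TYPES` is a hypothetical job-specific list: only act, article, and book come from the examples above. The sketch also assumes the TEI P5 namespace and checks both div and the numbered div1 through div7.

```python
import re
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
DIV_TAG = re.compile(r"div[1-7]?$")  # div, div1 ... div7 (not divGen)

# Hypothetical job-specific list; each project negotiates its own.
PROJECT_DIV_TYPES = {"act", "article", "book", "chapter", "scene"}

def bad_div_types(xml_string: str, allowed=PROJECT_DIV_TYPES) -> list:
    """Return (element, type) pairs, in document order, for divisions
    whose type attribute is missing or not in the allowed list."""
    problems = []
    for el in ET.fromstring(xml_string).iter():
        if not el.tag.startswith(TEI):
            continue
        local = el.tag[len(TEI):]
        if DIV_TAG.match(local) and el.get("type") not in allowed:
            problems.append((local, el.get("type")))
    return problems
```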

-

The n attribute should be used to record sequential labels - associated with a structural division (numbers, numerals, letters). When - present, these labels should also be transcribed within the head - tag. For instance: -

- III: It Awakes - [...] -
-

-

When a heading is present, encode it with the head tag. If +

When a heading is present, encode it with the head element. If there is more than one heading at the beginning of a given division, - encode each heading separately, using the type attribute to - distinguish them. Appropriate values are: + encode each heading with its own head element, using the + type attribute to distinguish them. Appropriate values are: main @@ -308,6 +309,16 @@ desc (descriptive)

+

The n attribute should be used to record sequential labels + associated with a structural division (numbers, numerals, letters). When + present, these labels should also be transcribed within the content of the + head element. For instance: +

+ III: It Awakes + +
+

False Indicators

A divisional title is a page that resembles a half-title @@ -334,9 +345,9 @@ back elements, respectively. div1 elements should contain the major sections and should be characterized by type attribute values. The exception, however, is the title page, which should - be encoded with the titlePage element. The titlePart - element should have type attributes like for head but - with one addition (volume): + be encoded with the titlePage element and its children. The + titlePart element should have a type attribute with + one of the following values: main @@ -345,7 +356,7 @@ alt (alternate title) volume (volume information) - <titePart type="volume"> should be used to + titlePart type="volume" should be used to encode volume information wherever it is found on the title page, even if it is separated from the other title information. Here is the element class that forms the titlePage content model: <!ENTITY % @@ -358,6 +369,7 @@

Common items to encode in front and back matter -- and therefore common type attribute values for front and back divisions are: + front acknowledgements advertisement castlist @@ -367,12 +379,15 @@ foreword introduction preface - appendix (back) - bibliography (back) - colophon (back) - glossary (back) - index (back) + + back + appendix + bibliography + colophon + glossary + index +

Half-title and fly-title pages may be encountered in the front matter. A half-title page precedes @@ -399,7 +414,7 @@ beginning and ending sections of letters, prefaces, diary entries, or other personal types of writing. Both elements contain: dateline: for recording time and place of composition; - use date with type value (formatted + use date with when value (formatted yyyy-mm-dd) to record date information signed: for recording a signature salute: for recording salutation at the beginning ("Dear @@ -419,30 +434,32 @@
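The yyyy-mm-dd requirement for when can be checked mechanically. The sketch below flags any date inside a dateline whose when value is missing or is not a real calendar date in exactly that form. TEI itself also allows shorter forms such as yyyy or yyyy-mm on when, so this check enforces only the full form asked for here. The TEI P5 namespace and the function names are assumptions.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

TEI = "{http://www.tei-c.org/ns/1.0}"
YMD = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

def is_ymd(value: str) -> bool:
    """True for a real calendar date written exactly as yyyy-mm-dd."""
    if not YMD.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # rejects e.g. 1851-02-30
    except ValueError:
        return False
    return True

def bad_dateline_dates(xml_string: str) -> list:
    """Return when values (None if absent) of date elements inside
    dateline that are not full yyyy-mm-dd dates."""
    bad = []
    for dateline in ET.fromstring(xml_string).iter(f"{TEI}dateline"):
        for d in dateline.iter(f"{TEI}date"):
            when = d.get("when")
            if when is None or not is_ymd(when):
                bad.append(when)
    return bad
```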

Verse -

All verse should be encoded within at least one lg tag, even +

All verse should be encoded within at least one lg element, even when there are no distinct stanzas or when the verse is interspersed with prose. If it is known, use the type attribute to express the type of line group. Sometimes within a poem there is a question about what should be tagged as a lg or as a separate div. As a rough rule of thumb, if there is a title accompanying the division, use the div element; otherwise, use lg.
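Because every line of verse must sit inside at least one lg, stray l elements are easy to detect. ElementTree has no parent pointers, so the sketch builds a child-to-parent map first. It assumes the TEI P5 namespace, and `lines_outside_lg` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

def lines_outside_lg(xml_string: str) -> list:
    """Return the text of each l element that has no lg ancestor."""
    root = ET.fromstring(xml_string)
    parent = {child: p for p in root.iter() for child in p}
    orphans = []
    for line in root.iter(f"{TEI}l"):
        node = parent.get(line)
        # Walk up until an lg is found or the root is passed.
        while node is not None and node.tag != f"{TEI}lg":
            node = parent.get(node)
        if node is None:
            orphans.append("".join(line.itertext()))
    return orphans
```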

-

Each line of verse should be encoded with the l tag, and care +

Each line of verse should be encoded with the l element, and care should be taken to distinguish these logical lines of verse from lines motivated by page layout. The latter should be encoded as lbs. Thus should be encoded as AS virtuous men pass mildly away, - And whisper to their souls to go, + And whisper to their souls to go, Whilst some of their sad friends do say, - "Now his breath goes," and some say, - "No." + "Now his breath goes," and some say, "No." Also, as in the example above, use the rend attribute to mark when a line is indented more than its siblings. Using @@ -455,25 +472,30 @@

The standard TEI elements for drama should be used: sp, stage, speaker. If the who attribute is used on sp, also transcribe who is given as the speaker, in whatever - form it is written, in the speaker tag. Short pieces of + form it is written, in the speaker element. Short pieces of stage direction that accompany the speaker designation may be included in - the speaker tag.

+ the speaker element.

Scenes and acts should be encoded as appropriately nested div elements with type attributes of scene or act, respectively. Cast lists can likewise be encoded using div and type="castlist".

Prologues and epilogues can be treated as sps of their own, unless their structure would be better represented by nesting these in - div tags.

+ div elements.
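The rule that a speech carrying who must also transcribe the speaker lends itself to a simple check. The sketch below lists the who values of sp elements that have no speaker child. It assumes the TEI P5 namespace, and the `#ham`-style who values in the test are illustrative only.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

def speeches_missing_speaker(xml_string: str) -> list:
    """Return who values of sp elements that carry who but lack speaker."""
    root = ET.fromstring(xml_string)
    return [sp.get("who") for sp in root.iter(f"{TEI}sp")
            if sp.get("who") is not None
            and sp.find(f"{TEI}speaker") is None]
```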

Newspapers

Tite includes the elements cols and cb which are well - suited for the multi-column layout of newspapers. Decisions about how to - render the layout in markup may not be well addressed here, but as an - example of project-specific specifications see the ref, to encode a pointer to the continuation of a story in a + different column or on a different page; and figure, to describe + illustrations, advertisements, and cartoons.

+
@@ -483,58 +505,49 @@

Use the q element to encode block quotations, but not inline quotations. A block quotation is indicated by its being set off from surrounding text either with extra line-spacing or margins or with a - different typeface. If necessary, an entire quoted text can go inside of a - q tag, in which case a secondary TEI text hierarchy - should nest inside of the quotation. This is preferable to employing an - ambiguous use of div elements (where it is not clear whether the - div is a structure in the containing text or the quoted text). - Thus this is desirable: - -

[ . . . ]

- [ here's a poem ] -

[ . . . ]

- - and this is not: - - -

[ . . . ]

-
- - - - here's a poem - - - - -

[ . . . ]

-
-
-
-

-

A q tag should not have any affect on the presence of quotation - marks: if they are there, transcribe them.

+ different typeface. If the quotation is of an entire text, use the + floatingText element and its children inside the q element: + + +

+ + + + + + + + + +

+
+
+

+

If present, transcribe all quotation marks or other delimiters inside + the q element.

Figures

If a figure has a heading or caption, encode it with the head - tag. If there is associated text, simply use a p to encode it. + element. If there is associated text, simply use a p to encode it.

Tables and Lists -

If a cell in a table is a heading or a label, use the role - attribute on the cell tag and set it to label; if - the cell contains data, there is no need to use role: +

Tables and lists are encoded as in the TEI Guidelines, but note the + following.

+

If a cell in a table is a heading or a label, set the role + attribute to label; if the cell contains data, there is no + need to use role: data is the default. If a cell or row spans more than one column or row, use the rows or cols attributes set to the number of columns or rows that it spans.
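Because rows and cols carry all the spanning information, a consistency check can compute how many grid columns each row occupies. Rows of unequal width usually point to a missing or mistyped span. The sketch below assumes the TEI P5 namespace and rows as direct children of table. It treats equal row widths as a quality-assurance heuristic, not a Tite rule.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

def row_widths(table) -> list:
    """Return the number of grid columns each row of a TEI table occupies,
    counting cols spans and cells carried down by rows spans above."""
    widths = []
    carried = []  # (rows still to occupy, width) for spans from above
    for row in table.findall(f"{TEI}row"):
        width = sum(w for _, w in carried)
        carried = [(r - 1, w) for r, w in carried if r > 1]
        for cell in row.findall(f"{TEI}cell"):
            cols = int(cell.get("cols", "1"))
            rows = int(cell.get("rows", "1"))
            width += cols
            if rows > 1:
                carried.append((rows - 1, cols))
        widths.append(width)
    return widths

def is_rectangular(table) -> bool:
    """True if every row occupies the same number of columns."""
    return len(set(row_widths(table))) <= 1
```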

If unsure about whether a structure is best encoded as a list or table, record it as a table only if it would not be properly understood without tabular layout.

-

TEI lists are either sequences of items or - label-item pairs. If a list has the latter structure, - be sure to explicitly encode each part.

+

Lists should be encoded as either sequences of items or + label-item pairs. When items in the list contain a + label, as in a gloss list, be sure to use the latter form.
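When a list uses label at all, its label and item children should pair up strictly. The sketch below reports the child sequence of any list that breaks the label, item, label, item pattern and leaves plain item-only lists alone. The TEI P5 namespace and the name `unpaired_lists` are assumptions.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

def unpaired_lists(xml_string: str) -> list:
    """Return the label/item child sequence of every list that uses
    label but does not strictly alternate label, item, label, item..."""
    problems = []
    for lst in ET.fromstring(xml_string).iter(f"{TEI}list"):
        kids = [c.tag[len(TEI):] for c in lst
                if c.tag in (f"{TEI}label", f"{TEI}item")]
        if "label" in kids and kids != ["label", "item"] * (len(kids) // 2):
            problems.append(kids)
    return problems
```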

Notes @@ -546,7 +559,7 @@ ref element and include the reference text as the content. In both cases, a target attribute must be supplied which contains the xml:id value of the associated note.
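The requirement that target carry the note's xml:id can be verified by collecting every note identifier and looking up each pointer against that set. The sketch below makes several stated assumptions:

* It assumes the TEI P5 namespace.
* It assumes targets are written either as `n1` or as `#n1`.
* It checks only ref by default. Pass any other pointing element the project uses in `pointer_tags`.
* It assumes every checked ref points at a note. Refs used for other purposes, such as newspaper continuations, would need to be excluded.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def dangling_note_refs(xml_string: str, pointer_tags=("ref",)) -> list:
    """Return target values that are missing or match no note's xml:id."""
    root = ET.fromstring(xml_string)
    note_ids = {n.get(XML_ID) for n in root.iter(f"{TEI}note")}
    bad = []
    for tag in pointer_tags:
        for el in root.iter(f"{TEI}{tag}"):
            target = el.get("target")
            # Assumption: targets look like "n1" or "#n1".
            if target is None or target.lstrip("#") not in note_ids:
                bad.append(target)
    return bad
```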

-

When encoding the note itself with the note tag, the +

When encoding the note itself with the note element, the xml:id and place attributes must be supplied. See the TEI documentation for acceptable values for place; the most common will be foot, end, margin-left @@ -582,7 +595,7 @@ Uncertain Blocks

In rare cases where the logical identity of a block-level element is hard to discern, use the TEI element ab (anonymous block) instead - of applying a p or div tag. In these cases, be sure to + of applying a p or div element. In these cases, be sure to document this decision in accompanying notes. Applying this element should be viewed as a last resort.

The gap element should be used when for some reason the @@ -597,7 +610,7 @@

- Phrase-level features + Phrase-level Features
Typographical Changes

There are six elements in Tite that capture specific typographical @@ -629,15 +642,14 @@ These mark the physical change, and are agnostic about a logical motivation for it. There are two exceptions to this approach, however: marking foreign words and titles. In the case of foreign words, use the - foreign tag; in the case of titles, use the title tag - only if certain that the word or phrase in question is a title. If a - phrase is, say, italicized but you are uncertain about its being a title, + foreign element; in the case of titles, use the title + element only if certain that the word or phrase in question is a title. If a + phrase is, say, italicized, but you are uncertain about its being a title, use the i tag instead. Foreign words should be marked only if they are typographically distinguished from surrounding text.

If there is a typographical feature not covered by the above elements, - the TEI hi tag is still available in Tite. Enumerated lists of - attribute values for hi's rend attribute should be - negotiated job-by-job.

+ the TEI hi element is still available in Tite. Use it without a + rend attribute.
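Because hi is to be used without rend in Tite, any rend on hi is an encoding slip that a script can catch. The sketch below assumes the TEI P5 namespace and returns the text of each offending element so that it is easy to locate. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

def hi_with_rend(xml_string: str) -> list:
    """Return the text of every hi element that carries a rend attribute."""
    root = ET.fromstring(xml_string)
    return ["".join(h.itertext()) for h in root.iter(f"{TEI}hi")
            if "rend" in h.attrib]
```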

Alignment and Indentation @@ -647,13 +659,13 @@ However, exhaustive description of alignment is not necessary. Headings, for instance, do not need to be marked as being centered, etc.

+
Uncertain Segments

The seg element is the phrase-level analogue to the ab @@ -737,20 +749,21 @@ stringVal, tag, timeline, valDesc, valItem, valList, variantEncoding, when.

+

Tite excludes the analysis and tagdocs modules, from which TEI Lite includes some elements; Tite therefore lacks those elements as well.

+

The following elements are excluded in the TEI Lite but included in Tite: ab, div1, div2, div3, div4, - div5, div6, div7.

+ div5, div6, div7.

The following are the elements that Tite excludes from the TEI Lite: - add - altIdent, biblFull, choice, corr, - del, divGen, emph, expan, - gloss, imprint, index, mentioned, - orig, reg, rs, sic, soCalled, - teiCorpus, term, div - anchor.

+ add, altIdent, anchor, biblFull, + choice, corr, del, div, + divGen, emph, expan, gloss, + imprint, index, mentioned, orig, + reg, rs, sic, soCalled, + teiCorpus, term.
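When converting existing TEI Lite material to Tite, a scan for the elements listed above shows what must be re-encoded. The sketch below covers only that explicit list, not elements dropped with the excluded modules, and it assumes the TEI P5 namespace. Validation against the Tite schema remains the definitive test.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Elements listed above as excluded from Tite relative to TEI Lite.
EXCLUDED_FROM_TITE = {
    "add", "altIdent", "anchor", "biblFull", "choice", "corr", "del",
    "div", "divGen", "emph", "expan", "gloss", "imprint", "index",
    "mentioned", "orig", "reg", "rs", "sic", "soCalled", "teiCorpus",
    "term",
}

def excluded_elements_used(xml_string: str) -> list:
    """Return the sorted names of excluded elements found in the document."""
    found = set()
    for el in ET.fromstring(xml_string).iter():
        if el.tag.startswith(TEI) and el.tag[len(TEI):] in EXCLUDED_FROM_TITE:
            found.add(el.tag[len(TEI):])
    return sorted(found)
```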

Finally, the following are the elements that Tite adds to the TEI: @@ -1044,4 +1057,4 @@

- \ No newline at end of file +