[tei-council] Content for <pb/> etc. [was: soft hyphens (again)]
Kevin Hawkins
kevin.s.hawkins at ultraslavonic.info
Sun Jun 27 11:14:40 EDT 2010
Our discussion has wandered quite a bit, so let me try to summarize the
ideas so far ...
I want to know how to encode a hyphen that occurs at a line, column, or
page break and whose status with respect to breaking a word is unclear.
We had settled on doing this with <lb type="?"/>, <cb type="?"/>, and
<pb type="?"/> but needed a value for @type that doesn't imply we are
uncertain whether a line break, column break, or page break occurs here.
Martin suggested we invent an element to surround the hyphen and put the
@type on this element rather than on the following <lb/>, <cb/>, or
<pb/>. Lou noted this would lead to two different ways of encoding a
hyphen and pointed out that there might be a different character, or no
character at all, marking the point of continuation.
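A rough sketch of what Martin's suggestion might look like, assuming a
wrapper element that I'm calling <hyphen> purely for illustration (no
element name was proposed in the discussion, and this is not an existing
TEI element):
Some people say TEI is a mark<hyphen type="uncertain">-</hyphen><lb/>up language.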
Lou suggested three options (which I've clarified):
1. come up with a better word than "uncertain" for the type attribute,
such as "wbsu" or "wordBreakStatusUnknown"
2. create a new attribute @wordBreaking, whose value could be
true|false|unknown
3. redefine the semantics of @type="inWord" to mean just "this is
probably a word breaker but possibly between words"
I supported (1) for P5 but suggested a fourth option:
4. redefine the semantics of @type='betweenWords' to mean just "this is
probably between words but possibly a word breaker"
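To make the comparison concrete, here is roughly how the same line-end
hyphen might be encoded under each option. This is only a sketch:
@wordBreaking in (2) is a proposed attribute that does not exist in P5,
and in (3) and (4) the markup itself is unchanged and only the
documented meaning of the value would differ.
(1) Some people say TEI is a mark-<lb type="wordBreakStatusUnknown"/>up language.
(2) Some people say TEI is a mark-<lb wordBreaking="unknown"/>up language.
(3) Some people say TEI is a mark-<lb type="inWord"/>up language.
(4) Some people say TEI is a mark-<lb type="betweenWords"/>up language.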
Gabby suggested that we do to <lb/>, <cb/>, and <pb/> what has already
been done to <gap> and <space>: allow <certainty>, <precision>, etc. as
content. That way if you are *unsure whether a break actually occurs*, you
could have something like:
<lb>
<certainty locus="break" degree="0.5"/>
</lb>
leaving the following way to express that we're *uncertain of the type
of hyphen*:
Some people say TEI is a mark-<lb type="uncertain"/>up language.
Elena supported Gabby's change to content models since it would also
work to handle missing, corrected, and incorrect page or line numbers,
but Lou, Martin, and Dot said to use <fw> for representing numbering as
it appears in the source.
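As I understand that suggestion, a misprinted page number might be
handled along these lines (a sketch only: the numbers are made up, and
I'm assuming @n on <pb/> carries the editor's number while <fw> records
what is actually printed on the page):
<pb n="13"/>
<fw type="pageNum">31</fw>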
***
I still support (1). Gabby asked whether anyone would characterize
breaks by anything other than whether they break words, and I too am
having trouble imagining being uncertain whether a line, column, or page
break actually *occurs* in a source. Therefore, I'm not too concerned
with expanding the content model of <lb/>, <cb/>, and <pb/>.