[tei-council] soft hyphens (again)
kevin.s.hawkins at ultraslavonic.info
Sun Jun 27 16:53:58 EDT 2010
On 6/16/2010 6:07 AM, Gabriel Bodard wrote:
> Has anyone ever had any use-case for characterizing linebreaks (and cb,
> pb, etc.) other than by whether they break words or not?
I asked Paul Schaffner about this, and he offered the following:
-- use of @type to distinguish word-breaking
from in-word <lb>s seems to me a little strange
to begin with. I should think that there are
lots of other ways in which lines differ (and
therefore their breaks differ) other than whether
they occur in a word-dividing position. And
that some of those ways are a more natural fit
for @type. E.g. type="forced" (by lack of space) vs.
type="deliberate"; or "significant" vs. "insignificant";
or "vertical" vs. whatever (line breaks can
appear between lines in all sorts of formatted
text, e.g. chunks of a 'scroll'-style heading
in engravings are most easily divided by <lb>s,
even though one such 'line' does not sit neatly below
the previous one).
-- the trio of "inWord" "betweenWords" and "uncertain"
may not express all the options. One might want to
distinguish (e.g.) (?)
  street<lb/>walker   between components of a non-hyphenated compound
  bag<lb/>lady        between components of a usually hyphenated compound
  win<lb/>some        between syllables (or morphemes) in a single word
  iP<lb/>hone         word-internal break (misplaced according to usual rules*)
  gentle<lb/>man      may or may not be regarded as a compound
  abusive<lb/>tagger  between words
(*this is the way that the WSJ breaks "iPhone" at line
ends, for some reason)
though I have to admit that when tagging inscriptions
(or rather, when tagging transcriptions of inscriptions),
my commonest need is to distinguish breaks that should
be treated as word breaks, those that should not be,
and those about which I have doubts.
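To make the three-way distinction concrete, here is a minimal sketch of how a processor might resolve <lb/> elements when flattening a transcription to running text. It assumes the trio of @type values named above ("inWord", "betweenWords", "uncertain") and un-namespaced elements; the function name join_lines, the SEPARATORS table, and the "|" marker for doubtful breaks are hypothetical choices made for illustration, not anything prescribed by the Guidelines.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from <lb type="..."> to the text emitted at the break.
# The three values are the trio named in the message; the "|" marker for
# "uncertain" is purely an illustrative choice.
SEPARATORS = {"inWord": "", "betweenWords": " ", "uncertain": "|"}


def join_lines(xml_fragment: str) -> str:
    """Flatten a fragment to text, resolving each <lb/> by its type."""
    root = ET.fromstring(xml_fragment)
    parts = [root.text or ""]
    for child in root:
        if child.tag == "lb":
            # Untyped or unknown breaks are treated as uncertain here.
            parts.append(SEPARATORS.get(child.get("type"), "|"))
        else:
            # Any other child element contributes its text content.
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts)


sample = (
    '<p>win<lb type="inWord"/>some abusive<lb type="betweenWords"/>tagger '
    'gentle<lb type="uncertain"/>man</p>'
)
print(join_lines(sample))
```

A finer-grained scheme such as the six cases listed above would only need more entries in the table; the open question in this thread is whether @type is the right place to carry them.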