[tei-council] soft hyphens (again)
Kevin Hawkins
kevin.s.hawkins at ultraslavonic.info
Fri May 21 00:06:43 EDT 2010
I would like to rework the section of the *Best Practices for TEI in
Libraries* dealing with hyphenation (
http://wiki.tei-c.org/index.php/Best_Practices_for_TEI_in_Libraries#Hyphenation
) to take into account our verdict in Dublin for how to handle the "soft
hyphen" problem.
Looking at my notes from Dublin:
http://wiki.tei-c.org/index.php/Draft_minutes_of_2010-04_Council_meeting#hyphenation
it seems we were actually a little vague on the mechanics of the
proposed encoding.
At first we said we would use lb@type to indicate whether a lexical unit
was broken by the hyphen and not leave a hyphen character in the data.
That is, you might have something like:
* <lb type="lexicalboundary" rend="-"/> for a hyphen at the end of a
line where a hyphen would appear in any case, such as:
This is not a run-
on sentence.
* <lb type="nolexicalboundary" rend="-"/> for a hyphen at the end of a
line where a hyphen would not appear had there not been a line break
there, such as:
UTF-8 is a char-
acter encoding for Unicode.
* <lb type="uncertainlexicalboundary" rend="-"/> for a case where you're
not sure which of the two above it should be, such as:
Some people say TEI is a mark-
up language.
We did not agree on values for @type, so I just made up these three.
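
To make the three cases concrete, here is a minimal Python sketch of how
a processor might act on them. It is only an illustration under stated
assumptions: the @type values are the three made-up ones above, the
sample paragraph omits the TEI namespace for brevity, and the "[-?]"
marker for the uncertain case is a hypothetical convention of this
sketch, not something Council discussed.

```python
import xml.etree.ElementTree as ET

# Sample encoding of the three examples above. Assumption: no TEI
# namespace, and the provisional @type values from this message.
SAMPLE = (
    '<p>This is not a run<lb type="lexicalboundary" rend="-"/>on sentence. '
    'UTF-8 is a char<lb type="nolexicalboundary" rend="-"/>acter encoding. '
    'TEI is a mark<lb type="uncertainlexicalboundary" rend="-"/>up language.</p>'
)


def as_printed(xml_text):
    """Reproduce the source lineation: every <lb> shows its @rend, then a newline."""
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for child in root:
        if child.tag == "lb":
            # All three types are rendered the same way on the page.
            parts.append(child.get("rend", "") + "\n")
        parts.append(child.tail or "")
    return "".join(parts)


def unwrapped(xml_text):
    """Return running text with line breaks removed, guided by @type."""
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for child in root:
        if child.tag == "lb":
            kind = child.get("type")
            if kind == "lexicalboundary":
                # Hard hyphen: it would appear even without the line break.
                parts.append(child.get("rend", ""))
            elif kind == "uncertainlexicalboundary":
                # Hypothetical marker flagging the encoder's uncertainty.
                parts.append("[" + child.get("rend", "") + "?]")
            # nolexicalboundary: soft hyphen, so nothing is emitted.
        parts.append(child.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    print(as_printed(SAMPLE))
    print(unwrapped(SAMPLE))
```

Note that as_printed() never consults @type, while unwrapped() needs
@rend only to learn which character was printed.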
However, we later discussed values of @rend which could be used for
cases of type="uncertainlexicalboundary". As in the minutes, we agreed
that you might use any of the following:
a) -
b) hyphen
c) soft or hard hyphen
d) ambiguous
While (a) and (b) are effectively equivalent and could be used with any
of the three @type values, (c) and (d) only make sense with
type="uncertainlexicalboundary". However, it seems to me that if @type
is used, there's no need for different values of @rend: @type already
declares the uncertainty, and repeating it in @rend would be redundant.
Perhaps I misunderstood the discussion, or perhaps Lou has unilaterally
resolved these questions in revisions he might have made in SourceForge
already. But for clarity, let me post the following questions:
1) Is everyone okay with using @type instead of @rend to distinguish
these cases? They are, after all, all rendered the same.
2) Can we come up with better values for @type than
lexicalboundary
nolexicalboundary
uncertainlexicalboundary
? They're a bit unwieldy.
--Kevin