[tei-council] soft hyphens (again)

Fri May 21 00:06:43 EDT 2010

I would like to rework the section of the *Best Practices for TEI in 
Libraries* dealing with hyphenation ( 
http://wiki.tei-c.org/index.php/Best_Practices_for_TEI_in_Libraries#Hyphenation 
) to take into account our verdict in Dublin for how to handle the "soft 
hyphen" problem.

Looking at my notes from Dublin:

http://wiki.tei-c.org/index.php/Draft_minutes_of_2010-04_Council_meeting#hyphenation

it seems we were actually a little vague on the mechanics of the 
proposed encoding.

At first we said we would use lb@type to indicate whether a lexical unit 
was broken by the hyphen, and would not leave a hyphen character in the 
data. That is, you might have something like:

* <lb type="lexicalboundary" rend="-"/> for a hyphen at the end of a 
line where a hyphen would appear in any case, such as:

This is not a run-
on sentence.

* <lb type="nolexicalboundary" rend="-"/> for a hyphen at the end of a 
line where a hyphen would not appear had there not been a line break 
there, such as:

UTF-8 is a char-
acter encoding for Unicode.

* <lb type="uncertainlexicalboundary" rend="-"/> for a case where you're 
not sure which of the two above it should be, such as:

Some people say TEI is a mark-
up language.

We did not agree on values for @type, so I just made up these three.
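
To make the mechanics concrete, here is a rough sketch of how the three 
example lines above might look once encoded. It assumes the hyphen 
character is dropped from the text and carried only in @rend, as 
described above; the wrapping <div> and <p> elements are added here 
purely for illustration, and the @type values are the provisional ones 
from this message, not anything Council has approved.

```xml
<div>
  <!-- Hyphen would appear even without the line break ("run-on") -->
  <p>This is not a run<lb type="lexicalboundary" rend="-"/>on sentence.</p>

  <!-- Hyphen exists only because of the line break ("character") -->
  <p>UTF-8 is a char<lb type="nolexicalboundary" rend="-"/>acter encoding for Unicode.</p>

  <!-- Encoder cannot tell: "mark-up" or "markup"? -->
  <p>Some people say TEI is a mark<lb type="uncertainlexicalboundary" rend="-"/>up language.</p>
</div>
```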

However, we later discussed values of @rend which could be used for 
cases of type="uncertainlexicalboundary".  As recorded in the minutes, 
we agreed that you might use any of the following:

a) -
b) hyphen
c) soft or hard hyphen
d) ambiguous

While (a) and (b) are effectively equivalent and could be used with any 
of the three @type values, (c) and (d) only make sense with 
type="uncertainlexicalboundary".  However, it seems to me that if @type 
is used, there's no need for different values of @rend since there's no 
point in being redundant in our declaration of uncertainty.
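
As an illustration of the redundancy I mean, compare the two encodings 
below of the same uncertain case. This is only a sketch: the @type value 
is my made-up one, "ambiguous" is option (d) from the list above, and 
the <div> and <p> wrappers are just there to keep the fragment 
well-formed.

```xml
<div>
  <!-- Uncertainty stated twice: once in @type and again in @rend -->
  <p>Some people say TEI is a mark<lb type="uncertainlexicalboundary" rend="ambiguous"/>up language.</p>

  <!-- Uncertainty stated once, in @type; @rend only records the printed character -->
  <p>Some people say TEI is a mark<lb type="uncertainlexicalboundary" rend="-"/>up language.</p>
</div>
```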

Perhaps I misunderstood the discussion, or perhaps Lou has unilaterally 
resolved these questions in revisions he might have made in SourceForge 
already.  But for clarity, let me post the following questions:

1) Is everyone okay with using @type instead of @rend to distinguish 
these cases?  They are, after all, all rendered the same.

2) Can we come up with better values for @type than

lexicalboundary
nolexicalboundary
uncertainlexicalboundary

?  They're a bit unwieldy.

--Kevin