[tei-council] Hyphenation discussion

Sebastian Rahtz sebastian.rahtz at oucs.ox.ac.uk
Sun Jan 16 12:37:18 EST 2011


I'm broadly in favour, but have some nitpicks below.

On 15 Jan 2011, at 17:54, Lou Burnard wrote:

> re-encoded for analysis or other processing. Unicode distinguishes
> three visually similar characters for the hyphen, although it also
> retains the undifferentiated hyphen-minus (U+002D) for compatibility
> reasons. The hard hyphen (U+2010) is distinguished from the minus sign
> (U+2212) which should be used only in mathematical expressions, and
> also from the soft hyphen (U+00AD) which may appear in <soCalled>born
> digital</soCalled> documents to indicate places where it is acceptable
> to insert a hyphen when the document is formatted. </p>

This seems misleading to me: it implies that a minus is a sort of hyphen.

I'd say 

"Unicode distinguishes four characters visually similar to the hyphen, including the undifferentiated hyphen-minus (U+002D), which is retained for compatibility reasons. The hard hyphen (U+2010) is distinguished from the minus sign (U+2212), which is for use in mathematical expressions, and also from the soft hyphen (U+00AD), which may appear in <soCalled>born digital</soCalled> documents to indicate places where it is acceptable to insert a hyphen when the document is formatted.</p>"
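
To make the four-way distinction concrete, here is a minimal sketch (not part of any TEI tooling) using Python's standard unicodedata module to report which hyphen-like character occurs where. The names describe_hyphens and strip_soft_hyphens are hypothetical, and dropping soft hyphens before analysis is an assumption based on their role as mere formatting hints.

```python
import unicodedata

# The four hyphen-like code points distinguished in the proposed wording.
HYPHEN_LIKE = {
    "\u002D": "compatibility (undifferentiated)",
    "\u2010": "hard hyphen",
    "\u2212": "mathematical minus",
    "\u00AD": "soft hyphen (formatting hint only)",
}


def describe_hyphens(text: str) -> list:
    """Return (offset, Unicode name, role) for each hyphen-like character."""
    found = []
    for i, ch in enumerate(text):
        if ch in HYPHEN_LIKE:
            found.append((i, unicodedata.name(ch), HYPHEN_LIKE[ch]))
    return found


def strip_soft_hyphens(text: str) -> str:
    """Assumption: soft hyphens only mark permissible break points,
    so they can be removed before tokenizing for analysis."""
    return text.replace("\u00AD", "")
```

For example, describe_hyphens("co\u2010operate, 3\u22122, hy\u00ADphen-minus") reports HYPHEN, MINUS SIGN, SOFT HYPHEN and HYPHEN-MINUS, in that order, each with its offset.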

> 
> <p>In cases where the <gi>lb</gi> element does not in fact correspond
> with a token boundary, the <att>type</att> attribute should be given a
> special value to indicate that this is a "non-breaking" line

^should^may^ (i.e. I'd change "should be given" to "may be given")

> break. The values proposed by these Guidelines are <val>noBreak</val>
> or (for compatibility with existing recommendations)
> <val>inWord</val>. A value <val>mayBreak</val> is also available, 

I find the idea of "compatibility with existing recommendations" pretty weird.
Either it is the recommendation or it is not. I'd say

"The value <val>noBreak</val> is recommended
(this corresponds to the older recommendation of <val>inWord</val>).
A value <val>mayBreak</val> is appropriate for cases where the encoder does not wish (or is unable) to determine
whether the orthographic token concerned is broken by the line ending or not."
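
As an illustration of what these values would mean to a processor, here is a minimal sketch using Python's standard xml.etree.ElementTree. It assumes <lb> carries the type attribute as in the proposal above; the function logical_text and its join_may_break switch are hypothetical. Since mayBreak explicitly leaves the question open, the caller decides how to treat it. Other child elements are simply flattened to their text, which is a simplification.

```python
import xml.etree.ElementTree as ET

# Values that mark a line break which is NOT a token boundary:
# noBreak (recommended) and inWord (the older equivalent).
JOINING = {"noBreak", "inWord"}


def logical_text(elem: ET.Element, join_may_break: bool = False) -> str:
    """Flatten elem's text, joining across <lb/> elements that do not
    mark a token boundary. Works with or without the TEI namespace."""
    out = elem.text or ""
    for child in elem:
        tail = child.tail or ""
        local_name = child.tag.split("}")[-1]  # strip any {namespace} prefix
        if local_name == "lb":
            t = child.get("type")
            joins = t in JOINING or (t == "mayBreak" and join_may_break)
            # Drop layout whitespace around the break, then join or separate.
            out = out.rstrip() + ("" if joins else " ") + tail.lstrip()
        else:
            # Simplification: other children contribute their full text.
            out += "".join(child.itertext()) + tail
    return out
```

Given <p>the en<lb type="noBreak"/>coding of <lb/>lines and re<lb type="mayBreak"/>cords</p>, this yields "the encoding of lines and re cords" by default, and "records" at the end when join_may_break=True.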


--
Sebastian Rahtz      
Information and Support Group Manager, Oxford University Computing Services
13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431

I only ask of God
that I not be indifferent to the future
