[tei-council] how to encode a hyphen at the end of a line, column, or page when you are encoding hyphens

Lou Burnard lou.burnard at oucs.ox.ac.uk
Wed Jan 5 08:00:19 EST 2011

Well, like Sebastian, I don't think I would attribute the lack of 
response on this issue to any lack of understanding on the part of 
Council members! Myself, I am a bit at a loss to understand what it is 
exactly that needs further explanation. There is a note in the element 
description for <lb> which reads

"The type attribute may be used to characterize the line break in any 
respect, but its most common use is to specify that the presence of the 
line break does not imply the end of the word in which it is embedded. A 
value such as inWord or nobreak is recommended for this purpose, but 
encoders are free to choose whichever values are appropriate. "

There is also an example in section 3.10.3 which reads

"The type attribute may be used on milestone elements such as lb 
<http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-lb.html> and pb 
<http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-pb.html> to 
categorize them in any way. One particularly useful way is to indicate 
whether or not these milestone tags are word-breaking. By default it is 
reasonable to assume that words are not broken across page or line 
boundaries, and that therefore a sequence such as
...sed imp<lb/>erator dixit...
should be tokenized as four words (sed, imp, erator, and dixit). To make 
explicit that this is not the case, a tagging such as the following is
recommended:
...sed imp<lb type="nobreak"/>erator dixit...
Where hyphenation appears before a line or page break, the encoder may 
or may not choose to include it, either explicitly using an appropriate 
Unicode character, or descriptively for example by means of the rend 
attribute; see further 3.2 Treatment of Punctuation 
<http://www.tei-c.org/release/doc/tei-p5-doc/en/html/CO.html#COPU>. "

However, it's true that the referenced section on Punctuation doesn't 
seem to mention hyphenation at all, so maybe it would be a good idea to 
add more discussion there.

For me the main issue that needs to be clarified is the interaction 
between <lb/> and whitespace with regard to implicit tokenization. The 
excellent TEI-L posting from one L. Burnard which you mention addresses
that at length. Subsequent discussion of the 
issue on TEI-L seems to support the proposals therein too. So maybe what 
I should do is rehash that discussion a bit and bung it into 3.2 
somewhere.  I'll try that anyway, and post a draft here for comment.
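
In the meantime, here is a rough sketch (in Python, purely for illustration) 
of the tokenization rule described in the 3.10.3 passage quoted above. The 
function name, the regular expression, and the choice to treat both 
"nobreak" and "inWord" as non-breaking are my own assumptions, not anything 
the Guidelines prescribe. It deliberately ignores other attributes on <lb/> 
and says nothing about whitespace next to a non-breaking <lb/>, which is 
exactly the part that still needs discussion.

```python
import re

# Matches an <lb/> milestone, capturing its type attribute if present.
# Assumption: type is the only attribute; a real encoding may carry others.
LB = re.compile(r'<lb(?:\s+type="([^"]*)")?\s*/>')

# Values treated as "not word-breaking" (the two suggested in the <lb> note).
NON_BREAKING = {"nobreak", "inWord"}


def tokenize(text):
    """Split text into words, treating <lb/> as a word break unless its
    type attribute marks it as non-breaking."""
    def replace(match):
        # A non-breaking <lb/> simply vanishes, joining the two fragments;
        # any other <lb/> behaves like a space.
        return "" if match.group(1) in NON_BREAKING else " "
    return LB.sub(replace, text).split()


print(tokenize('sed imp<lb/>erator dixit'))
# ['sed', 'imp', 'erator', 'dixit']
print(tokenize('sed imp<lb type="nobreak"/>erator dixit'))
# ['sed', 'imperator', 'dixit']
```

The first call gives the default four-word reading, the second the 
three-word reading that type="nobreak" is meant to make explicit.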

On 05/01/11 02:31, Kevin Hawkins wrote:
> So, I fear that Lou is the only current Council member who really
> understands the issues around hyphenation and that I am the only one who
> finds the lack of clear guidance on this question in P5 to be a
> significant problem.  (Martin Mueller is an ally but is not on Council.)
> This has come up a few times over the past few years on TEI-L, with no
> changes made to P5 except the addition of a suggested value for @type in
> the note attached to the definition of the lb element.
> As I mentioned last month, I find "inWord" and "nobreak" (given in the
> definition of lb) unclear without examples of each.  In addition, as a
> reader I would expect to find in section 3.10.3 and/or 3.2 of P5 a
> discussion of the three hyphen characters mentioned in 3.2: when to use
> each and how they could (or should) be used in combination with the lb,
> cb, and pb elements.
> Lou, since my summary sent to tei-council last month seemed to propose
> solutions to more problems than we need to solve (or simply raise
> additional problems), could you write a proposal that addresses only the
> narrow question (which, to be honest, I'm not even sure how to state)?
> You might be able to start with your 2010-03-24 message to TEI-L.  In
> fact, maybe this is still the best solution, but I think we need to make
> sure that points raised on TEI-L and at our discussion in Dublin have
> been taken into account.  Not sure whether you want to send to
> tei-council or TEI-L and whether it should go into SourceForge.
> I would be quite grateful!
> Kevin