[tei-council] soft hyphens (again)

Martin Holmes mholmes at uvic.ca
Mon Jun 28 12:17:00 EDT 2010


Taking Paul's examples:

<phr>street<lb/>walker</phr>  between components of a non-hyphenated cpd

<phr type="hyphenated">bag-<lb/>lady</phr>  between components of a usu. 
hyphenated cpd

<w>win-<lb/>some</w> between syllables (or morphemes) in a single word

<w>iP-<lb/>hone</w> word-internal breaks (misplaced according to usual 
rules*)

gentle<lb/>man may or may not be regarded as a compound

<w>abusive</w>-<lb/><w>tagger</w> between words

Lou responded to my previous message like this:

> But the issue currently on the table is what to do about LINEBREAKS. As
> I said in an earlier post, it isn't necessarily a hyphen character which
> is used to mark where a word (despite appearances) runs on to the next
> line. It may be something else entirely. It may be nothing at all.

At the risk of another roasting, I still think that the linebreak tag is 
the wrong place to supply information about 
whatever-it-is-that-is-being-broken (word, phrase or whatever) and 
whatever-it-is-that-is-signalling-the-break (hyphen or whatever). The 
linebreak tag says there is a linebreak in the text. The context, and 
the glyph that precedes the linebreak, are not attributes of the linebreak.

I think it would be better to encourage the use of <w>, <phr> and other 
inline-level tags to mark the context of the linebreak. Even if such 
tags are not being used for any other purpose in a text -- or perhaps 
_especially_ if they aren't -- they could be used for exactly this 
purpose, and it's easy for a processor to detect when a 
linebreak-signalling glyph or a linebreak tag occurs within such a 
context and to process it accordingly.
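
As a rough sketch of the kind of processing I mean (not a definitive implementation), here is one way it might look in Python with the standard xml.etree.ElementTree module. The function name resolve_break is hypothetical, the elements are assumed to be un-namespaced as in the examples above, and the rule that a hyphen before <lb/> is kept only in <phr type="hyphenated"> is my own assumption for illustration:

```python
import xml.etree.ElementTree as ET


def resolve_break(fragment: str) -> str:
    """Return the reading text of a <w> or <phr> that contains <lb/>.

    Assumption: a hyphen directly before <lb/> merely signals the break,
    unless the wrapper is <phr type="hyphenated">, where the hyphen is
    part of the compound and is kept.
    """
    elem = ET.fromstring(fragment)
    keep_hyphen = elem.tag == "phr" and elem.get("type") == "hyphenated"

    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "lb":
            # Drop a break-signalling hyphen that immediately precedes <lb/>.
            if not keep_hyphen and parts[-1].endswith("-"):
                parts[-1] = parts[-1][:-1]
        else:
            # Other inline children contribute their text unchanged.
            parts.append("".join(child.itertext()))
        # Text following the child (e.g. "walker" after <lb/>).
        parts.append(child.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    for sample in (
        "<phr>street<lb/>walker</phr>",
        '<phr type="hyphenated">bag-<lb/>lady</phr>',
        "<w>win-<lb/>some</w>",
        "<w>iP-<lb/>hone</w>",
    ):
        print(resolve_break(sample))
```

The decision about the hyphen is driven entirely by the wrapping element and its type attribute, so the <lb/> itself carries nothing beyond "there is a linebreak here". This sketch only looks at <lb/> elements that are direct children of the wrapper.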

Cheers,
Martin

On 10-06-27 01:53 PM, Kevin Hawkins wrote:
>       street<lb/>walker  between components of a non-hyphenated cpd
>       bag<lb/>lady  between components of a usu. hyphenated cpd
>       win<lb/>some between syllables (or morphemes) in a single word
>       iP<lb/>hone word-internal breaks (misplaced according to usual rules*)
>       gentle<lb/>man may or may not be regarded as a compound
>       abusive<lb/>tagger between words

-- 
Martin Holmes
University of Victoria Humanities Computing and Media Centre
(mholmes at uvic.ca)
Half-Baked Software, Inc.
(mholmes at halfbakedsoftware.com)
martin at mholmes.com

