[tei-council] how to encode a hyphen at the end of a line, column, or page when you are encoding hyphens

Kevin Hawkins kevin.s.hawkins at ultraslavonic.info
Mon Dec 27 20:51:01 EST 2010


Finally getting back to this!  See below ...

Lou Burnard wrote:
>> I find "inWord" and "nobreak" entirely non-intuitive
> 
> "inWord" seems fairly obvious to me. More significantly perhaps, it was 
> the value which the Epidockers agreed on after a fairly heated debate.

My problem with "inWord" is that, without further explanation, I'm not 
sure whether it only applies to cases like:

UTF-8 is a char-
acter encoding for Unicode.

or also to:

This is not a run-
on sentence.

That is, I'm not sure whether we're talking about orthographic or 
lexical words.
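
To make the two cases concrete, here is a rough sketch of how they might 
be marked up. It assumes the value "inWord" is carried on type= of <lb/>, 
as floated in this thread; neither the attribute nor the value is settled, 
and the <p> wrappers are only there for illustration.

```xml
<!-- Case 1: the break falls inside one orthographic word ("character") -->
<p>UTF-8 is a char-<lb type="inWord"/>acter encoding for Unicode.</p>

<!-- Case 2: the break falls at the hyphen of a compound ("run-on").
     Whether type="inWord" also applies here is the open question,
     so the <lb/> is left unmarked in this sketch. -->
<p>This is not a run-<lb/>on sentence.</p>
```

In the first case the hyphen exists only because of the line break; in 
the second it would be there even if the line did not break.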

If some explanation can be added to P5 on this point, I'll probably be 
much happier with it.

> Maybe "inToken" or "internal" ?

Without further explanation, I find these opaque too.  You see, "inWord" 
sounds like something internal to a word, and if that's true, how is 
"internal" different?

>> I prefer these values for type=:
>>
>> * lexicalBoundary
>> * noLexicalBoundary
>> * uncertainLexicalBoundary
>>
> 
> I am not comfortable with "lexical" here, because where I come from 
> "lexical entries" may include multiple "tokens". If I treat "apple pie" 
> as a lexical entry, and there happens to be a <lb/> between the "apple" 
> and the "pie" I don't think I'd mark the <lb/> any different from any 
> other. I think we should stick with the idea that line-end hyphenation 
> (or not) is to do with simple minded  orthographic tokens, not tricky 
> things like lexical items.

Point taken.

>> However, these may not be expressive enough for everything you'd like to
>> encode.  Paul Schaffner provided the following examples (which I've
>> annotated):
>>
>> a) street<lb/>walker  -- line break between components of a usually
>> non-hyphenated compound
> 
> Not sure what a "compound" is here. For me, the critical point is 
> whether elsewhere in this text I find, or expect to find, 
> "streetwalker" (in which case the <lb/> is "inWord") or "street walker" 
> (in which case it isn't). And if I don't want to take a stand either 
> way, then it is "undecided".

By "compound", I meant a compound word, such as "policeman", 
"must-have", "ice cream", or "street walker".

Aside from this, my recollection of Dublin has faded significantly, and 
I don't have any strong feelings on this, except that we should give 
people clear instructions that tell them what to do.  I think that's 
what Martin Mueller is looking for as well.

Kevin

