[tei-council] soft hyphens (again)

Kevin Hawkins kevin.s.hawkins at ultraslavonic.info
Sun Jun 20 17:09:21 EDT 2010


Lou offered three possible solutions:

> I can only think of two possible solutions to this. No, make it
> three.
>
> 1. come up with a better word than "uncertain" for the third case
> (wbsu or wordBreakStatusUnknown?)
>
> 2. use a different attribute @wordBreaking = "true|false|unknown"
>
> 3. redefine the semantics of @type="wordBreaking" to mean just "this
> is probably a word breaker but possibly not"

For consistency with the options I originally presented, solution 3 
should read:

3. redefine the semantics of @type="inWord" to mean just "this
is probably a word breaker but possibly between words"

Should we have a fourth option?

4. redefine the semantics of @type="betweenWords" to mean just "this is
probably between words but possibly a word breaker"

I've been puzzling over whether uncertain cases are more likely to be
confused with "betweenWords" or "inWord" cases, but I can't figure it out.

To respond to Gabby:

> While (2) is attractive in terms of explicitness and elegance, I am
> tempted to vote for (3) on the grounds that if you're really uncertain
> about the status of a line-break there are other ways to express this
> (<certainty> element inside <lb> anyone?--to resurrect a TEI-L query
> that seems to have been met with deafening indifference...)

Which TEI-L query is this?

<lb/> is an empty element, so we can't put <certainty> inside it without
changing its content model (and those of <cb/> and <pb/>).

> Has anyone ever had any use-case for characterizing linebreaks (and cb,
> pb, etc.) other than by whether they break words or not?

Aside from describing whether they break words, I can't think of anything.

