[tei-council] word-dividing

Lou Burnard lou.burnard at oucs.ox.ac.uk
Tue Jun 30 15:43:28 EDT 2009


Gabriel BODARD wrote:
> Lou Burnard wrote:

>>> (9) lb: should we add an example of the usage of 
>>> lb/type=word-dividing, which currently sits a little uncomfortably in 
>>> the note. I suggest "Cae<lb type="worddiv"/>sari".
>> Don't know what note you're referring to. Don't see the point of the 
>> @type attribute. Haven't done anything.
> 
> This was discussed some months ago, and is the reason @type was allowed 
> on <lb> in the first place. There is currently a note at the bottom of 
> LB that says: "The type attribute may be used to characterize the 
> linebreak in any respect, for example as word-breaking or not." We have 
> literally thousands of examples of this in EpiDoc files, where words are 
> not always tagged explicitly and it's the only way we can be sure to 
> tokenize correctly. I just thought an example would help to clarify the 
> use-case.
> 
> (If people feel strongly that [e.g.] "wordDividing" would be a better 
> recommended value than "worddiv", I'm happy to make that part of our P5 
> upgrade script.)
> 

I don't mind adding examples, but this one confuses me. Isn't the point 
that the <lb/> in your example does NOT divide the word? So both 
"wordDividing" and "worddiv" seem exactly the opposite of what you want 
here. How about "nowordbreak" or "nwb"?

I know I lost this argument last time, but I still think in practice I'd 
deal with this by putting in whitespace where the <lb> coincided with a 
word boundary and leaving it out where it didn't!
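
To make the two conventions concrete, here is a minimal Python sketch of 
how a tokenizer might treat each one. It is an illustration only, not 
anything the Guidelines prescribe. The function name tokenize, the 
parameter inword_value, and the <ab> wrapper are assumptions made for the 
example, and "worddiv" is just the value proposed in this thread. The 
sketch also assumes un-namespaced input containing only text and <lb/>.

```python
import xml.etree.ElementTree as ET


def tokenize(xml_fragment, inword_value=None):
    """Split a line-broken text into words (hypothetical helper).

    inword_value=None: whitespace convention. An <lb/> contributes
    nothing, so only whitespace in the source separates words.

    inword_value="worddiv" (or any agreed value): attribute convention.
    An <lb/> carrying that @type falls inside a word, so the text on
    either side is joined; any other <lb/> counts as a word separator.
    """
    root = ET.fromstring(xml_fragment)
    parts = [root.text or ""]
    for child in root:
        tail = child.tail or ""
        if child.tag == "lb" and inword_value is not None:
            if child.get("type") == inword_value:
                # Break inside a word: ignore layout whitespace around it.
                parts[-1] = parts[-1].rstrip()
                tail = tail.lstrip()
            else:
                # Break between words: make the boundary explicit.
                parts.append(" ")
        parts.append(tail)
    return "".join(parts).split()


# Whitespace convention: the space after "sari" marks the word boundary.
by_space = tokenize('<ab>Imp Cae<lb type="worddiv"/>sari <lb/>Augusto</ab>')

# Attribute convention: no reliance on whitespace next to <lb/>.
by_type = tokenize(
    '<ab>Imp Cae<lb type="worddiv"/>sari<lb/>Augusto</ab>',
    inword_value="worddiv",
)
```

Under these assumptions both calls yield the same three words. The 
difference shows up when the whitespace next to an <lb/> cannot be 
trusted: the first approach then runs "Caesari" and "Augusto" together, 
while the second still separates them.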

> Best,
> 
> G
> 


