[tei-council] word-dividing

Daniel Paul O'Donnell daniel.odonnell at gmail.com
Tue Jun 30 16:13:21 EDT 2009


I think "word-dividing" in this case means "splitting individual words 
atwain" rather than "demarcating their boundaries" ;)

In my edition of Cædmon's Hymn I needed to encode space and lb in a 
similarly explicit way, i.e. indicating whether each one fell within a 
word or between words: the stylesheets (such as they were in those days) 
handled them differently depending on the value of @type (which I'd made 
universal). White space alone wouldn't have done it for me, because I was 
reformatting the data with and without the word-internal spaces and line 
breaks, depending on the view the user selected.
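
To make the two-view idea concrete, here is a rough sketch in Python (not 
the original stylesheets, which are not reproduced here). The @type values 
"inWord" and "betweenWords", the sample line, and the function name 
render() are all hypothetical, chosen only for illustration; it also 
assumes flat content with no white space around the lb elements, so that 
@type alone decides how the text is joined.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: no literal white space around <lb/>, so the
# @type value is the only record of whether a word is split.
SAMPLE = '<l>Cae<lb type="inWord"/>sari<lb type="betweenWords"/>Augusto</l>'


def render(xml_text, keep_breaks):
    """Flatten one element to text, treating <lb/> according to @type.

    keep_breaks=True  -> every <lb/> becomes a newline (source lineation).
    keep_breaks=False -> word-internal breaks vanish, others become a space.
    Only flat content is handled: text inside non-lb children is ignored.
    """
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for child in root:
        if child.tag == "lb":
            if keep_breaks:
                parts.append("\n")
            elif child.get("type") == "inWord":
                parts.append("")   # rejoin the split word
            else:
                parts.append(" ")  # break coincides with a word boundary
        parts.append(child.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    print(render(SAMPLE, keep_breaks=True))
    print(render(SAMPLE, keep_breaks=False))
```

With the breaks dropped, the sample comes out as "Caesari Augusto"; with 
them kept, each lb starts a new line. Relying on literal white space in 
the source instead would make the first of these hard to produce without 
also losing the information needed for the second.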

-dan

Lou Burnard wrote:
> Gabriel BODARD wrote:
>> Lou Burnard wrote:
>>>> (9) lb: should we add an example of the usage of 
>>>> lb/type=word-dividing, which currently sits a little uncomfortably in 
>>>> the note. I suggest "Cae<lb type="worddiv"/>sari".
>>>>         
>>> Don't know what note you're referring to. Don't see the point of the 
>>> @type attribute. Haven't done anything.
>>>       
>> This was discussed some months ago, and is the reason @type was allowed 
>> on <lb> in the first place. There is currently a note at the bottom of 
>> LB that says: "The type attribute may be used to characterize the 
>> linebreak in any respect, for example as word-breaking or not." We have 
>> literally thousands of examples of this in EpiDoc files, where words are 
>> not always tagged explicitly and it's the only way we can be sure to 
>> tokenize correctly. I just thought an example would help to clarify the 
>> use-case.
>>
>> (If people feel strongly that [e.g.] "wordDividing" would be a better 
>> recommended value than "worddiv", I'm happy to make that part of our P5 
>> upgrade script.)
>>
>>     
>
> I don't mind adding examples, but this one confuses me. Isn't the point 
> that the <lb/> in your example does NOT divide the word? So both 
> "wordDividing" and "worddiv" seem exactly the opposite of what you want 
> here. How about "nowordbreak" or "nwb"?
>
> I know I lost this argument last time, but I still think in practice I'd 
> deal with this by putting in whitespace where the <lb> coincided with a 
> word boundary and leaving it out where it didn't!
>
>> Best,
>>
>> G
>>
>>     

-- 
Daniel Paul O'Donnell
Associate Professor of English
University of Lethbridge

Chair and CEO, Text Encoding Initiative (http://www.tei-c.org/)
Co-Chair, Digital Initiatives Advisory Board, Medieval Academy of America
President-elect (English), Society for Digital Humanities/Société pour l'étude des médias interactifs (http://sdh-semi.org/)
Founding Director (2003-2009), Digital Medievalist Project (http://www.digitalmedievalist.org/)

Vox: +1 403 329-2377
Fax: +1 403 382-7191 (non-confidential)
Home Page: http://people.uleth.ca/~daniel.odonnell/



