[tei-council] word-dividing
Daniel Paul O'Donnell
daniel.odonnell at gmail.com
Tue Jun 30 16:13:21 EDT 2009
I think "word-dividing" in this case means "splitting individual words
atwain" rather than "demarcating their boundaries" ;)
In my edition of Cædmon's Hymn I needed to encode both space and lb
explicitly in the same way, i.e. indicating whether each fell within a
word or between words: the stylesheets (such as they were in those days)
handled them differently depending on the value of @type (which I'd made
universal).
White space wouldn't have done it for me, because I was reformatting the
data with and without the word-internal spaces and lines depending on
the view the user selected.
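
To make the view switching concrete, here is a minimal sketch in Python. It is
not the stylesheet code from the edition: the function name, the sample
fragment, and the choice of "worddiv" (borrowed from the suggestion quoted
below) as the marker for a word-internal break are all assumptions for
illustration, and the fragment is assumed to carry no XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample fragment. "worddiv" marks a break that falls inside a
# word; an <lb/> without that value is treated as falling between words.
SAMPLE = '<ab>Cae<lb type="worddiv"/>sari divi<lb/>Augusto</ab>'


def _walk(elem, keep_lines, out):
    """Append the text of elem to out, resolving each <lb/> for the chosen view."""
    if elem.tag == "lb":
        in_word = elem.get("type") == "worddiv"
        if keep_lines:
            # Line-by-line view: every break is shown, wherever it falls.
            out.append("\n")
        else:
            # Running-text view: a word-internal break vanishes, a break
            # between words becomes a single space.
            out.append("" if in_word else " ")
        return
    out.append(elem.text or "")
    for child in elem:
        _walk(child, keep_lines, out)
        out.append(child.tail or "")


def render(xml_text, keep_lines):
    """Return the fragment's text with or without the original line breaks."""
    out = []
    _walk(ET.fromstring(xml_text), keep_lines, out)
    return "".join(out)


if __name__ == "__main__":
    print(render(SAMPLE, keep_lines=True))
    print(render(SAMPLE, keep_lines=False))
    # Splitting the running-text view on whitespace gives whole-word tokens.
    print(render(SAMPLE, keep_lines=False).split())
```

With whitespace alone the second view could be produced, but the two kinds of
break could not be told apart once the lines were put back, which is why the
distinction lives on the attribute in this sketch.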
-dan
Lou Burnard wrote:
> Gabriel BODARD wrote:
>
>> Lou Burnard wrote:
>>
>
>
>>>> (9) lb: should we add an example of the usage of
>>>> lb/type=word-dividing, which currently sits a little uncomfortably in
>>>> the note. I suggest "Cae<lb type="worddiv"/>sari".
>>>>
>>> Don't know what note you're referring to. Don't see the point of the
>>> @type attribute. Haven't done anything.
>>>
>> This was discussed some months ago, and is the reason @type was allowed
>> on <lb> in the first place. There is currently a note at the bottom of
>> LB that says: "The type attribute may be used to characterize the
>> linebreak in any respect, for example as word-breaking or not." We have
>> literally thousands of examples of this in EpiDoc files, where words are
>> not always tagged explicitly and it's the only way we can be sure to
>> tokenize correctly. I just thought an example would help to clarify the
>> use-case.
>>
>> (If people feel strongly that [e.g.] "wordDividing" would be a better
>> recommended value than "worddiv", I'm happy to make that part of our P5
>> upgrade script.)
>>
>>
>
> I don't mind adding examples, but this one confuses me. Isn't the point
> that the <lb/> in your example does NOT divide the word? So both
> "wordDividing" and "worddiv" seem exactly the opposite of what you want
> here. How about "nowordbreak" or "nwb"?
>
> I know I lost this argument last time, but I still think in practice I'd
> deal with this by putting in whitespace where the <lb> coincided with a
> word boundary and leaving it out where it didn't!
>
>
>
>
>
>
>
>> Best,
>>
>> G
>>
>>
>
--
Daniel Paul O'Donnell
Associate Professor of English
University of Lethbridge
Chair and CEO, Text Encoding Initiative (http://www.tei-c.org/)
Co-Chair, Digital Initiatives Advisory Board, Medieval Academy of America
President-elect (English), Society for Digital Humanities/Société pour l'étude des médias interactifs (http://sdh-semi.org/)
Founding Director (2003-2009), Digital Medievalist Project (http://www.digitalmedievalist.org/)
Vox: +1 403 329-2377
Fax: +1 403 382-7191 (non-confidential)
Home Page: http://people.uleth.ca/~daniel.odonnell/