[tei-council] word-dividing

Gabriel Bodard gabriel.bodard at kcl.ac.uk
Wed Jul 1 12:08:29 EDT 2009


Right. I guess my only objection is that it sounds more like a 
processing instruction than a description of the text. But I take your 
point. Let's see if anyone comes up with any suggestions better than 
either of ours. :-) (It would be nice if what we suggested in the 
example was something that is actually being used... and if we come to a 
consensus I'll recommend changing EpiDoc usage to whatever we use in the 
example in the guidelines.)

(If we don't come to a consensus, as you say, no problem.)

G

Lou Burnard wrote:
> Sorry, but I do not follow your logic. "nobreak" says something about 
> the type of <lb> -- it is a "non-breaking" line break. The implication 
> is that other <lb>s (or <cb>s etc.) are "breaking", i.e. they are 
> understood not only to mark the start of a line, column etc., but also to 
> break a word, so that foo<lb/>bar should be considered to be two words. 
> 
> There are breaks between your words conceptually, I hope? If not, what 
> is the point of trying to distinguish types of <lb> anyway?
> 
> If epidockers don't like this though they can always make up their own 
> terminology -- the type value is not constrained by the schema.
> 
> Gabriel Bodard wrote:
>> I'm not sure I like "nobreak", as it doesn't really say anything about 
>> the status of the lb (or, as Dot points out, cb, pb, etc.); especially 
>> since there are never (or rarely) breaks _between_ words in our texts. 
>> The idea behind "worddiv" was that this is a linebreak that appears 
>> mid-word, splitting it atwain, as Dan has it. Let me canvass the EpiDoc 
>> markup list, and see if people there have opinions one way or the other 
>> to contribute to this...
>>
>> G
>>
>> Lou Burnard wrote:
>>   
>>> After much head scratching here in Oxford, we've decided on "nobreak"
>>>
>>> I added a couple more examples and a bit more discussion, taking 
>>> examples from some real projects too. Affected are the definition for 
>>> <lb> and the discussion of milestones in CO.
>>>
>>> Daniel Paul O'Donnell wrote:
>>>     
>>>> I think "word-dividing" in this case means "splitting individual words 
>>>> atwain" rather than "demarcating their boundaries" ;)
>>>>
>>>> In my edition of Cædmon's Hymn I needed to encode space and lb 
>>>> similarly explicitly: i.e. indicating whether it fell within the word 
>>>> or between words: the stylesheets (such as they were in those days) 
>>>> handled them differently depending on the value of @type (which I'd 
>>>> made universal). White space wouldn't have done it for me, because I 
>>>> was reformatting the data with and without the word-internal spaces 
>>>> and lines depending on the view the user selected.
>>>>
>>>> -dan
>>>>
>>>> Lou Burnard wrote:
>>>>> Gabriel BODARD wrote:
>>>>>> Lou Burnard wrote:
>>>>>>>> (9) lb: should we add an example of the usage of 
>>>>>>>> lb/type=word-dividing, which currently sits a little uncomfortably 
>>>>>>>> in the note? I suggest "Cae<lb type="worddiv"/>sari".
>>>>>>> Don't know what note you're referring to. Don't see the point of 
>>>>>>> the @type attribute. Haven't done anything.
>>>>>> This was discussed some months ago, and is the reason @type was 
>>>>>> allowed on <lb> in the first place. There is currently a note at the 
>>>>>> bottom of LB that says: "The type attribute may be used to 
>>>>>> characterize the linebreak in any respect, for example as 
>>>>>> word-breaking or not." We have literally thousands of examples of 
>>>>>> this in EpiDoc files, where words are not always tagged explicitly 
>>>>>> and it's the only way we can be sure to tokenize correctly. I just 
>>>>>> thought an example would help to clarify the use-case.
>>>>>>
>>>>>> (If people feel strongly that [e.g.] "wordDividing" would be a 
>>>>>> better recommended value than "worddiv", I'm happy to make that part 
>>>>>> of our P5 upgrade script.)
>>>>>>
>>>>> I don't mind adding examples, but this one confuses me. Isn't the 
>>>>> point that the <lb/> in your example does NOT divide the word? So 
>>>>> both "wordDividing" and "worddiv" seem exactly the opposite of what 
>>>>> you want here. How about "nowordbreak" or "nwb"?
>>>>>
>>>>> I know I lost this argument last time, but I still think in practice 
>>>>> I'd deal with this by putting in whitespace where the <lb> coincided 
>>>>> with a word boundary and leaving it out where it didn't!
>>>>>
>>>>>> Best,
>>>>>>
>>>>>> G
>>>>>>
>>>>> _______________________________________________
>>>>> tei-council mailing list
>>>>> tei-council at lists.village.Virginia.EDU
>>>>> http://lists.village.Virginia.EDU/mailman/listinfo/tei-council
> 
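
A minimal sketch of the tokenization use case discussed in the thread
above, in Python. It is an illustration only, not anything the Guidelines
or EpiDoc prescribe. The function name tokenize, the NON_BREAKING set and
the un-namespaced <ab> wrapper are assumptions made for the example, and
both candidate values ("nobreak" and "worddiv") are accepted because the
thread had not settled on one. It also assumes no whitespace is typed
around a mid-word <lb/>.

```python
import xml.etree.ElementTree as ET

# Assumed @type values meaning "this break falls inside a word".
# Both names come from the thread; neither is fixed by the schema.
NON_BREAKING = {"nobreak", "worddiv"}

# Milestone elements that mark the start of a line, column or page.
MILESTONES = {"lb", "cb", "pb"}


def tokenize(xml_fragment):
    """Split the text of an XML fragment into words.

    An untyped milestone is treated as a word boundary; a milestone
    whose @type is in NON_BREAKING is treated as word-internal.
    """
    root = ET.fromstring(xml_fragment)
    parts = []

    def walk(elem):
        if elem.tag in MILESTONES:
            # Only a "breaking" milestone contributes a separator.
            if elem.get("type") not in NON_BREAKING:
                parts.append(" ")
        else:
            if elem.text:
                parts.append(elem.text)
            for child in elem:
                walk(child)
        # Text following the element belongs to the parent's content.
        if elem.tail:
            parts.append(elem.tail)

    walk(root)
    return "".join(parts).split()


sample = '<ab>Cae<lb type="worddiv"/>sari foo<lb/>bar</ab>'
print(tokenize(sample))  # ['Caesari', 'foo', 'bar']
```

With this reading, an untyped <lb/> separates words (foo<lb/>bar gives two
tokens), while a typed one does not, whichever name the type value ends up
with.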

-- 
Dr Gabriel BODARD
(Epigrapher & Digital Classicist)

Centre for Computing in the Humanities
King's College London
26-29 Drury Lane
London WC2B 5RL
Email: gabriel.bodard at kcl.ac.uk
Tel: +44 (0)20 7848 1388
Fax: +44 (0)20 7848 2980

http://www.digitalclassicist.org/
http://www.currentepigraphy.org/

