[tei-council] word-dividing

Dot Porter dot.porter at gmail.com
Wed Jul 1 12:16:37 EDT 2009


I don't really understand the concern here. An lb (or cb, or pb) that
appears in the middle of a word physically divides that word, hence
"worddiv". As long as this usage is defined clearly in the Guidelines
("use @type='worddiv' to mark lb, pb or cb that physically divide
words") I don't think there will be any confusion on the part of
users. It's clear, and there's an established history of usage: EpiDoc
has been doing this for some time. Why mess with something that works?
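For illustration only, here is a minimal Python sketch (standard-library ElementTree) of the tokenizing use case Gabriel describes below. An lb, cb or pb whose @type is "worddiv" (or Oxford's proposed "nobreak") is treated as falling inside a word, so its two halves are rejoined. Any other break counts as a word boundary, as Lou reads it. The helper names are hypothetical, not part of any TEI or EpiDoc tool, and the code assumes the transcription already separates words with whitespace.

```python
import xml.etree.ElementTree as ET

# Milestone elements that record a physical break in the source document.
MILESTONES = {"lb", "cb", "pb"}

# @type values read as "this break falls inside a word". "worddiv" is the
# EpiDoc value; "nobreak" is the value proposed in this thread. Which values
# a project uses is an assumption: the schema does not constrain @type.
WORD_INTERNAL = {"worddiv", "nobreak"}


def local_name(tag):
    """Drop any namespace: '{http://www.tei-c.org/ns/1.0}lb' -> 'lb'."""
    return tag.rsplit("}", 1)[-1]


def flatten(elem, out):
    """Append the readable text of elem (and its descendants) to out."""
    if elem.text:
        out.append(elem.text)
    for child in elem:
        tail = child.tail or ""
        if local_name(child.tag) in MILESTONES:
            if child.get("type") in WORD_INTERNAL:
                # Word-internal break: remove whitespace on both sides
                # (e.g. a newline before the <lb/>) so the halves rejoin.
                out[:] = ["".join(out).rstrip()]
                tail = tail.lstrip()
            else:
                # Any other break also marks a word boundary.
                out.append(" ")
        else:
            flatten(child, out)
        out.append(tail)


def tokenize(fragment):
    """Return the whitespace-separated words of an XML fragment."""
    parts = []
    flatten(ET.fromstring(fragment), parts)
    return "".join(parts).split()
```

With this sketch, `tokenize('<ab>Cae\n<lb type="worddiv"/>sari<lb/>Augusto</ab>')` yields `['Caesari', 'Augusto']`. Stripping whitespace around a word-internal break keeps the markup meaningful even after pretty-printing, where a newline before each <lb/> is common; a whitespace-only convention would lose that signal.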

Dot

On Wed, Jul 1, 2009 at 5:08 PM, Gabriel Bodard <gabriel.bodard at kcl.ac.uk> wrote:
> Right. I guess my only objection is that it sounds more like a
> processing instruction than a description of the text. But I take your
> point. Let's see if anyone comes up with any suggestions better than
> either of ours. :-) (It would be nice if what we suggested in the
> example was something that is actually being used... and if we come to a
> consensus I'll recommend changing EpiDoc usage to whatever we use in the
> example in the guidelines.)
>
> (If we don't come to a consensus, as you say, no problem.)
>
> G
>
> Lou Burnard wrote:
>> Sorry, but I do not follow your logic. "nobreak" says something about
>> the type of <lb> -- it is a "non-breaking" line break. The implication
>> is that other <lb>s (or <cb>s, etc.) are "breaking", i.e. they are
>> understood not only to mark the start of a line, column, etc., but also to
>> mark a word boundary, so that foo<lb/>bar should be considered two words.
>>
>> There are breaks between your words conceptually, I hope? If not, what
>> is the point of trying to distinguish types of <lb> anyway?
>>
>> If epidockers don't like this, though, they can always make up their own
>> terminology -- the type value is not constrained by the schema.
>>
>> Gabriel Bodard wrote:
>>> I'm not sure I like "nobreak", as it doesn't really say anything about
>>> the status of the lb (or, as Dot points out, cb, pb, etc.); especially
>>> since there are never (or rarely) breaks _between_ words in our texts.
>>> The idea behind "worddiv" was that this is a linebreak that appears
>>> mid-word, splitting it atwain, as Dan has it. Let me canvass the EpiDoc
>>> markup list, and see if people there have opinions one way or the other
>>> to contribute to this...
>>>
>>> G
>>>
>>> Lou Burnard wrote:
>>>
>>>> After much head-scratching here in Oxford, we've decided on "nobreak".
>>>>
>>>> I added a couple more examples and a bit more discussion, taking
>>>> examples from some real projects too. Affected are the definition for
>>>> <lb> and the discussion of milestones in CO.
>>>>
>>>>
>>>>
>>>>
>>>> Daniel Paul O'Donnell wrote:
>>>>
>>>>> I think "word-dividing" in this case means "splitting individual words
>>>>> atwain" rather than "demarcating their boundaries" ;)
>>>>>
>>>>> In my edition of Cædmon's Hymn I needed to encode space and lb
>>>>> similarly explicitly, i.e. indicating whether each fell within a word
>>>>> or between words; the stylesheets (such as they were in those days)
>>>>> handled them differently depending on the value of @type (which I'd
>>>>> made universal). White space wouldn't have done it for me, because I
>>>>> was reformatting the data with and without the word-internal spaces
>>>>> and lines depending on the view the user selected.
>>>>>
>>>>> -dan
>>>>>
>>>>> Lou Burnard wrote:
>>>>>
>>>>>> Gabriel BODARD wrote:
>>>>>>
>>>>>>> Lou Burnard wrote:
>>>>>>>
>>>>>>>>> (9) lb: should we add an example of the usage of
>>>>>>>>> lb/type=word-dividing, which currently sits a little uncomfortably
>>>>>>>>> in the note? I suggest "Cae<lb type="worddiv"/>sari".
>>>>>>>>>
>>>>>>>> Don't know what note you're referring to. Don't see the point of
>>>>>>>> the @type attribute. Haven't done anything.
>>>>>>>>
>>>>>>> This was discussed some months ago, and is the reason @type was
>>>>>>> allowed on <lb> in the first place. There is currently a note at the
>>>>>>> bottom of LB that says: "The type attribute may be used to
>>>>>>> characterize the linebreak in any respect, for example as
>>>>>>> word-breaking or not." We have literally thousands of examples of
>>>>>>> this in EpiDoc files, where words are not always tagged explicitly
>>>>>>> and it's the only way we can be sure to tokenize correctly. I just
>>>>>>> thought an example would help to clarify the use-case.
>>>>>>>
>>>>>>> (If people feel strongly that [e.g.] "wordDividing" would be a
>>>>>>> better recommended value than "worddiv", I'm happy to make that part
>>>>>>> of our P5 upgrade script.)
>>>>>>>
>>>>>> I don't mind adding examples, but this one confuses me. Isn't the
>>>>>> point that the <lb/> in your example does NOT divide the word? So
>>>>>> both "wordDividing" and "worddiv" seem exactly the opposite of what
>>>>>> you want here. How about "nowordbreak" or "nwb"?
>>>>>>
>>>>>> I know I lost this argument last time, but I still think in practice
>>>>>> I'd deal with this by putting in whitespace where the <lb> coincided
>>>>>> with a word boundary and leaving it out where it didn't!
>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> G
>>>>>>>
>>>>>> _______________________________________________
>>>>>> tei-council mailing list
>>>>>> tei-council at lists.village.Virginia.EDU
>>>>>> http://lists.village.Virginia.EDU/mailman/listinfo/tei-council
>>
>
> --
> Dr Gabriel BODARD
> (Epigrapher & Digital Classicist)
>
> Centre for Computing in the Humanities
> King's College London
> 26-29 Drury Lane
> London WC2B 5RL
> Email: gabriel.bodard at kcl.ac.uk
> Tel: +44 (0)20 7848 1388
> Fax: +44 (0)20 7848 2980
>
> http://www.digitalclassicist.org/
> http://www.currentepigraphy.org/
>



-- 
*~*~*~*~*~*~*~*~*~*~*
Dot Porter (MA, MSLS)          Metadata Manager
Digital Humanities Observatory (RIA), Regus House, 28-32 Upper
Pembroke Street, Dublin 2, Ireland
-- A Project of the Royal Irish Academy --
Phone: +353 1 234 2444        Fax: +353 1 234 2400
http://dho.ie          Email: dot.porter at gmail.com
*~*~*~*~*~*~*~*~*~*~*
