[tei-council] word-dividing

Dot Porter dot.porter at gmail.com
Wed Jul 1 12:54:04 EDT 2009


Dan, I don't think anyone is suggesting that the value be technically
controlled, but we do want an example in the Guidelines. And since people
tend to take the Guidelines' suggestions quite seriously, it's worth
considering what the suggested value should be.
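
To make the use case concrete, here is a minimal sketch of how a
processor might tokenize a text in which words are not tagged explicitly.
It is an illustration only, not EpiDoc's actual tooling: the function name
tokenize and the constant WORD_INTERNAL are made up for this example, the
value "worddiv" is just one of the candidates under discussion, and the
input is assumed to be a flat, un-namespaced fragment.

```python
import xml.etree.ElementTree as ET

# Hypothetical marker value; "nobreak" or "wordDividing" would work the same way.
WORD_INTERNAL = "worddiv"

def tokenize(xml_fragment: str, marker: str = WORD_INTERNAL) -> list[str]:
    """Split a flat fragment into words, rejoining words cut by a marked break."""
    root = ET.fromstring(xml_fragment)
    pieces = [root.text or ""]
    for child in root:
        if child.tag in ("lb", "pb", "cb"):
            # A marked break falls inside a word, so it contributes nothing;
            # any other break is treated as a word boundary.
            pieces.append("" if child.get("type") == marker else " ")
        # Assumption: children are empty milestones, so only the tail matters.
        pieces.append(child.tail or "")
    return "".join(pieces).split()

sample = '<ab>Imp<lb type="worddiv"/>eratori Cae<lb type="worddiv"/>sari<lb/>Augusto</ab>'
print(tokenize(sample))  # ['Imperatori', 'Caesari', 'Augusto']
```

Whatever value ends up in the example, the processor only needs to know
which one means "this break is inside a word"; the sketch takes it as a
parameter for that reason.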

Dot

On Wed, Jul 1, 2009 at 5:45 PM, O'Donnell, Dan<daniel.odonnell at uleth.ca> wrote:
> I also don't understand why we are sweating the attribute value. Are we really interested in controlling this vocabulary? Why?
>
> -----------
> Daniel O'Donnell
> University of Lethbridge
> (From my mobile telephone)
>
> --- original message ---
> From: "Dot Porter" <dot.porter at gmail.com>
> Subject: Re: [tei-council] word-dividing
> Date: July 1, 2009
> Time: 10:17:09
>
> I don't really understand the concern here. An lb (or cb, or pb) that
> appears in the middle of a word physically divides that word, hence
> "worddiv". As long as this usage is defined clearly in the Guidelines
> ("use @type='worddiv' to mark lb, pb or cb that physically divide
> words") I don't think there will be any confusion on the part of
> users. It's clear. And there's a history of usage, since EpiDoc is
> already doing this, and has been. Why mess with something that works?
>
> Dot
>
> On Wed, Jul 1, 2009 at 5:08 PM, Gabriel Bodard<gabriel.bodard at kcl.ac.uk> wrote:
>> Right. I guess my only objection is that it sounds more like a
>> processing instruction than a description of the text. But I take your
>> point. Let's see if anyone comes up with any suggestions better than
>> either of ours. :-) (It would be nice if what we suggested in the
>> example was something that is actually being used... and if we come to a
>> consensus I'll recommend changing EpiDoc usage to whatever we use in the
>> example in the Guidelines.)
>>
>> (If we don't come to a consensus, as you say, no problem.)
>>
>> G
>>
>> Lou Burnard wrote:
>>> Sorry, but I do not follow your logic. "nobreak" says something about
>>> the type of <lb> -- it is a "non-breaking" line break.  The implication
>>> is that other <lb>s (or <cb>s etc.) are "breaking", i.e. they are
>>> understood not only to mark the start of a line, column etc., but also to
>>> break a word, so that foo<lb/>bar should be considered to be two words.
>>>
>>> There are breaks between your words conceptually, I hope? If not, what
>>> is the point of trying to distinguish types of <lb> anyway?
>>>
>>> If epidockers don't like this, though, they can always make up their own
>>> terminology -- the type value is not constrained by the schema.
>>>
>>> Gabriel Bodard wrote:
>>>> I'm not sure I like "nobreak", as it doesn't really say anything about
>>>> the status of the lb (or, as Dot points out, cb, pb, etc.); especially
>>>> since there are never (or rarely) breaks _between_ words in our texts.
>>>> The idea behind "worddiv" was that this is a linebreak that appears
>>>> mid-word, splitting it atwain, as Dan has it. Let me canvass the EpiDoc
>>>> markup list, and see if people there have opinions one way or the other
>>>> to contribute to this...
>>>>
>>>> G
>>>>
>>>> Lou Burnard wrote:
>>>>
>>>>> After much head scratching here in Oxford, we've decided on "nobreak".
>>>>>
>>>>> I added a couple more examples and a bit more discussion, taking
>>>>> examples from some real projects too. Affected are the definition for
>>>>> <lb> and the discussion of milestones in CO.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Daniel Paul O'Donnell wrote:
>>>>>
>>>>>> I think "word-dividing" in this case means "splitting individual words
>>>>>> atwain" rather than "demarcating their boundaries" ;)
>>>>>>
>>>>>> In my edition of Cædmon's Hymn I needed to encode space and lb
>>>>>> similarly explicitly: i.e. indicating whether it fell within the word
>>>>>> or between words: the stylesheets (such as they were in those days)
>>>>>> handled them differently depending on the value of @type (which I'd
>>>>>> made universal). White space wouldn't have done it for me, because I
>>>>>> was reformatting the data with and without the word-internal spaces
>>>>>> and lines depending on the view the user selected.
>>>>>>
>>>>>> -dan
>>>>>>
>>>>>> Lou Burnard wrote:
>>>>>>
>>>>>>> Gabriel BODARD wrote:
>>>>>>>
>>>>>>>
>>>>>>>> Lou Burnard wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>> (9) lb: should we add an example of the usage of
>>>>>>>>>> lb/type=word-dividing, which currently sits a little uncomfortably
>>>>>>>>>> in the note? I suggest "Cae<lb type="worddiv"/>sari".
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Don't know what note you're referring to. Don't see the point of
>>>>>>>>> the @type attribute. Haven't done anything.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> This was discussed some months ago, and is the reason @type was
>>>>>>>> allowed on <lb> in the first place. There is currently a note at the
>>>>>>>> bottom of LB that says: "The type attribute may be used to
>>>>>>>> characterize the linebreak in any respect, for example as
>>>>>>>> word-breaking or not." We have literally thousands of examples of
>>>>>>>> this in EpiDoc files, where words are not always tagged explicitly
>>>>>>>> and it's the only way we can be sure to tokenize correctly. I just
>>>>>>>> thought an example would help to clarify the use-case.
>>>>>>>>
>>>>>>>> (If people feel strongly that [e.g.] "wordDividing" would be a
>>>>>>>> better recommended value than "worddiv", I'm happy to make that part
>>>>>>>> of our P5 upgrade script.)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> I don't mind adding examples, but this one confuses me. Isn't the
>>>>>>> point that the <lb/> in your example does NOT divide the word? So
>>>>>>> both "wordDividing" and "worddiv" seem exactly the opposite of what
>>>>>>> you want here. How about "nowordbreak" or "nwb"?
>>>>>>>
>>>>>>> I know I lost this argument last time, but I still think in practice
>>>>>>> I'd deal with this by putting in whitespace where the <lb> coincided
>>>>>>> with a word boundary and leaving it out where it didn't!
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> G
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> tei-council mailing list
>>>>>>> tei-council at lists.village.Virginia.EDU
>>>>>>> http://lists.village.Virginia.EDU/mailman/listinfo/tei-council
>>>>>>>
>>>>>>>
>>>>
>>>
>>
>> --
>> Dr Gabriel BODARD
>> (Epigrapher & Digital Classicist)
>>
>> Centre for Computing in the Humanities
>> King's College London
>> 26-29 Drury Lane
>> London WC2B 5RL
>> Email: gabriel.bodard at kcl.ac.uk
>> Tel: +44 (0)20 7848 1388
>> Fax: +44 (0)20 7848 2980
>>
>> http://www.digitalclassicist.org/
>> http://www.currentepigraphy.org/
>>
>
>
>
> --
> *~*~*~*~*~*~*~*~*~*~*
> Dot Porter (MA, MSLS)          Metadata Manager
> Digital Humanities Observatory (RIA), Regus House, 28-32 Upper
> Pembroke Street, Dublin 2, Ireland
> -- A Project of the Royal Irish Academy --
> Phone: +353 1 234 2444        Fax: +353 1 234 2400
> http://dho.ie          Email: dot.porter at gmail.com
> *~*~*~*~*~*~*~*~*~*~*
>




