[tei-council] s vs seg, ticket 578

Gabriel Bodard gabriel.bodard at kcl.ac.uk
Sun Jun 9 06:52:18 EDT 2013


I'm inclined to agree with Lou on this: <s> and <seg> are pretty clearly
distinct in both semantics and content model. I've never found that
distinction confusing (although I've never felt the need to tag
sentences, either), and any redefinition would seem only to muddy it. A
sentence is a special case of an arbitrary segmentation of text, and
sentences in normal prose both tessellate and don't nest (except, I
suppose, in quoted speech).

On the other hand, I have two small reservations:

1. if a corpus linguist were to tell me that in certain contexts 
"sentences" (much less "sentence-like divisions"[*]) may nest, I'd have 
to concede that she knows better than I, and maybe this should be 
allowed in markup.

2. the fact that <s> and <seg> would then have similar models doesn't 
seem a very major objection, since the same could be said of <phr> and 
<seg>: phrases may self-nest (quite rightly), but that doesn't reduce 
the utility of this element as a semantic specialization of <seg>.

[* any strained definition of "sentence-like division" wouldn't convince 
me so much--why not use <seg> or propose a new specialised element if 
it's really something different from a sentence?]

Is there a reason, by the way, that self-nesting of <s> is only
prevented via a Schematron rule, rather than being prohibited in
ODD/RNG directly?
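
(For concreteness, a minimal Python sketch of the kind of check at
issue. This is not the Schematron rule from the Guidelines, which I am
not reproducing here; the function name nested_s_elements and the two
sample fragments are hypothetical, the second adapted from Piotr's
bracketed example. It simply reports every <s> that sits inside
another <s>.)

```python
import xml.etree.ElementTree as ET

# TEI namespace; ElementTree exposes namespaced tags as "{uri}local".
TEI_NS = "http://www.tei-c.org/ns/1.0"
S_TAG = "{" + TEI_NS + "}s"

# Hypothetical sample fragments, for illustration only.
FLAT = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">'
    "<s>Jim likes wine.</s><s>Jenny prefers beer.</s></p>"
)
NESTED = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">'
    "<s><s>Jim likes wine</s> but <s>Jenny prefers beer</s></s></p>"
)


def nested_s_elements(xml_text):
    """Return each <s> element that has another <s> as an ancestor."""
    root = ET.fromstring(xml_text)
    offenders = []

    def walk(elem, inside_s):
        is_s = elem.tag == S_TAG
        if is_s and inside_s:
            offenders.append(elem)
        # Descendants count as "inside <s>" once any ancestor is an <s>.
        for child in elem:
            walk(child, inside_s or is_s)

    walk(root, False)
    return offenders
```

Called on FLAT this returns an empty list; called on NESTED it returns
the two inner <s> elements.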

G

On 08/06/2013 11:15, Lou Burnard wrote:
> On 06/06/13 01:14, Piotr Bański wrote:
>>
>> It's easy to look at <s>, however, as both a span and a node in the
>> syntactic constituent analysis, and this may be the entry point to the
>> "controversy".
>>
>
> Well, that depends on what you mean by "easy". Yes, it is easy if you
> entirely ignore the three or four times in the Guidelines where it is
> explicitly pointed out that this is not what it's for, and also if you
> overlook the existence of <seg> which does do this!
>
>> Isn't it good to realize a problem that has been lurking there for
>> years. My remark concerned <s> seen from a syntactic constituent
>> perspective -- in many cases, you want to make sure that it can be
>> self-nesting:
>
> The lurking problem, if there is one, is that people don't always read
> what it actually says about how to use these elements...
>
>>
>> [S [S Jim likes wine ] but [S Jenny prefers beer] ]
>>
>> You don't want this in typical sentence-boundary annotation, where you
>> want <s> elements exhaustively covering the entire text. But when you
>> try to make it perform both duties (it's a bit like with that sketch
>> of a cube where you don't know if the particular corner is sticking
>> out towards you or rather away from you), problems may pop up.
>>
>
> But why try to make it do both duties, when there is another element
> provided exactly so you don't have to?
>
>> One quick solution seems to abandon the recommendation to use <s> in
>> the syntactic constituency context, but how to do that, other than by
>> cruelly never permitting
>>
>> <s> <phr><w/><w/></phr> <w/> <phr> <w/> </phr> </s>
>>
>> or, in the very same vein,
>>
>> <s> <s/> <w/> <s/> </s>  (see above for an example)
>>
>> .. I don't know. (Because what's above seems an attractive way to
>> quickly annotate syntactic structure, so why not permit it).
>>
>
> Indeed it is. Which is why there is an example showing use of <s> with
> <w> and <phr> in exactly this way!
>
>
>> Maybe conditionally, by saying in one place in the Guidelines that on
>> the span-based perspective, you don't typically want <s> to self-nest,
>> and in the chapter on syntactic structure, by allowing it.
>>
>
> I really don't think it would be wise to make the properties of an
> element chapter-dependent!
>
>> Another solution is to bite it outright and allow <s> to self-nest
>> across the board, and to delegate the introduction of a possible ban
>> on self-nesting of <s> to the particular implementer -- a clean
>> customization, wouldn't it be.
>
> I stand by my assertion that this is just wrong. It would not be a clean
> customization because it violates the semantic constraint that <s> is
> for end-to-end segmentation which (I am getting tired of saying this) is
> explicitly stated several times in the Guidelines.
>
>>
>> There is a precedent, from another corpus linguist, in a rather
>> well-tested format:
>>
>> http://www.cs.vassar.edu/CES/dtd2html/cesDoc/s.html
>>
>
> Don't get me started....
>

-- 
Dr Gabriel BODARD
Researcher in Digital Epigraphy

Department of Digital Humanities
King's College London
26-29 Drury Lane
London WC2B 5RL

E: gabriel.bodard at kcl.ac.uk
T: +44 (0)20 7848 1388

http://www.digitalclassicist.org/
http://www.currentepigraphy.org/


