[tei-council] s vs seg, ticket 578

Lou Burnard lou.burnard at retired.ox.ac.uk
Sat Jun 8 06:15:28 EDT 2013


On 06/06/13 01:14, Piotr Bański wrote:
>
> It's easy to look at <s>, however, as both a span and a node in the 
> syntactic constituent analysis, and this may be the entry point to the 
> "controversy".
>

Well, that depends on what you mean by "easy". Yes, it is easy if you 
entirely ignore the three or four places in the Guidelines where it is 
explicitly pointed out that this is not what <s> is for, and also if you 
overlook the existence of <seg>, which does do this!

> Isn't it good to realize a problem that has been lurking there for 
> years? My remark concerned <s> seen from a syntactic constituent 
> perspective -- in many cases, you want to make sure that it can be 
> self-nesting:

The lurking problem, if there is one, is that people don't always read 
what the Guidelines actually say about how to use these elements...

>
> [S [S Jim likes wine ] but [S Jenny prefers beer] ]
>
> You don't want this in typical sentence-boundary annotation, where you 
> want <s> elements exhaustively covering the entire text. But when you 
> try to make it perform both duties (it's a bit like with that sketch 
> of a cube where you don't know if the particular corner is sticking 
> out towards you or rather away from you), problems may pop up.
>

But why try to make it do both duties, when there is another element 
provided exactly so you don't have to?
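
A minimal sketch of that division of labour, using the example sentence 
quoted above: <s> marks the end-to-end sentence unit and <seg> marks the 
constituents inside it. The type value "clause" is an illustrative 
assumption, not something the Guidelines prescribe.

```xml
<!-- One <s> per orthographic sentence; <s> never contains another <s>. -->
<s>
  <!-- <seg> is free to nest, so it can carry the constituent analysis. -->
  <seg type="clause">Jim likes wine</seg>
  but
  <seg type="clause">Jenny prefers beer</seg>
</s>
```

Because <seg> may contain further <seg> elements, deeper constituent 
structure can be added without ever nesting <s>.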

> One quick solution seems to be to abandon the recommendation to use <s> in 
> the syntactic constituency context, but how to do that, other than by 
> cruelly never permitting
>
> <s> <phr><w/><w/></phr> <w/> <phr> <w/> </phr> </s>
>
> or, in the very same vein,
>
> <s> <s/> <w/> <s/> </s>  (see above for an example)
>
> .. I don't know. (Because what's above seems an attractive way to 
> quickly annotate syntactic structure, so why not permit it).
>

Indeed it is. Which is why there is an example showing use of <s> with 
<w> and <phr> in exactly this way!


> Maybe conditionally, by saying in one place in the Guidelines that on 
> the span-based perspective, you don't typically want <s> to self-nest, 
> and in the chapter on syntactic structure, by allowing it.
>

I really don't think it would be wise to make the properties of an 
element chapter-dependent!

> Another solution is to bite it outright and allow <s> to self-nest 
> across the board, and to delegate the introduction of a possible ban 
> on self-nesting of <s> to the particular implementer -- a clean 
> customization, wouldn't it be.

I stand by my assertion that this is just wrong. It would not be a clean 
customization, because it violates the semantic constraint that <s> is 
for end-to-end segmentation, which (I am getting tired of saying this) is 
explicitly stated several times in the Guidelines.

>
> There is a precedent, from another corpus linguist, in a rather 
> well-tested format:
>
> http://www.cs.vassar.edu/CES/dtd2html/cesDoc/s.html
>

Don't get me started....


