[tei-council] Chapter 17 - Simple Analytic Mechanisms

Thu Jan 31 14:20:37 EST 2008

Brett Zamir wrote:
> Have you considered reordering the chapters? I'd think (as Lou
> indicated as well) that chapters 22 and 23 might really be grouped
> with the first core chapters to emphasize their general and basic
> nature, and perhaps some other changes could be made too.
>
> Here's my own suggestion:
>
> *Dealing with the basic infrastructure:*
> #1 The TEI Infrastructure
> #2 The TEI Header
> #3 Elements Available in All TEI Documents
> #4 Default Text Structure
> #15 Corpora and Language Corpora (might go under genres, but I think
> it deals enough with a fundamental infrastructure issue to merit going
> here)
> #5 Representation of Non-standard Characters and Glyphs
> #13 Names, Dates, People, and Places
> #22 Documentation Elements
> #23 Using the TEI
>
> *Special Features:*
>
> #16 Linking, Segmentation
> #14 Tables, Formulae, and Graphics
> #19 Graphs, Networks, and Trees
> #20 Non-hierarchical Structures
> *Meta-information:*
>
> #21 Certainty and Responsibility
> #17 Simple Analytic Mechanisms
> #18 Feature Structures
> *Special Genres:*
>
> #6 Verse
> #7 Performance Texts
> #8 Transcriptions of Speech
> #9 Dictionaries
>
> *Ancient Texts Genre:* (though these might perhaps also go under
> Meta-information)
>
> #10 Manuscript Description
> #11 Representation of Primary Sources
> #12 Critical Apparatus

The current order is what we arrived at after quite a bit of debate. Two
factors prevailed in establishing it: we didn't want to introduce a
two-level hierarchy into the body of the text (four levels of subdivision
is already too many...), and we didn't really expect many readers to
begin at the beginning, go on to the end, and then stop. It's a reference
manual, for dipping into. That said, there is a clear progression from
general to increasingly specific subject matter from beginning to end,
with the USE chapter as a coda. Your proposed reordering is certainly
feasible, though: may I suggest that you post it as a feature request on
SourceForge for further discussion?

>
>
> *17.1 Linguistic Segment Categories*
> 1) Out of curiosity, does anyone actually go down to the phonemic
> representation level in TEI? If so, why is there no tag for it?
>
I'm not aware of anyone having done this. Such a segmentation would of 
course interfere with other levels of linguistic analysis (there is some 
reference to this problem in the chapter on transcribed speech if I 
remember rightly).

> 2) When the docs state that "the <gi>c</gi> element can contain only
> plain text, and will often contain only a single character", is this
> because a combining diacritic and its base character might be allowed
> together (as they presumably should be, especially since the Guidelines
> recommend using these over precomposed forms)? If so, might the
> reference stating "Should only contain a single character or an entity
> that represents a single character" be emended to cover such combining
> sequences as well?
>

I've made the descriptions consistent.
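For illustration only, here is a minimal Python sketch (standard library unicodedata; the helper name code_points is made up for this example) of why "a single character" is ambiguous for <c>: an e with a combining acute accent is one character to a reader but two Unicode code points, whereas its precomposed (NFC) equivalent is a single code point.

```python
import unicodedata


def code_points(s: str) -> list[str]:
    """Return the Unicode character name of each code point in s."""
    return [unicodedata.name(ch) for ch in s]


# Base letter followed by a combining diacritic: one visible character,
# two code points.
decomposed = "e\u0301"  # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

# NFC normalization composes the pair into a single precomposed code point.
precomposed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), code_points(decomposed))
print(len(precomposed), code_points(precomposed))
```

So a description of <c> that says "a single character" has to decide whether it means a code point or a base character plus its combining marks.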

> 3) One example here has clauses of type "finite-declarative" and 
> "declarative-finite". Any problem with that?
>
Err, not as far as I know. They mean different things.

> 4) Might the line,
>
> "The lemma attribute may be used to specify the lemma, that is the
> head- or base- form of an inflected verb or noun, for example"
>
> be changed to:
>
> "The lemma attribute may be used to specify the lemma, that is the
> head- or base- form of an inflected form (or of a non-standard form).
> For example,"
>
> In our texts, we plan to use <w lemma> to indicate what the standard 
> form of a non-standard transliteration is...
>
I am not sure what you mean by "non-standard" here, but lemmatization is 
definitely not the same thing as regularization.  Using the @lemma 
attribute to regularise nonstandard orthography sounds like 
attribute-abuse to me... you should be using the <reg> element for this 
purpose.

> *17.3 Spans and Interpretations*
>
> An example states "other spans identified by DTL here". Who is DTL? 
> Should there be a @resp on the <spanGrp>?
>
D. Terence Langendoen, who originally drafted much of this chapter. And 
yes, there should!

> *17.4 Linguistic Annotation*
> 1) Is whitespace inevitable between <w ana="#NN1">victim</w> and
> <w ana="#POS">'s</w>, given that there was whitespace in the CLAWS
> output? If so, do you want to mention the shortcoming this introduces?
This is quite a headache (and is commented on elsewhere I think); 
mentioning it again here would distract from the main point of 
discussion which is the analysis codes themselves.

>
> 2) Why does the line, "However, analysis into phrase and clause 
> elements can be superimposed on the word and morpheme tagging in the 
> preceding illustration." begin with "However"? Might this be clarified?
>
Stylistic tic. I have deleted it.

> 3) I changed the line "*These mechanisms all depend to a greater or
> lesser degree* on the ability to associate a unique identifier with
> any element in a TEI-conformant text, and then to specify that
> identifier as the target of a pointing element of some kind." to
> "*Many of these mechanisms will depend* on the ability to associate a
> unique identifier with any element in a TEI-conformant text, and then
> to specify that identifier as the target of a pointing element of some
> kind.", since XPointer doesn't require the use of identifiers at all.
Actually, since we are using XPointer, the whole sentence needs revision 
to indicate that other kinds of pointer would work too.
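As a rough illustration of that point, here is a minimal Python sketch using the standard xml.etree.ElementTree module: the same document can be addressed either through an xml:id or through a structural path to an element that carries no identifier at all. ElementTree's limited XPath subset is only a stand-in for a real XPointer scheme here, and the fragment and the helper names by_id and by_path are hypothetical.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predeclared, so xml:id is reported in Clark notation.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

doc = ET.fromstring(
    "<text><p>"
    '<s xml:id="s1">First sentence.</s>'
    "<s>Second sentence, no identifier.</s>"
    "</p></text>"
)


def by_id(root: ET.Element, ident: str):
    """Resolve a bare-name pointer (e.g. '#s1') by scanning for a matching xml:id."""
    return next((e for e in root.iter() if e.get(XML_ID) == ident), None)


def by_path(root: ET.Element, path: str):
    """Resolve a structural path; no identifier is needed on the target."""
    return root.find(path)


print(by_id(doc, "s1").text)
print(by_path(doc, "p/s[2]").text)
```

Only the first lookup depends on an identifier; the second reaches an element that has none, which is the kind of pointing the revised sentence should allow for.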

>
> For note 69, why must the whole text be segmented into <s> elements
> if it is segmented at all?
Because that's how <s> is defined. It provides end-to-end segmentation 
of the whole text. That's what it's for.
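To make "end-to-end" concrete, here is a deliberately simplified Python sketch (standard library only; the function name fully_segmented is hypothetical) that checks one paragraph for text lying outside any <s>. It assumes, for simplicity, that <s> elements are direct children of the paragraph and treats any other child element as unsegmented content; real TEI markup allows more variety than this.

```python
import xml.etree.ElementTree as ET


def fully_segmented(p: ET.Element) -> bool:
    """Return True if no non-whitespace text in p lies outside an <s> child.

    Simplified check: inspects p's leading text and the tail text after each
    child, and rejects any child element that is not <s>.
    """
    # Text before the first child must be whitespace only.
    if p.text and p.text.strip():
        return False
    for child in p:
        # Any non-<s> child counts as content outside the segmentation.
        if child.tag != "s":
            return False
        # Text between or after <s> elements must be whitespace only.
        if child.tail and child.tail.strip():
            return False
    return True


ok = ET.fromstring("<p><s>One.</s> <s>Two.</s></p>")
gap = ET.fromstring("<p><s>One.</s> stray words <s>Two.</s></p>")
print(fully_segmented(ok), fully_segmented(gap))  # True False
```

Partial use of <s> would leave text like "stray words" in the second paragraph belonging to no sentence, which is exactly what the end-to-end definition rules out.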