[tei-council] <said> proposals available
Syd Bauman
Syd_Bauman at Brown.edu
Sun Jul 15 22:32:29 EDT 2007
The two separate proposals for the new <said> element are now
available in http://www.tei-c.org/Drafts/said/ (or will be as soon as
the server syncs). The two proposals are in files called
    said-asis
    said-q-hi
and each has .odd source and derived .doc.html, .rnc, and .rng files.
The two proposals are very similar. The only difference is how the
<q> element is defined. In the latter ("said-q-hi") proposal, the
Guidelines are explicit that <q> can be used for any of the various
underlying reasons that get represented with quotation marks. I
prefer this proposal, in part because I think lots of people already
use <q> this way.
Here is a quick executive summary:
<said> is for direct speech (or its discursive equivalents: e.g.
reported thought or speech, dialog, etc.), whether real or
contrived, typically as part of the current text, although I
suppose one could imagine otherwise. Most common usage is
likely to be a character's spoken words in a novel or a
person's spoken words reported in a non-fiction article. In
English prose it will very often be associated with phrases
like "he said", or "she asked". <said> is not a viable child
of <cit>.
<quote> is for material that is quoted from sources outside the text,
whether correctly or not, whether real or contrived, whether
originally spoken or written. Most common usage is likely to
be quoting passages from other documents. May be used in a
dictionary for real or contrived examples of usage. <quote>
is still a viable child of <cit>.
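
As a rough illustration of the <said> / <quote> distinction summarized
above, here is a minimal sketch. The sample sentence and the choice of
quotation are picked only for illustration and do not come from either
proposal:

```xml
<!-- Direct speech that is part of the current text: <said> -->
<p><said>Where are you going?</said> she asked.</p>

<!-- Material quoted from a source outside the text: <quote>,
     shown here as a child of <cit> alongside its attribution -->
<cit>
  <quote>Call me Ishmael.</quote>
  <bibl>Herman Melville, Moby-Dick</bibl>
</cit>
```

Note that only <quote> appears inside <cit> here, in line with the
summary: <said> is not intended as a child of <cit>.
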
--------- said-asis: ---------
<q> is for passages quoted from elsewhere; in narrative, either
direct or indirect speech or something being quoted from outside
the text; in dictionaries, real or contrived examples of usage.
<q> is still a viable child of <cit>, for those who don't use the
more specific <quote>.
--------- said-q-hi: ---------
<q> is for any of a number of features when differentiating among
them is not desired, e.g. because it is economically not feasible
or simply not of interest for the current purpose. Items that may
be encoded this way include
- representation of speech or thought
- quotation
- technical terms and glosses
- passages mentioned, not used
- authorial distance
and perhaps even
- from a foreign language
- linguistically distinct
- emphasized
- any other use of quotation marks in the source
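
To make the said-q-hi reading concrete, here is a sketch of one invented
sentence encoded two ways: first with the generic <q> for everything the
source puts in quotation marks, then with the more specific elements an
encoder could use if the distinctions matter. The sentence itself is
made up for this illustration:

```xml
<!-- said-q-hi reading: one generic element, distinctions not recorded -->
<p><q>Nonsense,</q> he said, and went back to his <q>research</q>
on the meaning of the word <q>grue</q>.</p>

<!-- The same passage with the distinctions made explicit -->
<p><said>Nonsense,</said> he said, and went back to his
<soCalled>research</soCalled> on the meaning of the word
<mentioned>grue</mentioned>.</p>
```
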
Some tangentially related items I noticed:
* I think the example with <list type="speakers"> should be re-worked
so that the who= attributes are pointing to <person>s, but as I
don't speak French (that is French, right? -- there should really
be an xml:lang= on the <egXML>, no?) I am not a good choice to do
that work.
* In the last example of the section, the word "language" is encoded
as a <mentioned>, but I don't think that's right. I'm not very
confident about what *is* right, but I'd prefer <term> to
<mentioned>. (I suppose we could ask the co-author of the source
for the example, Terry Langendoen, who chaired the committee on
text analysis and interpretation back in the early 1990s :-)
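
For the first item above, a reworked example might look roughly like the
following sketch. The xml:id values, the names, and the French text are
placeholders invented here, and wrapping the <person>s in <listPerson>
is an assumption about how the reworked example would be organized:

```xml
<!-- Hypothetical speakers, declared once (e.g. in the header) -->
<listPerson>
  <person xml:id="spk1"><persName>Adolphe</persName></person>
  <person xml:id="spk2"><persName>Sophie</persName></person>
</listPerson>

<!-- The example itself, with xml:lang on <egXML> and who=
     pointing at the <person> elements above -->
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:lang="fr">
  <q who="#spk1">Bonjour, Sophie.</q>
  <q who="#spk2">Bonjour, Adolphe.</q>
</egXML>
```
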
Some unrelated changes I've made in the ODD:
* lowercase 'm' -> uppercase 'M' in description of em dash
* "the quotation is marked up as part of a concurrent but independent
hierarchy" changed to "the quotation is marked up using stand-off
markup", as we don't do concurrent hierarchies any more
* "the quotation boundaries are represented by empty milestone tags"
to "the quotation boundaries are represented by empty segment
boundary delimiter elements", as they're *not* milestones!