[tei-council] quotation marks, quotes, etc.

Syd Bauman Syd_Bauman at Brown.edu
Sun Apr 15 18:44:20 EDT 2007


I'd like to address Trac ticket #304.

  Passages offset by quotation marks in the source may be encoded as
  a specific type of feature, e.g. mentioned not used (<mentioned>),
  authorial distancing (<soCalled>), quotation (<quote>), speech or
  thought (<q>); or may be encoded as "taken from elsewhere, details
  unknown or unsaid" (<q>).
  The problem here is that <q> is overloaded, serving two purposes.
  Need to develop a proposal to leave <q> as a generic (perhaps even
  more generic?) element, and introduce a new element for the "speech
  or thought" function.

I think the way forward here is pretty clear, and Lou and I agree on
the basic game plan sketched out above. But there are still a couple
of potentially controversial issues. So here is a slightly more
detailed proposal, followed by questions.

* Retain <quote> as it is: passage attributed to some agency external
  to the text, i.e. a quotation from a written source. Remains a
  member of model.quoteLike, also a member of new model.quoted.
* New element <quo> for direct speech or thought. (I.e., not a
  quotation of a written source, not authorial distancing, not an
  example in a dictionary entry.) A member of new model.quoted.
* Change the semantics of <q> to be somewhat broader, basically
  covering anything that was indicated in the source with quotation
  marks, but about which the encoder does not wish to say more.
  Essentially syntactic sugar for <hi rend="quotation marks">. A
  member of model.hiLike.
* <cit> remains as is, and becomes a member of new model.quoted.
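
To make the class changes concrete, here is a rough ODD-style sketch
of the memberships listed above. It is only an illustration, not a
complete or validated customization: <quo> and model.quoted exist
only in this proposal, the <desc> wording is a placeholder, and
content models are left out because they have not been decided.

  <classSpec ident="model.quoted" type="model" mode="add">
    <desc>groups elements that mark quoted or spoken passages
      (placeholder wording)</desc>
  </classSpec>

  <!-- new element; the name is still open to discussion -->
  <elementSpec ident="quo" mode="add">
    <desc>direct speech or thought (placeholder wording)</desc>
    <classes>
      <memberOf key="model.quoted"/>
    </classes>
  </elementSpec>

  <!-- keeps model.quoteLike, additionally joins model.quoted -->
  <elementSpec ident="quote" mode="change">
    <classes mode="change">
      <memberOf key="model.quoted"/>
    </classes>
  </elementSpec>

  <!-- generic "was in quotation marks" element -->
  <elementSpec ident="q" mode="change">
    <classes mode="replace">
      <memberOf key="model.hiLike"/>
    </classes>
  </elementSpec>

  <!-- otherwise unchanged -->
  <elementSpec ident="cit" mode="change">
    <classes mode="change">
      <memberOf key="model.quoted"/>
    </classes>
  </elementSpec>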

This system has the advantage of a clean break between quoting of
passages external to the text and direct speech or thought of, e.g.,
a character. But it also permits <q> to be used quite loosely, which
is good, because it reflects what lots of projects already do. 
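
As an illustration of how the elements would divide the work under
this proposal, here is a made-up passage (the text is invented, and
<quo> is the proposed element, not an existing one):

  <p>As the proverb has it, <quote>a stitch in time saves
    nine</quote>. <quo>I never sew,</quo> said Alice, who called her
    needle a <soCalled>weapon</soCalled>. The word
    <mentioned>stitch</mentioned> has six letters, and the sign on
    the door read <q>Closed</q>.</p>

Here <quote> marks the passage attributed to an external source,
<quo> marks what a character says, and <q> records only that the
source had quotation marks around "Closed", without saying why.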

That is,
- quotation could be encoded with <quote> or <q>
- that which a character speaks could be encoded with <quo> or <q>
- authorial distance could be encoded with <soCalled> or <q>
- words mentioned not used could be encoded with <mentioned> or <q>
- dictionary examples could be encoded with <quote> or <q> (what if
  they are contrived?)
- a filename that appeared in quotes could be encoded as <name> or
  <q>
- a filepath that appeared in quotes could be encoded as <ident> or
  <q> 
- a newly introduced term could be encoded as <term> or <q>

You can well imagine projects that encode all this stuff with <q> on
the first pass (because it is easier and thus less expensive), and
then on a second pass convert to more nuanced encoding for those
aspects they care about, and not others.
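
For example (again with invented text, and assuming the proposed
<quo>), a first pass might simply record every pair of quotation
marks:

  <p><q>Nonsense,</q> said the Duchess, pointing at the
    <q>garden</q>.</p>

A project that cares only about direct speech could then refine just
that aspect on the second pass and leave the rest alone:

  <p><quo>Nonsense,</quo> said the Duchess, pointing at the
    <q>garden</q>.</p>

The second <q> might well be authorial distancing (<soCalled>), but
under the broadened semantics leaving it as <q> is still correct,
just less specific.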

Questions:

* Does anyone strongly object to the basic idea?

* Is the name <quo> OK? (If not, please provide a suggested
  alternative :-)

* Have I got the model divisions correct?



