[tei-council] quotation marks, quotes, etc.

Dan O'Donnell daniel.odonnell at uleth.ca
Wed Apr 18 14:04:00 EDT 2007


I understand the point of this much more clearly now and see that it is
not necessarily creating artificial or uncommon distinctions. For
reasons that have never been clear to me (which may be where the problem
lies), q:quote is one of the distinctions I always need to look up, and I
can see myself having to do the same with q:quote:said. But I think
that's me rather than the element.

So basically the idea is:

a) If you don't care about distinguishing among the things that might
appear in quotation marks, or can't decide, use q.

b) If you do care, use quote for things that are documentable in
principle--i.e. quotations of speech or text or whatever from somewhere
else.

c) If you are reporting what was said, written, or thought
spontaneously, use said.

<p>President Bush read <quote>underestimate</quote> from the
teleprompter, but said <said>misunderestimate</said>. He got the next
bit right, though, when he said <said>us</said>, which agreed exactly
with the teleprompter's <quote>us</quote>.</p>

<p>When I said <quote>The insurgency's in its last throes</quote>, I
didn't say how long they'd last.</p>
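
Syd's point below that <quote> can go inside <said> (or vice versa)
would, as I read it, look something like the following. The sentence is
invented for illustration (only the sign text comes from his example),
so treat it as a sketch rather than a recommended encoding:

```xml
<!-- Invented sentence: a speaker (said) reading out a documentable
     text (quote). Only the sign wording is taken from the thread. -->
<p><said>The sign in front of me reads <quote>Bush Wants Your Children
For Cannon Fodder</quote>,</said> she told the crowd.</p>
```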


I think it is useful to keep open the possibility of marking indirect
speech (though of course that could also be done using <seg>). As I was
saying to Syd, I can think, off the top of my head, of a research
project that could usefully mark both direct and indirect speech for
analysis.
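
A minimal sketch of the <seg> route, assuming a project-defined value
of "indirect" for the type attribute (that value is my own invention for
illustration, not something the Guidelines prescribe), and borrowing
Syd's grapes sentence:

```xml
<!-- Sketch only: type="indirect" is a hypothetical, project-defined
     value; <said> is the element proposed in this thread. -->
<p>She said <said>I don't like grapes.</said> He said that
<seg type="indirect">he doesn't like grapes</seg> either.</p>
```

A project analysing speech could then pull out the two kinds separately
by selecting on said versus seg[@type='indirect'].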

-dan



On Wed, 2007-18-04 at 05:44 -0400, Syd Bauman wrote:
> In the following, I'm going to use "said" as the name of the proposed
> element for direct speech or thought only. The name is not carved in
> stone, but I think Sebastian, Dan, Julia and I all like it better
> than anything else that has been suggested.
> 
> 
> > I really do not agree with Syd on this issue 
> > As I understand it, ...
> 
> I'm curious then, Lou, whether you've changed your mind or it's just
> that I did such a bad job explaining the proposal, you didn't
> recognize it as the one to which we had already agreed! No matter,
> really, but I suspect I must have done a very poor job of explaining
> it, for your recap does not match the proposal well at all. Let me try
> again.
> 
> 
> There were actually 2 proposals rolled up in my previous e-mail, so I
> will separate them explicitly here.
> 
> In all cases here typographic distinction is no more important for
> these elements than it is for <soCalled> et al. That is, it is not
> relevant in the abstract, although it is important to some encoders &
> projects.
> 
> 
> --- 1 ---
> Lou: this first proposal is exactly what we spoke at length about on
> the phone ~3 weeks ago. You were behind this proposal as long as we
> leave <q> as the general-purpose element.
> 
> <said> is for direct speech (or its discursive equivalents: e.g.
>        reported thought or speech, dialog, etc.), whether real or
>        contrived, typically as part of the current text, although I
>        suppose one could imagine otherwise. Most common usage is
>        likely to be a character's spoken words in a novel or a
>        person's spoken words reported in a non-fiction article. In
>        English prose it will very often be associated with phrases
>        like "he said", or "she asked". <said> would not be a viable
>        child of <cit>.
> 
> <quote> is for material that is quoted from sources outside the text[1],
>         whether correctly or not, whether real or contrived, whether
>         originally spoken or written. Most common usage is likely to
>         be quoting passages from other documents. May be used in a
>         dictionary for real or contrived examples of usage. <quote>
>         would continue to be a viable child of <cit>.
> 
> <q> is for passages quoted from elsewhere; in narrative, either
>     direct or indirect speech or something being quoted from outside
>     the text; in dictionaries, real or contrived examples of usage.
>     <q> would continue to be a viable child of <cit>, for those who
>     don't use the more specific <quote>.
> 
> That is, <q> remains exactly as it was in P4; <quote> remains as it
> was in P4; <said> takes the role of only the "direct speech or thought"
> subset of what <q> handles.
> 
> There are of course cases where people speak quotations or quotations
> include direct speech, in which case <quote> can go in <said> or
> vice-versa. 
> 
> Some people assert that they can't see the distinction between <quote>
> and the proposed <said>, and I'm sorry to say I simply can't
> understand this point of view. While there must be some difficult edge
> cases, as a general rule I do not think it is difficult to tell these
> two phenomena apart at all. I just picked up a mid-20th century
> science fiction novel and flipped through the pages. I had no trouble
> at all differentiating (the vast majority were <said>). I read an
> article in the NY Times over the weekend -- again, no trouble at all
> differentiating. I just asked Julia, and she said that in the creation
> of the entire WWP corpus so far there has never been a case where an
> encoder (most of whom are undergraduates) has had any difficulty making
> the distinction. There are 17,104 occurrences of <q> or <quote>
> start-tags in the distributed WWP corpus[2]. She said "it's harder to
> tell what is and isn't a <list> than it is to [differentiate between
> <said> and <quote>]".
> 
> There is one definite hole in this proposal: as it is worded, the
> encoding of indirect speech (e.g. the bit about grapes in "He said
> that he doesn't like grapes") is forced into <q>. The only two
> reasonable solutions I see are:
> A. add indirect speech to semantics of <said>
> B. remove indirect speech from semantics of <q>
> Personally, I really don't care. I have never seen anyone encode
> indirect speech ever, but that doesn't mean there aren't cases out
> there.
> 
> Here is a passage from an article about an anti-war protest that was
> held in Boston while we were in Sofia at the 2005 MM.
> 
>    <p>One demonstrator carried a sign that read, <quote>Bush Wants
>    Your Children For Cannon Fodder,</quote> and another ...</p>
>    <p>[Cindy Sheehan] mentioned a woman who had once e-mailed her
>    after she cursed the Bush administration.</p>
>    <p><said>She said, <said>Cindy, don't you want to use a little
>    nicer language, because you know there might be people sitting on
>    the fence that you offend,</said></said> Sheehan told the crowd.
>    <said>And do you know what I said? I said, <said>Damn it, why is
>    anybody on that fence still?</said></said></p>
>    <p><said>A lot of people will come up to me and say, <said>My
>    country right or wrong,</said></said> Sheehan added later.
>    <said>And you know what I say? When my country is wrong, it is so
>    wrong, and it is mandatory for us to stop it, to stop the killing,
>    to stop the people in power.</said></p>
>    -- http://www.commondreams.org/headlines05/1030-05.htm
> 
> I will also note that off the top of my head I can't really see why an
> example of usage in a dictionary would be <q> instead of <quote>.
> Laurent?
> 
> 
> --- 2 ---
> The second proposal is essentially an offshoot or amplification of the
> first. It stems from my observation that there are people out there
> who not only don't want to differentiate between <said> and <quote>,
> but who don't want to make the distinctions between <soCalled>,
> <mentioned>, <term>, etc., either. In many cases these encoders don't
> use any element (they just transcribe the quotation marks), but IIRC
> at least one library project used <q> for all of them. This second
> proposal leaves <said> and <quote> as they were in the first proposal,
> but expands the catchment of <q> to include all of these often-
> enclosed-in-quotation-marks sorts of phrase level phenomena:
> 
> <q> is for any of a number of features when differentiating among
>     them is not desired, e.g. because it is economically not feasible
>     or simply not of interest for the current purpose. Items that may
>     be encoded this way include
>     - representation of speech or thought
>     - quotation
>     - technical terms and glosses
>     - passages mentioned, not used
>     - authorial distance
>     and perhaps even
>     - from a foreign language
>     - linguistically distinct
>     - emphasized
>     - any other use of quotation marks in the source
> 
> 
> Notes
> -----
> [1] "Outside the text" here does not necessarily mean "not a
>     descendant of my ancestor <text> element", but rather something
>     quite a bit less precise, more along the lines of "not from 'round
>     here". I.e., something in chapter 1 may be a <quote> even if the
>     thing being quoted is a passage from chapter 3 of the same
>     document. This, of course, blurs the line a bit, and may be worth
>     consideration. 
> [2] I did not say "elements" because many of these <q> and <quote>
>     elements may be partial elements which, via next= and prev=, are
>     only part of a complete aggregate element.
> 
> _______________________________________________
> tei-council mailing list
> tei-council at lists.village.Virginia.EDU
> http://lists.village.Virginia.EDU/mailman/listinfo/tei-council
-- 
Daniel Paul O'Donnell, PhD
Chair, Text Encoding Initiative <http://www.tei-c.org/>
Director, Digital Medievalist Project <http://www.digitalmedievalist.org/>
Associate Professor and Chair of English
University of Lethbridge
Lethbridge AB T1K 3M4
Vox: +1 403 329 2378
Fax: +1 403 382-7191
Homepage: http://people.uleth.ca/~daniel.odonnell/



