9.349 encoding and interpretation

Humanist (mccarty@phoenix.Princeton.EDU)
Sun, 3 Dec 1995 12:23:49 -0500 (EST)

Humanist Discussion Group, Vol. 9, No. 349.
Center for Electronic Texts in the Humanities (Princeton/Rutgers)
http://www.princeton.edu/~mccarty/humanist/

[1] From: "C. M. Sperberg-McQueen" (202)
<U35395%UICVM.BitNet@pucc.Princeton.EDU>
Subject: Re: 9.343 encoding

[2] From: Lou Burnard <lou@vax.ox.ac.uk> (52)
Subject: Re : encoding and interpretation

[3] From: Patrick Durusau <pdurusau@emoryu1.cc.emory.edu> (89)
Subject: Re: 9.343 encoding

--[1]------------------------------------------------------------------
Date: Sat, 02 Dec 95 10:58:20 CST
From: "C. M. Sperberg-McQueen" <U35395%UICVM.BitNet@pucc.Princeton.EDU>
Subject: Re: 9.343 encoding

On Sat, 2 Dec 1995 00:04:59 -0500 (EST) Ian Lancashire said:
>I agree that SGML concerns encoding syntax and TEI proposes a specific
>tagset that interprets textual phenomena. However, I don't fall into
>the confusion he thinks. SGML syntax is interpretative.

I don't have time to do this topic justice, but I'd like to make two
quick points.

It is certainly true that SGML markup reflects an interpretation of the
text, or even (as Ian Lancashire said in an earlier posting), "SGML
imposes an interpretation." But the restriction to SGML is puzzling and
unwarranted. It's like saying "In Toronto, fire consumes oxygen and
thus you can only light a match when there's sufficient oxygen around."
This is true as far as it goes, but needlessly restrictive. It's not
just in Toronto that fire consumes oxygen, and it's not just
representation of text in SGML that reflects, or imposes, interpretation
of the text. All electronic representation of text -- like all printed
representation of text -- expresses an understanding, and therefore
necessarily an interpretation of the text represented. (In pathological
cases, we may prefer to say that it represents a gross *failure* to
understand the text, but that is merely another way of saying that it
reflects a poor, implausible interpretation.) That is, it is not just
SGML, but representation in general, that has hermeneutic implications
and preconditions.

>SGML by its very nature demands that an editor interpret a text.
> ...
>Look at Charles Goldfarb's SGML Handbook (1990), pp. 7-8.
> ...
> [SGML] Markup should describe a document's structure
> and other attributes rather than specify processing
> to be performed on it, as descriptive markup need be done only
> once and will suffice for all future processing.
>
>What Goldfarb dismisses as procedural markup is often the *only*
>markup a scholarly editor can supply in good conscience because the editor
>does not know what the author's intentions were. Such things as "the
>skipping of vertical space, the setting of a tab stop, and the offset,
>or "hanging indent", style of formatting", etc. (p. 7) -- convert
>these elements if you will into the basic features of layout of early
>books -- are fundamental to many conservative scholarly editions.
>Goldfarb dismisses them as unimportant.
>
>Lou Burnard's and Michael Sperberg-McQueen's introduction to
>SGML in TEI P3 rightly stresses SGML tagging as interpretative.
>TEI adduces, for example, italics as typical procedural markup and
>emphasis as typical descriptive markup *for the same textual phenomenon*.
>
>Series such as the Malone Society editions -- any conservative or
>diplomatic editorial convention -- cannot use SGML as it is defined
>by its authors for this reason. Scholarly editors are justifiably
>reluctant under some circumstances to encode italics as anything but
>italics.

Prof. Lancashire is here falling prey, I think, to a common confusion of
two distinct sets of polar opposites -- a confusion encouraged by many
careless writers on markup. He opposes procedural markup and
descriptive markup, when it would be better (in my opinion) to
distinguish first procedural from declarative markup, and separately to
distinguish presentational from analytic markup. Procedural markup is
often presentational, and vice versa, while declarative and analytic
markup were developed, introduced, and championed by substantially the
same people, who (like Goldfarb) promptly conflated the two concepts,
leading to the confusion visible both in Prof. Lancashire's note and
less expansively in Dr. Goldfarb's book.

(To make matters worse, linguists also distinguish descriptive
grammar from prescriptive grammar, and this opposition, too, is
relevant to discussions of the TEI, though it has nothing to do
with 'descriptive' markup.)

But the two concepts are quite different, and the difference is
important for the proper understanding of what SGML does and does not do
-- and even more important for understanding why Prof. Lancashire is
wrong to suggest that the TEI Guidelines make it impossible to encode
italics as italics, or to encode diplomatic transcriptions of textual
material.

Procedural markup can be interpreted only as instructions to a program
or device of some kind to perform this or that action. Declarative
markup can be interpreted, by contrast, not as instructions to do
something, but simply as a claim that something or other is true of a
particular passage or location in a text. SGML markup is inherently
declarative, because SGML provides no imperative or procedural
interpretation for markup, and no means to allow an SGML application to
impose such an interpretation either. This is in marked contrast to
other markup systems (WordPerfect, TeX, Office Document Architecture),
which generally provide only procedural interpretations for their markup.
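
To make the contrast concrete (the particular strings below are merely
illustrative, not drawn from any one system):

   Procedural:   {\it emphatic}
                 (a TeX-style command: switch to an italic font now)
   Declarative:  <hi rend=italics>emphatic</hi>
                 (a claim: this phrase appears in italics)

The first tells a formatter what to do; the second states something
about the text, and says nothing about what any program should do with
that information.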

As with declarative programming languages, the key to making declarative
markup useful in a computer is to provide one or more procedural
interpretations which derive from the declarative interpretation and
coexist with it. Students of Prolog will be familiar with this
coexistence; SGML systems differ from Prolog in that the procedural
interpretation of Prolog code is fixed by the language definition, while
the procedural interpretation of SGML markup is not given by, and is not
even expressible in, SGML itself. Instead, the procedural
interpretation is given by style sheets or other processing guides; this
is the key to the reusability of SGML documents.
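
To illustrate: a construction rule in a style language (the line below
is only a rough sketch in the spirit of the still-draft DSSSL style
language, and the exact syntax is not guaranteed) might give the TEI
<emph> element the procedural reading "set this in italics":

   (element emph (make sequence font-posture: 'italic))

Change the rule, and the same declarative markup yields a different
presentation; the document itself need not be touched.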

The distinction between presentational markup and what I call analytic
markup is different. Presentational markup describes, or imposes, the
typographic presentation of the text, while analytic markup identifies
the features of the text which are signaled by the typographic
conventions. Italics, says the Chicago Manual of Style,
may be used to signal rhetorical emphasis, or the title of a book, or
the mention (not the use) of a word, e.g. in a linguistic discussion.
To identify italics as italics one may use presentational markup; to
identify italics as signaling the title of a book, or the mention of a
word, one may use analytic markup.
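
A single italicised phrase could thus be encoded with increasing
analytic commitment; the TEI element names below are offered only as
illustrations of the distinction:

   <hi rend=italics>Relativity</hi>       (presentational: it is in italics)
   <emph>Relativity</emph>                (analytic: rhetorical emphasis)
   <title>Relativity</title>              (analytic: the title of a work)
   <mentioned>relativity</mentioned>      (analytic: the word is mentioned, not used)

The first records what the page looks like; the others record what the
encoder takes the italics to signal.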

In most pre-SGML systems, presentational markup is provided with a
procedural interpretation. But that does not mean that declarative
markup cannot be used for presentational features. (Goldfarb, in the
passage quoted, uses the word 'should', and not the word 'must'. He's a
careful man, and it's wise to pay attention to his modals.)

It is entirely possible to define presentational markup in purely
declarative terms: not 'shift into italics' but 'this phrase is in
italics'. Not 'go to the top of the next page' but 'this section begins
at the top of a page'. This point should be well known to readers of
Humanist, because during the development of the TEI Guidelines (I think
it must have been 1988 or 1989) there was a vigorous debate on Humanist
over whether the TEI should commit itself publicly and permanently to
'descriptive' markup, in the course of which the concept of
'descriptive' markup was analysed and found to involve both declarative
(non-procedural) markup and analytic (non-presentational) markup. The
TEI is committed to declarative markup, but for the reasons cited by
Prof. Lancashire, among others, the TEI does not require analytic
markup, and provides mechanisms to record, in declarative form, the
kinds of typographic information described by Prof. Lancashire.
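
The page example above can be handled in the same spirit: a TEI <pb>
tag at the relevant point records only that a new page begins there
(the attribute value below is invented for illustration):

   ... end of one page.<pb n='74'>Beginning of the next ...

Nothing in the tag instructs any formatter to eject a page; a
typesetting program, a concordancer, or a collation tool may each do
with the information what it will.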

To record italics simply as italics, or boldface as boldface, without
analysis, all one need do in a TEI document is tag the italics as <hi
rend=italics>italics</hi> and the boldface as <hi rend=bold>bold</hi>.

The HI element, and the global REND attribute have the advantage of
allowing the encoder to distinguish typographic features to whatever
level of detail is desired: some scholars may wish only to distinguish
italics from boldface and roman, while others may need to identify the
specific type face in the source (Garamond vs. Palatino), while some
might conceivably need to record quite subtle distinctions between
different versions of the same face (which cutting? worn type or new?
etc.). The level of detail needed will vary with the scholarly
interests of the encoder; for that reason, as Prof. Lancashire has been
arguing in recent postings, it is essential to reserve as much freedom
to the encoder as possible.
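
For instance (the rend values below are conventions an encoder might
adopt and document for a particular project, not values prescribed by
the Guidelines):

   <hi rend=italics>...</hi>
   <hi rend='italics garamond'>...</hi>
   <hi rend='italics garamond worn-type'>...</hi>

Each project can define its own vocabulary of rend values at whatever
granularity its scholarly interests require.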

Since the TEI has no trouble at all recording the typographic
information mentioned by Prof. Lancashire, one may well ask what
objection he can have to the TEI or to other uses of SGML. (In San
Diego a year ago, Prof. Lancashire said that he was using SGML himself,
to record the physical structure of books -- has something happened to
persuade him that what he was doing was impossible?) I don't know the
answer.

Section 6.3.1 does recommend that analytic tagging be preferred to
presentational tagging where it is both economically feasible and
intellectually appropriate, on the grounds that for analytic purposes it
is more generally useful. Since the instances cited by Prof. Lancashire
are, by his own account, cases where analytic tagging would be
intellectually inappropriate, I would have expected him to have no
objection to this recommendation.

Some have argued (most memorably Paul Fortier) that analytic markup is
never appropriate, that the TEI ought to provide ONLY presentational
tags, and that to allow any other kind of markup was illegitimate.
Since subscribers to this logic cannot be objecting to the TEI's
providing the presentational tags they say they need, they can only be
objecting to the fact that presentational tags are not required of all
encoders. But as Prof. Lancashire has lately been arguing, the type of
markup you use reflects what you want to do. Since many scholars are
interested not in the history of bookmaking and typography, but in the
history of poetry, drama, rhetoric, the novel, etc., they may
legitimately prefer to mark, in their electronic texts, not typographic
but rhetorical and literary phenomena.

The TEI is the only markup scheme I know which goes to such lengths to
provide the necessary instrumentarium for scholars of such widely
divergent assumptions and interests.

In conclusion, just a couple of minor points.

The claim that Goldfarb dismisses presentational information as
unimportant surprises me; I don't remember anything in Goldfarb's work
which even remotely resembles such a claim, and without a more specific
citation I am reluctant to accept it.

>Discarding the requirement that SGML means descriptive markup
>(a basic principle asserted in its definitive manual) would be a good
>thing for scholarship. HTML, luckily for us, to a large degree does
>so.

If descriptive markup is inherent in SGML, then it must also inhere in
HTML, which like the TEI is an SGML application.

>In my opinion, SGML was designed for authors of texts, people with
>absolute authority over its interpretation.

This is historic fact: SGML was designed for and by publishing groups.
But the encoding of historical documents, and the provision of
non-procedural information like linguistic analysis was explicitly
foreseen by some of the developers, who pushed hard for a clean
declarative language in order to make SGML usable for work other than
publishing by living authors. It is a tribute to the quality of their
work that SGML is in fact not merely usable for scholarly work, but
towers above all other possibilities.

SGML is superior to other available markup systems in large part because
it makes it possible to express one's understanding of the text more
subtly and with less distortion than other systems. This has given it
the reputation of being more 'interpretive' and less 'objective' than
other markup. The key difference, however, is not that SGML is
interpretive while other markup is not, but that SGML allows the encoder
to be more honest and specific about the interpretation of the text than
other markup systems do.

We should all be wary of the implicit interpretive pressures exerted by
our markup languages. But being wary only when using SGML, and assuming
that other markup systems are not equally interpretive, is like assuming
that, once you have gone outside the Toronto city limits, it's safe to
light a cigarette even inside an oxygen tent.

-C. M. Sperberg-McQueen
ACH / ACL / ALLC Text Encoding Initiative
University of Illinois at Chicago
u35395@uicvm.uic.edu / u35395@uicvm

"Clarity, Precision and Ease of use does not mean Confinement, Verbosity
and Futility." -Jean Pierre Gaspart

--[2]------------------------------------------------------------------
Date: Sat, 02 Dec 1995 18:45:21 +0000
From: Lou Burnard <lou@vax.ox.ac.uk>
Subject: Re : encoding and interpretation

Yes, SGML is interpretative -- but only in exactly the same way as any
other intellectual effort is. One Malone Society Editor looks at a
smudged, badly printed page and says "Yes, that is a turned-e, and the
rest of the word is in italics". Another MSE looks at the same page and
says "No, it's an a and it only looks like italics because it's so
badly inked". I call that interpretation. As Professor Lancashire likes
to quote authorities, here is my authority:

"The prima facie evidence is limited to ink on paper, from
which can be inferred an arrangement of typefaces and types
during printing. The relationship of printed image to typeface
and to type is concrete, and our inferences about composition
are necessarily tied to an actual historic event"

[Randall McLeod, "Spellbound", in Playtexts in Old Spelling, AMS Press, 1984]

Note McLeod's use of the word *infer*. We see the ink on the page. We
know that at some historic moment a particular constellation of type
metal and ink caused it. We INFER something about that constellation,
and what underlay it.

SGML encoding, whether of the TEI flavour or of any other, allows you
to make explicit your inferences. It's in the nature of things that
those inferences should cover a very wide range -- from "compositor
intended to use an italic typeface" at the one extreme to "author was
thinking of his mother at this point" at the other. For diplomatic
editions, inferences at the former end of the scale are likely to be
preferred to those at the other. That does not make them any the less
interpretative. (And indeed, I have read persuasive studies in which
interpretations at either end of this continuum cross-fertilize and
reinforce each other.)

It is also mistaken to suggest that the TEI scheme does not allow you to
encode interpretative acts related to the physical appearance of a text.
On the contrary, if you actually read the section on highlighting text,
you will find the point made explicitly enough --

"...cases do exist in which it is not economically feasible to mark
the underlying function of highlighting... as well as cases in
which it is not intellectually appropriate (as in the transcription
of some older materials, or in the preparation of material for the
study of typographic practice)" [TEI P3, p. 145]

The present TEI scheme consequently includes a number of simple low
level facilities for encoding such things as font changes, pagination
etc. (Some of them were indeed included as a result of rather more
positive input from Prof. Lancashire in his role as a member of the TEI
Advisory Board.) It also includes a more detailed tag set for the
encoding of inferences relating to the physical appearance of
manuscript sources, concerning which my former colleague Peter Robinson
has written at some length in his excellent work "The SGML Encoding of
Primary Sources" (OHC, 1994). Alas, the funding for TEI workgroups
expired before one on detailed analytic bibliography could proceed
beyond the planning stage. The formation of such a group now requires
only a little willpower and some intellectual effort; moreover, it would
be able to build on the accumulated experience of scores of TEI users
worldwide, all of whom have apparently been able to adapt the scheme to
their own needs without any of the ideological problems which appear to
bedevil Professor Lancashire.

Lou Burnard

--[3]------------------------------------------------------------------
Date: Sun, 3 Dec 1995 09:27:43 -0500 (EST)
From: Patrick Durusau <pdurusau@emoryu1.cc.emory.edu>
Subject: Re: 9.343 encoding

In his response to my post distinguishing SGML and the encoding standards
based upon SGML, Lancashire raises several issues concerning the use of
"descriptive markup" as defined by SGML. I agree that the original physical
layout of a textual witness is important to any scholarly edition of that
witness. I think I have shown by example below that such information is
not lost by the use of SGML-based encoding standards. I treat the more
general objections to the use of such encoding standards first and then
turn to the examples of encoding difficulties offered by Lancashire.

Lancashire writes:

<omissions>

> SGML by its very nature demands that an editor interpret a text.
> SGML does not impose one specific interpretation; it demands that
> an interpretation *be made*. Yet scholarly editors often
> cannot encode a text in the way SGML requires. All they can do is
> to reproduce what they see on the page.

I assume here that Lancashire is arguing that present SGML-based encoding
standards are inadequate to encode the features of original materials
that are extremely important to scholarly editions of those texts. I
have noted below possible encoding mechanisms for the specific examples
cited in his response. A number of encoding issues remain to be
addressed for textual witnesses, but I have yet to see one cited that
could not be addressed within the framework of the TEI Guidelines. (I do
not read his statement as contrasting SGML, an extensible metalanguage
that demands "interpretation", with scholarly editors whose current
methods are somehow not interpretive. I prefer to avoid erecting straw
men on this important topic.)

> What Goldfarb dismisses as procedural markup is often the *only*
> markup a scholarly editor can supply in good conscience because the editor
> does not know what the author's intentions were. Such things as "the
> skipping of vertical space, the setting of a tab stop, and the offset,
> or "hanging indent", style of formatting", etc. (p. 7) -- convert
> these elements if you will into the basic features of layout of early
> books -- are fundamental to many conservative scholarly editions.
> Goldfarb dismisses them as unimportant.

Procedural vs. Descriptive Markup

The procedural markup "dismissed" by Goldfarb is that which instructs a
particular software program on the layout of textual material for display
or printing. Encoding the appearance of written material is not
"procedural markup" in the sense used by Goldfarb. That interpretation of
the SGML standard was followed by the TEI Guidelines in their tags for
the encoding of primary sources (see Guidelines for Electronic Text
Encoding and Interchange, vol. 1, pp. 529-557).

The TEI Guidelines address the issue of significant space in an original
text in section 18.2.5. The examples which Lancashire cites from Goldfarb
(skipping vertical space, tab stops, offset or hanging indents) can all
be encoded with the <space> tag. That tag allows the encoder to indicate
the location of the space, whether the space is horizontal or vertical,
the extent of the space in an appropriate unit, and the person who
identified and measured the space.
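
A sketch of what such an encoding might look like follows; the
attribute names echo the description above, and the exact forms should
be checked against section 18.2.5:

   <space dim=vertical extent='2 lines' resp='PD'>
   <space dim=horizontal extent='5 ems' resp='PD'>

Here dim distinguishes vertical from horizontal space, extent records
the measurement in an appropriate unit, and resp identifies the person
who identified and measured the space.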

> Series such as the Malone Society editions -- any conservative or
> diplomatic editorial convention -- cannot use SGML as it is defined
> by its authors for this reason. Scholarly editors are justifiably
> reluctant under some circumstances to encode italics as anything but
> italics.

The TEI Guidelines provide a specific mechanism for the encoding of
italics: the highlighting element <hi>. When the highlighting tag is
used, the encoder can use the rend attribute to specify the way in which
the text was rendered so as to distinguish it from the surrounding text.
For example, if I were quoting the following passage
from Lancashire's response and the title SGML Handbook appeared in
italics in his post, present encoding practices under the TEI Guidelines
would allow the following encoding:

<q>Look at Charles Goldfarb's <hi rend=italics>SGML Handbook</hi> (1990),
pp. 7-8</q>

(In the interest of clarity, a number of TEI tags that would normally be
used in this instance have been omitted.)

As I noted above, work remains to be done on the development of encoding
guidelines for textual witnesses. Working groups were formed at the
Hebrew Lexicography Consultation at the AAR/SBL Annual Meeting in
Philadelphia, November, 1995, to address issues arising from the encoding
of textual witnesses of interest to biblical scholars. Announcements for
those groups are due to appear on the AIBI-L discussion list within the
next few weeks. (The AIBI-L discussion list can be found at
listserv@uottawa.bitnet or listserv@acadvm1.uottawa.ca, Gregory
Bloomquist, owner.) I am sure that all interested persons would be
welcome additions to these working groups. For those scholars interested
in textual witnesses for other areas or more general encoding issues, I
would suggest joining the discussions found on the TEI-L discussion list.
(The TEI-L discussion list can be found at listserv@uicvm.uic.edu)

I have purposely avoided discussion of the use of style sheets, the
proposed DSSSL (Document Style Semantics and Specification Language), and
the HyTime standard, as being beyond the general interest of the members
of this list. However, scholars who seek to encode primary materials should
be aware that much progress has been made in the use of SGML since
Goldfarb wrote the "definitive manual." SGML-based encoding should not be
rejected on the basis of outdated or incomplete information. If you find
a textual feature that resists encoding under one of the SGML-based
encoding standards, I am sure that other scholars would be interested to
learn of your discovery.

Patrick Durusau