9.343 encoding

Humanist (mccarty@phoenix.Princeton.EDU)
Sat, 2 Dec 1995 00:04:59 -0500 (EST)

Humanist Discussion Group, Vol. 9, No. 343.
Center for Electronic Texts in the Humanities (Princeton/Rutgers)
http://www.princeton.edu/~mccarty/humanist/

[1] From: Ian Lancashire <ian@epas.utoronto.ca> (49)
Subject: SGML as interpretation

Patrick Durusau says that I "confuse the use of SGML with the
adoption of specific editorial principles" and points out that SGML is
not "synonymous with various specific guidelines" (like TEI) but is a
method for the documentation of editorial guidelines.

I agree that SGML concerns encoding syntax and TEI proposes a specific
tagset that interprets textual phenomena. However, I am not confused
in the way he thinks: SGML syntax is itself interpretative.

Durusau goes on to say, "I am not quite sure how anyone could reach the
conclusion that SGML `imposes interpretations on texts.' A particular
encoding of a text could certainly impose an interpretation, but that
is the responsibility of the editor and not the method used for the encoding."

I'm unrepentant. I'm uncertain how anyone could come to Durusau's view
after having read any SGML manual.

SGML by its very nature demands that an editor interpret a text.
SGML does not impose one specific interpretation; it demands that
an interpretation *be made*. Yet scholarly editors often
cannot encode a text in the way SGML requires. All they can do is
to reproduce what they see on the page.

Look at Charles Goldfarb's SGML Handbook (1990), pp. 7-8.
It distinguishes generalized markup like SGML from procedural
markup in this way:

[SGML] Markup should describe a document's structure
and other attributes rather than specify processing
to be performed on it, as descriptive markup need be done only
once and will suffice for all future processing.

What Goldfarb dismisses as procedural markup is often the *only*
markup a scholarly editor can supply in good conscience, because the editor
does not know what the author's intentions were. Such things as "the
skipping of vertical space, the setting of a tab stop, and the offset,
or `hanging indent', style of formatting" (p. 7) -- translate
these, if you will, into the basic layout features of early
books -- are fundamental to many conservative scholarly editions.
Goldfarb dismisses them as unimportant.

Lou Burnard's and Michael Sperberg-McQueen's introduction to
SGML in TEI P3 rightly stresses SGML tagging as interpretative.
TEI adduces, for example, italics as typical procedural markup and
emphasis as typical descriptive markup *for the same textual phenomenon*.
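
To make that contrast concrete, here is a minimal sketch in Python. The
sample sentence, the exact tag spellings and the helper names are my own
assumptions for illustration; they are not quoted from TEI P3 or from any
edition. It shows one italicised word encoded two ways: once recording
only what is on the page, once asserting why it is there.

```python
import re

# Hypothetical encodings of the same printed sentence, in which one
# word is set in italics. The sentence and tags are illustrative only.
procedural = 'He was <hi rend="italic">very</hi> angry.'  # what the page shows
descriptive = 'He was <emph>very</emph> angry.'           # what the editor infers


def strip_tags(encoded):
    """Remove all markup, leaving only the transcribed characters."""
    return re.sub(r"<[^>]+>", "", encoded)


def tag_names(encoded):
    """Return the names of the opening tags, in order of appearance."""
    return re.findall(r"<([A-Za-z][\w.-]*)", encoded)


if __name__ == "__main__":
    # The transcribed text is identical; only the editor's claim differs.
    print(strip_tags(procedural) == strip_tags(descriptive))
    print(tag_names(procedural), tag_names(descriptive))
```

The two strings carry the same characters once the tags are removed. What
differs is the claim the editor signs: the first says only that the type is
italic, the second that the italics mean emphasis.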

For this reason, series such as the Malone Society editions -- indeed
any edition that follows conservative or diplomatic editorial
conventions -- cannot use SGML as its authors define it. Scholarly
editors are justifiably reluctant under some circumstances to encode
italics as anything but italics.

Discarding the requirement that SGML markup be descriptive
(a basic principle asserted in its definitive manual) would be a good
thing for scholarship. HTML, luckily for us, largely does so.

In my opinion, SGML was designed for authors of texts, people with
absolute authority over their interpretation. For that reason,
SGML can certainly be used by presses to encode any work published
with the help of its author.

Ian Lancashire