9.395 encoding & TEI

Humanist (mccarty@phoenix.Princeton.EDU)
Mon, 18 Dec 1995 22:04:33 -0500 (EST)

Humanist Discussion Group, Vol. 9, No. 395.
Center for Electronic Texts in the Humanities (Princeton/Rutgers)
http://www.princeton.edu/~mccarty/humanist/

[1] From: Ian Lancashire <ian@epas.utoronto.ca> (66)
Subject: Re: Ian Lancashire's address (fwd)

[To assure my colleagues, especially Richard Giordano, that messages to
me at ian@epas.utoronto.ca (note *ca*) still work fine. Frustration
-- as Willard might have guessed -- comes from trying to reach me
at ian@epas.utoronto.edu (note *edu*). Canada is still independent.]

Richard Giordano's proposal sounds fine to me. I really should try
to respond to some of the feedback since my last message, shouldn't I,
but I haven't *yet* read Lou's last contribution.

Still, a few more thoughts.

If any user's SGML tag-set and DTD are interpretative, then isn't
the TEI tagset also interpretative? Does it define -- as widely
claimed -- "a standard form for the interchange of textual material"
(TEI P3, p. 11)? It does define a form. Is it a standard one? Is it
likely that any SGML tagset represents a standard interpretation if our
collective experience is that nobody's interpretation ever becomes
standard? In fact, as my recent golf magazine says in defence of its
own policy (to publish contradictory advice by wildly disagreeing
teachers on how to swing a golf club), the American way of life is
built on diversity.

The Vassar meeting in 1987 drew up principles of action for TEI, only
some of which were met. For example, TEI says it did not carry out 3.b-c:

b. define a metalanguage for the description of text-encoding
schemes,
c. describe the new format and representative existing schemes
both in that metalanguage and in prose.

Are these things really important?

To me they are. Instead of first discussing what a humanities encoding
scheme should provide -- for example, by looking at 20 years of
real encoding practice (by groups like TLG, TLF, CETEDOC, the OCP community,
etc., and I include WordCruncher and TACT users in the "etc.") -- TEI
seized on SGML as the final solution. TEI P3 does not discuss
the previous history of textual encoding either in the humanities or
in computational linguistics. Why not? TEI P3 does not discuss why
SGML was chosen as the format. Was there serious discussion of why
humanities encoding schemes really developed the way they had over 20
years and -- especially -- why they hadn't evolved into something like SGML?

There is little doubt that SGML syntax offers more encoding options
than any previous encoding scheme. Yet, are the assumptions built into
SGML suitable for scholarly editing or even for text as literary
critics understand it?

TEI P3 and the SGML community -- focusing (understandably) on the
public's need to get a cooperative electronic library online -- took
SGML, presumably because it made work easier. After all, if someone
else had devised an encoding interchange format for documents and got it
approved by the ISO and the American Publishers' Association, among
many other very credible organizations, can TEI really have gone wrong?

What troubles me most about SGML the syntax is that it is
interpretative itself. Its rigid assumption that only one structure
can be recognized (by SGML browsers, editors, etc.) at a time, and
that no more than two structures can be encoded in any document, jars
with what the humanities see in texts. Most texts we study were plainly
not written by authors who understood SGML or even who agreed with the
SGML community that texts could have any dominant structure, let alone
a hierarchical one.
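
To make the overlap problem concrete, here is a minimal sketch in Python.
It is an illustration only, not anything taken from TEI P3 or from an
SGML tool. The character offsets and the names `overlaps` and
`fits_one_hierarchy` are assumptions made up for the example. It treats
each structural unit as a span of text and asks whether a set of spans
can be arranged as one strict hierarchy, that is, whether every pair is
either nested or disjoint.

```python
from itertools import combinations


def overlaps(a, b):
    """True if half-open spans a and b partially overlap,
    i.e. they are neither nested nor disjoint."""
    (s1, e1), (s2, e2) = a, b
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


def fits_one_hierarchy(spans):
    """True if every pair of spans is nested or disjoint, so the
    spans could all be elements of a single tree."""
    return not any(overlaps(a, b) for a, b in combinations(spans, 2))


# Hypothetical character offsets for a two-line verse passage in which
# the first sentence runs past the end of the first line.
verse_lines = [(0, 40), (40, 80)]
sentences = [(0, 55), (55, 80)]

print(fits_one_hierarchy(verse_lines))              # metrical view alone
print(fits_one_hierarchy(sentences))                # syntactic view alone
print(fits_one_hierarchy(verse_lines + sentences))  # both views together
```

Each view is a clean hierarchy by itself; the two together are not,
because the second verse line and the first sentence cross. A single
tree has to privilege one view and demote the other.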

If our job is to analyze these texts, doesn't this thought make anyone
besides myself uneasy?

I want to thank Robin for directing interested people to my
contribution to the Calgary conference. Yes, I still think the TEI
Guidelines are brilliant and the best thing written yet on encoding;
but I suspect that they, like many brilliant people, overlook
the obvious.

-- 
Ian Lancashire
Professor of English, New College
Director, Centre for Computing in the Humanities
Univ. of Toronto, Toronto, Ont. M5S 1A1, CANADA
Voice: (416) 978-8279; FAX: (416) 978-6519
E-mail: ian @ epas.utoronto.ca