3.393 encoding a la TEI (98)

Willard McCarty (MCCARTY@VM.EPAS.UTORONTO.CA)
Thu, 24 Aug 89 18:39:35 EDT

Next message: Willard McCarty: "3.394 hieroglyphics grammar? Dante Project? (45)"
Previous message: Willard McCarty: "3.392 on Bitnet lists (149)"

Humanist Discussion Group, Vol. 3, No. 393. Thursday, 24 Aug 1989.

Date: 24 August 1989 09:17:53 CDT
From: "Michael Sperberg-McQueen 312 996-2477 -2981" <U35395@UICVM>
Subject: The Arcane TEI - a reassurance, I hope, for Bob Kraft

For a week or so, I've owed Bob Kraft an answer to his query about how
the TEI was coming along on recommendations for encoding textual
variants--and now today comes his note confessing to a
not-so-secret worry lest papyrology, epigraphy, and similar (dare I say
'arcane'?) disciplines not receive adequate attention in the TEI. Now I
really must answer.

Well, the proof of the pudding will be in the eating, and I won't ask
Bob to take it on faith that the TEI guidelines will be perfect on that,
or on any other score. But this I can say: papyrology and so on are
not being forgotten, and we will do our best to ensure that at the end
of the day the TEI guidelines provide a sound, reliable, portable basis
for "full, flexible, and compatible encoding" of papyrological,
epigraphic, codicological, and paleographic information from coffee
stains to archaic and archaizing spellings. How much of this will be
prescribed in advance and how much will fall into the class of user-definable
extensions can only be discovered in the course of work. But there are
too many philologists involved with the project to allow such critical
and challenging topics to fall by the wayside, and I think I can promise
that they won't.

Bob is right, too, about the need to develop a sufficiently
comprehensive set of encoding guidelines that most individual
applications fall out of the general set as subsets and special cases.
That is precisely the approach we are taking, and that is why our topics
include such a range of problems from those usually addressed by
publishers and other information-industry types, through those of
computational linguists, to those of traditional philologists,
not-so-computational linguists, literary scholars, philosophers, and
historians of varied stripes. The industrial concerns and the purely
scholarly concerns have much broader overlap than is commonly realized,
and no scheme addressing only one set of concerns will be as useful (or,
I think, as soundly based) as one addressing the full range as far as
possible. (Neither industrial organizations nor academic organizations
seem to realize this; both seem distinctly uneasy at being yoked with
the other, even metaphorically, in this project.)

The logic of seeking out the most general and comprehensive cases
first is also behind the planning of the project. The committee on
text representation is focusing first of all on the most general
problems common to a wide variety of texts (because they must set
the basic framework for a generally applicable comprehensive encoding
scheme). This means that some problems, like those recently discussed
on Humanist, are not in the mainstream of work during the first cycle,
but it also ensures that when they are taken up they will fit into
a sound general framework.

Equally important for the difficult tasks of encoding non-textual
accidentals (like coffee stains) is the work of the committee on
analysis and interpretation. During the first phase of the project
(through June, 1990) this committee is focusing on linguistic issues,
not because we are interested primarily or only in linguistics, but
because as far as we can see linguistic encoding presents the most
tightly constrained sets of interrelated demands on the expressive
power of the encoding scheme.

Linguistics, that is, seems to present examples of every kind of
technical difficulty for an encoding formalism: multiple layers of
analysis, each constrained by theory, which must interact cleanly in
ways also constrained by theory. Sometimes the linguistic units nest
and sometimes they don't; sometimes they are contiguous and sometimes
they aren't. When we have encoding conventions adequate to these
problems, I think the conventions needed to encode even very complex
papyrological and codicological data will be far easier to create, since
the techniques needed (co-indexing of non-contiguous segments, for
example) will be well developed already. That doesn't mean there won't
be problems. But the linguistic problems will create a firm basis for
developing papyrological encodings.

And, as advertised, the Text Encoding Initiative is interested in
cooperating with projects involved in encoding corpora, to allow the
draft guidelines to be tested before publication, and to receive
feedback from people working in the field. Projects interested in
such cooperation should contact the head of the steering committee
(Nancy Ide, IDE@VASSAR) or one of the editors (C. M. Sperberg-McQueen,
U35395@UICVM, or Lou Burnard, LOU@VAX.OX.AC.UK; Janet readers will,
I hope, know how to un-reverse that node-address).

More reports on progress will follow after things calm down here.

-Michael Sperberg-McQueen
ACH / ACL / ALLC Text Encoding Initiative
University of Illinois at Chicago
