From:	CBS%UK.AC.RUTHERFORD.MAIL::CA.UTORONTO.UTCS.VM::POSTMSTR 14-JAN-1989 09:53:36.32
To:	archive
CC:	
Subj:	

Via: UK.AC.RUTHERFORD.MAIL; Sat, 14 Jan 89   9:50 GMT
Received: from UKACRL by UK.AC.RL.IB (Mailer X1.25) with BSMTP id 9331; Sat, 14
          Jan 89 09:50:04 GM
Received: from vm.utcs.utoronto.ca by UKACRL.BITNET (Mailer X1.25) with BSMTP
          id 1923; Sat, 14 Jan 89 09:49:54 G
Received: by UTORONTO (Mailer X1.25) id 0407; Fri, 13 Jan 89 14:46:36 EST
Date:     Fri, 13 Jan 89 14:46:07 EST
From:     "Steve Younker (Postmaster)" <POSTMSTR@CA.UTORONTO.UTCS.VM>
To:       archive@UK.AC.OXFORD.VAX

=========================================================================
Date:         1 December 1987, 00:21:22 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: Sterling Bjorndahl - Claremont Grad. School
             <BJORNDAS@CLARGRAD>
Subject:  range of discussion on HUMANIST
 
     I appreciate the concern about HUMANIST's self-editorial
policy.  There is a spirit missing from HUMANIST's discussions that
has been present in other discussions I have been a member of.  On the
other hand, I can remain a member of HUMANIST in good conscience
because it does not take much time away from my other duties, which
are considered by others here to be of a higher priority.
     Two specific examples:  I was a member of the info-c discussion
on ARPA (linked with the sister discussion group on usenet).  This was
a free-flowing discussion with frequent cries from subscribers asking
submitters to control themselves.  There was a great deal of redun-
dancy and even inanity mixed with a few nuggets of valid and even
brilliant discussion.  Although I enjoyed it immensely on the whole, I
had to quit because I couldn't afford that many hours of extra reading
per week.  After a while, the returns just weren't great enough to put
up with the noise.
     On the other end of the editorial spectrum was the arpanet RISKS
digest.  A "digest" means that the moderator is also an editor.  All
submissions are sent to him, and he exercises editorial judgement on
everything submitted.  Once or twice a week, as volume dictates, the
collected and edited submissions are mailed in one package to all sub-
scribers, with a refreshing dash of humour added.  The kind of give-
and-take conversations that have been referred to can still happen in
this environment, because the moderator is essentially benign unless
serious redundancies and/or inanities occur.  (I believe that the
moderator was getting full credit for his work in this, and it was
probably a part of his job description.)  Nevertheless, even so edited, the
volume became more than I could deal with effectively (despite the
fascinating subject matter, by the way: risks to the public from com-
puters and automated systems).
     So although I find HUMANIST occasionally on the "dead" side, I
have no trouble maintaining my subscription since it does not demand
too much of my time. Discussions happen in private, and if I want to
get in on them I can contact the initiator. I admit, I wouldn't mind
seeing a bit more activity in HUMANIST on occasion, and I think people
with issues of broad interest (such as the recent discussion on the
OED) should feel free to bring these issues forward.  But if HUMANIST
has to err, I would rather it err on the dead side, lest I be forced
to resign.  Let my vote be so registered.
 
     Sterling Bjorndahl
     Institute for Antiquity and Christianity
     Claremont Graduate School
     Claremont, California
=========================================================================
Date:         1 December 1987, 09:25:37 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: Jim Cerny <J_CERNY@UNHH>
Subject:  The Dirty Dozen ... Plus???
 
This is just to add to the warning passed on by Stuart
Hunter about Lehigh's direct experience with a virus in
some publicly obtained copies of COMMAND.COM.
 
There are apparently a number of other programs that have
had work done to their genes to turn them into malignant
viruses.  They have come to be called "The Dirty Dozen,"
though there are more than a dozen.
 
These have been described in a number of computer center
newsletters in the last year or so.  The most recent description
I've seen was "Beware The Dirty Dozen: Software That Destroys,"
CAUSE/EFFECT, v. 10, n. 6, November 1987, pp. 44-45 (which
is reprinted from the "Technical Update" publication at the
Univ. Cincinnati Computing Center, September 1, 1987).
 
        Jim Cerny
        University Computing, Univ. N.H.
        J_CERNY@UNHH
=========================================================================
Date:         1 December 1987, 09:34:18 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: Dr Abigail Ann Young  <young@utorepas>
Subject: Discussions [34 ll, counting this one]
 
Well, pace Sterling Bjorndahl, I don't find HUMANIST
on the dead side and I don't want to!  But I think I know
exactly what he's talking about.  Recently it seems that
queries or opinions appear and then die in electronic silence.
In fact, there seems in general to be less discussion now
than there was a few months ago.  I don't know to what to
attribute this.  It could  reflect a need on the part of those
of us who teach or provide services to students to prepare for
and then deal with the demands of a new academic session.
It could be that no-one has very much to say at the moment.
But I have wondered recently whether we were all feeling a
reluctance to say much brought on by our worthy moderator's
urgings towards self-editing (with the consequent responsibility
of editing and posting a resulting conversation, if any) and
our new awareness of the cost factor for the Antipodes at least.
I certainly find the current "full" discussion on the details
about the electronic OED interesting and a nice change, even
though I had already found out a lot of it at the Waterloo
conference, and I wish that I'd kept my query about the Rutgers
database general now too.  So I am glad that Willard has passed
on what others have had to say, and I think perhaps we should try
out for a bit making all discussion general.  We could make
use of a subject line to indicate the topic of a posting, and
whether it were part of an on-going discussion, thus enabling
those who need to clear their readers quickly to ignore
discussions which were not of interest to them.
 
Abigail Young
Research Associate,
Records of Early English Drama
University of Toronto
young at utorepas
=========================================================================
Date:         1 December 1987, 11:14:50 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: William J. McCarthy <MCCARTHY@CUA>
Subject:  discussions (15 lines)
 
I would like to express my approval of the contents of
the recent message about discussions on HUMANIST.
Although I have no interest in scanning the
turgid "flames" of the digitally deranged, it seems much
more than unlikely that HUMANISTs will inundate one another
with drivel; and, I am content to attempt to follow the
threads of the discussions on my own. Certainly it >is< easy
enough to dispatch into oblivion (I have set up a macro for
just that purpose) any piece of mail in which one has no
interest.
 
As it now stands, HUMANIST seems a touch too formal.
=========================================================================
Date:         1 December 1987, 14:24:09 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: "Michael Sperberg-McQueen"   <U18189@UICVM>
Subject:  CD ROMs, micro- and mainframe computing with large corpora
 
A late contribution to the discussion provoked by Abigail Young
about CDs as a medium of data distribution.  [60 lines or so.]
 
I think Dr. Young hit the nail on the head with the question "Are
there people out there waiting with bated breath for the new OED
on CD ROM?"  Because certainly if we're not excited about the OED
as a group, then we're not as a group going to be very excited about
anything.
 
Yes, I AM waiting with bated breath for an electronic OED, but I was
far more excited to learn it would be available on tape than I was
to hear about the CD ROM version.  I like and use my PC, and I hope
someday to be able to work with massive textual corpora on it, but
at least for the moment I think magnetic tape is a far better medium
for distribution.  For one thing, I don't have a CD ROM drive, and
I don't know anyone who does, except for Bob Kraft and a classicist
here who has an Ibycus micro on loan but does her Greek word processing
on our mainframe.  Tape drives, on the other hand, will be available
at any school in the country.  For another, tape drives allow me to
change the data -- add to it, enhance it, reduce its size -- and
make another copy.  CD ROM doesn't.  For that reason alone, I'll
wait for WORM before buying a new drive for my PC.  And finally,
mainframes seem to me by and large better at dealing with large
quantities of data.  That is changing, to be sure.  But I can edit
the Nibelungenlied in storage on the mainframe, and extract every
occurrence of the name 'Sivrit' in a couple of seconds.  My PC
with its 640 Kbytes can only hold a fourth or so of the Nibelungenlied
in RAM at a time.  To be sure, a micro-Ibycus could also find all
the occurrences of 'Sivrit' in a few seconds -- if the Nibelungenlied
were on a CD ROM.  But it's not, and there aren't enough Germanic
philologists in the country to make it economically feasible to
make one.
 
Nor do I WANT a frozen, unalterable text of the Nibelungenlied.  I
want to be able to index it, to add parsing information or scansions
to the file so I can search on them, and so on.  Not to mention the
need to correct typos in the transcription and add manuscript
variants.  For all this, we need erasable media, not CD ROMs.
 
Magnetic tapes do have the drawback, for some users, that they are
typically readable only on mainframes.  (There are PC-based 9-track
tape drives, but they aren't real common.)  And many humanists don't
like working on mainframes.  Even for those users, however, the
local academic computing center should be, and almost always is,
in a position to read the tape and help the user download the data
to a microcomputer.  No, it's not always easy.  And no, it's not
always fast.  A megabyte an hour or so.  But the chances are good
the academic computer center knows how to do it, and does it regularly.
All the ones I've ever known as a user or staff member do.
 
There may be centers that do NOT provide this kind of service,
although I have never seen one and never heard of one.  But if
they exist, those centers should be DRIVEN to provide support
for humanities computing, support for microcomputing, and support
for data exchange between mainframes and micros.  If they are not
providing these services, they are not doing their job.
 
Given the kind of support computer centers ought to be providing
for humanist users, and given the kind of flexible text humanistic
work seems to need, I think CD ROMs look much less promising as
a means of data distribution than WORM disks and magnetic tape,
and in some cases floppy disks.
 
All of which is just one user's opinion.
 
-Michael Sperberg-McQueen
 University of Illinois at Chicago (U18189 at UICVM)
=========================================================================
Date:         1 December 1987, 14:26:50 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: Marshall Gilliland <GILLILAND@SASK>
Subject:  Subject line comments (25 lines)
 
Oh, my, the HUMANIST subject lines may get long.  Now, in addition to
the honest-to-goodness subject, and to the number of lines in the
message, Abigail Young suggests
 
"we could make use of a subject line to indicate the topic of a posting, and
whether it were part of an on-going discussion, thus enabling those who need to
clear their readers quickly to ignore discussions which were not of interest to
them."
 
Maybe we serious, dull writers can use such an augmented subject line as a
place to pun?  But woe is me for, alas, my mailer does not accept long subject
lines.  Can it be that some people will have to read the beginning of the
message to learn what we want to ignore?  Will we be like this lady:
 
        Lizzie Borden took an axe
        And plunged it deep into the VAX
        Don't you envy people who
        Do all the things you want to do?
 
(Thanks to Jerry Whitnell in California for the ditty.)
 
Maybe we'll relax a bit as our marking gets frantic and we hear the carols of
the season.
 
Marshall Gilliland       U of Saskatchewan
=========================================================================
Date:         1 December 1987, 15:55:42 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: "James H. Coombs" <JAZBO@BROWNVM>
Subject:      Concordance for Mac
 
Does anyone know of concordance programs for the Mac?  Thanks.  --Jim
=========================================================================
Date:         1 December 1987, 15:58:31 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: "Michael Sperberg-McQueen"   <U18189@UICVM>
Subject:  Text encoding guidelines -- progress report (225 lines)
 
A followup on the current status of the ACH effort to formulate
guidelines for text encoding practices.
 
   ******************************************************************
   * NOTE: The following encoding conventions have been used to     *
   *       represent French accents throughout this message:        *
   *                                                                *
   *   To Represent Accents  --  Pour la representation des accents *
   *    /       acute accent - accent aigu                          *
   *    `       grave accent - accent grave                         *
   *                                                                *
   * The accent codes are typed    Les codes pour les accents se    *
   * AFTER the letter, and are     trouvent APRES la lettre qu'ils  *
   * used with both upper and      modifient, et s'utilisent avec   *
   * lower case letters.           les majuscules aussi bien que    *
   *                               les minuscules.                  *
   ******************************************************************
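
The accent convention described in the note above can be undone
mechanically.  What follows is only a minimal sketch, not part of the
original message: the function name decode_accents is hypothetical,
and the rule of treating "/" and "`" as accent codes only when they
directly follow a vowel is an assumption, made so that ordinary
slashes such as "M/C" are left alone.  A slash after a vowel that is
not meant as an accent would still be misread, so the sketch should
be applied only to text known to use the convention.

```python
import re
import unicodedata

# Accent codes from the note: "/" = acute, "`" = grave, typed AFTER the letter.
_COMBINING = {"/": "\u0301", "`": "\u0300"}

# Assumption: only vowels (upper or lower case) carry these accents.
_CODED_LETTER = re.compile(r"([AEIOUaeiou])([/`])")


def decode_accents(text):
    """Replace letter+code pairs with the precomposed accented letter."""
    def compose(match):
        letter, code = match.group(1), match.group(2)
        # NFC turns letter + combining mark into a single character.
        return unicodedata.normalize("NFC", letter + _COMBINING[code])
    return _CODED_LETTER.sub(compose, text)


if __name__ == "__main__":
    print(decode_accents("Re/daction des directives pour le codage des textes"))
```

The same table could be inverted to re-encode accented text for
mailers that pass only plain ASCII.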
 
 
On November 12 and 13, 1987, 31 representatives of professional
societies, universities, and text archives met to consider the
possibility of developing a set of guidelines for the encoding of texts
for literary, linguistic, and historical research. The meeting was
called by the Association for Computers and the Humanities and funded
by the National Endowment for the Humanities.  The list of participants
is appended to this document.
 
The participants heartily endorsed the idea of developing encoding
guidelines. In order to guide such development, they agreed on
the following principles:
 
 
       The Preparation of                 Re/daction des directives
     Text Encoding Guidelines             pour le codage des textes
 
                         Poughkeepsie, New York
                            13 November 1987
 
1.  The guidelines are intended   1.  Le but des directives est de cre/er
    to provide a standard format      un format standard pour l'e/change
    for data interchange in           des donne/es utilise/es pour la
    humanities research.              recherche dans les humanite/s.
 
2.  The guidelines are also       2.  Les directives sugge/reront
    intended to suggest principles    e/galement des principes pour
    for the encoding of texts         l'enregistrement des textes
    in the same format.               destine/s a` utiliser ce format.
 
3.  The guidelines should         3.  Les directives devraient
 
  a.  define a recommended          a.  de/finir une syntaxe recommande/e
      syntax for the format             pour exprimer le format,
 
  b.  define a metalanguage         b.  de/finir un me/ta-langage
      for the description               de/crivant les syste`mes de
      of text-encoding schemes,         codage des textes,
 
  c.  describe the new format       c.  de/crire par le moyen de ce
      and representative                me/talangage, aussi bien qu'en
      existing schemes both in          prose, le nouveau syste`me de
      that metalanguage and             codage aussi bien qu'un choix
      in prose.                         repre/sentatif de syste`mes
                                        de/ja` en vigueur.
 
4.  The guidelines should         4.  Les directives devraient proposer
    propose sets of coding            des syste`mes de codage utilisables
    conventions suited for            pour un large e/ventail
    various applications.             d'applications.
 
5.  The guidelines should         5.  Sera incluse dans les directives
    include a minimal set of          l'e/nonciation d'un syste`me de
    conventions for encoding          codage minimum, pour guider
    new texts in the format.          l'enregistrement de nouveaux textes
                                      conforme/ment au format propose/.
 
6.  The guidelines are to be      6.  Le travail d'e/laboration des
    drafted by committees on:         directives sera confie/ a` quatre
                                      comite/s centre/s sur les sujets
                                      suivants:
 
  a.  text documentation            a.  la documentation des textes,
 
  b.  text representation           b.  la repre/sentation des textes,
 
  c.  text interpretation           c.  l'analyse et l'interpre/tation
      and analysis                      des textes
 
  d.  metalanguage definition       d.  la de/finition du me/talangage et
      and description of                son utilisation pour de/crire le
      existing and proposed             nouveau syste`me aussi bien que
      schemes                           ceux qui existent de/ja`.
 
    co-ordinated by a steering        Ce travail sera coordonne/ par un
    committee of representatives      comite/ d'organisation ou`
    of the principal                  sie`geront des repre/sentants des
    sponsoring organizations.         principales associations qui
                                      soutiennent cet effort.
 
7.  Compatibility with existing   7.  Dans la mesure du possible, le
    standards will be maintained      nouveau syste`me sera compatible
    as far as possible.               avec les syste`mes de codage
                                      existants.
 
8.  A number of large text        8.  Des repre/sentants de plusieurs
    archives have agreed in           grandes archives de textes en forme
    principle to support the          lisible par machine acceptent en
    guidelines in their function      principe d'utiliser les directives
    as an interchange format.         en tant que description des formats
    We encourage funding agencies     pour l'e/change de leurs donne/es.
    to support development of         Nous encourageons les organismes
    tools to facilitate this          qui fournissent des fonds pour la
    interchange.                      recherche de soutenir le
                                      de/veloppement de ce qui est
                                      ne/cessaire pour faciliter cela.
 
9.  Conversion of existing        9.  En convertissant des textes
    machine-readable texts to         lisibles par machine de/ja`
    the new format involves the       existants, on remplacera
    translation of their              automatiquement leur codage actuel
    conventions into the syntax       par ce qui est ne/cessaire pour les
    of the new format.  No            rendre conformes au format nouveau.
    requirements will be made for     Nul n'exigera l'ajout
    the addition of information       d'informations qui ne sont pas
    not already coded in the          de/ja` repre/sente/es dans ces
    texts.                            textes.
 
                                         (trad. P. A. Fortier)
 
                            ******************
 
The further organization and drafting of the guidelines will be
supervised by a steering committee selected by the three sponsoring
organizations:  ACH (the Association for Computers and the Humanities),
ACL (the Association for Computational Linguistics), and ALLC (the
Association for Literary and Linguistic Computing).  Drafts of the
guidelines will be submitted for comment to an editorial committee with
representatives of all participating organizations (in addition to the
sponsors, thus far:  the Modern Language Association, the Association
for Computing Machinery Special Interest Group for Information
Retrieval, and the Association of American Publishers; the following
groups have indicated interest informally but have not yet formally
pledged participation, in most cases pending a formal vote: the
Linguistic Society of America, the Association for Documentary Editing,
and the American Philological Association).  The American Anthropological
Association, plus several organizations within Europe, are now being
asked to consider participation.
 
The interchange format defined by the guidelines is expected to be
compatible with the Standard Generalized Markup Language defined
by ISO 8879, if that proves compatible with the needs of research.  The
needs of specialized research interests will be addressed wherever it
proves possible to find interested groups or individuals to do the
necessary work and achieve the necessary consensus.  Formation of
specific working groups will be announced later; in the meantime, those
interested in working on specific problems are invited to contact
either Dr. C. M. Sperberg-McQueen, Computer Center, University of
Illinois at Chicago (M/C 135), P.O. Box 6998, Chicago IL 60680 (on
Bitnet: U18189 at UICVM), or Prof. Nancy Ide, Dept. of Computer
Science, Vassar College, Poughkeepsie NY 12601 (on Bitnet:  IDE at
VASSAR).
 
                                                 - N.I., C.M.S-McQ
 
------------------------------------------------------------------------------
 
                    List of Participants
 
  NOTE: Association names are given following the names of their
        representatives at this meeting.
 
   Helen Aguera, National Endowment for the Humanities
   Robert A. Amsler, Bell Communications Research
   David T. Barnard, Department of Computing and Information Science,
      Queen's University, Ontario
   Lou Burnard, Oxford Text Archive
   Roy Byrd, IBM Research
   Nicoletta Calzolari, Istituto di linguistica computazionale, Pisa
   David Chestnutt  (Assoc. for Documentary Editing, American Historical
      Assoc.), Department of History, University of South Carolina
   Yaacov Choueka (Academy of the Hebrew Language), Department of
      Mathematics and Computer Science, Bar-Ilan University
   Jacques Dendien, Institut National de la Langue Francaise
   Paul A. Fortier, Department of Romance Languages, University of
      Manitoba
   Thomas Hickey, OCLC Online Computer Library Center
   Susan Hockey  (Association for Literary and Linguistic Computing),
      Oxford University Computing Service
   Nancy M. Ide (Association for Computers and the Humanities),
      Department of Computer Science, Vassar College
   Stig Johansson, International Computer Archive of Modern English,
      University of Oslo
   Randall Jones  (Modern Language Association), Humanities Research
      Computing Center, Brigham Young University
   Robert Kraft, Center for the Computer Analysis of Texts, University of
      Pennsylvania
   Ian Lancashire, Center for Computing in the Humanities, University of
      Toronto
   D. Terence Langendoen (Linguistic Society of America), Graduate
      Center, City University of New York
   Charles (Jack) Meyers, National Endowment for the Humanities
   Junichi Nakamura, Department of Electrical Engineering, Kyoto
      University
   Wilhelm Ott, Universitaet Tuebingen
   Eugenio Picchi, Istituto di linguistica computazionale, Pisa
   Carol Risher (Association of American Publishers), Association of
      American Publishers, Inc.
   Jane Rosenberg, National Endowment for the Humanities
   Jean Schumacher, Centre de traitement e/lectronique de textes,
      Universite/ catholique de Louvain a` Louvain-la-neuve
   J. Penny Small (American Philological Association), U.S. Center for
      the Lexicon Iconographicum Mythologiae Classicae, Rutgers
      University
   C.M. Sperberg-McQueen, Computer Center, University of Illinois at
      Chicago
   Paul Tombeur, Centre de traitement e/lectronique de textes,
      Universite/ catholique de Louvain a` Louvain-la-neuve, Belgium
   Frank Tompa, New Oxford English Dictionary Project, University of
      Waterloo
   Donald E. Walker (Association for Computational Linguistics), Bell
      Communications Research
   Antonio Zampolli, Istituto di linguistica computazionale, Pisa, Italy
 
[end of message]
=========================================================================
Date:         1 December 1987, 16:22:58 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: Dr Abigail Ann Young  <YOUNG at UTOREPAS>
Re: CD-ROMs & other media; on-going [body of message 26 ll inclusive]
 
(Was that too long, Marshall?)
 
Does anyone have any information on WORM drives?  A
non-HUMANIST colleague told me he had heard about them
at an IBM-sponsored conference and that they were the
best thing since sliced bread, basically.  I've also heard
that a disk for an IBM WORM drive would be capable of being
written to only once, which would certainly make such a disk
only slightly more useful than a CD-ROM, and considerably less
useful than a magnetic tape.
 
I am always suspicious of new devices which will revolutionize
my life and save me time, trouble, etc.  I think it is because
I tended to believe the Popular Science/Mechanics picture of
the future when I was a child.  But a WORM drive & disk capable
of multiple disk writes as well as reads sounds very, very
appealing.
 
Abigail Ann Young
Research Associate,
Records of Early English Drama
University of Toronto
young at utorepas
=========================================================================
Date:         2 December 1987, 00:17:29 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor:  "James H. Coombs" <JAZBO@BROWNVM>
Subject:      Sonar; Mac; concordance vs. retrieval (54 lines)
 
I asked about concordance programs for the Mac.  Someone sent me the review of
Sonar and a couple of others have mentioned it.  The review does not say
anything about concording texts with Sonar, however.  I have never used one of
these retrieval programs.  I have used WatCon and have written a concordance
program for the IBM PC (for multiple versions of the same text).  IS Sonar
appropriate for generating concordances? concordances that will be printed and
distributed?  Does it properly handle lines of poetry, for instance? and give
columns of lines with locations?  I assume that WordCruncher from BYU can do
such, since it is a descendant of a concording program (unless there is an
equivocation on "concord" here, and please let us all know if there is).
 
I am in the process of designing a retrieval engine and browser for the
American Heritage Dictionary.  When I think of retrieval programs, I think of
inverted indices, hash tables, and the like.  "Use this information to go find
X and then let's Y it."  That, to me, is a typical retrieval action, and the
access is typically random.
 
Concording, however, at least in the traditional sense, is sequential and
exhaustive.  One COULD use a retrieval application to concord a text, but it
would be very inefficient and would probably require additional programming
anyway.  One would have to have a means to call the retrieval engine
iteratively for every word in the text as well as the means to format and write
the results someplace.
 
Are WordCruncher and Sonar dual applications?  In order to index, one has to
perform much of the same processing as is required for concording (process
sequentially and exhaustively, split words out of lines, stop words,
lemmatize?, cross reference (See also xxx)?).  Well, some of the routines are
the same anyway, at least to the extent that the developer of one type of
application would have a start on developing the other.  It begins to sound
like integrated systems a la Symphony vs. 1-2-3.  Does the system that offers
both really do both jobs well?  Or, first I guess, are there systems that
offer both?
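
The overlap between the two kinds of application can be illustrated
with a small sketch.  It is not taken from WordCruncher, Sonar, or
WatCon; every name in it (tokenize, build_index, lookup, concordance,
STOP_WORDS) is hypothetical, and the stop list and the word pattern
are assumptions.  One sequential, exhaustive pass builds an inverted
index; retrieval then reads a single entry at random, while
concording walks every entry in order and writes out the locations.

```python
import re
from collections import defaultdict

STOP_WORDS = {"the", "a", "of", "and"}  # hypothetical stop list


def tokenize(line):
    """Split a line into lower-cased words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", line.lower())


def build_index(lines):
    """Sequential, exhaustive pass: map each word to its line numbers."""
    index = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        for word in tokenize(line):
            if word not in STOP_WORDS:
                index[word].append(number)
    return index


def lookup(index, word):
    """Retrieval: random access to one entry of the index."""
    return index.get(word.lower(), [])


def concordance(index, lines):
    """Concording: every indexed word, alphabetically, with line and context."""
    rows = []
    for word in sorted(index):
        for number in index[word]:
            rows.append(f"{word:<12}{number:>4}  {lines[number - 1]}")
    return rows


if __name__ == "__main__":
    text = ["Sivrit rode to Worms", "The king greeted Sivrit"]
    idx = build_index(text)
    print(lookup(idx, "Sivrit"))
    print("\n".join(concordance(idx, text)))
```

Lemmatizing and cross references are left out; they would slot into
tokenize or build_index and would serve both uses equally.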
 
--Jim
 
Dr. James H. Coombs
Software Engineer, Research
Institute for Research in Information and Scholarship (IRIS)
Brown University
jazbo@brownvm.bitnet
=========================================================================
Date:         2 December 1987, 12:12:08 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor:  Bob Kraft  <KRAFT@PENNDRLN>
Subject:  CD-ROM & WORM  [88 lines]
 
The recent observations by Abigail Young and Michael
Sperberg-McQueen on CD-ROM and WORM technologies call for
some comment from the "pro" (and experienced) side. I hope
to keep them brief, just to pinpoint some of the issues.
Michael's comments seemed to me to miss many crucial points,
and did not reflect the attitudes or situation of numerous
people with whom I am in regular contact.
 
1. The difference between CD-ROM and WORM for this discussion
is negligible, as Abigail suspected. Right now, WORM drives
are more expensive and less tested publicly, but a single WORM disk is
cheaper to produce. But once you have that single WORM disk,
which currently costs about $65, there is no price advantage
to making multiple copies (50 copies would cost $3250). With
the CD-ROM, it might cost $3000 to master but each additional
copy would cost very little (perhaps $7 each for 100). Thus
it would be much cheaper to make 100 copies of a CD-ROM than
100 copies of a WORM disk at present. And the CD-ROM holds
more than twice as much as the WORM disks with which we are
working. So WORM is fine for limited production or in-house
purposes, CD-ROM is better for larger distribution, etc.
Neither can be changed once they are mastered, although
WORM can be mastered in stages, while CD-ROM is a once for all
mastering process.
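
The arithmetic in point 1 can be checked with a short sketch.  The
figures are the ones quoted above ($65 per WORM disk, about $3000 to
master a CD-ROM, perhaps $7 per copy); the assumption that the $7
unit price holds for any run length, and the function names, are
mine and not part of the message.

```python
import math

WORM_UNIT = 65       # dollars per WORM disk (figure from the message)
CDROM_MASTER = 3000  # dollars, one-off mastering cost (figure from the message)
CDROM_UNIT = 7       # dollars per copy; quoted for a run of 100, assumed flat


def worm_cost(copies):
    """Total cost of writing the given number of WORM disks."""
    return WORM_UNIT * copies


def cdrom_cost(copies):
    """Total cost of mastering once and pressing the given number of copies."""
    return CDROM_MASTER + CDROM_UNIT * copies


def break_even_copies():
    """Smallest run at which CD-ROM costs no more than WORM."""
    return math.ceil(CDROM_MASTER / (WORM_UNIT - CDROM_UNIT))


if __name__ == "__main__":
    for n in (1, 50, 100):
        print(n, worm_cost(n), cdrom_cost(n))
    print("break-even at", break_even_copies(), "copies")
```

Under these assumptions WORM is the cheaper medium for runs of up to
about fifty copies and CD-ROM is cheaper beyond that.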
 
2. Are people anxiously waiting for data distributed on CD-ROM?
In my experience, YES. We have many advance orders for the CCAT
CD-ROM, and more inquiries. Ted Brunner can report on the TLG
experience. What sorts of people are asking? Obviously, IBYCUS SC
owners (about 130 machines) who are set up to use CD-ROM as part of
the package; Librarians, who need massive amounts of data in a
bibliographically controlled context (static is good, in this
setting!); the mass of individual scholars/students who are not
in a tape-oriented environment such as Michael describes (his
experience is not at all typical, even at the ideal level, of
the majority of people with whom I am in contact -- people in
small colleges, seminaries, or operating individually, with
no access to a real mainframe or effective consultation).
 
3. What is attractive to these inquirers? Several fairly obvious
things. (1) Amount of material available -- e.g. all of Greek
literature through the 6th century on the TLG disk! (2) Price
of the material (on tape, the TLG data cost over $4000; on
CD-ROM, it is about 10% of that) (3) Convenience of storage,
access, etc. -- I would rather download from a CD-ROM than
from a tape drive, any day. It is the old roll vs codex issue
once again (microfilm vs microfiche, etc.). (4) Quality control --
what is on the CD-ROM may have errors, but at least they can be
identified and controlled (and corrected in a later release);
I don't have to wonder whether my dynamic file has become
corrupted (as happens more than I want to admit). (5) Speed of
access to large bodies of data -- even if the programs are not
yet in place and it will take 20 times as long to search a
large CD-ROM file on the IBM than on IBYCUS, it is at least
possible to do the search (or to search multiple files, in
various configurations), which is extremely difficult in any
other manner short of a dedicated mini.
 
I am rambling and apologize. Much more needs to be said, but I
need to finish preparing ID tables for the CCAT CD-ROM if it is
to be mastered by the end of the year! Perhaps it would not be
feasible economically to put the Nibelungenlied on its own
CD-ROM, but to have it as a small part of a CD-ROM with all sorts
of other texts is what we are talking about! That is not only
feasible, but it seems to me highly desirable, IBYCUS or not.
And I can still download what I want to edit, or manipulate, etc.
I lose none of that capability. But I gain by having the original
fixed at hand for comparison, etc.
 
Libraries will rapidly be CD-ROM centered, and that is as it ought
to be. Hopefully computer centers will not be bypassed by this
exciting and useful development!
 
Bob Kraft
=========================================================================
Date:         2 December 1987, 14:41:59 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: Jim Cerny <J_CERNY@UNHH>
Subject:  Summary of responses on KJV Bible for Macintosh (incl.
          [152 lines]
 
Thanks to everyone who responded to my recent inquiry about the
availability of the King James Version of the Bible for the
Apple Macintosh.  I've tried to acknowledge or quote from
all the responses (as of 01-Dec) in the summary that follows.
 
======================================================================
John J. Hughes (XB.J24@STANFORD) had the most definitive answer,
reflecting no doubt the research for his book "Bits Bytes and Bible
Studies".  Robin C. Cover (ZRCC1001@SMUVM1) referenced this book
and Marshall Gilliland (GILLILAND@SASK) and Tim Seid (ST401742@BROWNVM)
mentioned sources that Hughes lists.
        Hughes wrote:
----------------------------------------------------------------------
There are several companies that sell King James Versions of the
Bible for Macintoshes. Here are their names, addresses, and so
forth. The first program is reviewed in detail in chapter 3 of
BITS, BYTES, & BIBLICAL STUDIES (Zondervan, 1987).
 
  THE WORD Processor
  Bible Research Systems
  2013 Wells Branch Parkway, Suite 304
  Austin, TX 78728
  (512) 251-7541
  $199.95
  Requires 512K; includes menu-driven concording program
  CP/M version available for Kaypros.
 
  MacBible
  Encycloware
  715 Washington St.
  Ayden, NC 28513
  (919) 746-3589
  $169
  128K; text files that may be read by MacWrite
       and Microsoft Word.
 
  MacScripture
  Medina Software
  P.O. Box 1917
  Longwood, FL 32750-1917
  (305) 281-1557
  $119.95
  128K; text files designed to be used with MacWrite.
 
 
 
=======================================================================
Marshall Gilliland  (GILLILAND@SASK) pointed to a very unexpected
source, i.e., one of the DECUS (DEC Users Society) tapes.  We are an
active VAX/VMS site and we did indeed have the tape.  It is on VAX
System SIG Symposium tape VAX86D (from the Fall 86 DECUS meeting in
San Francisco).  In uncompressed form the files take about 9000
VAX disk blocks (roughly 5 MB).  It is all in upper case.  Presumably
could be downloaded to a PC, but don't think I will attempt that!
        Gilliland wrote, in part:
-----------------------------------------------------------------------
If you have VAX equipment there and get DECUS
tapes then ask one of your systems people for the copy of the ascii text of
the KJ Bible that was on a DECUS tape not too long ago (I think in 1987).
 
Marshall Gilliland
English Dept.
U. of Saskatchewan
=======================================================================
Tim Seid (ST401742@BROWNVM) pointed me to CCAT (Center for Computer
Analysis of Texts) and Bob Kraft (KRAFT@PENNDRLN) from CCAT also
responded.  Bob Kraft also sent me several files about CCAT and its
services and I've tacked CCAT's info-file at the end of this summary
... "old hands" may be aware of CCAT's electronic newsletter, ONLINE
NOTES, but it was new to me and their info-file tells how to
subscribe.
        Bob Kraft wrote:
-----------------------------------------------------------------------
I have not seen my MAC person (Jay Treat) since your inquiry
about the KJV arrived, but I am reasonably sure that it is
already available from CCAT for the MAC, or will be very soon.
We have been distributing the KJV and RSV (along with the Greek
and Hebrew texts of the Bible) to IBM types for over a year now,
and all these materials will be on our soon to be released
CD-ROM. Most of it has been ported to the MAC as well.
I will send you an order form and other information separately.
Bob Kraft
=======================================================================
Ronald de Sousa (DESOUS@UTORONTO) mentioned the possibility of using
DIALOG services.
        de Sousa wrote:
-----------------------------------------------------------------------
You'll probably get some satisfactory answers, but in the meantime I
wonder whether you know that the cheap after-hours service of DIALOG
Info Services, called "Knowledge Index", has the King James full text
on line, and can be searched using the search options of that service.
I seem to recall that for $200 you'd get about 8 hours of search time
-- quite enough for a limited project. Of course, the same is
available on DIALOG itself, with somewhat more sophisticated options.
=======================================================================
Roger Hare (R.J.HARE@EDINBURGH.AC.UK) responded from JANET that
Catspaw Inc. has the King James Bible.  They specialize in supporting
PC-based implementations of SNOBOL and related products, as I recall.
        Roger Hare wrote:
-----------------------------------------------------------------------
Catspaw do a version of the King James Bible for 50 dollars. My
catalogue doesn't say what machine it's for, but if you have access to a
mainframe perhaps you could get it onto your Macintosh via file
transfers?
 
their address is:
 
Catspaw Inc.
PO Box 1123
Salida
Colorado
81201
USA.
 
Roger Hare.
=======================================================================
Finally, Chuck Bush (ECHUCK@BYUADMIN) mentioned that they have
the King James Bible at the Humanities Research Center at Brigham
Young University and I presume he could supply more details.
        Chuck Bush wrote:
-----------------------------------------------------------------------
At BYU we do have the text of the King James Bible in machine readable
form.  The original data is on a mainframe, but we have downloaded it
to PC disks etc. for those who have ordered it in other forms.  I have
a copy of it on a Macintosh Bernoulli cartridge from which it would be
relatively easy to copy it to some other Macintosh medium--even floppies.
 
However, this is just the TEXT.  There isn't any software to access it
conveniently.  Sonar is the only text retrieval software I know of for
the Macintosh and I don't think it would be very satisfactory.  For one
thing, it couldn't give you chapter and verse references.
 
Chuck Bush   <ECHUCK@BYUADMIN>
Humanities Research Center
Brigham Young University
=======================================================================
Interested HUMANISTs should also consult the guide to external services
of the Center for Computer Analysis of Texts (CCAT), Univ. of
Pennsylvania, available from Jack Abercrombie (JACKA@PENNDRLS.BITNET)
=========================================================================
Date:         2 December 1987, 20:29:00 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Vox populi (46 lines)
 
Dear Colleagues:
 
My thanks to the several people who offered their views on the
conversational style of HUMANIST. The majority of speakers have clearly
voiced a preference for a somewhat more open manner of conversational
exchange than has been the rule so far. For what it's worth, I welcome
this change without reservation, since HUMANIST is by design ruled
chiefly by its members rather than by its editor.
 
Until an absolutely foolproof method of screening out junk mail is
found, I will continue to have all submissions to HUMANIST sent first to
me and will forward the ones of human origin to the membership. This
means very little work for a very large improvement in the quality of
the environment.
 
One of the interesting (but, I guess, not surprising) characteristics of
HUMANIST is the number of members who never say anything -- yet continue
to put up with the large volume of mail. I imply no criticism
whatsoever, for there are many noble and practical reasons for remaining
silent. Nevertheless, I suspect that some members may occasionally have
something to say but wonder if what they have to say is worthy. In
general the advice I follow is, say it and see what happens. One
possibility for the diffident is to send in a contribution with a note
attached asking my advice, for whatever it's worth.
 
Please let me know if anything about HUMANIST bothers you or otherwise
seems to need improvement. The ListServ software (written and maintained
on a voluntary basis by a remarkable person who lives in Paris) we
cannot fundamentally alter. It has certain characteristics that some may
consider flaws but that seem to me merely features to be exploited in
the best possible way. Locally HUMANIST is supported by my Centre and by
the good will of our Computing Services, i.e., by two busy people.
There's not much that can be done given these resources, but some changes
can be made without much effort -- like the screening of junk mail.
 
In short, lead on!
 
Yours, W.M.
_________________________________________________________________________
Dr. Willard McCarty / Centre for Computing in the Humanities
University of Toronto / 14th floor, Robarts Library / 130 St. George St.
Toronto, Canada M5S 1A5 / (416) 978-4238 / mccarty@utorepas.bitnet
=========================================================================
Date:         2 December 1987, 22:53:10 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: Sebastian Rahtz <CMI011@IBM.SOUTHAMPTON.AC.UK>
 
Here's one for the eager punters; a colleague of mine wants
to study the New Kingdom El-Amarna literature (Egypt, mid 14th C BC).
Anybody care to say if someone has already typed in such stuff
onto the computer? Apologies if it's obvious...
 
sebastian rahtz  computer science university southampton uk
=========================================================================
Date:         2 December 1987, 23:23:34 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: "Michael Sperberg-McQueen"   <U18189@UICVM>
Subject:  CD ROMs, mainframes
 
Many thanks to Bob Kraft for his cogent remarks about CD ROMs.  I seem
to have given a rather scrooge-like impression in my most recent posting
about CD ROMs and PCs, which does not reflect my positive opinion of
PCs.
 
Yes, CD ROMs are ideal for certain kinds of data distribution,
especially for (a) stable data and (b) large numbers of recipients.
For humanistic research applications with those characteristics,
they are also obviously good ideas.  WORM disks, or better yet
erasable mass storage devices, would make many of the same
advantages available for non-static data and small numbers of
recipients.  But neither description fits all research fields.
 
I am less convinced that institutional support for faculty use of
mainframes and microcomputers is untypical in North America.  This
is an empirical question, and I would like to put it up for discussion:
what is the situation at the sites represented on HUMANIST with
regard to:
 
    (a) support for humanities computing formally provided by
the institution via centralized or specialized facilities,
    (b) faculty-student computing on mainframes or minis
    (c) institutional support for microcomputing
    (d) institutional support for mainframe-micro data transfer.
 
It is possible that Bob Kraft is right and my experience is
untypical.  But it seems also possible that Penn and CCAT get so
much business from people without mainframe access because those
who do have local computer centers get their help locally.  It
would be useful, I think, for all of us if we could get some idea
of the facts in this area.  The ACH Special Interest Group for
Humanities Computing Resources (the sponsor of HUMANIST) did
plan once to distribute a questionnaire to gather this information
but the final questionnaire design seems to have been delayed,
so let's caucus informally now.
 
Michael Sperberg-McQueen, University of Illinois at Chicago
=========================================================================
Date:         2 December 1987, 23:34:33 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Electronic OED -- for the blind?
 
Contributor: Norman Zacour   <ZACOUR at UTOREPAS>
 
I have a blind, computerized friend, a professor of English and a
professional writer, who got very excited when I passed on to him the
recent messages from HUMANISTS about plans for making the OED available
in electronic form.  He had visions - no joke intended - of consulting
it through his speech synthesizer on his PC.  His enthusiasm was
dampened by the planned use of colour to display certain types of
information.  Does anyone happen to know if the OED has any plans for
handicapped users?  I suppose that there are still architects who design
monumental buildings without ramps for wheelchairs, but perhaps...
=========================================================================
Date:         3 December 1987, 09:55:14 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
Comments:     Contributor:  Robin C. Cover <ZRCC1001@SMUVM1>
From:         MCCARTY@UTOREPAS
Subject:      Al-Amarna Correspondence (in MRT format) [96 lines]
 
 
Sebastian Rahtz asked whether the El-Amarna letters exist in digitized
format somewhere.  I doubt whether many HUMANISTS are interested in
west-semitized Akkadian texts, but this query (and its answer) provides
an opportunity to tell a sad and familiar tale...and perhaps an
opportunity for someone to come forward with better news than I have to
tell...
 
The good news (for our assyriologist friend in the UK) is that
Knudtzon's edition of the El-Amarna letters is in machine-readable
format.  I have used the massive printed "concordances" (two tomes, each
about 7 inches thick).  These printouts originated at UCLA, so the best
bet is to contact Giorgio Buccellati at the department of Near Eastern
Studies, who might make tapes or diskettes available.   UCLA has a
growing corpus of MRT material for the ancient Near East, and in time it
will be available publicly as part of Buccellati's hypermedia project
for Mesopotamia (Computer-Aided Analysis of Mesopotamian Materials);
some materials are currently available from Undena, and Buccellati
passed out sample diskettes of digitized Eblaite texts at AOS.
 
The sad tale I mentioned earlier is as follows: Du Cerf (a Paris
publisher) recently released a superb volume in its series Litteratures
anciennes du Proche-Orient on the El-Amarna letters.  Its
author/translator is William Moran of Harvard University, recognized as
a (probably THE) leading Amarna scholar, who has been putting together
this polished volume over the past 30-odd years.  His translations are
based upon extensive museum collations of the tablets, together with
restorations that can be made only by someone so familiar with the
"idioms" of international diplomacy (in the 14th century B.C.E.) as
Professor Moran is.  So, the MRT edition we *REALLY* want is Moran's,
not the 1915 edition of Knudtzon.
But you won't find it published on diskette with this Du Cerf volume
(which does not even have transliterated original text).  According to
the publishers, it would not be cost-effective to publish the original
text on paper, and as for a MRT edition of the text....well...
 
Shortsightedness like this has to stop, but who is responsible for
"stopping it?"  A single individual (as in this case, Moran) probably
can do very little to force publishers to change their ways.  But how
about collective bargaining....we publish such scholarly materials ONLY
with publishers that are sensitive about the future of scholarship, and
about the precious treasure we have in ancient literature.  This means
placing premium value on original texts in machine-readable form -- only
thus are they truly useful and accessible to modern scholarship -- and
making these texts available in the public domain.  I suspect that this
problem is more acute for orientalists than for classicists and other
humanities-literary subspecialty areas; we have special orthographies
and printing problems which are expensive and demanding.  But my
suggestion is that we must encourage and demand higher standards of
cooperation from publishers such that valuable (priceless!) human
efforts are not lost on a Macintosh diskette after it passes from the
departmental secretary or word-processing pool to the publisher.  Does
anyone else share this point of view?  Am I too idealistic?
 
While I am in a lament mode, I might as well refer to another problem
that needs attention: the problem of coding standards.  There are
several efforts underway internationally to "encode" ancient Near
Eastern texts in transliteration (Toronto - RIM; UCLA; Rome; Helsinki;
etc.), but to my knowledge there
are no agreed-upon standards.  In the case of purely alphabetic scripts,
the problem is frustrating but not fatal, since we can use
consistent-changes programs to standardize the data for archiving.  In
the case of syllabic (logographic; hieroglyphic) scripts -- Akkadian,
Sumerian, Hittite, Elamite, Egyptian -- the plethora of transliteration
schemes is more problematic.  No-one sends this kind of data with an
SGML prologue, so the best we can hope is that the encoding is
consistent and that we can unravel the format codes.  If anyone knows
about efforts to introduce standards for transliteration and
format-coding, would you kindly let me know?  I understand that the
committee for encoding standards (Nancy Ide; Michael Sperberg-McQueen)
recently funded by NEH will not initially address the needs of
orientalists.  If there are other orientalists "out there" on the
HUMANIST reader list -- should we organize ourselves?
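
As a rough sketch of what a consistent-changes pass does for a purely
alphabetic transliteration: a table of source codes is applied in a
single left-to-right scan, with longer codes tried first, so that the
output of one rule is never rewritten by another. The table below is
invented for illustration only and does not reproduce the conventions
of any of the projects named above; apply_changes is likewise a
hypothetical name.

```python
# Hypothetical mapping from one site's transliteration codes to a common
# archive form. These pairs are invented for illustration only.
CHANGES = {
    "sh": "s^",
    "s,": "s.",
    "t,": "t.",
    "h,": "h.",
}

def apply_changes(text, changes):
    """Apply a consistent-changes table, trying longer source codes first."""
    keys = sorted(changes, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                # Emit the replacement and skip past the matched code.
                out.append(changes[key])
                i += len(key)
                break
        else:
            # No code starts here; copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)

print(apply_changes("sharru", CHANGES))
```

This works only where each source code maps to exactly one target; for
syllabic scripts, where the schemes differ in more than spelling, a
simple table of this kind would not be enough.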
 
Apologies to all if this is arcane, recondite or just downright boring.
I'd like to know if anyone out there shares some of my frustrations, or
sees solutions.
 
Professor Robin C. Cover
3909 Swiss Avenue
Dallas, TX  75204
(214) 296-1783
=========================================================================
Date:         3 December 1987, 09:58:21 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
Comments:     Contributor: Brendan O'Flaherty  <AYI017@IBM.SOUTHAMPTON.AC.UK>
From:         MCCARTY@UTOREPAS
Subject:      E-mail to Australia
 
Can anyone tell me if e-mail to the Antipodes (ie Australia) has a charge?
And if so, who pays---the sender if outside Australia or the recipient?
Thanks in advance.
=========================================================================
Date:         3 December 1987, 13:36:52 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      The Thesaurus Linguae Graecae (TLG) on CD-ROM
 
The following has been contributed by Theodore Brunner, Director
of the TLG Project, from a memo circulated to all TLG customers.
Anyone wishing to arrange for a license agreement should contact
Professor Brunner, Thesaurus Linguae Graecae, University of
California at Irvine, Irvine, CA 92717 U.S.A., telephone: (714)
856-7031, e-mail: TLG@UCIVMSA.bitnet. The license per CD-ROM,
including a copy of the printed TLG Canon, is not expensive: initial
registration fee (plus first year fee) is $200 to institutions
and $120 to individuals; annual fee $100 to institutions,
$60 to individuals; optional one-time payment for 5 years $500 to
institutions, $300 to individuals. (All prices are in US $.)
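
For anyone weighing the two payment options, the comparison can be
worked out as below. This is a sketch under one loudly stated
assumption: that the one-time 5-year payment also covers the initial
registration, which the fee schedule above does not spell out. The
function name is hypothetical.

```python
# Fee schedule as quoted in the announcement above (US dollars).
FEES = {
    "institution": {"first_year": 200, "annual": 100, "five_year": 500},
    "individual":  {"first_year": 120, "annual": 60,  "five_year": 300},
}

def pay_as_you_go(kind, years):
    """Registration plus first-year fee, then the annual fee for each later year.

    Assumes years >= 1.
    """
    fee = FEES[kind]
    return fee["first_year"] + fee["annual"] * (years - 1)

for kind in FEES:
    yearly = pay_as_you_go(kind, 5)
    lump = FEES[kind]["five_year"]   # assumed to include registration
    print(kind, yearly, lump, yearly - lump)
```

On that assumption, five years paid annually come to $600 for an
institution and $360 for an individual, so the one-time payment saves
$100 and $60 respectively.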
 
_________________________________________________________________
TLG CD-ROM CUSTOMERS:
 
We have been receiving numerous questions related to TLG CD ROM
dissemination plans and policies; here is miscellaneous informa-
tion on these subjects:
 
1. To date, the TLG has produced two CD ROMs, disk "A" and disk
"B". Disk "A" contains approximately 27 million words of TLG
text, as well as an electronic version of the TLG Canon. Disk "B"
contains the same 27 million words of text, the TLG Canon, and an Index
to the TLG texts on the CD ROM.
 
Disk "A" also contains miscellaneous non-TLG materials, including
some Latin, Coptic, and Hebrew texts, some epigraphical
materials, as well as portions of the Duke Data Bank of
Documentary Papyri.  The non-TLG materials were included on TLG
CD ROM "A" for one reason only: this disk was produced (as was CD
ROM "B") primarily for experimental purposes, i.e., to aid in the
development of software resources designed to enhance utilization
of the (relatively new) CD ROM data storage medium.
 
Neither disk "A" nor disk "B" reflects the High Sierra format
standard (established after both of these CD ROMs were produced).
 
2.  In short order, the TLG will release a new CD ROM, disk "C".
This disk will contain approximately 41.5 million words of TLG
text, an index to this text material, and the TLG Canon.
 
Individuals and institutions already holding license to "A" or
"B" disks are entitled to receive "C" disks free of charge. This
(as provided for in the license agreement governing use of TLG
ROMs) will be on an exchange basis, i.e., disks previously issued
by the TLG must be returned to the TLG prior to the issuance of a
"C" disk. TLG LICENSEES SHOULD NOT RETURN THEIR "A" OR "B" DISKS
UNTIL DISK "C" IS OFFICIALLY RELEASED. [Notice will appear on
HUMANIST when disk "C" is ready.]
 
3.  Questions have been raised about the absence of non-TLG
material on the "C" disk.  The TLG controls and licenses only its
own materials, and license agreements previously executed pertain
to the TLG materials on the disks only.  Current TLG CD ROM
licensees may, of course, continue to use their ("A" or "B")
disks throughout the course of their license period; they will
not be issued "C" disks, however, until they have returned their
earlier CD ROM versions to the TLG.
 
It is the case, however, that the Packard Humanities Institute
(PHI) will be releasing its own CD ROM in the very near future;
this disk will contain Latin, Coptic, Hebrew, and epigraphical
materials, as well as a significant portion of the Duke
papyrological data bank.  It can be assumed that individuals and
institutions desirous of these materials can make arrangements
with PHI to gain access to them on a PHI disk.  Further informa-
tion on this subject can be obtained by contacting
 
      John Gleason, Packard Humanities Institute, P.O. Box 1330
      Los Altos, CA 94022 U.S.A.
 
4.  We have received numerous requests for technical documenta-
tion related to the forthcoming TLG CD ROM "C".  The internal
organization of the text files and of the I.D. table files will
be identical to the organization of these files on TLG CD ROM
"A".  The file directory and author table will be reorganized to
reflect the High Sierra standard. More detailed documentation is
currently being prepared and should be ready for distribution in
the near future.
 
Theodore F. Brunner, Director
November 8, 1987
_________________________________________________________________
=========================================================================
Date:         3 December 1987, 15:00:56 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
Comments:     Contributor: "Michael Sperberg-McQueen"   <U18189@UICVM>
From:         MCCARTY@UTOREPAS
Subject:      Enlightening the publishers, encoding Semitic (65 lines)
 
Three cheers for Robin Cover's idea of group pressure to bring
publishers to their senses regarding the preservation and
distribution of machine-readable materials.  Some publishers, to
their credit, are already alert to the issues involved--or so say
people who should know.  But there are still an awful lot of
them out there who behave the way Renaissance printers did with
Carolingian manuscripts:  mark it up, print it, and throw it out.
Anything we can do to preserve the fruits of scholarly labors, we
should do.
 
It would also be useful to have a better developed system of
text archives in North America -- either a network of regional or
discipline-based archives, or one central archive that would
take anything (the way Oxford does).  The latter would be
appealing because fewer texts might fall through cracks in the
system, but specialized collections would remain important because
they can do more intensive work on their holdings, the way Penn's
CCAT does.  A central North American text archive, acting in
concert with the European archives, might also be in a position
to help exert the kind of group pressure on publishers that
Robin Cover suggests.
 
Making the publisher's texts usable, by documenting as far as
possible the usual systems of typesetting codes found in the
publishing industry, is one goal of the ACH/ACL/ALLC initiative
for text-encoding guidelines.  (That goal is not wholly explicit
in the final document I posted here a couple of days ago, but it
was discussed at length during the planning meeting at Vassar and
clearly is important to a lot of people.)
 
The consensus of the planners at Vassar was also that transliteration
practices, and conventions for the encoding of character sets, should at
least be documented as far as possible in the guidelines.  Many
participants were leery of making specific recommendations for the
representation of specific characters, since local hardware features
and requirements can vary so widely.  Nevertheless, the experts
present agreed that it would not be insuperably difficult to provide
adequate documentation for the encoding of scripts which, like
Semitic scripts, provide special challenges to most commonly
available hardware.
 
That means that the guidelines can and should contain full information
on practices for encoding texts of interest to Orientalists--if the
Orientalists will document their existing practices.  If they can
also agree on common recommendations for future work, that consensus
can and should also be documented.  The same goes for any and all
other specialized interests.  These guidelines will belong to the
humanities computing community as a whole, and I hope the community
will work together to make them as complete and useful as we can.
 
Again, I reiterate the invitation:  anyone interested in helping
formulate the guidelines, either in general or with respect to some
specific question (e.g. the encoding of Akkadian, or the encoding of
numismatic materials, or the encoding of manuscript variants, or the
prosodic transcription of oral texts, or the encoding of hypertext
materials, or ...), should please contact Nancy Ide or myself.  This
invitation will be periodically renewed, as details for the formal
arrangement of the drafting committees are set, but if you let us
know now, we will have a better idea of how much interest there is,
and what kinds of special problems are on people's minds.
 
Michael Sperberg-McQueen, University of Illinois at Chicago
 
P.S. The opinions here expressed are as always mine, not necessarily
those of my employer, or the ACH, or the guidelines steering committee.
=========================================================================
Date:         3 December 1987, 19:16:39 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
Comments:     Contributor: David Nash <nash@cogito.mit.edu>
From:         MCCARTY@UTOREPAS
Subject:      E-mail to Australia (24 lines)
 
E-mail involving ACSNet (Australia, through the international
gateways, or even domestically between sites I think) has a charge for
the Australian end (whether sender or receiver).  It was something
like 10c/message plus 2c/line about a year ago.  Apparently many
institutions do not (yet?) pass on the charge to individual users.
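
As a worked example of the tariff as I recall it (the rates are
approximate and a year old, and the function name is only for
illustration):

```python
PER_MESSAGE_CENTS = 10   # "something like 10c/message" (approximate, as recalled above)
PER_LINE_CENTS = 2       # "plus 2c/line"

def acsnet_charge_cents(lines):
    """Flat per-message fee plus a per-line fee, in cents."""
    return PER_MESSAGE_CENTS + PER_LINE_CENTS * lines

print(acsnet_charge_cents(24))
```

At those rates a 24-line message would cost the Australian end
58 cents.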
 
The official position could presumably be got from
postmaster@munnari.oz, i.e. <munnari!postmaster@uunet.uu.net>
 
David Nash
Center for Cognitive Science
20B-225 MIT
Cambridge MA 02139
=========================================================================
Date:         3 December 1987, 19:20:14 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
Comments:     Contributor:  Laine Ruus <USERDLDB@UBCMTSG.BITNET>
From:         MCCARTY@UTOREPAS
Subject:      Archives (75 lines)
 
In response to Prof. Cover's impassioned plea, I can
only say that it IS possible, with some concerted effort,
to force publishers to change their ways.
 
The American Sociological Association has recently, as
of Sept 1, 1987, in fact, begun to require of all periodicals
published under their aegis, that any computer readable files,
(both data and software) BE CITED in the bibliography.
There is an effort under way now to convince other academic
publishers to follow suit.
 
There are a number of reasons for citation of computer files:
(a) computer-readable files are intellectual property in their
own right, quite as much as publications in
other media, eg on paper, film, audio-tape, canvas, etc.
(This has been recognized by that most conservative institution,
the American Library Association, since the late 1970s.)
The authors (properly called 'principal investigators'), producers,
publishers, editors, and translators, of computer-readable files
deserve for their labours the same acknowledgement and recognition
as do the authors, composers, etc of intellectual property
in more traditional media.
(b) the citation of source materials in the bibliographies of
publications acknowledges the source materials used in the
research process, thus enabling one's peers to follow the same
line of reasoning, using the same source materials, to (hopefully)
come to the same conclusions, thus corroborating
our initial reasoning - ie the peer review process.
(c) once computer-readable files are cited in bibliographies, they
will get picked up in the citation indices, and thus eventually
come to the attention of tenure committees. Thus individual
'authors' of these things will in time receive their due
academic brownie-points.
 
But citing computer-readable files is not enough. There must
also be a mechanism for preserving them for posterity and
making them available to others for secondary analysis.
 
Researchers are reluctant to make 'their' files available
to others for fear that they will not receive their due
acknowledgement (- the polite reason). Mandatory citation of
computer files in publications should help reduce this fear.
 
Many researchers are not aware that there in fact exists a
network of local data archives/data libraries in
academic institutions throughout the United
States and Canada, as well as a well developed system of
national data archives in Europe, most recently in Hungary,
Israel, and the USSR. Granted, these data archives primarily
concentrate on 'social science' data files, primarily because
that is the field from which the initial impetus for
their creation came. However, this orientation is not cast
in stone. And most of these data archives/libraries could,
with appropriate overtures, be convinced that there
are other user communities that also need their services.
The social scientists just happen to have been among the
earliest and most vociferous. The point being that there
is already an institutional framework, staffed by knowledgeable
and experienced people who with very little effort could
provide the network of text archives that humanists seem
to want - all they want is a little prodding.
------------------------------------------------------------
Laine Ruus, University of British Columbia Data Library
userDLDB@ubcmtsg.bitnet
=========================================================================
Date:         3 December 1987, 19:22:30 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
Comments:     Contributor:   STEPHEN@VAX.OXFORD.AC.UK
From:         MCCARTY@UTOREPAS
Subject:      E-mail to Australia
 
There is a relay at ULCC (UK) called EAN which
links with ACSnet - the fact that you do not register before
submitting suggests it is 'free': you may be able to learn
further from mailing an enquiry to laision@uk.ac.ean-relay
 
EAN can also link you to other European sites as well - maybe
to addresses 'missing' from EARN
 
stephen@uk.ac.oxford.vax
=========================================================================
Date:         4 December 1987, 13:02:56 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributed by Bob Kraft <KRAFT@PENNDRLN>
Subject:  CD-ROMs
 
Just to supplement Ted Brunner's information on the TLG
CD-ROM, regarding the non-TLG materials such as were
included on TLG disk "A" -- the present plan is for the
Packard Humanities Institute (PHI) jointly with the
Center for Computer Analysis of Texts (CCAT) at Penn
to produce an "experimental" CD-ROM at the heart of
which will be various Latin texts (being prepared at PHI),
Greek Papyri (Duke) and Inscriptions (Cornell, Princeton
Institute for Advanced Study), and a variety of biblical
and related materials in various languages (Hebrew, Greek,
Latin, Coptic, Syriac, Aramaic, Armenian) as well as sample
files from various other sources and projects (e.g. Dante
Commentary project, Milton Latin project, Kierkegaard in Danish,
Arabic poetry, some word lists, etc.). I call this disk a
"Sampler," and it is scheduled to be ready for distribution
by the end of this month (December). Again, the aim is to
give scholars, software developers, etc., a body of
consistently formatted (more or less!) materials on which to work
in various directions and at little cost. There will be a
notice on HUMANIST when the PHI/CCAT joint CD-ROM "Sampler"
is ready for distribution!
 
Bob Kraft for CCAT
=========================================================================
Date:         4 December 1987, 13:10:19 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributed by "James H. Coombs" <JAZBO@BROWNVM>
Subject: Enlightening the publishers, encoding Semitic (65 lines)
 
Michael Sperberg-McQueen has suggested that we need a text archive in North
America.  Is that a generally felt need?  What could a text archive here offer
that Oxford does not offer?  Certainly, shipping would be faster and cheaper,
but is there something more substantial?  Or are there real hardships now?
Or, could our needs be addressed by some adjustments in the services that
Oxford provides---such that we might better discuss our needs with Oxford
instead of duplicating their efforts.
 
If we DO need an archive in North America, who should institute and manage it?
What is the proper sort of organization?  And what's in it for them?  Will it
be a costly burden?  Or are we willing to pay for materials in order to support
such a facility?  Would it be commercial or non-profit?
 
--Jim
 
Dr. James H. Coombs
Software Engineer, Research
Institute for Research in Information and Scholarship (IRIS)
Brown University
jazbo@brownvm.bitnet
=========================================================================
Date:         4 December 1987, 13:17:33 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributed by James H. Coombs <JAZBO@BROWNVM>
Subject:      ACH text markup
 
Some thoughts on guidelines for text markup, in response to
Michael Sperberg-McQueen's note.
 
1) Markup must be descriptive.
 
2) Delimiters should be '<' and '>' in conformance with the default of the
   new SGML standard.
 
3) Markup/tag attributes should be allowed, and attribute names should be
   descriptive.
 
4) There should be no attempt at establishing a "closed" tag set.  The current
   AAP SGML application allows for definition of new tags, but it does not
   support such definition in a practical way.  The consequence is that people
   will use "list items," for example, when they should be using "line of
   poetry."  Within these guidelines, it can only be healthy to provide a
   list of tags that people should choose from when tagging certain entities.
 
   The point of this is that we cannot predict what textual elements will be
   of significance for what researchers.  We have to allow for the discovery
   of textual elements that no one has categorized previously.  At the same
   time, there is no point in having 30 different tags for "line of poetry."
   The guidelines should make clear that DESCRIPTION is paramount and that
   the use of particular tags is secondary.
 
5) In so far as possible, there should be requirements for minimal tagging.
   It would be a mistake to fail to tag "verse paragraphs" and "book" in
   *Paradise Lost*, for example, and any version that does not provide such
   tags must be considered inadequate and, ultimately, rejected.
 
6) There can be no limit placed on "maximal" tagging.  If a researcher needs
   every word tagged, we must allow for this.  It is a trivial matter to
   ignore or strip out such tagging.  Researchers with such needs cannot,
   at least for now, reasonably expect that others will provide such
   exhaustive tagging.
 
   Putting (5) and (6) together, we have a principle of base-level tagging
   with as much additional information as the original researchers care to
   provide.  Where there are common needs that may not be shared by the
   original researcher, it may still be appropriate to require that those
   common needs be met.  For example, the original researcher may not need
   to know about verse paragraphs, but we should still require that they be
   appropriately tagged.
 
7) Referential markup should be used in place of "special" characters, such
   as accented characters.  If a particular configuration supports an acute
   accent, for example, in hardware, the researcher may take advantage of
   those facilities.  When checking the document into an archive or passing
   it on to others, however, the acute accent must be translated to
   "&aacute;" (or whatever the SGML standard specifies---don't have my copy
   at hand).
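
Points (6) and (7) can be sketched in a few lines of Python.  This is an
illustration only, under stated assumptions: the function names are
hypothetical, the entity names come from the HTML entity table in Python's
standard library (assumed here to agree with the SGML entity sets for
common accented letters), and the tag pattern assumes that no '<' or '>'
occurs inside a tag.

```python
import re
from html.entities import codepoint2name

def to_entity_references(text):
    """Replace each non-ASCII character that has a named entity with &name;."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        if ord(ch) > 127 and name:
            out.append("&" + name + ";")
        else:
            # ASCII characters, and characters with no known name, pass through.
            out.append(ch)
    return "".join(out)

# A tag is taken to be anything between '<' and the next '>'.
TAG = re.compile(r"<[^<>]*>")

def strip_tags(text):
    """Remove all tags, leaving only the text they enclose."""
    return TAG.sub("", text)
```

With these definitions, to_entity_references("\u00e1") gives "&aacute;", and
strip_tags applied to a tagged line of verse returns the bare line.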
 
 
This is off the top of my head, but enough for now.  I have other ideas on
this stuff, but they can come out if discussion ensues.  I am interested in
the project, but I don't have the time or money to travel to meetings right
now.
 
I also get the feeling from the preliminary document that you posted that
people are re-inventing SGML.  We already have, in SGML, a metalanguage for
generating descriptive markup languages.  I don't think that we need Document
Type Definitions right now, but even they might turn out to be useful once
SGML is established and SGML-support tools become widespread.
 
I haven't provided any defense of descriptive markup or SGML here.  We discuss
the advantages of these systems in "Markup Systems and the Future of Scholarly
Text Processing," *Communications of the ACM*, November 1987--- written with
Allen H. Renear and Steven J. DeRose.
 
Interested in any and all comments!  --Jim
 
Dr. James H. Coombs
Software Engineer, Research
Institute for Research in Information and Scholarship (IRIS)
Brown University
jazbo@brownvm.bitnet
=========================================================================
Date:         4 December 1987, 16:03:15 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      "International Educational Computing"
 
          POSSIBLE COURSE 3551 - SUMMER, 1988 - R. G. RAGSDALE
 
The  European  Conference  on  Computers  in  Education is being held in
Lausanne, Switzerland, July 24-29, 1988.  When the World  Conference  on
Computers  in  Education was held there in 1981, a substantial number of
OISE students attended, some as a portion of a  course  offered  by  Bob
McLean.
 
I propose to offer a course, 3551 - International Educational Computing:
An Interaction of Values and  Technology,  which  would  take  place  in
Switzerland,  around  the  dates of the conference.  Permission to offer
the course formally depends on several factors, including the number  of
students  likely  to  attend.   Plans are incomplete at this time, but a
projection of the plans indicates the following  format,  assuming  that
all necessary arrangements for housing, classroom, etc., can be made.
 
 
       The  course  participants  will  meet  together July 18-22 to
    study previous research and theory  on  values  and  technology,
    methods  for  evaluating  the  effects  of  technology, and case
    studies in business and education of technology-value conflicts.
    The  daily  schedule  will  have  more formal sessions (lecture,
    seminar)  in  the  mornings  and  less  formal  sessions  (group
    discussions)  in  the  early  evening,  with afternoons free for
    individual study or other activities (scheduled class  time  for
    each  day  will  be  four  hours, probably two and a half in the
    morning, one and a half in the  evening).    During  this  week,
    participants  will select and prepare for the issue(s) they plan
    to study during the conference.
 
       At the conference, each participant will focus on one or more
    topics,  such as a particular age range, subject matter area, or
    type of computer application.  They will collect  material  from
    the  formal  sessions,  but  also  from informal interviews with
    others attending the conference, both presenters and  those  who
    are only attending.
 
       August  1  is  a  Swiss  national  holiday  (which all course
    participants should enjoy), so the remaining sessions will  take
    place August 2-5, following the same schedule as the first week.
    During this time the results of the previous  week's  activities
    will  be  presented  and  group  feedback  will be obtained.  Final
    papers will be due in mid-September.
 
       Preliminary  arrangements  for  accommodation  and  classroom
    space  have  been  made  at  Aiglon College, an English boarding
    school in Chesieres, Switzerland, about one hour  from  Lausanne
    by train and bus.  Room rates include the "taxe de sejour" which
    gives access to the recreational facilities of Villars, such  as
    the swimming pool, ice skating, etc.
 
Estimated Cost
 
Based  on  1987 prices, the airfare to Geneva is $927, room and board is
860SF (Swiss Francs) for 20 days, and  the  conference  registration  is
280SF (higher after January 31).  At current exchange rates, these items
total  almost  $2,000.    A  better  estimate   would   include   ground
transportation,  other  likely  expenses (chocolate, etc.), and possible
price increases.  It seems extremely unlikely  that  necessary  expenses
would exceed $2,500.
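
The arithmetic behind the estimate can be laid out as a short Python sketch.
The announcement does not state an exchange rate, so any rate passed to
these functions is an assumption for illustration; the other figures are
the ones quoted above, and the function names are hypothetical.

```python
AIRFARE_DOLLARS = 927
ROOM_AND_BOARD_SF = 860
REGISTRATION_SF = 280

def estimated_total(dollars_per_sf):
    """Total in dollars for a given (assumed) dollars-per-franc rate."""
    swiss_costs = ROOM_AND_BOARD_SF + REGISTRATION_SF  # 1140 SF
    return AIRFARE_DOLLARS + swiss_costs * dollars_per_sf

def implied_rate(total_dollars):
    """Rate at which the three items would come to the given total."""
    swiss_costs = ROOM_AND_BOARD_SF + REGISTRATION_SF
    return (total_dollars - AIRFARE_DOLLARS) / swiss_costs
```

A total of "almost $2,000" means the 1140 SF portion comes to a little under
$1,073, that is, a rate somewhat below one dollar per franc.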
 
Anyone who is interested in participating in this course should indicate
this to me  in  writing  (including,  if  possible,  your  "estimate  of
certainty").
=========================================================================
Date:         6 December 1987, 11:02:37 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributed by Nancy Ide <IDE@VASSAR>
Subject: TEXT MARK-UP (73 lines)
 
 
I recently responded to Jim Coombs' remarks concerning the principles
developed at Poughkeepsie as a basis for the development of a standard
for encoding machine-readable texts.  He suggested that we make our discussion
"public," in the spirit of recent remarks on HUMANIST, and so I will briefly
describe what has been said and put forth my reply.
 
I indicated to Jim that much of what he says is very much in the spirit
of the discussions at Poughkeepsie among the 31 participants.  This should
be made clearer in the minutes of the meeting, which Lou Burnard has drawn
up and which will be available from him or me in a few days.  Especially,
we intend to make the standard extensible to accommodate the unforeseen needs
of individual projects.
 
I also indicated that the standard will *recommend* a minimum set of tags for
texts, which is stated in the principles under number 5, I believe. We had a
lively discussion on this topic (actually, all of the discussions were very
lively!) at the Poughkeepsie meeting, with some disagreement about specifying a
minimum.  This is why *recommend* is in emphasis.  The feeling at the meeting
was that we can *require* nothing, but we can do our best to "guide the
perplexed" and provide some idea of what it makes sense to encode regardless of
how the text is originally intended to be used.  I should point out here that
among participants in the Poughkeepsie meeting, there were two clear
perspectives on the whole issue of encoding texts: one saw most encoding as a
future endeavor, and the other was focused on texts already encoded. One's
opinion concerning whether most texts have been encoded already or have yet to
be encoded obviously affects opinion on the importance of specifying a minimum
set of tags for encoded texts.
 
Jim responded to me suggesting that we could refuse to accept texts that had
been encoded without the "minimum" tags we might expect.  He made all of the
excellent arguments for insisting that certain tags be included *anytime* a text
is encoded.  But the problem here is that I am not sure who the "we" who is to
do this refusing actually is.  If someone does not provide the minimum tags but
has encoded the collected works of some obscure author I am interested in, will
I refuse to accept the text?  If I am an archive, should I refuse to take the
text--that is, is it better to have an inadequately tagged text or none at all?
Admittedly, in some cases it may be better to start from scratch and re-enter a
text, if the existing version is pitifully done. But most of the time it will be
easier to go in and mark whatever I need to mark in the existing version than to
re-enter the text entirely.
 
Similarly, we cannot expect archives to ensure that their texts contain a
minimum tag set.  This was a point of considerable concern to the keepers of
archives present at the meeting, and led to the final agreement that only the
tags that are present (whatever they may be) in a text that is distributed by an
archive will conform to the standard.  This requirement in itself will
necessitate the writing of programs to perform translation to the new scheme,
another topic addressed at some length and for which there seems to be support.
However, note that the principles indicate that texts now contained in the
archive need not be converted retrospectively. Naturally, although this is not
required, we hope that it will occur in many cases.
 
So, the guidelines that will be developed will recommend a minimum set of
tags---especially, for those things that are easily encoded when the source text
is at hand and which are also obviously of use in most types of analysis.
However, it does not appear to me that it is reasonable to require such tagging.
We can only hope that the recommendation is enough to inspire most researchers
to provide the minimum set of tags when they encode new texts.
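
To make the difference between recommending and requiring concrete, here is
a minimal Python sketch of how an archive might report which recommended
tags are absent from a text without rejecting it.  The tag names, the
contents of the recommended set, and the function names are all
hypothetical; the guidelines discussed above have not yet fixed any.

```python
import re

# Hypothetical recommended minimum; not taken from any published guideline.
RECOMMENDED_TAGS = {"title", "book", "paragraph"}

# Matches the name in a start tag such as <book> or <book n=1>.
# End tags (</book>) are skipped because '/' is not a letter.
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9.-]*)")

def tags_present(text):
    """Return the set of tag names used in the text, in lower case."""
    return {m.group(1).lower() for m in START_TAG.finditer(text)}

def missing_recommended(text, recommended=RECOMMENDED_TAGS):
    """List the recommended tags that the text does not use."""
    return sorted(recommended - tags_present(text))
```

The result is a report rather than a verdict: an empty list means the
recommendation is met, and a non-empty one tells a later user what remains
to be marked.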
 
Nancy M. Ide
ide@vassar.bitnet
=========================================================================
Date:         6 December 1987, 11:10:43 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributed by Nancy Ide <IDE@VASSAR>
Subject:  more on mark-up (34 lines)
 
In my earlier message I neglected to summarize my reply to Jim Coombs
concerning SGML.  We have every expectation that the standard we devise will
be an application of SGML, but until we know fully our needs it is not
prudent to commit ourselves to SGML.  We know, for instance, that while it
is possible to define multiple parallel hierarchies in SGML it is not
entirely straightforward, and such parallel hierarchies are likely to be
used extensively in encoding machine-readable texts intended for literary,
linguistic, and historical analysis.  We hope that in any event the standard
will be compatible with SGML, which, as Jim points out, is bound to become
widely accepted and used.
 
Also, Jim had some concern about our defining a meta-language, since SGML
(the abstract syntax) is in fact a meta-language for describing a mark-up
scheme.  The concrete syntax of SGML is one mark-up scheme described by
this abstract syntax.  However, our goal is to provide a meta-language in
which *all* existing mark-up schemes can be described (which may prove
to be impossible), and it seems to us that the abstract syntax of SGML is
inadequate for this task.  The abstract syntax of SGML was not intended
for this purpose, it should be noted.
 
Nancy M. Ide
ide@vassar.bitnet
=========================================================================
Date:         6 December 1987, 11:15:37 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributed by C. S. Hunter <CSHUNTER at UOGUELPH>
Subject: Use of electronic communications (29 lines)
 
Willard notes the high percentage of "silent participants" on HUMANIST.  My
experience with computer conferencing systems makes his note not at all
surprising.  At the University of Guelph we have had our CoSy conferencing
system available free of charge to all faculty for some years now.  Only
about 40 % of the faculty actually took us up on the offer of a free account
on the system.  Of that 40 %, only 25 % (or less) actively use the system
more than once a week.  The ratio of active to passive participants
on the system is something like 1 : 9.  The same is roughly true on the
student system, where only about 10 % of the registered users are actual active
participants.   We are now studying the phenomenon to determine what factors
contribute to the individual use or non-use of computer-mediated communication
among academics.
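
The figures above can be checked with a short Python sketch.  It assumes
that the 1 : 9 ratio for faculty is taken over all faculty rather than over
account holders only; that reading is an assumption, not something the
message states.  The function name is hypothetical.

```python
from fractions import Fraction

def active_to_passive(share_with_accounts, share_of_those_active):
    """Ratio of active participants to everyone else in the population."""
    active = share_with_accounts * share_of_those_active
    passive = 1 - active
    return active / passive

# Faculty: 40 % hold accounts, and 25 % of those are active weekly.
faculty = active_to_passive(Fraction(40, 100), Fraction(25, 100))

# Students: 10 % of registered users are active.
students = active_to_passive(Fraction(1), Fraction(10, 100))
```

Under that reading both values come to 1/9, that is, one active participant
for every nine others.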
 
C. Stuart Hunter,
University of Guelph
cshunter@uoguelph
=========================================================================
Date:         6 December 1987, 11:41:45 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      E-mail to Australia
 
Contributed by Emmanuel Tov <HUET@HUJIPRMB.bitnet>
 
 
IN REPLY TO THE QUESTION OF BRENDAN O'FLAHERTY (3 DEC) I CAN TELL YOU THAT
MAIL FROM SYDNEY (MACQUARIE UNIV.) TO ISRAEL AND EUROPE AND THE U.S. IS FREE
AS WELL AS REVERSE MAIL.
 
EMANUEL TOV
=========================================================================
Date:         6 December 1987, 16:58:56 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributed by "James H. Coombs" <JAZBO@BROWNVM>
Subject: Text encoding
 
[In reply to Nancy Ide's points about SGML and related matters. The
inset paragraphs quote from her messages. -- ed.]
 
         We have every expectation that the standard we devise will be
         an application of SGML, but until we know fully our needs it
         is not prudent to commit ourselves to SGML.
 
A minor philosophical point, I guess: I don't think that we CAN know our needs
fully.  We need standards that accommodate needs that cannot be predicted
today.  The practical consequence of this observation, which I'm sure Nancy
would agree with, is that one should seek a "productive" system instead of
a system that satisfies everything on a list, and one should not spend a lot
of time developing the list.
 
         We know, for instance, that while it is possible to define
         multiple parallel hierarchies in SGML it is not entirely
         straightforward, and such parallel hierarchies are likely to be
         used extensively in encoding machine-readable texts intended
         for literary, linguistic, and historical analysis.
 
What are "multiple parallel hierarchies"?  I can guess, but I want to be sure
that I understand the problem.  In most documents, we have, for example,
pragmatic and syntactic hierarchies.  One has no difficulty marking up
documents for both at the same time (although one does not normally mark
up the latter descriptively).  Pragmatically, we have things like
 
  [         [         [         ] [         ] ] ]
   CHAPTER   SECTION   PARAGRAPH   PARAGRAPH
 
Syntactically, we might have
 
  [  [  ] [   [  ] ] ]
   S  NP   VP  NP
 
So far as I know, there are no difficulties in marking up both types of
hierarchies.  One could argue that we really have a single hierarchy here,
but, conceptually at least, we have two different domains: pragmatics and
syntax.  Well, this distinction is bound to be controversial, to say the
least!  This is probably the wrong list for a discussion about syntax vs.
pragmatics, etc.  I can try other examples, but I'm still guessing.  And I'm
still wondering what the difficulty is in encoding them under SGML.
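
The bracket notation above can be made concrete with a small Python sketch.
It treats each bracketed unit as a (start, end) span over the text and asks
whether two sets of spans can share one set of properly nested brackets,
which is the case when no two spans cross.  The example spans, the function
names, and the reading of "parallel hierarchies" as structures whose units
may cross one another (for instance, sentences running over verse lines)
are assumptions made for illustration, not something stated in the messages
quoted here.

```python
def crosses(a, b):
    """True if spans a and b overlap without one containing the other."""
    (a_start, a_end), (b_start, b_end) = a, b
    return (a_start < b_start < a_end < b_end
            or b_start < a_start < b_end < a_end)

def crossing_pairs(first, second):
    """List the pairs of spans, one from each hierarchy, that cross."""
    return [(a, b) for a in first for b in second if crosses(a, b)]

# Half-open character offsets; all values are invented for the example.
paragraphs = [(0, 40), (40, 90)]
sentences = [(0, 25), (25, 40), (40, 90)]
verse_lines = [(0, 30), (30, 60), (60, 90)]

nested = crossing_pairs(paragraphs, sentences)     # no crossings
crossed = crossing_pairs(sentences, verse_lines)   # some crossings
```

Here the paragraphs and sentences nest cleanly, so a single set of brackets
serves both.  The sentences and verse lines cross, so the two cannot both be
written as properly nested brackets in one tree.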
 
         However, our goal is to provide a meta-language in which *all*
         existing mark-up schemes can be described (which may prove to
         be impossible), and it seems to us that the abstract syntax of
         SGML is inadequate for this task.
 
What is the practical value of a metalanguage that generates all markup
languages?  I would think that it would be so abstract as to be of no
value.
 
I suspect that this is part of the goal of salvaging work that has been
inadequately coded.  I believe that we will be better off if we worry
less about the past and plan more for the future.  I suppose that it's
true that publishers have typesetting tapes in their basements, and that
we could use those tapes.  I think that we have to accept that those
tapes are of little value until someone converts the coding to
descriptive markup.  I have the typesetting tape for the American
Heritage Dictionary (sorry, can't distribute it); no one wasted time
trying to figure out how to use that tape as it is now.  I know of
several projects that are based on that tape, and all required
conversions.  Ideally, the tape would have been converted once and for
all (and it apparently has been now).
 
Whether it's a dictionary or a literary text, we can expect that
inadequate coding will cause considerable work for anyone attempting to
use the database.  A metalanguage that includes procedural markup as
well as descriptive markup will not help in such a case, because one
still has to map procedural markup onto descriptive markup in order to
be able to work with meaningful entities (definition, paragraph, etc.).
Since procedural markup tends to be performed somewhat arbitrarily and
does not normally provide a one-to-one relationship between entity and
markup, there is no metalanguage that will help a researcher perform the
necessary conversions.
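
The mapping problem can be illustrated with a minimal Python sketch.  The
procedural codes and descriptive tag names below are invented for the
example and do not come from any real typesetting tape; the point is only
that one formatting instruction may correspond to several kinds of entity.

```python
# Hypothetical procedural codes and the entities each might be marking.
CANDIDATES = {
    "italic": ["title", "emphasis", "foreign"],
    "indent": ["paragraph", "quotation"],
    "bold": ["headword"],
}

def descriptive_candidates(code):
    """Return the descriptive tags a procedural code might stand for."""
    return CANDIDATES.get(code, [])

def needs_human_judgement(code):
    """True unless the code maps to exactly one descriptive tag."""
    return len(descriptive_candidates(code)) != 1
```

Only a code with exactly one candidate can be converted mechanically; every
other case has to be settled by someone who reads the text.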
 
What we really need is a sensible and dynamic standard.  I don't think
that anyone would argue that that standard should be anything other than
descriptively based.  Since we are going to have to convert texts to
descriptive markup in order to use them anyway, why not just develop the
standard and convert as necessary.  Trying to save the past is just
going to retard development.
 
I haven't mentioned SGML so far.  Is there a problem with SGML?  I have
heard complaints, and we addressed them in our article.  No one expects
individual scholars to master the full syntax and to generate Document
Type Definitions (DTD).  What we want is accurate and consistent
descriptive markup.  In our experience at Brown, people have no
difficulties mastering the principles of descriptive markup.  We can
leave the development of DTDs to experts.
 
--Jim
 
 
Dr. James H. Coombs
Software Engineer, Research
Institute for Research in Information and Scholarship (IRIS)
Brown University
jazbo@brownvm.bitnet
=========================================================================
Date:         6 December 1987, 17:12:16 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributed by "James H. Coombs" <JAZBO@BROWNVM>
Subject:      Markup: on requirements
 
My thanks to Nancy Ide for moving the discussion out to HUMANIST.  Things
have fallen a little out of sequence, but the ideas are more important
than the sequence anyway.  I have also heard from Michael Sperberg-McQueen,
and I hope that he will post his very informative note as well.  If this
discussion becomes aggravating for the majority of HUMANISTs and there is
enough interest, then perhaps we can form a separate mailing list.
 
So, here is my (unedited) reply to the issue of requirements.
 
         While we may not be able to require that people conform to a
         standard fully, we can refuse to accept inadequate texts.
         There is an atmosphere of poverty now such that we are
         anxious to have whatever we can get our hands on.  At the
         extreme, even now most of us would reject a text that is all
         in upper case and contains errors---it turns out to be easier
         to do it oneself.  If we consider what things will be like or
         could be like in a few years though, I think it's appropriate
         to say that there are certain minimal standards (or one must
         comply with a standard).  First, we don't accept just
         anything for other scholarly documents.  Second, we will have
         more alternatives for sources.  Third, we want high quality
         sources so that people won't have to keep reworking or
         entirely redoing.  If I can't count on a text from a particular
         archive to meet my needs, what is my motivation for bothering
         with that archive; and what is the motivation for the
         archive's existence?  I certainly would not want to see it
         supported by public funds.
 
         I don't think that this places an inordinate burden on
         individual researchers.  For the most part, I'm sure that it's
         considerably less burdensome than ensuring that one's
         bibliography, for example, accords with the MLA style sheet
         (and what bibliography unambiguously does?).
 
         --Jim
 
I should elaborate briefly.  First, I have/had a tape of Milton's
*Paradise Lost*; it was so bad that I would prefer to start from
scratch.  Second, I think that we have a right to expect archives to set
and maintain certain standards.  Perhaps they don't want to accept that
responsibility right now.  If not, then I think that we should be
planning to develop and support a good archive.  Does such an archive
need several programmers for text validation and maintenance?  Then they
should have the support to hire them.  Let's centralize the expense as
much as possible.  Currently, we have no idea who is entering what and
how they are doing it.  Even if we could get people to go to the archive,
the current approach means that many people are going to have to
massage texts into useful formats, and every project will have to ensure
that the text is accurate.  It's as if we all had to revise our copies
of *Paradise Lost* and then go proof read them before we could use them.
Finally, I have texts that I have entered, marked up, and proof read, but
I'm reluctant to check them into an archive that is inconsistent at best.
Whatever professional credit I might get for the contribution---well, let's
say that the effort is somewhat discredited by the state of the archive.
It's like publishing a book with XYZ press instead of ABC.  I would be happy
to send it off to someone who provides full services and validates text,
and I would be happy to make any necessary corrections.  To reverse the
roles, I am reluctant to acquire a text from an archive that makes no
guarantees.  After all, in the process of keyboarding a text, I get to read
it, and the time goes quickly.  It's the proofreading that is burdensome, and
I still have to proofread.  (Or do I get to say that I used X's text, and X
is going to accept the responsibility for errors.)
 
--Jim
 
Dr. James H. Coombs
Software Engineer, Research
Institute for Research in Information and Scholarship (IRIS)
Brown University
jazbo@brownvm.bitnet
=========================================================================
Date:         6 December 1987, 17:24:17 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      ACL Applied Natural Language Conference (833 lines)
 
 
[The following is republished from IRLIST, the Information Retrieval
List. -- ed.]
 
--------------------------------------------------------------------------
The printed version of the following program and registration information
will be mailed to ACL members early in December.  Others are encouraged
to use the attached form or write for a booklet to the following address:
Dr. D.E. Walker (ACL), 445 South Street - MRE 2A379, Morristown, NJ 07960,
USA, or to walker@flash.bellcore.com, specifying "ACL Applied" on the
subject line.
 
                             ASSOCIATION
                                 FOR
                      COMPUTATIONAL LINGUISTICS
 
       SECOND CONFERENCE ON APPLIED NATURAL LANGUAGE PROCESSING
 
                         9 - 12 February 1988
 
          Austin Marriott at the Capitol, Austin, Texas, USA
Tutorials: Joe C. Thompson Conference Center, University of Texas at Austin
 
                           ADVANCE PROGRAM
 
                              Features:
               Six introductory and advanced tutorials
            Three days of papers on the  state-of-the-art
                   Distinguished luncheon speakers
                     A panel of industry leaders
                     Exhibits and demonstrations
 
 
REGISTRATION : 7:30am - 3:00pm, Tuesday, 9 February,
  Joe C. Thompson Conference Center, University of Texas at Austin, 26th
  and Red River.
                7:00pm - 9:00PM, Tuesday, 9 February
                8:00am - 5:00pm, Wednesday, 10 February
                8:00am - 5:00pm, Thursday, 11 February
                8:00am - 12:00n, Friday, 12 February
  Austin Marriott at the Capitol, 701 East 11th Street
 
 
EXHIBITS :  10:00am - 6:00pm, Wednesday, 10 February
            10:00am - 6:00pm, Thursday, 11 February
             9:00am - 12:00n, Friday, 12 February
  Austin Marriott at the Capitol
 
 
TUTORIALS: TUESDAY, FEBRUARY 9, 1988
  Joe C. Thompson Conference Center, University of Texas at Austin, 26th
  and Red River.
 
8:30 12:30 INTRODUCTION TO NATURAL LANGUAGE PROCESSING
          James Allen, University of Rochester
 
8:30 12:30 MACHINE-READABLE DICTIONARIES: A COMPUTATIONAL LINGUISTICS
          PERSPECTIVE
          Bran Boguraev, Cambridge University, and
          Beth Levin, Northwestern University
 
8:30 12:30 SPOKEN LANGUAGE SYSTEMS: PAST, PRESENT, AND FUTURE
          Salim Roucos, BBN Laboratories, Inc.
 
1:30 5:30 THE TECHNOLOGY OF NATURAL LANGUAGE INTERFACES
          Carole Hafner, Northeastern University
 
1:30 5:30 THE ROLE OF LOGIC IN REPRESENTING MEANING AND KNOWLEDGE
          Bob Moore, SRI International
 
1:30 5:30 MACHINE TRANSLATION
          Sergei Nirenburg, Carnegie Mellon University
 
 
RECEPTION: 7:00pm - 9:00pm, Tuesday, 9 February
  Austin Marriott at the Capitol, 701 East 11th Street
 
 
                           GENERAL SESSIONS
        WEDNESDAY, FEBRUARY 10, 1988
 
9:00 9:15       OPENING REMARKS AND ANNOUNCEMENTS
                Norman Sondheimer, General Chair (USC/Information Sciences
                        Institute)
                Bruce Ballard, Program Chair (AT&T Bell Laboratories)
                Jonathan Slocum, Local Arrangements Chair (MCC)
                Donald E. Walker, ACL Secretary-Treasurer (Bell Communications
                        Research)
 
        SESSION 1: SYSTEMS
 
9:15 9:40       The Multimedia Articulation of Answers in a Natural Language
                Query System
                Susan E. Brennan (Hewlett Packard)
 
9:40 10:05      A News Story Categorization System
                Philip J. Hayes, Laura E. Knecht and Monica J. Cellio
                (Carnegie Group)
 
10:05 10:30     An Architecture for Anaphora Resolution
                Elaine Rich and Susann Luper-Foy (MCC)
 
        SESSION 2: GENERATION
 
11:00 11:25     The SEMSYN Generation System: Ingredients, Applications,
                Prospects
                Dietmar Roesner (Universitaet Stuttgart)
 
11:25 11:50     Two Simple Prediction Algorithms to Facilitate Text Production
                Lois Boggess (Mississippi State University)
 
11:50 12:15     From Water to Wine: Generating Natural Language Text from
                Today's Applications Programs
                David D. McDonald (Brattle Research Corporation) and
                Marie M. Meteer (Bolt, Beranek and Newman)
 
12:15 2:00      LUNCHEON
                Guest Speaker: Grant Dove
                Chairman and CEO of MCC.  Prior to joining MCC in July 1987,
                Mr. Dove had been with Texas Instruments for 28 years,
                having served as Executive Vice President since 1982.
 
        SESSION 3: SYNTAX AND SEMANTICS
 
2:00 2:25       Improved Portability and Parsing Through Interactive
                Acquisition of Semantic Information
                Francois-Michel Lang and Lynette Hirschman (Unisys)
 
2:25 2:50       Handling Scope Ambiguities in English
                Sven Hurum (University of Alberta)
 
2:50 3:15       Responding to Semantically Ill-Formed Input
                Ralph Grishman and Ping Peng (New York University)
                  and
                Evaluation of a Parallel Chart Parser
                Ralph Grishman and Mahesh Chitrao (New York University)
 
        SESSION 4: MORPHOLOGY AND THE LEXICON
 
3:45 4:10       Triphone Analysis: A Combined Method for the Correction of
                Orthographical and Typographical Errors
                Koenraad DeSmedt (University of Nijmegen) and
                Brigitte van Berkel (TNO Institute of Applied Computer
                  Science)
 
4:10 4:35       Creating and Querying Hierarchical Lexical Databases
                Mary S. Neff, Roy J. Byrd, and Omneya A. Rizk
                (IBM Watson Research Center)
 
4:35 5:00       Cn yur cmputr raed ths?
                Linda G. Means (General Motors)
 
5:00 5:25       Building a Large Thesaurus for Information Retrieval
                Edward A. Fox, J. Terry Nutter (Virginia Tech), Thomas Ahlswede,
                Martha Evens (Illinois Institute of Technology), and
                Judith Markowitz (Navistar International)
 
6:30 ****      RECEPTION
               Microelectronics and Computer Technology Corporation (MCC)
 
 
        THURSDAY, FEBRUARY 11, 1988
 
        SESSION 5: SYSTEMS
 
8:30 8:55       Application-Specific Issues in NLI Development for
                a Diagnostic Expert System
                Karen L. Ryan, Rebecca Root and Duane Olawsky (Honeywell)
 
8:55 9:20       The MULTIVOC Text-to-Speech System
                Olivier Emorine and Pierre Martin (Cap Sogeti Innovation)
 
9:20 9:45       Structure from Anarchy: Meta Level Representation of
                Expert System Predicates for Natural Language Interfaces
                Galina Datskovsky Moerdler (Columbia University)
 
        SESSION 6: TEXT PROCESSING
 
10:15 10:40     Integrating Top-Down and Bottom-Up Strategies in a Text
                Processing System
                Lisa F. Rau and Paul S. Jacobs (General Electric)
 
10:40 11:05     A Stochastic Parts Program and Noun Phrase Parser for
                Unrestricted Text
                Kenneth W. Church (AT&T Bell Laboratories)
 
11:05 11:30     A Tool for Investigating the Synonymy Relation in a Sense
                Disambiguated Thesaurus
                Martin S. Chodorow, Yael Ravin (IBM Watson Research Center)
                and Howard E. Sachar (IBM Data Systems Division)
 
11:30 11:55     Dictionary Text Entries as a Source of Knowledge
                for Syntactic and Other Disambiguations
                Karen Jensen and Jean-Louis Binot (IBM Watson Research Center)
 
12:00 1:45      LUNCHEON
                Guest Speaker: Donald E. Walker
                Manager of Artificial Intelligence and Information Science
                Research at Bell Communications Research, and
                Secretary-Treasurer of ACL and IJCAII.
 
        SESSION 7: MACHINE TRANSLATION
 
1:45 2:10       EUROTRA: Practical Experience with a Multilingual Machine
                Translation System under Development
                Giovanni B. Varile and Peter Lau (Commission of the
                European Communities)
 
2:10 2:35       Valency and MT: Recent Developments in the METAL System
                Rudi Gebruers (Katholieke Universiteit Leuven)
 
3:00 5:00       PANEL: Natural Language Interfaces: Present and Future
                Moderator: Norman Sondheimer (USC/Information Sciences
                        Institute)
                Panelists: Robert J. Bobrow (BBN Laboratories),
                                Developer of RUS
                           Jerrold Ginsparg (Natural Language Inc.),
                                Developer of DataTalker
                           Larry Harris (Artificial Intelligence Corporation),
                                Developer of Intellect
                           Gary G. Hendrix (Symantec), Developer of Q&A
                           Steve Klein (Singular Solutions Engineering)
                                Co-Developer of Lotus HOW
 
5:00 6:00       RECEPTION
                Austin Marriott at the Capitol
 
 
        FRIDAY, FEBRUARY 12, 1988
 
        SESSION 8: SYSTEMS
 
8:30 8:55       Automatically Generating Natural Language Reports
                in an Office Environment
                Jugal Kalita and Sunil Shende (University of Pennsylvania)
 
8:55 9:20       Luke: An Experiment in the Early Integration of Natural
                Language Processing
                David A. Wroblewski and Elaine A. Rich (MCC)
 
9:20 9:45       The Experience of Developing a Large-Scale Natural
                Language Text Processing System: CRITIQUE
                Stephen D. Richardson and Lisa C. Braden-Harder
                (IBM Watson Research Center)
 
        SESSION 9: MORPHOLOGY AND THE LEXICON
 
10:15 10:40     Computational Techniques for Improved Name Search
                Beatrice T. Oshika (Sparta), Bruce Evans (TRW),
                Janet Tom (Systems Development Corporation), and Filip Machi
                (UC Berkeley)
 
10:40 11:05     The TICC: Parsing Interesting Text
                David Allport (University of Sussex)
 
11:05 11:30     Finding Clauses in Unrestricted Text by Stochastic and
                Finitary Methods
                Eva Ejerhed (University of Umea)
 
11:30 11:55     Morphological Processing in the Nabu System
                Jonathan Slocum (MCC)
 
        SESSION 10: SYNTAX AND SEMANTICS
 
1:30 1:55       Localizing Expression of Ambiguity
                John Bear and Jerry R. Hobbs (SRI International)
 
1:55 2:20       Combinatorial Disambiguation
                Paula S. Newman (IBM Los Angeles Scientific Center)
 
2:20 2:45       Canonical Representation in NLP System Design:
                A Critical Evaluation
                Kent Wittenburg and Jim Barnett (MCC)
 
 
 
 
REGISTRATION INFORMATION AND DIRECTIONS
 
PREREGISTRATION MUST BE RECEIVED BY 25 JANUARY; after that date, please
wait to register at the Conference itself.  Complete the attached
``Application for Registration'' and send it with a check payable to
Association for Computational Linguistics or ACL to Donald E. Walker
(ACL), Bell Communications Research, 445 South Street MRE 2A379,
Morristown, NJ 07960, USA; (201) 829-4312; walker@flash.bellcore.com;
ucbvax!bellcore!walker.  If a registration is cancelled before 25
January, the registration fee, less $15 for administrative costs, will
be returned.  Full conference registrants will also receive lunch on
the 10th and 11th.  Registration includes one copy of the Proceedings,
available at the Conference.  Copies of the Proceedings at $20 for
members ($30 for nonmembers) may be ordered on the registration form or
by mail prepaid from Walker.
 
TUTORIALS : Attendance is limited.  Preregistration is encouraged
to ensure a place and the availability of syllabus materials.
 
RECEPTIONS : The Microelectronics and Computer Technology Corporation
(MCC) will host a reception for the conference at its site on
Wednesday evening.  To aid in planning we ask that you complete the
RSVP on the registration form.  In addition there will be receptions
at the conference hotel on Tuesday evening and Thursday afternoon.
 
EXHIBITS AND DEMONSTRATIONS : Facilities for exhibits and system
demonstrations will be available.  Persons wishing to arrange an
exhibit or present a demonstration should contact Kent Wittenburg,
MCC, 3500 W. Balcones Center Drive, Austin, TX 78759; (512)338-3626;
wittenburg@mcc.com as soon as possible.
 
HOTEL RESERVATIONS : Reservations at the Austin Marriott at the
Capitol MUST be made using the Hotel Reservation Form included with
this flyer.  Reservations received after 25 January 1988 are subject
to guest room availability.  Please mail to:
        Austin Marriott at the Capitol
        Attn: Reservation Office
        701 East 11th Street
        Austin, Texas 78701
        (512) 478-1111
 
AIR TRANSPORTATION : American Airlines offers conferees a special 35%
off full coach fare, 30% off full Y fares for passengers originating in
Canada, or 5% off any published roundtrip airfare applicable to and
from Austin.  Call toll free 1-800-433-1790 and give the conference's
STAR number S81816.  If you normally use the service of a travel agent,
please have them make your reservations through this number.
 
DIRECTIONS : There is one public exit from Robert Mueller Airport in
Austin; at the traffic light, turn right (onto Manor Rd.) and drive to
Airport Blvd.  (approx. 1/4 - 1/2 mile).  Turn right on Airport Blvd.,
and drive to highway I-35 (approx. 1-2 miles).  Turn left (south) onto
I-35, heading toward town.  Get off at the 11th-12th St. (Capitol)
exit, and drive an extra block on the access road, to 11th St.  The
Marriott is on the SW corner of that intersection (across 11th St., on
the right).  A parking garage is attached.
 
The Marriott at the Capitol operates a free shuttle to and from the
airport.  Cab fare would be approx. $6.
 
The Joe C. Thompson Conference Center parking lot is on the SW corner
of Red River and 26th Street; the entrance is on Red River, and a guard
will point out the center (adjacent, to the west).  Directions to JCT
from Marriott parking garage: Turn right (S) on I-35 frontage road,
turn right (W) on 10th St., turn right (N) on Red River, and drive
[almost] to 26th.
 
 
 
 
 APPLICATION FOR REGISTRATION
 
Association for Computational Linguistics, Second Conference on
Applied Natural Language Processing, 9 - 12 February 1988, Austin, Texas
 
 
NAME  _________________________________________________________________
      Last                             First                        Middle
AFFILIATION (Short form for badge ID)
___________________________________________________________
 
ADDRESS _______________________________________________________________
 
_______________________________________________________________________
 
_______________________________________________________________________
 
_______________________________________________________________________
 
TELEPHONE  ____________________________________________________________
 
COMPUTER NET ADDRESS  _________________________________________________
 
REGISTRATION INFORMATION  (circle fee)
NOTE: Only those whose dues are paid for 1988 can register as members.
 
                        ACL     NON-    FULL-TIME
                        MEMBER* MEMBER* STUDENT*
 
by 25 JANUARY           $170    $205    $85
at the Conference       $220    $255    $110
*Member and Non-Member fees include Wednesday and Thursday luncheons;
Students can purchase luncheon tickets at a reduced rate.
 
LUNCHEON TICKETS FOR STUDENTS:  $10  each; Wednesday _____;
Thursday ________; amount enclosed  $ ______
 
LUNCHEON TICKETS FOR GUESTS:  $15  each; Wednesday _____;
Thursday ________; amount enclosed  $ ______
 
SPECIAL MEALS: VEGETARIAN ______  KOSHER ______
 
EXTRA PROCEEDINGS:  $20  members;  $30  non-members; amount enclosed  $ ______
 
TUTORIAL INFORMATION  (circle fee and check at most two
tutorials)
 
FEE PER TUTORIAL        ACL     NON-    FULL-TIME
                        MEMBER  MEMBER* STUDENT
 
by 25 January           $75     $110    $50
at the Conference       $100    $135    $65
*Non-member tutorial fee includes ACL membership for 1988;
do not pay non-member fee for BOTH registration and tutorials.
 
Morning Tutorials:
 select ONE: INTRODUCTION: Allen  LEXICONS: Boguraev &  SPEECH: Roucos
                                              Levin
Afternoon Tutorials:
 select ONE: INTERFACES: Hafner   LOGIC: Moore          TRANSLATION: Nirenburg
 
TOTAL PAYMENT MUST BE INCLUDED :   $ ____________
 
(Registration, Luncheons, Extra Proceedings, Tutorials)
 
 
Make checks payable to  ASSOCIATION FOR COMPUTATIONAL LINGUISTICS  or
ACL.  Credit cards cannot be honored.
 
RSVP for MCC Reception: Please check if you plan to attend the MCC
reception on Wednesday evening, February 10th. _________
 
Send Application for Registration WITH PAYMENT before 25 January to
the address below; AFTER 25 January, wait to register at Conference:
 
        Donald E. Walker (ACL)
        Bell Communications Research
        445 South Street, MRE 2A379
        Morristown, NJ 07960, USA
        (201)829-4312
        walker@flash.bellcore.com
        ucbvax!bellcore!walker
 
 
 
 
 APPLICATION FOR HOTEL RESERVATION
 
Reservations received after 25 January 1988 are subject to guest room
availability.  In the event of unanticipated demand,
rooms will be assigned on a first-come, first-served basis.  Please
send in your reservation request as early as possible.
 
 
NAME  _________________________________________________________________
      Last                             First                        Middle
AFFILIATION
 ___________________________________________________________
 
ADDRESS _______________________________________________________________
 
_______________________________________________________________________
 
_______________________________________________________________________
 
_______________________________________________________________________
 
TELEPHONE  ____________________________________________________________
 
Room Requirements
 
  Single  $64 ________
 
  Double  $74 ________
 
Date and time of arrival _________________________________________
 
Date and time of departure _______________________________________
 
Complete if arrival after 6PM
 
__________________________________________________________________
Credit Card Name                Number          Expiration Date
 
 
 
Send Application for Hotel Reservation to:
        Austin Marriott at the Capitol
        Attn: Reservation Office
        701 East 11th Street
        Austin, Texas 78701
        (512) 478-1111
 
 
 
 
              ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
       SECOND CONFERENCE ON APPLIED NATURAL LANGUAGE PROCESSING
                              TUTORIALS
                           9 February 1988
   Joe C. Thompson Conference Center, University of Texas at Austin
 
                    Morning 8:30 A.M. - 12:30 P.M.
 
 
8:30 12:30 INTRODUCTION TO NATURAL LANGUAGE PROCESSING
          James Allen, University of Rochester
 
ABSTRACT
 
This tutorial will cover the basic concepts underlying the construction
of natural language processing systems.  These include basic parsing
techniques, semantic interpretation and the representation of sentence
meaning, as well as knowledge representation and techniques for
understanding natural language in context.  In particular, the topics
to be addressed in detail will include augmented transition networks
(ATNs), augmented context-free grammars, the representation of lexical
meaning, especially looking at case-grammar based representations, and
the interpretation of pronouns and ellipsis.  In addition, there will
be an overview of knowledge representation, including semantic
networks, frame-based systems, and logic, and the use of general world
knowledge in language understanding, including scripts and plans.
 
Given the large range of issues and techniques, an emphasis will be
placed on those aspects relevant to existing practical natural
language systems, such as interfaces to database systems.  The
remaining issues will be more quickly surveyed to give the attendee
an idea of what techniques will become important in the next
generation of natural language systems.  The lecture notes will
include an extensive bibliography of work in each area.
 
INTENDED AUDIENCE
 
This tutorial is aimed at people who are interested in learning the
fundamental techniques and ideas relevant to natural language
processing.  It will be useful to managers who want an overview of
the field, to programmers starting research and development in the
natural language area, and to researchers in related disciplines such
as linguistics who want a survey of the computational approaches to
language.
 
BIOGRAPHICAL SKETCH
 
Dr. James Allen is an Associate Professor and Chairman of the
Computer Science Department at the University of Rochester.  He is
editor of the journal Computational Linguistics and author of the
book Natural Language Understanding, published in 1987.  In 1984, he
received a five-year Presidential Young Investigator award for his
research in Artificial Intelligence.
 
 
 
 
8:30 12:30 MACHINE-READABLE DICTIONARIES: A COMPUTATIONAL LINGUISTICS
          PERSPECTIVE
          Branimir Boguraev, Cambridge University, and
          Beth Levin, Northwestern University
 
 
 
ABSTRACT
 
The lexical information contained explicitly and implicitly in
machine-readable dictionaries (MRDs) can support a wide range of
activities in computational linguistics, both of theoretical interest
and of practical importance.  This tutorial falls into two parts.
The first part will focus on some characteristics of raw lexical data
in electronic sources, which make MRDs particularly relevant to
natural language processing applications.  The second part will
discuss how theoretical linguistic research into the lexicon can
enhance the contribution of MRDs to applied computational
linguistics.
 
The first half will discuss issues concerning the placement of
rich lexical resources on-line; raise questions related to the
suitability, and ultimately the utility, of MRDs for automatic
natural language processing; outline a methodology aimed at
extracting maximally usable subsets of the dictionary with minimal
introduction of errors; and present ways in which specific use can be
made of the lexical data for the construction of practical language
processing systems with substantial coverage.
 
The second half of the tutorial will review current theoretical
linguistic research on the lexicon, emphasizing proposals concerning
the nature of lexical representation and lexical organization.  This
overview will provide the context for an examination of how the
results of this research can be brought to bear on the problem of
extracting syntactic and semantic information encoded in dictionary
entries, but not overtly signaled to the dictionary user.
 
INTENDED AUDIENCE
 
This tutorial presupposes some familiarity with work in both
computational and theoretical linguistics.  It is aimed at
researchers in natural language processing and theoretical linguists
who want to take advantage of the resources available in MRDs for
both applied and theoretical purposes.  The issues of providing
substantial lexical coverage and system transportability are
addressed, thus making this tutorial of particular relevance to those
concerned with the automatic acquisition, on a large scale and in a
flexible format, of phonological, syntactic, and semantic information
for NLP systems.
 
BIOGRAPHICAL SKETCHES
 
Dr. Branimir Boguraev is an SERC (UK Science & Engineering Research
Council) Advanced Research Fellow at the University of Cambridge.  He
has been with the Computer Laboratory since 1975, and completed a
doctoral thesis in natural language processing there in 1979.
Recently he has been involved in the development of computational
tools for natural language processing, funded by grants awarded by
the UK Alvey Programme in Information Technology.
 
Dr. Beth Levin is an Assistant Professor in the Department of
Linguistics, Northwestern University, Evanston, IL.  She was a System
Development Foundation Research Fellow at the MIT Center for Cognitive
Science from 1983 to 1987, where she assumed major responsibility for
directing the MIT Lexicon Project.  She received her Ph.D. in
Electrical Engineering and Computer Science from MIT in June 1983.
 
 
 
 
8:30 12:30 SPOKEN LANGUAGE SYSTEMS: PAST, PRESENT, AND FUTURE
          Salim Roucos, BBN Laboratories, Inc.
 
ABSTRACT:
 
This tutorial will present the issues in developing spoken language
systems for natural speech communication between a person and a
machine.  In particular, the performance of complex tasks using large
vocabularies and unrestricted sentence structures will be examined.
The first Advanced Research Projects Agency (ARPA) Speech Understanding
Research project during the seventies will be reviewed, and then the
current state-of-the-art in continuous speech recognition and natural
language processing will be described.  Finally, the types of spoken
language systems' capabilities expected to be developed during the next
two to three years will be presented.
 
The technical issues that will be covered include acoustic-phonetic
modeling, syntax, semantics, plan recognition and discourse, and the
issues for integrating these knowledge sources for speech understanding.
In addition, computational requirements for real-time understanding,
and performance evaluation methodology will be described.  Some of the
human factors of speech understanding in the context of performing
interactive tasks using an integrated interface will also be
discussed.
 
INTENDED AUDIENCE:
 
This tutorial is aimed at technical managers, product developers, and
technical staff interested in learning about spoken language systems
and their potential applications.  No expertise in either speech or
natural language will be assumed in introducing the technical details
in the tutorial.
 
BIOGRAPHICAL SKETCH:
 
Dr. Salim Roucos has worked for seven years at BBN Laboratories in
speech processing, including continuous speech recognition, speaker
recognition, and speech compression.  More recently, he has been the
principal investigator on integrating speech recognition and natural
language understanding for developing a spoken language system.  His
areas of interest are statistical pattern recognition and language
modeling.  Dr. Roucos is chairman of the Digital Signal Processing
committee of the IEEE ASSP society.
 
 
 
                   Afternoon 1:30 P.M. - 5:30 P.M.
 
1:30 5:30 THE TECHNOLOGY OF NATURAL LANGUAGE INTERFACES
          Carole D. Hafner, Northeastern University
 
ABSTRACT
 
This tutorial will describe the development of natural language
processing from a research topic into a commercial technology.  This
will include a description of some key research projects of the 1970's
and early 1980's which developed methods for building natural language
query interfaces, initially restricted to just one database, and later
made "transportable" to many different applications.  The further
development of this technology into commercial software products will
be discussed and illustrated by a survey of several current products,
including both micro-computer NL systems and those offered on
higher-performance machines.  The qualities a user should look for in a
NL interface will be considered, both in terms of linguistic
capabilities and general ease of use.  Finally, some of the remaining
"hard problems" that current technology has not yet solved in a
satisfactory way will be discussed.
 
INTENDED AUDIENCE
 
This tutorial is aimed at people who are not well acquainted with
natural language interfaces and who would like to learn about 1) the
capabilities of current systems, and 2) the technology that underlies
these capabilities.
 
 
BIOGRAPHICAL SKETCH
 
Dr. Carole D. Hafner is Associate Professor of Computer Science at
Northeastern University.  After receiving her Ph.D. in Computer and
Communication Sciences from the University of Michigan, she spent
several years as a Staff Scientist at General Motors Research
Laboratories working on the development of a natural language
interface to databases.
 
 
 
 
 
1:30 5:30 THE ROLE OF LOGIC IN REPRESENTING MEANING AND KNOWLEDGE
          Robert C. Moore, SRI International
 
ABSTRACT
 
This tutorial will survey the use of logic to represent the meaning
of utterances and the extra-linguistic knowledge needed to produce
and interpret utterances in natural-language processing systems.
Problems to be discussed in meaning representation include
quantification, propositional attitudes, comparatives, mass terms and
plurals, tense and aspect, and event sentences and adverbials.
Logic-based methods (unification) for systematic specification of the
correspondence between syntax and semantics in natural language
processing systems will also be touched on.  In the discussion of the
representation of extra-linguistic knowledge, special attention will
be devoted to the role played by knowledge of speakers' and hearers'
mental states (particularly their knowledge and beliefs) in the
generation and interpretation of utterances and logical formalisms
for representing and reasoning about knowledge of those states.
 
INTENDED AUDIENCE
 
This tutorial is aimed at implementors of natural-language processing
systems and others interested in logical approaches to the problems
of meaning representation and knowledge representation in such
systems.
 
BIOGRAPHICAL SKETCH
 
Dr. Robert C. Moore is a staff scientist in the Artificial
Intelligence Center of SRI International.  Since joining SRI in 1977,
Dr. Moore has carried out research on natural-language processing,
knowledge representation, automatic deduction, and nonmonotonic
reasoning.  In 1986-87 he was the first director of SRI's Computer
Science Research Centre in Cambridge, England.  Dr. Moore received
his PhD from MIT in 1979.
 
 
 
 
 
1:30 5:30 MACHINE TRANSLATION
          Sergei Nirenburg, Carnegie Mellon University
 
ABSTRACT
 
The central problems faced by a Machine Translation (MT) research
project are 1) the design and implementation of automatic natural
language analyzers and generators that manipulate morphological,
syntactic, semantic and pragmatic knowledge; and 2) the design,
acquisition and maintenance of dictionaries and grammars.  Since a
short-term goal (or even medium term goal) of building a system that
performs fully automated machine translation of unconstrained text is
not feasible, an MT project must carefully constrain its objectives.
 
This tutorial will describe the knowledge and processing requirements
for an MT system.  It will present and analyze the set of design
choices for MT projects including distinguishing features such as
long-term/short-term, academic/commercial, fully/partially automated,
direct/transfer/interlingua, pre-/post-/interactive editing.  The
knowledge acquisition needs of an MT system, with an emphasis on
interactive knowledge acquisition tools that facilitate the task of
compiling the various dictionaries for an MT system, will be
discussed.  In addition, expectations, possibilities and prospects
for immediate application of machine translation technology will be
considered.  Finally, a brief survey of MT research and development
work around the world will be presented.
 
INTENDED AUDIENCE
 
This tutorial is aimed at a general audience that could include
both students looking for an application area and testbed for their
ideas in natural language processing and people contemplating
starting an MT or machine-aided translation project.
 
BIOGRAPHICAL SKETCH
 
Dr. Sergei Nirenburg, Research Scientist at the Center for Machine
Translation at Carnegie-Mellon University, holds an M.Sc. in
Computational Linguistics from Kharkov State University, USSR, and a
Ph.D. in Linguistics from the Hebrew University of Jerusalem,
Israel.  He has published in the fields of parsing, generation,
machine translation, knowledge representation and acquisition, and
planning.  Dr. Nirenburg is Editor of the journal Computers and
Translation.
 
 
 
 
       SECOND CONFERENCE ON APPLIED NATURAL LANGUAGE PROCESSING
                         Conference Committee
 
General Chair
Norman Sondheimer, USC/Information Sciences Institute
 
Secretary-Treasurer
Donald E. Walker, Bell Communications Research
 
Program Committee
Bruce Ballard (Chair), AT&T Bell Laboratories
Madeleine Bates, BBN Laboratories
Tim Finin, Unisys
Ralph Grishman, New York University
Carole Hafner, Northeastern University
George Heidorn, IBM Corporation
Paul Martin, SRI International
Graeme Ritchie, University of Edinburgh
Harry Tennant, Texas Instruments
 
Tutorials
Martha Palmer, Unisys
 
Local Arrangements
Jonathan Slocum, MCC (Chair)
Elaine Rich, MCC
 
Exhibits and Demonstrations
Kent Wittenburg, MCC
 
Publicity
Jeffrey Hill and Brenda Nashawaty, Artificial Intelligence Corporation
 
 
------------------------------
=========================================================================
Date:         6 December 1987, 18:22:13 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributed by Robert Amsler <amsler@flash.bellcore.com>
Subject: Reply to James H. Coombs `ACH Text markup' message (109 lines)
 
(I'll make this reply public from the start since Nancy Ide already
had to double back and make hers public afterwards. It may,
however, become a suitable topic for a more extended private
discussion between those with an interest in text encoding
standards.)
 
As Nancy already noted, SGML is the most likely model which will be
used for the Humanities Text Standard; however, there was considerable
concern at the meeting by the French delegation about the workshop
endorsing SGML as the official standard to be emulated.  In view of
that, it was deemed essential to avoid specifically saying this in
favor of the broader statement that we'd attempt to be compatible
with applicable existing standards where possible. Specifically,
this also includes character transliteration standards--which are a
considerable part of a humanities text standard's encoding problems.
(I can hardly wait for ISO to adopt an official standard for encoding
Egyptian hieroglyphics in ASCII!)
 
I would also however like to make a strong statement that from a
computational perspective there is no need for any one format to be
the only one used. What is needed is that any format must be fully
documented and an information-preserving transformation of the
contents of any approved standard format.  This was captured in the
statement that the standard would be an `interchange' format.
 
This does beg the issue of how the transformation takes place, i.e.
a program needs to be written or capable of being run on the `other'
format and on hardware available to the recipient of the data, but it
is important to note that an SGML-like format may appear as very
formidable to users who believe they will have to type in all the
special codes manually--whereas a `keyboarding' format may be just as
faithful in representing the information without undue burden to the
typist. I'm sure you will agree to this since your excellent CACM
article notes that one of the most overlooked forms of markup is the
use of traditional English punctuation and spacing conventions.
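
[A minimal sketch of what an `information-preserving transformation'
between a `keyboarding' format and an SGML-like format could look
like. Both conventions here are invented for illustration (blank line
between paragraphs and _word_ for emphasis on one side, <p> and <emph>
tags on the other); neither comes from the Poughkeepsie meeting or
from any standard. The round trip only holds for text that contains no
literal underscores or angle brackets.]

```python
import re

# Hypothetical conventions, invented for this sketch only:
#   keyboarding form: paragraphs separated by a blank line, _word_ for emphasis
#   tagged form:      <p>...</p> around paragraphs, <emph>...</emph> for emphasis


def keyboard_to_tagged(text):
    """Convert the keyboarding form to the tagged form."""
    paragraphs = [p for p in text.split("\n\n") if p]
    tagged = [re.sub(r"_([^_]+)_", r"<emph>\1</emph>", p) for p in paragraphs]
    return "".join("<p>" + p + "</p>" for p in tagged)


def tagged_to_keyboard(text):
    """Convert the tagged form back to the keyboarding form."""
    paragraphs = re.findall(r"<p>(.*?)</p>", text, flags=re.S)
    plain = [re.sub(r"<emph>(.*?)</emph>", r"_\1_", p, flags=re.S)
             for p in paragraphs]
    return "\n\n".join(plain)


sample = "A first paragraph with _emphasis_,\nkeyed on two lines.\n\nA _second_ paragraph."
round_trip = tagged_to_keyboard(keyboard_to_tagged(sample))
```

[If round_trip equals sample, nothing was lost in either direction,
which is the property an interchange format would need.]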
 
Returning to your message's points, your 4th point seems to me to be
exceptionally good and something that we did not explicitly get to in
the Poughkeepsie meeting, i.e.,
 
``4) There should be no attempt at establishing a "closed" tag set. The
 current AAP SGML application allows for definition of new tags, but
 it does not support such definition in a practical way. The
 consequence is that people will use "list items," for example, when
 they should be using "line of poetry." Within these guidelines, it
 can only be healthy to provide a list of tags that people should
 choose from when tagging certain entities.
 
 The point of this is that we cannot predict what textual elements
 will be of significance for what researchers. We have to allow for
 the discovery of textual elements that no one has categorized
 previously. At the same time, there is no point in having 30
 different tags for "line of poetry." The guidelines should make
 clear that DESCRIPTION is paramount and that the use of particular
 tags is secondary.''
 
 
I think the means by which this latter goal, of not having 30
equivalent tags for the same text element, is to be handled will be an
important role of the text encoding standards subcommittees.
 
What strikes me as needed here are the database concept of a
`data dictionary' to provide definitions for all the `tags' and the
information-science concept of a tangled hierarchical thesaurus of
tags (terms) including the 4 major categories of `broader tag' (BT),
`narrower tag' (NT), `related tag' (RT) and `use instead' (XT ?) type
of pointers. Thus the standards subcommittees should begin work on a
thesaurus of tags which defines each tag's intended domain of text
entities, its relationship to other more general and more specific
tags as well as related tags and tags which should be used instead of
a given tag.
 
This means, for example, that in tagging a text feature, one could
use a generic tag such as `paragraph' or a more specific tag such as
`summation paragraph' and that an author would have a guidebook of
established possible tags that would tell them the options and what
qualifications a text object had to have in order to qualify for the
use of such a tag.
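
[A rough sketch of how such a data dictionary and thesaurus of tags
might be held in a program. The entry layout, the definitions, and the
non-preferred tag `verse line' are assumptions made up for this
example; only the BT / NT / RT / `use instead' relations and the tag
names `paragraph', `summation paragraph' and `line of poetry' come
from the discussion above.]

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TagEntry:
    definition: str                                     # data-dictionary part
    broader: List[str] = field(default_factory=list)    # BT
    narrower: List[str] = field(default_factory=list)   # NT
    related: List[str] = field(default_factory=list)    # RT
    use_instead: Optional[str] = None                   # `use instead' pointer


# Illustrative entries only; the definitions are placeholders.
THESAURUS = {
    "paragraph": TagEntry("A block of prose set off as a unit.",
                          narrower=["summation paragraph"]),
    "summation paragraph": TagEntry("A paragraph summarizing what precedes it.",
                                    broader=["paragraph"]),
    "line of poetry": TagEntry("One line of verse."),
    "verse line": TagEntry("Hypothetical non-preferred synonym.",
                           use_instead="line of poetry"),
}


def preferred(tag):
    """Follow `use instead' pointers until a preferred tag is reached."""
    seen = set()
    while THESAURUS[tag].use_instead is not None:
        if tag in seen:
            raise ValueError("circular `use instead' chain at " + tag)
        seen.add(tag)
        tag = THESAURUS[tag].use_instead
    return tag


def generalizations(tag):
    """All broader tags reachable from tag, nearest first."""
    found = []
    queue = list(THESAURUS[tag].broader)
    while queue:
        candidate = queue.pop(0)
        if candidate not in found:
            found.append(candidate)
            queue.extend(THESAURUS[candidate].broader)
    return found
```

[Here preferred("verse line") leads to "line of poetry", so thirty
near-synonyms could collapse onto one tag, while generalizations()
lets software that knows only `paragraph' still handle a `summation
paragraph'.]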
 
I do think it is important to allow for arbitrarily deep extensions
of the tagging, but any standard will have failed if every author has
to resort to inventing their own tags to encode text.
 
Note, this is still independent of the issue of `required minimum
tags' in that the dictionary and thesaurus of tags only tell the user
how a tag should be used and what alternatives exist to its use--they
do not say that a tag must (or must not) be used (except in the case
of the `use instead' pointers that attempt to avoid tags being used
ambiguously). My model of what such a Thesaurus should look like is
the ERIC Thesaurus of Descriptors.
 
 
 
 
Robert A. Amsler
Bellcore
435 South St., Morristown, NJ 07960
(201) 829-4278
=========================================================================
Date:         7 December 1987, 09:40:52 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributed by Sterling Bjorndahl <BJORNDAS@CLARGRAD>
Subject:  The CD-ROM debate: erasable optical disks
 
Speaking personally, I am not going to run out and buy a CD-ROM drive for my
home computer until we see what the next generation of laser technology is
going to look like in terms of cost and performance.  The latest I have
heard on the topic is on page 12 of the December _Byte_:
 
   "Matsushita, the large Japanese parent company of Panasonic, ... will
   deliver a prototype of an erasable optical disk drive next year,
   probably in the third quarter, a company spokesperson said.  It will
   probably be competing with products from Sony, Philips, and Kodak.
   Matsushita has invested heavily in the phase-change scheme, so that's
   probably the technology that will be incorporated in the drive it
   brings to market.  In phase-change technology, molecules of tellurium
   suboxide change from an amorphous noncrystalline state to a crys-
   talline state and back again, depending on the type of laser beam ap-
   plied.  But the company is also studying other approaches, including
   magneto-optical (which Sony is using) and dye-polymer technologies.
   One hurdle all the pioneers of erasable optical drives will have to
   leap is the slowness of the units, caused partially by the size of
   the optical disk head, which is much bigger than a head in a typical
   magnetic drive."
 
CD-ROM is available now, with texts that I want to use, so I am glad that I
have access to a system that can use that technology (Ibycus).  And I think
Bob Kraft has listed some excellent reasons for using CD-ROM technology
where appropriate.  However, I don't want to spend my own money on something
that will limit my flexibility in the future.   Thus my caution, until I can
determine whether the new technology will be practical for me.
 
I remember reading an article on erasable optical drives in a popular
magazine within recent months.  I thought it was Scientific American, but I
can't locate the article among the issues in my magazine rack.  Does anyone
else know of it?  I believe it was on magneto-optical technology, and I
remember it mentioning data densities on the order of current CD-ROM tech-
nology rather than the current WORM technology (which seems to be worse than
half as dense as CD-ROM).
 
 
     Sterling Bjorndahl
     Institute for Antiquity and Christianity
     Claremont Graduate School
     Claremont, California
=========================================================================
Date:         7 December 1987, 13:23:51 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Research opportunity (58 lines)
 
Contributed by E S Atwell <eric%ai.leeds.ac.uk@cernvax>
 
Dear fellow "Computational Humanities" researcher,
 
Do you know of any young graduates interested in corpus-based computational
research on the English language?  I have an opportunity for an aspiring
researcher to come to Leeds for a 'taster', to work on a large collaborative
project.  I would be very grateful if you could forward the following details
to any potential candidates you know of.
Thank you for your help,
 
 Eric Steven Atwell
 Centre for Computer Analysis of Language and Speech
 AI Division, School of Computer Studies
 phone: +44 532 431751 ext 6307/6119
 Leeds University, Leeds LS2 9JT U.K.
 UUCP:  ...!seismo!mcvax!ai.leeds.ac.uk!eric
 EARN/BITNET/ARPA: eric%leeds.ai@ac.uk
--------------------------------------------------------------------------
    Vacancy:   RESEARCH  ASSOCIATE to develop a NATURAL LANGUAGE PARSER
 
   COMMUNAL: COnvivial Man-Machine Understanding through NAtural Language
 
Artificial Intelligence Group, School of Computer Studies, Leeds University
 
COMMUNAL is a large collaborative research project aiming to develop a robust
Natural Language interface to Expert Systems, allowing access through natural
English dialogue.  This will require software to analyse and parse the user's
input; convert it into the internal Knowledge Representation formalism; infer
an appropriate response or solution; and generate output in ordinary English.
At Leeds University, we will develop a powerful parser, based on a Systemic-
Functional model of English grammar.  The other partners in the project are:
UWIST (project coordinators), the Ministry of Defence, ICL, and Longman.
 
The appointee will be principally involved in designing, building, testing and
documenting the parser software, using POPLOG prolog on a Sun Workstation.
She/he will be expected to liaise with and learn from other researchers in
the Centre for Computer Analysis of Language and Speech (CCALAS) and related
research groups at Leeds and elsewhere; there will be opportunities for
travel, to coordinate research with other partners, and to present results at
international conferences.
 
The post is for a fixed term of 18 months in the first instance, although the
project may continue to a Second Phase.  Starting salary is to be 8185 p.a.,
with an expected 7% increase in March 1988 and a further increment later.
We require an appointment as soon as possible; please contact Eric Atwell via
JANET (eric@uk.ac.leeds.ai) or EARN/BITNET (eric%leeds.ai@ac.uk), or by phone
on (+44 532) 431751 ext.6119 or 6307 for further details of the post and how to
apply; I can also give some idea of cost of living, housing etc for applicants
outside the UK.
=========================================================================
Date:         8 December 1987, 09:34:02 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
Comments:     From: amsler@flash.bellcore.com (Robert Amsler)
From:         MCCARTY@UTOREPAS
Subject:      Coombs' ``Markup: On Requirements'' message
 
While I have great sympathy for the goals expressed by James H. Coombs
in this message, I have no optimism about the methods suggested to
achieve those goals.
 
The issue here is one of money and the existing source of such
funding would be the same source of funding which currently supports
research in the humanities. If we propose that a computer archive in
the humanities should have all these desirable properties, then
unless a new source of funding is provided, it would have to take
funds away from other types of humanities research.
 
The alternative would be to create a self-funded archive which
would have to derive funding from the sale of copies of its
machine-readable data. This seems possible, perhaps funded by
a surcharge something like that of the current copyright clearance
center to whom most libraries send payments when they make
photocopies of magazine and journal articles. However such a center
would have to also be prepared to legally sue users of copyrighted
data who did not pay for their copies. I have no trouble with this
since as Howard Webber recently said, if we interfere with the flow
of funding back to the creators of intellectual property, we will
eventually cut off the funds to develop such works.
 
At present most texts in the humanities in machine-readable form are
either the result of funded research or `donations' of humanists'
time.  This creates a poor man's archive.  The real owners of the bulk
of the humanities texts not available are the publishers, who
routinely destroy the machine-readable works they print because of a
variety of excuses similar to those of monks burning manuscript pages
to light their candles.
 
We need to form an archive in which major humanities publishers
would be eager to deposit their machine-readable tapes--for the
purpose of generating additional revenue from their computational
use.
 
I do not think attempting to Prussianize either the volunteer
humanities data enterers or the existing marginally-funded
archives would be a very good idea.
 
Robert A. Amsler
Bellcore
Morristown, NJ 07960
=========================================================================
Date:         8 December 1987, 09:38:34 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
Comments:     From: amsler@flash.bellcore.com (Robert Amsler)
From:         MCCARTY@UTOREPAS
Subject:      Text Encoding, a reply to James H. Coombs comments
 
[This is a reply to some of James H. Coombs's comments on Nancy Ide's message]
 
Coombs writes:
 
 ``A minor philosophical point, I guess:  I don't think that we CAN know
   our needs fully.  We need standards that accommodate needs that
   cannot be predicted today.  The practical consequence of this
   observation, which I'm sure Nancy would agree with, is that one
   should seek a "productive" system instead of a system that satisfies
   everything on a list, and one should not spend a lot of time
   developing the list.''
 
Once upon a time I was doing a survey of the keywords and descriptors
used to characterize articles in the Communications of the ACM.  The
keywords were author-supplied terms that described their article's
content; the descriptors were selected by the authors from a
pre-specified set of similar content descriptors supplied by the ACM.
 
What I discovered was that as I collected more and more instances of
keywords created by the authors, there was no closure whatsoever.
The set of terms just kept expanding and there were large numbers of
keywords which only one author used, and then only for one article.
 
This is how I see the problem of the selection of tags for text
entities in documents.  That is, if the system is completely open and
`productive' there will be little commonality between authors'
selections--whereas if the authors are offered a wide range of
approved tags to select from, then they will manage to find tags
which meet their needs.
 
-------
 
 ``What are "multiple parallel hierarchies"?  I can guess, but I want to
   be sure that I understand the problem.  In most documents, we have,
   for example, pragmatic and syntactic hierarchies.''
 
The term was used at the Vassar meeting by David Barnard, I believe.
The statement was in reference to the difficulty of software
developers in providing software capable of interpreting a document
written in the full SGML standard and as far as I'm aware there is
still no full-SGML-capable software available.
 
I assumed that he was referring to the potential division of a work
into OVERLAPPING tagged segments, i.e. it would be possible to have
a work with tags which ended inside the span of other still running
open tags, e.g.,
 
<line> <sentence> ... <foreign-word> ... <\line>
<line> ... <\foreign-word> ... <\sentence> <\line>
 
The problem here is that some entity would be broken into two parts
if any entity were extracted.
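
[One way to see the difficulty is to run the example through a simple
stack check, sketched below. The event-list representation and the
function name are assumptions of this illustration, not anything
presented at the Vassar meeting. A closing tag that does not match the
innermost open tag marks an overlap.]

```python
def find_overlaps(events):
    """Return (closed_tag, tags_still_open_inside_it) for every closing tag
    that is not the innermost open tag at that point."""
    stack = []
    problems = []
    for kind, name in events:
        if kind == "open":
            stack.append(name)
            continue
        if name not in stack:
            raise ValueError("close without open: " + name)
        # Index of the most recent open occurrence of this tag.
        idx = len(stack) - 1 - stack[::-1].index(name)
        inner = stack[idx + 1:]
        if inner:
            problems.append((name, inner))
        del stack[idx]
    return problems


# The two-line example above, written as a stream of tag events.
example = [
    ("open", "line"), ("open", "sentence"), ("open", "foreign-word"),
    ("close", "line"),
    ("open", "line"), ("close", "foreign-word"), ("close", "sentence"),
    ("close", "line"),
]
overlaps = find_overlaps(example)

nested = [("open", "line"), ("open", "sentence"),
          ("close", "sentence"), ("close", "line")]
```

[For the example, the first <\line> is reported as closing while
`sentence' and `foreign-word' are still open; the properly nested
stream yields no reports.]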
 
 ``What is the practical value of a metalanguage that generates all markup
   languages?  I would think that it would be so abstract as to be of no
   value.''
 
Who said `generates'; what we were discussing was a meta-language
which `parses' all markup languages--a sort of least upper bound
markup language.
 
The thought was that we needed to accommodate all reasonable existing
texts with markup information already in them. We weren't intending
to require existing texts with carefully worked out markup schemes to
be redone in a scheme which would offer nothing new to their markings
other than a different way of noting the same information.
 
However, your next point is well-taken...
 
 ``I suspect that this is part of the goal of salvaging work that has
   been inadequately coded.''
 
Actually we were thinking of salvaging work that had been ADEQUATELY
coded before a standard was available. Rather than requiring every
such work to be recoded in a new format, it was hoped that the new format
could accept the existing works as is. Whether that is possible or not,
as Nancy stated, is an open question since we haven't yet collected the
documentation for existing collections of text and their formats.
 
 ``I believe that we will be better off if we worry less about the past
   and plan more for the future.  I suppose that it's true that
   publishers have typesetting tapes in their basements, and that we
   could use those tapes.''
 
Actually, no they don't.  They ordinarily don't get the tapes from
the printers and if they did would only get the last version on tape
before the final manual cut-and-paste corrections.  Publishers thus
routinely ignore and discard this phototypesetting data as a useless
intermediate step and save the `valuable' printing plates instead.
One reason is there is no common format in which to save the data for
reuse.  Each printer has their own variant of the hardware/software.
 
However, regardless of that, the next is very true.
 
 ``I think that we have to accept that those tapes are of little value
   until someone converts the coding to descriptive markup.
                ....
   Whether it's a dictionary or a literary text, we can expect that
   inadequate coding will cause considerable work for anyone attempting
   to use the database.  A metalanguage that includes procedural markup
   as well as descriptive markup will not help in such a case, because
   one still has to map procedural markup onto descriptive markup in
   order to be able to work with meaningful entities (definition,
   paragraph, etc.).  Since procedural markup tends to be performed
   somewhat arbitrarily and does not normally provide a one-to-one
   relationship between entity and markup, there is no metalanguage that
   will help a researcher perform the necessary conversions.''
 
You are mixing two things here. First, while it is true one cannot go
from a typesetting tape to a descriptive markup in one step, it
doesn't necessarily follow that the procedural markup is useless. A
case in point is dictionaries. There IS NO descriptive markup
standard for dictionary entries (I'm working on developing one with
a number of other computational lexicologists, but none exists right
now), yet the phototypesetting tapes of dictionaries are very useful
for creating a descriptive markup of their contents. Headwords are
typeset in boldface, possibly outdented, certainly starting new
lines; parts of speech are in italic, pronunciations are in special
fonts for their phonetic characters, usually enclosed in (,)'s or
similar delimiters. Etymologies prefer [,]'s. Labels are in italics,
sense numbers in boldface, definition texts in Roman type, with
examples sometimes offset in <,>'s and sometimes in italics. All of
these are positionally context-sensitive within the dictionary entry.
Their descriptive nature can usually be unambiguously determined
from the positional and font information on a phototypesetting tape.
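
[A much simplified sketch of that kind of decoding follows. It assumes
the tape has already been reduced to (font, text) runs for a single
entry; the font names, the descriptive tag names, and the sample entry
are all invented for the illustration and do not reproduce any
publisher's tape format or any real dictionary entry.]

```python
def describe_entry(runs):
    """Map the (font, text) runs of one dictionary entry to
    (descriptive_tag, text) pairs using font plus position."""
    tagged = []
    seen_sense = False
    for position, (font, text) in enumerate(runs):
        if font == "bold":
            # Bold opens the entry as the headword; later bold is a sense number.
            if position == 0:
                tag = "headword"
            else:
                tag = "sense-number"
                seen_sense = True
        elif font == "italic":
            # Italic before the first sense is part of speech, afterwards a label.
            tag = "label" if seen_sense else "part-of-speech"
        elif font == "phonetic":
            tag = "pronunciation"
        elif font == "roman" and text.startswith("[") and text.endswith("]"):
            tag = "etymology"
        elif font == "roman":
            tag = "definition"
        else:
            raise ValueError("unrecognized font: " + font)
        tagged.append((tag, text))
    return tagged


# Invented sample entry, for illustration only.
entry = [
    ("bold", "markup"),
    ("phonetic", "(mark-up)"),
    ("italic", "n."),
    ("roman", "[mark + up]"),
    ("bold", "1"),
    ("italic", "Printing"),
    ("roman", "instructions written on copy for the typesetter"),
]
tags = [tag for tag, _ in describe_entry(entry)]
```

[The same italic font yields `part-of-speech' before the first sense
number and `label' after it, which is the positional
context-sensitivity described above. A real decoder would need many
more rules, and a different rule set for each printer's format.]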
 
It would be a genuine aid to the people who today decode such
phototypesetting tapes if they were in only ONE procedural markup
language. At present they are in innumerably many different markup
languages.
 
 ``What we really need is a sensible and dynamic standard.  I don't
   think that anyone would argue that that standard should be anything
   other than descriptively based.  Since we are going to have to
   convert texts to descriptive markup in order to use them anyway, why
   not just develop the standard and convert as necessary.  Trying to
   save the past is just going to retard development.''
 
The reason is that the conversion is going to have to be done fairly
often UNTIL a standard for both procedural and descriptive markup is
available.  We have no future without the publishers adopting a
descriptive markup eventually, but until they do, we have no sensible
future in hand-entry of published books when some electronic
typesetting format is available. Keyboarding the OED, for instance,
took several MILLION dollars! If the typesetting data had been
available in machine readable form, it would probably have reduced
the effort by a factor of ten.
 
 
 
Again...
 
Robert A. Amsler
Bellcore
Morristown, NJ 07960
 
 
=========================================================================
Date:         8 December 1987, 09:46:13 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Encoding schemes, text archive (reply to Coombs)
 
Contributed by "Michael Sperberg-McQueen"   <U18189@UICVM>
 
James Coombs has suggested I post my reply to his comments on
text encoding.  This is it; I have also appended a note on the
phrase 'multiple incompatible hierarchies' which seems to be unclear.
 
----------
 
Many thanks for your note about text encoding and the ACH/ACL/ALLC
initiative in particular.  I agree with you in every detail, as far
as I can see, and will just try to clarify a couple of things
quickly, which can be discussed at length later.
 
1 In preparing for the Vassar meeting, the ACH committee on text
encoding had come up with a plan similar to the base + extensions
that you suggest.  The basic idea of user extensions was universally
accepted, but the word 'require' was complete anathema to a number
of people at the Vassar meeting.  These were (a) those in charge of
large existing text archives, who wanted to make very sure the
guidelines would not turn into something their funding agencies
would eventually require (no single quotes here!) them to conform
to; and (b) some people worried about the possibility that funding
agencies and their reviewers might use the 'requirements' of any
guidelines to refuse funding to anyone who deviates from the
required minimal tagging, even for adequate scholarly reasons.
It was agreed to 'recommend' certain minimal tags (the verse
paragraphs of Milton would be a good example) for newly encoded
texts, but consensus could not be reached on any more than that.
This was a disappointment to me, but appears on reflection to affect
not the structure of the guidelines but only the choice of words
to describe it.
 
In any case, there will be a fairly extensive pre-defined tag set,
I expect, but not a closed one.
 
2 SGML should have been mentioned explicitly in the closing
document, but at the last minute some delegates objected that such
details were too low-level to deserve mention in such a statement
of principles.  The objection was presented as being stylistic, but
may have been partly substantive.  In any case, the planning group
at Vassar were unwilling to commit themselves to SGML without
reservation, because it was not clear how well SGML proper could
handle the multiple incompatible hierarchies necessary for a lot
of textual research, and some objected to what they said was SGML's
verbosity.  The SGML supporters did succeed in persuading the
group that SGML should be used, unless experience showed it simply
could not.  (We know full well experience will show no such thing.)
Whether we try to formulate formal document type definitions or not
remains to be seen, but given the unregulated habits of the texts
we study, cleanly defined hierarchies of the sort DTDs are designed
for won't be very easy or do anyone much good.  (The OED people
said that they use SGML syntax but have never bothered with a DTD
and never missed one.  The variety of entries in the dictionary,
they said, is such that a type definition couldn't be written in
advance anyway, and written after the fact would just be an
inventory of the various forms of entries they had empirically
found.)
 
In any case, we are hoping not to re-invent SGML.
 
In fact, some people were very interested in attempting to use SGML
for the metalanguage required to describe existing encoding schemes,
but I am uncertain whether SGML itself will be useful in defining
the syntax and semantics of procedural markup or of old card-oriented
encodings with author / play / act / scene / line references encoded in
columns 73-80.  But perhaps when I finally get my hands on a copy
of the standard itself, I'll find out it can do all of that too.
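
[For readers who have not met a card-oriented encoding, a small sketch
of reading one card image. Only the idea of text in the leading
columns with a reference in columns 73-80 comes from the note above;
the field widths inside the reference and the sample card are
assumptions invented here, since actual archives differ.]

```python
# Hypothetical layout of the eight reference columns (73-80).
FIELDS = [("author", 1), ("play", 2), ("act", 1), ("scene", 1), ("line", 3)]


def split_card(card):
    """Split an 80-column card image into (text, reference)."""
    card = card.ljust(80)[:80]
    text = card[:72].rstrip()      # columns 1-72 hold the text
    reference = card[72:80]        # columns 73-80 hold the reference
    return text, reference


def parse_reference(reference):
    """Cut the reference into named fields using the assumed widths."""
    parsed = {}
    position = 0
    for name, width in FIELDS:
        parsed[name] = reference[position:position + width]
        position += width
    return parsed


# Invented sample card: a line of text padded to column 72, then the reference.
card = "To be, or not to be, that is the question:".ljust(72) + "SHA31056"
text, reference = split_card(card)
fields = parse_reference(reference)
```

[A metalanguage for existing schemes would have to express this kind
of fixed-column rule as well as embedded tags.]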
 
------- [ end of extract from original note ] -----
 
HUMANISTs will be very interested in the article Coombs et al. have
just published in the Communications of the ACM, and I encourage
anyone interested in encoding texts or in using encoded texts to
read it.
 
Further clarifications and suggestions:
 
3 'multiple parallel hierarchies' (Ide) and 'multiple incompatible
hierarchies' (above) seem not to be immediately clear to all.  We
mean:  if you mark a text with BOOK PART CHAPTER PARAGRAPH SENTENCE
TOKEN you have one hierarchy, but marking the same text with VOLUME
PHYSICAL-PAGE PHYSICAL-LINE TOKEN gives a different and incompatible
hierarchy, as does PART CANTO STANZA LINE FOOT SYLLABLE.  Physical
lines cannot fit at all into the text-pragmatics + syntax hierarchy,
and neither can the metrical units.  But physical layout and
metrical units are crucial for analytic bibliographers and metrists.
SGML formally allows definitions of such messy multiple hierarchies
only in its optional portions -- so some SGML applications won't
be able to handle such parallel hierarchies 'correctly'.  No one
seems to know whether this will matter in practice.
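
[The incompatibility can be made concrete with a small sketch that
records each hierarchy as spans over one shared sequence of tokens and
reports spans that cross. The span representation, the function names,
and the token positions are assumptions of the illustration; this is
not SGML and not a proposal from the Vassar meeting.]

```python
def crosses(a, b):
    """True when two half-open token spans overlap without one
    containing the other."""
    (start_a, end_a), (start_b, end_b) = a, b
    overlap = start_a < end_b and start_b < end_a
    nested = ((start_a <= start_b and end_b <= end_a) or
              (start_b <= start_a and end_a <= end_b))
    return overlap and not nested


def conflicts(hierarchy_1, hierarchy_2):
    """Pairs of (name, start, end) spans, one from each hierarchy, that cross."""
    return [(a, b) for a in hierarchy_1 for b in hierarchy_2
            if crosses(a[1:], b[1:])]


# Ten tokens, numbered 0-9, marked up two ways (positions invented).
syntactic = [("SENTENCE", 0, 6), ("SENTENCE", 6, 10)]
physical = [("PHYSICAL-LINE", 0, 4), ("PHYSICAL-LINE", 4, 8),
            ("PHYSICAL-LINE", 8, 10)]

crossing = conflicts(syntactic, physical)
```

[The middle physical line crosses both sentences, so no single tree
can hold both sets of units; each hierarchy checked against itself
shows no conflict.]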
 
Michael Sperberg-McQueen, Univ. of Illinois at Chicago
=========================================================================
Date:         8 December 1987, 09:58:27 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Bibliography or bibliographer needed (40 lines)
 
Contributed by "Rosanne G. Potter" <S1.RGP@ISUMVS>
 
I am editing a book on Literary Computing and Literary Criticism containing
essays by Richard Bailey, Don Ross, Jr., John Smith, Paul Fortier, C.
Nancy Ide, Ruth Sabol, myself and others.  I am looking for someone
who already has, in a fairly advanced state, a bibliography on this
subject, or who is an experienced bibliographer and can put one together
in the months of Jan and Feb (at the latest).  Anyone who meets either of
these descriptions or can suggest the name of someone who could fulfill
this need, please let me know.  The book is completely written; a copy
could be sent immediately to anyone seriously interested in the project.
 
The current situation is that Art Evans at U of Penn Press wants to
publish the book and is, in fact, planning to get it into the Fall List, but
we are both waiting on two readers' reports--the UPENN board is not
as enthusiastic about the possibility of a collocation between LIT CRIT
and Computing as either Art or I wish--so they must be convinced by
the readers' reports.  Whether Penn publishes it or not, I have little
doubt that I will be able to find a suitable publisher--though not
likely one who will publish it as quickly--and that a bibliography
will be required by some reader or board soon.  (I'm surprised it hasn't
happened yet.)
 
Rosanne G. Potter
Department of English
Iowa State University
Ross Hall 203
(515) 294-2180 (Main Office)
(515) 294-4617 (My office)
(515) 232-4473 (Home)
BITNET:  GG.BIB@ISUMVS or S1.RGP@ISUMVS
=========================================================================
Date:         8 December 1987, 10:06:53 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Text archives, centers (92 lines)
 
Contributed by  "Michael Sperberg-McQueen"   <U18189@UICVM>
 
Jim Coombs asks whether a North American text archive would get us
anything we can't already get from Oxford, and if so how it should be
funded and organized.  I think it can, and this is why:
 
  1 First, note:  We *don't* need a North American text archive just
as an archive or text repository.  In this area, Oxford's work can
hardly be faulted.  They take everything, they work hard to document
everything, they distribute as freely as their donors allow them to.
 
  2 A North American center, though, could and should be set up to be
funded by a number of universities.  No one university in North
America is likely to fund the kind of public service Oxford performs,
let alone anything more.  But ongoing funding from many schools
could make it possible for a center to do some things that the Oxford
Archive just does not have the funding or staff to do.
 
  3 A North American center does not (thank heaven) have to compete
with Oxford; it would make far better sense to work in cooperation.
Oxford (in the person of Lou Burnard), it pleases me to say, agrees.
 
  4 A center should provide a locus for cooperation in all the areas
where universities now must pay large amounts of money to re-invent
the wheel.  That is:
    a  creating new machine-readable texts (preferably according to
some rational plan, as well as on demand)
    b  documenting existing machine-readable texts both for users
and for collecting libraries -- that is, the center should provide
a basis for cooperative library cataloguing of machine-readable texts
and the distribution of the catalogue records to the library community
    c  upgrading existing machine-readable texts, converting them
to a standard format and checking (or spot-checking) their validity
    d  distributing all these texts to scholars
    e  training of users and of computer advisors/consultants, via
summer seminars, short-term grants to individuals to work in residence
on their projects (in funding-agency terms, acting as a re-granting
agency, I think this is called)
    f  (possibly) assisting software development, either by helping
establish and encourage cooperation among university-based developers
or by performing development work of its own.  (Frankly, I'm a little
unsure how useful or feasible this is, but it's a point one often
hears, so I mention it.)
 
This is not an exhaustive list.  It reflects what I know happens at
Oxford, Toronto, Penn, BYU and such places.  Also what ICPSR (the
Inter-University Consortium for Political and Social Research) does
now.
 
  5  Funding -- it seems to me the universities should pay for this
center, just as they do for ICPSR.  We don't want just a consortium
of humanities-computing centers, because many universities don't choose
to support humanities computing that way.  We want services to include
enough library services that at least some university or college
libraries will want to join.  We want data distribution to be important
enough that local data archives will want their schools to join.  We
want enough emphasis on humanities research that humanities departments
will lobby for membership, enough benefits to computing
consultants/support staff that computer centers will be in favor too.
Who pays the membership fee will obviously depend on the internal
politics of the institution, but the membership should be by
institution.  (Obviously it also must be possible to support the needs
of independent scholars.  But arrangements to that end must not allow
schools to reap the benefits of having a center while evading the
costs of supporting it.)
    The motive to join must, I think, be partly altruistic, partly
financial.  By joining such a consortium a school can help support
humanities research in general, and get an awful lot of data free.
Not joining, then, must mean the data costs money.  It is not hard to
figure that a consortium membership can be far cheaper than acquiring
a scanner, paying maintenance on it, and running it within the
school.  Joining must also be preferable to buying the data from the
consortium.
 
  6 A center like this could support text-encoding standards in ways
I think Oxford would find difficult.  Without some deep changes in
its funding, the Oxford Archive can't hope to convert its holdings to
a new standard.  A new center could make that part of its raison
d'etre.
 
  7 Obviously, there is no need to limit the membership of a consortium
such as I have just described to North America.  But I think that's
where the need is, research in Europe being organized on different
lines.
 
  8 This concept of a consortium-supported general-purpose archive
and center contrasts sharply with that of cooperation among humanities
computing centers and with that of a set of regional or discipline-based
centers, which have been propounded in recent years from some quarters.
I hope those who prefer those plans to this will be persuaded to
describe to us how they would prefer to see things organized.  That
would be extremely useful to us all.
 
Michael Sperberg-McQueen, University of Illinois at Chicago
=========================================================================
Date:         8 December 1987, 10:11:52 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      In reply to Robert Amsler on ACH Text Markup
 
Contributed by "James H. Coombs" <JAZBO@BROWNVM>
 
Robert says:
 
         I would also however like to make a strong statement that from a
         computational perspective there is no need for any one format
         to be the only one used.  What is needed is that any format
         must be fully documented and an information-preserving
         transformation of the contents of any approved standard format.
         This was captured in the statement that the standard would be
         an `interchange' format.
 
First, I am confused by the word "format."  I would like to see
something more specific, such as "markup language."  Perhaps ACH does
intend something broader than markup language though.  I need a
definition to be sure that I know what is being referred to.  Does "format"
here include things like the location of markup?  Microsoft Word, I am
told, stores markup at the end of the file.  Well, that seems to me on
further thought to be a markup language with something like a postfix
syntax.
 
While it's true that we can process files with a variety of markup
languages, we need more than full documentation.  We have full
documentation of the procedural markup language for Waterloo SCRIPT, for
example, but that markup language does not provide us with the
information that we need; instead of telling researchers that an entity
is a "verse paragraph", it tells Waterloo SCRIPT what procedures to
perform at a particular point in the text stream.
 
Perhaps "information preserving" is intended to capture this notion
somewhat.  Well, the information must be encoded in the text in the
first instance before we can preserve it, and we want researchers to
encode the information when they enter the text.
 
An "interchange format"?  Again, I'm a little confused, or I'm not
convinced that primary needs are being addressed.  We have standards for
document interchange that preserve nothing but formatting information,
e.g., font changes. If the primary instance of a text encodes nothing
more than formatting information, then we will have information
preservation, but the information that we preserve will not be the
information that we really need.  We will know how to print a text, but
we won't know (computationally) what the individual entities are.
 
I imagine that Robert and others agree with everything that I have said.
What I am asking for is more precision and, above all, a commitment to
descriptive markup.  In addition, I am asking for some restrictions on
the markup languages.  Specifically, markup should be contiguous with the
elements that are being marked up; markup should appear in the text
stream.  Yes, we can process files that store all of the (electronic)
markup at the beginning or at the end, or part here and part there.  But
why should we invite this complexity?  Can we reasonably expect every
scholar to have access to programmers who can convert many formats into
one?  Or can we reasonably expect every scholar to have a separate
concordance program, a separate retrieval program, etc., for every
possible markup language? or for every markup language that is used in
one of the texts that the scholar needs?
 
While it may be convenient in the short term for someone to sit down and
type in Microsoft Word, our acceptance of Word documents would be very
expensive to many people for many years.  What is the value of a standard
that allows this?
 
         it is important to note that an SGML-like format may appear as
         very formidable to users who believe they will have to type in
         all the special codes manually--whereas a `keyboarding' format
         may be just as faithful in representing the information without
         undue burden to the typist.  I'm sure you will agree to this
         since your excellent CACM article notes that one of the most
         overlooked forms of markup is the use of traditional English
         punctuation and spacing conventions.
 
If SGML appears formidable to people, let's educate them, and let's
develop software that minimizes the effort.  Currently popular software
seems to minimize markup effort, but it fails to record sufficient
markup.  Unaware of the deficiencies of their software, people say that
they want more fonts, for example, and that they are not interested in
descriptive markup.  We need to make it clear to Microsoft, Dragonfly,
and others that we need descriptive markup.
 
I think that we want to stay away from the word "keyboarding."  One of
the complaints has been that people do not want to "keyboard" markup.
So, if we use "keyboarding" for the act of performing scribal markup
(punctuational and presentational) but not for the act of typing
descriptive (and referential) markup, then we invite confusion.
 
Clearly Robert is referring to the use of punctuation, and he must also
be referring to the use of presentational markup (e.g., skipping space
between paragraphs).  Both of these forms of markup have deficiencies
that descriptive (and referential) markup does not have.  Above all, they
are ambiguous; in addition, they are often much harder to parse.
 
First, ambiguity.  Periods are used to end sentences and they are used to
indicate that a string of characters is an abbreviation (Mr.).  Perhaps
even worse, the same character is used to indicate that a word is
possessive and to indicate the end of an imbedded quotation, e.g.:
 
  a) She told him, "Do not say 'dogs' house' anymore."
 
How much time do we want to waste on developing algorithms to parse this
markup?  Why not use (b) instead?
 
  b) She told him, <q>Do not say <q>dogs' house</q> anymore.</q>
 
Software can easily display (a) when it has recorded (b), but it cannot
easily generate (b) when it has recorded (a).
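 
The asymmetry can be sketched in a few lines of Python.  This is only
an illustration, not anyone's production software: the function name
render_quotes and the rule of alternating double and single quotation
marks by nesting depth are assumptions made for the example.

```python
import re

# Assumed convention: outermost quotations get double marks, the next
# level in gets single marks, and so on, alternating by depth.
QUOTE_MARKS = ['"', "'"]

def render_quotes(text):
    """Render <q>...</q> markup, as in (b), with quotation marks, as in (a)."""
    out = []
    depth = 0
    # Split into <q> / </q> tags and runs of plain text, keeping the tags.
    for token in re.split(r"(</?q>)", text):
        if token == "<q>":
            out.append(QUOTE_MARKS[depth % 2])
            depth += 1
        elif token == "</q>":
            depth -= 1
            out.append(QUOTE_MARKS[depth % 2])
        else:
            out.append(token)
    return "".join(out)

marked_up = "She told him, <q>Do not say <q>dogs' house</q> anymore.</q>"
rendered = render_quotes(marked_up)
print(rendered)
```

Going the other way would require deciding, for every apostrophe in
(a), whether it opens a quotation, closes one, or marks a possessive,
and no comparably short routine can do that reliably.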
 
Of course it is easier for most of us to enter (a) than it is to enter
(b); it is always easier to do half the job.  Once we start accepting
this responsibility, we will start convincing software developers to
support our needs, and entering (b) will not require much more than
entering (a) does now.
 
Why would anyone want to record (b)?  Well, they might want to print the
text with open and close quotation marks.  They might want to study all
of the quotations, or all of the imbedded quotations.  They might want to
study the use of possessives.  And so on.
 
Similar problems occur with presentational markup.  Yes, if we have a
one-to-one mapping between presentational markup and text element, then
presentational markup records all of the information that descriptive
markup does.  We don't really need tags for each line of poetry in
*Paradise Lost*, for example.  We need know only that each line of poetry
is terminated with '\n', for example.  There is no conflict with SGML
here, however, since SGML supports this method of marking up texts.  In
fact, in such a case we don't really have presentational markup at all,
we have descriptive markup; the markup serves not to enhance the
presentation but to identify that a stream of text is a line of poetry.
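 
As a minimal sketch of that one-to-one mapping, assuming a text in
which every '\n' ends exactly one line of verse, explicit tags can be
generated mechanically.  The tag name "l" and the function name
tag_lines are placeholders chosen for the example, not part of any
standard cited here.

```python
def tag_lines(poem, tag="l"):
    """Wrap each newline-terminated line of verse in an explicit tag.

    Assumes the newline is used for nothing except ending a line of
    poetry; blank lines are skipped.
    """
    return [f"<{tag}>{line}</{tag}>" for line in poem.split("\n") if line]

sample = (
    "Of Man's first disobedience, and the fruit\n"
    "Of that forbidden tree whose mortal taste\n"
)
tagged = tag_lines(sample)
for line in tagged:
    print(line)
```

The conversion is trivial only because the newline is unambiguous
here; once the same character also separates paragraphs or wraps
prose, the mapping is no longer one-to-one.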
 
You also load things a little with the phrase "undue burden," Robert.  In
part, I am arguing that there is a "due burden" that scholars must accept
if we are to get anywhere in this whole project of using computers to
assist our scholarship directly.  Part of that "due burden" is the proper
encoding of texts.
 
In addition, I think that you overemphasize the costs of entering
descriptive markup.  You do so partly by implying that
presentational markup is easier to select and perform.  We argue in our
article that presentational markup is considerably harder to select, and
that there is no pretheoretical motivation for believing that either
form of markup is easier to perform than the other.  In addition, you
seem to classify markup as presentational whenever it does not consist
of tags.  Under our functional definition of descriptive markup, at
least, the markup that you are talking about is actually descriptive
markup.  In any case, the sort of markup that you are talking about is
provided for under the SGML standard.
 
I thank you for your kind words on our article.  Our next article will
help clarify the distinctions that we make and how we are making them.
For the present, it seemed more important to make people aware of the
advantages of descriptive markup.
 
I hope that my response does not seem overly microscopic.  I find again
and again that conceptual confusion leads to unnecessary practical
problems.  In order for scholars to decide what form of markup to use,
they must know clearly what the competing forms of markup are and what
each form has to offer.
 
Finally, your discussion of thesauri and tag sets is interesting.  I'm
not sure that I have anything to add to it.  Need to think about it more.
 
Cheers!
 
--Jim
 
Dr. James H. Coombs
Software Engineer, Research
Institute for Research in Information and Scholarship (IRIS)
Brown University
jazbo@brownvm.bitnet
=========================================================================
Date:         8 December 1987, 10:18:46 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      archival politics (50 lines)
 
Contributed by Lou Burnard  <LOU@VAX.OXFORD.AC.UK>
 
[This message was delayed as a result of finger trouble on my part - I
sent it to the wrong node - LB]
 
        " If I can't count on a text from a particular
         archive to meet my needs, what is my motivation for bothering
         with that archive; and what is the motivation for the
         archive's existence?  I certainly would not want to see it
         supported by public funds." (JAZBO on Friday)
 
This is fighting talk!  The only defence I can offer is that
a community gets the Archive it deserves. If you guys don't have the
sense to agree on a common language, why should the humble archivist
be expected to do it? On the contrary, I could argue that I had a
responsibility to preserve accurately the current state of affairs as
a dire warning to future generations.
I expect librarians would like to insist that all publishers produced books
to the same dimensions too (makes the shelving so much easier dontcha know).
I expect there were even once some librarians who did so insist. But I
doubt whether they won many friends.
 
There is a self-evident crying need "to set and maintain standards". But
it has to come from the community of users. Once a standard has been
defined, it is possible for an archive to indicate whether or to what
degree a text is conformant to it, and that is certainly something every
user has a right to expect of an archive. Once a standard exists it is
also reasonable to expect an archive to seek ways of converting
and enhancing nonconformant texts. But I don't think a general purpose
deposit archive has any right to decide what is or isn't acceptable until
such standards have been defined. After all, most of the texts we have
WERE useful to someone, at least once.
 
Finally, may I with the greatest deference point out that an archive
is emphatically not the same as a publisher. Publishers have to please
their public or they go under. An archive is a mirror of its users. If all
that its users wish to share is rubbish, reserving the best quality stuff
for themselves, then the archive will be full of rubbish. It's up to you.
 
Lou Burnard
Oxford Text Archive
=========================================================================
Date:         8 December 1987, 10:33:48 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Archives
 
Contributed by dartvax!psc90!jdg@ihnp4 (Dr. Joel Goldfield)
 
    Regarding Jim Coombs' questions concerning Michael Sperberg-McQueen's
queries and comments, having a text archive at Oxford but not in North
America as well seems adequate if a pledge is made by Oxford to supply
these texts at a reasonable price (to be determined) and reasonably quickly.
    The only negative aspect I can think of at the moment if these
conditions are met is that it would certainly be costly to download them
via transatlantic (satellite) communication.  I would hope that telephone/
modem linkage to receive this information would be cost-effective and that
we wouldn't be limited to sending CD-ROM's or magnetic tape by mail.
 
                --Joel D. Goldfield
                  Plymouth State College (NH, USA)
=========================================================================
Date:         8 December 1987, 10:38:11 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Analysis of papyrological mss
 
Contributed by Jack Abercrombie  <JACKA@PENNDRLS>
 
In the interest of improving a program for papyrologists, the
Center for Computer Analysis of Texts is willing to make available
to colleagues a preliminary version of a program for mss analysis.
Manuscripts first must be digitized and stored in a TIFF format,
a common file structure used in desktop publishing.  The program
allows one to enhance the digitized image on an EGA screen. If
you have serious interest in assisting in the development work,
we would be willing to send you the source code.  You would have
to have access to a digitizer.  WRITE TO: JACKA @ PENNDRLS.
 
John R. Abercrombie
Assistant Dean for Computing,
Director of the Center for Computer Analysis of Texts
(University of Pennsylvania)
=========================================================================
Date:         8 December 1987, 10:44:17 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Coombs' ``Markup: On Requirements''
 
Contributed by Richard Giordano <RICH@PUCC>
 
I really can't see what all the fuss is about.  If people are serious
about creating both national and international standards for data "markups",
a data archive, and such related issues, I don't see why we don't work in close
collaboration with at least these three organizations: The American Library
Association; the Library of Congress; and the Research Libraries Group.
They have the resources, know-how, and institutional connections to develop
such standards, communication formats, and the like--and they have a track
record in this regard that extends back over twenty years.
 
Someone mentioned somewhere here that ALA was "a conservative bastion".
I have no idea what he means by this.  Traditionally, ALA and LC have both
taken the lead in the scholarly world in providing machine-readable
information.  The technical problems that LC has addressed have been
fundamental to data processing.
 
Rich
=========================================================================
Date:         8 December 1987, 10:47:21 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Sending messages to HUMANIST, an editorial plea (30 lines)
 
Dear Colleagues:
The new arrangement, whereby I intercept all messages to HUMANIST, seems
to have worked well so far, but about one thing some confusion has
arisen. Messages apparently intended for distribution sometimes are sent
to me directly, that is, to mccarty@utorepas.bitnet, rather than to
HUMANIST, i.e., humanist@utoronto.bitnet. My life would occasionally be
made simpler if you would all adopt the convention of sending messages
for distribution only to humanist@utoronto, even if you want my opinion
on whether or not they should be distributed. (In that case, put a note
to that effect in the message; I can easily delete the note.) If you
want to write to me *as editor* of HUMANIST, then please send your
message to mccarty@utorepas. Finally, beware of the distinction between
UTORONTO, where HUMANIST lives, and UTOREPAS, where I electronically
reside.
Thanks very much for making HUMANIST a lively place.
Yours, W.M.
_________________________________________________________________________
Dr. Willard McCarty / Centre for Computing in the Humanities
University of Toronto / 14th floor, Robarts Library / 130 St. George St.
Toronto, Canada M5S 1A5 / (416) 978-4238 / mccarty@utorepas.bitnet
=========================================================================
Date:         8 December 1987, 12:29:31 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      A North-American text archive and service centre (80 lines)
 
Contributed by Ian Lancashire <IAN@UTOREPAS>
 
Michael Sperberg-McQueen argues for a national North American
text archive and service centre, supported by a consortium
of colleges and universities. He contrasts this to a consortium
of humanities computing centres, which (because it involves
fewer institutions) can be perceived as serving only a
small percentage of faculty and students. He also challenges
someone to dispute this.
 
I'm suspicious of any proposal to centralize computing
needs about one data-processing shop. Competition
is the essence of being American, isn't it? The more
heads at work on a problem, the better chance of
finding an answer, or ideally something completely
new that we didn't expect in the first place.
 
Most of us have just won the fight for personal
computing equipment and software: resources for
which we're beholden to no one because they are in
the marketplace, available for a price that's affordable
even to students (or should I say even to faculty).
Hasn't centralized computing lost the war in most
universities? Do we want to perpetuate it on a
national scale?
 
The more people creating text archives, the better, because
what we need are specialized collections from the scholarly
editors who have previously worked only with paper books.
Will the research projects set up to edit works by
individual authors trust a central archive to do their
work for them? Surely not.
 
Look at the same argument for centralized software
provision on a national scale. You can find clearinghouses
of MS-DOS programs at North Carolina and at Wisconsin,
and competitors emerge monthly from the woodwork.
Our colleagues cannot agree to accept only one
place for a software depository and distribution
centre. They long ago rejected centralized software
development because business proved it could produce
far better work than any academic could.
 
I'd rather buy my car from a car dealership that's
in business for the money than from the government
or from my engineering colleagues who occasionally build
faster, more efficient cars for academic reasons.
 
Few people in this field will argue with the idea of
cooperation or consortia. The question Michael poses
is, should the consortium be a collection of workers
or a collection of customers?
 
Probably a consortium of humanities computing
centres and facilities would be a good beginning to
persuading our colleagues (wherever they are,
inputting whoever) that a circle has more strength
than a scattergram.
 
We could at least help make the market for machine-readable
texts profitable enough that companies now selling them
(in the States, Electronic Text Corp. comes to mind)
do well enough to subsidize (modestly, from royalties)
further reliable machine-readable editing.
 
[-30-]
=========================================================================
Date:         8 December 1987, 16:52:09 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Text encoding (63 lines)
 
Contributed by "Michael Sperberg-McQueen"   <U18189@UICVM>
 
Four quick observations on text encoding provoked by the recent barrage
of postings:
 
1 Jim Coombs is right to praise the better information content of
descriptive tagging, but still we should not require descriptive markup
for *all* texts.  Confronted with a printed book or a manuscript,
there will be cases where we don't *know* whether something is a
'chapter' or a 'section'--what we know objectively might be that there
is a page break followed by centered 14-point Baskerville saying XXXX,
followed by 28 points of white space, followed by text.  Everything
more is interpretation.  If we do have an interpretation, I'm in favor
of encoding it in descriptive markup.  But sometimes we won't and
won't want to.
 
The Carmina Burana manuscript is a classic example of this:  it has
been rebound and the gatherings re-arranged at least once, and different
parts of the manuscript (and different hands) may well reflect multiple
attempts to impose some (mutually incompatible) structure(s) on the
collection.  It would be sound practice to separate an editorial
judgment on the intended structure(s) of the manuscript from a
codicological description of the information that leads to that
editorial judgment.
 
The First Folio of Shakespeare, similarly, must be encoded with detailed
typographic information if it is to be used for textual criticism,
since the position of a word in the line, on the page, within the
gathering, and within the volume, are all relevant to judging the
authority of the word and its spelling.
 
2 Yes, the coming flood of machine-readable texts will overwhelm the
material we now have in the machine, but still we must make our peace
with (a) other (existing) markup schemes and (b) specifically
presentational and procedural markup schemes.  They will continue in
use at least for a while and we must provide migration paths into the
new scheme if we can.  And markup restricted to font, etc. may be a
useful first step in analysing any complex text, as dictionary work
by Amsler and by Raimund Drewek at Zurich seems to show.
 
3 No, people should not have to have one concordance program for every
encoding scheme in the world.  (That is the current situation, though.)
But many people do have large software systems built around specific
formats.  There is no need to cut them off, if we can develop one
scheme capable of representing texts in those special formats without
information loss.  Given N different encoding schemes, such a universal
scheme would reduce the translation problem from magnitude N * N to
magnitude 2 * N.  That, I believe, is a good reason to work for an
"interchange format," and a good reason to accept in the interchange
format whatever level of information is in the source.  (Specific
recommendations for minimal markup content shouldn't prevent this.)
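 
The arithmetic can be made concrete with a short Python sketch.  The
function names are invented for the example, and the direct count is
taken as one translator per ordered pair of distinct schemes, i.e.
N * (N - 1), which is of the same order as the N * N figure above.

```python
def direct_translators(n):
    """Translators needed if every scheme converts directly to every other."""
    return n * (n - 1)

def interchange_translators(n):
    """Translators needed if every scheme converts only to and from one
    common interchange format: one importer and one exporter each."""
    return 2 * n

for n in (3, 10, 50):
    print(n, direct_translators(n), interchange_translators(n))
```

With only three schemes the two counts are equal; the saving appears
as the number of schemes grows, which is the situation we are in.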
 
Eventually, we can always hope software developers will see that they
might as well work directly with the interchange format rather than
engaging in preliminary translation.  But first we have to survive in
the existing world dominated by existing schemes and non-schemes.
 
4 The library community is interested in machine-readable cataloguing
data, and some of it is also interested in collecting and cataloguing
machine-readable data.  But are they also interested in creating it?
If so, then yes we should surely cooperate with them.  But the only
useful basis for any cooperation is for each group to be clear on
its own point of view.  And that is what all this fuss is about.
 
Michael Sperberg-McQueen, University of Illinois at Chicago
=========================================================================
Date:         8 December 1987, 16:57:06 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      more archival politics (this one shd run and run)
 
Contributed by Lou Burnard  <LOU@VAX.OXFORD.AC.UK>
 
Disagreeing with Ian is not to be undertaken lightly. Nevertheless...
 
"I'm suspicious of any proposal to centralize computing
needs about one data-processing shop."
 
An archive is not a data-processing shop.
 
"Hasn't centralized computing lost the war in most universities?"
 
Well, actually, no, it hasn't - not at those where it's been recognised that
there's room for both private and public resources anyway. Some of us didn't
even know there was a war going on...
 
"The more people creating text archives, the better"
  Maybe we need a definition here. The more people creating *text resources*
the better, of course. But the more centres competing to archive and secure
those resources? I'm not so sure! How many libraries does your university need?
Ours has far too many - and when it started thinking about the problems of
integrating their various catalogues, it soon became apparent that no one
library could impose its will on the others. So, guess what, a consortium
emerged. A centralized quasi-official embodiment of the university's collective
desire to bang the librarians' heads together until they started squeaking
in tune.
 
I'm all in favour of competition and the American Way (I want to see
New York again too). But an archive has responsibilities which distinguish
it very sharply from data producers or consumers.
 
Recently, an organisation called the Knowledge Warehouse came into being
here in the UK. It was funded by a consortium of UK publishers as a private
company and also got a grant from the British Library. The idea was to set
up some sort of archival service for publishers' typesetting tapes etc. The
scheme looked good on paper and had a lot of money behind it. But it doesn't
seem to have been successful. The consensus amongst those I've talked to was
that too few publishers wanted to play ball with an organisation which
they at least perceived as a competitor.
 
The moral I draw from this is that just as with books, there is a place for
bookshops and private collections and state-owned and maintained great
libraries, so is there a place for electronic text corpora and private
collections of texts as well as for great archives. But it's important to
distinguish them, because their roles and priorities are quite different.
I wouldn't put a bookseller in charge of a library - nor would I expect a
librarian to make much money in publishing.
 
Lou
 
 
=========================================================================
Date:         8 December 1987, 19:01:37 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Text Encoding
 
Contributed by "James H. Coombs" <JAZBO@BROWNVM>
 
In reply to Robert Amsler's of 8 December 1987, 09:38:34 EST
 
On closed vs. open tag sets, Robert concludes:
 
         This is how I see the problem of the selection of tags for text
         entities in documents.  That is, if the system is completely
         open and `productive' there will be little commonality between
         authors' selections--whereas if the authors are offered a
         wide-range of approved tags to select from, then they will
         manage to find tags which meet their needs.
 
I would agree with Robert if I could bring myself to believe that we can
develop a tag set that will genuinely meet our needs.  The AAP tag set,
for example, does not provide a "poetry quotation" tag; and we can expect
scholars to realize that they can use a "list type" for poetry quotations
in order to meet the immediate needs of 1) tagging an entity and 2) getting
the entity formatted in a particular way.  To some extent, we also have
to say that this approach would meet many of the needs of descriptively
marking up a text (as long as the chosen list type is used only for poetry
quotations---with internal consistency).
 
Some of the advantages of descriptive markup are lost in such an
approach, however.  Above all, the choice of the tag is not intuitive;
both the original researcher and anyone using the text later will have
to perform extra work to determine what "list type 2" is used for.  I
don't want to go on about this too long here, so let me just appeal to
people's intuitions by saying that the tag "poetry quotation" has many
advantages over the tag "list type 2".  (None of these advantages are
computational however; once a programmer determines that he/she should
do X to "list type 2", the two tags have equal value.)
 
If we discount the sort of advantages that I am referring to (discussed
in our article---I'm not hedging), then we can solve the problem quite
easily: let's just have one tag with serialization.  In addition, we can
close the set by specifying upper and lower bounds.  So, all elements
will be tagged "<En>...</En>", where 'n' is some integer in the range of
0 to 4096 (or (2**32)-1?).  I doubt that anyone will need to tag more
than (2**32)-1 entity types in a single document.  Then, people just need
to provide us with appropriate documentation, e.g.:
 
  I used the following tags:
  E0 for paragraphs
  E1 for poetry quotations
  .
  .
  .
  E2347 for passages that allude to Genesis
 
Of course, people will immediately say things like, "Let's all use <E0> for
paragraphs."  The many motivations that would cause such a response are the
same motivations that cause us to provide <p> for paragraphs in the first
instance.
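 
To illustrate the point that the two kinds of tag have equal
computational value once the documentation is in hand, here is a
minimal Python sketch that rewrites serialized <En> tags using such a
documentation table.  The table contents and the descriptive names
("p", "poetry-quotation", "allusion-genesis") are assumptions made up
for the example.

```python
import re

# Hypothetical documentation table of the kind a depositor would supply.
TAG_DOCUMENTATION = {
    "E0": "p",
    "E1": "poetry-quotation",
    "E2347": "allusion-genesis",
}

def to_descriptive(text, table):
    """Replace <En> and </En> tags with the names given in the table."""
    def rename(match):
        slash, name = match.group(1), match.group(2)
        # Tags missing from the documentation are left as they are.
        return "<" + slash + table.get(name, name) + ">"
    return re.sub(r"<(/?)(E\d+)>", rename, text)

sample = "<E0>He wrote <E1>Of Man's first disobedience</E1>.</E0>"
converted = to_descriptive(sample, TAG_DOCUMENTATION)
print(converted)
```

The program does not care which form it reads; the reader without the
table does, and that is where the serialized tags lose.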
 
Because we cannot predict all entities that people need to mark up, we tend
to throw our hands up in the air and say one of two things:
 
  1) Let's just fake it from here on out and provide several list types.
  2) We need to keep the tag set open.
  3) [another approach that I am missing??]
 
AAP has chosen approach (1) and then said go look at the standard if you
really need something else.  (The information for developing
AAP-compliant documents with user-defined tags is not provided in the
authors' documentation.)  The deficiencies of approach (1) should be
immediately clear to us when someone like the AAP ignores something so
basic (to humanists) as poetry quotations.  Moreover, if I am really
analyzing a document, I will quickly run out of AAP list types.  (And I
don't think that I could twist things quite so far as to use a list type
for my <E2347> anyway.)
 
Who is capable of providing a closed tag set that addresses these problems?
Yes, the <En> approach addresses them to some extent, but then what have
we gained over <allusion text='GENESIS'>?
 
Ok, so perhaps we remember to provide a tag for allusions.  But what tags
will we provide for post-structuralist critics?  For the next major critical
theory?
 
I agree that we should provide "a wide range of approved tags to select
from," but I think it even more important to ensure that documents are marked
up descriptively.
 
(I recognize that I am close to equivocating in my use of "description."
I am not fully satisfied with the functional definition that we offer in
our article.  Renear and I are working on this, and it gets complicated
quickly.  Basically, however, I want to say that <allusion> is
descriptive in some way that <E2347> is not.)
 
And, a posting just arrived from Michael Sperberg-McQueen, who argues
that descriptive markup is not always appropriate.  I suspect that
Michael is saying that we sometimes need to describe the manuscript
instead of the abstract text; in which case, we still want descriptive
markup (i.e., we don't want Waterloo SCRIPT font instructions; we want
something that says that X is/was in F font).  In any case, I can hedge
and conclude: insofar as a text is susceptible to description, it
should be marked up descriptively and, further, that tag sets should be
1) open and 2) descriptive in this more intuitive sense of descriptive
that favors <allusion> over <E2347>.
 
--Jim
 
Dr. James H. Coombs
Software Engineer, Research
Institute for Research in Information and Scholarship (IRIS)
Brown University
jazbo@brownvm.bitnet
=========================================================================
Date:         8 December 1987, 19:06:03 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      North American Archive(s) Issue
 
Contributed by amsler@flash.bellcore.com (Robert Amsler)
 
    How about a Humanities Archive Network (HumAN)
 
I think we have an opportunity to do something considerably greater
than the Oxford Archive and in fact an obligation to do this because
of the state of networking available in the USA. What I'd propose is
a collection of sites across the country ALL offering to host the
archive or provide access to its data via their computing facilities.
We should be thinking of downloading of information electronically as
the PRIMARY means of distribution of archive data, with only rare
recourse to writing the information out onto magnetic media as a
dissemination method.
 
The model I have in mind is based upon that used for the ARPANET's
Network Information Center (NIC), which maintains a list of software
and personnel at all the sites it serves. One can access this
database via connecting to it from anywhere on the network, and
determine where the data you want is located, and set about its
retrieval by either anonymous remote login and file-transfer-protocol
(FTP) downloading of the data; or finding out who to contact as the
holding institution's network liaison.
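 
A minimal sketch of such a directory, in Python, assuming each entry
records which site holds a text and whether it can be fetched directly
or only through a liaison.  The site names, addresses, and the locate()
function are hypothetical; nothing here describes the NIC's actual
database.

```python
from dataclasses import dataclass


@dataclass
class Holding:
    site: str     # institution holding a copy
    access: str   # "ftp" for anonymous download, "liaison" for contact by mail
    contact: str  # host name or liaison address


# Hypothetical catalogue entries, for illustration only.
CATALOGUE = {
    "Example Text 1": [
        Holding("Site B", "liaison", "liaison@site-b.example"),
        Holding("Site A", "ftp", "archive.site-a.example"),
    ],
    "Example Text 2": [
        Holding("Site B", "liaison", "liaison@site-b.example"),
    ],
}


def locate(title, catalogue=CATALOGUE):
    """Return the holdings for a title, direct-download sites first."""
    holdings = catalogue.get(title, [])
    # False sorts before True, so "ftp" entries come first.
    return sorted(holdings, key=lambda h: h.access != "ftp")


for h in locate("Example Text 1"):
    print(h.site, h.access, h.contact)
```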
 
So... data would be distributed around the country as suited the
individual institutional  member's computing  facilities.  Some
institutions might opt to have copies of everything; others to
themselves store nothing, but instead to keep texts they created on
equipment elsewhere. Each member institution would have a designated
liaison who maintained contact with the central information resource
center which itself kept a complete database of what was available
where, both in terms of data and computing facilities (not unlike a
list of libraries, their holdings and research facilities) and also
of researchers and their interests and how to reach them
electronically. This part of the Humanities Archive Network would
require funding, as well as the creation of the HumAN itself--though
this is becoming easier and easier as more and more research
communities take to setting up their own networks. I would think the
NEH ought to find such a proposal well justified in terms of the
potential multiplier effect it would have upon the entire field of
(computational) research in scholarship.
 
Robert A. Amsler
Bellcore
Morristown, NJ 07960
 
 
=========================================================================
Date:         8 December 1987, 19:12:29 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      The Humanities Computing Yearbook (53 lines)
 
Contributed by Willard McCarty (in this case as YEARBOOK@UTOREPAS)
 
Dear Colleagues:
 
As some of you will know, I am gathering information about interesting
and worthy software for a new serial, the Humanities Computing Yearbook,
to be published by Oxford U.P. The announcement for the Yearbook
follows. Please send your recommendations to me, c/o
yearbook@utorepas.bitnet.
Thanks very much for your help.
--------------------------------------------------------------------------
 
                  The Humanities Computing Yearbook
 
On behalf of Oxford University Press, the publishers, the Centre
for Computing in the Humanities is pleased to announce a new
periodical, The Humanities Computing Yearbook. Ian Lancashire and
Willard McCarty are the co-editors. An editorial board is in
process of being set up.
 
The first volume, scheduled for publication in the summer of
1988, aims to give a comprehensive guide to publications,
software, and specialized hardware organized by subject or area
of application. Research and instructional work in many fields
will be covered: ancient and modern languages and literatures,
linguistics, history, philosophy, fine art, archaeology, and areas of
computational linguistics affecting text-based disciplines in the
humanities. The more notable software packages will be described
in some detail.
 
We welcome your suggestions of what we should consider. We are
especially interested in discovering innovative software that may
not be widely known, including working prototypes of systems in
development.
 
Electronic correspondence should be sent to
YEARBOOK@UTOREPAS.BITNET, conventional mail to the Editors, The
Humanities Computing Yearbook, Centre for Computing in the
Humanities, Univ. of Toronto, 14th floor, Robarts Library, 130
St. George Street, Toronto, Canada M5S 1A5. Our telephone number
is (416) 978-4238. Please feel free to distribute this notice.
 
Ian Lancashire
Willard McCarty
=========================================================================
Date:         8 December 1987, 22:00:06 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Text Encoding: salvaging texts (addendum)
 
Contributed by  "James H. Coombs" <JAZBO@BROWNVM>
 
Oops.  I should have added that I am not saying that people should throw
away everything that does not accord with the standard.  I am saying
that the standard should not try to accommodate inadequate texts.  I
like (my interpretation of) what Lou Burnard says about them
(implicitly?): they are "rubbish."  Well, ok, so we may be better off
recycling many of them instead of just throwing them out, but let's say
that they are in the recycle bin and not that they are in the approved
bin.
 
--Jim
 
Dr. James H. Coombs
Software Engineer, Research
Institute for Research in Information and Scholarship (IRIS)
Brown University
jazbo@brownvm.bitnet
=========================================================================
Date:         9 December 1987, 09:03:00 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Text Encoding: salvaging texts
 
Contributed by "James H. Coombs" <JAZBO@BROWNVM>
 
Robert Amsler corrects my perhaps overly vigorous condemnation of texts
that have been marked up procedurally.  I have no intention of entering
the American Heritage Dictionary from scratch or even of working with a
version that has all markup stripped away.  Such markup can help one
considerably in the process of deriving a descriptively marked up
version.  (Just to clarify, I AM working with the AHD.)
 
I don't feel the same way about *Paradise Lost*, however.  Perhaps I am
being overly vigorous again, but I would rather enter that relatively
tiny (compared to the AHD) and simple document myself than spend the same
time negotiating for a tape, getting it loaded on the mainframe, learning
the markup system, writing the programs to convert it to descriptive
markup, etc.
 
So, first point, dictionaries are unusually large and complicated.
Poems, even long poems, imbue one with the poetic experience even when
the task is as mindless as keyboarding (but they better be good poems
too!).  We have a continuum, and we have all of that old philosophical
stuff about points at which one would just prefer to enter and proof
read than to negotiate, acquire, interpret, program, etc.
 
Second and final point, my concern was with the value of a metalanguage.
Correct me if I am wrong, but the fact that I would rather convert the
AHD than enter and proof read it has nothing to do with our ability to
develop a metalanguage that will generate(JHC)/parse(RA) both the
procedural and the descriptive markup.  Perhaps one CAN develop a
context-sensitive grammar that will enable one to uniquely identify
every element type in the AHD.  I don't know anyone who believes that
they can develop that grammar more quickly than they can perform partial
conversion automatically and then finish up by hand.  If it's that
difficult to generate the context-sensitive grammar, won't it be much
more difficult to generate a metalanguage?
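 
A minimal sketch of "partial conversion automatically, then finish up
by hand," in Python.  The procedural codes (.bold, .ital, .font7) and
the descriptive targets are invented for illustration; they are not
the AHD's actual typesetting codes.

```python
import re

# Hypothetical procedural codes and their descriptive targets; a real
# dictionary tape would need its own, much longer, rule list.
RULES = [
    (re.compile(r"^\.bold (.+)$"), r"<headword>\1</headword>"),
    (re.compile(r"^\.ital (.+)$"), r"<pos>\1</pos>"),
]


def convert(lines):
    """Apply the rules; return (converted lines, line numbers left for hand work)."""
    out, by_hand = [], []
    for number, line in enumerate(lines, start=1):
        for pattern, template in RULES:
            if pattern.match(line):
                out.append(pattern.sub(template, line))
                break
        else:
            # No rule matched this line.
            if line.startswith("."):
                by_hand.append(number)  # unknown code: a person must decide
            out.append(line)
    return out, by_hand


entry = [".bold archive", ".ital n.", ".font7 ar'kiv",
         "A place where records are kept."]
converted, todo = convert(entry)
print(converted)
print(todo)
```

The rule list covers only the cases that are cheap to state; everything
else is queued for a person rather than forced through a grammar.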
 
Now, if a single grammar will work for many dictionaries (and we actually
have the need to convert many dictionaries), then it may be justified to
develop the grammar.  Is this what you are working on, Robert?
 
My goal was (and remains) to discourage what seems to me to be a
quixotic pursuit: the development of a metalanguage that will
generate(JHC)/parse(RA) all forms of markup for all documents.  The fact
that one may be better off with procedural markup than without it in
some/many cases does not address my claim that such a metalanguage is
impossible or even my weaker claim that even if it is possible, it's
not worth the effort (again, what's the gain?).
 
--Jim
 
Dr. James H. Coombs
Software Engineer, Research
Institute for Research in Information and Scholarship (IRIS)
Brown University
jazbo@brownvm.bitnet
=========================================================================
Date:         9 December 1987, 09:04:34 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      archives, mark-up and money (75 lines)
 
Contributed by Phillipa Mathieson <AMPHORAS at UTOREPAS>
 
Interesting how HUMANIST discussions on standards for text encoding,
making publishers aware of the need for electronic texts, assessing
copyrights for such texts, establishing text archives, and programs
for text searching and retrieval all seem to come together.
 
It almost sounds as if we were all aiming for the same thing:  texts
on-line in machine-readable format with the software to manipulate
them, available to all who want them.  The main question seems to be
"whose money are we going to use to achieve this?"
 
Ian Lancashire's analogy of buying a car from commercial dealers, and
the acceptance by other HUMANISTS from time to time of "intellectual
copyright," and of the restrictive mechanisms needed to ensure
financial returns for the owners of that copyright, alarm me.  I see
no reason why academic grant money intended for humanist research
should not be spent on laying down guidelines for the encoding of
texts and the software to read them, and for distributing the results.
And I agree with Lou Burnard that "the community gets the archive it
deserves."  If we aren't enough of a community, or interested enough
in the disinterested rewards of scholarship, to share our work with
others without demanding additional financial rewards (in most cases,
over and above those already granted us by salaried positions in
academic institutions or as staff members of publicly-funded
educational projects), we don't deserve either the positions or the
use of on-line materials.
 
I recently discussed with a Toronto software firm my need to use a
database program (Empress32 from Rhodnius) on a second computer with
slightly different architecture from the first.  We had already bought
a licence to use the program on one, and we wished to use the same
program for the same project on another.  Their attitude was that a
firm which expands and buys a new machine must pay for a second
licence for the new machine.  I think they saw this as a kind of tax
on the profits of the firm which their software was assumed to have
contributed to.  When I said we had no profits, the salesman kindly
tried to explain to me that they had to protect their copyright in the
program by charging individual licences for individual machines: "If
you wrote an article and someone else used it as the basis for his own
work, without acknowledgement, and made a great success of it, you'd
sue the balls off him."
 
This kind of commercial attitude has no place in humanist scholarship,
and putting the development of archives and their software on a
commercial basis will simply cheapen (in the sense of "lowering the
quality"--certainly not in the financial sense) and restrict humanist
activities.
 
It is good to have an organized group establishing guidelines for
text mark-up and doing so in an open forum.  It would be bad to have a
commercially-based central text archive system which discouraged
individual scholars from making available their work by maintaining an
arcane set of instructions for mark-up, which only "they" really knew
how to insert so that the standard software programs could use it.
Michael Sperberg-McQueen's reservations about the software-development
function of a central archive system are a good sign:  setting up
a central archive system seems to me likely to lead to the development
of software for the specifications of that archive, and if you add the
commercial competition angle, we'll all end up paying through the nose
for the software *and* the texts, and running round nervously trying
to comply with restrictive copyright requirements for texts long since
free from their original publication copyright restrictions.  At which
point, it will again become easier to type it in yourself, and the
idea of a community of scholars sharing their work will bite the dust
yet again.
 
=========================================================================
Date:         9 December 1987, 09:14:37 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      What new information in texts of the Oxford Archive? (33 lines)
 
Contributed by Lou Burnard <ARCHIVE@VAX.OXFORD.AC.UK>
 
The Text Archive gets a fair amount of criticism for not providing more
information about the texts in the catalogue ('fair' meaning both "a modest
quantity" and "justifiable"). As I am now embarking on a major overhaul
consequent on a local change of mainframe, I'd like to start trying a bit
harder to rectify this situation. Humanists and others who have a view can
help by making some suggestions about what information they think ought
(minimally) to be provided in the catalogue. I should stress that I don't
have the resources to do a proper cataloguing job - not yet anyway. But some
things that could be added to the current shortlist are
1. more bibliographic info (e.g. date of first publication/composition,
	genre etc)
2. some sort of code for level/type of markup
3. some sort of code indicating completeness, accuracy, level of verification
4. (probably not in the catalogue, but generated for each text) text profile
  i.e. everything that a program I haven't written yet can deduce automatically
  about the text - size in records, tags used, character usage profile etc.
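 
A minimal sketch of what such a profiling program might compute, in
Python.  It assumes SGML-style <tag> markup and one record per line;
the function name text_profile() and the sample text are illustrative
assumptions, not a description of any existing Archive program.

```python
import re
from collections import Counter

# Matches opening tags only, e.g. <l> or <note type="gloss">.
OPEN_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)[^>]*>")


def text_profile(text):
    """Deduce size in records, tags used, and character usage."""
    records = text.splitlines()
    tags = Counter(OPEN_TAG.findall(text))
    # Character counts include the characters of the markup itself.
    chars = Counter(ch for ch in text if not ch.isspace())
    return {
        "records": len(records),
        "tags": dict(tags),
        "characters": chars,
    }


sample = "<l>first line of verse</l>\n<l>second line</l>\n<note>a gloss</note>\n"
profile = text_profile(sample)
print(profile["records"], profile["tags"])
print(profile["characters"].most_common(3))
```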
 
Comments? Preferences? Concrete suggestions?
 
Lou
=========================================================================
Date:         9 December 1987, 12:46:41 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Archives and encoding (51 lines)
 
Contributed by Richard Giordano <RICH@PUCC>
 
From what I've been reading, four issues seem to be
 
    - if there is to be a machine-readable archive, where should
           it be?
    - who will pay for it?
    - what constitutes a coding standard?
 
Michael Sperberg-McQueen also includes the questions, who is going to do
the conversion, as well as the coding?
 
The library community certainly will not get involved in conversion efforts.
But you can be sure that the institutional structures already exist within
the library community to both establish and maintain a data archive of
machine readable text.  They're in the business of collecting and making
available information to users, and I think the best of them do a great
job at it.  Anyway, you can be certain that sooner or later--and probably
sooner--the American Library Association is going to take up the issue.
And when it does, the first thing that will come up is the establishment
of a standard interchange format--much the same way that cataloging and
other data is exchanged throughout the world in a standard MARC format.
 
As for the libraries' point of view: nothing more and nothing less than
to (1) preserve information;
   (2) index and describe the information so that users can easily get
         to the source information, as well as having an idea of what
         the information is about;
   (3) make the information available to users.
 
There might be more to it than this, but I think this pretty-much covers it.
 
It seems so obvious to me that the institutional structure exists, as well
as the expertise, to establish a national archive of machine-readable
texts, as well as assistance in generating a standard communications
format.  Libraries can also be of use in helping to establish practices
by which text itself is indexed (since the indexing and retrieval of
information for untrained users is at the heart of every librarian's
professional education).  Libraries, however, are not in the position
to convert the sources into machine-readable form.
 
Richard Giordano
=========================================================================
Date:         9 December 1987, 14:41:51 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Salvaging Texts (20 lines)
 
Contributed by Mark Olsen <ATMKO@ASUACAD>
 
Funny that James Coombs should mention *Paradise Lost* since I am
currently going through the process of pulling it off of a tape and
formatting it for my purposes.  I think that he seriously over-
estimates the effort required to use existing text data and under-
estimates the effort required to scan and correct even a simple
text.  The materials stored at Oxford, Packard and ARTFL in any
condition can be corrected, coded and formatted much faster than
starting from hardcopy.
                               Mark
=========================================================================
Date:         9 December 1987, 19:05:35 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      A national archive for the U.S. (97 lines)
 
Contributed by Jack Abercrombie <JACKA@PENNDRLS>
 
 
 
We have been following with much interest the discussion on
establishing a national archive center similar in some respects
to the Oxford Archive.
 
Many of you are not aware that some four years ago the staff of
the Center for Computer Analysis of Texts (CCAT)  submitted a
working proposal to the  National Endowment for the Humanities
advocating the establishment of a US national center for textual studies
(including archive).  From the comments then received
as the proposal was circulated by NEH as well as other comments
received at the Grenell conference (1985) where our draft proposal
was discussed by fifteen representatives from national and
international centers, the following conclusions seemed accurate.
First, a national center at that time did not have a "snowball's
prayer in hell" of coming into being given the general lack of
collaboration and cooperation amongst US institutions and their
faculties on this very issue.  Second, regional centers, an idea
originally proposed to us by the late Art Hanson from Princeton
University (1982), would be a better approach in that a
regional/discipline specific center can concentrate on a few tasks
and do them well with limited resources.  With this in mind, we
established the Center for Computer Analysis of Texts (1984). CCAT
has focused on three specific areas both for internal and external
users: building an accessible, unified archive for biblical and
ancient texts, providing scanning services to colleagues just
above cost, and assisting colleagues through consultation,
information dissemination and software development.  We realized
then that sister institutions would start similar centers that
might, but probably wouldn't seriously, overlap our own goals.
 
In the spring of 1986 at the University of Toronto, we again
proposed to the representatives of existing and potential centers
(in US, Canada and England) that we should share information on
our archival holdings, and that we should coordinate more fully our
efforts to add texts to our archives as well as in software
development.  Of the six centers represented at that  meeting,
there seemed to be general agreement that it was a good idea to
try to federate our efforts to avoid duplication and to cut
costs in supporting international, accessible archives.  Again
we proposed to seek funding to make this a reality as well as to
solve some other minor problems within the proposed consortium.
Our hastily prepared proposal to begin implementing these ideas
was submitted to NEH and severely criticized by some reviewers.
We accepted the reality here, and have proceeded to work with other
equally concerned institutions to make them aware of our
archival holdings and to keep them informed on the projects taking
place at CCAT (e.g. CD-ROM Project).
 
This chronicle of frustration and also hope, we think, is
instructive,  because it points out that the ideal (that is, a
national archive or even a federated system of archives) may not
be realistic given the number and nature of the relevant
participants.  The reality, regional and discipline  specialized
centers, continues to grow in many positive ways.  Unfortunately
from our perspective, we would like to see more coordination than
is possible as long as we work within the blinders of discipline,
university, nation, etc.
 
At the very least, centers should be sharing, as some already do,
information on their archival holdings and additions to their
archives whether by acquisition or data entry.  (NOTE: To obtain
information on CCAT's accessible archive request information
sheet from CCAT, Box 36 College Hall, Philadelphia, PA 19104.)
Centers should also foster new ways for cooperation and
collaborations.  Towards this end, the Center for Computer
Analysis of Texts in coordination with Computer Assisted Research
Group (CARG) of the Society of Biblical Literature has begun an
ambitious project to prepare an archival list of biblical and
other material deemed relevant to CARG members.  A first step will
be to build an archival list along the lines of the information
submitted by CCAT to the Rutgers Inventory Project.  The second
step will be acquiring copies of the texts not in CCAT's archives
and placing that information in the same, consistent format (that
is, the present format or a future format as is being discussed)
of all the other material in CCAT's accessible archive.
 
 
Prepared by John R. Abercrombie (Assistant Dean for Computing and
Director of the Center for Computer Analysis of Texts) with
cooperation from Robert Kraft (Coordinator of External Affairs CCAT
and Director of CARG)
 
 
 
=========================================================================
Date:         9 December 1987, 19:29:44 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Correction (re: Yaacov Choueka's affiliation)
 
Contributed by "Michael Sperberg-McQueen"   <U18189@UICVM>
 
In the posting about the Vassar conference for planning the basic
structure of the ACH/ACL/ALLC text encoding guidelines, Yaacov
Choueka's affiliation was wrongly given.  It should read:
 
    Institute for Information Retrieval and Computational
    Linguistics, and Department of Mathematics and Computer
    Science, Bar-Ilan University
 
I apologize for the error.
=========================================================================
Date:         9 December 1987, 19:30:58 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Warning about another Christmas virus
 
Contributed by "Michael Sperberg-McQueen"   <U18189@UICVM>
 
We've already had several score, by now probably a few hundred
copies of this turning up here; it may reach you next.
 
If you are at a CMS site and receive a program called CHRISTMA EXEC,
please (a) warn your postmaster and (b) discard the exec (or keep
a copy for the postmaster to look at, but DO NOT RUN IT).  This
exec paints a Christmas tree on your screen and then sends itself
to everyone named in either your NAMES or NETLOG files.  The result
is potentially serious stress on Bitnet and on your local spool
system, and possibly a few system crashes here and there as the
number of reader files soars and exceeds the maximum.  The
Christmas tree isn't all that pretty, and the joke is pretty mean.
 
A word to the wise.  Your postmaster will thank you.
 
Michael Sperberg-McQueen, UIC
 
=========================================================================
Date:         9 December 1987, 19:35:31 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      archives-coding-texts
 
Contributed by Bob Kraft <KRAFT@PENNDRLN>
 
I really must finish up the task of ensuring consistent
ID coding for the dozens of texts on the forthcoming CCAT-PHI
CD-ROM, or I would plunge in at length on the current Humanist
discussions. Meanwhile, I will take a minute to UNDERSCORE the
comments of Mark Olsen. It is hard for me to conceive of a
situation in which it would be more efficient to rekey or to
scan anew a text already extant in some electronically readable form.
I also have *Paradise Lost* in the CCAT Archive, and formatted it
into TLG Beta Code ID form last week, checking the results against
a library edition. It probably took me about an hour, including
making sure that every line began in upper case and that the
"paragraph" type breaks in the poetry were indicated. This text
will be on the CD-ROM and is available on IBM diskette to anyone
who would like it for $25 (CCAT minimum charge) and who agrees
(by signing the CCAT Users Contract) to use it non-commercially
and responsibly. One person's "rubbish" is another's treasure.
Some of the happiest hours of my weekend life have been spent in
junkyards. Incidentally, CCAT texts come with a "convert" program
to permit the user to change the file so that explicit book/line
locators are inserted at the left margin of each line. This type
of software development permits us to be consistent and frugal
about coding the IDs without inconveniencing the user who might
otherwise be mystified by the implicit nature of the ID system.
To leave that task in the hands of others made no sense to us.
We will handle "SGML-type" markup requests similarly, for existing
textual materials.
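 
A minimal sketch of a "convert" step of this kind, in Python.  The
implicit scheme assumed here (a "#book N" line starting each book,
with line numbers implied by position) is made up for illustration; it
is not the TLG Beta Code ID form, and add_locators() is a hypothetical
name, not the CCAT program.

```python
def add_locators(lines, book_marker="#book "):
    """Prefix each verse line with an explicit 'book.line' locator.

    Assumes a made-up implicit scheme: a line such as '#book 2' starts
    a new book, and every following line is the next line of that book.
    """
    out, book, line_no = [], 0, 0
    for line in lines:
        if line.startswith(book_marker):
            book = int(line[len(book_marker):])
            line_no = 0
            continue  # the marker itself is not a line of verse
        line_no += 1
        locator = f"{book}.{line_no}"
        # Pad the locator so the text starts in a fixed column.
        out.append(f"{locator:<8}{line}")
    return out


implicit = ["#book 1", "first line", "second line", "#book 2", "first line"]
for row in add_locators(implicit):
    print(row)
```

Keeping the IDs implicit in the stored file and generating the explicit
locators on request is the frugality described above: the archive codes
each boundary once, and the user still sees a reference on every line.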
 
If people want concrete information about the issues raised in
the current HUMANIST discussions, just ask. Few of the issues are
hypothetical, at least to those of us already engaged in archiving,
(re)coding, formatting, and distributing -- not to mention searching
for funding and other types of support!
 
Bob Kraft
=========================================================================
Date:         9 December 1987, 21:31:31 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Rage for chaos, or, in praise of polymorphic encoding (50 lines)
 
Contributed by Sebastian Rahtz <CMI011@IBM.SOUTHAMPTON.AC.UK>
 
I have just been reprieved from the gallows! I had approx 110 HUMANIST
messages from the last couple of weeks in my mailbox which I hadn't
really read, and I had been planning to print out the whole lot
and read it at home tonight. Due to a combination of
unfortunate circumstances (daily backup, me reading my mail etc),
I have now lost the whole damned lot!
I feel so relieved! Can I put in my trivial pennyworth, tho?
 
Let's face it, the reason I dislike SGML so much is that it's UGLY. But however
important it all is, could those who care about text archives gather in
a corner away from HUMANIST for a while? I was under the impression that
there was a conference about it recently, so is there a need for the
same people to discuss it in public..... it all reminds me of archaeology.
Some years ago field archaeologists in Britain used to bicker at every
opportunity about standardisation of recording methods, and all the same
arguments were trotted out every time. No-one ever agreed, various people
said they would set up global answers, and even now there remain a multiplicity
of schemes. Why did it all fail? Because the problem was really that people
didn't know why they were collecting the data in the first place....
I for one no longer believe in absolute recording; I believe that each
excavation record, or each encoded text, is a reflection of its creator, not the
real world.
 
But I apologize for dipping my toe in the text-encoding water;
I vote for chaos, though, when the chips are down. Why? Because
I used to be an archaeologist, and therefore I am interested in
historical processes not in fossilisation. In the same way that I
would have anyone who wanted walk all over Stonehenge because 20th C
destruction of monuments is itself archaeology, so I wouldn't shed many tears
if Lou Burnard's archive went up in flames (sorry, Lou), because
the variety of texts lost is itself interesting (would we compare
the loss of Lou's tapes to the destruction of Alexandria?).
 
People who try and impose 'standards' on the world are basically
misguided--variety is the spice of life. Sorry, is there some NEED
to analyse all texts in the world NOW that I am not aware of? And there
was I thinking scholarship was only a joke.....
 
sebastian rahtz, computer science, southampton, UK
=========================================================================
Date:         9 December 1987, 21:50:33 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      More Markup etc. (62 lines)
 
Contributed by amsler@flash.bellcore.com (Robert Amsler)
 
I must admit I do have many reservations about the feasibility of
coming up with a universal metalanguage for all markup schemes. I
think this is first an empirical problem though, not a theoretical
one. We need to know what markup systems are in use and how much
text is/could be available in these systems. That will determine how
much effort should be made to accommodate their markup system in any
future standard.  The exact trade-off point between developing a
parser to read text marked up in an inadequate markup language and
then adding `useful' markup vs. starting over and typing the text in
with the `right' markup is a hard one to specify.  Dictionaries are
on the `use any machine-readable copy' side by a massive amount (i.e.
probably the data entry effort is ten or a hundred times the effort
of the `figure out how to use what they have' effort).
 
However there is still another issue here, the likelihood that anyone
else will want to markup the text in a manner that you would find
completely satisfying. There seems to me to be a large range of
variations in descriptive markup from noting simple text units to
noting full interpretive tagging of historical and symbolic meaning
`believed' to be associated with certain parts of a text. The
inference I get from James Coombs' side is that there is somehow an
easily understood common agreement as to what should be marked in a
text. I am not certain I agree with that when one leaves the domain
of markup which recreates the visible form of the original document
and enters the interpretive tagging area. In fact, I would define
`inadequate' markup as markup from which one cannot recreate the
original form of the document--regardless of whether it is
descriptive or procedural markup.
 
What I've been concerned about is that one cannot tag a text with all the
descriptive markup that everyone might want to be there. Could
anyone imagine a historic text being published with ALL the
commentary upon its meaning being interspersed in the text? We'd
have to have tags with authors' names on them and maybe even dates.
 
I think perhaps what is needed is a means of integrating interpretive
tags with a rather sparsely marked up version of a document. That
is, having a tag set which is stored independently of the text to
which it refers and which can readily be sorted into the linear
sequence of the document as desired. In fact I even imagine a
futuristic world in which a scholar can distribute ONLY their tag
set for a well-known work, such that the recipients can study it
with their copy of the original text on a variety of software and
hardware systems. Some might simply elect to have the `annotated'
text printed out on paper for study--others to have it loaded into a
hypertext system for interactive reading on-line.
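
A minimal sketch of the idea above, offered only as an illustration: tags
are kept apart from the text as (start, end, label, author) records and are
sorted into the linear sequence of the document on demand. The Tag record,
the merge function, the character-offset addressing and the angle-bracket
output format are all assumptions made for this example; none of them comes
from an existing markup scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """One stand-off annotation (hypothetical record layout)."""
    start: int    # character offset where the tagged span begins (inclusive)
    end: int      # character offset where the tagged span ends (exclusive)
    label: str    # what is being claimed about the span
    author: str   # who made the claim


def merge(text, tags):
    """Interleave independently stored tags with the text in document order."""
    events = []
    for t in tags:
        events.append((t.start, 1, f"<{t.label} by={t.author}>"))
        events.append((t.end, 0, f"</{t.label}>"))
    # Sort by offset; at equal offsets, closing markers (0) precede opening ones (1).
    events.sort(key=lambda e: (e[0], e[1]))

    out, pos = [], 0
    for offset, _, marker in events:
        out.append(text[pos:offset])  # untouched text up to the marker
        out.append(marker)
        pos = offset
    out.append(text[pos:])            # remainder of the text
    return "".join(out)


text = "Of Mans First Disobedience"
tags = [Tag(3, 7, "theme", "RA")]
annotated = merge(text, tags)
```

Because the text itself is never modified, two scholars could exchange only
their tag lists and apply them to their own copies, provided the copies
agree character for character.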
 
 
Robert Amsler
Bellcore
Morristown, NJ
=========================================================================
Date:         9 December 1987, 23:06:31 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Heisenbergian mark-up (46 lines)
 
Contributed by Willard McCarty <mccarty@utorepas.bitnet>
 
Here's a brief and probably one-sided observation about textual
mark-up, offered by someone interested chiefly in the themes and
images of literary texts rather than in their syntactic
structures or physical features.
 
Regardless of the medium, when I mark up a text for
interpretation I am doing something like reading it, that is,
taking it in, attaching to its words things I know, discover, or
think about, and preserving all that along with the original
text. I want to mark-up my own text because (a) marking-up in my
sense is primarily an intellectual, not a mechanical activity,
and (b) it is utterly dependent on some hypothetical construct I
have or am developing. (Building this construct may owe a debt to
things that can be counted, hence "objectively" tagged, but the
construct cannot be verified by relating it to countable things.)
At the same time I must always keep a clear distinction between
the words as the author or editor has given them, and if I'm
doing this electronically with proper software, I have the
liberty of erasing easily the remnants of interpretation I no
longer respect.
 
Note that I am not making a distinction here between an
"objective" text and "subjective" commentary; that distinction
misses the point of literary criticism altogether.
 
So, I don't want anybody's scheme for marking up (in my sense),
and I don't expect my marked up text to be of interest to anybody
either. Nevertheless, if I'm successful, the final result (an
essay or book) will say something valuable to others.
 
Can it be said that there are aspects of textual mark-up that do
not have to take interpretation into account at all? Sebastian
Rahtz has suggested that there aren't.
 
Willard McCarty, Univ. of Toronto (mccarty@utorepas.bitnet)
=========================================================================
Date:         10 December 1987, 09:15:25 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Copyright-free Texts Wanted (130 lines)
 
Contributed by amsler@flash.bellcore.com (Robert Amsler)
 
 
One project I and some others at Bellcore are interested in is an effort
to integrate a dictionary with citations to texts with these texts.
The OED is the dictionary we have in mind, though I am also working with
the Century Dictionary (not yet in machine-readable form) and other
dictionaries such as the Collins English Dictionary, the Merriam-Webster
Seventh Collegiate Dictionary and the Oxford Advanced Learner's Dictionary.
 
`Integrate' here means to provide access to the complete textual work from
the dictionary definitions and vice versa, to provide access to the definitions
from within the textual work.
 
This is being envisioned as a form of hypertext access.
 
The primary requirement is to obtain textual works which are cited in
these dictionaries--which basically means most classical works in English.
 
Appended to this message is a list of the most frequently cited authors and
works from the OED (compiled by actually searching the OED database thanks
to Frank Tompa's help).
 
29140 citations - Shakespeare
    1311 citations - Hamlet (1600-1)
    1034 citations - Love's Labour's Lost (1594-5)
     906 citations - 2 Henry IV (1590-91)
     877 citations - Merchant of Venice (1596-7)
     874 citations - King Lear (1605-6)
     868 citations - The Tempest (1611-12)
     865 citations - Romeo and Juliet (1594-5)
     862 citations - 1 Henry IV (1597-8)
     846 citations - Macbeth (1605-6)
     841 citations - Henry V (1598-9)
     834 citations - Othello (1604-5)
     821 citations - Merry Wives of Windsor (1599-1600)
     801 citations - Midsummer Night's Dream (1595-6)
     794 citations - King John (1596-7)
     779 citations - Richard III (1592-3)
     778 citations - Troilus and Cressida (1601-2)
     775 citations - As You Like It (1599-1600)
     705 citations - Measure for Measure (1604-5)
15499 citations - Scott, Sir Walter
     890 citations - The Heart of Midlothian (1817) [Novel]
     880 citations - The Fair Maid of Perth (1828) [Novel]
     694 citations - Guy Mannering (1815) [Novel]
     644 citations - The Antiquary (1816) [Novel]
     616 citations - Kenilworth (1821) [Novel]
     599 citations - Lady of the Lake (1810) [Poem]
     592 citations - Waverley (1814) [Novel]
     543 citations - Rob Roy (1817) [Novel]
     532 citations - Old Mortality (1816) [Novel]
     490 citations - Marmion (1808) [Poem]
     474 citations - The Monastery (1820) [Novel]
     428 citations - Ivanhoe (1820) [Novel]
     405 citations - Quentin Durward (1823) [Novel]
     344 citations - Lord of the Isles (1815) [Poem]
     328 citations - Woodstock (1826) [Novel]
11967 citations - Milton, John
    4945 citations - Paradise Lost
     648 citations - Samson Agonistes (1671) [Poem]
     640 citations - Paradise Regained (1671)
     625 citations - Comus (1634) [Poem]
             (A Maske presented at Ludlow Castle 1634:
             on Michaelmasse night etc.)
11000 citations - Chaucer
    1238 citations - Troylus (Troilus ? and Criseyde) (1382?)
             [8200 line poem]
     986 citations - (Translation of Boeth(ius)'s
             ``Consolation of Philosophy'') (1380?) [Prose]
     877 citations - The Legend of Good Women (1382)
     663 citations - Prologue (to The Legend of Good Women)
     549 citations - The Knight's Tale
     506 citations - The House of Fame
10759 citations - Wyclif
    1166 citations - Selected Works
    1072 citations - Works
     713 citations - Sermons
     474 citations - Genesis
     420 citations - Isa
     413 citations - Matt
     315 citations - Ecclus
     306 citations - Ps
     278 citations - Luke
     265 citations - Prov
 9554 citations - Caxton
    1282 citations - The Golden Legend (1483)
     718 citations - The Foure Sonnes of Aymon (1489?)
     668 citations - The boke yf (= of) Eneydos (1490)
     639 citations - The chronicles of englond (1480)
     610 citations - The historie of Jason (1477)
     457 citations - Geoffroi de la Tour l'Andri
             (the knyght of the toure) (1483)
     399 citations - The historye of reynard the foxe (1481)
     399 citations - The book of fayttes of armes and of chyualrye (1483)
 8745 citations - Dryden
11041 citations - <anonymous>
   11041 citations - Cursor Mundi
 5385 citations - <an
    5385 citations - Promptorium Parvulorum sive clericorum,
             lexicon. Anglo-Latinum princeps (1440) [Dictionary]
 4682 citations -
    4682 citations - Aeneis
 3090 citations - Homer
    3090 citations - Iliad
 5161 citations - Langland (or Langley), William
    5161 citations - Piers (the) Plowman,
             (The Vision of William Concerning)
             (written 1399?; printed 1550)
      citations - Southey, Robert
     428 citations - Letters (1856)
     342 citations - Joan of Arc (1796)
     319 citations - Thalaba the Destroyer (1801)
     277 citations - The Doctor (1834-47)
     232 citations - Roderick, The Last of the Goths (1814)
 5133 citations - Chambers, Ephraim
    3531 citations - Cyclopedia; or, an universal dictionary
             of arts and sciences (1728)
    1476 citations - Supplement -- Cyclopedia; or, an universal
             dictionary of arts and sciences (1753)
 
=========================================================================
Date:         10 December 1987, 10:46:16 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Two confusions about text-markup standards (30 lines)
 
From  "Michael Sperberg-McQueen"   <U18189@UICVM>
 
Sebastian Rahtz says:
   People who try and impose 'standards' on the world are basically
   misguided--variety is the spice of life.
 
This may be true, but I should point out that no one on HUMANIST is
trying to impose standards or anything else on anyone.  As if anyone
could!  And if anyone must rely on the differences between TLG and SGML
methods for encoding chapter headings to give spice to intellectual
life, humanities computing is in even deeper trouble than I thought.
 
Willard McCarty says:
  So, I don't want anybody's scheme for marking up [...] and I don't
  expect my marked up text to be of interest to anybody either.  [...]
  At the same time I must always keep a clear distinction between
  the words as the author or editor has given them, and if I'm
  doing this electronically with proper software, I have the
  liberty of erasing easily the remnants of interpretation I no
  longer respect.
 
Would a conventional set of markup rules restrict one's freedom more
than the conventional alphabets and syntax we already use?  But the
crucial point is that the "proper software" you describe cannot do its
work without *some* encoding scheme.  We have the choice of all of us
developing software independently, so as to ensure that we use different
schemes and make certain that once you have marked up your text with
your software you cannot concord it with my software, and vice versa, or
we can try to find a framework that allows sharing and flexibility.
It is not standardization but chaos that produces rigidity and destroys
freedom.
 
-Michael Sperberg-McQueen
 University of Illinois at Chicago
=========================================================================
Date:         10 December 1987, 12:23:15 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      "electronic paralanguage" (100 lines)
 
From Dr Abigail Ann Young      1-416-585-4504       YOUNG    at UTOREPAS
 
 
This notice appeared in IRList, and I found it sufficiently
interesting to pass along to HUMANIST.  I apologise in advance
to those who will be getting it twice!
**********************************************
Date:     Sun,  6-DEC-1987 12:38 EST
<contributor>     Janet F. Asteroff <ASTEROFF@CUTCV1>
Subject:  Computer-mediated Communication
 
 
 . . .
 
I recently completed my dissertation on paralanguage in electronic
mail, the abstract of which is appended to this posting. I found,
among the 16 people I studied, many forms of "extra expression" in
the form of "paralanguage." Ultimately, I documented enough
differences between writing on the computer and writing through other
media to identify it as "electronic paralanguage" with its own formal
definition.
 
Many people believe that face-to-face communication is the richest
form of communication because of the variety of signals and channels,
and as well the potential for channel redundancy. I have no problem
with this assumption. I do, however, take issue with comparing other
forms of communication to face-to-face and then judging any other
medium as "information poor." Some scholars of computer-mediated
communication carry this negative frame of reference over to their
own work. While the computer may not provide as many channels as
face-to-face communication, and the channel itself may be somewhat
more limited, there is considerable research to indicate that
computer users have done some interesting things to convey their
meaning and message.
 
Since I am not a fan of clogging up boards with long messages,
anyone interested in my work can contact me directly at
Asteroff@cutcv1.bitnet and I will be happy to send you more material.
 
The dissertation is also available through University Microfilms:
 
Janet F. Asteroff, "Paralanguage in Electronic Mail: A Case Study."
Teachers College, Columbia University, May, 1987.
 
 
/Janet
(Asteroff@cutcv1.bitnet)
 
 
                               ABSTRACT
 
 
                   PARALANGUAGE IN ELECTRONIC MAIL:
 
                             A CASE STUDY
 
 
                          Janet F. Asteroff
 
 
This study explores the use of paralanguage in electronic mail
communication. It examines the use of paralanguage according to the
electronic mail and computing experience and technical expertise of
16 library science graduate students who fall into two groups by rank
of experience, novice and advanced.  These respondents used
electronic mail in a non-elective and task-related situation to
communicate with their instructor. This case study is based on a
multi-level qualitative content analysis of the electronic mail
exchanged between the respondents and the instructor, and the
attitudes and experiences of the respondents about their use of
electronic mail and computers.  This research interprets the roles
and functions of paralanguage in computer-mediated communication and
explores the phenomenon as an indicator of certain kinds of
expression.
 
Paralanguage is a component of spoken, written, and electronic
communication.  It gives to what is being communicated a character
over and above that which is necessary to convey meaning in the
linguistic or grammatical sense.  Paralanguage in electronic mail is
positioned between spoken and written paralanguage in its visual and
interpretive structures. Electronic paralanguage, a term developed to
describe paralanguage in computer-mediated communication, is defined
as: features of written language which are used outside of formal
grammar and syntax, and other features related to but not part of
written language, which through varieties of visual and interpretive
contrast provide additional, enhanced, redundant or new meanings to
the message.
 
Electronic paralanguage is revealed to be a component of
communication which in some situations showed substantial differences
by the rank of the respondent, as well as differences in individual
behaviors. Novice respondents used more paralanguage in more types of
messages than did advanced respondents.  Electronic paralanguage also
provides a robust picture of the character of communication.  The use
of exclamation points by novice respondents in task-related messages
showed that electronic paralanguage can in certain cases be a general
measure of stress and experience, and as well is a precise indicator
of different kinds of positive and negative psychological stress.
 
------------------------------
=========================================================================
Date:         10 December 1987, 16:25:16 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      More on ACH Standards for Markup  (63 lines)
 
From Nancy Ide <IDE@VASSAR>
>
In response to Jim Coombs' comments and questions:
>
(1) My messages may have been sent out in the wrong order, but I meant to make
it clear that we fully agree with Jim's assertion that we cannot know all of our
needs yet, and for that reason the standard will be extensible--and we hope to
make user-defined extensions far easier to deal with than AAP does. But as Bob
Amsler pointed out, a standard which specifies so little that most researchers
end up inventing their own tags anyway is not of much use.  Bob also mentioned
that our subcommittees are going to have to work closely together to avoid
redundancy and to make note of places where alternate descriptions, related to
different applications, describe what may be physically the same
thing--something Jim was concerned about.  I think Bob's idea of a "data
dictionary" is excellent and I hope we can implement it in the course of
development.
 
(2) By "multiple parallel hierarchies" I mean something along the lines of what
Jim outlined--except that his examples all use nicely nested entities.  The
problems arise when we have overlapping entities, for example,
 
  <b-np> the <b-th> dog <e-np> ran <e-th>
 
where "<b-np>" and "<e-np>" mark the beginning and end of a noun phrase,
respectively, and "<b-th>" and "<e-th>" mark the beginning and end of another
entity--say, a thematic unit of some kind.  The context-free syntax of SGML
cannot handle this, and so special mechanisms are required to enable multiple
tag sets in which overlapping entities may be specified.  As I mentioned, these
exist in SGML but are not entirely straightforward, from my understanding.
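
To make the overlap problem concrete, here is a rough sketch of a checker
that reads the <b-x>/<e-x> notation used in the example above and reports
the first place where an end tag does not match the most recently opened
entity. The function name and the regular expression are assumptions made
for this illustration; it is not an SGML parser and says nothing about how
SGML's own mechanisms for multiple tag sets work.

```python
import re


def first_overlap(marked_up):
    """Return (still_open, closed) for the first badly nested end tag, else None."""
    stack = []  # names of entities opened but not yet closed
    for kind, name in re.findall(r"<([be])-(\w+)>", marked_up):
        if kind == "b":
            stack.append(name)
        elif not stack or stack[-1] != name:
            # The end tag closes something other than the innermost open entity.
            return (stack[-1] if stack else None, name)
        else:
            stack.pop()
    return None


overlapping = "<b-np> the <b-th> dog <e-np> ran <e-th>"
nested = "<b-th> <b-np> the dog <e-np> ran <e-th>"
problem = first_overlap(overlapping)
clean = first_overlap(nested)
```

For the overlapping string the check stops at <e-np>, because the thematic
unit "th" is still open inside the noun phrase; the nested string passes.
A single stack is exactly what a context-free grammar provides, which is
why one hierarchy cannot express both entities at once.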
 
(3) The ultimate goal of an attempt to develop some formal description of
existing schemes is to facilitate the development of translation programs to
translate old formats into the new one.  I sympathize with Jim's feeling that we
shouldn't spend so much time converting the past, but I also understand, after
spending 48 hours with the keepers of the European and Middle-eastern archives
which house millions of words of machine-readable text, that it is not possible
to mount this effort without considering what to do about texts that already
exist in machine-readable form.  I should also point out that at the end of the
two-day meeting in Poughkeepsie we had a discussion about establishing a North
American archive, but by that time many participants had left and those who
remained had little energy left to address the issue vigorously.  However I
understood those who spoke to say that they didn't feel the need to establish
such an archive, and that in any case the Oxford model (where no guarantees of
quality are made) is sufficient.  I personally tend to agree with Jim that an
archive--North American or better yet, international--should be established in
which texts are "guaranteed," and which more importantly serves as a central
clearinghouse.  Oxford does this as well as it can now but without considerably
more funding cannot more vigorously pursue the acquisition and creation of
machine-readable texts nor ensure that they are both correct and tagged in a
standard form.
 
Nancy Ide
ide@vassar
=========================================================================
Date:         10 December 1987, 23:11:09 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      A new HUMANIST GUIDE (268 lines)
 
Dear Colleagues:
 
A somewhat revised version of the guide to HUMANIST follows. As always
your comments are welcome, to mccarty@utorepas.bitnet.
Yours, W.M.
 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
                     A Guide to HUMANIST
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
        C O N T E N T S
 
 I. Nature and Aims
II. How to use HUMANIST
    A. Sending and receiving messages
    B. Conventions and Etiquette
    C. Distributing files
    D. ListServ's commands and facilities
    E. Suggestions and Complaints
=================================================================
I. Nature and aims
=================================================================
Welcome to HUMANIST, a Bitnet/NetNorth/EARN discussion group for
people who support computing in the humanities. Those who teach,
review software, answer questions, give advice, program, write
documentation, or otherwise support research and teaching in this
area are included. Although HUMANIST is intended to help these
people exchange all kinds of information, it is primarily meant
for discussion rather than publication or advertisement.
 
HUMANIST is an activity of the Special Interest Group for
Humanities Computing Resources, which is in turn an affiliate of
both the Association for Computers and the Humanities (ACH) and
the Association for Literary and Linguistic Computing (ALLC).
Although participants in HUMANIST are not required to be members
of either organization, membership in them is highly recommended.
 
HUMANIST currently has more than 170 members in 10 countries
around the world.
 
In general, HUMANISTs are encouraged to ask questions and offer
answers, to begin and contribute to discussions, to suggest
problems for research, and so forth. One of the specific
motivations for establishing HUMANIST was to allow people
involved in this area to form a common idea of the nature of
their work, its requirements, and its standards. Institutional
recognition is not infrequently inadequate, at least partly
because computing in the humanities is an emerging and highly
cross-disciplinary field. Its support is significantly different
from the support of other kinds of computing, with which it may
be confused. It does not fit easily into the established
categories of academia and is not well understood by those from
whom recognition is sought.
 
Apart from the general discussion, HUMANIST encourages the
formation of a professional identity by maintaining an informal
biographical directory of its members. This directory is
automatically sent to new members when they join. Supplements are
issued whenever warranted by the number of new entries. Members
are responsible for keeping their entries updated.
 
Those from any discipline in or related to the humanities are
welcome, provided that they fit the broad guidelines described
above. Please tell anyone who might be interested to send a
message to me, giving his or her name, address, telephone number,
and a short biographical description of what he or she does to
support computing in the humanities. This description should
cover academic background and research interests, both in
computing and otherwise; the nature of the job this person holds;
and, if relevant, its place in the university.
 
Please direct applications for membership in HUMANIST to
MCCARTY@UTOREPAS.BITNET, not to HUMANIST itself.
=================================================================
II. How to Use HUMANIST
=================================================================
    A. Sending and receiving messages
-----------------------------------------------------------------
Although HUMANIST is managed by software designed for
Bitnet/NetNorth/EARN, members can be on any comparable network
with access to Bitnet, for example, Janet or Arpanet. Users on
these networks suffer only slight restrictions, which will be
mentioned below.
 
Submissions to HUMANIST are made by sending electronic mail as if
to a person with the user-id HUMANIST and the node-name UTORONTO.
All valid submissions are sent to every member, without
exception. The editor of HUMANIST screens submissions only to
prevent the inadvertent distribution of junk mail, which would
otherwise be a serious problem for such a highly complex web of
individuals using a wide variety of computing systems linked
together by several different electronic networks. The editor
will usually pass valid mail on to the membership within a few
hours of submission.
 
The volume of mail on HUMANIST varies with the state of the
membership and the nature of the dominant topic, if any. Recent
experience shows that as many as half a dozen or more messages a
day may be processed. For this reason members are advised to pay
regular, indeed frequent attention to their e-mail or serious
overload may occur. A member planning on being away from regular
contact should advise the editor and ask to be temporarily
removed from active membership. The editor should be reminded
when active membership is to be resumed.
 
The editor also asks that members be careful to specify reliable
addresses. In some cases the advice of local experts may help.
Any member who changes his or her userid or nodename should first
give ample warning to the editor and should verify the new
address. If you know your system is going to be turned off or
otherwise adjusted in a major way, find out when it will be out
of service and inform the editor. Missed mail can be retrieved,
but undelivered e-mail will litter the editor's mailbox.
 
[Please note that in the following description, commands will be
given in the form acceptable on an IBM VM/CMS system. If your
system is different, you will have to make the appropriate
translation.]
-----------------------------------------------------------------
    B. Conventions and Etiquette
-----------------------------------------------------------------
Conversations or asides restricted to a few people can develop
from the unrestricted discussions on HUMANIST by members
communicating directly with each other. This may be a good idea
for very specific replies to general queries, so that
members are not burdened with messages of interest only
to the person who asked the question and, perhaps, a few others.
Members have, however, shown a distinct preference for
unrestricted discussions on nearly every topic, so it is better
to err on the side of openness. If you do send a reply to
someone's question, please restate the question very briefly so
that the context of your answer will be clear.
 
[Note that the REPLY function of some electronic mailers will
automatically direct a user's response to the editor, from whom
all submissions technically originate, not to the original sender
or to HUMANIST. Thus REPLY should be avoided in many cases.]
 
Use your judgment about what the whole group should receive. We
could easily overwhelm each other and so defeat the purpose of
HUMANIST. Strong methods are available for controlling a
discussion group, but the lively, free-ranging discussions made
possible by judicious self-control seem preferable. Controversy
itself is welcome, but what others would regard as tiresome junk-
mail is not. Courtesy is still a treasured virtue.
 
Make it an invariable practice to help the recipients of your
messages scan them by including a SUBJECT line in your message.
Be aware, however, that some people will read no more than the
SUBJECT line, so you should take care that it is accurate and
comprehensive as well as brief. If you can, note the length of
your message in the subject line. The resulting line should look
something like this:
 
  Subject: Textual archives and encoding (45 lines)
 
Use your judgment about the length of your messages. If you find
yourself writing an essay or have a substantial amount of
information to offer, it might be better to follow one of the two
methods outlined in the next section.
 
All contributions should also specify the member's name as well
as e-mail address. This is particularly important for members
whose user-ids bear no relation to their names.
-----------------------------------------------------------------
    C. Distributing files
-----------------------------------------------------------------
HUMANIST offers us an excellent means of distributing written
material of many kinds, e.g., reviews of software or hardware.
(Work is now underway to provide this service for reviews.)
Although conventional journals remain the means of professional
recognition, they are often too slow to keep up with changes in
computing. With some care, HUMANIST could provide a supplementary
venue of immediate benefit to our colleagues.
 
There are two possible methods of distributing such material.
More specialized reports should probably be reduced to abstracts
and posted in this form to HUMANISTs at large, then sent by the
originators directly to those who request them. The more
generally interesting material in bulk can be sent in an ordinary
message to all HUMANISTs, but this could easily overburden the
network so is not generally recommended. We are currently working
on a means of centralized storage for relatively large files,
such that they could be fetched by HUMANISTs at will, but this
means is not yet fully operational.
 
At present the only files we are able to keep centrally are the
monthly logbooks of conversations on HUMANIST. See the next
section for details.
-----------------------------------------------------------------
    D. ListServ's Commands and Facilities
-----------------------------------------------------------------
As just mentioned, ListServ maintains monthly logbooks of
discussions. Thus new members have the opportunity of reading
contributions made prior to joining the group. To see a list of
these logbooks, send the following command:
 
          TELL LISTSERV AT UTORONTO SENDME HUMANIST FILELIST
 
Note that in systems or networks that do not allow interactive
commands to be given to a Bitnet ListServ (I will call such
systems "non-interactive"), the same thing can be accomplished
by sending a message to HUMANIST with the command as the first
and only line, which should read as follows:
 
          GET HUMANIST FILELIST
 
The logbooks are named HUMANIST LOGyymm, where "yy" represents
the last two digits of the year and "mm" the number of the month.
The log for July 1987 would, for example, be named HUMANIST
LOG8707, and to get this log on a system that supports
interactive commands to HUMANIST you would issue the following:
 
           TELL LISTSERV AT UTORONTO GET HUMANIST LOG8707
 
On a non-interactive system, you would send HUMANIST a message
with the following line:
 
           GET HUMANIST LOG8707
 
Note that on a non-interactive system as many of these one-line
commands as you wish can be put in a message to HUMANIST.
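
[The LOGyymm naming rule can be sketched in a few lines of Python. This is
only an illustration: the function names are invented for the example, and
the command strings simply follow the two forms shown above.]

```python
def logbook_name(year, month):
    """Build the logbook name HUMANIST LOGyymm for a given year and month."""
    return f"HUMANIST LOG{year % 100:02d}{month:02d}"


def get_command(year, month, interactive=True):
    """Return the command that fetches one monthly logbook.

    interactive=True gives the TELL form for systems that can send
    interactive commands; otherwise the bare one-line form for a message.
    """
    request = f"GET {logbook_name(year, month)}"
    if interactive:
        return f"TELL LISTSERV AT UTORONTO {request}"
    return request


july_name = logbook_name(1987, 7)
july_tell = get_command(1987, 7)
july_mail = get_command(1987, 7, interactive=False)
```

[For July 1987 this yields HUMANIST LOG8707, matching the example name
given above.]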
 
ListServ accepts several other commands, for example to retrieve
a list of the current members or to set various options. These
are described in a document named LISTSERV MEMO. This and other
documentation will normally be available to you from your nearest
ListServ node and is best fetched from there, since in that way
the network is least burdened. You should consult with your local
experts to discover the nearest ListServ; they will also be able
to help you with whatever problems in the use of ListServ you may
encounter.
 
Once you have found the nearest node XXXXXX, type the following:
 
                   TELL LISTSERV AT XXXXXX INFO ?
 
or, on a non-interactive system:
 
                   INFO ?
 
The various documents available to you will then be listed.
-----------------------------------------------------------------
    E. Suggestions and Complaints
-----------------------------------------------------------------
Suggestions about the running of HUMANIST or its possible
relation to other means of communication are very welcome. So are
complaints, particularly if they are constructive. Experience has
shown that an electronic discussion group can be either
beneficial or burdensome to its members. Much depends on what the
group as a whole wants and does not want. Please make your views
known, to everyone or to me directly, as appropriate.
 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Willard McCarty,                                 8 December 1987
Editor of HUMANIST,
Centre for Computing in the Humanities,
University of Toronto
(MCCARTY@UTOREPAS.BITNET)
=========================================================================
Date:         10 December 1987, 23:16:27 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Reply to Rahtz's pro-chaos message (30 lines)
 
From Robert Amsler <amsler@flash.bellcore.com>
 
Actually, `THE' standard for encoding text already exists, so I'm afraid it
doesn't really matter whether we like it or not. I speak of SGML itself,
ISO standard 8879, which was approved Sept., 1986. The AAP (Association of
American Publishers) has also completed work on an application of the SGML
standard, the so-called AAP standard, which itself will soon be adopted
whether or not humanities computing professionals care. The good news for
chaos fans is that so far very few publishers have made much of an effort
to convert to the AAP standard or to pledge to make their data available
in that standard. Some agencies of the US government (such as NSF, NLM,
etc.) are making noises about accepting electronic texts in the standard,
and some software merchants (SoftQuad) have marketed programs which use
the standard for typesetting, editing, etc.
 
So... what remains? The AAP standard (and SGML itself) is based on the
concept of `document types', each having its own appropriate set of `tags'.
The document types which have been created so far are only the most generic
sort, for magazine articles and books, though they contain specs for
tables and math. equations. The humanities community has expressed no
preferences so far, such as by developing its own document types for
things such as plays, poetry, etc. The stage is set for humanists to have
a voice in the future of publishers' formats.
 
=========================================================================
Date:         11 December 1987, 09:00:16 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      The flavour of HUMANIST (22 lines)
 
[The following was sent me by Sebastian Rahtz. It is a quotation
from the Archaeological Information Exchange and applies well
to the kind of thing HUMANIST is, or at least what I think it is.
It is quoted with gratitude but without permission. --ed.]
 
"An archaeological information exchange network should avoid programmatic
constraints, thereby maintaining the sense of immediacy, the ebb and
flow of discourse and activity which represent the situational flux
of daily life, while at the same time providing explicit points of
reference in order to prevent total chaos."
 
[Brian Molyneaux]
=========================================================================
Date:         11 December 1987, 10:22:59 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Text Archives/Markup/TeX
 
From  dow@husc6.BITNET (Dominik Wujastyk)
 
Just a point of information, relating to the current debate about markup
systems and text archives.  It was mentioned by Michael Sperberg-McQueen
that the First Folio has to be encoded right down to the typographical
level in order to be of maximum use.  This reminded me of an on-line
database of mathematical abstracts offered by the American Mathematical
Society, called MathSci (if I remember correctly).  The whole (large)
database is encoded in TeX, and a micro implementation of TeX is sold/given
to all subscribers to the system.  You dial up, search the database, and
download whatever you want, or can afford.  Then you run your entries
through TeX, which is sitting there quietly on your hard disk, and presto,
you have a typeset version of your mathematical texts.  You can view it on
your screen using a DVI previewer, or print it out to paper on anything
from a 9-pin matrix printer to a phototypesetter.  The important thing
in this is the different levels of encoding being represented.  The TeX
markup specifies the main structural elements of the datum, but the macro
package that is located with the TeX implementation (AMSTeX) controls the
interpretation of the tags in the database, right down to the positioning
of individual characters on the output medium.
 
Just a thought.
 
Dominik Wujastyk
dow@harvunxw.bitnet
 
=========================================================================
Date:         12 December 1987, 10:57:48 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Author's query on scholars and telecommunications (25 lines)
 
From Terry Erdt <ERDT@VUVAXCOM>
 
 For a book forthcoming from Paradigm Press, entitled The
Electronic Scholar's Resource Guide, I am putting together a
piece on telecommunications, which will include bulletin board
systems, libraries with catalogs capable of dial-up connections,
Humanet on Scholarsnet, BRS and Dialog, some forums on
CompuServe, Bitnet's IRList Digest, as well as, of course,
Humanist. I would appreciate any suggestions for broadening the
scope of coverage as well as any information about specific
resources.
 
Terrence Erdt                      Erdt@vuvaxcom.Bitnet
Technical Review Editor
Computers and the Humanities
Grad. Dept. of Library Science
Villanova University
Villanova, PA 19085
 
=========================================================================
Date:         12 December 1987, 14:59:03 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      No text or archive; joy in 29 lines
 
From Marshall Gilliland <GILLILAND@SASK>
 
                                   *
                               *  * *  *
                                **   **
                                **   **
                               *  * *  *
                                   *
                                  / \
                                 / v \
          Seasons             *--     --*                Happy
                               \ o @ & /
                               /       \
                              /      |  \
         Greetings          *--      O --*                New
                             /   *       \
                            /     *       \
                           /   %   * * *   \
           and           *--              --*             Year
                          / *   *   *   *  \
                         /   * Saskatoon * *\
                        /  |  * | * | * | *  \
                       /   O    &   $   @     \
                     *-------------------------*
                            |  |     |  |   |
                      _&_   %  |     |  Q   U
                     | / |     |_____|          _\@/_
                     |___|                     |  #  |
                                               |#SASK|
                                               |  #  == Marshall Gilliland
      _________________________________________|__#__|_____________________
=========================================================================
Date:         13 December 1987, 13:41:53 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      text encoding: a Thesaurus Linguae Graecae user's perspective.
 
Contributed by Brad Inwood <inwood@utorepas.bitnet> (124 lines)
 
Text mark up and coding have turned out to be THE issues for
HUMANISTs to get productively excited about.  It is, after all,
the most natural focus for the rather loose set of
interests shared by those who think of themselves as humanities
types.
 
Some observations:
 
The debate about what constitutes an adequate set of tags in a
machine-readable text is obviously a reflex of the interests
(and discipline) of the researcher.  It would be astounding if
we could agree about acceptable minimal markup. My own view is
that text archives need only hold texts which preserve streams
of words, minimal and transparent editorial markup to signal
emendations, restorations, etc.; flags for verse and metre; and
precise and unambiguous indications of the normal reference
format (Bekker pages, columns and lines for Aristotle; line
numbers for Greek tragedies; book and line for Homer; or
whatever is standard).  Where no standard reference style
descends to the level of single lines (as within chapters of
modern novels) line-level reference should be imposed by the
file itself (e.g. chapter 3, line 769: 769th line of that
chapter in the electronic file) rather than imported from the
text one happens to be inputting from -- even if it is what the
researcher regards as the best text.  If archives are to create
standard reference forms where none exist for printed editions
(as in this case) they should do so in a manner appropriate to
their medium, not the printed medium.
 
Researchers who require more markup should bloody well
add it themselves and not burden archives with worries about
anything more elaborate.
 
My own work with machine readable texts is limited to various
materials in the Thesaurus Linguae Graecae text base; my own
notions of minimally acceptable coding and entry format stem
from this experience.  Without pausing too much for the
rationale in each case, I would extract the following lessons:
 
1. preservation of information about page breaks and line ends in
the original text is not worth the effort.
 
2. it is particularly bad if one preserves that information at
the cost of retaining soft hyphens in the text which are of no
semantic significance but a mere product of the typesetter's
art.  contrary to what everyone says, it is not trivial to
strip them out in a global move; more important, software must
be made to do fancy tricks not to take line ends alone as
separators; to make it ignore hyphens at line end is even
harder.  yes, I know it can be done, but why must we bother?
removing hyphens from such text is the single most difficult and
time-consuming job I have had, and the one with the highest risk
of introducing errors into an already proof-read text.
 
3. the TLG preserves page layout information in a fanatical way:
e.g. it will tell you that a given line is to be indented by so many
tabs, but not that it is a verse quotation.  translating the
tabs to spaces is easy enough, but why not just have a tag to
mark verse and the particular metre?
 
4. markup for standard reference style is there in the TLG, but
inconsistently implemented from author to author.  in the
Platonic corpus, for example, Stephanus pages and columns are
declared at the head of each dialogue and subsequently
incremented by a flag; a special programme is needed to work out
that the 79th [x] after a declared [x21] actually means Stephanus
page 100.  Line numbers are usually suppressed (although they
are part of the standard reference format for Plato), but
occasionally lines are indicated explicitly.  No guidance is
given as to why this is so, when the much more important page
and column references are so badly handled.
 
5. never let anyone tell you that a decent proofreading job can
be done by someone who does not know the language well
or is not reading with attention to the sense.  the TLG
was keyboarded, not scanned, and yet broken characters in the
printed edition used have turned up as the wrong character in
the final corrected files (Burnet's Oxford Text of Plato is
still running on the original plates and there are a lot of
broken characters which no literate reader could mistake; but no one caught
the rho with the missing tail: looks just like an omicron and would
even have scanned as one; the best visual confirmation based on
comparison with the printed source would not reveal an error).
I fear that scholars are really going to have to have one round
of proofreading everything, so I am pleased that some HUMANISTs
feel that keyboarding Milton can be fun.
 
6. standard coding delimiters are needed which can be regarded
as non-separators by software (I guess that means that the opening
and closing delimiters must be characters NEVER used as delimiters,
parentheses or punctuation marks).  Otherwise the coding used to
mark a conjectural supplement will break the word when the text
stream is analyzed by software.
 
7. if you really want the kind of information which would make
an electronic text useful for serious editorial work (full
apparatus, notation of font changes, change of hands, etc.) then
it seems to me you need something more than an electronic text.
you probably need a hyper-text system of some sort or a
fantastically complex data-base-cum-text.  with custom software.
I start from the assumption that most users of electronic texts
want a clean accurate electronic copy, well-referenced, so that
they can mark it up for their own analytical purposes or search
for and analyze the words in it.  it is a job of an entirely
different order to prepare a data base which can be used to help
edit a text.  the TLG's omission of textual apparatus is much
lamented, and reasonably so; but in this case I think they got
it right.  better to get the text out and make it usable.  if
they had waited to settle on how to handle the apparatus in the
electronic text, (a) we would still be waiting for the TLG, and
(b) they would have had, in effect, to re-edit all of ancient
Greek literature, not just enter and correct.  the editorial
talent for that job simply does not exist.  the media via of
just typing the apparatus which happens to be in the text you
choose to keyboard or scan is a perfect example of falling
between two stools: not enough to make electronic analysis
possible, too much for absolute ease of use and speed of
production.
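 
To make point 4 concrete, here is a minimal sketch in Python of the
kind of `special programme' needed to turn incremental page flags
into explicit references. The bracket notation ([x21] to declare a
page, a bare [x] to increment it) follows the simplified description
in point 4 and is an assumption, not the actual TLG coding; the
function name make_pages_explicit is hypothetical.

```python
import re

# Assumed notation: "[x21]" declares page 21, a bare "[x]" means "next page".
FLAG = re.compile(r"\[x(\d*)\]")


def make_pages_explicit(text):
    """Rewrite every bare [x] flag as an explicit [xN] page reference."""
    current = None

    def replace(match):
        nonlocal current
        digits = match.group(1)
        if digits:
            # A declared page resets the running counter.
            current = int(digits)
        elif current is None:
            raise ValueError("increment flag found before any declared page")
        else:
            # A bare flag advances the counter by one page.
            current += 1
        return f"[x{current}]"

    return FLAG.sub(replace, text)


if __name__ == "__main__":
    sample = "[x21] alpha [x] beta [x] gamma"
    print(make_pages_explicit(sample))
```

With this counting rule, the 79th bare flag after a declared [x21]
comes out as page 100, which is the arithmetic the reader otherwise
has to do by hand.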
 
this is all pretty bitty, but that is how a user's experience
tends to come out, I guess.  maybe the peon's perspective will
be of some use when the theoretical issues threaten to get out
of hand -- or proportion.
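 
As an illustration of point 2 above, here is a naive sketch in Python
of rejoining words broken by a hyphen at line end. It assumes that
every line-final hyphen is a soft hyphen, which is exactly the
assumption that makes the job risky; the function name
join_hyphenated is hypothetical.

```python
def join_hyphenated(lines):
    """Rejoin words split by a line-end hyphen, moving them to the next line.

    Caution: a genuine hyphen that happens to fall at line end
    (e.g. "proof-" / "read") is merged too, which is wrong. Telling the
    two cases apart needs a word list or a human reader.
    """
    joined = []
    carry = ""  # word fragment carried over from the previous line
    for raw in lines:
        line = carry + raw.strip()
        carry = ""
        if line.endswith("-"):
            # Split off the last word fragment and hold it for the next line.
            head, _, fragment = line[:-1].rpartition(" ")
            carry = fragment
            line = head
        joined.append(line)
    if carry:
        # The text ended on a hyphen; keep the fragment rather than lose it.
        joined.append(carry)
    return joined


if __name__ == "__main__":
    for line in join_hyphenated(["line ends alone as sepa-", "rators in the text"]):
        print(line)
```

The sketch changes the line breaks of the file, so any line-level
reference scheme has to be fixed before or after this step, not
during it.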
 
Brad Inwood <inwood@utorepas.bitnet>
=========================================================================
Date:         14 December 1987, 23:49:56 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Gaudeamus igitur in about 200 lines
 
Dear Colleagues:
 
I've never before sent you a list of everyone who belongs to HUMANIST,
thinking that you could get this information yourself from ListServ at
any time. This time of year, however, motivates me to do so, I guess in
celebration of an unusual, international (though monolingual), once
noisy, sometimes argumentative, and to me always interesting fellowship.
So, to all of us listed below -- some directly, some hidden in
redistribution lists -- I wish a very happy Hanukkah and very merry
Christmas. I think the Buddha's birthday is also celebrated about this
time of year, and doubtless I have missed other holidays, for which
forgive me.
 
Yours, W.M.
--------------------------------------------------------------------------
*
* HUMANIST Discussion list - created 07 MAY 87
*
CJOHNSON@ARIZRVAX                       Christopher Johnson
OWEN@ARIZRVAX                           David Owen
ATMKO@ASUACAD                           Mark Olsen
ATPMB@ASUACAD                           Pier Baldini
CNNMJ@BCVMS                             M. J. Connolly
CHOUEKA@BIMACS                          Yaacov Choueka
ALLEN@BROWNVM                           Allen H. Renear
JAZBO@BROWNVM                           James H. Coombs
ST401742@BROWNVM                        Timothy Seid
HAMESSE@BUCLLN11                        Jacqueline Hamesse
THOMDOC@BUCLLN11                        CETEDOC Belgium
WORDS@BUCLLN11                          Robert Hogenraad
JONES@BYUADMIN                          Randall Jones
ECHUCK@BYUHRC                           Chuck Bush
H_JOHANSSON%USE.UIO.UNINETT@CERNVAX     Stig Johansson
BJORNDAS@CLARGRAD                       Sterling Bjorndahl
YOUNGC@CLARGRAD                         Charles M. Young
spqr@cm.soton.ac.uk                     Sebastian Rahtz
FKOCH%OCVAXA@CMCCVB                     Christian Koch
PRUSSELL%OCVAXA@CMCCVB                  Roberta Russell
mffgkts@cms.umrcc.ac.uk                 Tony Smith
nash@cogito.mit.edu                     David Nash
epkelly@csvax1.tcd.hea.irl              Elizabeth Dowse
MCCARTHY@CUA                            William J. McCarthy
JMBHC@CUNYVM                            Joanne M. Badagliacco
RSTHC@CUNYVM                            Robert S. Tannenbaum
TERGC@CUNYVM                            Terence Langendoen
MTKUS@CUVMA                             Mark Kennedy
RCLUS@CUVMA                             Robert C. Lehman
SLZUS@CUVMA                             Sue Zayac
cul.henry@cu20b.columbia.edu            Chuck Henry
cul.lowry@cu20b.columbia.edu            Anita Lowry
cul.woo@cu20b.columbia.edu              Janice Woo
humanist@edinburgh.ac.uk                Humanist Group
r.j.hare@edinburgh.ac.uk                Roger Hare
cameron@exeter.ac.uk                    Keith Cameron
amsler@flash.bellcore.com               Robert Amsler
GAUTHIER@FRTOU71                        Robert Gauthier
DOW@HARVUNXW                            Dominik Wujastyk
GALIARD@HGRRUG5                         Harry Gaylord
HUET@HUJIPRMB                           Emanuel Tov
ST_JOSEPH@HVRFORD                       David Carpenter
ayi017@ibm.soton.ac.uk                  Brendan O'Flaherty
cpi047@ibm.soton.ac.uk                  Simon Lane
fri001@ibm.soton.ac.uk                  Sean O'Cathasaigh
ayi004@ibm.southampton.ac.uk            Brian Molyneaux
CONSERVA@IFIIDG                         Lelio Camilleri
GG.BIB@ISUMVS                           Rosanne Potter
S1.CAC@ISUMVS                           Carol Chapelle
sano@jpl-vlsi.arpa                      Haj Sano
nick@lccr.sfu.cdn                       Nick Cercone
bol@liuida.uucp                         Birgitta Olander
RPY383@MAINE                            Colin Martindale
psc90!jdg@mnetor.uucp                   Joel Goldfield
humanist@mts.durham.ac.uk               Humanists' Group
CHADANT@MUN                             Tony Chadwick
DGRAHAM@MUN                             David Graham
H156004@NJECNVM                         Kenneth Tompkins
FAFEO@NOBERGEN                          Espen Ore
FAFKH@NOBERGEN                          Knut Hofland
collins@nss.cs.ucl.ac.uk                Beryl T. Atkins
g.dixon@pa.cn.umist.ac.uk               Gordon Dixon
KRAFT@PENNDRLN                          Robert Kraft
JACKA@PENNDRLS                          Jack Abercrombie
jld1@phx.cam.ac.uk                      John L. Dawson
PKOSSUTH@POMONA                         Karen Kossuth
sdpage@prg.oxford.ac.uk                 Stephen Page
T3B@PSUVM                               Tom Benson
BALESTRI@PUCC                           Diane P. Balestri
RICH@PUCC                               Richard Giordano
TOBYPAFF@PUCC                           Toby Paff
d.mitchell@qmc.ac.uk                    David Mitchell
BARNARD@QUCDN                           David T. Barnard
LESSARDG@QUCDN                          Greg Lessard
LOGANG@QUCDN                            George Logan
ORVIKT@QUCDN                            Tone Orvik
WIEBEM@QUCDN                            M. G. Wiebe
weinshan%cps.msu.edu@relay.cs.net       Donald Weinshank
GILLILAND@SASK                          Marshall Gilliland
JULIEN@SASK                             Jacques Julien
FRIEDMAN_E@SITVXA                       Edward A. Friedman
JHUBBARD@SMITH                          Jamie Hubbard
ZRCC1001@SMUVM1                         Robin C. Cover
GX.MBB@STANFORD                         Malcolm Brown
XB.J24@STANFORD                         John J. Hughes
ACDRLK@SUVM                             Ron Kalinoski
DECARTWR@SUVM                           Dana Cartwright
bs83@sysa.salford.ac.uk                 Max Wood
A79@TAUNIVM                             David Sitman
lb0q@te.cc.cmu.edu                      Leslie Burkholder
ECSGHB@TUCC                             George Brett
DUCALL@TUCCVM                           Frank L. Borchardt
DYBBUK@TUCCVM                           Jeffrey Gillette
SREIMER@UALTAVM                         Stephen R. Reimer
TBUTLER@UALTAVM                         Terry Butler
USERDLDB@UBCMTSG                        Laine Ruus
EGC4BFD@UCLAMVS                         Kelly Stack
IBQ1JVR@UCLAMVS                         John Richardson
IMD7VAW@UCLAMVS                         Vicky Walsh
IZZY590@UCLAVM                          George Bing
U18189@UICVM                            Michael Sperberg-McQueen
qghu21@ujvax.ulster.ac.uk               Noel Wilson
BAUMGARTEN@UMBC                         Joseph Baumgarten
J_CERNY@UNHH                            Jim Cerny
CLAS056@UNLCDC3                         John Turner
FELD@UOFMCC                             Michael Feld
CSHUNTER@UOGUELPH                       Stuart Hunter
AMPHORAS@UTOREPAS                       Philippa Matheson
ANDREWO@UTOREPAS                        Andrew Oliver
ANNE@UTOREPAS                           Anne Lancashire
BRAINERD@UTOREPAS                       Barron Brainerd
ERSATZ@UTOREPAS                         Harold Chimpden Earwicker
IAN@UTOREPAS                            Ian Lancashire
INWOOD@UTOREPAS                         Brad Inwood
MCCARTY@UTOREPAS                        Willard McCarty
ROBERTS@UTOREPAS                        Robert Sinkewicz
STAIRS@UTOREPAS                         Mike Stairs
WINDER@UTOREPAS                         Bill Winder
YOUNG@UTOREPAS                          Abigail Young
ZACOUR@UTOREPAS                         Norman Zacour
humanist@utorgpu.utoronto               Humanist Redistribution List
S_RICHMOND@UTOROISE                     S. Richmond
BRADLEY@UTORONTO                        John Bradley
DESOUS@UTORONTO                         Ronald de Sousa
ESWENSON@UTORONTO                       Eva V. Swenson
LIDIO@UTORONTO                          Lidio Presutti
PARROTT@UTORONTO                        Martha Parrott
PAULIE2@UTORONTO                        Test Account
42104_263@uwovax.uwo.cdn                Glyn Holmes
42152_443@uwovax.uwo.cdn                Richard Shroyer
IDE@VASSAR                              Nancy Ide
a_boddington@vax.acs.open.ac.uk         Andy Boddington
aeb_bevan@vax.acs.open.ac.uk            Edis Bevan
may@vax.leicester.ac.uk                 May Katzen
catherine@vax.oxford.ac.uk              Catherine Griffin
dbpaul@vax.oxford.ac.uk                 Paul Salotti
john@vax.oxford.ac.uk                   John Cooper
logan@vax.oxford.ac.uk                  Grace Logan
lou@vax.oxford.ac.uk                    Lou Burnard
stephen@vax.oxford.ac.uk                Stephen Miller
susan@vax.oxford.ac.uk                  Susan Hockey
v002@vaxa.bangor.ac.uk                  Thomas N. Corns
udaa270@vaxa.cc.kcl.ac.uk               Susan Kruse
wwsrs@vaxa.stir.ac.uk                   Keith Whitelam
ej1@vaxa.york.ac.uk                     Edward James
gw2@vaxa.york.ac.uk                     Geoffrey Wall
jrw2@vaxa.york.ac.uk                    John Wolffe
chaa006@vaxb.rhbnc.ac.uk                Philip Taylor
srrj1@vaxb.york.ac.uk                   Sarah Rees Jones
cstim@violet.berkeley.edu               Tim Maher
f.e.candlin@vme.glasgow.ac.uk           Francis Candlin
CHURCHDM@VUCTRVAX                       Dan M. Church
ERDT@VUVAXCOM                           Terry Erdt
fwtompa@watdaisy.uucp                   Frank Tompa
DDROB@WATDCS                            Don D. Roberts
VANEVRA@WATDCS                          James W. Van Evra
WALTER@WATDCS                           Walter McCutchan
drraymond@watmum.waterloo               Darrell Raymond
makkuni.pa@xerox.com                    Ranjit Makkuni
xeroxhumanists~.x@xerox.com             Humanists at Xerox
ELI@YALEVM                              Doug Hawthorne
YAEL@YKTVMH2                            Yael Ravin
DANIEL@YORKVM1                          Daniel Bloom
YFAN0001@YORKVM1                        Gerald L. Gold
YFPL0004@YORKVM1                        Shu-Yan Mok
YFPL0018@YORKVM1                        Paul Kashiyama
CS100006@YUSOL                          Peter Roosen-Runge
GL250012@YUVENUS                        Jim Benson
*
* Total number of users subscribed to the list:  168
=========================================================================
Date:         15 December 1987, 23:40:40 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      An idea about biographies; Supplement 5 (440 lines)
 
Dear Colleagues:
 
At some point in the near future, if anyone would care for such a thing,
I have it in mind to do a proper job on the biographies. Apart from the
editing and formatting, this would involve collecting a revised
biographical statement from each of you, if you'd care to supply one.
These might be written or rewritten according to a suggested list of
things to be mentioned -- to make them *slightly* less chaotic without
taking the play out. The revised collection would be for circulation
only on HUMANIST. What do you think? Please let me know if the idea
strikes you as worthy of effort. What do you think should be on the
list of things to be mentioned?
 
Meanwhile, the next supplement follows.
 
Yours, W.M.
--------------------------------------------------------------------------
                   Autobiographies of HUMANISTs
                         Fifth Supplement
 
Following are 23 additional entries to the collection of
autobiographical statements by members of the HUMANIST discussion
group.
 
Further additions, corrections, and updates are welcome, to
mccarty@utorepas.bitnet.
 
W.M. 16 December 1987
=========================================================================
*Atwell, Eric Steven  <eric@ai.leeds.ac.uk>
 
Centre for Computer Analysis of Language and Speech, AI Division,
School of Computer Studies,  Leeds University, Leeds LS2 9JT;
+44 532 431751 ext 6
 
I am in a Computer Studies School, but specialise in linguistic and
literary computing, and applications in Religious Education in
schools.  I would particularly like to liaise with other researchers
working in similar areas.
=========================================================================
*Benson, Tom  <T3B@PSUVM>
              {akgua,allegra,ihnp4,cbosgd}!psuvax1!psuvm.bitnet!t3b (UUCP)
              t3b%psuvm.bitnet@wiscvm.arpa (ARPA)
 
Department of Speech Communication, The Pennsylvania State University
227 Sparks Building, University Park, PA 16802; 814-238-5277
 
I am a Professor of Speech Communication at Penn State University,
currently serving as editor of THE QUARTERLY JOURNAL OF SPEECH.
In addition, I edit the electronic journal CRTNET (Communication
Research and Theory Network).
=========================================================================
*CETEDOC (CENTRE DE TRAITEMENT ELECTRONIQUE DES DOCUMENTS) <THOMDOC@BUCLLN11>
 
CETEDOC, LLN, BELGIUM
 
THE CETEDOC (CENTRE DE TRAITEMENT ELECTRONIQUE DES DOCUMENTS) IS AN
INSTITUTION OF THE CATHOLIC UNIVERSITY OF LOUVAIN AT LOUVAIN-LA-NEUVE,
BELGIUM. ITS DIRECTOR IS PROF. PAUL TOMBEUR.
=========================================================================
*Chadwick, Tony <chadant@mun>
 
Department of French & Spanish, Memorial University of Newfoundland
St. John's, A1B 3X9; (709)737-8572
 
At the moment I have two interests in computing: one is the use of
computers in composition classes for second language learners, the
second in computerized bibliographies.  I have an M.A. in French from
McMaster and have been teaching at Memorial University since 1967.
Outside computers, my research interests lie in Twentieth Century
French Literature.
=========================================================================
*Coombs, James H.  <JAZBO@BROWNVM>
 
Institute for Research in Information and Scholarship, Brown University
Box 1946, Providence, RI 02912
 
I have a Ph.D. in English (Wordsworth and Milton:  Prophet-Poets)
and an M.A. in Linguistics, both from Brown University.  I have been Mellon
Postdoctoral Fellow in English and am about to become Software Engineer,
Research, Institute for Research in Information and Scholarship (IRIS).
 
I have co-edited an edition of letters (A Pre-Raphaelite Friendship, UMI
Research Press) and have written on allusion and implicature (Poetics, 1985;
Brown Working Papers in Linguistics).  Any day now, the November
Communications of the ACM will appear with an article on "Markup Systems
and the Future of Scholarly Text Processing," written with Allen H. Renear
and Steven J. DeRose.
 
I developed the English Disk on the Brown University mainframe, which provides
various utilities for humanists, primarily for word processing and for staying
sane in CMS.  I wrote a Bibliography Management System for Scholars (BMSS;
1985) and then an Information Management System for Scholars (IMSS; 1986).
Both are in PL/I and may best be considered "aberrant prototypes," used a
little more than necessary for research but never commercialized.  I am
currently working on a system with similar functionality for the IBM PC.
 
Last year, I developed a "comparative concordance" for the multiple editions
of Wordsworth's Prelude.  I am delayed in that by the lack of the final volume
of Cornell's fine editions.  A preliminary paper will appear in the working
papers of Brown's Computing in the Humanities User's Group (CHUG); a full
article will be submitted in January, probably to CHUM.
 
I learned computational linguistics from Prof. Henry Kucera, Nick DeRose, and
Andy Mackie.  Richard Ristow taught me software engineering management or,
more accurately, teaches me more every time I talk to him.  I worked on the
spelling corrector, tuning algorithms.  I worked on the design of the grammar
corrector, designed the rule structures, and developed the rules with Dr.
Carol Singley.  Then I started with Dr. Phil Shinn's Binary Parser and
developed a language independent N-ary Parser (NAP).  NAP reads phrase
structure rules as well as streams of tagged words (see DeRose's article in
Computational Linguistics for information on the disambiguation) and generates
a parse tree, suitable for generalized pattern matching.
 
Finally, at IRIS, I will be developing online dictionary access from our
hypermedia system:  Intermedia (affix stripping, unflection, definition,
parsing, etc.). In addition, we are working on a unified system for accessing
multiple databases, including CD-ROM as well as remote computers.
=========================================================================
*Dawson, John L. <JLD1@PHX.CAM.AC.UK>
 
University of Cambridge, Literary and Linguistic Computing Centre
Sidgwick Avenue, Cambridge  CB3 9DA England; (0223) 335029
 
I have been in charge of the Literary and Linguistic Computing Centre of
Cambridge University since 1974, and now hold the post of Assistant Director
of Research there.  The LLCC acts as a service bureau for all types of
humanities computing, including data preparation, and extends to the areas
of non-scientific computing done by members of science and social science
faculties.  Much of our work remains in the provision of concordances to
various texts in a huge range of languages, either prepared by our staff,
by the user, or by some external body (e.g. TLG, Toronto Corpus of Old
English, etc.)  Some statistical analysis is undertaken, as required by
the users.  Recently, we have begun preparing master pages for publication
using a LaserWriter, and several books have been printed by this means.
 
My background is that of a mathematics graduate with a Diploma in Computer
Science (both from Cambridge).  I am an Honorary Member of ALLC, having
been its Secretary for six years, and a member of the Association for History
and Computing.
 
My present research (though I don't have much time to do it) lies in the
comparison of novels with their translations in other languages. At the
moment I am working on Stendhal's "Le Rouge et le Noir" in French and English,
and on Jane Austen's "Northanger Abbey" in English and French.
 
I have contributed several papers at ALLC and ACH conferences, and published
in the ALLC Journal (now Literary & Linguistic Computing) and in CHum.
=========================================================================
*Giordano, Richard  <RICH@PUCC>
 
I am a new humanities specialist at Princeton University Computer Center
(Computing and Information Technology).  I come to Princeton from Columbia
University where I was a Systems Analyst in the Libraries for about six
years.  I am just finishing my PhD dissertation in American history at
Columbia as well.
=========================================================================
*Johnson, Christopher <CJOHNSON@ARIZRVAX>
 
Language Research Center, Room 345 Modern Languages, University of Arizona
Tucson, Az   85702; (602) 621-1615
 
I am currently the Director of the Language Research
Center at the University of Arizona. Masters in Educational Media,
University of Arizona; Ph.D. in Secondary Education (Minor in
Instructional Technology), UA.
 
I have worked in the area of computer-based instruction since 1976.  I gained
most of my experience on the PLATO system here at the University and as a
consultant to Control Data Corp.  Two years ago I moved to the Faculty of
Humanities to create the Language Research Center, a support facility for
our graduate students, staff, and faculty.
 
My personal research interests are in the area of individual learning
styles, critical thinking skills, Middle level education and testing
as they apply to computer-based education.  The research interests of my
faculty range from text analysis to word processing to research into the
use of the computer as an instructional tool.
=========================================================================
*Johansson, Stig  <h_johansson%use.uio.uninett@cernvax>
 
Dept of English, Univ of Oslo, P.O. Box 1003, Blindern, N-0315
Oslo 3, Norway. Tel: 456932 (Oslo).
 
Professor of English Language, Univ of Oslo. Relevant research
interest: computers in English language research. Coordinating
secretary of the International Computer Archive of Modern English
(ICAME) and editor of the ICAME Journal. Member of the ALLC.
=========================================================================
*Kalinoski, Ron <ACDRLK@SUVM>
 
Academic Computing Services, 215 Machinery Hall, Syracuse University
Syracuse, New York 13244; 315/423-3998
 
I am Associate Director for Research Computing at Syracuse University
and am interested in sponsoring a seminar series next spring focusing
on computing issues in the humanities. I hope that this will lead to
hiring a full-time staff person to provide user support services
for humanities computing.
========================================================================
*Langendoen, D. Terence  <TERGC@CUNYVM>
 
Linguistics Program, CUNY Graduate Center, 33 West 42nd Street,
New York, NY 10036-8099 USA; 212-790-4574 (soon to change)
 
I am a theoretical linguist, interested in parsing and in computational
linguistics generally.  I have also worked on the problem of making
sophisticated text-editing tools available for the teaching of writing.
 
I am currently Secretary-Treasurer of the Linguistic Society of America,
and will continue to serve until the end of calendar year 1988.  I
have also agreed to serve on two working committees on the ACH/ALLC/ACL
project on standards for text encoding, as a result of the conference
held at Vassar in mid-November 1987.
=========================================================================
*Molyneaux, Brian <AYI004@IBM.SOUTHAMPTON.AC.UK>
 
Department of Archaeology, University of Southampton, England.
 
I am at present conducting postgraduate research
in art and ideology and its relation to material culture.  I am also a
Field Associate at the Royal Ontario Museum, Department of New World
Archaeology, specialising in rock art research.  I obtained a BA (Hons)
in English Literature, a BA (Hons) in Anthropology, and an MA in Art and
Archaeology at Trent University, Peterborough, Ontario.  My research
interest in computing in the Humanities includes the analysis of texts
and art works within the context of social relations.
=========================================================================
*Olofsson, Ake <AAKE@SEUMDC51.BITNET>
 
I am at the Department of Psychology, University of Umea, in the north
of Sweden.  Part of my work at the department is helping people to
learn how to use our computer (VAX and the Swedish university Decnet)
and International mail (Bitnet). We are four system-managers at the
department and have about 40 ordinary users, running word-processing,
statistics and Mail programs.
========================================================================
*ORVIK, TONE <ORVIKT@QUCDN>
 
POST OFFICE BOX 1822, KINGSTON, ON K7L 5J6; 613 - 389 - 6092
 
WORKING ON BIBLE RESEARCH WITH AFFILIATION TO QUEEN'S UNIVERSITY'S
DEPT. OF RELIGIOUS STUDIES; CREATING CONCORDANCE OF SYMBOLOGY.
HAVE WORKED AS A RESEARCHER, TEACHER, AND WRITER, IN EUROPE AND CANADA;
ESPECIALLY ON VARIOUS ASPECTS OF BIBLE AND COMPARATIVE RELIGION.
 
INTERESTED IN CONTACT WITH NETWORK USERS WITH SAME/SIMILAR INTEREST OF
RESEARCH.
=========================================================================
*Potter, Rosanne G. <S1.RGP@ISUMVS or GG.BIB@ISUMVS>
 
Department of English, Iowa State University, Ross Hall 203,
(515) 294-2180 (Main Office); (515) 294-4617 (My office)
 
I am a literary critic; I use the mainframe computer for the analysis
of literary texts.  I have also designed a major formatting bibliographic
package, BIBOUT, in wide use at Iowa State University, also installed
at Princeton and Harvard.  I do not program; rather, I work with very
high level programming specialists, statisticians, and systems analysts
here to design the applications that I want for my literary critical
purposes.
 
I am editing a book on Literary Computing and Literary Criticism containing
essays by Richard Bailey, Don Ross, Jr., John Smith, Paul Fortier, C.
Nancy Ide, Ruth Sabol, myself and others.  I've been on the board of ACH,
and have been invited to serve on the CHum editorial board.
=========================================================================
*Renear, Allen H.  <ALLEN@BROWNVM>
 
My original academic discipline is philosophy (logic, epistemology, history),
and though I try to keep that up (and expect my Ph.D. this coming June)
I've spent much of the last 7 years in academic computing, particularly
humanities support.  I am currently on the Computer Center staff here at
Brown as a specialist in text processing, typesetting and humanities computing.
 
I've had quite a bit of practical experience designing, managing, and
consulting on large scholarly publication projects and my major research
interests are similarly in the general theory of text representation
and strategies for text based computing.   I am a strong advocate of the
importance of SGML for all computing that involves text; my views on this are
presented in the Coombs, Renear, DeRose article on Markup Systems in the
November 1987 *Communications of the ACM*.  Other topics of interest to me are
structure oriented editing, hypertext, manuscript criticism, and specialized
tools for analytic philosophers.  My research in philosophy is mostly in
epistemic logic (similar to what AI folks call "knowledge representation");
it has some surprising connections with emerging theories of text structure.
I am a contact person for Brown's very active Computing in the Humanities
User's Group (CHUG).
=========================================================================
*Richardson, John <IBQ1JVR@UCLAMVS>
 
Associate Professor, University of California (Los Angeles), GSLIS;
(213) 825-4352
 
One of my interests is analytical bibliography,
the description of printed books.  At present I am intrigued
with the idea that we can describe various component parts
of books, notably title pages, paper, and typefaces, but
the major psycho-physical element, ink, is not described.
Obviously this problem involves humanistic work but also
a fair degree of sophistication with ink technology.
 
I would be interested in talking with or corresponding
with anyone on this topic...
=========================================================================
*Taylor, Philip <CHAA006@VAXA.RHBNC.AC.UK>
 
Royal Holloway & Bedford New College;  University of London; U.K;
(+44) 0784 34455 Ext: 3172
 
Although not primarily concerned with the humanities (I am
principal systems programmer at RHBNC), I am frequently involved in humanities
projects, particularly in the areas of type-setting (TeX), multi-lingual text
processing, and natural language analysis, among others.
=========================================================================
*Whitelam, Keith W. <WWSRS@VAXA.STIR.AC.UK>
 
Dept. of Religious Studies, University of Stirling, Stirling FK9 4LA
Scotland; Tel. 0786 3171 ext. 2491
 
I have been lecturer in Religious Studies at Stirling since 1978 with
prime responsibility for Hebrew Bible/Old Testament. My research interests
are mainly aimed at exploring new approaches to the study of early Israelite/
Palestinian history in an interdisciplinary context, i.e. drawing upon
social history, anthropology, archaeology, historical demography, etc.
I have been constructing a database of Palestinian archaeological sites,
using software written by the Computing Science department, in order to
analyse settlement patterns, site hierarchies, demography, etc.
The department of Environmental Science has recently purchased Laser Scan
and offered me access to the facilities. This will enable me to display
settlement patterns, sites, etc in map form for analysis and comparison.
I am particularly interested in corresponding/discussing with others working
on similar problems, particularly in Near Eastern archaeology.
 
I have also been involved in exploring the possibilities of setting up
campus-wide text processing laser printing facilities. It looks as though we
shall be able to offer a LaTeX service in the New Year. We are also planning
to offer a WYSIWYG service, such as Ventura on IBM or a combination with
Macs for the production of academic papers. Again I have a particular interest
in the use of foreign fonts, e.g. Hebrew, Akkadian, Ugaritic, Greek, etc.
 
My teaching and research on the Hebrew Bible leads to a concern with
developing computer-aided text analysis, although I have had little time
to explore this area. We have OCP available on our mainframe VAX but my
use of this has been very limited. I see this as an important area of
future development in teaching and research along with Hebrew teaching.
=========================================================================
*Wilson, Noel <QGHU21@UJVAX.ULSTER.AC.UK>
 
Head of Academic Services, University of Ulster, Shore Road
Newtownabbey, Co. Antrim, N. Ireland BT37 0QB;  (0232)365131 Ext. 2449
 
My post has overall responsibility for the central academic
computing service, offered by the Computer Centre, to the
University academic community. Within this brief, my Section
is responsible for the acquisition/development and documentation
of CAL and proprietary software. We currently provide a program
library in support of courses and research which contains approx.
400 programs; of these approx. 80 are in-house developments,
50 proprietary systems and the remainder obtained from a variety
of sources incl. program libraries (eg CONDUIT - Univ. of Iowa).
 
We have only very recently addressed computing within the Faculty
of Humanities; academic staff in the Faculty have used computers in
a research capacity and are now turning towards the various u'grad.
courses. Presently we hold a grant of 79,000 pounds from the United
Kingdom Computer Board for Universities and Research Councils, for
the development of CAL software in support of Linguistics and
Lexicostatistics. Within this project we are attempting to develop
courseware to support grammar teaching in French, German, Spanish
and Irish (details of existing materials appropriate to u'grad.
teaching would be most welcome!). We also are investigating the
creation of software to support an analysis of text (comparative
studies) - in this area we are looking at frequency counts assoc.
with words/expressions/words within registers etc. - again help
would be appreciated.
 
I am happy to provide further details on any of the above points
and wish to keep informed of useful Humanities-related CAL work
elsewhere. We currently use the Acorn BBC micro. but are also
moving in the direction of PC clones.
=========================================================================
*Wood, Max <BS83@SYSA.SALFORD.AC.UK>
 
Computing Officer, 403 Maxwell Building, The University of Salford
The Crescent, Salford, G.M.C. ENGLAND;  061-736-5843 Extension 7399
 
We are involved in a project to introduce the use of computing
in teaching here in the Business and Management Department of
Salford University and I am keen to extend links to other
Business schools both here in the U.K. and indeed
in the U.S.A. Obviously therefore I would like to join
your forum so as to possibly exchange ideas, news, etc.
 
My background is essentially in computing and I mainly supervise
the computing resources available to our Department, and have
formulated many of the teaching systems we currently use.
=========================================================================
*Wujastyk, Dominik <dow@husc6.BITNET>
 
I am a Sanskritist with some knowledge of computing.  Once upon a time
(1977-78) I learned Snobol4 from Susan Hockey at Oxford, where I did
undergraduate and later doctoral Sanskrit.  More recently, I have been
using TeX on my PC AT (actually a Compaq III), and in the middle of this
summer I published a book _Studies on Indian Medical History_, which was
done in TeX and printed out on an HP LJ II, and sent to the publisher as
camera ready.  It all went very well.
 
I have received the MS DOS Icon implementation from Griswold at Arizona,
but have not spent time on it.  I am trying to teach myself at the moment,
just to learn enough to knock out occasional routines to convert files from
wordprocessor formats to TeX, and that sort of thing.  (Probably
reinventing the wheel.)
 
At the present time I am editing a Sanskrit text on medieval alchemy, and
doing all the formatting of the edition in LaTeX.
Before I ever started Sanskrit, I did a degree in Physics at Imperial
College in London, but that is so long ago that I don't like to think about
it!
=========================================================================
*Young, Charles M. <YOUNGC@CLARGRAD>
 
Dept. of Philosophy, The Claremont Graduate School
 
I am a member of the American Philosophical Association's committee on
Computer Use in Philosophy. One of my pet projects is to find some way
of making the Thesaurus Linguae Graecae database (all of classical
Greek through the 7th century C.E.) more readily available to working
scholars.
=========================================================================
*END*
=========================================================================
Date:         16 December 1987, 15:24:57 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      National text archive (45 lines)
 
From C. Faulhaber (U.C. Berkeley, ked@coral.berkeley.edu)
     via Tim Maher <cstim@violet.Berkeley.EDU>
 
1)  Text Archives.  What is needed is some sort of alliance
between the computing types and the professional librarians.
It seems to me that there is a much better chance of getting
a national text archive if it can be integrated into an
ongoing concern. I list three candidates, in decreasing order
of feasibility:
 
a) RLG: Through their PRIMA project they are actively interested
in providing access to new information resources.
b) The organization at the U. of Michigan which already maintains
data bases for use in the social sciences.
c) OCLC: They have relatively less experience than RLG in providing
services for research institutions but are aggressively expanding
their range.
 
2) Citation dictionaries:  John Nitti (Medieval Spanish Seminary,
1120 Van Hise Hall, U. of Wisconsin, Madison 53720) has been
working on just such a dictionary (Dictionary of the Old Spanish
Language) since ca. 1970, although the original plan was to draw
the citations from texts transcribed specifically for that purpose
and publish in standard format on OED lines. With optical disk
technology, the possibility now exists to combine DOSL and the
texts themselves. In fact, we are contemplating the possibility
of combining these 2 elements with my own Bibliography of Old
Spanish Texts serving as a data base front end in order to
search through texts on the basis of, e.g., date, author,
subject.
 
Prof. Charles Faulhaber
Dept. of Spanish and Portuguese
Univ. of California, Berkeley.
ked@coral.berkeley.edu
=========================================================================
Date:         17 December 1987, 15:53:31 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Info (30 lines)
 
From Mark Olsen <ATMKO@ASUACAD>
 
A student here is doing a project on the discourse of John Woolman
and is looking for computer readable versions of texts by other
18th century American Quakers for comparisons.  I would appreciate
any info concerning the availability of these texts before scanning
them in.
 
A second, stranger request has come through.  I have a faculty member
who is studying a 19th century manuscript.  Parts of it were crossed
out and she is wondering if there is the possibility of using computer
enhancement of the images to improve readability.  She has tried
blowing-up the images, but has not gotten much.  Any ideas?  I must
admit that I know nothing about image processing except what I read about
concerning the space shots.  Maybe I should try JPL (snicker).
 
Thanks in advance,
                  Mark Olsen
 
I don't know how many lines of text this has, but it doesn't conform
to any known mark-up standard.
=========================================================================
Date:         17 December 1987, 19:51:24 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Christmas gift for HUMANISTs (50 lines)
 
From Sebastian Rahtz <CMI011@IBM.SOUTHAMPTON.AC.UK>
 
 
The following Christmas gift for HUMANISTs is prompted by a
description Lou Burnard sent me of the Vassar 'text encoding
standards' meeting, and by the subsequent HUMANIST discussion
(no, I don't have permission to 'publish' this)
 
Incidentally, a recent contribution to HUMANIST implied that
text-encoding standards were a central issue to all HUMANISTs.
May I stand up for the archaeologists, musicians, art-historians,
linguists and philosophers amongst us to say that there is more
to humanities computing than text! equality for all.
 
Sebastian Rahtz (spqr@uk.ac.soton.cm)
 
   A cold coming we had of it,
   just the worst time of the year
   for a journey, and such a long journey:
   the ways deep and the weather sharp,
   a hard time we had of it.
   at the end we preferred to travel all night,
   sleeping in snatches,
   with the voices singing in our ears, saying
   that this was all folly.
   but there was no information, and so we continued
   and arrived at evening, not a moment too soon
   finding the place; it was (you may say) satisfactory.
   all this was a long time ago, I remember,
   and I would do it again, but set down
   this set down
   this: were we led all that way for
   birth or death? there was a birth, certainly,
   we had evidence and no doubt. I had seen birth and death,
   but had thought they were different; this birth was
   hard and bitter agony for us, like death, our death.
   we returned to our places, these kingdoms,
   but no longer at ease here, in the old dispensation,
   with an alien people clutching their gods.
   I should be glad of another death.
 
=========================================================================
Date:         21 December 1987, 15:01:37 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Test message
 
This is a test. Please neither do nor conclude anything because of its
appearance.
=========================================================================
Date:         21 December 1987, 19:29:37 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Whereabouts of R.G. Ragsdale
 
From Christian Koch <FKOCH%OCVAXA@CMCCVB>
 
On December 4 an announcement was sent out over HUMANIST regarding a proposed
course to be offered in connection with the European Conference on Computers
in Education to be held next summer in Lausanne, Switzerland.  The proposal
was by R.G. Ragsdale and the course in question was International Educational
Computing: An Interaction of Values and Technology.  Anyone interested in
further information was to contact R.G. Ragsdale.  Unfortunately there was
no address given, neither e-mail nor regular mail.  I am wondering if anyone
knows the whereabouts of R.G. Ragsdale.  Am also wondering if anyone knows
a contact person for the European Conference on Computers in Education.
 
Thanks!
 
                                   Christian Koch
                                   Oberlin College
                                   Bitnet: fkoch%ocvaxa@cmccvb
=========================================================================
Date:         22 December 1987, 10:50:41 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      ICEBOL (106 lines)
 
From David Sitman <A79@TAUNIVM>
 
 
                           ICEBOL3
 
April 21-22, 1988                      Dakota State College
                                        Madison, SD 57042
 
     ICEBOL3, the International Conference on Symbolic and
Logical Computing, is designed for teachers, scholars, and
programmers who want to meet to exchange ideas about
non-numeric computing.  In addition to a focus on SNOBOL,
SPITBOL, and Icon, ICEBOL3 will feature introductory and
technical presentations on other dangerously powerful
computer languages such as Prolog and LISP, as well as on
applications of BASIC, Pascal, and FORTRAN for processing
strings of characters.  Topics of discussion will include
artificial intelligence, expert systems, desk-top
publishing, and a wide range of analyses of texts in English
and other natural languages.  Parallel tracks of concurrent
sessions are planned: some for experienced computer users
and others for interested novices.  Both mainframe and
microcomputer applications will be discussed.
 
     ICEBOL's coffee breaks, social hours, lunches, and
banquet will provide a series of opportunities for
participants to meet and informally exchange information.
Sessions will be scheduled for "birds of a feather" to
discuss common interests (for example, BASIC users group,
implementations of SNOBOL, computer generated poetry).
 
 
Call For Papers
 
     Abstracts (minimum of 250 words) or full texts of
papers to be read at ICEBOL3 are invited on any application
of non-numeric programming.  Planned sessions include the
following:
   artificial intelligence
   expert systems
   natural language processing
   analysis of literary texts (including bibliography,
      concordance, and index preparation)
   linguistic and lexical analysis (including parsing and
      machine translation)
   preparation of text for electronic publishing
   computer assisted instruction
   grammar and style checkers
   music analysis.
 
     Papers must be in English and should not exceed twenty
minutes reading time.  Abstracts and papers should be
received by January 15, 1988.  Notification of acceptance
will follow promptly.  Papers will be published in ICEBOL3
Proceedings.
 
     Presentations at previous ICEBOL conferences were made
by Susan Hockey (Oxford), Ralph Griswold (Arizona), James
Gimpel (Lehigh), Mark Emmer (Catspaw, Inc.), Robert Dewar
(New York University), and many others.  Copies of ICEBOL 86
Proceedings are available.
 
 
                   ICEBOL3 is sponsored by
 
                The Division of Liberal Arts
 
                             and
 
            The Business and Education Institute
 
                             of
 
                    DAKOTA STATE COLLEGE
                    Madison, South Dakota
 
 
For Further Information
 
     All correspondence including abstracts and papers as
well as requests for registration materials should be sent
to:
 
                        Eric Johnson
                       ICEBOL Director
                       114 Beadle Hall
                    Dakota State College
                  Madison, SD 57042 U.S.A.
                       (605) 256-5270
 
     Inquiries, abstracts, and correspondence may also be
sent via electronic mail to:
 
                   ERIC @ SDNET  (BITNET)
 
 
------- End of Forwarded Message
=========================================================================
Date:         23 December 1987, 21:46:36 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      The reason for a silent HUMANIST from 15 to 21 December
 
Dear Colleagues:
 
You may have assumed that the silence of HUMANIST from about 15 to 21
December was due to a mass unplugging of terminals and departing for
seasonal festivals, but this is not entirely so. A new version of
ListServ (the software that runs HUMANIST), just installed about then,
ran amok, wrote about 1,000,000 lines in the system log here, and so
provoked our postmaster into disconnecting it -- and with it HUMANIST.
The messages that seemed to be sent out during that period went into
limbo, where apparently they still sit. These may suddenly appear in
your readers, perhaps even in duplicate or triplicate, or they may not
show up at all. Against the latter possibility, I am sending you copies
of the limbo'd messages in two batches, with my best wishes for your
good health and prosperity in the new year.
 
Yours, W.M.
_________________________________________________________________________
Dr. Willard McCarty / Centre for Computing in the Humanities
University of Toronto / 14th floor, Robarts Library / 130 St. George St.
Toronto, Canada M5S 1A5 / (416) 978-4238 / mccarty@utorepas.bitnet
=========================================================================
Date:         23 December 1987, 22:01:38 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Batch 1 of limbo'd messages (448 lines)
 
=========================================================================
Date: 15 December 1987, 23:40:40 EST
From: MCCARTY  at UTOREPAS
To:   HUMANIST at UTORONTO
Subject: An idea about biographies; Supplement 5 (440 lines)
 
Dear Colleagues:
 
At some point in the near future, if anyone would care for such a thing,
I have it in mind to do a proper job on the biographies. Apart from the
editing and formatting, this would involve collecting a revised
biographical statement from each of you, if you'd care to supply one.
These might be written or rewritten according to a suggested list of
things to be mentioned -- to make them *slightly* less chaotic without
taking the play out. The revised collection would be for circulation
only on HUMANIST. What do you think? Please let me know if the idea
strikes you as worthy of effort. What do you think should be on the
list of things to be mentioned?
 
Meanwhile, the next supplement follows.
 
Yours, W.M.
--------------------------------------------------------------------------
                   Autobiographies of HUMANISTs
                         Fifth Supplement
 
Following are 23 additional entries to the collection of
autobiographical statements by members of the HUMANIST discussion
group.
 
Further additions, corrections, and updates are welcome, to
mccarty@utorepas.bitnet.
 
W.M. 16 December 1987
=========================================================================
*Atwell, Eric Steven  <eric@ai.leeds.ac.uk>
 
Centre for Computer Analysis of Language and Speech, AI Division,
School of Computer Studies,  Leeds University, Leeds LS2 9JT;
+44 532 431751 ext 6
 
I am in a Computer Studies School, but specialise in linguistic and
literary computing, and applications in Religious Education in
schools.  I would particularly like to liaise with other researchers
working in similar areas.
=========================================================================
*Benson, Tom  <T3B@PSUVM>
              {akgua,allegra,ihnp4,cbosgd}!psuvax1!psuvm.bitnet!t3b (UUCP)
              t3b%psuvm.bitnet@wiscvm.arpa (ARPA)
 
Department of Speech Communication, The Pennsylvania State University
227 Sparks Building, University Park, PA 16802; 814-238-5277
 
I am a Professor of Speech Communication at Penn State University,
currently serving as editor of THE QUARTERLY JOURNAL OF SPEECH.
In addition, I edit the electronic journal CRTNET (Communication
Research and Theory Network).
=========================================================================
*CETEDOC (CENTRE DE TRAITEMENT ELECTRONIQUE DES DOCUMENTS) <THOMDOC@BUCLLN11>
 
CETEDOC, LLN, BELGIUM
 
THE CETEDOC (CENTRE DE TRAITEMENT ELECTRONIQUE DES DOCUMENTS) IS AN
INSTITUTION OF THE CATHOLIC UNIVERSITY OF LOUVAIN AT LOUVAIN-LA-NEUVE,
BELGIUM. ITS DIRECTOR IS PROF. PAUL TOMBEUR.
=========================================================================
*Chadwick, Tony <chadant@mun>
 
Department of French & Spanish, Memorial University of Newfoundland
St. John's, A1B 3X9; (709)737-8572
 
At the moment I have two interests in computing: one is the use of
computers in composition classes for second language learners, the
second in computerized bibliographies.  I have an M.A. in French from
McMaster and have been teaching at Memorial University since 1967.
Outside computers, my research interests lie in Twentieth Century
French Literature.
=========================================================================
*Coombs, James H.  <JAZBO@BROWNVM>
 
Institute for Research in Information and Scholarship, Brown University
Box 1946, Providence, RI 02912
 
I have a Ph.D. in English (Wordsworth and Milton:  Prophet-Poets)
and an M.A. in Linguistics, both from Brown University.  I have been Mellon
Postdoctoral Fellow in English and am about to become Software Engineer,
Research, Institute for Research in Information and Scholarship (IRIS).
 
I have co-edited an edition of letters (A Pre-Raphaelite Friendship, UMI
Research Press) and have written on allusion and implicature (Poetics, 1985;
Brown Working Papers in Linguistics).  Any day now, the November
Communications of the ACM will appear with an article on "Markup Systems
and the Future of Scholarly Text Processing," written with Allen H. Renear
and Steven J. DeRose.
 
I developed the English Disk on the Brown University mainframe, which provides
various utilities for humanists, primarily for word processing and for staying
sane in CMS.  I wrote a Bibliography Management System for Scholars (BMSS;
1985) and then an Information Management System for Scholars (IMSS; 1986).
Both are in PL/I and may best be considered "aberrant prototypes," used a
little more than necessary for research but never commercialized.  I am
currently working on a system with similar functionality for the IBM PC.
 
Last year, I developed a "comparative concordance" for the multiple editions
of Wordsworth's Prelude.  I am delayed in that by the lack of the final volume
of Cornell's fine editions.  A preliminary paper will appear in the working
papers of Brown's Computing in the Humanities User's Group (CHUG); a full
article will be submitted in January, probably to CHUM.
 
I learned computational linguistics from Prof. Henry Kucera, Nick DeRose, and
Andy Mackie.  Richard Ristow taught me software engineering management or,
more accurately, teaches me more every time I talk to him.  I worked on the
spelling corrector, tuning algorithms.  I worked on the design of the grammar
corrector, designed the rule structures, and developed the rules with Dr.
Carol Singley.  Then I started with Dr. Phil Shinn's Binary Parser and
developed a language independent N-ary Parser (NAP).  NAP reads phrase
structure rules as well as streams of tagged words (see DeRose's article in
Computational Linguistics for information on the disambiguation) and generates
a parse tree, suitable for generalized pattern matching.
 
Finally, at IRIS, I will be developing online dictionary access from our
hypermedia system:  Intermedia (affix stripping, unflection, definition,
parsing, etc.). In addition, we are working on a unified system for accessing
multiple databases, including CD-ROM as well as remote computers.
=========================================================================
*Dawson, John L. <JLD1@PHX.CAM.AC.UK>
 
University of Cambridge, Literary and Linguistic Computing Centre
Sidgwick Avenue, Cambridge  CB3 9DA England; (0223) 335029
 
I have been in charge of the Literary and Linguistic Computing Centre of
Cambridge University since 1974, and now hold the post of Assistant Director
of Research there.  The LLCC acts as a service bureau for all types of
humanities computing, including data preparation, and extends to the areas
of non-scientific computing done by members of science and social science
faculties.  Much of our work remains in the provision of concordances to
various texts in a huge range of languages, either prepared by our staff,
by the user, or by some external body (e.g. TLG, Toronto Corpus of Old
English, etc.)  Some statistical analysis is undertaken, as required by
the users.  Recently, we have begun preparing master pages for publication
using a LaserWriter, and several books have been printed by this means.
 
My background is that of a mathematics graduate with a Diploma in Computer
Science (both from Cambridge).  I am an Honorary Member of ALLC, having
been its Secretary for six years, and a member of the Association for History
and Computing.
 
My present research (though I don't have much time to do it) lies in the
comparison of novels with their translations in other languages. At the
moment I am working on Stendhal's "Le Rouge et le Noir" in French and English,
and on Jane Austen's "Northanger Abbey" in English and French.
 
I have contributed several papers at ALLC and ACH conferences, and published
in the ALLC Journal (now Literary & Linguistic Computing) and in CHum.
=========================================================================
*Giordano, Richard  <RICH@PUCC>
 
I am a new humanities specialist at Princeton University Computer Center
(Computing and Information Technology).  I come to Princeton from Columbia
University where I was a Systems Analyst in the Libraries for about six
years.  I am just finishing my PhD dissertation in American history at
Columbia as well.
=========================================================================
*Johnson, Christopher <CJOHNSON@ARIZRVAX>
 
Language Research Center, Room 345 Modern Languages, University of Arizona
Tucson, Az   85702; (602) 621-1615
 
I am currently the Director of the Language Research
Center at the University of Arizona. Masters in Educational Media,
University of Arizona; Ph.D. in Secondary Education (Minor in
Instructional Technology), UA.
 
I have worked in the area of computer-based instruction since 1976.  I gained
most of my experience on the PLATO system here at the University and as a
consultant to Control Data Corp.  Two years ago I moved to the Faculty of
Humanities to create the Language Research Center, a support facility for
our graduate students, staff, and faculty.
 
My personal research interests are in the area of individual learning
styles, critical thinking skills, Middle level education and testing
as they apply to computer-based education.  The research interests of my
faculty range from text analysis to word processing to research into the
use of the computer as an instructional tool.
=========================================================================
*Johansson, Stig  <h_johansson%use.uio.uninett@cernvax>
 
Dept of English, Univ of Oslo, P.O. Box 1003, Blindern, N-0315
Oslo 3, Norway. Tel: 456932 (Oslo).
 
Professor of English Language, Univ of Oslo. Relevant research
interest: computers in English language research. Coordinating
secretary of the International Computer Archive of Modern English
(ICAME) and editor of the ICAME Journal. Member of the ALLC.
=========================================================================
*Kalinoski, Ron <ACDRLK@SUVM>
 
Academic Computing Services, 215 Machinery Hall, Syracuse University
Syracuse, New York 13244; 315/423-3998
 
I am Associate Director for Research Computing at Syracuse University
and am interested in sponsoring a seminar series next spring focusing
on computing issues in the humanities. I hope that this will lead to
hiring a full-time staff person to provide user support services
for humanities computing.
========================================================================
*Langendoen, D. Terence  <TERGC@CUNYVM>
 
Linguistics Program, CUNY Graduate Center, 33 West 42nd Street,
New York, NY 10036-8099 USA; 212-790-4574 (soon to change)
 
I am a theoretical linguist, interested in parsing and in computational
linguistics generally.  I have also worked on the problem of making
sophisticated text-editing tools available for the teaching of writing.
 
I am currently Secretary-Treasurer of the Linguistic Society of America,
and will continue to serve until the end of calendar year 1988.  I
have also agreed to serve on two working committees on the ACH/ALLC/ACL
project on standards for text encoding, as a result of the conference
held at Vassar in mid-November 1987.
=========================================================================
*Molyneaux, Brian <AYI004@IBM.SOUTHAMPTON.AC.UK>
 
Department of Archaeology, University of Southampton, England.
 
I am at present conducting postgraduate research
in art and ideology and its relation to material culture.  I am also a
Field Associate at the Royal Ontario Museum, Department of New World
Archaeology, specialising in rock art research.  I obtained a BA (Hons)
in English Literature, a BA (Hons) in Anthropology, and an MA in Art and
Archaeology at Trent University, Peterborough, Ontario.  My research
interest in computing in the Humanities includes the analysis of texts
and art works within the context of social relations.
=========================================================================
*Olofsson, Ake <AAKE@SEUMDC51.BITNET>
 
I am at the Department of Psychology, University of Umea, in the north
of Sweden.  Part of my work at the department is helping people to
learn how to use our computer (VAX and the Swedish university Decnet)
and International mail (Bitnet). We are four system-managers at the
department and have about 40 ordinary users, running word-processing,
statistics and Mail programs.
========================================================================
*ORVIK, TONE <ORVIKT@QUCDN>
 
POST OFFICE BOX 1822, KINGSTON, ON K7L 5J6; 613 - 389 - 6092
 
WORKING ON BIBLE RESEARCH WITH AFFILIATION TO QUEEN'S UNIVERSITY'S
DEPT. OF RELIGIOUS STUDIES; CREATING CONCORDANCE OF SYMBOLOGY.
HAVE WORKED AS A RESEARCHER, TEACHER, AND WRITER, IN EUROPE AND CANADA;
ESPECIALLY ON VARIOUS ASPECTS OF BIBLE AND COMPARATIVE RELIGION.
 
INTERESTED IN CONTACT WITH NETWORK USERS WITH SAME/SIMILAR INTEREST OF
RESEARCH.
=========================================================================
*Potter, Rosanne G. <S1.RGP@ISUMVS or GG.BIB@ISUMVS>
 
Department of English, Iowa State University, Ross Hall 203,
(515) 294-2180 (Main Office); (515) 294-4617 (My office)
 
I am a literary critic; I use the mainframe computer for the analysis
of literary texts.  I have also designed a major formatting bibliographic
package, BIBOUT, in wide use at Iowa State University, also installed
at Princeton and Harvard.  I do not program, rather I work with very
high level programming specialists, statisticians, and systems analysts
here to design the applications that I want for my literary critical
purposes.
 
I am editing a book on Literary Computing and Literary Criticism containing
essays by Richard Bailey, Don Ross, Jr., John Smith, Paul Fortier, C.
Nancy Ide, Ruth Sabol, myself and others.  I've been on the board of ACH,
and have been invited to serve on the CHum editorial board.
=========================================================================
*Renear, Allen H.  <ALLEN@BROWNVM>
 
My original academic discipline is philosophy (logic, epistemology, history),
and though I try to keep that up (and expect my Ph.D. this coming June)
I've spent much of the last 7 years in academic computing, particularly
humanities support.  I am currently on the Computer Center staff here at
Brown as a specialist in text processing, typesetting and humanities computing.
 
I've had quite a bit of practical experience designing, managing, and
consulting on large scholarly publication projects and my major research
interests are similarly in the general theory of text representation
and strategies for text based computing.   I am a strong advocate of the
importance of SGML for all computing that involves text; my views on this are
presented in the Coombs, Renear, DeRose article on Markup Systems in the
November 1987 *Communications of the ACM*.  Other topics of interest to me are
structure oriented editing, hypertext, manuscript criticism, and specialized
tools for analytic philosophers.  My research in philosophy is mostly in
epistemic logic (similar to what AI folks call "knowledge representation");
it has some surprising connections with emerging theories of text structure.
I am a contact person for Brown's very active Computing in the Humanities
User's Group (CHUG).
=========================================================================
*Richardson, John <IBQ1JVR@UCLAMVS>
 
Associate Professor, University of California (Los Angeles), GSLIS;
(213) 825-4352
 
One of my interests is analytical bibliography,
the description of printed books.  At present I am intrigued
with the idea that we can describe various component parts
of books, notably title pages, paper, and typefaces, but
the major psycho-physical element, ink, is not described.
Obviously this problem involves humanistic work but also
a fair degree of sophistication with ink technology.
 
I would be interested in talking with or corresponding
with anyone on this topic...
=========================================================================
*Taylor, Philip <CHAA006@VAXA.RHBNC.AC.UK>
 
Royal Holloway & Bedford New College;  University of London; U.K;
(+44) 0784 34455 Ext: 3172
 
Although not primarily concerned with the humanities (I am
principal systems programmer at RHBNC), I am frequently involved in humanities
projects, particularly in the areas of type-setting (TeX), multi-lingual text
processing, and natural language analysis, among others.
=========================================================================
*Whitelam, Keith W. <WWSRS@VAXA.STIR.AC.UK>
 
Dept. of Religious Studies, University of Stirling, Stirling FK9 4LA
Scotland; Tel. 0786 3171 ext. 2491
 
I have been lecturer in Religious Studies at Stirling since 1978 with
prime responsibility for Hebrew Bible/Old Testament. My research interests
are mainly aimed at exploring new approaches to the study of early Israelite/
Palestinian history in an interdisciplinary context, i.e. drawing upon
social history, anthropology, archaeology, historical demography, etc.
I have been constructing a database of Palestinian archaeological sites,
using software written by the Computing Science department, in order to
analyse settlement patterns, site hierarchies, demography, etc.
The department of Environmental Science has recently purchased Laser Scan
and offered me access to the facilities. This will enable me to display
settlement patterns, sites, etc. in map form for analysis and comparison.
I am particularly interested in corresponding/discussing with others working
on similar problems, particularly in Near Eastern archaeology.
 
I have also been involved in exploring the possibilities of setting up
campus-wide text processing laser printing facilities. It looks as though we
shall be able to offer a LaTeX service in the New Year. We are also planning
to offer a WYSIWYG service, such as Ventura on IBM or a combination with
Macs for the production of academic papers. Again I have a particular interest
in the use of foreign fonts, e.g. Hebrew, Akkadian, Ugaritic, Greek, etc.
 
My teaching and research on the Hebrew Bible leads to a concern with
developing computer-aided text analysis, although I have had little time
to explore this area. We have OCP available on our mainframe VAX but my
use of this has been very limited. I see this as an important area of
future development in teaching and research along with Hebrew teaching.
=========================================================================
*Wilson, Noel <QGHU21@UJVAX.ULSTER.AC.UK>
 
Head of Academic Services, University of Ulster, Shore Road
Newtownabbey, Co. Antrim, N. Ireland BT37 0QB;  (0232)365131 Ext. 2449
 
My post has overall responsibility for the central academic
computing service, offered by the Computer Centre, to the
University academic community. Within this brief, my Section
is responsible for the acquisition/development and documentation
of CAL and proprietary software. We currently provide a program
library in support of courses and research which contains approx.
400 programs; of these approx. 80 are in-house developments,
50 proprietary systems and the remainder obtained from a variety
of sources incl. program libraries (eg CONDUIT - Univ. of Iowa).
 
We have only very recently addressed computing within the Faculty
of Humanities; academic staff in the Faculty have used computers in
a research capacity and are now turning towards the various u'grad.
courses. Presently we hold a grant of 79,000 pounds from the United
Kingdom Computer Board for Universities and Research Councils, for
the development of CAL software in support of Linguistics and
Lexicostatistics. Within this project we are attempting to develop
courseware to support grammar teaching in French, German, Spanish
and Irish (details of existing materials appropriate to u'grad.
teaching would be most welcome!). We also are investigating the
creation of software to support an analysis of text (comparative
studies) - in this area we are looking at frequency counts assoc.
with words/expressions/words within registers etc. - again help
would be appreciated.
 
I am happy to provide further details on any of the above points
and wish to keep informed of useful Humanities-related CAL work
elsewhere. We currently use the Acorn BBC micro. but are also
moving in the direction of PC clones.
=========================================================================
*Wood, Max <BS83@SYSA.SALFORD.AC.UK>
 
Computing Officer, 403 Maxwell Building, The University of Salford
The Crescent, Salford, G.M.C. ENGLAND;  061-736-5843 Extension 7399
 
We are involved in a project to introduce the use of computing
in teaching here in the Business and Management Department of
Salford University and I am keen to extend links to other
Business schools both here in the U.K. and indeed
in the U.S.A. Obviously therefore I would like to join
your forum so as to possibly exchange ideas, news, etc.
 
My background is essentially in computing and I mainly supervise
the computing resources available to our Department, and have
formulated many of the teaching systems we currently use.
=========================================================================
*Wujastyk, Dominik <dow@husc6.BITNET>
 
I am a Sanskritist with some knowledge of computing.  Once upon a time
(1977-78) I learned Snobol4 from Susan Hockey at Oxford, where I did
undergraduate and later doctoral Sanskrit.  More recently, I have been
using TeX on my PC AT (actually a Compaq III), and in the middle of this
summer I published a book _Studies on Indian Medical History_, which was
done in TeX and printed out on an HP LJ II, and sent to the publisher as
camera ready.  It all went very well.
 
I have received the MS DOS Icon implementation from Griswold at Arizona,
but have not spent time on it.  I am trying to teach myself at the moment,
just to learn enough to knock out occasional routines to convert files from
wordprocessor formats to TeX, and that sort of thing.  (Probably
reinventing the wheel.)
 
At the present time I am editing a Sanskrit text on medieval alchemy, and
doing all the formatting of the edition in LaTeX.
Before I ever started Sanskrit, I did a degree in Physics at Imperial
College in London, but that is so long ago that I don't like to think about
it!
=========================================================================
*Young, Charles M. <YOUNGC@CLARGRAD>
 
Dept. of Philosophy, The Claremont Graduate School
 
I am a member of the American Philosophical Association's committee on
Computer Use in Philosophy. One of my pet projects is to find some way
of making the Thesaurus Linguae Graecae database (all of classical
Greek through the 7th century C.E.) more readily available to working
scholars.
=========================================================================
*END*
=========================================================================
=========================================================================
Date:         23 December 1987, 22:02:58 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Batch 2 of limbo'd messages (142 lines)
 
=========================================================================
Date: 16 December 1987, 15:24:57 EST
From: MCCARTY  at UTOREPAS
To:   HUMANIST at UTORONTO
Subject: National text archive (45 lines)
 
From C. Faulhaber (U.C. Berkeley, ked@coral.berkeley.edu)
     via Tim Maher <cstim@violet.Berkeley.EDU>
 
1)  Text Archives.  What is needed is some sort of alliance
between the computing types and the professional librarians.
It seems to me that there is a much better chance of getting
a national text archive if it can be integrated into an
ongoing concern. I list three candidates, in decreasing order
of feasibility:
 
a) RLG: Through their PRIMA project they are actively interested
in providing access to new information resources.
b) The organization at the U. of Michigan which already maintains
data bases for use in the social sciences.
c) OCLC: They have relatively less experience than RLG in providing
services for research institutions but are aggressively expanding
their range.
 
2) Citation dictionaries:  John Nitti (Medieval Spanish Seminary,
1120 Van Hise Hall, U. of Wisconsin, Madison 53720) has been
working on just such a dictionary (Dictionary of the Old Spanish
Language) since ca. 1970, although the original plan was to draw
the citations from texts transcribed specifically for that purpose
and publish in standard format on OED lines. With optical disk
technology, the possibility now exists to combine DOSL and the
texts themselves. In fact, we are contemplating the possibility
of combining these 2 elements with my own Bibliography of Old
Spanish Texts serving as a data base front end in order to
search through texts on the basis of, e.g., date, author,
subject.
 
Prof. Charles Faulhaber
Dept. of Spanish and Portuguese
Univ. of California, Berkeley.
ked@coral.berkeley.edu
=========================================================================
Date: 17 December 1987, 15:53:31 EST
From: MCCARTY  at UTOREPAS
To:   HUMANIST at UTORONTO
Subject: Info (30 lines)
From Mark Olsen <ATMKO@ASUACAD>
 
A student here is doing a project on the discourse of John Woolman
and is looking for computer readable versions of texts by other
18th century American Quakers for comparisons.  I would appreciate
any info concerning the availability of these texts before scanning
them in.
 
A second, stranger request has come through.  I have a faculty member
who is studying a 19th century manuscript.  Parts of it were crossed
out and she is wondering if there is the possibility of using computer
enhancement of the images to improve readability.  She has tried
blowing-up the images, but has not gotten much.  Any ideas?  I must
admit that I know nothing about image processing except what I read about
concerning the space shots.  Maybe I should try JPL (snicker).
 
Thanks in advance,
                  Mark Olsen
 
I don't know how many lines of text this has, but it doesn't conform
to any known mark-up standard.
=========================================================================
Date: 17 December 1987, 19:51:24 EST
From: MCCARTY  at UTOREPAS
To:   HUMANIST at UTORONTO
Subject: Christmas gift for HUMANISTs (50 lines)
From Sebastian Rahtz <CMI011@IBM.SOUTHAMPTON.AC.UK>
 
 
The following Christmas gift for HUMANISTs is prompted by a
description Lou Burnard sent me of the Vassar 'text encoding
standards' meeting, and by the subsequent HUMANIST discussion
(no, I don't have permission to 'publish' this)
 
Incidentally, a recent contribution to HUMANIST implied that
text-encoding standards were a central issue to all HUMANISTs.
May I stand up for the archaeologists, musicians, art-historians,
linguists and philosophers amongst us to say that there is more
to humanities computing than text! equality for all.
 
Sebastian Rahtz (spqr@uk.ac.soton.cm)
 
   A cold coming we had of it,
   just the worst time of the year
   for a journey, and such a long journey:
   the ways deep and the weather sharp,
   a hard time we had of it.
   at the end we preferred to travel all night,
   sleeping in snatches,
   with the voices singing in our ears, saying
   that this was all folly.
   but there was no information, and so we continued
   and arrived at evening, not a moment too soon
   finding the place; it was (you may say) satisfactory.
   all this was a long time ago, I remember,
   and I would do it again, but set down
   this set down
   this: were we led all that way for
   birth or death? there was a birth, certainly,
   we had evidence and no doubt. I had seen birth and death,
   but had thought they were different; this birth was
   hard and bitter agony for us, like death, our death.
   we returned to our places, these kingdoms,
   but no longer at ease here, in the old dispensation,
   with an alien people clutching their gods.
   I should be glad of another death.
 
=========================================================================
Date: 18 December 1987, 14:08:49 EST
From: MCCARTY  at UTOREPAS
To:   HUMANIST at UTORONTO
Subject: Offline 16 (20 lines)
From Bob Kraft <KRAFT@PENNDRLN>
 
My bimonthly OFFLINE column for Religious Studies News
has just been sent off to the printer for the January or
February issue of RSNews. It consists of a report on the
computer aspects of the recent annual meetings of the
Society of Biblical Literature, American Academy of Religion,
and American Schools for Oriental Research, held jointly in
Boston on 5-8 December 1987. If any HUMANISTS would like a
pre-publication electronic copy of OFFLINE 16, I am willing
to send it upon request. Happy Holidays!
 
Bob Kraft
=========================================================================
=========================================================================
Date:         23 December 1987, 22:31:33 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Hypermedia bibliography
 
Anyone wishing a copy of a recent bibliography of items on hypermedia,
compiled at IRIS (Brown Univ.), should send a note to me requesting it.
The bibliography, which recently appeared on IRLIST, comes in three
parts, each approximately 500 lines long.
 
W.M.
=========================================================================
Date:         29 December 1987, 13:53:52 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Library of Congress: markup and MRTs?
 
From James H. Coombs <JAZBO@BROWNVM>
 
In a note posted on 8 Dec 1987: Richard Giordano states,
 
         Traditionally, ALA [American Library Association] and LC
         [Library of Congress] have both taken the lead in the scholarly
         world in providing machine-readable information.  The technical
         problems that LC has addressed have been fundamental to data
         processing.
 
Could you provide more information, e.g., citations of articles?  I know
that LC is considering SGML, but they seem to be much more of a follower
than a leader in this effort at least.  I also believe that the LC is
more interested in microfilm than in electronic media for the
preservation of materials printed on paper that is not acid free.  I was
somewhat distressed when I first read this (wish I knew where I read it
too), but apparently microfilm lasts longer than computer tape and
requires less maintenance.  (Still might be the wrong decision.)
 
So, I've missed out on what the LC is doing for Machine Readable Texts [MRTs]
and the like.  Any information appreciated.  Thanks.  --Jim
 
P.S.  Well, the same for ALA and RLG [Research Libraries Group].  What are
they doing?
 
 
Dr. James H. Coombs
Software Engineer, Research
Institute for Research in Information and Scholarship (IRIS)
Brown University
jazbo@brownvm.bitnet
=========================================================================
Date:         29 December 1987, 13:56:42 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      music-encoding standards? (50 lines)
 
From James H. Coombs <JAZBO@BROWNVM>
 
I'm glad to see Humanist up again!
 
In a posting of 17 December, Sebastian Rahtz says:
 
         Incidentally, a recent contribution to HUMANIST implied that
         text-encoding standards were a central issue to all HUMANISTs.
         May I stand up for the archaeologists, musicians,
         art-historians, linguists and philosophers amongst us to say
         that there is more to humanities computing than text! equality
         for all.
 
Just so, Sebastian!  ANSI X3V1.8M/87-17---Journal of Technical Developments
discusses the application of SGML to music (Work Group, Music Processing
Standards).
 
According to an article in TAG (The SGML Newsletter), the goal is
 
         to describe music not only for documentation and hard copy
         preparation but also to be included in technical documentation
         and played in a real time rendition simultaneously while
         viewing a particular part of the document.  Dr. Goldfarb
         referred to the inclusion of music in a technical document, and
         therefore to the concept of time, as "technical documentation
         in four dimensions." (vol. 1, no. 3, page 10)
 
--Jim
 
Dr. James H. Coombs
Software Engineer, Research
Institute for Research in Information and Scholarship (IRIS)
Brown University
jazbo@brownvm.bitnet
=========================================================================
Date:         30 December 1987, 00:46:40 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      The "interesting problem" of 30 November
 
From Prof. Yaacov Choueka <choueka@bimacs.uucp>
 
I just saw your email msg about the problem of identifying
the language of given titles.
A few months ago, when on a sabbatical at Bellcore, I was
engaged in a small research project with David Copp about
finding "minimal sets of words which would cover every 'innocent'
line (=80 char.) of English", i.e. such that every line of standard
English (not specially constructed to give a counter-example...)
would contain at least one word from the list.
We had some quite interesting developments, and we experimented with
several lists of about 60-100 words, and tens of thousands of lines
from the New York Times, finding that a list of fewer than 100 words
can give a criterion for English which would be reliable in
a very high percentage of cases.
We were looking for some applications, and yours is an excellent
one! The research has not yet been written or presented at
a conference (time, time!), but either I or David can give
you more details if you are interested.
In fact if you already have hundreds or thousands of titles
for which you already know the language, it might be fun
to run our programs on this set.
Best regards.
(David Copp can be reached at copp@bellcore.flash)
Yaacov Choueka.
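 
[Illustrative sketch, not part of the original message.  The covering
word list described above can be approximated with a greedy set-cover
pass over a sample of known-English lines.  The message does not say
how the lists were actually built, so the greedy strategy, the
tokenizer, and the names tokenize, greedy_cover and looks_english are
all assumptions made only for this example.]

```python
import re
from collections import Counter


def tokenize(line):
    """Lowercase a line and return the set of alphabetic words in it."""
    return set(re.findall(r"[a-z]+", line.lower()))


def greedy_cover(lines, max_words=100):
    """Greedily pick words until every sample line contains one of them.

    Assumption: a greedy set cover stands in for whatever method the
    authors used; it gives a small list, not a provably minimal one.
    """
    remaining = [t for t in (tokenize(l) for l in lines) if t]
    chosen = []
    while remaining and len(chosen) < max_words:
        counts = Counter(w for t in remaining for w in t)
        # Take the word found in most uncovered lines; ties go to the
        # alphabetically first word so the result is deterministic.
        best = min(counts, key=lambda w: (-counts[w], w))
        chosen.append(best)
        remaining = [t for t in remaining if best not in t]
    return chosen


def looks_english(line, words):
    """Criterion from the message: the line holds at least one list word."""
    return not tokenize(line).isdisjoint(words)
```

[With a list built from a large sample of English lines, a title can be
tested by calling looks_english(title, words); how reliable the test is
depends entirely on the sample, and short titles will be covered less
often than full 80-character lines.]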