From:	CBS%UK.AC.RUTHERFORD.MAIL::CA.UTORONTO.UTCS.VM::POSTMSTR 14-JAN-1989 09:53:36.32
To:	archive
CC:	
Subj:	

Via: UK.AC.RUTHERFORD.MAIL; Sat, 14 Jan 89   9:50 GMT
Received: from UKACRL by UK.AC.RL.IB (Mailer X1.25) with BSMTP id 9331; Sat, 14
          Jan 89 09:50:04 GM
Received: from vm.utcs.utoronto.ca by UKACRL.BITNET (Mailer X1.25) with BSMTP
          id 1923; Sat, 14 Jan 89 09:49:54 G
Received: by UTORONTO (Mailer X1.25) id 0407; Fri, 13 Jan 89 14:46:36 EST
Date:     Fri, 13 Jan 89 14:46:07 EST
From:     "Steve Younker (Postmaster)" <POSTMSTR@CA.UTORONTO.UTCS.VM>
To:       archive@UK.AC.OXFORD.VAX

=========================================================================
Date:         1 December 1987, 00:21:22 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: Sterling Bjorndahl - Claremont Grad. School
             <BJORNDAS@CLARGRAD>
Subject:  range of discussion on HUMANIST
 
     I appreciate the concern about HUMANIST's self-editorial
policy.  There is a spirit missing from HUMANIST's discussions that
has been present in other discussions I have been a member of.  On the
other hand, I can remain a member of HUMANIST in good conscience
because it does not take much time away from my other duties, which
are considered by others here to be of a higher priority.
     Two specific examples:  I was a member of the info-c discussion
on ARPA (linked with the sister discussion group on usenet).  This was
a free-flowing discussion with frequent cries from subscribers asking
submitters to control themselves.  There was a great deal of
redundancy and even inanity mixed with a few nuggets of valid and even
brilliant discussion.  Although I enjoyed it immensely on the whole, I
had to quit because I couldn't afford that many hours of extra reading
per week.  After a while, the returns just weren't great enough to put
up with the noise.
     On the other end of the editorial spectrum was the arpanet RISKS
digest.  A "digest" means that the moderator is also an editor.  All
submissions are sent to him, and he exercises editorial judgement on
everything submitted.  Once or twice a week, as volume dictates, the
collected and edited submissions are mailed in one package to all
subscribers, with a refreshing dash of humour added.  The kind of
give-and-take conversations that have been referred to can still happen
in this environment, because the moderator is essentially benign unless
serious redundancies and/or inanities occur.  (I believe that the
moderator was getting full credit for his work in this, and it was
probably part of his job description.)  Nevertheless, even so edited,
the volume became more than I could deal with effectively (despite the
fascinating subject matter, by the way: risks to the public from
computers and automated systems).
     So although I find HUMANIST occasionally on the "dead" side, I
have no trouble maintaining my subscription since it does not demand
too much of my time. Discussions happen in private, and if I want to
get in on them I can contact the initiator. I admit, I wouldn't mind
seeing a bit more activity in HUMANIST on occasion, and I think people
with issues of broad interest (such as the recent discussion on the
OED) should feel free to bring these issues forward.  But if HUMANIST
has to err, I would rather it err on the dead side, lest I be forced
to resign.  Let my vote be so registered.
 
     Sterling Bjorndahl
     Institute for Antiquity and Christianity
     Claremont Graduate School
     Claremont, California
=========================================================================
Date:         1 December 1987, 09:25:37 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: Jim Cerny <J_CERNY@UNHH>
Subject:  The Dirty Dozen ... Plus???
 
This is just to add to the warning passed on by Stuart
Hunter about Lehigh's direct experience with a virus in
some publicly obtained copies of COMMAND.COM.
 
There are apparently a number of other programs that have
had work done to their genes to turn them into malignant
viruses.  They have come to be called "The Dirty Dozen,"
though there are more than a dozen.
 
These have been described in a number of computer center
newsletters in the last year or so.  The most recent description
I've seen was "Beware The Dirty Dozen: Software That Destroys,"
CAUSE/EFFECT, v. 10, n. 6, November 1987, pp. 44-45 (reprinted
from the "Technical Update" publication of the Univ. of Cincinnati
Computing Center, September 1, 1987).
 
        Jim Cerny
        University Computing, Univ. N.H.
        J_CERNY@UNHH
=========================================================================
Date:         1 December 1987, 09:34:18 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: Dr Abigail Ann Young  <young@utorepas>
Subject: Discussions [34 ll, counting this one]
 
Well, pace Sterling Bjorndahl, I don't find HUMANIST
on the dead side and I don't want to!  But I think I know
exactly what he's talking about.  Recently it seems that
queries or opinions appear and then die in electronic silence.
In fact, there seems in general to be less discussion now
than there was a few months ago.  I don't know to what to
attribute this.  It could reflect a need on the part of those
of us who teach or provide services to students to prepare for
and then deal with the demands of a new academic session.
It could be that no-one has very much to say at the moment.
But I have wondered recently whether we were all feeling a
reluctance to say much, brought on by our worthy moderator's
urgings towards self-editing (with the consequent responsibility
of editing and posting a resulting conversation, if any) and by
our new awareness of the cost factor, for the Antipodes at least.
I certainly find the current "full" discussion of the details
of the electronic OED interesting and a nice change, even
though I had already learned a lot of it at the Waterloo
conference, and I now wish that I had kept my query about the
Rutgers database general too.  So I am glad that Willard has passed
on what others have had to say, and I think perhaps we should try,
for a while, making all discussion general.  We could make
use of a subject line to indicate the topic of a posting, and
whether it were part of an on-going discussion, thus enabling
those who need to clear their readers quickly to ignore
discussions which were not of interest to them.
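If subject lines followed a fixed pattern, a reader's own tools could do the skipping. A minimal sketch in Python, assuming a hypothetical pattern of "topic; on-going [N lines]" modelled on subject lines already seen on HUMANIST ("discussions (15 lines)", "Discussions [34 ll, counting this one]"); `parse_subject` and `wanted` are illustrative names, not features of any real mailer:

```python
import re
from typing import NamedTuple, Optional, Set


class Subject(NamedTuple):
    topic: str
    ongoing: bool
    lines: Optional[int]


# Hypothetical convention: a bracketed or parenthesized line count such as
# "(15 lines)" or "[body of message 26 ll inclusive]", plus an optional
# "; on-going" marker for a continuing discussion.
LINE_COUNT = re.compile(r"[\[(][^\])]*?(\d+)\s*(?:lines|ll)\b[^\])]*[\])]")
ONGOING = re.compile(r";\s*on-going\b", re.IGNORECASE)


def parse_subject(subject: str) -> Subject:
    count = LINE_COUNT.search(subject)
    lines = int(count.group(1)) if count else None
    text = LINE_COUNT.sub("", subject).strip()
    ongoing = ONGOING.search(text) is not None
    topic = ONGOING.sub("", text).strip()
    return Subject(topic, ongoing, lines)


def wanted(subject: str, ignored_topics: Set[str]) -> bool:
    """True unless the posting's topic is on the reader's ignore list."""
    return parse_subject(subject).topic.lower() not in ignored_topics
```

A mailer that truncates long subject lines would simply lose the count, and `parse_subject` then reports `lines` as `None` rather than failing.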
 
Abigail Young
Research Associate,
Records of Early English Drama
University of Toronto
young at utorepas
=========================================================================
Date:         1 December 1987, 11:14:50 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: William J. McCarthy <MCCARTHY@CUA>
Subject:  discussions (15 lines)
 
I would like to express my approval of the contents of
the recent message about discussions on HUMANIST.
Although I have no interest in scanning the
turgid "flames" of the digitally deranged, it seems highly
unlikely that HUMANISTs will inundate one another
with drivel, and I am content to attempt to follow the
threads of the discussions on my own. Certainly it >is< easy
enough to dispatch into oblivion (I have set up a macro for
just that purpose) any piece of mail in which one has no
interest.
 
As it now stands, HUMANIST seems a touch too formal.
=========================================================================
Date:         1 December 1987, 14:24:09 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: "Michael Sperberg-McQueen"   <U18189@UICVM>
Subject:  CD ROMs, micro- and mainframe computing with large corpora
 
A late contribution to the discussion provoked by Abigail Young
about CDs as a medium of data distribution.  [60 lines or so.]
 
I think Dr. Young hit the nail on the head with the question "Are
there people out there waiting with bated breath for the new OED
on CD ROM?"  Because certainly if we're not excited about the OED
as a group, then we're not as a group going to be very excited about
anything.
 
Yes, I AM waiting with bated breath for an electronic OED, but I was
far more excited to learn it would be available on tape than I was
to hear about the CD ROM version.  I like and use my PC, and I hope
someday to be able to work with massive textual corpora on it, but
at least for the moment I think magnetic tape is a far better medium
for distribution.  For one thing, I don't have a CD ROM drive, and
I don't know anyone who does, except for Bob Kraft and a classicist
here who has an Ibycus micro on loan but does her Greek word processing
on our mainframe.  Tape drives, on the other hand, will be available
at any school in the country.  For another, tape drives allow me to
change the data -- add to it, enhance it, reduce its size -- and
make another copy.  CD ROM doesn't.  For that reason alone, I'll
wait for WORM before buying a new drive for my PC.  And finally,
mainframes seem to me by and large better at dealing with large
quantities of data.  That is changing, to be sure.  But I can edit
the Nibelungenlied in storage on the mainframe, and extract every
occurrence of the name 'Sivrit' in a couple of seconds.  My PC
with its 640 Kbytes can only hold a fourth or so of the Nibelungenlied
in RAM at a time.  To be sure, a micro-Ibycus could also find all
the occurrences of 'Sivrit' in a few seconds -- if the Nibelungenlied
were on a CD ROM.  But it's not, and there aren't enough Germanic
philologists in the country to make it economically feasible to
make one.
 
Nor do I WANT a frozen, unalterable text of the Nibelungenlied.  I
want to be able to index it, to add parsing information or scansions
to the file so I can search on them, and so on.  Not to mention the
need to correct typos in the transcription and add manuscript
variants.  For all this, we need erasable media, not CD ROMs.
 
Magnetic tapes do have the drawback, for some users, that they are
typically readable only on mainframes.  (There are PC-based 9-track
tape drives, but they aren't very common.)  And many humanists don't
like working on mainframes.  Even for those users, however, the
local academic computing center should be, and almost always is,
in a position to read the tape and help the user download the data
to a microcomputer.  No, it's not always easy.  And no, it's not
always fast.  A megabyte an hour or so.  But the chances are good
the academic computer center knows how to do it, and does it regularly.
All the ones I've ever known as a user or staff member do.
 
There may be centers that do NOT provide this kind of service,
although I have never seen one and never heard of one.  But if
they exist, those centers should be DRIVEN to provide support
for humanities computing, support for microcomputing, and support
for data exchange between mainframes and micros.  If they are not
providing these services, they are not doing their job.
 
Given the kind of support computer centers ought to be providing
for humanist users, and given the kind of flexible text humanistic
work seems to need, I think CD ROMs look much less promising as
a means of data distribution than WORM disks and magnetic tape,
and in some cases floppy disks.
 
All of which is just one user's opinion.
 
-Michael Sperberg-McQueen
 University of Illinois at Chicago (U18189 at UICVM)
=========================================================================
Date:         1 December 1987, 14:26:50 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: Marshall Gilliland <GILLILAND@SASK>
Subject:  Subject line comments (25 lines)
 
Oh, my, the HUMANIST subject lines may get long.  Now, in addition to
the honest-to-goodness subject, and to the number of lines in the
message, Abigail Young suggests
 
"we could make use of a subject line to indicate the topic of a posting, and
whether it were part of an on-going discussion, thus enabling those who need to
clear their readers quickly to ignore discussions which were not of interest to
them."
 
Maybe we serious, dull writers can use such an augmented subject line as a
place to pun?  But woe is me for, alas, my mailer does not accept long subject
lines.  Can it be that some people will have to read the beginning of the
message to learn what we want to ignore?  Will we be like this lady:
 
        Lizzie Borden took an axe
        And plunged it deep into the VAX
        Don't you envy people who
        Do all the things you want to do?
 
(Thanks to Jerry Whitnell in California for the ditty.)
 
Maybe we'll relax a bit as our marking gets frantic and we hear the carols of
the season.
 
Marshall Gilliland       U of Saskatchewan
=========================================================================
Date:         1 December 1987, 15:55:42 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: "James H. Coombs" <JAZBO@BROWNVM>
Subject:      Concordance for Mac
 
Does anyone know of concordance programs for the Mac?  Thanks.  --Jim
=========================================================================
Date:         1 December 1987, 15:58:31 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: "Michael Sperberg-McQueen"   <U18189@UICVM>
Subject:  Text encoding guidelines -- progress report (225 lines)
 
A follow-up on the current status of the ACH effort to formulate
guidelines for text encoding practices.
 
   ******************************************************************
   * NOTE: The following encoding conventions have been used to     *
   *       represent French accents throughout this message:        *
   *                                                                *
   *   To Represent Accents  --  Pour la representation des accents *
   *    /       acute accent - accent aigu                          *
   *    `       grave accent - accent grave                         *
   *                                                                *
   * The accent codes are typed    Les codes pour les accents se    *
   * AFTER the letter, and are     trouvent APRES la lettre qu'ils  *
   * used with both upper and      modifient, et s'utilisent avec   *
   * lower case letters.           les majuscules aussi bien que    *
   *                               les minuscules.                  *
   ******************************************************************
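For readers who want these passages with real accented characters, the convention can be undone mechanically. A minimal Python sketch, assuming that only vowels carry the accent codes; the tables and the name `decode_accents` are illustrative, not part of the report. A slash after a vowel is ambiguous, so a form such as "he/she" would be mis-decoded and the output still needs a human check:

```python
import re

# Vowel + marker pairs from the convention, mapped to precomposed Unicode
# letters: "/" after a letter = acute, "`" after a letter = grave.
ACUTE = dict(zip("aeiouyAEIOUY", "áéíóúýÁÉÍÓÚÝ"))
GRAVE = dict(zip("aeiouAEIOU", "àèìòùÀÈÌÒÙ"))


def decode_accents(text: str) -> str:
    def replace(match: re.Match) -> str:
        letter, mark = match.group(1), match.group(2)
        table = ACUTE if mark == "/" else GRAVE
        # Pairs with no accented form (e.g. "M/C", "and/or") stay as typed.
        return table.get(letter, match.group(0))

    return re.sub(r"([A-Za-z])([/`])", replace, text)
```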
 
 
On November 12 and 13, 1987, 31 representatives of professional
societies, universities, and text archives met to consider the
possibility of developing a set of guidelines for the encoding of texts
for literary, linguistic, and historical research. The meeting was
called by the Association for Computers and the Humanities and funded
by the National Endowment for the Humanities.  The list of participants
is appended to this document.
 
The participants heartily endorsed the idea of developing encoding
guidelines. In order to guide such development, they agreed on
the following principles:
 
 
       The Preparation of                 Re/daction des directives
     Text Encoding Guidelines             pour le codage des textes
 
                         Poughkeepsie, New York
                            13 November 1987
 
1.  The guidelines are intended   1.  Le but des directives est de cre/er
    to provide a standard format      un format standard pour l'e/change
    for data interchange in           des donne/es utilise/es pour la
    humanities research.              recherche dans les humanite/s.
 
2.  The guidelines are also       2.  Les directives sugge/reront
    intended to suggest principles    e/galement des principes pour
    for the encoding of texts         l'enregistrement des textes
    in the same format.               destine/s a` utiliser ce format.
 
3.  The guidelines should         3.  Les directives devraient
 
  a.  define a recommended          a.  de/finir une syntaxe recommande/e
      syntax for the format             pour exprimer le format,
 
  b.  define a metalanguage         b.  de/finir un me/ta-langage
      for the description               de/crivant les syste`mes de
      of text-encoding schemes,         codage des textes,
 
  c.  describe the new format       c.  de/crire par le moyen de ce
      and representative                me/talangage, aussi bien qu'en
      existing schemes both in          prose, le nouveau syste`me de
      that metalanguage and             codage aussi bien qu'un choix
      in prose.                         repre/sentatif de syste`mes
                                        de/ja` en vigueur.
 
4.  The guidelines should         4.  Les directives devraient proposer
    propose sets of coding            des syste`mes de codage utilisables
    conventions suited for            pour un large e/ventail
    various applications.             d'applications.
 
5.  The guidelines should         5.  Sera incluse dans les directives
    include a minimal set of          l'e/nonciation d'un syste`me de
    conventions for encoding          codage minimum, pour guider
    new texts in the format.          l'enregistrement de nouveaux textes
                                      conforme/ment au format propose/.
 
6.  The guidelines are to be      6.  Le travail d'e/laboration des
    drafted by committees on:         directives sera confie/ a` quatre
                                      comite/s centre/s sur les sujets
                                      suivants:
 
  a.  text documentation            a.  la documentation des textes,
 
  b.  text representation           b.  la repre/sentation des textes,
 
  c.  text interpretation           c.  l'analyse et l'interpre/tation
      and analysis                      des textes
 
  d.  metalanguage definition       d.  la de/finition du me/talangage et
      and description of                son utilisation pour de/crire le
      existing and proposed             nouveau syste`me aussi bien que
      schemes                           ceux qui existent de/ja`.
 
    co-ordinated by a steering        Ce travail sera coordonne/ par un
    committee of representatives      comite/ d'organisation ou`
    of the principal                  sie`geront des repre/sentants des
    sponsoring organizations.         principales associations qui
                                      soutiennent cet effort.
 
7.  Compatibility with existing   7.  Dans la mesure du possible, le
    standards will be maintained      nouveau syste`me sera compatible
    as far as possible.               avec les syste`mes de codage
                                      existants.
 
8.  A number of large text        8.  Des repre/sentants de plusieurs
    archives have agreed in           grandes archives de textes en forme
    principle to support the          lisible par machine acceptent en
    guidelines in their function      principe d'utiliser les directives
    as an interchange format.         en tant que description des formats
    We encourage funding agencies     pour l'e/change de leurs donne/es.
    to support development of         Nous encourageons les organismes
    tools to facilitate this          qui fournissent des fonds pour la
    interchange.                      recherche de soutenir le
                                      de/veloppement de ce qui est
                                      ne/cessaire pour faciliter cela.
 
9.  Conversion of existing        9.  En convertissant des textes
    machine-readable texts to         lisibles par machine de/ja`
    the new format involves the       existants, on remplacera
    translation of their              automatiquement leur codage actuel
    conventions into the syntax       par ce qui est ne/cessaire pour les
    of the new format.  No            rendre conformes au format nouveau.
    requirements will be made for     Nul n'exigera l'ajout
    the addition of information       d'informations qui ne sont pas
    not already coded in the          de/ja` repre/sente/es dans ces
    texts.                            textes.
 
                                         (trad. P. A. Fortier)
 
                            ******************
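Principle 9 can be illustrated with the accent convention used in this very message. The Python sketch below is an assumption about how such a conversion might look, not anything the meeting specified: it rewrites each accent code as an SGML entity reference from the ISO 8879 Latin-1 entity set (&eacute;, &agrave;, ...) and maps the result back, showing that only the syntax changes and no information is added. The function names are hypothetical:

```python
import re

# Which vowels may take each accent, and the entity-name suffix for each
# marker of the message's convention ("/" = acute, "`" = grave).
ALLOWED = {"/": "aeiouyAEIOUY", "`": "aeiouAEIOU"}
MARK_TO_NAME = {"/": "acute", "`": "grave"}
NAME_TO_MARK = {name: mark for mark, name in MARK_TO_NAME.items()}


def to_entities(text: str) -> str:
    """Rewrite e.g. "de/ja`" as "d&eacute;j&agrave;"."""
    def replace(match: re.Match) -> str:
        letter, mark = match.group(1), match.group(2)
        if letter not in ALLOWED[mark]:
            return match.group(0)  # e.g. "M/C" is not an accent code
        return f"&{letter}{MARK_TO_NAME[mark]};"

    return re.sub(r"([A-Za-z])([/`])", replace, text)


def from_entities(text: str) -> str:
    """Inverse mapping: "&eacute;" back to "e/", "&agrave;" back to "a`"."""
    return re.sub(
        r"&([A-Za-z])(acute|grave);",
        lambda match: match.group(1) + NAME_TO_MARK[match.group(2)],
        text,
    )
```

The round trip holds only for text that contained no entity references to begin with.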
 
The further organization and drafting of the guidelines will be
supervised by a steering committee selected by the three sponsoring
organizations:  ACH (the Association for Computers and the Humanities),
ACL (the Association for Computational Linguistics), and ALLC (the
Association for Literary and Linguistic Computing).  Drafts of the
guidelines will be submitted for comment to an editorial committee with
representatives of all participating organizations (in addition to the
sponsors, thus far:  the Modern Language Association, the Association
for Computing Machinery Special Interest Group for Information
Retrieval, and the Association of American Publishers; the following
groups have indicated interest informally but have not yet formally
pledged participation, in most cases pending a formal vote: the
Linguistic Society of America, the Association for Documentary Editing,
and the American Philological Association).  The American Anthropological
Association, plus several organizations within Europe, are now being
asked to consider participation.
 
The interchange format defined by the guidelines is expected to be
compatible with the Standard Generalized Markup Language defined
by ISO 8879, if that proves compatible with the needs of research.  The
needs of specialized research interests will be addressed wherever it
proves possible to find interested groups or individuals to do the
necessary work and achieve the necessary consensus.  Formation of
specific working groups will be announced later; in the meantime, those
interested in working on specific problems are invited to contact
either Dr. C. M. Sperberg-McQueen, Computer Center, University of
Illinois at Chicago (M/C 135), P.O. Box 6998, Chicago IL 60680 (on
Bitnet: U18189 at UICVM), or Prof. Nancy Ide, Dept. of Computer
Science, Vassar College, Poughkeepsie NY 12601 (on Bitnet:  IDE at
VASSAR).
 
                                                 - N.I., C.M.S-McQ
 
------------------------------------------------------------------------------
 
                    List of Participants
 
  NOTE: Association names are given following the names of their
        representatives at this meeting.
 
   Helen Aguera, National Endowment for the Humanities
   Robert A. Amsler, Bell Communications Research
   David T. Barnard, Department of Computing and Information Science,
      Queen's University, Ontario
   Lou Burnard, Oxford Text Archive
   Roy Byrd, IBM Research
   Nicoletta Calzolari, Istituto di linguistica computazionale, Pisa
   David Chestnutt  (Assoc. for Documentary Editing, American Historical
      Assoc.), Department of History, University of South Carolina
   Yaacov Choueka (Academy of the Hebrew Language), Department of
      Mathematics and Computer Science, Bar-Ilan University
   Jacques Dendien, Institut National de la Langue Francaise
   Paul A. Fortier, Department of Romance Languages, University of
      Manitoba
   Thomas Hickey, OCLC Online Computer Library Center
   Susan Hockey  (Association for Literary and Linguistic Computing),
      Oxford University Computing Service
   Nancy M. Ide (Association for Computers and the Humanities),
      Department of Computer Science, Vassar College
   Stig Johansson, International Computer Archive of Modern English,
      University of Oslo
   Randall Jones  (Modern Language Association), Humanities Research
      Computing Center, Brigham Young University
   Robert Kraft, Center for the Computer Analysis of Texts, University of
      Pennsylvania
   Ian Lancashire, Center for Computing in the Humanities, University of
      Toronto
   D. Terence Langendoen (Linguistic Society of America), Graduate
      Center, City University of New York
   Charles (Jack) Meyers, National Endowment for the Humanities
   Junichi Nakamura, Department of Electrical Engineering, Kyoto
      University
   Wilhelm Ott, Universitaet Tuebingen
   Eugenio Picchi, Istituto di linguistica computazionale, Pisa
   Carol Risher (Association of American Publishers), Association of
      American Publishers, Inc.
   Jane Rosenberg, National Endowment for the Humanities
   Jean Schumacher, Centre de traitement e/lectronique de textes,
      Universite/ catholique de Louvain a` Louvain-la-neuve
   J. Penny Small (American Philological Association), U.S. Center for
      the Lexicon Iconographicum Mythologiae Classicae, Rutgers
      University
   C.M. Sperberg-McQueen, Computer Center, University of Illinois at
      Chicago
   Paul Tombeur, Centre de traitement e/lectronique de textes,
      Universite/ catholique de Louvain a` Louvain-la-neuve, Belgium
   Frank Tompa, New Oxford English Dictionary Project, University of
      Waterloo
   Donald E. Walker (Association for Computational Linguistics), Bell
      Communications Research
   Antonio Zampolli, Istituto di linguistica computazionale, Pisa, Italy
 
[end of message]
=========================================================================
Date:         1 December 1987, 16:22:58 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: Dr Abigail Ann Young  <YOUNG at UTOREPAS>
Re: CD-ROMs & other media; on-going [body of message 26 ll inclusive]
 
(Was that too long, Marshall?)
 
Does anyone have any information on WORM drives?  A
non-HUMANIST colleague told me he had heard about them
at an IBM-sponsored conference and that they were the
best thing since sliced bread, basically.  I've also heard
that a disk for an IBM WORM drive would be capable of being
written to only once, which would certainly make such a disk
only slightly more useful than a CD-ROM, and considerably less
useful than a magnetic tape.
 
I am always suspicious of new devices which will revolutionize
my life and save me time, trouble, etc.  I think it is because
I tended to believe the Popular Science/Mechanics picture of
the future when I was a child.  But a WORM drive & disk capable
of multiple disk writes as well as reads sounds very, very
appealing.
 
Abigail Ann Young
Research Associate,
Records of Early English Drama
University of Toronto
young at utorepas
=========================================================================
Date:         2 December 1987, 00:17:29 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor:  "James H. Coombs" <JAZBO@BROWNVM>
Subject:      Sonar; Mac; concordance vs. retrieval (54 lines)
 
I asked about concordance programs for the Mac.  Someone sent me the review of
Sonar and a couple of others have mentioned it.  The review does not say
anything about concording texts with Sonar, however.  I have never used one of
these retrieval programs.  I have used WatCon and have written a concordance
program for the IBM PC (for multiple versions of the same text).  Is Sonar
appropriate for generating concordances? concordances that will be printed and
distributed?  Does it properly handle lines of poetry, for instance? and give
columns of lines with locations?  I assume that WordCruncher from BYU can do
so, since it is a descendant of a concording program (unless there is an
equivocation on "concord" here, and please let us all know if there is).
 
I am in the process of designing a retrieval engine and browser for the
American Heritage Dictionary.  When I think of retrieval programs, I think of
inverted indices, hash tables, and the like.  "Use this information to go find
X and then let's Y it."  That, to me, is a typical retrieval action, and the
access is typically random.
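The retrieval side of that contrast can be sketched in a few lines of Python. This is a toy inverted index under simple assumptions (words are runs of letters and apostrophes, case is ignored, locations are line numbers), stored in a Python dict, which is itself a hash table; it is not how Sonar, WordCruncher, or the AHD engine actually work:

```python
import re
from collections import defaultdict
from typing import Dict, List

WORD = re.compile(r"[A-Za-z']+")


def build_index(lines: List[str]) -> Dict[str, List[int]]:
    """Map each lower-cased word to the line numbers where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        for word in WORD.findall(line):
            index[word.lower()].append(number)
    return dict(index)


def lookup(index: Dict[str, List[int]], word: str) -> List[int]:
    """Random access: one dictionary probe instead of a pass over the text."""
    return index.get(word.lower(), [])
```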
 
Concording, however, at least in the traditional sense, is sequential and
exhaustive.  One COULD use a retrieval application to concord a text, but it
would be very inefficient and would probably require additional programming
anyway.  One would have to have a means to call the retrieval engine
iteratively for every word in the text as well as the means to format and write
the results someplace.
 
Are WordCruncher and Sonar dual applications?  In order to index, one has to
perform much of the same processing as is required for concording (process
sequentially and exhaustively, split words out of lines, stop words,
lemmatize?, cross reference (See also xxx)?).  Well, some of the routines are
the same anyway, at least to the extent that the developer of one type of
application would have a start on developing the other.  It begins to sound
like integrated systems a la Symphony vs. 1-2-3.  Does the system that offers
both really do both jobs well?  Or, first I guess, are there systems that
offer both?
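A traditional concording pass, sequential and exhaustive, splitting words out of lines, dropping stop words, and writing columns of lines with locations, might be sketched in Python as follows. The stop list and column widths are illustrative assumptions, and lemmatization and cross-references are left out:

```python
import re
from typing import List, Tuple

WORD = re.compile(r"[A-Za-z']+")
STOP_WORDS = {"a", "an", "and", "of", "the"}  # illustrative stop list


def concordance(lines: List[str], width: int = 12) -> List[str]:
    """Keyword-in-context rows: headword, line number, surrounding text."""
    rows: List[Tuple[str, int, str]] = []
    # Sequential and exhaustive: every word of every line is visited once.
    for number, line in enumerate(lines, start=1):
        for match in WORD.finditer(line):
            headword = match.group().lower()
            if headword in STOP_WORDS:
                continue
            left = line[: match.start()][-width:].rjust(width)
            right = line[match.end():][:width].ljust(width)
            rows.append((headword, number, f"{left} {match.group()} {right}"))
    # Alphabetical by headword, then by location, as in a printed concordance.
    rows.sort(key=lambda row: (row[0], row[1]))
    return [f"{word:<12} {number:>4}  {context}" for word, number, context in rows]
```

The word-splitting loop is the same work an indexer does, which is the overlap between the two kinds of application; the difference is that the concordance writes every occurrence out in sorted order instead of storing it for later lookup.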
 
--Jim
 
Dr. James H. Coombs
Software Engineer, Research
Institute for Research in Information and Scholarship (IRIS)
Brown University
jazbo@brownvm.bitnet
=========================================================================
Date:         2 December 1987, 12:12:08 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor:  Bob Kraft  <KRAFT@PENNDRLN>
Subject:  CD-ROM & WORM  [88 lines]
 
The recent observations by Abigail Young and Michael
Sperberg-McQueen on CD-ROM and WORM technologies call for
some comment from the "pro" (and experienced) side. I hope
to keep them brief, just to pinpoint some of the issues.
Michael's comments seemed to me to miss many crucial points,
and did not reflect the attitudes or situation of numerous
people with whom I am in regular contact.
 
1. The difference between CD-ROM and WORM for this discussion
is negligible, as Abigail suspected. Right now, WORM drives
are more expensive and less tested publicly, but a single disk is
cheaper to produce. But once you have that single WORM disk,
which currently costs about $65, there is no price advantage
to making multiple copies (50 copies would cost $3250). With
the CD-ROM, it might cost $3000 to master but each additional
copy would cost very little (perhaps $7 each for 100). Thus
it would be much cheaper to make 100 copies of a CD-ROM than
100 copies of a WORM disk at present. And the CD-ROM holds
more than twice as much as the WORM disks with which we are
working. So WORM is fine for limited production or in-house
purposes, CD-ROM is better for larger distribution, etc.
Neither can be changed once they are mastered, although
WORM can be mastered in stages, while CD-ROM is a once-and-for-all
mastering process.
 
2. Are people anxiously waiting for data distributed on CD-ROM?
In my experience, YES. We have many advance orders for the CCAT
CD-ROM, and more inquiries. Ted Brunner can report on the TLG
experience. What sorts of people are asking? Obviously, IBYCUS SC
owners (about 130 machines) who are set up to use CD-ROM as part of
the package; librarians, who need massive amounts of data in a
bibliographically controlled context (static is good, in this
setting!); the mass of individual scholars/students who are not
in a tape-oriented environment such as Michael describes (his
experience is not at all typical, even at the ideal level, of
the majority of people with whom I am in contact -- people in
small colleges, seminaries, or operating individually, with
no access to a real mainframe or effective consultation).
 
3. What is attractive to these inquirers? Several fairly obvious
things. (1) Amount of material available -- e.g. all of Greek
literature through the 6th century on the TLG disk! (2) Price
of the material (on tape, the TLG data cost over $4000; on
CD-ROM, it is about 10% of that). (3) Convenience of storage,
access, etc. -- I would rather download from a CD-ROM than
from a tape drive, any day. It is the old roll vs codex issue
once again (microfilm vs microfiche, etc.). (4) Quality control --
what is on the CD-ROM may have errors, but at least they can be
identified and controlled (and corrected in a later release);
I don't have to wonder whether my dynamic file has become
corrupted (as happens more than I want to admit). (5) Speed of
access to large bodies of data -- even if the programs are not
yet in place and it will take 20 times as long to search a
large CD-ROM file on the IBM as on IBYCUS, it is at least
possible to do the search (or to search multiple files, in
various configurations), which is extremely difficult in any
other manner short of a dedicated mini.
 
I am rambling and apologize. Much more needs to be said, but I
need to finish preparing ID tables for the CCAT CD-ROM if it is
to be mastered by the end of the year! Perhaps it would not be
feasible economically to put the Nibelungenlied on its own
CD-ROM, but to have it as a small part of a CD-ROM with all sorts
of other texts is what we are talking about! That is not only
feasible, but it seems to me highly desirable, IBYCUS or not.
And I can still download what I want to edit, or manipulate, etc.
I lose none of that capability. But I gain by having the original
fixed at hand for comparison, etc.
 
Libraries will rapidly be CD-ROM centered, and that is as it ought
to be. Hopefully computer centers will not be bypassed by this
exciting and useful development!
 
Bob Kraft
=========================================================================
Date:         2 December 1987, 14:41:59 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: Jim Cerny <J_CERNY@UNHH>
Subject:  Summary of responses on KJV Bible for Macintosh (incl.
          [152 lines]
 
Thanks to everyone who responded to my recent inquiry about the
availability of the King James Version of the Bible for the
Apple Macintosh.  I've tried to acknowledge or quote from
all the responses (as of 01-Dec) in the summary that follows.
 
======================================================================
John J. Hughes (XB.J24@STANFORD) had the most definitive answer,
reflecting no doubt the research for his book "Bits, Bytes, &
Biblical Studies".  Robin C. Cover (ZRCC1001@SMUVM1) referenced this book
and Marshall Gilliland (GILLILAND@SASK) and Tim Seid (ST401742@BROWNVM)
mentioned sources that Hughes lists.
        Hughes wrote:
----------------------------------------------------------------------
There are several companies that sell King James Versions of the
Bible for Macintoshes. Here are their names, addresses, and so
forth. The first program is reviewed in detail in chapter 3 of
BITS, BYTES, & BIBLICAL STUDIES (Zondervan, 1987).
 
  THE WORD Processor
  Bible Research Systems
  2013 Wells Branch Parkway, Suite 304
  Austin, TX 78728
  (512) 251-7541
  $199.95
  Requires 512K; includes menu-driven concording program
  CP/M version available for Kaypros.
 
  MacBible
  Encycloware
  715 Washington St.
  Ayden, NC 28513
  (919) 746-3589
  $169
  128K; text files that may be read by MacWrite
       and Microsoft Word.
 
  MacScripture
  Medina Software
  P.O. Box 1917
  Longwood, FL 32750-1917
  (305) 281-1557
  $119.95
  128K; text files designed to be used with MacWrite.
 
 
 
=======================================================================
Marshall Gilliland  (GILLILAND@SASK) pointed to a very unexpected
source, i.e., one of the DECUS (DEC Users Society) tapes.  We are an
active VAX/VMS site and we did indeed have the tape.  It is on VAX
System SIG Symposium tape VAX86D (from the Fall 86 DECUS meeting in
San Francisco).  In uncompressed form the files take about 9000
VAX disk blocks (roughly 5 MB).  It is all in upper case.  Presumably
it could be downloaded to a PC, but I don't think I will attempt that!
        Gilliland wrote, in part:
-----------------------------------------------------------------------
If you have VAX equipment there and get DECUS
tapes then ask one of your systems people for the copy of the ascii text of
the KJ Bible that was on a DECUS tape not too long ago (I think in 1987).
 
Marshall Gilliland
English Dept.
U. of Saskatchewan
=======================================================================
Tim Seid (ST401742@BROWNVM) pointed me to CCAT (Center for Computer
Analysis of Texts) and Bob Kraft (KRAFT@PENNDRLN) from CCAT also
responded.  Bob Kraft also sent me several files about CCAT and its
services and I've tacked CCAT's info-file at the end of this summary
... "old hands" may be aware of CCAT's electronic newsletter, ONLINE
NOTES, but it was new to me and their info-file tells how to
subscribe.
        Bob Kraft wrote:
-----------------------------------------------------------------------
I have not seen my MAC person (Jay Treat) since your inquiry
about the KJV arrived, but I am reasonably sure that it is
already available from CCAT for the MAC, or will be very soon.
We have been distributing the KJV and RSV (along with the Greek
and Hebrew texts of the Bible) to IBM types for over a year now,
and all these materials will be on our soon to be released
CD-ROM. Most of it has been ported to the MAC as well.
I will send you an order form and other information separately.
Bob Kraft
=======================================================================
Ronald de Sousa (DESOUS@UTORONTO) mentioned the possibility of using
DIALOG services.
        de Sousa wrote:
-----------------------------------------------------------------------
You'll probably get some satisfactory answers, but in the meantime I
wonder whether you know that the cheap after-hours service of DIALOG
Info Services, called "Knowledge Index", has the King James full text
on line, and can be searched using the search options of that service.
I seem to recall that for $200 you'd get about 8 hours of search time
-- quite enough for a limited project. Of course, the same is
available on DIALOG itself, with somewhat more sophisticated options.
=======================================================================
Roger Hare (R.J.HARE@EDINBURGH.AC.UK) responded from JANET that
Catspaw Inc. has the King James Bible.  They specialize in supporting
PC-based implementations of SNOBOL and related products, as I recall.
        Roger Hare wrote:
-----------------------------------------------------------------------
Catspaw do a version of the King James Bible for 50 dollars. My
catalogue doesn't say what machine it's for, but if you have access to a
mainframe perhaps you could get it onto your Macintosh via file
transfers?

Their address is:
 
Catspaw Inc.
PO Box 1123
Salida
Colorado
81201
USA.
 
Roger Hare.
=======================================================================
Finally, Chuck Bush (ECHUCK@BYUADMIN) mentioned that they have
the King James Bible at the Humanities Research Center at Brigham
Young University and I presume he could supply more details.
        Chuck Bush wrote:
-----------------------------------------------------------------------
At BYU we do have the text of the King James Bible in machine readable
form.  The original data is on a mainframe, but we have downloaded it
to PC disks etc. for those who have ordered it in other forms.  I have
a copy of it on a Macintosh Bernoulli cartridge from which it would be
relatively easy to copy it to some other Macintosh medium--even floppies.
 
However, this is just the TEXT.  There isn't any software to access it
conveniently.  Sonar is the only text retrieval software I know of for
the Macintosh and I don't think it would be very satisfactory.  For one
thing, it couldn't give you chapter and verse references.
 
Chuck Bush   <ECHUCK@BYUADMIN>
Humanities Research Center
Brigham Young University
=======================================================================
Interested HUMANISTs should also consult the guide to external services
of the Center for Computer Analysis of Texts (CCAT), Univ. of
Pennsylvania, available from Jack Abercrombie (JACKA@PENNDRLS.BITNET)
=========================================================================
Date:         2 December 1987, 20:29:00 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Vox populi (46 lines)
 
Dear Colleagues:
 
My thanks to the several people who offered their views on the
conversational style of HUMANIST. The majority of speakers have clearly
voiced a preference for a somewhat more open manner of conversational
exchange than has been the rule so far. For what it's worth, I welcome
this change without reservation, since HUMANIST is by design ruled
chiefly by its members rather than by its editor.
 
Until an absolutely foolproof method of screening out junk mail is
found, I will continue to have all submissions to HUMANIST sent first to
me and will forward the ones of human origin to the membership. This
means very little work for a very large improvement in the quality of
the environment.
 
One of the interesting (but, I guess, not surprising) characteristics of
HUMANIST is the number of members who never say anything -- yet continue
to put up with the large volume of mail. I imply no criticism
whatsoever, for there are many noble and practical reasons for remaining
silent. Nevertheless, I suspect that some members may occasionally have
something to say but wonder if what they have to say is worthy. In
general the advice I follow is, say it and see what happens. One
possibility for the diffident is to send in a contribution with a note
attached asking my advice, for whatever it's worth.
 
Please let me know if anything about HUMANIST bothers you or otherwise
seems to need improvement. The ListServ software (written and maintained
on a voluntary basis by a remarkable person who lives in Paris) we
cannot fundamentally alter. It has certain characteristics that some may
consider flaws but that seem to me merely features to be exploited in
the best possible way. Locally HUMANIST is supported by my Centre and by
the good will of our Computing Services, i.e., by two busy people.
There's not much that can be done given these resources, but some changes
can be made without much effort -- like the screening of junk mail.
 
In short, lead on!
 
Yours, W.M.
_________________________________________________________________________
Dr. Willard McCarty / Centre for Computing in the Humanities
University of Toronto / 14th floor, Robarts Library / 130 St. George St.
Toronto, Canada M5S 1A5 / (416) 978-4238 / mccarty@utorepas.bitnet
=========================================================================
Date:         2 December 1987, 22:53:10 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: Sebastian Rahtz <CMI011@IBM.SOUTHAMPTON.AC.UK>
 
Here's one for the eager punters: a colleague of mine wants
to study the New Kingdom El-Amarna literature (Egypt, mid-14th C BC).
Anybody care to say whether someone has already typed such material
into the computer? Apologies if it's obvious...
 
sebastian rahtz  computer science university southampton uk
=========================================================================
Date:         2 December 1987, 23:23:34 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributor: "Michael Sperberg-McQueen"   <U18189@UICVM>
Subject:  CD ROMs, mainframes
 
Many thanks to Bob Kraft for his cogent remarks about CD ROMs.  I seem
to have given a rather scrooge-like impression in my most recent posting
about CD ROMs and PCs, which does not reflect my positive opinion of
PCs.
 
Yes, CD ROMs are ideal for certain kinds of data distribution,
especially for (a) stable data and (b) large numbers of recipients.
For humanistic research applications with those characteristics,
they are also obviously good ideas.  WORM disks, or better yet
erasable mass storage devices, would make many of the same
advantages available for non-static data and small numbers of
recipients.  But neither description fits all research fields.
 
I am less convinced that institutional support for faculty use of
mainframes and microcomputers is untypical in North America.  This
is an empirical question, and I would like to put it up for discussion:
what is the situation at the sites represented on HUMANIST with
regard to:
 
    (a) support for humanities computing formally provided by
        the institution via centralized or specialized facilities,
    (b) faculty-student computing on mainframes or minis,
    (c) institutional support for microcomputing, and
    (d) institutional support for mainframe-micro data transfer.
 
It is possible that Bob Kraft is right and my experience is
untypical.  But it seems also possible that Penn and CCAT get so
much business from people without mainframe access because those
who do have local computer centers get their help locally.  It
would be useful, I think, for all of us if we could get some idea
of the facts in this area.  The ACH Special Interest Group for
Humanities Computing Resources (the sponsor of HUMANIST) did
once plan to distribute a questionnaire to gather this information,
but the final questionnaire design seems to have been delayed,
so let's caucus informally now.
 
Michael Sperberg-McQueen, University of Illinois at Chicago
=========================================================================
Date:         2 December 1987, 23:34:33 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      Electronic OED -- for the blind?
 
Contributor: Norman Zacour   <ZACOUR at UTOREPAS>
 
I have a blind, computerized friend, a professor of English and a
professional writer, who got very excited when I passed on to him the
recent messages from HUMANISTS about plans for making the OED available
in electronic form.  He had visions - no joke intended - of consulting
it through his speech synthesizer on his PC.  His enthusiasm was
dampened by the planned use of colour to display certain types of
information.  Does anyone happen to know if the OED has any plans for
handicapped users?  I suppose that there are still architects who design
monumental buildings without ramps for wheelchairs, but perhaps...
=========================================================================
Date:         3 December 1987, 09:55:14 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
Comments:     Contributor:  Robin C. Cover <ZRCC1001@SMUVM1>
From:         MCCARTY@UTOREPAS
Subject:      Al-Amarna Correspondence (in MRT format) [96 lines]
 
 
Sebastian Rahtz asked whether the El-Amarna letters exist in digitized
format somewhere.  I doubt whether many HUMANISTS are interested in
west-semitized Akkadian texts, but this query (and its answer) provides
an opportunity to tell a sad and familiar tale...and perhaps an
opportunity for someone to come forward with better news than I have to
tell...
 
The good news (for our assyriologist friend in the UK) is that
Knudtzon's edition of the El-Amarna letters is in machine-readable
format.  I have used the massive printed "concordances" (two tomes, each
about 7 inches thick).  These printouts originated at UCLA, so the best
bet is to contact Giorgio Buccellati at the department of Near Eastern
Studies, who might make tapes or diskettes available.   UCLA has a
growing corpus of MRT material for the ancient Near East, and in time it
will be available publicly as part of Buccellati's hypermedia project
for Mesopotamia (Computer-Aided Analysis of Mesopotamian Materials);
some materials are currently available from Undena, and Buccellati
passed out sample diskettes of digitized Eblaite texts at AOS.
 
The sad tale I mentioned earlier is as follows: Du Cerf (a Paris
publisher) recently released a superb volume in its series Litteratures
anciennes du Proche-Orient on the El-Amarna letters.  Its
author/translator is William Moran of Harvard University, recognized as
a (probably THE) leading Amarna scholar, who has been putting together
this polished volume over the past 30-odd years.  His translations are
based upon extensive museum collations of the tablets, together with
restorations that can be made only by someone so familiar with the
"idioms" of international diplomacy (in the 14th century B.C.E.) as
Professor Moran is.  So, the MRT
edition we *REALLY* want is Moran's, not the 1915 edition of Knudtzon.
But you won't find it published on diskette with this Du Cerf volume
(which does not even have transliterated original text).  According to
the publishers, it would not be cost-effective to publish the original
text on paper, and as for a MRT edition of the text....well...
 
Shortsightedness like this has to stop, but who is responsible for
"stopping it?"  A single individual (as in this case, Moran) probably
can do very little to force publishers to change their ways.  But how
about collective bargaining....we publish such scholarly materials ONLY
with publishers that are sensitive about the future of scholarship, and
about the precious treasure we have in ancient literature.  This means
placing premium value on original texts in machine-readable form -- only
thus are they truly useful and accessible to modern scholarship -- and
making these texts available in the public domain.  I suspect that this
problem is more acute for orientalists than for classicists and other
humanities-literary subspecialty areas; we have special orthographies
and printing problems which are expensive and demanding.  But my
suggestion is that we must encourage and demand higher standards of
cooperation from publishers such that valuable (priceless!) human
efforts are not lost on a Macintosh diskette after it passes from the
departmental secretary or word-processing pool to the publisher.  Does
anyone else share this point of view?  Am I too idealistic?
 
While I am in a lament mode, I might as well refer to another problem
that needs attention: the problem of coding standards.  There are
several efforts underway internationally to "encode" ancient Near
Eastern texts in transliteration
(Toronto - RIM; UCLA; Rome; Helsinki; etc), but to my knowledge there
are no agreed-upon standards.  In the case of purely alphabetic scripts,
the problem is frustrating but not fatal, since we can use
consistent-changes programs to standardize the data for archiving.  In
the case of syllabic (logographic; hieroglyphic) scripts -- Akkadian,
Sumerian, Hittite, Elamite, Egyptian -- the plethora of transliteration
schemes is more problematic.  No-one sends this kind of data with an
SGML prologue, so the best we can hope for is that the encoding is
consistent and that we can unravel the format codes.  If anyone knows
about efforts to introduce standards for transliteration and
format-coding, would you kindly let me know?  I understand that the
committee for encoding standards (Nancy Ide; Michael Sperberg-McQueen)
recently funded by NEH will not initially address the needs of
orientalists.  If there are other orientalists "out there" on the
HUMANIST reader list -- should we organize ourselves?
 
Apologies to all if this is arcane, recondite or just downright boring.
I'd like to know if anyone out there shares some of my frustrations, or
sees solutions.
 
Professor Robin C. Cover
3909 Swiss Avenue
Dallas, TX  75204
(214) 296-1783
=========================================================================
Date:         3 December 1987, 09:58:21 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
Comments:     Contributor: Brendan O'Flaherty  <AYI017@IBM.SOUTHAMPTON.AC.UK>
From:         MCCARTY@UTOREPAS
Subject:      E-mail to Australia
 
Can anyone tell me if e-mail to the Antipodes (i.e. Australia) has a charge?
And if so, who pays: the sender outside Australia or the recipient?
Thanks in advance.
=========================================================================
Date:         3 December 1987, 13:36:52 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      The Thesaurus Linguae Graecae (TLG) on CD-ROM
 
The following has been contributed by Theodore Brunner, Director
of the TLG Project, from a memo circulated to all TLG customers.
Anyone wishing to arrange for a license agreement should contact
Professor Brunner, Thesaurus Linguae Graecae, University of
California at Irvine, Irvine, CA 92717 U.S.A., telephone: (714)
856-7031, e-mail: TLG@UCIVMSA.bitnet. The license per CD-ROM,
including a copy of the printed TLG Canon, is not expensive: the
initial registration fee (plus first-year fee) is $200 to
institutions and $120 to individuals; the annual fee is $100 to
institutions, $60 to individuals; an optional one-time payment for
5 years is $500 to institutions, $300 to individuals. (All prices
are in US $.)
 
_________________________________________________________________
TLG CD-ROM CUSTOMERS:
 
We have been receiving numerous questions related to TLG CD ROM
dissemination plans and policies; here is miscellaneous informa-
tion on these subjects:
 
1. To date, the TLG has produced two CD ROMs, disk "A" and disk
"B". Disk "A" contains approximately 27 million words of TLG
text, as well as an electronic version of the TLG Canon. Disk "B"
contains the same 27 million words of text, the TLG, and an Index
to the TLG texts on the CD ROM.
 
Disk "A" also contains miscellaneous non-TLG materials, including
some Latin, Coptic, and Hebrew texts, some epigraphical
materials, as well as portions of the Duke Data Bank of
Documentary Papyri.  The non-TLG materials were included on TLG
CD ROM "A" for one reason only: this disk was produced (as was CD
ROM "B") primarily for experimental purposes, i.e., to aid in the
development of software resources designed to enhance utilization
of the (relatively new) CD ROM data storage medium.
 
Neither disk "A" nor disk "B" reflects the High Sierra format
standard (established after both of these CD ROMs were produced).
 
2.  In short order, the TLG will release a new CD ROM, disk "C".
This disk will contain approximately 41.5 million words of TLG
text, an index to this text material, and the TLG Canon.
 
Individuals and institutions already holding license to "A" or
"B" disks are entitled to receive "C" disks free of charge. This
(as provided for in the license agreement governing use of TLG
ROMs) will be on an exchange basis, i.e., disks previously issued
by the TLG must be returned to the TLG prior to the issuance of a
"C" disk. TLG LICENSEES SHOULD NOT RETURN THEIR "A" OR "B" DISKS
UNTIL DISK "C" IS OFFICIALLY RELEASED. [Notice will appear on
HUMANIST when disk "C" is ready.]
 
3.  Questions have been raised about the absence of non-TLG
material on the "C" disk.  The TLG controls and licenses only its
own materials, and license agreements previously executed pertain
to the TLG materials on the disks only.  Current TLG CD ROM
licensees may, of course, continue to use their ("A" or "B")
disks throughout the course of their license period; they will
not be issued "C" disks, however, until they have returned their
earlier CD ROM versions to the TLG.
 
It is the case, however, that the Packard Humanities Institute
(PHI) will be releasing its own CD ROM in the very near future;
this disk will contain Latin, Coptic, Hebrew, and epigraphical
materials, as well as a significant portion of the Duke
papyrological data bank.  It can be assumed that individuals and
institutions desirous of these materials can make arrangements
with PHI to gain access to them on a PHI disk.  Further informa-
tion on this subject can be obtained by contacting
 
      John Gleason, Packard Humanities Institute, P.O. Box 1330
      Los Altos, CA 94022 U.S.A.
 
4.  We have received numerous requests for technical documenta-
tion related to the forthcoming TLG CD ROM "C".  The internal
organization of the text files and of the I.D. table files will
be identical to the organization of these files on TLG CD ROM
"A".  The file directory and author table will be reorganized to
reflect the High Sierra standard. More detailed documentation is
currently being prepared and should be ready for distribution in
the near future.
 
Theodore F. Brunner, Director
November 8, 1987
_________________________________________________________________
=========================================================================
Date:         3 December 1987, 15:00:56 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
Comments:     Contributor: "Michael Sperberg-McQueen"   <U18189@UICVM>
From:         MCCARTY@UTOREPAS
Subject:      Enlightening the publishers, encoding Semitic (65 lines)
 
Three cheers for Robin Cover's idea of group pressure to bring
publishers to their senses regarding the preservation and
distribution of machine-readable materials.  Some publishers, to
their credit, are already alert to the issues involved--or so say
people who should know.  But there are still an awful lot of
them out there who behave the way Renaissance printers did with
Carolingian manuscripts:  mark it up, print it, and throw it out.
Anything we can do to preserve the fruits of scholarly labors, we
should do.
 
It would also be useful to have a better developed system of
text archives in North America -- either a network of regional or
discipline-based archives, or one central archive that would
take anything (the way Oxford does).  The latter would be
appealing because fewer texts might fall through cracks in the
system, but specialized collections would remain important because
they can do more intensive work on their holdings, the way Penn's
CCAT does.  A central North American text archive, acting in
concert with the European archives, might also be in a position
to help exert the kind of group pressure on publishers that
Robin Cover suggests.
 
Making the publisher's texts usable, by documenting as far as
possible the usual systems of typesetting codes found in the
publishing industry, is one goal of the ACH/ACL/ALLC initiative
for text-encoding guidelines.  (That goal is not wholly explicit
in the final document I posted here a couple of days ago, but it
was discussed at length during the planning meeting at Vassar and
clearly is important to a lot of people.)
 
The consensus of the planners at Vassar was also that transliteration
practices, and conventions for the encoding of character sets, should at
least be documented as far as possible in the guidelines.  Many
participants were leery of making specific recommendations for the
representation of specific characters, since local hardware features
and requirements can vary so widely.  Nevertheless, the experts
present agreed that it would not be insuperably difficult to provide
adequate documentation for the encoding of scripts which, like
Semitic scripts, provide special challenges to most commonly
available hardware.
 
That means that the guidelines can and should contain full information
on practices for encoding texts of interest to Orientalists--if the
Orientalists will document their existing practices.  If they can
also agree on common recommendations for future work, that consensus
can and should also be documented.  The same goes for any and all
other specialized interests.  These guidelines will belong to the
humanities computing community as a whole, and I hope the community
will work together to make them as complete and useful as we can.
 
Again, I reiterate the invitation:  anyone interested in helping
formulate the guidelines, either in general or with respect to some
specific question (e.g. the encoding of Akkadian, or the encoding of
numismatic materials, or the encoding of manuscript variants, or the
prosodic transcription of oral texts, or the encoding of hypertext
materials, or ...), should please contact Nancy Ide or myself.  This
invitation will be periodically renewed, as details for the formal
arrangement of the drafting committees are set, but if you let us
know now, we will have a better idea of how much interest there is,
and what kinds of special problems are on people's minds.
 
Michael Sperberg-McQueen, University of Illinois at Chicago
 
P.S. The opinions here expressed are as always mine, not necessarily
those of my employer, or the ACH, or the guidelines steering committee.
=========================================================================
Date:         3 December 1987, 19:16:39 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
Comments:     Contributor: David Nash <nash@cogito.mit.edu>
From:         MCCARTY@UTOREPAS
Subject:      E-mail to Australia (24 lines)
 
E-mail involving ACSNet (Australia, through the international
gateways, or even domestically between sites I think) has a charge for
the Australian end (whether sender or receiver).  It was something
like 10c/message plus 2c/line about a year ago.  Apparently many
institutions do not (yet?) pass on the charge to individual users.
 
The official position could presumably be got from
postmaster@munnari.oz, i.e. <munnari!postmaster@uunet.uu.net>
 
David Nash
Center for Cognitive Science
20B-225 MIT
Cambridge MA 02139
=========================================================================
Date:         3 December 1987, 19:20:14 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
Comments:     Contributor:  Laine Ruus <USERDLDB@UBCMTSG.BITNET>
From:         MCCARTY@UTOREPAS
Subject:      Archives (75 lines)
 
In response to Prof. Cover's impassioned plea, I can
only say that it IS possible, with some concerted effort,
to force publishers to change their ways.
 
The American Sociological Association has recently (as
of Sept 1, 1987, in fact) begun to require of all periodicals
published under its aegis that any computer-readable files
(both data and software) BE CITED in the bibliography.
There is an effort under way now to convince other academic
publishers to follow suit.
 
There are a number of reasons for citation of computer files:
(a) computer-readable files are intellectual property in their
own right, quite as much as publications in
other media, eg on paper, film, audio-tape, canvas, etc.
(This has been recognized by that most conservative institution,
the American Library Association, since the late 1970s.)
The authors (properly called 'principal investigators'), producers,
publishers, editors, and translators, of computer-readable files
deserve for their labours the same acknowledgement and recognition
as do the authors, composers, etc of intellectual property
in more traditional media.
(b) the citation of source materials in the bibliographies of
publications acknowledges the source materials used in the
research process, thus enabling one's peers to follow the same
line of reasoning, using the same source materials, to (hopefully)
come to the same conclusions, thus corroborating
our initial reasoning - ie the peer review process.
(c) once computer-readable files are cited in bibliographies, they
will get picked up in the citation indices, and thus eventually
come to the attention of tenure committees. Thus individual
'authors' of these things will in time receive their due
academic brownie-points.
 
But citing computer-readable files is not enough. There must
also be a mechanism for preserving them for posterity and
making them available to others for secondary analysis.
 
Researchers are reluctant to make 'their' files available
to others for fear that they will not receive their due
acknowledgement (the polite reason). Mandatory citation of
computer files in publications should help reduce this fear.
 
Many researchers are not aware that there in fact exists a
network of local data archives/data libraries in
academic institutions throughout the United
States and Canada, as well as a well-developed system of
national data archives in Europe, most recently in Hungary,
Israel, and the USSR. Granted, these data archives mainly
concentrate on 'social science' data files, because
that is the field from which the initial impetus for
their creation came. However, this orientation is not cast
in stone, and most of these data archives/libraries could,
with appropriate overtures, be convinced that there
are other user communities that also need their services.
The social scientists just happen to have been among the
earliest and most vociferous. The point is that there
is already an institutional framework, staffed by knowledgeable
and experienced people, who with very little effort could
provide the network of text archives that humanists seem
to want; all they need is a little prodding.
------------------------------------------------------------
Laine Ruus, University of British Columbia Data Library
userDLDB@ubcmtsg.bitnet
=========================================================================
Date:         3 December 1987, 19:22:30 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
Comments:     Contributor:   STEPHEN@VAX.OXFORD.AC.UK
From:         MCCARTY@UTOREPAS
Subject:      E-mail to Australia
 
There is a relay at ULCC (UK) called EAN which
links with ACSnet - the fact that you do not register before
submitting suggests it is 'free': you may be able to learn
further from mailing an enquiry to laision@uk.ac.ean-relay
 
EAN can also link you to other European sites, perhaps
including addresses 'missing' from EARN.
 
stephen@uk.ac.oxford.vax
=========================================================================
Date:         4 December 1987, 13:02:56 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributed by Bob Kraft <KRAFT@PENNDRLN>
Subject:  CD-ROMs
 
Just to supplement Ted Brunner's information on the TLG
CD-ROM, regarding the non-TLG materials such as were
included on TLG disk "A" -- the present plan is for the
Packard Humanities Institute (PHI) jointly with the
Center for Computer Analysis of Texts (CCAT) at Penn
to produce an "experimental" CD-ROM at the heart of
which will be various Latin texts (being prepared at PHI),
Greek Papyri (Duke) and Inscriptions (Cornell, Princeton
Institute for Advanced Study), and a variety of biblical
and related materials in various languages (Hebrew, Greek,
Latin, Coptic, Syriac, Aramaic, Armenian) as well as sample
files from various other sources and projects (e.g. Dante
Commentary project, Milton Latin project, Kierkegaard in Danish,
Arabic poetry, some word lists, etc.). I call this disk a
"Sampler," and it is scheduled to be ready for distribution
by the end of this month (December). Again, the aim is to
give scholars, software developers, etc., a body of
consistently formatted (more or less!) materials on which to work
in various directions and at little cost. There will be a
notice on HUMANIST when the PHI/CCAT joint CD-ROM "Sampler"
is ready for distribution!
 
Bob Kraft for CCAT
=========================================================================
Date:         4 December 1987, 13:10:19 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributed by "James H. Coombs" <JAZBO@BROWNVM>
Subject: Enlightening the publishers, encoding Semitic (65 lines)
 
Michael Sperberg-McQueen has suggested that we need a text archive in North
America.  Is that a generally felt need?  What could a text archive here offer
that Oxford does not offer?  Certainly, shipping would be faster and cheaper,
but is there something more substantial?  Or are there real hardships now?
Or could our needs be addressed by some adjustments in the services that
Oxford provides---such that we might better discuss our needs with Oxford
instead of duplicating their efforts?
 
If we DO need an archive in North America, who should institute and manage it?
What is the proper sort of organization?  And what's in it for them?  Will it
be a costly burden?  Or are we willing to pay for materials in order to support
such a facility?  Would it be commercial or non-profit?
 
--Jim
 
Dr. James H. Coombs
Software Engineer, Research
Institute for Research in Information and Scholarship (IRIS)
Brown University
jazbo@brownvm.bitnet
=========================================================================
Date:         4 December 1987, 13:17:33 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributed by James H. Coombs <JAZBO@BROWNVM>
Subject:      ACH text markup
 
Some thoughts on guidelines for text markup, in response to
Michael Sperberg-McQueen's note.
 
1) Markup must be descriptive.
 
2) Delimiters should be '<' and '>' in conformance with the default of the
   new SGML standard.
 
3) Markup/tag attributes should be allowed, and attribute names should be
   descriptive.
 
4) There should be no attempt at establishing a "closed" tag set.  The current
   AAP SGML application allows for definition of new tags, but it does not
   support such definition in a practical way.  The consequence is that people
   will use "list items," for example, when they should be using "line of
   poetry."  Within these guidelines, it can only be healthy to provide a
   list of tags that people should choose from when tagging certain entities.
 
   The point of this is that we cannot predict what textual elements will be
   of significance for what researchers.  We have to allow for the discovery
   of textual elements that no one has categorized previously.  At the same
   time, there is no point in having 30 different tags for "line of poetry."
   The guidelines should make clear that DESCRIPTION is paramount and that
   the use of particular tags is secondary.
 
5) In so far as possible, there should be requirements for minimal tagging.
   It would be a mistake to fail to tag "verse paragraphs" and "book" in
   *Paradise Lost*, for example, and any version that does not provide such
   tags must be considered inadequate and, ultimately, rejected.
 
6) There can be no limit placed on "maximal" tagging.  If a researcher needs
   every word tagged, we must allow for this.  It is a trivial matter to
   ignore or strip out such tagging.  Researchers with such needs cannot,
   at least for now, reasonably expect that others will provide such
   exhaustive tagging.
 
   Putting (5) and (6) together, we have a principle of base-level tagging
   with as much additional information as the original researchers care to
   provide.  Where there are common needs that may not be shared by the
   original researcher, it may still be appropriate to require that those
   common needs be met.  For example, the original researcher may not need
   to know about verse paragraphs, but we should still require that they be
   appropriately tagged.
 
7) Referential markup should be used in place of "special" characters, such
   as accented characters.  If a particular configuration supports an acute
   accent, for example, in hardware, the researcher may take advantage of
   those facilities.  When checking the document into an archive or passing
   it on to others, however, the acute accent must be translated to
   "&aacute;" (or whatever the SGML standard specifies---don't have my copy
   at hand).
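The translation step in point 7 can be sketched in a few lines of Python. This is a minimal illustration, not any archive's actual procedure: the table is a small hand-picked subset of the ISO Latin 1 entity names (such as "aacute"), and a real converter would load the full entity set the standard specifies. The ampersand itself is also escaped (as "&amp;") so that the conversion can be reversed without ambiguity.

```python
import re

# Illustrative subset of ISO Latin 1 entity names; a real converter would
# use the complete entity set rather than this hand-picked table.
ENTITY_NAMES = {
    "&": "amp",
    "\u00e1": "aacute",  # a with acute accent
    "\u00e9": "eacute",  # e with acute accent
    "\u00e8": "egrave",  # e with grave accent
    "\u00f6": "ouml",    # o with diaeresis
    "\u00e7": "ccedil",  # c with cedilla
}
CHARACTERS = {name: ch for ch, name in ENTITY_NAMES.items()}
ENTITY_REF = re.compile(r"&([A-Za-z]+);")


def to_entity_refs(text: str) -> str:
    """Write each character found in the table as an &name; reference."""
    return "".join(
        f"&{ENTITY_NAMES[ch]};" if ch in ENTITY_NAMES else ch for ch in text
    )


def from_entity_refs(text: str) -> str:
    """Turn known references back into characters in a single pass, so that
    "&amp;aacute;" correctly becomes the literal text "&aacute;"."""
    return ENTITY_REF.sub(lambda m: CHARACTERS.get(m.group(1), m.group(0)), text)
```

For example, to_entity_refs("G\u00f6del & B\u00e9zier") yields "G&ouml;del &amp; B&eacute;zier", and from_entity_refs restores the accented form for display on hardware that supports it. Unknown references are left untouched rather than guessed at.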
 
 
This is off the top of my head, but enough for now.  I have other ideas on
this stuff, but they can come out if discussion ensues.  I am interested in
the project, but I don't have the time or money to travel to meetings right
now.
 
I also get the feeling from the preliminary document that you posted that
people are re-inventing SGML.  We already have, in SGML, a metalanguage for
generating descriptive markup languages.  I don't think that we need Document
Type Definitions right now, but even they might turn out to be useful once
SGML is established and SGML-support tools become widespread.
 
I haven't provided any defense of descriptive markup or SGML here.  We discuss
the advantages of these systems in "Markup Systems and the Future of Scholarly
Text Processing," *Communications of the ACM*, November 1987--- written with
Allen H. Renear and Steven J. DeRose.
 
Interested in any and all comments!  --Jim
 
Dr. James H. Coombs
Software Engineer, Research
Institute for Research in Information and Scholarship (IRIS)
Brown University
jazbo@brownvm.bitnet
=========================================================================
Date:         4 December 1987, 16:03:15 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      "International Educational Computing"
 
          POSSIBLE COURSE 3551 - SUMMER, 1988 - R. G. RAGSDALE
 
The  European  Conference  on  Computers  in  Education is being held in
Lausanne, Switzerland, July 24-29, 1988.  When the World  Conference  on
Computers  in  Education was held there in 1981, a substantial number of
OISE students attended, some as a portion of a  course  offered  by  Bob
McLean.
 
I propose to offer a course, 3551 - International Educational Computing:
An Interaction of Values and  Technology,  which  would  take  place  in
Switzerland,  around  the  dates of the conference.  Permission to offer
the course formally depends on several factors, including the number  of
students  likely  to  attend.   Plans are incomplete at this time, but a
projection of the plans indicates the following  format,  assuming  that
all necessary arrangements for housing, classroom, etc., can be made.
 
 
       The  course  participants  will  meet  together July 18-22 to
    study previous research and theory  on  values  and  technology,
    methods  for  evaluating  the  effects  of  technology, and case
    studies in business and education of technology-value conflicts.
    The  daily  schedule  will  have  more formal sessions (lecture,
    seminar)  in  the  mornings  and  less  formal  sessions  (group
    discussions)  in  the  early  evening,  with afternoons free for
    individual study or other activities (scheduled class  time  for
    each  day  will  be  four  hours, probably two and a half in the
    morning, one and a half in the  evening).    During  this  week,
    participants  will select and prepare for the issue(s) they plan
    to study during the conference.
 
       At the conference, each participant will focus on one or more
    topics,  such as a particular age range, subject matter area, or
    type of computer application.  They will collect  material  from
    the  formal  sessions,  but  also  from informal interviews with
    others attending the conference, both presenters and  those  who
    are only attending.
 
       August  1  is  a  Swiss  national  holiday  (which all course
    participants should enjoy), so the remaining sessions will  take
    place August 2-5, following the same schedule as the first week.
    During this time the results of the previous  week's  activities
    will  be  presented  and  group  feedback  will be obtained.  Final
    papers will be due in mid-September.
 
       Preliminary  arrangements  for  accommodation  and  classroom
    space  have  been  made  at  Aiglon College, an English boarding
    school in Chesieres, Switzerland, about one hour  from  Lausanne
    by train and bus.  Room rates include the "taxe de sejour" which
    gives access to the recreational facilities of Villars, such  as
    the swimming pool, ice skating, etc.
 
Estimated Cost
 
Based  on  1987 prices, the airfare to Geneva is $927, room and board is
860SF (Swiss Francs) for 20 days, and  the  conference  registration  is
280SF (higher after January 31).  At current exchange rates, these items
total  almost  $2,000.    A  better  estimate   would   include   ground
transportation,  other  likely  expenses (chocolate, etc.), and possible
price increases.  It seems extremely unlikely  that  necessary  expenses
would exceed $2,500.
 
Anyone who is interested in participating in this course should indicate
this to me  in  writing  (including,  if  possible,  your  "estimate  of
certainty").
=========================================================================
Date:         6 December 1987, 11:02:37 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributed by Nancy Ide <IDE@VASSAR>
Subject: TEXT MARK-UP (73 lines)
 
 
I recently responded to Jim Coombs' remarks concerning the principles
developed at Poughkeepsie as a basis for the development of a standard
for encoding machine-readable texts.  He suggested that we make our discussion
"public," in the spirit of recent remarks on HUMANIST, and so I will briefly
describe what has been said and put forth my reply.
 
I indicated to Jim that much of what he says is very much in the spirit
of the discussions at Poughkeepsie among the 31 participants.  This should
be made clearer in the minutes of the meeting, which Lou Burnard has drawn
up and which will be available from him or me in a few days.  Especially,
we intend to make the standard extensible to accommodate the unforeseen needs
of individual projects.
 
I also indicated that the standard will *recommend* a minimum set of tags for
texts, which is stated in the principles under number 5, I believe. We had a
lively discussion on this topic (actually, all of the discussions were very
lively!) at the Poughkeepsie meeting, with some disagreement about specifying a
minimum.  This is why *recommend* is in emphasis.  The feeling at the meeting
was that we can *require* nothing, but we can do our best to "guide the
perplexed" and provide some idea of what it makes sense to encode regardless of
how the text is originally intended to be used.  I should point out here that
among participants in the Poughkeepsie meeting, there were two clear
perspectives on the whole issue of encoding texts: one saw most encoding as a
future endeavor, and the other was focused on texts already encoded. One's
opinion concerning whether most texts have been encoded already or have yet to
be encoded obviously affects opinion on the importance of specifying a minimum
set of tags for encoded texts.
 
Jim responded to me suggesting that we could refuse to accept texts that had
been encoded without the "minimum" tags we might expect.  He made all of the
excellent arguments for insisting that certain tags be included *anytime* a text
is encoded.  But the problem here is that I am not sure who the "we" that is to
do this refusing actually is.  If someone does not provide the minimum tags but
has encoded the collected works of some obscure author I am interested in, will
I refuse to accept the text?  If I am an archive, should I refuse to take the
text--that is, is it better to have an inadequately tagged text or none at all?
Admittedly, in some cases it may be better to start from scratch and re-enter a
text, if the existing version is pitifully done. But most of the time it will be
easier to go in and mark whatever I need to mark in the existing version than to
re-enter the text entirely.
 
Similarly, we cannot expect archives to ensure that their texts contain a
minimum tag set.  This was a point of considerable concern to the keepers of
archives present at the meeting, and led to the final agreement that only the
tags that are present (whatever they may be) in a text that is distributed by an
archive will conform to the standard.  This requirement in itself will
necessitate the writing of programs to perform translation to the new scheme,
another topic addressed at some length and for which there seems to be support.
However, note that the principles indicate that texts now contained in the
archive need not be converted retrospectively. Naturally, although this is not
required we hope that it will occur in many cases.
 
So, the guidelines that will be developed will recommend a minimum set of
tags---especially, for those things that are easily encoded when the source text
is at hand and which are also obviously of use in most types of analysis.
However, it does not appear to me that it is reasonable to require such tagging.
We can only hope that the recommendation is enough to inspire most researchers
to provide the minimum set of tags when they encode new texts.
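The difference between recommending and requiring a minimum tag set can be made concrete with a small check an archive might run on incoming texts. This Python sketch is hypothetical: the tag names "book" and "vp" (verse paragraph) are invented for illustration and are not drawn from any agreed tag list, and the pattern only recognizes simple start tags in '<' and '>' delimiters. The function reports which recommended tags are absent and leaves the decision to accept or refuse the text to the archive.

```python
import re

# Hypothetical recommended minimum for a verse epic such as Paradise Lost:
# "book" and "vp" (verse paragraph) are illustrative names only.
RECOMMENDED_TAGS = {"book", "vp"}

# Matches start tags such as <book> or <book n=1>; end tags (</book>) are
# skipped because the character after '<' is '/', not a letter.
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9.-]*)[^>]*>")


def tags_used(text: str) -> set[str]:
    """Collect the names of all start tags, compared case-insensitively."""
    return {m.group(1).lower() for m in START_TAG.finditer(text)}


def missing_tags(text: str, recommended: set[str] = RECOMMENDED_TAGS) -> list[str]:
    """Report recommended tags that never occur; rejects nothing by itself."""
    return sorted(recommended - tags_used(text))
```

For a text that marks books and lines but not verse paragraphs, missing_tags returns ['vp']; whether that result is grounds for refusal is exactly the policy question under discussion.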
 
Nancy M. Ide
ide@vassar.bitnet
=========================================================================
Date:         6 December 1987, 11:10:43 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributed by Nancy Ide <IDE@VASSAR>
Subject:  more on mark-up (34 lines)
 
In my earlier message I neglected to summarize my reply to Jim Coombs
concerning SGML.  We have every expectation that the standard we devise will
be an application of SGML, but until we know fully our needs it is not
prudent to commit ourselves to SGML.  We know, for instance, that while it
is possible to define multiple parallel hierarchies in SGML it is not
entirely straightforward, and such parallel hierarchies are likely to be
used extensively in encoding machine-readable texts intended for literary,
linguistic, and historical analysis.  We hope that in any event the standard
will be compatible with SGML, which, as Jim points out, is bound to become
widely accepted and used.
 
Also, Jim had some concern about our defining a meta-language, since SGML
(the abstract syntax) is in fact a meta-language for describing a mark-up
scheme.  The concrete syntax of SGML is one mark-up scheme described by
this abstract syntax.  However, our goal is to provide a meta-language in
which *all* existing mark-up schemes can be described (which may prove
to be impossible), and it seems to us that the abstract syntax of SGML is
inadequate for this task.  The abstract syntax of SGML was not intended
for this purpose, it should be noted.
 
Nancy M. Ide
ide@vassar.bitnet
=========================================================================
Date:         6 December 1987, 11:15:37 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributed by C. S. Hunter <CSHUNTER at UOGUELPH>
Subject: Use of electronic communications (29 lines)
 
Willard notes the high percentage of "silent participants" on HUMANIST.  My
experience with computer conferencing systems makes his note not at all
surprising.  At the University of Guelph we have had our CoSy conferencing
system available free of charge to all faculty for some years now.  Only
about 40 % of the faculty actually took us up on the offer of a free account
on the system.  Of that 40 %, only 25 % (or less) actively use the system
more than once a week.  The ratio of active to passive participants
on the system is something like 1 : 9.  The same is roughly true on the
student system, where only about 10 % of the registered users are actual active
participants.   We are now studying the phenomenon to determine what factors
contribute to the individual use or non-use of computer-mediated communication
among academics.
 
C. Stuart Hunter,
University of Guelph
cshunter@uoguelph
=========================================================================
Date:         6 December 1987, 11:41:45 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      E-mail to Australia
 
Contributed by Emanuel Tov <HUET@HUJIPRMB.bitnet>
 
 
In reply to the question of Brendan O'Flaherty (3 Dec), I can tell you that
mail from Sydney (Macquarie Univ.) to Israel, Europe, and the U.S. is free,
as is mail in the reverse direction.

Emanuel Tov
=========================================================================
Date:         6 December 1987, 16:58:56 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributed by "James H. Coombs" <JAZBO@BROWNVM>
Subject: Text encoding
 
[In reply to Nancy Ide's points about SGML and related matters. The
inset paragraphs quote from her messages. -- ed.]
 
         We have every expectation that the standard we devise will be
         an application of SGML, but until we know fully our needs it
         is not prudent to commit ourselves to SGML.
 
A minor philosophical point, I guess: I don't think that we CAN know our needs
fully.  We need standards that accommodate needs that cannot be predicted
today.  The practical consequence of this observation, which I'm sure Nancy
would agree with, is that one should seek a "productive" system instead of
a system that satisfies everything on a list, and one should not spend a lot
of time developing the list.
 
         We know, for instance, that while it is possible to define
         multiple parallel hierarchies in SGML it is not entirely
         straightforward, and such parallel hierarchies are likely to be
         used extensively in encoding machine-readable texts intended
         for literary, linguistic, and historical analysis.
 
What are "multiple parallel hierarchies"?  I can guess, but I want to be sure
that I understand the problem.  In most documents, we have, for example,
pragmatic and syntactic hierarchies.  One has no difficulty marking up
documents for both at the same time (although one does not normally mark
up the latter descriptively).  Pragmatically, we have things like
 
  [         [         [         ] [         ] ] ]
   CHAPTER   SECTION   PARAGRAPH   PARAGRAPH
 
Syntactically, we might have
 
  [  [  ] [   [  ] ] ]
   S  NP   VP  NP
 
So far as I know, there are no difficulties in marking up both types of
hierarchies.  One could argue that we really have a single hierarchy here,
but, conceptually at least, we have two different domains: pragmatics and
syntax.  Well, this distinction is bound to be controversial, to say the
least!  This is probably the wrong list for a discussion about syntax vs.
pragmatics, etc.  I can try other examples, but I'm still guessing.  And I'm
still wondering what the difficulty is in encoding them under SGML.
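One way to make the question concrete: treat each hierarchy as a set of character spans over the same text and look for pairs that overlap without nesting. Any such pair cannot be expressed in a single well-nested set of tags. The Python sketch below is an illustration under that assumption; the element names ("p", "s", "line") and the offsets are hypothetical.

```python
from itertools import product

# A span is (element_name, start, end) over one shared text, end exclusive.
Span = tuple[str, int, int]


def conflicts(a: Span, b: Span) -> bool:
    """True if a and b overlap without one containing the other."""
    _, a0, a1 = a
    _, b0, b1 = b
    overlap = a0 < b1 and b0 < a1
    nested = (a0 <= b0 and b1 <= a1) or (b0 <= a0 and a1 <= b1)
    return overlap and not nested


def crossing_pairs(h1: list[Span], h2: list[Span]) -> list[tuple[Span, Span]]:
    """Pairs that prevent merging h1 and h2 into one well-nested tag tree."""
    return [(a, b) for a, b in product(h1, h2) if conflicts(a, b)]
```

With paragraphs and sentences as in the pragmatic/syntactic example above, the sentences fall inside the paragraphs and crossing_pairs returns an empty list. A sentence that runs across a verse-line boundary, for instance, would produce a crossing pair.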
 
         However, our goal is to provide a meta-language in which *all*
         existing mark-up schemes can be described (which may prove to
         be impossible), and it seems to us that the abstract syntax of
         SGML is inadequate for this task.
 
What is the practical value of a metalanguage that generates all markup
languages?  I would think that it would be so abstract as to be of no
value.
 
I suspect that this is part of the goal of salvaging work that has been
inadequately coded.  I believe that we will be better off if we worry
less about the past and plan more for the future.  I suppose that it's
true that publishers have typesetting tapes in their basements, and that
we could use those tapes.  I think that we have to accept that those
tapes are of little value until someone converts the coding to
descriptive markup.  I have the typesetting tape for the American
Heritage Dictionary (sorry, can't distribute it); no one wasted time
trying to figure out how to use that tape as it is now.  I know of
several projects that are based on that tape, and all required
conversions.  Ideally, the tape would have been converted once and for
all (and it apparently has been now).
 
Whether it's a dictionary or a literary text, we can expect that
inadequate coding will cause considerable work for anyone attempting to
use the database.  A metalanguage that includes procedural markup as
well as descriptive markup will not help in such a case, because one
still has to map procedural markup onto descriptive markup in order to
be able to work with meaningful entities (definition, paragraph, etc.).
Since procedural markup tends to be performed somewhat arbitrarily and
does not normally provide a one-to-one relationship between entity and
markup, there is no metalanguage that will help a researcher perform the
necessary conversions.
 
What we really need is a sensible and dynamic standard.  I don't think
that anyone would argue that that standard should be anything other than
descriptively based.  Since we are going to have to convert texts to
descriptive markup in order to use them anyway, why not just develop the
standard and convert as necessary?  Trying to save the past is just
going to retard development.
 
I haven't mentioned SGML so far.  Is there a problem with SGML?  I have
heard complaints, and we addressed them in our article.  No one expects
individual scholars to master the full syntax and to generate Document
Type Definitions (DTD).  What we want is accurate and consistent
descriptive markup.  In our experience at Brown, people have no
difficulties mastering the principles of descriptive markup.  We can
leave the development of DTDs to experts.
 
--Jim
 
 
Dr. James H. Coombs
Software Engineer, Research
Institute for Research in Information and Scholarship (IRIS)
Brown University
jazbo@brownvm.bitnet
=========================================================================
Date:         6 December 1987, 17:12:16 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributed by "James H. Coombs" <JAZBO@BROWNVM>
Subject:      Markup: on requirements
 
My thanks to Nancy Ide for moving the discussion out to HUMANIST.  Things
have fallen a little out of sequence, but the ideas are more important
than the sequence anyway.  I have also heard from Michael Sperberg-McQueen,
and I hope that he will post his very informative note as well.  If this
discussion becomes aggravating for the majority of HUMANISTs and there is
enough interest, then perhaps we can form a separate mailing list.
 
So, here is my (unedited) reply to the issue of requirements.
 
         While we may not be able to require that people conform to a
         standard fully, we can refuse to accept inadequate texts.
         There is an atmosphere of poverty now such that we are
         anxious to have whatever we can get our hands on.  At the
         extreme, even now most of us would reject a text that is all
         in upper case and contains errors---it turns out to be easier
         to do it oneself.  If we consider what things will be like or
         could be like in a few years though, I think it's appropriate
         to say that there are certain minimal standards (or one must
         comply with a standard).  First, we don't accept just
         anything for other scholarly documents.  Second, we will have
         more alternatives for sources.  Third, we want high quality
         sources so that people won't have to keep reworking or
         entirely redoing.  If I can't count on a text from a particular
         archive to meet my needs, what is my motivation for bothering
         with that archive; and what is the motivation for the
         archive's existence?  I certainly would not want to see it
         supported by public funds.
 
         I don't think that this places an inordinate burden on
         individual researchers.  For the most part, I'm sure that it's
         considerably less burdensome than ensuring that one's
         bibliography, for example, accords with the MLA style sheet
         (and what bibliography unambiguously does?).
 
         --Jim
 
I should elaborate briefly.  First, I have/had a tape of Milton's
*Paradise Lost*; it was so bad that I would prefer to start from
scratch.  Second, I think that we have a right to expect archives to set
and maintain certain standards.  Perhaps they don't want to accept that
responsibility right now.  If not, then I think that we should be
planning to develop and support a good archive.  Does such an archive
need several programmers for text validation and maintenance?  Then they
should have the support to hire them.  Let's centralize the expense as
much as possible.  Currently, we have no idea who is entering what and
how they are doing it.  Even if we could get people to go to the archive,
the current approach means that many people are going to have to
massage texts into useful formats, and every project will have to ensure
that the text is accurate.  It's as if we all had to revise our copies
of *Paradise Lost* and then proofread them before we could use them.
Finally, I have texts that I have entered, marked up, and proofread, but
I'm reluctant to check them into an archive that is inconsistent at best.
Whatever professional credit I might get for the contribution---well, let's
say that the effort is somewhat discredited by the state of the archive.
It's like publishing a book with XYZ press instead of ABC.  I would be happy
to send it off to someone who provides full services and validates text,
and I would be happy to make any necessary corrections.  To reverse the
roles, I am reluctant to acquire a text from an archive that makes no
guarantees.  After all, in the process of keyboarding a text, I get to read
it, and the time goes quickly.  It's the proofreading that is burdensome, and
I still have to proofread.  (Or do I get to say that I used X's text, and X
is going to accept the responsibility for errors?)
 
--Jim
 
Dr. James H. Coombs
Software Engineer, Research
Institute for Research in Information and Scholarship (IRIS)
Brown University
jazbo@brownvm.bitnet
=========================================================================
Date:         6 December 1987, 17:24:17 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
Subject:      ACL Applied Natural Language Conference (833 lines)
 
 
[The following is republished from IRLIST, the Information Retrieval
List. -- ed.]
 
--------------------------------------------------------------------------
The printed version of the following program and registration information
will be mailed to ACL members early in December.  Others are encouraged
to use the attached form or write for a booklet to the following address:
Dr. D.E. Walker (ACL), 445 South Street - MRE 2A379, Morristown, NJ 07960,
USA, or to walker@flash.bellcore.com, specifying "ACL Applied" on the
subject line.
 
                             ASSOCIATION
                                 FOR
                      COMPUTATIONAL LINGUISTICS
 
       SECOND CONFERENCE ON APPLIED NATURAL LANGUAGE PROCESSING
 
                         9 - 12 February 1988
 
          Austin Marriott at the Capitol, Austin, Texas, USA
Tutorials: Joe C. Thompson Conference Center, University of Texas at Austin
 
                           ADVANCE PROGRAM
 
                              Features:
               Six introductory and advanced tutorials
            Three days of papers on the  state-of-the-art
                   Distinguished luncheon speakers
                     A panel of industry leaders
                     Exhibits and demonstrations
 
 
REGISTRATION : 7:30am - 3:00pm, Tuesday, 9 February,
  Joe C. Thompson Conference Center, University of Texas at Austin, 26th
  and Red River.
                7:00pm - 9:00pm, Tuesday, 9 February
                8:00am - 5:00pm, Wednesday, 10 February
                8:00am - 5:00pm, Thursday, 11 February
                8:00am - 12:00n, Friday, 12 February
  Austin Marriott at the Capitol, 701 East 11th Street
 
 
EXHIBITS :  10:00am - 6:00pm, Wednesday, 10 February
            10:00am - 6:00pm, Thursday, 11 February
             9:00am - 12:00n, Friday, 12 February
  Austin Marriott at the Capitol
 
 
TUTORIALS: TUESDAY, FEBRUARY 9, 1988
  Joe C. Thompson Conference Center, University of Texas at Austin, 26th
  and Red River.
 
8:30 12:30 INTRODUCTION TO NATURAL LANGUAGE PROCESSING
          James Allen, University of Rochester
 
8:30 12:30 MACHINE-READABLE DICTIONARIES: A COMPUTATIONAL LINGUISTICS
          PERSPECTIVE
          Bran Boguraev, Cambridge University, and
          Beth Levin, Northwestern University
 
8:30 12:30 SPOKEN LANGUAGE SYSTEMS: PAST, PRESENT, AND FUTURE
          Salim Roucos, BBN Laboratories, Inc.
 
1:30 5:30 THE TECHNOLOGY OF NATURAL LANGUAGE INTERFACES
          Carole Hafner, Northeastern University
 
1:30 5:30 THE ROLE OF LOGIC IN REPRESENTING MEANING AND KNOWLEDGE
          Bob Moore, SRI International
 
1:30 5:30 MACHINE TRANSLATION
          Sergei Nirenburg, Carnegie Mellon University
 
 
RECEPTION: 7:00pm - 9:00pm, Tuesday, 9 February
  Austin Marriott at the Capitol, 701 East 11th Street
 
 
                           GENERAL SESSIONS
        WEDNESDAY, FEBRUARY 10, 1988
 
9:00 9:15       OPENING REMARKS AND ANNOUNCEMENTS
                Norman Sondheimer, General Chair (USC/Information Sciences
                        Institute)
                Bruce Ballard, Program Chair (AT&T Bell Laboratories)
                Jonathan Slocum, Local Arrangements Chair (MCC)
                Donald E. Walker, ACL Secretary-Treasurer (Bell Communications
                        Research)
 
        SESSION 1: SYSTEMS
 
9:15 9:40       The Multimedia Articulation of Answers in a Natural Language
                Query System
                Susan E. Brennan (Hewlett Packard)
 
9:40 10:05      A News Story Categorization System
                Philip J. Hayes, Laura E. Knecht and Monica J. Cellio
                (Carnegie Group)
 
10:05 10:30     An Architecture for Anaphora Resolution
                Elaine Rich and Susann Luper-Foy (MCC)
 
        SESSION 2: GENERATION
 
11:00 11:25     The SEMSYN Generation System: Ingredients, Applications,
                Prospects
                Dietmar Roesner (Universitaet Stuttgart)
 
11:25 11:50     Two Simple Prediction Algorithms to Facilitate Text Production
                Lois Boggess (Mississippi State University)
 
11:50 12:15     From Water to Wine: Generating Natural Language Text from
                Today's Applications Programs
                David D. McDonald (Brattle Research Corporation) and
                Marie M. Meteer (Bolt, Beranek and Newman)
 
12:15 2:00      LUNCHEON
                Guest Speaker: Grant Dove
                Chairman and CEO of MCC.  Prior to joining MCC in July 1987,
                Mr. Dove had been with Texas Instruments for 28 years,
                having served as Executive Vice President since 1982.
 
        SESSION 3: SYNTAX AND SEMANTICS
 
2:00 2:25       Improved Portability and Parsing Through Interactive
                Acquisition of Semantic Information
                Francois-Michel Lang and Lynette Hirschman (Unisys)
 
2:25 2:50       Handling Scope Ambiguities in English
                Sven Hurum (University of Alberta)
 
2:50 3:15       Responding to Semantically Ill-Formed Input
                Ralph Grishman and Ping Peng (New York University)
                  and
                Evaluation of a Parallel Chart Parser
                Ralph Grishman and Mahesh Chitrao (New York University)
 
        SESSION 4: MORPHOLOGY AND THE LEXICON
 
3:45 4:10       Triphone Analysis: A Combined Method for the Correction of
                Orthographical and Typographical Errors
                Koenraad DeSmedt (University of Nijmegen) and
                Brigette van Berkel (TNO Institute of Applied Computer
                  Science)
 
4:10 4:35       Creating and Querying Hierarchical Lexical Databases
                Mary S. Neff, Roy J. Byrd, and Omneya A. Rizk
                (IBM Watson Research Center)
 
4:35 5:00       Cn yur cmputr raed ths?
                Linda G. Means (General Motors)
 
5:00 5:25       Building a Large Thesaurus for Information Retrieval
                Edward A. Fox, J. Terry Nutter (Virginia Tech), Thomas Ahlswede,
                Martha Evens (Illinois Institute of Technology), and
                Judith Markowitz (Navistar International)
 
6:30 ****      RECEPTION
               Microelectronics and Computer Technology Corporation (MCC)
 
 
        THURSDAY, FEBRUARY 11, 1988
 
        SESSION 5: SYSTEMS
 
8:30 8:55       Application-Specific Issues in NLI Development for
                a Diagnostic Expert System
                Karen L. Ryan, Rebecca Root and Duane Olawsky (Honeywell)
 
8:55 9:20       The MULTIVOC Text-to-Speech System
                Olivier Emorine and Pierre Martin (Cap Sogeti Innovation)
 
9:20 9:45       Structure from Anarchy: Meta Level Representation of
                Expert System Predicates for Natural Language Interfaces
                Galina Datskovsky Moerdler (Columbia University)
 
        SESSION 6: TEXT PROCESSING
 
10:15 10:40     Integrating Top-Down and Bottom-Up Strategies in a Text
                Processing System
                Lisa F. Rau and Paul S. Jacobs (General Electric)
 
10:40 11:05     A Stochastic Parts Program and Noun Phrase Parser for
                Unrestricted Text
                Kenneth W. Church (AT&T Bell Laboratories)
 
11:05 11:30     A Tool for Investigating the Synonymy Relation in a Sense
                Disambiguated Thesaurus
                Martin S. Chodorow, Yael Ravin (IBM Watson Research Center)
                and Howard E. Sachar (IBM Data Systems Division)
 
11:30 11:55     Dictionary Text Entries as a Source of Knowledge
                for Syntactic and Other Disambiguations
                Karen Jensen and Jean-Louis Binot (IBM Watson Research Center)
 
12:00 1:45      LUNCHEON
                Guest Speaker: Donald E. Walker
                Manager of Artificial Intelligence and Information Science
                Research at Bell Communications Research, and
                Secretary-Treasurer of ACL and IJCAII.
 
        SESSION 7: MACHINE TRANSLATION
 
1:45 2:10       EUROTRA: Practical Experience with a Multilingual Machine
                Translation System under Development
                Giovanni B. Varile and Peter Lau (Commission of the
                European Communities)
 
2:10 2:35       Valency and MT: Recent Developments in the METAL System
                Rudi Gebruers (Katholieke Universiteit Leuven)
 
3:00 5:00       PANEL: Natural Language Interfaces: Present and Future
                Moderator: Norman Sondheimer (USC/Information Sciences
                        Institute)
                Panelists: Robert J. Bobrow (BBN Laboratories),
                                Developer of RUS
                           Jerrold Ginsparg (Natural Language Inc.),
                                Developer of DataTalker
                           Larry Harris (Artificial Intelligence Corporation),
                                Developer of Intellect
                           Gary G. Hendrix (Symantec), Developer of Q&A
                           Steve Klein (Singular Solutions Engineering),
                                Co-Developer of Lotus HOW
 
5:00 6:00       RECEPTION
                Austin Marriott at the Capitol
 
 
        FRIDAY, FEBRUARY 12, 1988
 
        SESSION 8: SYSTEMS
 
8:30 8:55       Automatically Generating Natural Language Reports
                in an Office Environment
                Jugal Kalita and Sunil Shende (University of Pennsylvania)
 
8:55 9:20       Luke: An Experiment in the Early Integration of Natural
                Language Processing
                David A. Wroblewski and Elaine A. Rich (MCC)
 
9:20 9:45       The Experience of Developing a Large-Scale Natural
                Language Text Processing System: CRITIQUE
                Stephen D. Richardson and Lisa C. Braden-Harder
                (IBM Watson Research Center)
 
        SESSION 9: MORPHOLOGY AND THE LEXICON
 
10:15 10:40     Computational Techniques for Improved Name Search
                Beatrice T. Oshika (Sparta), Bruce Evans (TRW),
                Janet Tom (Systems Development Corporation), and Filip Machi
                (UC Berkeley)
 
10:40 11:05     The TICC: Parsing Interesting Text
                David Allport (University of Sussex)
 
11:05 11:30     Finding Clauses in Unrestricted Text by Stochastic and
                Finitary Methods
                Eva Ejerhed (University of Umea)
 
11:30 11:55     Morphological Processing in the Nabu System
                Jonathan Slocum (MCC)
 
        SESSION 10: SYNTAX AND SEMANTICS
 
1:30 1:55       Localizing Expression of Ambiguity
                John Bear and Jerry R. Hobbs (SRI International)
 
1:55 2:20       Combinatorial Disambiguation
                Paula S. Newman (IBM Los Angeles Scientific Center)
 
2:20 2:45       Canonical Representation in NLP System Design:
                A Critical Evaluation
                Kent Wittenburg and Jim Barnett (MCC)
 
 
 
 
REGISTRATION INFORMATION AND DIRECTIONS
 
PREREGISTRATION MUST BE RECEIVED BY 25 JANUARY; after that date, please
wait to register at the Conference itself.  Complete the attached
``Application for Registration'' and send it with a check payable to
Association for Computational Linguistics or ACL to Donald E. Walker
(ACL), Bell Communications Research, 445 South Street MRE 2A379,
Morristown, NJ 07960, USA; (201) 829-4312; walker@flash.bellcore.com;
ucbvax!bellcore!walker.  If a registration is cancelled before 25
January, the registration fee, less  $15 for administrative costs, will
be returned.  Full conference registrants will also receive lunch on
the 10th and 11th.   Registration includes one copy of the Proceedings,
available at the Conference.  Copies of the Proceedings at $20 for
members ($30 for nonmembers) may be ordered on the registration form or
by mail prepaid from Walker.
 
TUTORIALS : Attendance is limited.  Preregistration is encouraged
to ensure a place and the availability of syllabus materials.
 
RECEPTIONS : The Microelectronics and Computer Technology Corporation
(MCC) will host a reception for the conference at its site on
Wednesday evening.  To aid in planning we ask that you complete the
RSVP on the registration form.  In addition there will be receptions
at the conference hotel on Tuesday evening and Thursday afternoon.
 
EXHIBITS AND DEMONSTRATIONS : Facilities for exhibits and system
demonstrations will be available.  Persons wishing to arrange an
exhibit or present a demonstration should contact Kent Wittenburg,
MCC, 3500 W. Balcones Center Drive, Austin, TX 78759; (512)338-3626;
wittenburg@mcc.com as soon as possible.
 
HOTEL RESERVATIONS : Reservations at the Austin Marriott at the
Capitol MUST be made using the Hotel Reservation Form included with
this flyer.  Reservations received after 25 January 1988 are subject
to guest room availability.  Please mail to:
        Austin Marriott at the Capitol
        Attn: Reservation Office
        701 East 11th Street
        Austin, Texas 78701
        (512) 478-1111
 
AIR TRANSPORTATION : American Airlines offers conferees a special 35%
off full coach fare, 30% off full Y fares for passengers originating in
Canada, or 5% off any published roundtrip airfare applicable to and
from Austin.  Call toll free 1-800-433-1790 and give the conference's
STAR number S81816.  If you normally use the services of a travel agent,
please have them make your reservations through this number.
 
DIRECTIONS : There is one public exit from Robert Mueller Airport in
Austin; at the traffic light, turn right (onto Manor Rd.) and drive to
Airport Blvd.  (approx. 1/4 - 1/2 mile).  Turn right on Airport Blvd.,
and drive to highway I-35 (approx. 1-2 miles).  Turn left (south) onto
I-35, heading toward town.  Get off at the 11th-12th St. (Capitol)
exit, and drive an extra block on the access road, to 11th St.  The
Marriott is on the SW corner of that intersection (across 11th St., on
the right).  A parking garage is attached.
 
The Marriott at the Capitol operates a free shuttle to and from the
airport.  Cab fare would be approx. $6.
 
The Joe C. Thompson Conference Center parking lot is on the SW corner
of Red River and 26th Street; the entrance is on Red River, and a guard
will point out the center (adjacent, to the west).  Directions to JCT
from Marriott parking garage: Turn right (S) on I-35 frontage road,
turn right (W) on 10th St., turn right (N) on Red River, and drive
[almost] to 26th.
 
 
 
 
 APPLICATION FOR REGISTRATION
 
Association for Computational Linguistics, Second Conference on
Applied Natural Language Processing, 9 - 12 February 1988, Austin, Texas
 
 
NAME  _________________________________________________________________
      Last                             First                        Middle
AFFILIATION (Short form for badge ID)
___________________________________________________________
 
ADDRESS _______________________________________________________________
 
_______________________________________________________________________
 
_______________________________________________________________________
 
_______________________________________________________________________
 
TELEPHONE  ____________________________________________________________
 
COMPUTER NET ADDRESS  _________________________________________________
 
REGISTRATION INFORMATION  (circle fee)
NOTE: Only those whose dues are paid for 1988 can register as members.
 
                        ACL     NON-    FULL-TIME
                        MEMBER* MEMBER* STUDENT*
 
by 25 JANUARY           $170    $205    $85
at the Conference       $220    $255    $110
*Member and Non-Member fees include Wednesday and Thursday luncheons;
Students can purchase luncheon tickets at a reduced rate.
 
LUNCHEON TICKETS FOR STUDENTS:  $10  each; Wednesday _____;
Thursday ________; amount enclosed  $ ______
 
LUNCHEON TICKETS FOR GUESTS:  $15  each; Wednesday _____;
Thursday ________; amount enclosed  $ ______
 
SPECIAL MEALS: VEGETARIAN ______  KOSHER ______
 
EXTRA PROCEEDINGS:  $20  members;  $30  non-members; amount enclosed  $ ______
 
TUTORIAL INFORMATION  (circle fee and check at most two
tutorials)
 
FEE PER TUTORIAL        ACL     NON-    FULL-TIME
                        MEMBER  MEMBER* STUDENT
 
by 25 January           $75     $110    $50
at the Conference       $100    $135    $65
*Non-member tutorial fee includes ACL membership for 1988;
do not pay non-member fee for BOTH registration and tutorials.
 
Morning Tutorials:
 select ONE: INTRODUCTION: Allen  LEXICONS: Boguraev & Levin  SPEECH: Roucos
Afternoon Tutorials:
 select ONE: INTERFACES: Hafner   LOGIC: Moore          TRANSLATION: Nirenburg
 
TOTAL PAYMENT MUST BE INCLUDED :   $ ____________
 
(Registration, Luncheons, Extra Proceedings, Tutorials)
 
 
Make checks payable to  ASSOCIATION FOR COMPUTATIONAL LINGUISTICS  or
ACL.  Credit cards cannot be honored.
 
RSVP for MCC Reception: Please check if you plan to attend the MCC
reception on Wednesday evening, February 10th. _________
 
Send Application for Registration WITH PAYMENT before 25 January to
the address below; AFTER 25 January, wait to register at Conference:
 
        Donald E. Walker (ACL)
        Bell Communications Research
        445 South Street, MRE 2A379
        Morristown, NJ 07960, USA
        (201)829-4312
        walker@flash.bellcore.com
        ucbvax!bellcore!walker
 
 
 
 
 APPLICATION FOR HOTEL RESERVATION
 
Reservations received after 25 January 1988 are subject to guest room
availability.  In the event of unanticipated demand, rooms will be
assigned on a first-come, first-served basis.  Please send in your
reservation request as early as possible.
 
 
NAME  _________________________________________________________________
      Last                             First                        Middle
AFFILIATION
 ___________________________________________________________
 
ADDRESS _______________________________________________________________
 
_______________________________________________________________________
 
_______________________________________________________________________
 
_______________________________________________________________________
 
TELEPHONE  ____________________________________________________________
 
Room Requirements
 
  Single  $64 ________
 
  Double  $74 ________
 
Date and time of arrival _________________________________________
 
Date and time of departure _______________________________________
 
Complete if arrival after 6PM
 
__________________________________________________________________
Credit Card Name                Number          Expiration Date
 
 
 
Send Application for Hotel Reservation to:
        Austin Marriott at the Capitol
        Attn: Reservation Office
        701 East 11th Street
        Austin, Texas 78701
        (512) 478-1111
 
 
 
 
              ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
       SECOND CONFERENCE ON APPLIED NATURAL LANGUAGE PROCESSING
                              TUTORIALS
                           9 February 1988
   Joe C. Thompson Conference Center, University of Texas at Austin
 
                    Morning 8:30 A.M. - 12:30 P.M.
 
 
8:30 12:30 INTRODUCTION TO NATURAL LANGUAGE PROCESSING
          James Allen, University of Rochester
 
ABSTRACT
 
This tutorial will cover the basic concepts underlying the construction
of natural language processing systems.  These include basic parsing
techniques, semantic interpretation and the representation of sentence
meaning, as well as knowledge representation and techniques for
understanding natural language in context.  In particular, the topics
to be addressed in detail will include augmented transition networks
(ATNs), augmented context-free grammars, the representation of lexical
meaning, with particular attention to case-grammar-based representations, and
the interpretation of pronouns and ellipsis.  In addition, there will
be an overview of knowledge representation, including semantic
networks, frame-based systems, and logic, and the use of general world
knowledge in language understanding, including scripts and plans.
 
Given the large range of issues and techniques, an emphasis will be
placed on those aspects relevant to existing practical natural
language systems, such as interfaces to database systems.  The
remaining issues will be more quickly surveyed to give the attendee
an idea of what techniques will become important in the next
generation of natural language systems.  The lecture notes will
include an extensive bibliography of work in each area.
 
INTENDED AUDIENCE
 
This tutorial is aimed at people who are interested in learning the
fundamental techniques and ideas relevant to natural language
processing.  It will be useful to managers who want an overview of
the field, to programmers starting research and development in the
natural language area, and to researchers in related disciplines such
as linguistics who want a survey of the computational approaches to
language.
 
BIOGRAPHICAL SKETCH
 
Dr. James Allen is an Associate Professor and Chairman of the
Computer Science Department at the University of Rochester.  He is
editor of the journal Computational Linguistics and author of the
book Natural Language Understanding, published in 1987.  In 1984, he
received a five-year Presidential Young Investigator award for his
research in Artificial Intelligence.
 
 
 
 
8:30 12:30 MACHINE-READABLE DICTIONARIES: A COMPUTATIONAL LINGUISTICS
          PERSPECTIVE
          Branimir Boguraev, Cambridge University, and
          Beth Levin, Northwestern University
 
 
 
ABSTRACT
 
The lexical information contained explicitly and implicitly in
machine-readable dictionaries (MRDs) can support a wide range of
activities in computational linguistics, both of theoretical interest
and of practical importance.  This tutorial falls into two parts.
The first part will focus on some characteristics of raw lexical data
in electronic sources, which make MRDs particularly relevant to
natural language processing applications.  The second part will
discuss how theoretical linguistic research into the lexicon can
enhance the contribution of MRDs to applied computational
linguistics.
 
The first half will discuss issues concerning the placement of
rich lexical resources on-line; raise questions related to the
suitability, and ultimately the utility, of MRDs for automatic
natural language processing; outline a methodology aimed at
extracting maximally usable subsets of the dictionary with minimal
introduction of errors; and present ways in which specific use can be
made of the lexical data for the construction of practical language
processing systems with substantial coverage.
 
The second half of the tutorial will review current theoretical
linguistic research on the lexicon, emphasizing proposals concerning
the nature of lexical representation and lexical organization.  This
overview will provide the context for an examination of how the
results of this research can be brought to bear on the problem of
extracting syntactic and semantic information encoded in dictionary
entries, but not overtly signaled to the dictionary user.
 
INTENDED AUDIENCE
 
This tutorial presupposes some familiarity with work in both
computational and theoretical linguistics.  It is aimed at
researchers in natural language processing and theoretical linguists
who want to take advantage of the resources available in MRDs for
both applied and theoretical purposes.  The issues of providing
substantial lexical coverage and system transportability are
addressed, thus making this tutorial of particular relevance to those
concerned with the automatic acquisition, on a large scale and in a
flexible format, of phonological, syntactic, and semantic information
for NLP systems.
 
BIOGRAPHICAL SKETCHES
 
Dr. Branimir Boguraev is an SERC (UK Science & Engineering Research
Council) Advanced Research Fellow at the University of Cambridge.  He
has been with the Computer Laboratory since 1975, and completed a
doctoral thesis in natural language processing there in 1979.
Recently he has been involved in the development of computational
tools for natural language processing, funded by grants awarded by
the UK Alvey Programme in Information Technology.
 
Dr. Beth Levin is an Assistant Professor in the Department of
Linguistics, Northwestern University, Evanston, IL.  She was a System
Development Foundation Research Fellow at the MIT Center for Cognitive
Science from 1983-1987 where she assumed major responsibility for
directing the MIT Lexicon Project.  She received her Ph.D. in
Electrical Engineering and Computer Science from MIT in June 1983.
 
 
 
 
8:30 12:30 SPOKEN LANGUAGE SYSTEMS: PAST, PRESENT, AND FUTURE
          Salim Roucos, BBN Laboratories, Inc.
 
ABSTRACT:
 
This tutorial will present the issues in developing spoken language
systems for natural speech communication between a person and a
machine.  In particular, the performance of complex tasks using large
vocabularies and unrestricted sentence structures will be examined.
The first Advanced Research Projects Agency (ARPA) Speech Understanding
Research project during the seventies will be reviewed, and then the
current state-of-the-art in continuous speech recognition and natural
language processing will be described.  Finally, the spoken language
system capabilities expected to be developed during the next two to
three years will be presented.
 
The technical issues that will be covered include acoustic-phonetic
modeling, syntax, semantics, plan recognition and discourse, and the
issues for integrating these knowledge sources for speech understanding.
In addition, computational requirements for real-time understanding
and performance evaluation methodology will be described.  Some of the
human factors of speech understanding in the context of performing
interactive tasks using an integrated interface will also be
discussed.
 
INTENDED AUDIENCE:
 
This tutorial is aimed at technical managers, product developers, and
technical staff interested in learning about spoken language systems
and their potential applications.  No expertise in either speech or
natural language will be assumed in introducing the technical details
in the tutorial.
 
BIOGRAPHICAL SKETCH:
 
Dr. Salim Roucos has worked for seven years at BBN Laboratories in
speech processing such as continuous speech recognition, speaker
recognition, and speech compression.  More recently, he has been the
principal investigator on integrating speech recognition and natural
language understanding for developing a spoken language system. His
areas of interest are statistical pattern recognition and language
modeling.  Dr. Roucos is chairman of the Digital Signal Processing
committee of the IEEE ASSP society.
 
 
 
                   Afternoon 1:30 P.M. - 5:30 P.M.
 
1:30 5:30 THE TECHNOLOGY OF NATURAL LANGUAGE INTERFACES
          Carole D. Hafner, Northeastern University
 
ABSTRACT
 
This tutorial will describe the development of natural language
processing from a research topic into a commercial technology.  This
will include a description of some key research projects of the 1970's
and early 1980's which developed methods for building natural language
query interfaces, initially restricted to just one database, and later
made "transportable" to many different applications.  The further
development of this technology into commercial software products will
be discussed and illustrated by a survey of several current products,
including both micro-computer NL systems and those offered on
higher-performance machines.  The qualities a user should look for in an
NL interface will be considered, both in terms of linguistic
capabilities and general ease of use.  Finally, some of the remaining
"hard problems" that current technology has not yet solved in a
satisfactory way will be discussed.
 
INTENDED AUDIENCE
 
This tutorial is aimed at people who are not well acquainted with
natural language interfaces and who would like to learn about 1) the
capabilities of current systems, and 2) the technology that underlies
these capabilities.
 
 
BIOGRAPHICAL SKETCH
 
Dr. Carole D. Hafner is Associate Professor of Computer Science at
Northeastern University.  After receiving her Ph.D. in Computer and
Communication Sciences from the University of Michigan, she spent
several years as a Staff Scientist at General Motors Research
Laboratories working on the development of a natural language
interface to databases.
 
 
 
 
 
1:30 5:30 THE ROLE OF LOGIC IN REPRESENTING MEANING AND KNOWLEDGE
          Robert C. Moore, SRI International
 
ABSTRACT
 
This tutorial will survey the use of logic to represent the meaning
of utterances and the extra-linguistic knowledge needed to produce
and interpret utterances in natural-language processing systems.
Problems to be discussed in meaning representation include
quantification, propositional attitudes, comparatives, mass terms and
plurals, tense and aspect, and event sentences and adverbials.
Logic-based methods (unification) for systematic specification of the
correspondence between syntax and semantics in natural language
processing systems will also be touched on.  In the discussion of the
representation of extra-linguistic knowledge, special attention will
be devoted to the role played by knowledge of speakers' and hearers'
mental states (particularly their knowledge and beliefs) in the
generation and interpretation of utterances, and to logical formalisms
for representing and reasoning about knowledge of those states.
 
INTENDED AUDIENCE
 
This tutorial is aimed at implementors of natural-language processing
systems and others interested in logical approaches to the problems
of meaning representation and knowledge representation in such
systems.
 
BIOGRAPHICAL SKETCH
 
Dr. Robert C. Moore is a staff scientist in the Artificial
Intelligence Center of SRI International.  Since joining SRI in 1977,
Dr. Moore has carried out research on natural-language processing,
knowledge representation, automatic deduction, and nonmonotonic
reasoning.  In 1986-87 he was the first director of SRI's Computer
Science Research Centre in Cambridge, England.  Dr. Moore received
his PhD from MIT in 1979.
 
 
 
 
 
1:30 5:30 MACHINE TRANSLATION
          Sergei Nirenburg, Carnegie Mellon University
 
ABSTRACT
 
The central problems faced by a Machine Translation (MT) research
project are 1) the design and implementation of automatic natural
language analyzers and generators that manipulate morphological,
syntactic, semantic and pragmatic knowledge; and 2) the design,
acquisition and maintenance of dictionaries and grammars.  Since
building a system that performs fully automated machine translation of
unconstrained text is not a feasible short-term (or even medium-term)
goal, an MT project must carefully constrain its objectives.
 
This tutorial will describe the knowledge and processing requirements
for an MT system.  It will present and analyze the set of design
choices for MT projects including distinguishing features such as
long-term/short-term, academic/commercial, fully/partially automated,
direct/transfer/interlingua, pre-/post-/interactive editing.  The
knowledge acquisition needs of an MT system will be discussed, with an
emphasis on interactive knowledge acquisition tools that facilitate the
task of compiling the various dictionaries for an MT system.  In
addition, expectations, possibilities and prospects
for immediate application of machine translation technology will be
considered.  Finally, a brief survey of MT research and development
work around the world will be presented.
 
INTENDED AUDIENCE
 
This tutorial is aimed at a general audience that could include
both students looking for an application area and testbed for their
ideas in natural language processing and people contemplating
starting an MT or machine-aided translation project.
 
BIOGRAPHICAL SKETCH
 
Dr. Sergei Nirenburg, Research Scientist at the Center for Machine
Translation at Carnegie-Mellon University, holds an M.Sc. in
Computational Linguistics from Kharkov State University, USSR, and a
Ph.D. in Linguistics from the Hebrew University of Jerusalem,
Israel.  He has published in the fields of parsing, generation,
machine translation, knowledge representation and acquisition, and
planning.  Dr. Nirenburg is Editor of the journal Computers and
Translation.
 
 
 
 
       SECOND CONFERENCE ON APPLIED NATURAL LANGUAGE PROCESSING
                         Conference Committee
 
General Chair
Norman Sondheimer, USC/Information Sciences Institute
 
Secretary-Treasurer
Donald E. Walker, Bell Communications Research
 
Program Committee
Bruce Ballard (Chair), AT&T Bell Laboratories
Madeleine Bates, BBN Laboratories
Tim Finin, Unisys
Ralph Grishman, New York University
Carole Hafner, Northeastern University
George Heidorn, IBM Corporation
Paul Martin, SRI International
Graeme Ritchie, University of Edinburgh
Harry Tennant, Texas Instruments
 
Tutorials
Martha Palmer, Unisys
 
Local Arrangements
Jonathan Slocum, MCC (Chair)
Elaine Rich, MCC
 
Exhibits and Demonstrations
Kent Wittenburg, MCC
 
Publicity
Jeffrey Hill and Brenda Nashawaty, Artificial Intelligence Corporation
 
 
------------------------------
=========================================================================
Date:         6 December 1987, 18:22:13 EST
Reply-To:     MCCARTY@UTOREPAS
Sender:       HUMANIST Discussion <HUMANIST@UTORONTO>
From:         MCCARTY@UTOREPAS
 
Contributed by Robert Amsler <amsler@flash.bellcore.com>
Subject: Reply to James H. Coombs `ACH Text markup' message (109 lines)
 
(I'll make this reply public from the start since Nancy Ide already
had to double back and make hers public afterwards. It may,
however, become a suitable topic for a more extended private
discussion between those with an interest in text encoding
standards.)
 
As Nancy already noted, SGML is the most likely model which will be
used for the Humanities Text Standard; however, there was considerable
concern at the meeting by the French delegation about the workshop
endorsing SGML as the official standard to be emulated.  In view of
that, it was deemed essential to avoid specifically saying this in
favor of the broader statement that we'd attempt to be compatible
with applicable existing standards where possible. Specifically,
this also includes character transliteration standards--which are a
considerable part of a humanities text standard's encoding problems.
(I can hardly wait for ISO to adopt an official standard for encoding
Egyptian hieroglyphics in ASCII!)
 
I would also, however, like to make a strong statement that from a
computational perspective there is no need for any one format to be
the only one used. What is needed is that any format be fully
documented and be an information-preserving transformation of the
contents of any approved standard format.  This was captured in the
statement that the standard would be an `interchange' format.
 
This does beg the issue of how the transformation takes place, i.e.,
a program must be written, or already exist, that can process the
`other' format on hardware available to the recipient of the data, but it
is important to note that an SGML-like format may appear very
formidable to users who believe they will have to type in all the
special codes manually--whereas a `keyboarding' format may be just as
faithful in representing the information without undue burden to the
typist. I'm sure you will agree with this, since your excellent CACM
article notes that one of the most overlooked forms of markup is the
use of traditional English punctuation and spacing conventions.
 
Returning to your message's points, your 4th point seems to me to be
exceptionally good and something that we did not explicitly get to in
the Poughkeepsie meeting, i.e.,
 
``4) There should be no attempt at establishing a "closed" tag set. The
 current AAP SGML application allows for definition of new tags, but
 it does not support such definition in a practical way. The
 consequence is that people will use "list items," for example, when
 they should be using "line of poetry." Within these guidelines, it
 can only be healthy to provide a list of tags that people should
 choose from when tagging certain entities.
 
 The point of this is that we cannot predict what textual elements
 wil