3.554 supercomputing the humanities, cont. (92)

Willard McCarty (MCCARTY@vm.epas.utoronto.ca)
Mon, 9 Oct 89 21:44:37 EDT

Humanist Discussion Group, Vol. 3, No. 554. Monday, 9 Oct 1989.


(1) Date: Sat, 7 Oct 89 09:44:04 CDT (37 lines)
From: Mark Olsen <mark@gide.uchicago.edu>

(2) Date: Mon, 9 Oct 89 07:28:23 EDT (14 lines)
From: David.A.Bantz@mac.dartmouth.edu
Subject: Re: 3.550 supercomputing the humanities, cont. (47)

(3) Date: Mon, 9 Oct 89 12:20:00 EDT (89 lines)
From: "Vicky A. Walsh" <IMD7VAW@OAC.UCLA.EDU>
Subject: Re: 3.543 supercomputing the humanities, cont. (64)

(1) --------------------------------------------------------------------
Date: Sat, 7 Oct 89 09:44:04 CDT
From: Mark Olsen <mark@gide.uchicago.edu>

There are few applications in the humanities that require the
cost and power of super-computing facilities. These machines
are, in my opinion, best used for very complex calculations
representing physical processes. Most humanities applications
can be run very adequately on surprisingly small systems.

Robert Amsler suggests that the following tasks might warrant
super-computer applications:

<<Suppose you wanted to output every collocation in a text
whose frequency as a collocation was at least one quarter of the
frequency of the least frequent isolated word in the collocation.

Or, suppose you wanted to find the average distance in words between
all reoccurrences of words in a text? (That is, in the last sentence
the words `words' and `in' reoccur at distances of 5 and 7 from their
previous occurrences).>>

I work on collocations of common words, such as 'femme', in the ARTFL
database (120 million words), using a Sun 3/50 workstation. The important
thing is not processing power, but sophisticated software and indices
of the full texts. Several such systems exist, including PAT and
ARTFL's search engine, which would allow a user to perform both types
of analysis proposed by Amsler on workstation-class machines in reasonable
amounts of time. We are currently running the production version of ARTFL
software -- PhiloLogic -- on a Sun 4/110 with 1.2 gigabytes of magnetic
disk. This allows high-speed searching, including collocation searches,
of the ARTFL database. The Sun's performance does not degrade even with
half a dozen users performing very large searches on the database at the
same time. A Sun in that configuration costs well under $20K, including
disk. We have found that workstations and small minicomputers are
sufficiently powerful for very large full-text applications.
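
As a rough illustration of why an index of the full text makes Amsler's
second task cheap, here is a minimal Python sketch. It is not how PAT or
PhiloLogic work internally; the function names, the tokenizing regular
expression, and the in-memory dictionary are all assumptions made for the
example. Once each word is mapped to its list of positions, the
reoccurrence distances are just differences between neighbouring positions.

```python
from collections import defaultdict
import re


def build_positional_index(text):
    """Map each lowercased word to the list of its token positions.

    Tokenizing on runs of letters and apostrophes is an assumption of
    this sketch, not a rule taken from any real system.
    """
    index = defaultdict(list)
    for position, word in enumerate(re.findall(r"[a-z']+", text.lower())):
        index[word].append(position)
    return index


def reoccurrence_distances(index):
    """For every word that occurs more than once, list the gaps (in words)
    between each occurrence and the previous one."""
    return {
        word: [later - earlier for earlier, later in zip(positions, positions[1:])]
        for word, positions in index.items()
        if len(positions) > 1
    }


def average_reoccurrence_distance(index):
    """Average of all gaps over all repeated words, or None if no word repeats."""
    gaps = [gap for word_gaps in reoccurrence_distances(index).values()
            for gap in word_gaps]
    return sum(gaps) / len(gaps) if gaps else None


# Amsler's own example sentence.
sentence = ("Or, suppose you wanted to find the average distance in words "
            "between all reoccurrences of words in a text?")
index = build_positional_index(sentence)
distances = reoccurrence_distances(index)
print(distances)                             # {'in': [7], 'words': [5]}
print(average_reoccurrence_distance(index))  # 6.0
```

The work grows linearly with the length of the text, which is consistent
with the claim that a good index matters more than raw processing power. A
real system would keep the index on disk rather than in memory.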

Mark



(2) --------------------------------------------------------------25----
Date: Mon, 9 Oct 89 07:28:23 EDT
From: David.A.Bantz@mac.dartmouth.edu
Subject: Re: 3.550 supercomputing the humanities, cont. (47)

<<Here is a text calculation that might consume a lot of cycles.

Suppose you wanted to output every collocation in a text
whose frequency as a collocation was at least one quarter of the
frequency of the least frequent isolated word in the collocation.>>

This task is almost certainly bound by the input of large quantities of (text)
material, rather than computation per se. If it isn't very important to get
the answer in seconds rather than hours, it might even be more efficient
overall to do the calculation on a PC.
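
To make the point about input-bound work concrete, here is a minimal
Python sketch of the collocation task. It assumes that a "collocation"
means two adjacent words, which is only one possible reading of Amsler's
description; the function names and the `min_pair_count` parameter are
hypothetical additions for the example. The text is read once, and the
only computation per word is two counter updates.

```python
from collections import Counter
import re


def count_words_and_pairs(lines):
    """Single pass over the text: count each word and each adjacent pair.

    Pairs are allowed to span line breaks (an assumption of this sketch).
    """
    word_counts = Counter()
    pair_counts = Counter()
    previous = None
    for line in lines:
        for word in re.findall(r"[a-z']+", line.lower()):
            word_counts[word] += 1
            if previous is not None:
                pair_counts[(previous, word)] += 1
            previous = word
    return word_counts, pair_counts


def strong_collocations(word_counts, pair_counts, min_pair_count=1):
    """Keep pairs whose frequency is at least one quarter of the frequency
    of the rarer of their two words. Integer arithmetic avoids rounding."""
    selected = []
    for (first, second), pair_count in pair_counts.items():
        rarer = min(word_counts[first], word_counts[second])
        if pair_count >= min_pair_count and 4 * pair_count >= rarer:
            selected.append(((first, second), pair_count))
    return sorted(selected)


# Tiny artificial text: 'x y' occurs once, while x and y each occur 5 times.
lines = ["x y", "x z y w", "x z y w", "x z y w", "x z y w"]
word_counts, pair_counts = count_words_and_pairs(lines)
print(strong_collocations(word_counts, pair_counts))
# [(('w', 'x'), 3), (('x', 'z'), 4), (('y', 'w'), 4), (('z', 'y'), 4)]
```

Any pair containing a word that occurs only once passes the one-quarter
test trivially, so raising `min_pair_count` to 2 or more may give more
useful output on real texts.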

(3) --------------------------------------------------------------95----
Date: Mon, 9 Oct 89 12:20:00 EDT
From: "Vicky A. Walsh" <IMD7VAW@OAC.UCLA.EDU>
Subject: Re: 3.543 supercomputing the humanities, cont. (64)

I can't believe how nervous some of you are about supercomputing. It is only
a bigger, faster box. The University of Minnesota has a concordance program
that runs on the Cray, and e-texts can easily be transported there as needed.
Would you really rather wait 45 minutes for TLG output, as Bob Kraft recently
mentioned, or would you prefer truly interactive access? There are database
programs now available on the Cray, and probably a lot more soon. I know that
if you just ask the center consultants what applications are available, you
are not likely to get a satisfactory answer; they rarely hire people who have
any knowledge of text-based applications. But you shouldn't give up: ask them
to contact Cray or other universities to see what's being done. I know
there are no really easy-to-use solutions there yet, but there won't be
if we don't keep asking.

Vicky Walsh, UCLA