9.641 on TUSTEP

Humanist (mccarty@phoenix.Princeton.EDU)
Wed, 20 Mar 1996 20:31:52 -0500 (EST)

Humanist Discussion Group, Vol. 9, No. 641.
Center for Electronic Texts in the Humanities (Princeton/Rutgers)
Information at http://www.princeton.edu/~mccarty/humanist/

[1] From: F12016%BARILAN.BitNet@pucc.Princeton.EDU (89)
Subject: commentary on TUSTEP

Dear Prof. Ott,

I'm not quite sure that HUMANIST is the place for this rather
long reaction to your reaction to Bob Kraft's note, but since
other people may also be interested, I decided to post it here.

First of all, I'm not sure that TUSTEP is a practical answer to
the first of Bob Kraft's "requests". He seems to be dealing
there with two extant versions of the same text, of which only
one will remain and the other will be "lost" in the final output
version, with the user being able to interactively decide at any
given variant which one to keep and which one to lose. I am sure
that something like this could be done in TUSTEP -- my sense is
that anything short of sending a man to the moon can be done in
TUSTEP -- but not interactively. One would have to mark up the
texts beforehand and then write and run a program (TUSTEP is
partially an interpreted higher-level computer language) to
create the new output text. This of course defeats the entire
sense of what Bob wants.

Since you end your note with the invitation "To improve TUSTEP,
we need input from users telling us what they are missing there.
You are warmly invited to contribute", this is my first point.
To the best of my knowledge -- I received my tutorial in TUSTEP
in December 1994 -- there is absolutely no interactivity in
TUSTEP at all. Now some level of interactivity would probably be
relatively trivial to create in TUSTEP, but my sense is that the
basic algorithms are deeply geared to a batch mode of operating.
This doesn't bother me that much, but somehow I doubt that any
such set of programs could become popular in this day and age.

Now on to a more important -- for me -- matter. First let me say
that though I do not use TUSTEP myself -- one cannot just "use"
it, one has to immerse oneself in it -- I have spent many dozens
of hours studying TUSTEP-generated comparison texts. Prof.
Margarete Schlueter of the Goethe University in Frankfurt and I
are involved in a research project studying the history of the
textual transmission of Vayyiqra Rabba, an early midrashic text.
We are inputting all manuscripts of selected chapters, running
a comparison program to generate what is often called a partitur
text, i.e. line-under-line, and then using our meagre mental
abilities to try to determine what went on.

The TUSTEP-generated texts are not sufficient for the purposes of
analysis!

(The following is based upon discussions with Dr. Gottfried Reeg
and Mr. Gert Wildensee of the Freie Universitaet of Berlin, the
former in charge of all computer research activities at the
Institute for Jewish Studies and the latter a research assistant
there who has also been working for our project and has many
years' experience with TUSTEP).

TUSTEP-generated partitur texts are based on the TUSTEP
text comparison program which works by comparing all the "other"
texts to one "base" text. This means that when the base text is
lacking -- and there generally is no version of any text which is
not lacking at times -- TUSTEP will not be able to line up the
other texts, but will simply throw them all together just about
helter-skelter until the base text kicks back in and line-up
begins again.
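
The effect can be illustrated with a minimal sketch in Python. This
is not TUSTEP's actual comparison algorithm, which is not reproduced
here; it only assumes a generic word-level diff (the standard
library's difflib) and a hypothetical helper, align_to_base, that
lines each witness up under one base text. Words of a witness that
have no counterpart in the base end up in an unaligned "leftovers"
list, so two witnesses that agree with each other where the base is
silent are never lined up against one another.

```python
from difflib import SequenceMatcher


def align_to_base(base, witness):
    """Line a witness up under a base text, word by word.

    Hypothetical helper for illustration only.
    Returns (slots, leftovers):
      slots[i]  -- the witness word placed under base[i], or None
      leftovers -- list of (base_position, words) for witness words
                   that have no base word to sit under
    """
    slots = [None] * len(base)
    leftovers = []
    matcher = SequenceMatcher(a=base, b=witness, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            for k in range(i2 - i1):
                slots[i1 + k] = witness[j1 + k]
        elif tag == "replace":
            # Pair words one-to-one as far as the base stretches;
            # any surplus witness words cannot be placed.
            n = min(i2 - i1, j2 - j1)
            for k in range(n):
                slots[i1 + k] = witness[j1 + k]
            if j2 - j1 > n:
                leftovers.append((i1 + n, witness[j1 + n:j2]))
        elif tag == "insert":
            # The base is lacking here: nothing to line up against.
            leftovers.append((i1, witness[j1:j2]))
        # tag == "delete": the witness is lacking; slots stay None.
    return slots, leftovers


base = "the king said to him".split()
ms_a = "the king said to him go in peace".split()
ms_b = "the king said to him go forth in peace".split()

slots_a, left_a = align_to_base(base, ms_a)
slots_b, left_b = align_to_base(base, ms_b)
print(left_a)
print(left_b)
```

In this made-up example both manuscripts continue past the end of the
base text, and their shared words ("go", "in", "peace") are simply
collected as two separate leftover runs rather than placed one under
the other.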

First of all this means that the choice of the base text is
unrealistically crucial -- when beginning to study a text you
simply don't know which has the least text missing (I imagine
this is generally decided by going with the one that has the most
bytes).

But much more importantly, this means that the output text can
never be completely line-under-line, and so the purpose of
generating such a text is defeated.

For this reason, we have had to use -- in addition to the
TUSTEP-generated output -- output generated by a program written
many years ago for the Saul Lieberman Institute for Talmudic
Study of the Jewish Theological Seminary of America, a very
unwieldy, difficult-to-use program whose only function is to
generate a partitur output text of any given (up to 15) input
versions of that text.

This program compares all input texts with all other input
texts, and even more importantly, it has an interactive component
which is run after the pattern matching algorithm finishes, so
that any word of any text can be lined up with any other word of
any other text or with null.
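
What "lining up with null" amounts to can be sketched as follows, again
in Python. This is not the Lieberman Institute program, whose internals
are not described here; the data layout (one list of cells per
manuscript, None standing for null) and the function names normalize,
insert_null and render are assumptions made purely for illustration.
The manual step is modelled as inserting a null cell into one
manuscript's row, which pushes its remaining words one column to the
right.

```python
def normalize(rows):
    """Pad all rows to the same width and drop trailing all-null columns."""
    width = max(len(cells) for cells in rows.values())
    for cells in rows.values():
        cells.extend([None] * (width - len(cells)))
    while width and all(cells[width - 1] is None for cells in rows.values()):
        for cells in rows.values():
            cells.pop()
        width -= 1


def insert_null(rows, witness, col):
    """Manual correction: put a null before column `col` of one witness."""
    rows[witness].insert(col, None)
    normalize(rows)


def render(rows, null="-"):
    """Return the line-under-line (partitur) display as a string."""
    names = list(rows)
    width = len(rows[names[0]])
    col_w = [max(len(rows[n][c] or null) for n in names) for c in range(width)]
    lines = []
    for n in names:
        cells = [(rows[n][c] or null).ljust(col_w[c]) for c in range(width)]
        lines.append(f"{n}: " + " ".join(cells).rstrip())
    return "\n".join(lines)


# Two made-up witnesses; B has one word more than A.
rows = {
    "A": ["go", "in", "peace"],
    "B": ["go", "forth", "in", "peace"],
}
normalize(rows)
insert_null(rows, "A", 1)  # line "forth" in B up with null in A
print(render(rows))
```

Because every row is an independent list, no manuscript plays the role
of a base text: any cell of any row can be shifted against any other.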

So, going back to your statement that users are "warmly invited
to contribute", this is what I think TUSTEP needs to be a full
collation program: (1) comparison of all texts to all texts;
(2) interactive ability to line up anything to anything.

As an aside, I did try to run the COLLATE! program of Peter
Robinson on our texts, but it only runs on the Mac, to which I do
not have easy access, and when I did try it, I found that the
relatively large differences between the various manuscript
versions created too many difficulties. (But it is possible I
did not play with it enough).

I had hoped to present a talk on these matters at Bergen this
summer, detailing my experiences with text collation programs but
unfortunately I only wrote to Espen Ore after the deadline for
submissions.

Chaim Milikowsky
Talmud, Bar Ilan