9.207 TUSTEP does most of it

Humanist (mccarty@phoenix.Princeton.EDU)
Tue, 3 Oct 1995 21:32:18 -0400 (EDT)

Humanist Discussion Group, Vol. 9, No. 207.
Center for Electronic Texts in the Humanities (Princeton/Rutgers)
http://www.princeton.edu/~mccarty/humanist

[1] From: Wilhelm Ott <zrlot01@zdv.uni-tuebingen.de> (87)
Subject: "Swiss army knife" or toolbox for text processing?

Paul Schaffner, in Humanist Vol. 9, No. 195, lists some features he has
looked for, in vain, in a single text-processing program.

May I draw your attention to a text-processing system that offers
many of these features (caution: I may be biased, since I am
one of its developers):

Of the 6 features Schaffner misses, 3 are fully available and the other 3
at least partially in TUSTEP, the Tuebingen System of Text Processing Programs,
developed at the University of Tuebingen since the mid-1960s (and still
alive: it is available for DOS, VMS and many UNIX versions
by licence agreement, without fee for academic institutions):

o Decent footnote formatting of the kind typical of a critical edition: see,
for example, Gabler's edition of James Joyce's "Ulysses" (New York:
Garland 1984) or of "Dubliners" (1993), both prepared and
typeset using TUSTEP (cf. Gabler's afterword, pp. 1906 ff. of "Ulysses").
Up to 9 apparatuses are allowed at the bottom of the page, in addition to
"ordinary" footnotes.

o Regular-expression search and replace: in a syntax different from that
used by the well-known UNIX tools (and perhaps more flexible than theirs),
TUSTEP offers it for several purposes (e.g., for text modification,
but also for building sort keys). Its analytical power (which goes far
beyond what is shown below) may be demonstrated by a short example
which identifies elision in Latin poetic texts:
- define a character group containing vowels:
>1z aeiouy
- define a second group for inter-word characters (space, punctuation):
>2z ., :;!?'"()-
- Search for a pattern consisting of:
- - a vowel (">1")
- - optionally ("><") the letter "m"
- - an arbitrary number of ("<>") inter-word characters (">2")
- - an optional ("><") letter "h"
- - a further vowel (">1")
and replace the matched characters by converting them to
upper case and enclosing the whole string in square brackets:
xx ->1><m<>>2><h>1-[>+01>+02>=03>+04>+05]-
This converts, e.g., the line
hinc repetit. 'paucorum hominum et mentis bene sanae.'
into
hinc repetit. 'paucor[UM HO]min[UM E]t mentis bene sanae.'

o Ability to handle large files easily, quickly and safely:
the interactive editor (which does not include formatting functions)
allows you to work on text files up to 7 GB (= 7000 MB) in length.
Speed: on an IBM RS 6000/580 under AIX, it takes less than 15 seconds
of CPU time to format a book of 1000 pages (55 characters/line,
38 lines/page) for typesetting (not counting the time taken by the
PostScript driver). It takes less than 2 seconds to scan the same
text for a character string that does not occur in it (full-text search).
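For readers more familiar with the UNIX-style syntax, the elision search from the regular-expression item above can be approximated in Python. This is a sketch, not TUSTEP code, and the name `mark_elisions` is hypothetical. It assumes that the inter-word group must match at least one character (Python's `+`), since the sample output leaves word-internal vowel pairs such as "au" in "paucorum" untouched.

```python
import re

VOWELS = "aeiouy"              # corresponds to character group >1
INTER_WORD = " .,:;!?'\"()-"   # corresponds to group >2: space and punctuation

# Assumption: at least one inter-word character separates the two vowels.
ELISION = re.compile(
    "([" + VOWELS + "])"                     # a vowel
    + "(m?)"                                 # optionally the letter m
    + "([" + re.escape(INTER_WORD) + "]+)"   # one or more inter-word characters
    + "(h?)"                                 # optionally the letter h
    + "([" + VOWELS + "])"                   # a further vowel
)

def mark_elisions(line):
    """Upper-case each elided sequence and wrap it in square brackets."""
    def repl(m):
        vowel1, m_letter, gap, h_letter, vowel2 = m.groups()
        # the inter-word characters (group 3) are copied unchanged
        return "[" + (vowel1 + m_letter).upper() + gap + (h_letter + vowel2).upper() + "]"
    return ELISION.sub(repl, line)

print(mark_elisions("hinc repetit. 'paucorum hominum et mentis bene sanae.'"))
# hinc repetit. 'paucor[UM HO]min[UM E]t mentis bene sanae.'
```

Each captured group plays the role of one numbered component in the TUSTEP replacement string, which is why the pattern captures even the optional letters.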
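The figures in the large-file item above translate into rough throughput rates. The arithmetic below uses only the numbers given there; since the quoted times are upper bounds ("less than"), the computed rates are lower bounds.

```python
CHARS_PER_LINE = 55
LINES_PER_PAGE = 38
PAGES = 1000

book_chars = PAGES * LINES_PER_PAGE * CHARS_PER_LINE   # 2,090,000 characters

FORMAT_SECONDS = 15   # "less than 15 seconds" of CPU time for formatting
SEARCH_SECONDS = 2    # "less than 2 seconds" for the full-text scan

format_rate = book_chars / FORMAT_SECONDS   # lower bound, characters per second
search_rate = book_chars / SEARCH_SECONDS   # lower bound, characters per second

print(f"{book_chars:,} chars; formatting > {format_rate:,.0f} chars/s; "
      f"search > {search_rate:,.0f} chars/s")
# 2,090,000 chars; formatting > 139,333 chars/s; search > 1,045,000 chars/s
```

With one byte per character (TUSTEP uses 8-bit characters), the sample book is about 2 MB of text, so the search covers more than a megabyte of text per second.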

TUSTEP also includes, at least partially:

o SGML support: since TUSTEP is open to (almost) any encoding
convention, it has also been used successfully for
working on texts with SGML markup (including formatting and
typesetting them) and for generating SGML markup from other
encoding schemes. However, it is up to the user to specify
the tags used.

o 16-bit characters for multilingual computing:
TUSTEP uses 8-bit characters, but offers easy-to-use and efficient
multilingual computing (which means not only printing)
through shift sequences for coding non-Latin
characters (e.g., #g+ for Greek, #h+ for Hebrew, #a+
for Arabic). Quotations in Hebrew, Arabic or Syriac
may be written in "reading sequence"; when formatting
the text, TUSTEP produces the correct "writing" sequence,
from right to left, even across line breaks.

o Complete customizability of sort sequences is guaranteed:
regular-expression search and replace operations are only
one of the features available for building sort keys.
The complexity of the rules is limited only
by the working storage available to the program.
Regarding keyboard, printer drivers and file naming conventions,
TUSTEP shows sufficient flexibility for many purposes.
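The right-to-left reordering described in the multilingual item above can be illustrated with a small Python sketch. It is not TUSTEP's algorithm. It assumes a run consisting entirely of right-to-left text given in reading sequence, breaks that run into lines first and only then reverses each line; this is also the order of operations the Unicode Bidirectional Algorithm prescribes. Latin letters stand in for Hebrew characters, and `layout_rtl` is a hypothetical name.

```python
def layout_rtl(words, width):
    """Lay out an all-RTL run, given in reading sequence, as display lines."""
    lines, current = [], []
    for word in words:
        # Step 1: break lines in reading sequence, greedily by width
        if current and len(" ".join(current + [word])) > width:
            lines.append(current)
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(current)
    # Step 2: reverse each line separately and align it to the right margin,
    # so the quotation starts at the right edge of the first line
    return [" ".join(line)[::-1].rjust(width) for line in lines]

for row in layout_rtl(["abc", "de", "fgh"], 6):
    print(row)
# ed cba
#    hgf
```

Reversing the whole quotation before breaking it ("hgf ed cba") would instead put its last words on the first line, which is why the per-line order matters once a quotation spans a line break.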
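The sort-sequence item above mentions regular-expression replacement as one way of building sort keys. A minimal Python sketch of that idea follows. The rule list (ignoring a leading English article, expanding German umlauts and ß, dropping punctuation) is a hypothetical example, not a TUSTEP default, and `sort_key` is an illustrative name.

```python
import re

# Hypothetical rule set, applied in order to the lower-cased entry
SORT_RULES = [
    (re.compile(r"^(the|an|a)\s+"), ""),   # ignore a leading English article
    (re.compile("ä"), "ae"),
    (re.compile("ö"), "oe"),
    (re.compile("ü"), "ue"),
    (re.compile("ß"), "ss"),
    (re.compile(r"[^\w ]"), ""),           # drop punctuation
]

def sort_key(entry):
    """Build a sort key by applying each replacement rule in turn."""
    key = entry.lower()
    for pattern, replacement in SORT_RULES:
        key = pattern.sub(replacement, key)
    return key

entries = ["Über", "The Ulysses", "Tübingen", "Ott", "Dubliners"]
print(sorted(entries, key=sort_key))
# ['Dubliners', 'Ott', 'Tübingen', 'Über', 'The Ulysses']
```

With these rules "Tübingen" and "Tuebingen" receive the same key. Because the rules form an ordered list, adding or reordering them changes the collation without touching the sorting code itself.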

I agree that information on this package is not available in every
computer store or magazine; and the need for brevity, and for listing
each program only once, perhaps makes it hard to find even in a
guidebook like Ian Lancashire's "The Humanities Computing Yearbook
1989-1990", where it is listed under "18.3 Collation, Stemmatics and
Textual Editing".

I further agree that it is not a "Swiss army knife" type of software,
but a toolbox for professionals who know how to analyse their text-processing
needs, how to break a complex task down into more basic functions,
and how to select the appropriate tool for each step.

Apologies for the length of this contribution.

Wilhelm Ott

----------------------------------------------------------------------
Prof. Dr. Wilhelm Ott phone: +49-7071-292933
Universitaet Tuebingen fax: +49-7071-296958
Zentrum fuer Datenverarbeitung e-mail: ott@zdv.uni-tuebingen.de
Brunnenstrasse 27
D-72074 Tuebingen