9.756 e-texts and scholarship

Humanist (mccarty@phoenix.Princeton.EDU)
Sun, 28 Apr 1996 18:27:02 -0400 (EDT)

Humanist Discussion Group, Vol. 9, No. 756.
Center for Electronic Texts in the Humanities (Princeton/Rutgers)
Information at http://www.princeton.edu/~mccarty/humanist/

[1] From: "Gregory J. Murphy" <rejek@phoenix.Princeton.EDU> (98)
Subject: Re: 9.751 e-texts and scholarship

Dear Fellow Humanists,

The recent thread on e-texts and scholarship repeats a dozen
similar conversations held over the past decade on this list and others.
Though the nature of my work at CETH requires me to deal with marked-up
texts and with issues of mark-up on a regular basis, I am by no means a
partisan. I have no philosophical notions of its purpose or merits, and I
have a very sober appreciation of its practical limitations.

Nonetheless, I feel compelled to comment on some of the
statements made in the context of this thread.

First of all, it seems to me that much of the disagreement over
the value of markup is predicated on a false distinction. Markup's
detractors speak of "plain," "average," or "simple" text, electronic
mirrors of their print counterparts (or, more exactly, of the letters,
spaces and punctuation contained therein). But as Steve DeRose
pointed out in a recent posting, even this represents a level, if not of
markup, at least of encoding. You are able to download a text from
Project Gutenberg and read it as is only because a long, long time ago
there was some agreement made about which sequence of bits would be used
to represent which characters. This point seems silly of course until
you contemplate the current difficulties one experiences when trying to
download a text encoded, say, in ISO 8859-1: unless I tell my mail program
to decode bits according to ISO 8859-1, my mail from France looks perfectly
horrid.
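
A minimal sketch of this point in Python, assuming an 8-bit Latin-1 style
encoding (ISO 8859-1 is chosen here for illustration) and a made-up sample
phrase. The same bytes are read once under the sender's convention and once
under a 7-bit ASCII assumption:

```python
# The sample phrase is a made-up illustration, not taken from any real message.
message = "élève à Orléans"

# The sender turns characters into bytes using ISO 8859-1 (Latin-1).
wire_bytes = message.encode("iso-8859-1")

# A reader that assumes the same encoding recovers the text exactly.
decoded_ok = wire_bytes.decode("iso-8859-1")

# A reader that assumes 7-bit ASCII cannot interpret bytes above 127;
# errors="replace" substitutes U+FFFD for each such byte.
decoded_bad = wire_bytes.decode("ascii", errors="replace")

print(decoded_ok)
print(decoded_bad)
```

Nothing in the bytes themselves says which reading is right; only the prior
agreement does.
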
If we have agreed de facto to standards for encoding at one level
of abstraction, does it require such a leap of faith to desire them at
higher levels?
Most of Gutenberg's pundits who have posted to this group mention
that they enjoy being able to "browse" through a variety of texts, by
which is usually meant a combination of simple word searches followed by
some reading in context before and after the match point. I am perfectly
happy to label this kind of activity an "application," one towards which
the level of encoding - 128 codes, to be exact - found in a Gutenberg
etext is well suited.
More advanced applications require more advanced levels of
encoding. Some of those 128 codes (or 256, or 2^n for an n-bit word of
whatever size meets your needs) must be made to form sequences that
mark certain events. The events are understood by humans to have some
meaning, e.g., this is where chapter one begins. The computer need only
know of the sequence itself, and its syntactic relationship to all other
meaningful sequences.
The codes you use might look like BETA code, or COCOA, or yes,
even SGML. The codes may interfere with the original application -
reading - unless of course you change software from one which
handles only the original application to one which handles the higher
level application, and also, presumably, inherits the properties of those
it replaces. BETA code is very nicely readable in Pandora, for example.
At this point, a quite reasonable objection is frequently made: why
should I have to buy specialized software to deal with encoded text? The
answer, of course, is that you already have. Notepad is a piece of software that
specializes in translating 1-byte words, minus 1 bit, into characters of
the Latin alphabet, and mapping them to a display font.

Given all this, I will risk a few direct comments on statements
made about etexts thus far in the thread.

Richard Tuerk writes:

> [e-texts] make finding quotations extremely easy. If, for example, I'm
> working with _Alice in Wonderland_, and I discover that I've forgotten to
> note the location of a particular quotation, I can find it in a matter of
> seconds using my etext version. Then, all I have to do is use the location
> in the etext to locate it in a more reliable hard copy version.

Without any structural mark-up, how exactly is a computer
application supposed to construct a citation? When you have found that
long sought-after hapax, how do you know "where you are?" If your search
software is aware of logical units like chapters, pages, or verses, the
answer is easy. If it isn't, my God, do you scroll backwards until you
discern a page number? What if the typist decided pages meant nothing in
an electronic environment? Personally, I like computers to save me labor.
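
As a rough sketch of what "aware of logical units" could mean in practice,
the Python below assumes a hypothetical marker line of the form
<chapter n="N"> (not the syntax of BETA code, COCOA, or any particular SGML
application) and a few stand-in lines of sample text. The function name
locate is likewise invented for illustration. Given a phrase, it reports the
chapter and the line within that chapter where the phrase first occurs:

```python
import re

# Hypothetical markup: a line of the form <chapter n="N"> opens chapter N.
CHAPTER_TAG = re.compile(r'<chapter n="(\d+)">')

def locate(text, phrase):
    """Return (chapter, line_in_chapter) for the first line containing
    phrase, or None if the phrase does not occur."""
    chapter = None
    line_in_chapter = 0
    for line in text.splitlines():
        tag = CHAPTER_TAG.fullmatch(line.strip())
        if tag:
            # A marker line starts a new logical unit and resets the count.
            chapter = int(tag.group(1))
            line_in_chapter = 0
            continue
        line_in_chapter += 1
        if phrase in line:
            return chapter, line_in_chapter
    return None

# Stand-in sample text, for illustration only.
sample = """<chapter n="1">
Alice was beginning to get very tired.
She had nothing to do.
<chapter n="2">
Curiouser and curiouser, cried Alice.
"""

print(locate(sample, "Curiouser"))
```

Without the marker lines the same search could still find the phrase, but
it would have nothing to report beyond a raw offset into the file.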

Dennis Cintra Leite writes:

>I, for one, have read several of Project Gutenberg's e-texts. I've copied
>many of them on disquetes and given them away to friends. I don't
>particularly give a d... if they are "unreliable, unattributed,
>unmarked-up texts". I'm not a medieval monk who dedicates his life to
>minutiae. If I want to read Thoreau's Walden or Hardy's Tess of the
>d'Urbervilles for the sheer joy of it (obviously many scholars have lost
>the capacity for doing this) the text needn't be pristine. An
>approximation is more than enough. Thus, to me, friends and other such
>unscholarly riffraff who've savored M.Harts offerings, Project Gutenberg
>is "quite valuable".

Human beings are really great at deciphering approximations, at
being approximate... that's what makes being human so much fun.
Computers aren't so comfortable with the notion. They can approximate
only systematically, which means according to a set of rules, which
means, either hard-coded rules (the way your chip calculates
floating-point numbers, for example) or, yes, you can see it coming,
according to encoded rules. If you don't believe me, take a look at the
topic of random number generation in any elementary book on algorithm
design.
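
The random number case can be sketched in a few lines of Python. This is
one textbook rule, a linear congruential generator, with constants and a
seed chosen only for illustration; the function name lcg is an assumption
of this sketch. The "randomness" is entirely a matter of following the rule,
so the same seed always yields the same sequence:

```python
def lcg(seed, count, a=1664525, c=1013904223, m=2**32):
    """Yield `count` pseudo-random integers from the rule
    state = (a * state + c) mod m. The constants are illustrative."""
    state = seed
    for _ in range(count):
        state = (a * state + c) % m
        yield state

# Two runs from the same seed follow the same rule step by step.
first = list(lcg(seed=1996, count=5))
second = list(lcg(seed=1996, count=5))

print(first == second)
```
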
What if the ASCII encoding scheme became a political issue, people
took sides, and factions arose? Would you be comfortable "approximating"
character mappings?
OK, I'm arguing ad absurdum, but my point is a simple one: to the
extent that electronic texts are for electronic processing, there is no
room for approximation. At all. Period. I'm not a medieval monk either,
which is why I like to have my computer do my work for me.

Flames and other forms of verbal abuse warmly welcomed.

- Gregory Murphy
Text Systems Manager
CETH, the Center for Electronic Texts in the Humanities

NOTE: opinions expressed do not necessarily reflect those of CETH or
Rutgers University.