18.541 the plain text story

From: Humanist Discussion Group (by way of Willard McCarty <willard.mccarty_at_kcl.ac.uk>)
Date: Sat, 29 Jan 2005 10:07:02 +0000

               Humanist Discussion Group, Vol. 18, No. 541.
       Centre for Computing in the Humanities, King's College London
                   www.kcl.ac.uk/humanities/cch/humanist/
                        www.princeton.edu/humanist/
                     Submit to: humanist_at_princeton.edu

   [1] From: Norman Gray <norman_at_astro.gla.ac.uk> (41)
         Subject: Re: 18.535 why do we get "=20"?

   [2] From: Norman Hinton <hinton_at_springnet1.com> (7)
         Subject: Re: 18.536 plain text

   [3] From: "Eric H." <eric.homich_at_utoronto.ca> (14)
         Subject: re: plain text and LaTeX

   [4] From: Michael Hart <hart_at_pglaf.org> (51)
         Subject: Re: 18.536 plain text

--[1]------------------------------------------------------------------
         Date: Sat, 29 Jan 2005 09:56:02 +0000
         From: Norman Gray <norman_at_astro.gla.ac.uk>
         Subject: Re: 18.535 why do we get "=20"?

Greetings,

> Date: Fri, 28 Jan 2005 09:37:14 +0000
> From: Norman Hinton <hinton_at_springnet1.com>
>
> Why do we get messages (like the one this responds to) in which every line
> ends with "=20"? I never read them -- it's too intrusive.

Was this intentionally sly wit? These sequences attest that plain text is
more complicated than one might expect.

The sequences indicate that the original message was sent in MIME's
`quoted-printable' encoding. MIME is the way that more structured email
messages are sent: when you send an attachment, it's MIME that specifies
how that extra file is encoded within the fundamentally ASCII medium of a
mail message.

If there are any non-ASCII characters in a message you send -- most
commonly accented characters -- then your mail client may decide to send
them verbatim, with a warning in the mail headers that it has done so; or
it may encode them into pure-ASCII, again with a suitable indication in the
headers. The latter is probably unnecessary these days, but commendably
conservative. Quoted-printable encoding is a fairly light encoding, which
replaces all `problematic' characters with '=' plus a two-digit (hex)
number. Line-ending spaces count as problematic in this context, so are
replaced by =20, since the space character is number 20 (hex) in
ASCII. When this is finally displayed by your mailer, it sees the header
indicating the encoding, undoes it, and displays what was originally
injected into the system.
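
To make the mechanism concrete, here is a minimal sketch using the `quopri`
module from Python's standard library. The sample text is invented for
illustration (it is not the message under discussion), and Latin-1 is assumed
as the character set. It shows an accented character and a line-ending space
being replaced by '=' plus two hex digits, and the decoding step undoing that.

```python
import quopri

# Hypothetical sample: one accented character and a space at the end of a line.
original = "caf\u00e9 au lait \nsecond line\n".encode("latin-1")

# Encode as quoted-printable: problematic bytes become '=' plus two hex digits.
encoded = quopri.encodestring(original)

# A mailer that honours the encoding header performs this step before display.
decoded = quopri.decodestring(encoded)

print(encoded)               # the pure-ASCII form that travels through the mail system
print(decoded == original)   # decoding restores the bytes that were sent

# The space character is number 20 (hex) in ASCII, hence "=20".
print(chr(int("20", 16)) == " ")
```

If the decoding step is skipped, the reader sees the `encoded` form, with
"=20" wherever a line ended in a space.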

What happened with the message Norman refers to is that, somewhere, a
program ignored the extra descriptive headers when it shouldn't have, or
failed to add them when it should have. I suspect that some mail clients
send quoted-printable mail when they don't have to, or send it without the
correct headers.
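
The role of the headers can be sketched with Python's standard `email`
package. Both messages below are made up for illustration, and the helper
name `displayed_text` is hypothetical. The bodies are identical; only the
Content-Transfer-Encoding header differs. This is one plausible way the
fault could arise, not a diagnosis of the actual message.

```python
from email import message_from_string

# The same quoted-printable body, in which each line ended with a space.
body = "Every line ends with a space=20\nlike this one=20\n"

# A message that correctly announces its encoding.
with_header = (
    "Content-Type: text/plain; charset=us-ascii\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n" + body
)

# The same body with the descriptive header lost somewhere along the way.
without_header = "Content-Type: text/plain; charset=us-ascii\n\n" + body


def displayed_text(raw):
    """Return the body as a header-respecting mailer would display it."""
    msg = message_from_string(raw)
    # decode=True undoes the transfer encoding named in the header, if any.
    return msg.get_payload(decode=True).decode("ascii")


print(repr(displayed_text(with_header)))     # encoding undone before display
print(repr(displayed_text(without_header)))  # "=20" leaks through to the reader
```

With the header present, the encoding is undone and stays invisible; without
it, the program has no way of knowing that "=20" is anything but literal text.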

Thus this still, I think, counts as `plain text' in the sense that there is
no _markup_ enhancing the text. The encoding we're seeing is an accidental
part of the message rather than substantial, since it's a detail of the
mechanism which, through an error, has become visible.
Possibly fancifully, it's analogous to a book with signatures bound out of
order, or uncut.

Norman

--
----------------------------------------------------------------------
Norman Gray  :  Physics & Astronomy, Glasgow University, UK
http://www.astro.gla.ac.uk/users/norman/  :  www.starlink.ac.uk
--[2]------------------------------------------------------------------
         Date: Sat, 29 Jan 2005 09:56:40 +0000
         From: Norman Hinton <hinton_at_springnet1.com>
         Subject: Re: 18.536 plain text

The "death of plain text" is like "the death of the novel" or "the death of
theater" or "the death of scholarly meetings" -- it just doesn't happen.
And listening to the forecasts gets very tiresome after a decade or two. (I
recall the idea being bruited about at CHum meetings as long ago as the
late 70s or early 80s, with almost precisely the same declarations made.)

If anything replaces plain text, Unicode is the best bet, it seems to me --
and there are no signs it will happen anytime soon, if ever.

--[3]------------------------------------------------------------------
         Date: Sat, 29 Jan 2005 09:57:15 +0000
         From: "Eric H." <eric.homich_at_utoronto.ca>
         Subject: re: plain text and LaTeX

Hello all:

Speaking of plain text, I've been wondering whether people in the
humanities community are aware of LaTeX? (More at
http://www.latex-project.org/ ) LaTeX is a document preparation system that
is free, and available for Windows, Mac, and Unix (I'm not sure about
Linux). It is not, however, WYSIWYG, but instead uses markup commands. Has
anyone used LaTeX? If so, I would be interested in your experiences with
it, as well as experiences and opinions on WYSIWYG vs. markup-based text
editors.

Thanks,
Eric Homich
PhD student, Faculty of Information Systems
University of Toronto
eric.homich_at_utoronto.ca

--[4]------------------------------------------------------------------
         Date: Sat, 29 Jan 2005 09:58:11 +0000
         From: Michael Hart <hart_at_pglaf.org>
         Subject: Re: 18.536 plain text

Interesting that those contrary to plain text resorted to the
"reductio ad absurdum fallacy" and "assassination by wit,"
rather than actually addressing the differences at hand in any
kind of rational manner.

BTW, that particular argument was already part of our history
before that meeting.

Obviously, when I started Project Gutenberg in 1971, we were
using 6-bit ASCII, CAPS only, limited punctuation, symbols, etc.
However, when 7-bit ASCII became available, we quickly remade
the ALL CAPS files, which was labor intensive, but since there
weren't that many files, and since they weren't that long in
those days, it was certainly feasible.

Equally, when 8-bit ASCII came along, Project Gutenberg started
to make files available in both 7-bit and 8-bit ASCII, so those
who could only read-write 7-bit characters could still do eBook
volunteer work, and receive the results as well, and also, the
8-bit readers-writers could work with a greater number of ASCII
accented characters, symbols, etc., including those for French,
Greek, etc.

However, the movement beyond 8-bit ASCII has, as mentioned, not
been as smoothly implemented, and various markup schemes tried to
take over the world, each in their own ways:  each with its own
claims to success. . .but the overall identity of post-8-bit is
obviously still in serious doubt, other than HTML.

While Project Gutenberg is presenting eBooks in a wider variety
of formats than is generally available elsewhere, we are in the
process of making eBooks in a format that allows us to deliver,
on demand, eBooks translated into nearly every non-proprietary,
freely readable format.

However, this will take a while, and there is little guarantee a
file in any of these formats will be all that useful, 3 decades
down the road, even though our original 6-bit ASCII files could
still be read, in CAPS, on most of today's computers. . .with a
possible exception of HTML, which might last a long time.

At this time, Project Gutenberg still receives enough emails of
thank you notes for plain text, from those who use audioreaders
and those who are visually impaired, to those who simply are as
grateful as can be for eBooks that they can read, search, quote
and generally use as they please in virtually any program their
hearts and minds desire, without being forced into an unlimited
cycle of "upgrades" that often tend to be "downgrades."

It's not that plain text offers EVERYTHING. . .it's just that a
plain text file offers over 99% of what most authors wrote in a
library of books currently freely available for download.

How much more effort is it worth to get from 99% to 99.5% ???

For some, it's worth the moon and the stars, while for others a
plain text eBook provides all they ever wanted.

It all depends on your target audience. . . .

My target audience is "everyman."

Michael S. Hart
Project Gutenberg

Received on Sat Jan 29 2005 - 05:12:08 EST
