5.0183 Etaoin Shrdlu Lives (5/180)

Elaine Brennan & Allen Renear (EDITORS@BROWNVM.BITNET)
Mon, 24 Jun 91 21:56:13 EDT

Humanist Discussion Group, Vol. 5, No. 0183. Monday, 24 Jun 1991.


(1) Date: Sun, 23 Jun 91 19:53:18 BST (54 lines)
From: Martin Wynne <LNP5MW@CMS1.LEEDS.AC.UK>
Subject: ETAOIN SHRDLU

(2) Date: Sun, 23 Jun 91 15:55 EDT (63 lines)
From: Michel LENOBLE <LENOBLEM@umtlvr.bitnet>
Subject: Letter frequency

(3) Date: Fri, 21 Jun 1991 16:39:54 EDT (35 lines)
From: J_CERNY@UNHH.UNH.EDU
Subject: yet more comments on etaoinshrdlu.

(4) Date: Mon, 24 Jun 91 10:35:59 EDT (16 lines)
From: Lorne Hammond <051796@UOTTAWA>
Subject: etaoin shrdlu

(5) Date: Sun, 23 Jun 91 11:02:51 CST (12 lines)
From: (James Marchand) <marchand@ux1.cso.uiuc.edu>
Subject: etaoin

(1) --------------------------------------------------------------------
Date: Sun, 23 Jun 91 19:53:18 BST
From: Martin Wynne <LNP5MW@CMS1.LEEDS.AC.UK>
Subject: ETAOIN SHRDLU

I've just collated a few statistics that should shed some light on the
debate on the frequency of occurrence of letters in English.
The following list is of letter frequencies in the Lancaster-
Oslo/Bergen corpus of modern British English (approximately
1 million words of 'real' text).


Total number of letters sampled = 4553156

e - 577230
t - 418668
a - 364302
o - 345419
i - 330074
n - 323360
s - 293976
r - 281270
h - 255365
l - 188647
d - 181973
c - 133292
u - 125487
m - 112287
f - 106172
g - 89612
w - 88413
p - 85086
y - 81787
b - 70994
v - 45186
k - 30182
x - 10081
j - 6462
q - 5079
z - 2752
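Turning a count table like the one above into a ranking string and percentages is simple arithmetic. The sketch below hard-codes Wynne's LOB figures; the names `LOB_COUNTS`, `ranking` and `shares` are ours, for illustration only, and it derives the ordering and each letter's share of the 4,553,156 letters sampled.

```python
"""Derive a frequency ranking and relative shares from letter counts."""

# Letter counts for the LOB corpus, as posted by Martin Wynne.
LOB_COUNTS = {
    "e": 577230, "t": 418668, "a": 364302, "o": 345419, "i": 330074,
    "n": 323360, "s": 293976, "r": 281270, "h": 255365, "l": 188647,
    "d": 181973, "c": 133292, "u": 125487, "m": 112287, "f": 106172,
    "g": 89612, "w": 88413, "p": 85086, "y": 81787, "b": 70994,
    "v": 45186, "k": 30182, "x": 10081, "j": 6462, "q": 5079, "z": 2752,
}


def ranking(counts: dict[str, int]) -> str:
    """Return the letters from most to least frequent (ties broken alphabetically)."""
    ordered = sorted(counts, key=lambda letter: (-counts[letter], letter))
    return "".join(ordered).upper()


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each letter's percentage of the total letter count."""
    total = sum(counts.values())
    return {letter: 100.0 * n / total for letter, n in counts.items()}


if __name__ == "__main__":
    print("total:", sum(LOB_COUNTS.values()))
    print("order:", ranking(LOB_COUNTS))
    for letter, pct in sorted(shares(LOB_COUNTS).items(), key=lambda kv: -kv[1]):
        print(f"{letter}  {pct:5.2f}%")
```

Run on the figures above, it reports a total of 4553156 and the order ETAOINSRHLDCUMFGWPYBVKXJQZ.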


Not surprisingly, the results for a dictionary (counting just the
headwords) were very different. I obtained the following list from
the computer-usable version of the Oxford Advanced Learner's
Dictionary:

ENTSRL AIDOPM UKCBGH FYUWZJ XQ
(compare to the above list:
ETAOIN SRHLDC UMFGWP YBVKXJ QZ )
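One plausible reason for the difference (our reading, not a claim Wynne makes) is that running text counts every occurrence of a word, so very common words such as "the" and "and" weigh heavily, whereas a headword list counts each word once. The sketch below, with hypothetical function names, counts letters both ways so the two kinds of ranking can be compared on the same material.

```python
"""Compare letter counts over running text (tokens) and over distinct words (types)."""

import re
from collections import Counter


def words(text: str) -> list[str]:
    """Split text into lowercase words made of the letters a-z."""
    return re.findall(r"[a-z]+", text.lower())


def letter_counts(word_list) -> Counter:
    """Count letters across the given words."""
    counts = Counter()
    for word in word_list:
        counts.update(word)  # adds one per character of the word
    return counts


def token_counts(text: str) -> Counter:
    """Every occurrence of every word contributes, as in a running-text corpus."""
    return letter_counts(words(text))


def type_counts(text: str) -> Counter:
    """Each distinct word contributes once, roughly as in a headword list (assumption)."""
    return letter_counts(set(words(text)))


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    print(token_counts(sample).most_common(5))
    print(type_counts(sample).most_common(5))
```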

I've got the breakdown of the stats for the different text categories
in LOB, and the stats for the dictionary. I'm also going to check
the Brown corpus of American English to see if it yields the same list.
I'd be happy to pass these stats on if anyone's interested.
(2) --------------------------------------------------------------71----
Date: Sun, 23 Jun 91 15:55 EDT
From: Michel LENOBLE <LENOBLEM@umtlvr.bitnet>
Subject: Letter frequency

Letter frequency in English.

It is very strange that a group of people devoting their research
time to "humanistic computing" would indulge in such long, subjective
debates about the frequency of letters in the English language without
trying to come up with a "real bit of evidence" in support of their assertions.
I was pleased that at least one participant had the idea of running a simple
program to count the frequency of the letters in his corpus. With the help
of my colleague Bernard Derval, I can give you a table of letter
frequencies, sorted both alphabetically and by frequency, for the entire,
well-known Brown Corpus.



Alphabetical | By frequency
-------------+--------------
378602 a | 588441 e
72257 b | 435707 t
145711 c | 378602 a
186853 d | 357304 o
588441 e | 342873 i
108816 f | 333890 n
91690 g | 307900 s
255817 h | 288319 r
342873 i | 255817 h
7549 j | 194577 l
30946 k | 186853 d
194577 l | 145711 c
119566 m | 127675 u
333890 n | 119566 m
357304 o | 108816 f
94928 p | 94928 p
5039 q | 91690 g
288319 r | 88639 w
307900 s | 81175 y
435707 t | 72257 b
127675 u | 46948 v
46948 v | 30946 k
88639 w | 9320 x
9320 x | 7549 j
81175 y | 5039 q
4466 z | 4466 z
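Lenoble and Derval do not say what program they used. The sketch below is one hypothetical way to produce a table in the same two-column layout (alphabetical on the left, most frequent first on the right) from any plain text; the function names and the sample string are ours.

```python
"""Print a two-column letter-frequency table: alphabetical and by frequency."""

import string
from collections import Counter


def count_letters(text: str) -> Counter:
    """Count the letters a-z in text, ignoring case and all other characters."""
    return Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)


def two_column_table(counts: Counter) -> list[str]:
    """Format rows as 'count letter | count letter': alphabetical on the left,
    most frequent first on the right (ties broken alphabetically)."""
    alphabetical = [(counts[ch], ch) for ch in string.ascii_lowercase]
    by_frequency = sorted(alphabetical, key=lambda pair: (-pair[0], pair[1]))
    rows = ["Alphabetical   | By frequency", "-" * 15 + "+" + "-" * 15]
    for (n1, c1), (n2, c2) in zip(alphabetical, by_frequency):
        rows.append(f"{n1:>12} {c1} | {n2:>12} {c2}")
    return rows


if __name__ == "__main__":
    # Hypothetical sample; replace it with the text of a real corpus.
    sample = "Etaoin shrdlu: the letters of the first two columns."
    print("\n".join(two_column_table(count_letters(sample))))
```

Letters that never occur still get a row (with a count of 0), so both columns always list all 26 letters, as in the Brown table above.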

--
Michel Lenoble
Comparative Literature
Universite de Montreal
C.P. 6128, Succ. "A"
MONTREAL (Quebec)
Canada - H3C 3J7
E-MAIL: lenoblem@cc.umontreal.ca
--
 
Bernard DERVAL                         main building
Dept I.R.O.                            office S-160-3
Universite de Montreal
C.P. 6128, succ. A                     tel : (514) 343-6111 ext. 3497
Montreal, Quebec                       fax : (514) 343-5834
H3C 3J7                             e-mail : derval@iro.umontreal.ca
(3) --------------------------------------------------------------42----
Date:    Fri, 21 Jun 1991 16:39:54 EDT
From:    J_CERNY@UNHH.UNH.EDU
Subject: yet more comments on etaoinshrdlu.
 
The ongoing commentary on 'etaoinshrdlu' prompted me to dig out some
15-year-old material I'd saved on letter frequencies. I hope
someone else is aware of newer work and will give us some pointers to it.
 
R.L. Solso and J.F. King published "Frequency and Versatility of Letters
in the English Language," Behavior Research Methods and Instrumentation,
1976, vol. 8, no. 3, pp. 283-286. They reported the frequency order as
'ETOAINSRHLDCUMFPGWYBVKXJQZ'. The analysis was based on frequency counts
of about one million words in Kucera and Francis, Computational Analysis
of Present-Day American English, 1967, Brown University Press [which I
have not seen]. Solso and King also published a paper on bigram and trigram
letter frequencies, based on the same source of words.
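Bigram and trigram tables rest on the same kind of counting applied to adjacent letters. A minimal sketch, assuming n-grams are taken only within words; we do not know the exact conventions Solso and King used (for example, for word boundaries or letter positions), and the function name is hypothetical.

```python
"""Count letter n-grams (bigrams, trigrams) within words."""

import re
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count overlapping letter n-grams inside each word; n-grams never span words."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        for i in range(len(word) - n + 1):  # empty range for words shorter than n
            counts[word[i : i + n]] += 1
    return counts


if __name__ == "__main__":
    sample = "the theory of the thing"
    print(ngram_counts(sample, 2).most_common(3))
    print(ngram_counts(sample, 3).most_common(3))
```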
 
Earlier, in the seminal book by Claude Shannon and Warren Weaver, The
Mathematical Theory of Communication, 1969, University of Illinois Press,
letter probabilities are based on frequencies given in the book F. Pratt,
Secret and Urgent, 1939, Blue Ribbon Books.
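One standard use of single-letter probabilities in information theory is the first-order entropy, H = -sum(p * log2 p), the average information per letter if letters were chosen independently with those probabilities. The sketch below is our own illustration, not a reconstruction of Shannon's or Pratt's computation; it accepts any count table, such as the LOB or Brown figures posted in this issue.

```python
"""First-order letter entropy, in bits, from a table of letter counts."""

import math


def entropy_bits(counts: dict[str, int]) -> float:
    """Return -sum(p * log2 p) over letters with nonzero counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values() if n > 0)


if __name__ == "__main__":
    uniform = {chr(ord("a") + i): 1 for i in range(26)}
    # Equally likely letters give the maximum, log2(26), about 4.70 bits.
    print(f"{entropy_bits(uniform):.3f}")
```

Any uneven distribution, such as real English letter counts, gives a value below that maximum.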
 
There are some thought-provoking examples and discussions of letter
probabilities in the textbook by William R. Bennett, Jr., Introduction to
Computer Applications for Non-Science Students (BASIC), 1976,
Prentice-Hall. [I have no idea how long this book has stayed on the
market.] For example, Bennett discusses the letter frequencies presented by
Edgar Allan Poe in "The Gold-Bug" and wonders where Poe got the idea that
the English letter frequency list should be 'EAOIDHNRSTUYCFGLMWBKPQXZ'.
He notes that putting 'T' tenth on the list makes the cipher much harder
to solve, and that even in Poe's own writing 'T' is the second most
frequent letter!
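Bennett's point shows up in the first step of frequency analysis on a simple substitution cipher: rank the cipher symbols by frequency and pair them, in order, with a reference ranking. The sketch below is a crude first guess, not a complete solver, and its function names are hypothetical. It uses the LOB order posted by Martin Wynne as one reference and Poe's list as the other; with Poe's list, the second most frequent cipher symbol is guessed as 'a' rather than 't'.

```python
"""First-guess key for a simple substitution cipher by frequency ranking."""

from collections import Counter

# Reference orders: LOB running text (Wynne) and Poe's list in "The Gold-Bug".
LOB_ORDER = "etaoinsrhldcumfgwpybvkxjqz"
POE_ORDER = "eaoidhnrstuycfglmwbkpqxz"


def guess_key(ciphertext: str, reference: str) -> dict[str, str]:
    """Map each cipher letter to the reference letter of the same frequency rank."""
    counts = Counter(ch for ch in ciphertext.lower() if ch.isalpha())
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return dict(zip(ranked, reference))


def caesar(text: str, shift: int) -> str:
    """Shift letters a-z by `shift` places; used here only to make test ciphertext."""
    return "".join(
        chr((ord(ch) - 97 + shift) % 26 + 97) if "a" <= ch <= "z" else ch
        for ch in text.lower()
    )


if __name__ == "__main__":
    cipher = caesar("the tree needs three green leaves here", 3)
    print(guess_key(cipher, LOB_ORDER))
    print(guess_key(cipher, POE_ORDER))
```

On a short sample only the top rank or two is reliable; in practice such a first guess is refined using common words and letter pairs.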
 
Jim Cerny, Computing and Information Services, Univ. N.H.
j_cerny@unhh.unh.edu
 
(4) --------------------------------------------------------------22----
Date:         Mon, 24 Jun 91 10:35:59 EDT
From:         Lorne Hammond <051796@UOTTAWA>
Subject:      etaoin shrdlu
 
A good summary is the chapter "In Memoriam Etaoin Shrdlu" in Hugh
Kenner's The Mechanical Muse, Oxford UP, 1987, pp. 3-16. "Etaoin shrdlu"
is the slug of a rejected, but perhaps not ejected, line of type; it
consists of the letters in the two leftmost columns of the Linotype
keyboard. My favorite, however, is a photo I saw in a 1920s newspaper of
an artists' ball in New York or Paris, at which a young woman had the
phrase written down her side as part of her costume. Kenner's book, by
the way, is wonderful.
 
                                          Lorne Hammond
                                          Department of History
                                          University of Ottawa
                                          Canada
                                                K1N 6N5
(5) --------------------------------------------------------------22----
Date: Sun, 23 Jun 91 11:02:51 CST
From: (James Marchand) <marchand@ux1.cso.uiuc.edu>
Subject: etaoin
 
David Kahn, The Codebreakers (NY: Macmillan, 1967), pp. 741 ff., repeats the
Morse story (about 1838), but adds that it was Mergenthaler himself who
decided that "the letter matrices in his Linotype should be arranged in
order of the demand for each letter." The keyboard of a Linotype
machine, for those of you who have not used one, is depicted on p. 742:
the first line reads etaoin, the second shrdlu, the third cmfwyp, the
fourth vbgkqj, and the last xz, fi, fl, ff, ffi.
Jim Marchand
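Mergenthaler's ordering by demand can be set against the corpus counts in this issue by cutting a frequency ranking into groups of six, the length of the etaoin, shrdlu, cmfwyp and vbgkqj rows as Marchand describes them. The sketch below is illustrative only: the grouping into sixes is our assumption for the comparison, and the names are hypothetical. For each row, it reports which letters the Linotype keyboard and the LOB ranking share.

```python
"""Compare Linotype keyboard rows with a corpus frequency ranking cut into sixes."""

# Rows as described by Jim Marchand (the ligatures on the last row are omitted).
LINOTYPE_ROWS = ["etaoin", "shrdlu", "cmfwyp", "vbgkqj", "xz"]

# LOB order as posted by Martin Wynne.
LOB_ORDER = "etaoinsrhldcumfgwpybvkxjqz"


def chunks(order: str, size: int = 6) -> list[str]:
    """Split a ranking string into consecutive groups of `size` letters."""
    return [order[i : i + size] for i in range(0, len(order), size)]


def compare_rows(rows: list[str], order: str) -> list[tuple[str, str, set[str]]]:
    """Pair each keyboard row with the same-position corpus group and their common letters."""
    groups = chunks(order, len(rows[0]))
    return [(row, group, set(row) & set(group)) for row, group in zip(rows, groups)]


if __name__ == "__main__":
    for row, group, common in compare_rows(LINOTYPE_ROWS, LOB_ORDER):
        print(f"{row:7} {group:7} shared: {''.join(sorted(common))}")
```

The first row matches the LOB ranking exactly, and the second differs only in 'u' versus 'c'.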