14.0309 Latin letter frequency

From: by way of Willard McCarty (willard@lists.village.Virginia.EDU)
Date: 10/05/00


                   Humanist Discussion Group, Vol. 14, No. 309.
           Centre for Computing in the Humanities, King's College London
                   <http://www.princeton.edu/~mccarty/humanist/>
                  <http://www.kcl.ac.uk/humanities/cch/humanist/>
    
    
    
             Date: Thu, 05 Oct 2000 06:20:44 +0100
             From: "Jim Marchand" <marchand@ux1.cso.uiuc.edu>
             Subject: Re: 14.0296 Latin letter frequency
    
    I think that, to be of real help, you are going to need better than
    monogram statistics.  You need di- and trigram statistics to be of help in
    decoding.  What would be of much, much greater help would be transitional
    probabilities between words.  For example, in the English sentence (in
    non-poetic texts): "The door was hermetically ..." the next word is _sealed_,
    i.e. 100 percent.  The best you can probably do if you have a corpus as
    large as that of Perseus is to do a Markov analysis, but even that is quite
    a large undertaking.  A good Latinist will have many of the transitional probabilities in
    his stomach, but at some time or other, we will have transitional
    probabilities for each word in our corpora (corpus linguistics), an enormous
    aid for machine translation and/or decoding. "The meaning of a word is a
    function of the totality of its environments."
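
    To make the distinction concrete, here is a minimal sketch in Python of
    the two kinds of statistics mentioned above: character n-gram counts
    (n=2 for digrams, n=3 for trigrams) and first-order transitional
    probabilities between adjacent words.  The function names, the choice
    to count n-grams only inside words, and the toy sample string are all
    assumptions made for illustration; a real study would run over a
    proper corpus and would need to deal with abbreviations, spelling
    variants and sparse data.

```python
from collections import Counter, defaultdict


def clean_words(text):
    """Lowercase the text and keep only alphabetic characters in each word."""
    words = ("".join(ch for ch in w if ch.isalpha()) for w in text.lower().split())
    return [w for w in words if w]


def char_ngrams(text, n):
    """Count overlapping character n-grams, without crossing word boundaries."""
    counts = Counter()
    for word in clean_words(text):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def word_transitions(text):
    """Estimate P(next word | current word) from adjacent word pairs."""
    words = clean_words(text)
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    probs = {}
    for current, counter in following.items():
        total = sum(counter.values())
        probs[current] = {nxt: c / total for nxt, c in counter.items()}
    return probs


# Toy sample, invented for illustration only (not taken from any corpus).
sample = "rosa est rosa est flos"

digrams = char_ngrams(sample, 2)
trigrams = char_ngrams(sample, 3)
transitions = word_transitions(sample)

print(digrams.most_common(3))
print(transitions["est"])
```

    With counts of this kind in hand, a candidate reading can be scored by
    how probable its letter sequences and word sequences are, which is the
    sense in which the word-level figures help more than single-letter
    frequencies.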
    
    -----Original Message-----
    From: Humanist Discussion Group <willard@lists.village.virginia.edu>
    To: Humanist Discussion Group <humanist@lists.Princeton.EDU>
    Date: Wednesday, October 04, 2000 5:44 AM
    
    
      >              Humanist Discussion Group, Vol. 14, No. 296.
      >      Centre for Computing in the Humanities, King's College London
      >              <http://www.princeton.edu/~mccarty/humanist/>
      >             <http://www.kcl.ac.uk/humanities/cch/humanist/>
      >
      >        Date: Tue, 3 Oct 2000 04:14:49 -0700 (PDT)
      >        From: "Melissa Terras" <melslists@yahoo.com>
      >        Subject: Re:  Latin Letter Frequency?
      >
      >Thanks to all who have replied to this thread, it's
      >been very helpful. I was very aware that I asked a
      >vague "how long is a piece of string" question, and
      >know that the use of classical corpora is fraught with
      >a number of problems.
      >
      >What I'm actually doing is undertaking a statistical
      >analysis of the Latin in the Vindolanda Writing
      >Tablets to help propagate some probabilities that will
      >help the papyrologists at Oxford read the Vindolanda
      >Stylus Tablets, which are so deteriorated they are
      >practically illegible. I haven't found much of this
      >type of work done before with Latin - or any language
      >in humanities research - although natural language
      >processing and cryptography have developed many
      >techniques to undertake this kind of "code-cracking",
      >and so I'm adopting some of those. Or plan to at the
      >moment ;)
      >
      >Nevertheless, the pointers given on the list have
      >given me plenty to chase up. Thanks.
      >
      >Melissa
      >
      >___________________________________________
      >Melissa M Terras MA MSc
      >Engineering Science / Centre for the Study of Ancient
      >Documents
      >Christ Church
      >University of Oxford
      >Oxford OX1 1DP
    


