4.1129 Responses: Sentence Length; Linguistic Databases (2/53)

Elaine Brennan & Allen Renear (EDITORS@BROWNVM.BITNET)
Tue, 5 Mar 91 20:37:33 EST

Humanist Discussion Group, Vol. 4, No. 1129. Tuesday, 5 Mar 1991.


(1) Date: Tue, 5 Mar 91 08:14:56 cet (33 lines)
From: Jan-Gunnar Tingsell <jgt@glader.hum.gu.se>
Subject: Re: Q: sentence length distribution in corpora

(2) Date: Tue, 5 Mar 91 17:03:28 GMT (20 lines)
From: DJT18@hull.ac.uk
Subject: Re: 4.1109 Queries

(1) --------------------------------------------------------------------
Date: Tue, 5 Mar 91 08:14:56 cet
From: Jan-Gunnar Tingsell <jgt@glader.hum.gu.se>
Subject: Re: Q: sentence length distribution in corpora

>
>Typically, the sentence length distribution in a corpus looks like this:
>
> !
> ! ***
> ! * **
> ! * **
> ! **
> !* ***
> ! *****
> ! ********
> ! ******************
> +------------------------------------------------
>

The general form of the above function is

    y = n x e^(-nx), where x >= 0

The curve passes through the origin and has a maximum of 1/e at x = 1/n.
For n=1 the peak is rather broad; it becomes narrower as n increases.
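
As a rough numerical check of the properties just stated, the following
Python sketch evaluates the curve on a grid and reports where the largest
value falls. The function names (curve, grid_peak), the grid size and the
upper limit 10/n are illustrative choices, not part of the original message.

```python
import math


def curve(x, n):
    """Evaluate y = n * x * exp(-n * x) for x >= 0."""
    if x < 0:
        raise ValueError("x must be >= 0")
    return n * x * math.exp(-n * x)


def grid_peak(n, steps=100_000):
    """Find the largest value of the curve on a uniform grid over [0, 10/n].

    The upper limit 10/n is an assumption chosen so that the grid
    comfortably covers the peak for any positive n.
    """
    x_max = 10.0 / n
    best_x, best_y = 0.0, 0.0
    for i in range(steps + 1):
        x = x_max * i / steps
        y = curve(x, n)
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y


if __name__ == "__main__":
    for n in (1, 2, 5):
        x_peak, y_peak = grid_peak(n)
        # Compare the grid result with the analytic values 1/n and 1/e.
        print(n, x_peak, 1.0 / n, y_peak, 1.0 / math.e)
```

For each n the grid peak should sit close to x = 1/n with a height close to
1/e, so only the horizontal scale of the curve changes with n.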

This formula has the form of a chi-square distribution and describes
some psychological and physical phenomena, for example the distribution
of the velocities of the molecules in a gas.
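
One hedged way to make the chi-square connection concrete: with n = 1/2 the
curve is a constant multiple of the chi-square density with 4 degrees of
freedom. The choice of n = 1/2 and 4 degrees of freedom is an assumption of
this sketch, not something stated in the message, and the helper names are
illustrative.

```python
import math


def curve(x, n):
    """Evaluate y = n * x * exp(-n * x)."""
    return n * x * math.exp(-n * x)


def chi2_pdf(x, k):
    """Chi-square probability density with k degrees of freedom, for x > 0."""
    half_k = k / 2.0
    return x ** (half_k - 1) * math.exp(-x / 2.0) / (2 ** half_k * math.gamma(half_k))


def ratios(points=(0.5, 1.0, 2.0, 5.0, 10.0)):
    """Ratio of the curve (n = 1/2) to the chi-square density (k = 4)."""
    return [curve(x, 0.5) / chi2_pdf(x, 4) for x in points]


if __name__ == "__main__":
    # A constant ratio means the two functions differ only by a scale factor.
    print(ratios())
```

The ratio is the same at every point because the curve as written integrates
to 1/n rather than 1; multiplying it by n gives a properly normalized
density.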

I think most standard statistics textbooks cover this function.


(2) --------------------------------------------------------------------
Date: Tue, 5 Mar 91 17:03:28 GMT
From: DJT18@hull.ac.uk
Subject: Re: 4.1109 Queries (3/59)

Linguistic databases:

Termdok, on CD-ROM, includes Finnish, Norwegian and Swedish. The
suppliers will have information about other languages. They are
Multi Lingua, 61 Chiswick Staithe, Hartington Road, London W4 3TP,
tel 081 995 0478, fax 081 747 1853.

Termtracer is a memory-resident, on-line dictionary which includes, of
the languages you listed, Italian, Dutch, Norwegian and Swedish.
Termtracer is available from INK International BV, Baarsjesweg 224,
1058-AA Amsterdam, The Netherlands, tel 31 20 164591, fax 31 20 163851.
Please mention the CTI Centre for Modern Languages if you decide to
contact these suppliers.

Regards,
June Thompson, CTI Centre for Modern Languages, University of Hull, UK