18.063 Joyce? Publication?

From: Humanist Discussion Group (by way of Willard McCarty willard.mccarty@kcl.ac.uk)
Date: Tue Jun 15 2004 - 03:46:42 EDT

                    Humanist Discussion Group, Vol. 18, No. 63.
           Centre for Computing in the Humanities, King's College London
                       www.kcl.ac.uk/humanities/cch/humanist/
                            www.princeton.edu/humanist/
                         Submit to: humanist@princeton.edu

       [1] From: Matt Kirschenbaum <mk235@umail.umd.edu> (6)
             Subject: Re: 18.057 new book: JoyceMedia

       [2] From: "Yuri Tambovtsev" <yutamb@mail.cis.ru> (588)
             Subject: advise me on publication

    --[1]------------------------------------------------------------------
             Date: Tue, 15 Jun 2004 07:54:23 +0100
             From: Matt Kirschenbaum <mk235@umail.umd.edu>
             Subject: Re: 18.057 new book: JoyceMedia

    > A brief note to direct your attention to a new book on Joyce and
    > hypermedia, entitled: JOYCEMEDIA

    Which Joyce?

    ;-)

    Matthew G. Kirschenbaum_____________________________
    _______________________http://www.otal.umd.edu/~mgk/

    --[2]------------------------------------------------------------------
             Date: Tue, 15 Jun 2004 07:53:05 +0100
             From: "Yuri Tambovtsev" <yutamb@mail.cis.ru>
             Subject: advise me on publication

    Dear Humanist members,
    may I ask you to advise me where I can send for publication my book on the
    use of statistics in linguistics, especially phonology? I used some methods
    of statistics (e.g. standard deviation, confidence intervals, coefficient
    of variation, chi-square, Kolmogorov-Smirnov) to study the compactness of
    Finno-Ugric, Turkic, Tungus-Manchurian, Paleo-Asiatic and Indo-European
    language families, which include 157 world languages. I enclose in this
    message a short description of my book, i.e.
    THE BOOK BY TAMBOVTSEV, Yuri Alekseevich.
    "Typology of functioning of phonemes in a sound chain of Indo-
    European, Palaeo-Asiatic, Ural-Altaic and other world languages:
    compactness of subgroups, groups, families and other language taxons"

         This book is the addition to Tambovtsev's theories, methods
    and data published earlier (Tambovtsev. 1994-a; 1994-b; 2001-a; 2001-b;
    2001-c). I think that linguistics needs new data to support or to reject the
    classical theories. More often than not, linguists argue about this or that
    linguistic theory (e.g. Uralic or Altaic language unities) without any new
    data at hand. This new book by Yuri Tambovtsev provides such new data.
    Speaking about applications of statistical methods in linguistics, one must
    agree with Chris Butler that very often only statistical techniques are
    relevant for some linguistic research because it is difficult otherwise to
    understand the language phenomenon. It is especially important in any
    type of linguistic study involving differences in people's linguistic
    behaviour or in the patterns of language itself (Wray et al., 1998: 255).
    Tambovtsev adds much data on phonological statistics of world languages.
    He is one of the very few linguists who applied phonology to stylistics and
    typology (Teshitelova, 1992: 157 - 181). In this book, as in the previous
    books, Yuri Tambovtsev considers the typology of regulation and chaos of
    distribution of consonant phonemes in a sound chain of world languages.
    In fact, Tambovtsev concentrates on variability in sound chains of world
    languages. Actually, he adds much to the essential parts of his theories and
    methods in the monograph under review, especially on the
    phonostatistical universals of Finno-Ugric, Turkic, Indo-European and
    other world languages. The author examines the homogeneity of texts in
    various languages from the point of view of the occurrence of phonemic
    groups in their sound speech chains with the help of phonological
    statistics. Tambovtsev also investigates the rules of a sound chain division,
    as well as frequency of occurrence of certain phonemic groups of
    consonants in the phonetic systems of various world languages. Many new
    languages are investigated by his method, in comparison to his previous
    books (Tambovtsev, 1994-a; 1994-b; 2001-a; 2001-b; 2001-c).
         In fact, Yuri Tambovtsev has computed phonostatistical data on the
    occurrence of labial, front (i.e. forelingual), palatal (mediolingual), back
    (velar, pharyngeal and glottal), sonorant, occlusive, fricative
    (constrictive) and voiced consonants in speech in a great number of
    languages. Together these make up 8 phonological features. The
    articulation system of these languages is also discussed in brief, and
    there is a short review of the ethnic history (ethnogenesis) of the
    nations speaking these languages. The author thinks it of great importance
    to analyse the contacts between these languages during the history of
    their ethnic development.
       As far as I can judge, Tambovtsev's first article in the field of phonological
    statistics was published in 1976. So, he has been working on the problems
    mentioned above for a long time, i.e. for some 30 years. Unfortunately, I
    cannot mention all Tambovtsev's publications since he is the author of 8
    monographs and about 250 articles on language typology, phonostatistics
    and phonetics. His study involves the sound pictures of 156 world
    languages. In the book under review, Tambovtsev's conclusions are based
    on data on the frequency of occurrence of phonemes in the
    languages of the following families and groups:
    1. Indo - European language family (the language groups: Indo - Aryan (8
    languages), Iranian (4 languages) , Celtic (1 language), Italic (1 language),
    Romanic (5 languages) , Germanic (7 languages) , Baltic (2 languages) ,
    Slavonic (8 languages) , genetically isolated Indo-European languages (5
    languages) , artificial languages(1).
    2. Ural-Altaic language community, which includes the Uralic and Altaic
    language communities:
    A. Uralic language community, Finno-Ugric language family, Ugric
    subgroup of Finno-Ugric language family (5 languages), Permic
    subgroup of Finno-Ugric language family (2 languages) , Volgaic
    subgroup of Finno-Ugric language family (5 languages) , Balto - Finnic
    subgroup of Finno-Ugric language family (9 languages) , Samoyedic
    language family (3 languages).
    B. Altaic language community, Turkic language family (22 languages) ,
    Mongol language family ( 3 languages).
    3. Tungus - Manchurian language family (6 languages),
    4. Yenisseyic language family (1 language).
    5. Caucasian language family (2 languages).
    6. Palaeo - Asiatic language family (8 languages).
    7. Sino - Tibetan language family (2 languages).
    8. Afro - Asiatic language family (3 languages).
    9. Bantu language family (2).
    10. Austro -Asiatic language family (2).
    11. Austronesian language family (5 languages).
    12. Australian language family (6 languages).
    13. The language community of American Indians (20 languages).
         As a linguist I often feel I must use statistical methods in my studies of
    the English, German and other languages. However, it is hard for a linguist
    to understand how to use them correctly and, at the same time, in the
    simplest way. The author of the book teaches us how to do it. He does it on
    the example of the following methods of statistical calculation: standard
    quadratic deviation, variation coefficient, level of significance,
    confidence interval, T-criterion of Student, criterion of
    Kolmogorov-Smirnov, Chi-square criterion, and Euclidean distance. He also
    shows how to measure the statistical reliability of the linguistic results.
    Very often a linguist who is a layman in linguistic statistics may draw
    wrong linguistic conclusions because his results are not statistically
    reliable.
    The book by Yuri Tambovtsev focuses not only on the mathematical
    statistical methods, which have been employed by him in his linguistic
    research, but also discusses the important problems of classification of
    world languages. The author touches the topics of reliability of
    mathematical statistical methods in linguistics. The target of his research is
    to compare various languages within a single family as well as languages
    belonging to different families and groups. For this sake, Tambovtsev has
    generated mean values of frequency rates of various phonemes and
    phonemic groups in speech. In fact, these mean values provide reliable
    correlation between different languages. There are several mathematical
    methods allowing estimations of variation of major statistical values.
    Tambovtsev aims to estimate regularities in usage of particular phonemes
    or phonemic groups in particular languages. He has chosen several
    methods of variability estimation and described techniques of their
    application to phonetic studies.
    In this respect, the size of the sample is important. In fact, the
    greater the sample, the more reliable the results. One of the most important
    problems is the problem of the size of the portions (units) into which the
    text is divided. The portion should not be too small or too big. Tambovtsev
    correctly takes the generally accepted sample portion in phonological
    research, which is 1000 phonemes. Tambovtsev separates all his texts of
    the languages under discussion into units comprising 1000 phonemes. In
    statistics, the most reliable results are obtained on large samples. Thus,
    Tambovtsev argues that the minimum necessary sample should include not
    less than 30 thousand phonemes.
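
    The division into portions described above can be sketched in Python as
    follows. This is only a minimal illustration and not the author's own
    procedure: the phoneme symbols, the set LABIALS and the toy text are
    assumptions made for the example. Only the portion size of 1000 phonemes
    comes from the book description.

```python
from typing import List, Sequence, Set

PORTION_SIZE = 1000  # portion size stated in the book description


def split_into_portions(phonemes: Sequence[str],
                        size: int = PORTION_SIZE) -> List[Sequence[str]]:
    """Cut a phoneme sequence into consecutive full portions.

    An incomplete tail shorter than `size` is dropped (an assumption).
    """
    full = len(phonemes) // size
    return [phonemes[i * size:(i + 1) * size] for i in range(full)]


def class_counts(portions: List[Sequence[str]],
                 phoneme_class: Set[str]) -> List[int]:
    """Count the occurrences of one phonemic class in every portion."""
    return [sum(1 for p in portion if p in phoneme_class)
            for portion in portions]


# Hypothetical toy data: a repeating 10-symbol pattern, 3500 symbols long.
LABIALS = {"p", "b", "m"}  # illustrative set only
text = list("patikomasu") * 350
portions = split_into_portions(text)
counts = class_counts(portions, LABIALS)
```

    The list counts then holds one frequency per 1000-phoneme portion, which
    is the kind of series the statistical measures are applied to.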
    The author has applied the method of evaluation of the mean quadratic
    deviation in his research among other methods estimating statistical
    variations. The mean quadratic deviation index is used in generating other
    evaluating indices. Quadratic deviation indices generated for two different
    texts can be compared if the sample sizes of basic texts are equal. Standard
    deviation data cannot be compared if the samples of texts are not equal. In
    cases, when the sample sizes are different, other mathematical functions
    should be used. Tambovtsev correctly chooses the estimation of the
    confidence interval, "chi-square" criterion, coefficient of variation, etc.
    In my opinion, it is important to provide the reader with the exact
    examples of how to calculate the mean quadratic deviation or standard
    deviation because a layman in phonostatistics, as myself, may do it in the
    wrong way. Yuri Tambovtsev provides us with the data on the occurrence
    of the labial consonants in the Old English texts: "Beowulf, Ohthere's and
    Wulfstan's Story, the Description of Britain, Julius Caesar", etc. He
    compares the use of labials in Old English to the analogous use in modern
    English.
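
    For a layman, the calculation of the mean and of the mean quadratic
    (standard) deviation might look like the sketch below. The labial counts
    are invented for illustration and are not taken from the book. The use of
    n - 1 in the denominator is also an assumption, since the description does
    not say which variant the author uses.

```python
import math
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean of the per-portion frequencies."""
    return sum(values) / len(values)


def standard_deviation(values: Sequence[float]) -> float:
    """Sample standard deviation (n - 1 in the denominator)."""
    m = mean(values)
    squared = sum((x - m) ** 2 for x in values)
    return math.sqrt(squared / (len(values) - 1))


# Hypothetical labial counts in six portions of 1000 phonemes each.
labial_counts = [98, 105, 110, 92, 101, 94]
m = mean(labial_counts)                 # 100.0
s = standard_deviation(labial_counts)   # sqrt(46), about 6.78
```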
    Variation coefficient represents another important tool in comparative
    linguistic research. It helps to compare incommensurable values. As it was
    stated above, the mean quadratic deviation characterises the degree of
    deviation of the frequency rate of a particular phoneme from the mean
    value. However, the mean quadratic deviation values do not take into
    account the fact that the number of labial phonemes is greater than that of
    the mid-lingual (palatal) phonemes. Consequently, the absolute mean
    index of labial sounds is considerably greater than that of the palatal ones.
    On the other hand, front-lingual phonemes are usually more frequent than
    labial. This heterogeneity of features calls for additional methods of
    comparison, i.e. the variation index called the "coefficient of variation".
         Unlike the mean quadratic deviation, the coefficient of variation allows
    correlation of frequency rates of those phonemes and phonemic groups,
    which have produced different mean values. It is possible to make the
    measure of variability comparable using the coefficient of variation. It can
    be used in linguistics in the way it is recommended by Fred Fallik and
    Bruce Brown for behavioural sciences (Fallik et al., 1983: 111 - 112). The
    coefficient of variation is used as an indicator of variation/stability of
    particular linguistic elements in a sample. The minimum necessary size of
    such samples should be not less than 30 units. The larger the value of the
    variation coefficient, the higher the variability of a particular
    phonological feature (phonemic frequency in this case).
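
    The point about incommensurable values can be illustrated with a short
    sketch. It assumes the usual definition of the coefficient of variation
    (standard deviation divided by the mean, in percent). The two series of
    counts are hypothetical and are built to have the same standard deviation
    but different means.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """V = s / mean * 100, expressed in percent."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical counts per 1000 phonemes in six portions.
labials = [98, 105, 110, 92, 101, 94]    # mean 100
palatals = [18, 25, 30, 12, 21, 14]      # mean 20, same absolute spread

v_labials = coefficient_of_variation(labials)
v_palatals = coefficient_of_variation(palatals)
```

    Both series have the same standard deviation, yet V is about five times
    larger for the rarer class, which is why the standard deviation alone
    would be misleading here.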
         Another important statistical notion is the significance level. In his
    research Yuri Tambovtsev has chosen the significance level value of 0.05,
    or 5%. To my mind, Tambovtsev chose it correctly since such a level of
    significance is usually used by the majority of researchers in linguistics
    and phonology. This sort of significance level (i.e. 5%) tells us that we
    have 95% confidence in our linguistic research. This significance level, I
    believe, is important in any linguistic research, but especially important for
    correlations carried out on small samples, i.e. in the samples less than 30
    thousand phonemes.
         Confidence interval evaluation is closely related to other statistical
    procedures like estimations of the minimum necessary sample at the fixed
    significance level. Tambovtsev proposes to fix it always at 5%, so that a
    layman in statistics need not rack his brains over the other possible
    levels. Actually, the matter is so specifically mathematical that a linguist
    should not try to understand its mathematical foundation. I'm sure that, if
    a linguist learns how to operate with all the necessary statistical criteria
    correctly, then using only one level of significance (e.g. 5%) is quite all
    right. A stricter level of significance usually requires larger samples,
    and thus much more labour than necessary.
       In certain cases, I guess, one is advised to use the values of the
    confidence interval. The confidence interval evaluation is more reliable for
    phonological research since it provides us with a greater precision. The
    general rule is the narrower the confidence interval, the higher is the
    homogeneity of a parameter under discussion, i.e. a frequency parameter
    of a particular phonemic class or phoneme in speech. Usually, a text
    allows us to obtain narrower confidence intervals than the collection of
    phrases and words.
         In his book, the author correctly relates three important parameters:
    the sample size, the confidence interval and the fixed significance level.
    Available data have shown that the greater the sample size, the narrower
    the confidence interval at the fixed significance level in all languages of
    the world, irrespective of their genetic affiliation or grammatical type.
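
    The dependence of the confidence interval on the sample size can be shown
    with a small sketch. It assumes the normal approximation, where the
    half-width of the interval is z * s / sqrt(n) with z = 1.96 for the 5%
    level. The standard deviation of 7.0 is a hypothetical value, and the book
    may well use a different formula (for example Student's quantiles for
    small samples).

```python
import math

Z_95 = 1.96  # two-sided normal quantile for the 5% significance level


def ci_half_width(sd: float, n: int, z: float = Z_95) -> float:
    """Half-width of the confidence interval for a mean (normal approximation)."""
    return z * sd / math.sqrt(n)


sd = 7.0  # hypothetical standard deviation of a per-portion frequency
widths = {n: ci_half_width(sd, n) for n in (10, 30, 100)}
```

    With the standard deviation held fixed, the half-width shrinks in
    proportion to the square root of the number of portions.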
    Tambovtsev has also paid attention to reliability of statistical results
    obtained in the course of his phonological research. He has received
    indices representing statistical error resulting from the fact that each
    sample represents only some portion of the general language aggregate.
    Such indices are called representation errors. The value of the
    representation error depends mostly on the sample size and on variation
    rate of a particular parameter. It is noteworthy that texts in different
    languages produce similar representation error, which does not depend on
    their morphological structures. This fact suggests a certain universal in
    consonant phonemic groups functioning in genetically different languages.
    However, I think, that Tambovtsev has applied the strictest way of
    estimating the representation error. On the one hand it is bad, since it
    requires larger samples for a fixed error (e.g. the error of 5% or less), but,
    on the other hand, it means that one can be surer of his linguistic result.
    Yuri Tambovtsev rightly mentions that many linguists who use statistics
    do not know that the T-test or "Student's" criterion was proposed by
    William Gosset, and not by some scholar called Student. "Student" was the
    name that William Gosset assumed as a pseudonym. The Student's
    criterion is employed in cases when it is necessary to compare two mean
    values found for two different texts. The reliability of difference between
    two mean values depends on variability of involved parameters and on the
    sizes of the sample, for which these variables have been generated.
    "Student's" criterion can be applied to variables that follow the normal
    distribution. Within a sample of not less than 30 units, the distribution
    is considered normal. In the course of research, "Student's" criterion has
    been calculated for two samples of equal size of 31 thousand phonemes.
    On the one hand, a scientific text was compared with fiction, and on the
    other hand, two scientific texts were compared. The value for the former
    is nearly four times greater than for the latter. It convinces us that
    "Student's" criterion can indeed be applied to the stylistic analysis of
    texts.
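
    A comparison of two mean values of this kind might be sketched as below.
    The sketch uses the classical pooled-variance form of Student's t
    statistic, which is an assumption, since the description does not give the
    formula used in the book. The two series are invented and much shorter
    than the 31 portions mentioned above.

```python
import math
import statistics
from typing import Sequence


def student_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    standard_error = math.sqrt(pooled * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Hypothetical labial counts per 1000 phonemes in two texts.
scientific = [98, 105, 110, 92, 101, 94]
fiction = [108, 115, 120, 102, 111, 104]

t_different = student_t(scientific, fiction)   # about -2.55
t_same = student_t(scientific, scientific)     # 0.0
```

    The absolute value of t is then compared with the tabulated value for
    na + nb - 2 degrees of freedom at the chosen significance level.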
    The statistical criterion called the Kolmogorov-Smirnov test provides
    researchers with a mathematical method of analysis which does not depend
    on the restrictions usually applied to statistical analyses. These are
    the following conditions:
    1) Statistical analyses are carried out with independent random
    variables;
    2) Aggregates of random variables should demonstrate close mean
    and dispersion values;
    3) Aggregates should follow the law of normal distribution.
    The Kolmogorov-Smirnov criterion belongs to the so-called "robust"
    non-parametric methods, which are not sensitive to deviations from the
    standard conditions. Low values of the Kolmogorov-Smirnov (K-S) criterion
    mean that the fluctuation of the analysed linguistic parameters is minor,
    that is, not linguistically significant. Tambovtsev argues that the low
    value of the K-S criterion in his research supports his hypothesis on a
    normal distribution of the established eight groups of consonants within
    the speech sound chains.
    Representation of any language with the help of eight groups of
    consonants has served as a basis for his phono-statistical research.
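
    To show what the K-S criterion measures, here is a minimal sketch of the
    one-sample statistic: the largest gap between the empirical distribution
    of the observed frequencies and a reference distribution. The choice of a
    normal reference with the sample's own mean and standard deviation, and
    the counts themselves, are assumptions for the example. The book may apply
    the criterion in a different form.

```python
import math
import statistics
from typing import Callable, Sequence


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative distribution function of the normal distribution."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def ks_statistic(values: Sequence[float],
                 cdf: Callable[[float], float]) -> float:
    """Largest distance between the empirical CDF and the reference CDF."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Check the empirical step just after and just before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Hypothetical labial counts per 1000 phonemes.
counts = [98, 105, 110, 92, 101, 94, 103, 97]
mu, sigma = statistics.mean(counts), statistics.stdev(counts)
d_normal = ks_statistic(counts, lambda x: normal_cdf(x, mu, sigma))
```

    One caveat: when the mean and standard deviation are estimated from the
    same sample, the ordinary K-S table values are too lenient, so the result
    should be read with some caution.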
    Tambovtsev has also employed the "chi-square" criterion in his
    investigations. With the aid of this criterion, he estimates differences
    between the empirical and expected values. If the difference is
    insignificant, it can be a result of accidental deviation. Otherwise, it
    reflects significant differences between factitious (empirical) and expected
    (theoretical) values of frequencies of phonemic group occurrences in
    speech. L. Bolshev and N. Smirnov (Bolshev et al., 1983: 166 - 171) have
    generated the list of maximum frequency values reflecting insignificant
    fluctuations of variables through the "chi-square" technique, which
    Tambovtsev provides on page 33. It is quite handy because usually
    linguists do not have books on statistics at hand. Christopher Butler
    recommends the chi-square test to measure the independence and
    association of linguistic units in various sorts of linguistic material
    (Butler, 1985: 118 - 126). Tambovtsev shows how to use it on the material
    of the occurrence of labial consonants in British and American prose
    (Agatha Christie, John Braine, W. S. Maugham, Jack London, F. Scott
    Fitzgerald, Ernest Hemingway, etc.). The chi-square values show that
    labials are distributed rather homogeneously. Tambovtsev draws the
    attention of the reader to the need to calculate the degrees of freedom
    correctly (p.30). He also compares how similar the distribution of labial,
    front, palatal, and velar consonants is in Kalmyk (a Mongolian language)
    and Japanese (a genetically isolated language). It is not similar by this
    statistical criterion (p.31). However, the same criterion shows close
    similarity between the distribution of the 5 consonantal groups in Turkish
    and Uzbek (p.32). The T coefficient is less than 1 in 5 parameters, i.e.
    front, palatal, velar, sonorant and occlusive. Tambovtsev explains the T
    coefficient as the ratio of the obtained value of chi-square to the
    theoretical value which can be found in the chi-square tables. If the T
    coefficient is less than 1, the statistical results are similar (p.31 -
    33). It also shows great similarity between some other Turkic,
    Finno-Ugric, Samoyedic, Tungus-Manchurian, Slavonic, Germanic, Iranian and
    other Indo-European languages inside their taxons.
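
    The chi-square value and the T coefficient described above can be sketched
    as follows. The observed counts are invented, and the test of homogeneity
    against the mean count with k - 1 degrees of freedom is an assumption
    about how the criterion is set up. The critical values are the standard
    table values for the 5% level.

```python
from typing import Sequence

# Standard chi-square critical values at the 5% significance level.
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Sum of (observed - expected)^2 / expected over all portions."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def t_coefficient(chi2: float, df: int) -> float:
    """Ratio of the obtained chi-square to the tabulated critical value."""
    return chi2 / CHI2_CRITICAL_05[df]


# Hypothetical labial counts in five portions of 1000 phonemes.
observed = [98, 105, 110, 92, 95]
expected = [sum(observed) / len(observed)] * len(observed)  # 100 each

chi2 = chi_square(observed, expected)     # 2.18
df = len(observed) - 1                    # 4 degrees of freedom
t_coef = t_coefficient(chi2, df)          # about 0.23, i.e. less than 1
```

    A T coefficient below 1 means the obtained chi-square stays under the
    table value, so the distribution is treated as homogeneous at the 5%
    level.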
    Chapter 2 is dedicated to the issues of genetic and typological
    classifications of languages of the world. The author does not go into
    details and debates concerning inclusion of certain languages into
    particular genetic groups and families, or identification of a particular
    language as a separate language or a dialect. The major aim of the author
    is to provide a technique, which would allow linguists to check the
    rightfulness of inclusion of a particular language into a certain language
    group or a family. Before analysing the compactness of subgroups, groups,
    families and other language taxons, Tambovtsev warns the reader that the
    problem of the division of world languages into families has not been
    completely solved. For instance, it is quite necessary to discuss the
    problem if Turkic languages constitute a family themselves or a branch in
    some other family, called Altaic family. Actually, Turkic languages are
    considered to form a family by some linguists (e.g. Baskakov, 1966 and
    other Russian linguists). However, some other linguists, especially those in
    the West, consider Turkic languages to be a group within the Altaic family
    spoken in Asia Minor, Middle Asia and southern Asia (Crystal, 1992: 397;
    Katzner, 1986:3). The other two branches of Altaic family are Tungus-
    Manchurian and Mongolian. To my mind, it is more logical to consider
    Turkic languages a family, rather than a subgroup within Altaic family.
    Altaic languages should be called a super family, Sprachbund, language
    community or unity, since the true genetic relationship of Turkic, Tungus-
    Manchurian and Mongolian languages has not been proved. If one goes
    along this line, then all languages on the Earth may be called one family
    with lots of groups and branches. On the other hand, it is not productive to
    form a separate language family consisting of one language. For instance, in
    the 1960s Ket was considered an isolated language of the Paleo-Asiatic family
    (Krejnovich, 1968: 453). However, now it is considered to form the so-
    called Yeniseyan family, though consisting of only one language with its
    dialects and subdialects. Summing up the modern point of view, David
    Crystal remarks that Yeniseyan is a family of languages generally placed
    within the Paleosiberian grouping, now represented by only one language -
    Ket, or Yenisey-Ostyak (Crystal, 1992: 424). I don't think it is wise to
    multiply language families like that. Other linguists (e.g. Ago Kunnap,
    Angela Marcantonio, etc.) question the very existence of the Uralic
    language family (Marcantonio, 2002).
    Among other language families, Tambovtsev describes the Finno-Ugric
    family. He argues, that this language family includes two major groups:
    Baltic-Finnic and Ugric groups.
    The author considers the theories of those linguists who identify the
    following four groups in the Finno-Ugric family:
    1) The Baltic-Finnic group including Estonian, Finnish, Karelian,
    Vepsian, Izhorian, Vodian, Livonian, and Saami possessing some specific
    features;
    2) The Volga group including Erzia-Mordovian, Moksha-Mordovian,
    Mountain Mari, and Lawn or Meadow East Mari;
    3) The Permic group comprising Udmurtian, Komi-Zyrian, and
    Komi-Permian;
    4) The Ugric group comprising Hungarian, Mansi, and Khanty.
    Together with the Samoyedic language family, comprising the Nenets,
    Selkup, Nganasan, and Enets languages, the Finno-Ugric family is said to
    form the Uralic language unit.
    Tambovtsev argues that until present, no fore-language of this unit has
    been established. The languages of the Uralic unit do not form a compact
    unity from the point of view of dispersal and frequency of phonemic
    groups. With the aid of the coefficients that have been received by
    Tambovtsev in his studies, the author has shown that the consonant indices
    and the compactness (dispersion) coefficients suggest a more compact
    unity for the Samoyedic language family (the mean V=18.29%; T=0.16),
    rather than for the Finno-Ugric (the mean V=24.14%; T=0.47). The Uralic
    language unity has a greater dispersion (the mean V=28.31%; T=0.57).
    This fact has been interpreted as a support of the idea that languages of the
    Samoyedic and Finno-Ugric families are more closely related to one another
    within the family, than between the families. Thus, the idea of the Uralic
    taxon as a language family should be either rejected or considered with
    caution (p.125).
    The Turkic language group includes Azeri, Baraba-Tatar, Bashkir,
    Gagauz, Karaim, Dolgan, Kazakh, Kamasin, Karakalpak, Karachai-
    Balkarian, Kyrgyz, Crimea-Tatar, Kumyk, Nogai, Tatar, Tofalar, Tuvin,
    Turkish, Turkmenian, Uzbek, Shor, and Yakut. The author argues that a
    Turkic fore-language can be regarded as a real basic language for all the
    Turkic languages. He points out that the Turkic fore-language (Ursprache)
    demonstrates closer relations to any of the present Turkic languages, than
    these languages may have between one another now. However, he did not
    include the Ancient Turkic into his studies because of the uncertainty in
    the pronunciation.
    The Mongolian language family includes only three languages: Buriat,
    Kalmyk, and Mongolian. It is the minimum possible group for statistical
    analysis.
    The Tungus-Manchurian language group includes 10 languages:
    Manchurian, Nanai, Negidal, Oroch, Orok, Solon, Udege, Ulchi, Evenk
    (Tungus), and Even.
    Inclusion of the Turkic, Mongolian and Tungus-Manchurian language
    families into one language unity remains a debatable topic in linguistics
    today.
    The Indo-European language family seems to be the most thoroughly
    investigated. Major linguistic methods of investigations and comparative
    linguistic analysis were elaborated during the long history of studies of
    European languages. However, currently the major question concerning
    the existence of a single Indo-European fore-language has not been
    resolved.
    It is noteworthy that many linguistic debates have often been carried out
    in terms of "similarity" and "linguistic distance". Yet the terms themselves
    have not been clearly defined.
    Tambovtsev thinks that at the present state of understanding, modern
    languages represent either products of divergence or the reverse process,
    i.e. convergence. In historical perspective, both processes produced their
    impacts on development of languages. Tambovtsev agrees with those
    researchers who think that origin of all Indo-European languages from a
    single fore-language is fiction, while their co-existence and convergence in
    their development resulting in appearance of certain common features is a
    scientific fact. The noted uniformity of the Indo-European languages can
    be explained as a secondary, later phenomenon, and differentiating
    features represent the original and early characteristics of each language of
    this family.
    However, since no classification other than the genealogical one has been
    elaborated, Tambovtsev accepts the following classification of the
    Indo-European family: the Indian, the Iranian, the Baltic, the Slavonic
    (including Eastern, Western, and Southern Slavonic sub-groups),
    Germanic, Romanic, and Celtic language groups.
    Following Illich-Svitych, Tambovtsev believes that the Nostratic language
    unity can serve as a good model for linguistic investigations of various
    sorts, but he does not think these languages should be considered a
    language unity; moreover, this rather arbitrary construct is not recognised
    by all the linguists. The Nostratic language unity includes the following
    language families: Indo-European, Finno-Ugrian, Samoyedic, Turkic,
    Mongolian, Tungus-Manchurian, Cartvelian, and Semito-Hamitian.
    Tambovtsev proposes a concept of compactness for linguistic studies. He
    defines compactness as more or less closely related languages within
    language sub-groups, groups, families, etc. In other words, he attempts to
    measure the distance between languages within analysed taxons or
    clusters. The distances are measured on the basis of frequency rates of
    particular linguistic (phonological) characteristics.
    The author uses the concepts of image recognition and regards a language
    family as a unit with a more or less compact structure. In the branch of
    applied mathematics called pattern recognition different images of various
    sorts are recognised. One can consider language to be a sort of such image.
    Therefore, one can use the methods of pattern recognition to develop
    various types of classifications based on exact values of some coefficients
    (Zagorujko, 1999: 195 - 201). The generated index of compactness can be
    regarded as an indicator of an opposing process of diffusion. Values of
    frequency rate of particular parameter should not considerably deviate
    from the mean value established for a given language family or group. If
    the values of deviation are considerably greater than the established mean
    value, the given language does not belong to the language family under
    discussion. If majority of languages produce these deviation indices higher
    than the mean value, we should state that the languages under study do not
    form a language group but rather a set of separate languages.
    Tambovtsev has put forward the hypothesis that the typological similarity of
    languages can be tested by statistical methods resulting in the generation
    of the set of indices described above. The hypothesis holds that when a
    language is included into a particular language group, the generated
    indices of this new formation will show either a higher or a lower
    compactness. A closely related language would increase the compactness
    indices, and vice versa.
    The author illustrates this presupposition by a series of examples. Thus, he
    analyses frequency rates of labial consonants in the Turkic languages
    compared to Mongolian. The frequency of labial consonants in Mongolian
    is 7.52%. In the Turkic languages the relevant figures vary from 5.98% to
    12.80%. The total fluctuation index is 6.82, the difference between the
    neighboring languages is 0.49. The Altai language has produced the lowest
    index of labial consonant frequency, while the Karakalpakian has shown
    the highest index. The Turkic languages can be classified in the following
    way by the labial consonant frequency indices: Karakalpakian - 12.80%;
    Turkish - 10.41%; Uigur - 9.83%; Azerbaijanian - 9.66%; Uzbekian -
    9.42%; Kumandinian - 9.22%; Baraba-Tatarian - 9.04%; Turkmenian -
    8.50%; Kirgizian - 8.43%; Kazan-Tatarian - 8.03%; Kazakhian - 7.99%;
    Khakassian - 7.82%; Yakutian - 6.10%; and Altaian - 5.98%. Mongolian
    (7.52%) falls between Khakassian and Yakutian, suggesting that the
    distribution of labial consonants in these three languages is more
    similar than in the other languages of the Turkic group.
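
    The ranking and the summary figures above can be recomputed from the
    listed percentages. What follows is a minimal Python sketch. It assumes
    that the "difference between the neighboring languages" is the total
    fluctuation divided by the number of languages; the text does not define
    it, and this reading is inferred only from the figures quoted in this
    review. The function names are illustrative.

```python
# Labial consonant frequencies (%) for the Turkic languages, as listed above.
TURKIC_LABIALS = {
    "Karakalpakian": 12.80, "Turkish": 10.41, "Uigur": 9.83,
    "Azerbaijanian": 9.66, "Uzbekian": 9.42, "Kumandinian": 9.22,
    "Baraba-Tatarian": 9.04, "Turkmenian": 8.50, "Kirgizian": 8.43,
    "Kazan-Tatarian": 8.03, "Kazakhian": 7.99, "Khakassian": 7.82,
    "Yakutian": 6.10, "Altaian": 5.98,
}
MONGOLIAN_LABIALS = 7.52


def fluctuation(frequencies):
    """Total fluctuation: highest frequency minus lowest."""
    return max(frequencies) - min(frequencies)


def mean_difference(frequencies):
    """Fluctuation divided by the number of languages (assumed reading)."""
    return fluctuation(frequencies) / len(frequencies)


def neighbours(frequency_by_language, value):
    """Languages ranked immediately above and below the given frequency."""
    ranked = sorted(frequency_by_language.items(),
                    key=lambda item: item[1], reverse=True)
    above = [name for name, freq in ranked if freq > value]
    below = [name for name, freq in ranked if freq < value]
    return (above[-1] if above else None, below[0] if below else None)


values = list(TURKIC_LABIALS.values())
print(round(fluctuation(values), 2))
print(round(mean_difference(values), 2))
print(neighbours(TURKIC_LABIALS, MONGOLIAN_LABIALS))
```

    For the fourteen listed values this gives a fluctuation of 6.82 and a
    step of about 0.49, and it places Mongolian between Khakassian and
    Yakutian.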
    The Mongolian group has produced the following indices: Mongolian
    (7.52%), Buriatian (7.67%), and Kalmykian (6.65%). These
    indices fall within the range given above (5.98% to 12.80%), while
    the total fluctuation and the difference between the neighboring languages
    are lower (1.02 and 0.34 respectively).
    The Uralic language unity yields labial frequency indices in the range
    of 7.71% - 13.72%; the difference between the neighboring languages is
    0.30. For the group combining the Mongolian and Tungus-Manchu
    languages the indices range from 7.52% to 12.46%, with a mean difference
    between the neighboring values of 0.70. Consequently, we may infer
    considerable differences in the sound chains of the Mongolian and the
    Tungus-Manchurian languages.
    By contrast, adding Mansi to the Turkic languages increases the diffusion
    index of the group. Mansi belongs to the Finno-Ugric language family and
    was not considerably influenced by the Turkic or Mongolian languages.
    Consequently, Mansi, unlike Mongolian, does not belong to the Turkic
    language group.
    Analysis of the frequency of front (i.e. forelingual) consonants may
    serve as another example of the compactness of the Turkic and Mongolian
    languages. Front-lingual consonants are the most frequent sounds in
    the Turkic languages, as in many other languages of the world. The
    frequency of front-lingual sounds in the Turkic languages varies
    from 32.35% to 40.24%. The overall fluctuation index is 7.89; the
    difference between the neighboring languages (the mean difference) is
    0.564. In Mongolian, front-lingual sounds account for 36.57% of the
    total number of sounds. The mean difference for the combined group of
    Turkic languages and Mongolian is lower (0.526). The corresponding
    figures for the Uralic languages are: frequency range 24.79% - 36.78%;
    fluctuation index 11.99; mean difference 0.6. Apparently, the Turkic
    language group is more compact than the Uralic.
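
    The effect of adding one language to a group can be sketched in a few
    lines of Python. This is only an illustration under two assumptions that
    the text does not state: the mean difference is the fluctuation divided
    by the number of languages, and the Turkic group consists of the 14
    languages listed earlier for the labial consonants. The function names
    are hypothetical.

```python
def mean_difference(lowest, highest, n_languages):
    """Fluctuation (highest minus lowest) divided by the number of languages."""
    return (highest - lowest) / n_languages


def add_language(lowest, highest, n_languages, value):
    """Updated (lowest, highest, n) after one more language joins the group."""
    return min(lowest, value), max(highest, value), n_languages + 1


# Front-lingual consonant frequencies (%) quoted in the text.
TURKIC_RANGE = (32.35, 40.24)
N_TURKIC = 14          # assumption: the 14 Turkic languages listed earlier
MONGOLIAN = 36.57

before = mean_difference(*TURKIC_RANGE, N_TURKIC)
after = mean_difference(*add_language(*TURKIC_RANGE, N_TURKIC, MONGOLIAN))
print(round(before, 3), round(after, 3))
```

    Under these assumptions the sketch reproduces the 0.564 and 0.526 quoted
    above. A language whose frequency lies inside the existing range leaves
    the fluctuation unchanged and lowers the mean difference; a language
    outside the range widens the fluctuation instead.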
    The Mongolian and Tungus-Manchu language families have yielded
    similar indices in the range of 17.31% to 36.57%; the fluctuation index is
    19.26; the mean difference is 2.75. The Paleo-Asian group of languages
    represents a still less compact group, with frequency rates varying from
    20.02% to 36.74%; the fluctuation index is 16.64; the mean difference is
    2.38.
    The author provides frequency indices for many languages and language
    groups. In order to show the general tendency in the distribution of speech
    sounds, he proposes to use a general coefficient of variation obtained
    by adding the indices generated for each group of phonemes. He also uses
    the T coefficient, which is derived from the chi-square index, as
    a reference index. The resulting general coefficients of variation (V) allow
    him to form the following sequence. The Ugric language group
    demonstrates the highest diffusion (V = 221.27%, T = 3.77). The
    Baltic-Finnish languages yield V = 185.90%, T = 2.79. The group of Volga
    languages is the most compact, with V = 143.19%, T = 1.02.
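
    A minimal Python sketch of how a summed coefficient of variation could
    be computed. It assumes that V is the sum, over phoneme groups, of the
    sample standard deviation divided by the mean, in percent. The review
    does not give the exact formula, so the function names and the choice of
    the sample standard deviation are assumptions. The T coefficient is left
    out because its definition is not given here. Only the labial figures
    quoted in this review are used as input.

```python
from statistics import mean, stdev


def coefficient_of_variation(frequencies):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(frequencies) / mean(frequencies) * 100


def total_variation(groups):
    """Sum the coefficients of variation over all phoneme groups."""
    return sum(coefficient_of_variation(freqs) for freqs in groups.values())


# Labial consonant frequencies (%) quoted in the review.
mongolic_labials = [7.52, 7.67, 6.65]
turkic_labials = [12.80, 10.41, 9.83, 9.66, 9.42, 9.22, 9.04,
                  8.50, 8.43, 8.03, 7.99, 7.82, 6.10, 5.98]

# With a single phoneme group the total equals that group's coefficient.
v_mongolic = total_variation({"labial": mongolic_labials})
v_turkic = total_variation({"labial": turkic_labials})
print(f"Mongolic: {v_mongolic:.2f}%  Turkic: {v_turkic:.2f}%")
```

    A lower value means a more compact group. On the labial figures alone
    the Mongolic group comes out more compact than the Turkic group; the
    published V values sum several phoneme groups and cannot be reproduced
    from this single feature.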
    Another interesting method of comparative analysis involves introducing
    isolated Asian languages into various language families in order to
    establish possible relationships. Thus, introducing the Ket language
    into the Finno-Ugric family (V = 193.13%, T = 3.77) results in
    higher diffusion (V = 198.04%, T = 3.94). The same procedure with
    Yukaghir yields V = 199.17%; with Korean, V = 199.24%, T = 3.88; with
    Japanese, V = 200.51%, T = 3.91; with Nivkhi, V = 206.48%. By
    contrast, Chinese has shown closer similarity with the Finno-Ugric
    languages: V = 190.01%, T = 3.65.
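
    The comparison above amounts to ranking each added language by how far
    it shifts V from the family's baseline. A short Python sketch using only
    the V values quoted in this paragraph; the function name is illustrative
    and the labels "more compact" and "more diffuse" simply restate the sign
    of the shift.

```python
BASELINE_V = 193.13  # Finno-Ugric family before any language is added (%)

# V (%) after adding each language, as quoted in the text.
V_AFTER_ADDING = {
    "Ket": 198.04, "Yukaghir": 199.17, "Korean": 199.24,
    "Japanese": 200.51, "Nivkhi": 206.48, "Chinese": 190.01,
}


def rank_by_shift(baseline, v_after):
    """Sort languages by the change in V they cause, smallest first."""
    shifts = {lang: round(v - baseline, 2) for lang, v in v_after.items()}
    return sorted(shifts.items(), key=lambda item: item[1])


ranking = rank_by_shift(BASELINE_V, V_AFTER_ADDING)
for language, shift in ranking:
    label = "more compact" if shift < 0 else "more diffuse"
    print(f"{language:10s} {shift:+6.2f}  {label}")
```

    Chinese is the only language in the list that lowers V; Nivkhi raises it
    the most.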
    As a result of his investigations, Tambovtsev has come to the following
    conclusions:
    1) Front (forelingual) and occlusive consonants are most evenly
    distributed within language families.
    2) Voiced consonants represent the most variable feature; some
    languages have no category called "voiced" consonants.
    3) The Mongolian language family is the most compact by the total
    sum of the values of the coefficient of variation based on seven major
    groups of phonemes (without voiced consonants) and the coefficient T.
    The sequence with respect to the total sum of the coefficient of variation
    has been established as follows: the Mongolic, the Samoyedic, the Turkic,
    the Tungus-Manchurian, and the Finno-Ugric language families. The
    Paleo-Asiatic language family has yielded the highest diffusion (i.e. the
    lowest compactness) indices and consequently can be regarded not as a
    language family but as a loose language unity or community.
    4) As a general tendency, a language sub-group is more compact than a
    group, and a group is more compact than a language family. The least
    compact, that is the loosest, is the language super-unity comprising all
    the languages of the world.
    5) Combining two language groups or two families into one
    unit results in higher diffusion than in the original taxa.
       All I can say is that the book by Yuri Tambovtsev is a solid and profound
    investigation into the comparative analysis of the languages of the world.
    The author provides many tables with indices and coefficients generated
    through various techniques for a great number of languages. Analysis of
    these data provides linguists with a method of linguistic investigation
    based on numerical procedures. The book contains a large list of
    references. It is recommended to students who are interested in
    phonology, linguistic statistics and the typology of world languages. I
    believe that at the moment many linguists are dealing with minor linguistic
    problems in one language. Linguistics lacks books that deal with
    the modern classification of world languages. Tambovtsev's book may
    provide new material for such language classifications.
         Being a linguist by education, I was naturally wary of discussing
    statistical methods without consulting specialists in mathematical
    statistics. I must thank Prof. Dr. Arkadiy Shemiakin, Prof. Dr. Vadim
    Efimov, Prof. Dr. Leonid Frumin and Prof. Dr. Valeriy Yudin for their
    consultations and generous advice.

    References
    Bolshev et al., 1983 - Bolshev, Login Nikolaevich and Nikolai Vasilyevich
    Smirnov. Tables of Mathematical Statistics. - Moskva: Nauka, 1983. - 416
    pages. (in Russian).
    Butler, 1985 - Butler, Christopher. Statistics in Linguistics. - Oxford:
    Basil Blackwell, 1985. - 214 pages.
    Fallik et al., 1983 - Fallik, Fred and Bruce Brown. Statistics for Behavioral
    Sciences. - Homewood, Illinois: The Dorsey Press, 1983. - 538 pages.
    Marcantonio, 2002 - Marcantonio, Angela. The Uralic Language Family:
    Facts, Myths and Statistics. - Oxford: Blackwell Publishers, 2002. - 335
    pages.
    Tambovtsev, 1994-a - Tambovtsev, Yuri. Dinamika funktsionirovanija
    fonem v zvukovyh tsepochkah jazykov razlichnogo stroja. [Dynamics of
    functioning of phonemes in the languages of different structure]. -
    Novosibirsk: Novosibirsk University Press, 1994-a. - 133 pages.
    Tambovtsev, 1994-b - Tambovtsev, Yuri. Tipologija uporjadochennosti
    zvukovyh tsepej v jazyke. [Typology of Orderliness of Sound Chains in
    Language]. - Novosibirsk: Novosibirsk University Press, 1994-b. - 199
    pages.
    Tambovtsev, 2001-a - Tambovtsev, Yuri. Kompendium osnovnyh
    statisticheskih harakteristik funktsionirovanija soglasnyh fonem v
    zvukovoj tsepochke anglijskogo, nemetskogo, frantsuzkogo i drugih
    indoevropejskih jazykov. [A compendium of the major statistical
    characteristics within the paradigm of consonant phonemes functioning in
    the sound chains of the English, German, French, and other Indo-European
    languages.] - Novosibirsk: Novosibirsk Classical Institute, 2001. - 129
    pages.
    Tambovtsev, 2001-c - Tambovtsev, Yuri. Nekotorye teoreticheskie
    polozhenia tipologii uporiadochennosti fonem v zvukovoi tzepochke
    yazyka i kompendium statisticheskikh kharakteristik osnovnykh grupp
    soglasnykh fonem. [Theoretical concepts of typology of the order of
    phonemes in language sound chains and a compendium of statistical
    characteristics of the main groups of consonant phonemes]. -
    Novosibirsk: Novosibirsk Classical Institute, 2001. - 130 pages.
    Tambovtsev, 2003 - Tambovtsev, Yuri. Lingvisticheskaja taksonomija:
    kompaktnost' jazykovyh podgrupp, grupp i semej. [Linguistic taxonomy:
    compactness of language subgroups, groups and families]. - In:
    Baltistika, Volume 37, # 1, (Vilnius), 2003, p. 131 - 161.
    Teshitelova, 1992 - Teshitelova, Marie. Quantitative Linguistics. -
    Amsterdam/Philadelphia: John Benjamins Publishing Company, 1992. -
    253 pages.
    Wray et al., 1998 - Wray, Alison; Trott, Kate and Aileen Bloomer with
    Shirley Reay and Chris Butler. Projects in Linguistics: A Practical Guide
    to Researching Language. - London and New York: Arnold, 1998. - 303
    pages.
    Zagorujko, 1999 - Zagorujko, Nikolaj Grigorjevich. Applied Methods of
    Data and Knowledge Analysis [in Russian]. - Novosibirsk: Institute of
    Mathematics of the Siberian Branch of the Russian Academy, 1999. - 268
    pages.
       Yuri Tambovtsev, Dept of Linguistics and English of NPU, P. O. Box 104,
    Novosibirsk-123, 630123. Russia.
    e-mail: yutamb@hotmail.com



    This archive was generated by hypermail 2b30 : Tue Jun 15 2004 - 04:04:33 EDT