13.0464 ELRA News

From: Humanist Discussion Group (willard@lists.village.virginia.edu)
Date: Sat Mar 04 2000 - 20:39:01 CUT

                   Humanist Discussion Group, Vol. 13, No. 464.
           Centre for Computing in the Humanities, King's College London
                   <http://www.princeton.edu/~mccarty/humanist/>
                  <http://www.kcl.ac.uk/humanities/cch/humanist/>

             Date: Sat, 04 Mar 2000 20:35:07 +0000
             From: "David L. Gants" <dgants@english.uga.edu>
             Subject: ELRA News

    >> From: Valerie Mapelli <mapelli@elda.fr>

    ___________________________________________________________
                                    ELRA
                    European Language Resources Association
                                   ELRA News
    ___________________________________________________________

                         *** ELRA NEW RESOURCES ***

    We are happy to announce new resources available via ELRA:

    ELRA-W0020 ICE-GB (British English component of the
    International Corpus of English)
    ELRA-S0077 Telephone Speech Data Collection for Czech
    ELRA-S0078 Finnish SpeechDat(II) FDB-1000
    ELRA-S0079 Finnish SpeechDat(II) FDB-4000
    ELRA-S0080 Finnish-Swedish SpeechDat(II) FDB-1000

    A description of each database is given below.

    _______________________________________
    ELRA-W0020 ICE-GB (British English component of
    the International Corpus of English)
    _______________________________________

    ICE-GB is the British component of the International Corpus
    of English (ICE). ICE began in 1990 with the primary aim
    of providing material for comparative studies of varieties of
    English throughout the world. Twenty centres around the
    world are preparing corpora of their own national or regional
    variety of English.

    ICE-GB is fully grammatically analysed. Like all the ICE
    corpora, ICE-GB consists of a million words of spoken and
    written English and adheres to the common corpus design.
    200 written and 300 spoken texts make up the million words.
    Every text is grammatically annotated, allowing complex and
    detailed searches across the whole corpus.

    ICE-GB contains 83,394 parse trees, including 59,640 in
    the spoken part of the corpus.

    ICE-GB has been fully checked by linguists at several stages
    of its completion, using both a traditional 'post-checking'
    strategy and cross-sectional error-based searches.

    ICE-GB is distributed with the retrieval software ICECUP
    (the International Corpus of English Corpus Utility Program).
    ICECUP supports a variety of query types, including the use
    of the parse analyses to construct Fuzzy Tree Fragments to
    search the corpus.

    _______________________________________
    ELRA-S0077 Telephone Speech Data Collection for Czech
    _______________________________________

    This database contains speech collected in the Czech Republic
    during the summer of 1999. The collection was performed at the
    Institute of Radioelectronics of Brno University of
    Technology, Faculty of Electrical Engineering and Computer
    Sciences (VUT Brno), and at the Department of Circuit
    Theory of the Czech Technical University in Prague, Faculty of
    Electrical Engineering (CVUT Prague), at the request of
    Siemens AG, Corporate Technology, Munich. The database
    comprises telephone recordings from 1227 speakers (590
    males and 637 females) recorded directly over the fixed
    telephone network using an ISDN interface.

    Speech files are stored as sequences of 8-bit, 8 kHz A-law
    uncompressed speech samples. Each prompted utterance
    is stored within a separate file. Each speech file has an
    accompanying ASCII SAM label file according to the
    specifications of the SpeechDat project
    (URL: http://www.speechdat.com).
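
    As a rough illustration of what 8-bit, 8 kHz A-law storage means
    for anyone reading these files, the Python sketch below expands
    A-law bytes to 16-bit linear samples using the standard ITU-T
    G.711 expansion and derives a duration from the sample count. It
    assumes the speech files are headerless raw byte streams with one
    byte per sample; the function names are hypothetical and are not
    taken from the database documentation.

```python
from typing import List

SAMPLE_RATE_HZ = 8000  # sampling rate stated in the database description


def alaw_to_linear(byte: int) -> int:
    """Expand one G.711 A-law byte to a signed 16-bit linear sample."""
    a = byte ^ 0x55                # undo the alternate-bit inversion
    mantissa = (a & 0x0F) << 4     # 4-bit quantisation step
    segment = (a & 0x70) >> 4      # 3-bit segment (chord) number
    if segment == 0:
        value = mantissa + 8
    else:
        value = (mantissa + 0x108) << (segment - 1)
    # In A-law, a set sign bit marks a positive sample.
    return value if a & 0x80 else -value


def decode_alaw(data: bytes) -> List[int]:
    """Decode a raw A-law byte stream into linear samples."""
    return [alaw_to_linear(b) for b in data]


def duration_seconds(data: bytes) -> float:
    """Duration of a raw stream, assuming one byte per sample."""
    return len(data) / SAMPLE_RATE_HZ


if __name__ == "__main__":
    print(decode_alaw(bytes([0xD5, 0x55, 0xAA, 0x2A])))  # [8, -8, 32256, -32256]
    print(duration_seconds(bytes(8000)))                 # 1.0
```

    A lookup table of all 256 byte values would be faster for long
    recordings; the per-byte function is kept here for readability.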

    Corpus contents: connected digits (prompt sheet number,
    telephone number, credit card number); sequences of
    isolated digits (5 digits); answers to yes/no questions;
    common application words and phrases.

    The following age distribution has been obtained: 36
    speakers are below 16 years old, 537 speakers are between
    16 and 30, 306 speakers are between 31 and 45, 259
    speakers are between 46 and 60, 88 speakers are over 60,
    and the age of 1 speaker is unknown.
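
    The age groups can be cross-checked against the speaker totals
    given for this database. The short Python sketch below uses only
    the figures quoted in this description; the variable names are
    illustrative.

```python
# Figures quoted in the description of ELRA-S0077.
age_groups = {
    "under 16": 36,
    "16-30": 537,
    "31-45": 306,
    "46-60": 259,
    "over 60": 88,
    "unknown": 1,
}
by_gender = {"male": 590, "female": 637}

total_by_age = sum(age_groups.values())
total_by_gender = sum(by_gender.values())

# Both breakdowns should describe the same 1227 speakers.
assert total_by_age == total_by_gender == 1227

if __name__ == "__main__":
    for label, count in age_groups.items():
        share = 100.0 * count / total_by_age
        print(f"{label:>8}: {count:4d} speakers ({share:.1f}%)")
```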

    The transcription included in this database is an
    orthographic, lexical transcription with a few details that
    represent audible acoustic events (speech and non-speech)
    present in the corresponding waveform files. SpeechDat
    conventions were used in this database.

    _______________________________________
    ELRA-S0078 Finnish SpeechDat(II) FDB-1000
    ELRA-S0079 Finnish SpeechDat(II) FDB-4000
    _______________________________________

    The Finnish SpeechDat(II) FDB-1000 and FDB-4000
    databases comprise 1000 and 4000 Finnish speakers,
    respectively, recorded over the Finnish fixed telephone
    network. The databases were collected and annotated
    by the Tampere University of Technology's Digital Media
    Institute. The speech databases made within the
    SpeechDat(II) project were validated by SPEX, the
    Netherlands, to assess their compliance with the
    SpeechDat format and content specifications.

    Speech samples are stored as sequences of 8-bit 8 kHz
    A-law. Each prompted utterance is stored in a separate file.
    Each signal file is accompanied by an ASCII SAM label file
    which contains the relevant descriptive information.
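
    For readers who want to pair each signal file with its label
    file, the Python sketch below reads a label file under the
    assumption that it consists of plain ASCII lines of the form
    "MNEMONIC: value". The sample text and its field names are made
    up for illustration and should be checked against the SpeechDat
    specifications; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def parse_label_text(text: str) -> Dict[str, List[str]]:
    """Collect 'KEY: value' lines; a key may occur more than once."""
    fields = defaultdict(list)
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue  # skip lines that do not look like 'KEY: value'
        fields[key.strip()].append(value.strip())
    return dict(fields)


# Illustrative content only, not copied from any distributed file.
SAMPLE_LABEL = """\
LHD: SAM, 6.0
SAM: 8000
SNB: 1
QNT: A-LAW
LBO: 0,4000,8000,yes
"""

if __name__ == "__main__":
    for key, values in parse_label_text(SAMPLE_LABEL).items():
        print(key, values)
```

    Values are kept as lists of strings so that repeated keys are
    not silently overwritten.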

    Each speaker uttered the following items: 1 isolated digit; 1
    sequence of 10 isolated digits; 4 numbers: 1 sheet number
    (5 digits), 1 telephone number (9-10 digits), 1 credit card
    number (16 digits), 1 PIN code (6 digits); 1 currency money
    amount; 1 natural number; 3 dates: 1 spontaneous date
    (birthdate), 1 prompted date, 1 relative or general date
    expression; 2 time phrases: 1 time of day (spontaneous), 1
    time phrase; 3 spelled words: 1 spontaneous own forename,
    1 city name, 1 phonetically rich word; 5 directory assistance
    names: 1 spontaneous own forename, 1 spontaneous city of
    growing up, 1 frequent city name, 1 frequent company name,
    1 common forename surname; 2 yes/no questions: 1
    predominantly yes question, 1 predominantly no question;
    3 application words; 1 word spotting phrase using an
    embedded application word; 4 phonetically rich words; 9
    phonetically rich sentences.
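
    The item list above can be tallied to see how many utterances
    one speaker session nominally contains. The Python sketch below
    uses the counts exactly as listed; the nominal database totals
    assume that every speaker produced every item, which this
    description does not state.

```python
# Item counts per speaker, as listed for the Finnish FDB databases.
items_per_speaker = [
    ("isolated digit", 1),
    ("sequence of 10 isolated digits", 1),
    ("numbers", 4),
    ("currency money amount", 1),
    ("natural number", 1),
    ("dates", 3),
    ("time phrases", 2),
    ("spelled words", 3),
    ("directory assistance names", 5),
    ("yes/no questions", 2),
    ("application words", 3),
    ("word spotting phrase", 1),
    ("phonetically rich words", 4),
    ("phonetically rich sentences", 9),
]

per_speaker = sum(count for _, count in items_per_speaker)


def nominal_utterances(speakers: int) -> int:
    """Upper bound, assuming every speaker produced every item."""
    return speakers * per_speaker


if __name__ == "__main__":
    print(per_speaker)               # 40
    print(nominal_utterances(1000))  # 40000
    print(nominal_utterances(4000))  # 160000
```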

    A pronunciation lexicon with a phonemic transcription in
    SAMPA is also included.

    ______________________________________
    ELRA-S0080 Finnish-Swedish SpeechDat(II) FDB-1000
    ______________________________________

    The Finnish-Swedish SpeechDat(II) FDB-1000 comprises
    1000 Finnish speakers uttering SpeechDat items in the variant
    of Swedish spoken in Finland, recorded over the Finnish
    fixed telephone network. The database was collected and
    annotated by the Tampere University of Technology's Digital
    Media Institute. The FDB-1000 database is partitioned
    across 4 CDs: 3 CDs each comprise 300 speaker sessions,
    and the 4th comprises 100.
    The speech databases made within the SpeechDat(II)
    project were validated by SPEX, the Netherlands, to assess
    their compliance with the SpeechDat format and content
    specifications.

    Speech samples are stored as sequences of 8-bit 8 kHz
    A-law. Each prompted utterance is stored in a separate file.
    Each signal file is accompanied by an ASCII SAM label file
    which contains the relevant descriptive information.

    Each speaker uttered the following items: 1 isolated digit; 1
    sequence of 10 isolated digits; 4 numbers: 1 sheet number
    (5 digits), 1 telephone number (9-10 digits), 1 credit card
    number (16 digits), 1 PIN code (6 digits); 1 currency money
    amount; 1 natural number; 3 dates: 1 spontaneous date
    (birthdate), 1 prompted date, 1 relative or general date
    expression; 2 time phrases: 1 time of day (spontaneous), 1
    time phrase; 3 spelled words: 1 spontaneous own forename,
    1 city name, 1 phonetically rich word; 5 directory assistance
    names: 1 spontaneous own forename, 1 spontaneous city of
    growing up, 1 frequent city name, 1 frequent company name,
    1 common forename surname; 2 yes/no questions: 1
    predominantly yes question, 1 predominantly no question;
    6 application words; 1 word spotting phrase using an
    embedded application word; 4 phonetically rich words; 9
    phonetically rich sentences.

    The following age distribution has been obtained: 178
    speakers are below 16 years old, 412 speakers are between
    16 and 30, 216 speakers are between 31 and 45, 160
    speakers are between 46 and 60, and 34 speakers are over 60.

    A pronunciation lexicon with a phonemic transcription in
    SAMPA is also included.

    For further information, please contact:

           ELRA/ELDA                    Tel:    +33 01 43 13 33 33
           55-57 rue Brillat-Savarin    Fax:    +33 01 43 13 33 30
           F-75013 Paris, France        E-mail: mapelli@elda.fr

    or visit our Web site:

           http://www.icp.grenet.fr/ELRA/home.html
           or http://www.elda.fr


