12.0279 announcements

Humanist Discussion Group (humanist@kcl.ac.uk)
Wed, 28 Oct 1998 19:44:32 +0000 (GMT)

Humanist Discussion Group, Vol. 12, No. 279.
Centre for Computing in the Humanities, King's College London
<http://www.princeton.edu/~mccarty/humanist/>
<http://www.kcl.ac.uk/humanities/cch/humanist/>

[1] From: Michel Bernard <Michel.Bernard@univ-paris3.fr> (41)
Subject: Seminar programme: "Claude Simon, numériquement"

[2] From: "David L. Gants" <dgants@english.uga.edu> (236)
Subject: ELRA News

[3] From: "David L. Gants" <dgants@english.uga.edu> (48)
Subject: Partnership / simulation / prototype

--[1]------------------------------------------------------------------
Date: Wed, 28 Oct 1998 12:11:46 +0100
From: Michel Bernard <Michel.Bernard@univ-paris3.fr>
Subject: Seminar programme: "Claude Simon, numériquement"

Please find below the programme of the doctoral seminar "Claude Simon,
numériquement" (Claude Simon, digitally). The sessions are open to all.

----------------------------------------------------------------------------
Université de la Sorbonne-Nouvelle (Paris III) - Academic year 1998-1999

DEA. Seminar led by Mr. BÉHAR
with the support of the Centre de recherche Hubert de Phalèse (JE 420)

Tuesdays, 6-8 p.m. (fortnightly), room 516 at Censier.

CLAUDE SIMON, DIGITALLY: READING BACKWARDS FROM LE JARDIN DES PLANTES TO
LE TRICHEUR.

Digitising texts encourages a new approach to literary works. Since the
Centre de recherches Hubert de Phalèse has digitised all of Claude Simon's
writings, we will study his latest novel, Le Jardin des plantes, relating
it to his work as a whole with the help of the various analysis tools
available. It goes without saying that these technical instruments are only
research aids. No knowledge of computing is required.

Two lines of inquiry:
* The place of the novel in Claude Simon's work
* The usefulness of computer tools for literary study (in this respect,
the first sessions will be devoted to questions of interest to the research
team as much as to the DEA students).

1. 3 November   Guidelines. Presentation of the programme. General remarks
on ELAO and on Claude Simon
2. 17 November   Dynamic reading of a literary text (session led by
J.-P. Goldenstein).
3. 1 December   The postulates of Hubert de Phalèse: Beckett à la lettre,
with Michel Corvin.
4. 15 December   Poetics and computing, by Alckmar Luis dos Santos
(Federal University of Santa Catarina, Brazil)
5. 12 January   Simon's vocabulary
6. 26 January   Theme(s)
7. 9 February   Intratextuality
8. 9 March   Intertextuality
9. 23 March   Referentiality
10. 6 April   Chronography
11. 4 May   Place in literary history
12. 18 May   Images and rhetoric (session led by Pascal Mougin)

--[2]------------------------------------------------------------------
Date: Wed, 28 Oct 1998 11:30:05 -0500 (EST)
From: "David L. Gants" <dgants@english.uga.edu>
Subject: ELRA News

>> From: Valérie Mapelli <mapelli@elda.fr>

___________________________________________________________
ELRA
European Language Resources Association
ELRA News
___________________________________________________________

*** ELRA NEW RESOURCES ***

We are happy to announce new speech resources available via ELRA:

1) ELRA-S0052 FIXED0IT - Italian Fixed Network Speech (SpeechDat(M)) Corpus - DB1
2) ELRA-S0053 FIXED0IT - Italian Fixed Network Speech (SpeechDat(M)) Corpus - DB2
3) ELRA-S0054 Chilean Spanish FDB-250
4) ELRA-S0055 Russian SpeechDat-like FDB-1000
5) ELRA-S0056 Slovenian SpeechDat(II) FDB-1000
6) ELRA-S0057 Shanghai Mandarin FDB-1000
7) ELRA-S0058 RVG1 (Regional Variants of German 1, Part 1)

Below is a description of each resource:

1) ELRA-S0052 FIXED0IT - Italian Fixed Network Speech (SpeechDat(M)) Corpus
DB1 Phonetically rich sentences & application oriented utterances

The Italian Fixed Network Speech Corpus version 1.0 was recorded within the
scope of the SpeechDat(M) project (LRE-63314), funded by the European
Commission. Recording was done by using a primary rate ISDN interface,
yielding an 8 kHz, 8 bits per sample, A-law coded signal. The data files are
formatted according to the SAM European project. The speech data are
compressed with the GNU gzip program. All software needed to use the corpus
is provided on the CDs.
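
For readers unfamiliar with the coding named above, here is a minimal Python
sketch of how one 8-bit A-law code expands to a 16-bit linear sample, following
the usual ITU-T G.711 expansion. It is illustrative only: the function names
are hypothetical, it is not the software supplied on the CDs, and it assumes
the input is a raw, already decompressed stream of A-law bytes with no header.

```python
def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit A-law code to a signed 16-bit linear sample (G.711)."""
    code ^= 0x55                      # undo the alternate-bit inversion used by G.711
    mantissa = (code & 0x0F) << 4     # 4-bit mantissa, shifted into position
    segment = (code & 0x70) >> 4      # 3-bit segment number (exponent)
    if segment == 0:
        magnitude = mantissa + 8
    else:
        magnitude = (mantissa + 0x108) << (segment - 1)
    # After the XOR, a set sign bit marks a positive sample.
    return magnitude if code & 0x80 else -magnitude


def decode_alaw(data: bytes) -> list[int]:
    """Decode a raw A-law byte stream (assumed headerless) to linear samples."""
    return [alaw_to_linear(byte) for byte in data]
```

At 8 kHz, each second of speech corresponds to 8,000 such bytes.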

The corpus contains the speech of about 1000 speakers (about 500 male and
500 female) and was designed to support the creation of voice-driven
teleservices.
The callers spoke at least 39 items, comprising:
· isolated and connected digits,
· natural numbers,
· money amounts,
· spelled words,
· time and date phrases,
· yes/no questions,
· city names,
· common application words,
· application words in phrases,
· phonetically rich sentences.
Most items are read, some are spontaneously spoken.

The recordings come with extensive and standardised documentation. All
speech is carefully transcribed at the orthographic level; in addition, a
number of clearly audible non-speech events are included in the
transcription. Moreover, age and regional background of the speakers are
provided. A pronunciation dictionary is added, containing all words that
occur in the corpus, with a corresponding SAMPA broad-class phonemic
transcription.

Validation and premastering of the CD-ROMs were performed by the Speech
Processing Expertise Centre (SPEX), Leidschendam, The Netherlands.

Price for ELRA members:
for research use: 11,000 ECU
for commercial use: 14,000 ECU

Price for non members:
for research use: 20,000 ECU
for commercial use: 20,000 ECU
____________________________________________

2) ELRA-S0053 FIXED0IT - Italian Fixed Network Speech (SpeechDat(M)) Corpus
DB2 Phonetically rich sentences sub-set

See ELRA-S0052 for description. DB2 is a sub-set of DB1; it contains only
the phonetically rich sentences items.

Price for ELRA members:
for research use: 8,800 ECU
for commercial use: 14,000 ECU

Price for non members:
for research use: 14,000 ECU
for commercial use: 20,000 ECU
____________________________________________

3) ELRA-S0054 Chilean Spanish FDB-250

This speech database gathers Spanish data as spoken in Chile. All
participants are native speakers. The corpus consists of read speech,
including digits and application words for teleservices, recorded through
an ISDN card. The whole database consists of 6.45 hours of speech, with 24
utterances per speaker. There is a total of 250 speakers (68 male, 80
female, 102 untagged). Except for the 102 untagged speakers, the age
classes are distributed as follows: 15 speakers are under 16 years old, 72
are aged 16 to 30, 44 are aged 31 to 45, and 14 are aged 46 to 60.

The callers spoke 74 different items in total:
· isolated digits,
· yes/no,
· common application words.

The data is provided with orthographic transliteration for all 6,000
utterances including 4 categories of non-speech acoustic events. A phonetic
lexicon with canonical transcription in SAMPA is also included.

The speech files are stored as sequences of 8-bit, 8 kHz A-law samples.
Data are stored in a SAM file format.

Price for ELRA members: 5,000 ECU
Price for non members: 7,500 ECU
____________________________________________

4) ELRA-S0055 Russian SpeechDat-like FDB-1000

This speech database gathers Russian data. The corpus consists of read and
spontaneous speech, recorded through an ISDN card, and was validated and
accepted according to the SpeechDat(II) database exchange format. The whole
database consists of 72 hours of speech, with approx. 49 prompted
utterances per speaker. A total of 1000 speakers was recorded (500 male,
500 female). These are native speakers from 5 regions, mainly from Moscow
and St. Petersburg (803 speakers). The speakers' age classes are
distributed as follows: 16 speakers are under 16 years old, 340 are aged 16
to 30, 345 are aged 31 to 45, 255 are aged 46 to 60, and 44 are over 60.

The callers spoke the following items:
· isolated and connected digits,
· natural numbers,
· money amounts,
· spelled words,
· time and date phrases,
· yes/no,
· city names,
· common application words,
· application words in phrases,
· phonetically rich sentences.

The data is provided with orthographic transliteration for all 48,812
utterances including 4 categories of non-speech acoustic events. A phonetic
lexicon with canonical pronunciation is also provided.

The speech files are stored as sequences of 8-bit, 8 kHz A-law samples. The
data is stored in a SAM file format (4 CD-ROMs).
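
As a rough cross-check of the figures quoted above, the following Python
sketch works out the raw size of the recordings and the mean utterance length.
The sampling rate, sample width, duration and utterance count come from the
announcement; the helper names are hypothetical, and the calculation assumes
uncompressed audio and ignores label files and file-system overhead.

```python
SAMPLE_RATE_HZ = 8_000     # 8 kHz, as stated in the announcement
BYTES_PER_SAMPLE = 1       # 8-bit A-law: one byte per sample


def raw_size_bytes(hours: float) -> int:
    """Raw audio size for the given duration, ignoring headers and labels."""
    return round(hours * 3600 * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def mean_utterance_seconds(hours: float, utterances: int) -> float:
    """Average duration of one utterance, in seconds."""
    return hours * 3600 / utterances


size = raw_size_bytes(72)
print(f"raw audio: {size:,} bytes ({size / 1e9:.2f} GB)")
print(f"mean utterance: {mean_utterance_seconds(72, 48_812):.2f} s")
```

About 2.07 GB of raw audio is consistent with a four-disc set if one assumes
roughly 650 MB per CD-ROM, a capacity the announcement does not itself state.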

Price for ELRA members: 14,000 ECU
Price for non members: 20,000 ECU
____________________________________________

5) ELRA-S0056 Slovenian SpeechDat(II) FDB-1000

The Slovenian SpeechDat(II) FDB-1000 consists of read and spontaneous
speech, recorded through an ISDN card, and was validated and accepted
according to the SpeechDat(II) database exchange format. The corpus
includes about 1000 speakers (about 500 male and 500 female) who called
over the Slovenian fixed network. All are native speakers of Slovenian from
all dialect regions of Slovenia.

The callers spoke the following items:
· isolated and connected digits,
· natural numbers,
· money amounts,
· spelled words,
· time and date phrases,
· yes/no,
· city names,
· common application words,
· application words in phrases,
· phonetically rich sentences.

The speech files are stored as sequences of 8-bit, 8 kHz A-law samples. The
data is stored in a SAM file format (CD-ROMs). A phonetic lexicon with
canonical transcriptions in SAMPA is also provided.

Price for ELRA members: 14,000 ECU
Price for non members: 20,000 ECU
____________________________________________

6) ELRA-S0057 Shanghai Mandarin FDB-1000

This acoustic database gathers Mandarin data, as spoken in Shanghai as a
first or second Chinese dialect/language. The corpus consists of read
speech, including digits and application words for teleservices, recorded
through an ISDN card. Each speaker was prompted for a total of 70
utterances. About 1000 speakers were recorded (500 male, 500 female).

The callers spoke the following items:
· isolated digits,
· yes/no,
· city names,
· common application words and phrases.

The data is provided with Chinese characters and English translation,
canonical Pinyin transcription including tone markers, and several
categories of non-speech events.

The speech files are stored as sequences of 8-bit, 8 kHz A-law samples.
Signal and annotation files are stored separately.

Price for ELRA members: 10,000 ECU
Price for non members: 15,000 ECU
____________________________________________

7) ELRA-S0058 RVG1 (Regional Variants of German 1, Part 1)

The corpus consists of single digits, connected digits, phone numbers,
phonetically balanced sentences, computer command phrases and spontaneous
speech. Each speaker has read a subcorpus of 85 items:
· 11 single digits (0-9, with the two pronunciations of 2: 'zwei' and
'zwo'),
· 19 connected digits (10-19, 20-100 in steps of ten),
· 12 computer command phrases,
· 30 phonetically balanced sentences,
· 5 six-digit phone numbers,
· 5 seven-digit phone numbers,
· 2 phone numbers with area code,
· 1 minute of spontaneous speech (monologue).

The speaker was placed in front of a standard IBM-compatible PC. The
background noise was limited to the usual noise of an office environment,
e.g. door slams, background crosstalk, phone ringing, paper rustling, PC
noise, etc. The head of the speaker is 2-4 feet from the screen and 1-2
feet from the desktop microphones. The speaker is not forced into a special
position. The speaker is wearing a Sennheiser HD 410 and is free to use the
keyboard or the mouse in front of him. The three desktop microphones are:
Sennheiser MD 441 U, Telex (Soundblaster) and Talk Back (AT&T). Speakers
were selected to reflect the demographic density of the German-speaking
areas in Europe (including Austria and Switzerland).

The recorded sound samples are stored in NIST SPHERE format. The resolution
is 16 bits. The sampling frequency is 22.05 kHz, except for speakers 001 to
036, who were recorded at 11.025 kHz. Each microphone channel is stored in
a separate file. A transliteration of the spontaneous speech in Verbmobil
format is also provided.

RVG1, Part 1 contains 197 speakers recorded through 2 microphones.
(RVG1, Part 2, with 303 speakers recorded through 2 microphones will be
available from the beginning of 1999.)

Price for ELRA members:
for research use: 4,949 ECU
for commercial use: 8,198 ECU

Price for non members:
for research use: 5,838 ECU
for commercial use: 9,898 ECU

=====================================
For further information, please contact :

ELRA/ELDA                    Tel : +33 01 43 13 33 33
55-57 rue Brillat-Savarin    Fax : +33 01 43 13 33 30
F-75013 Paris, France        E-mail : mapelli@elda.fr

or visit our Web site:

http://www.icp.grenet.fr/ELRA/home.html

--[3]------------------------------------------------------------------
Date: Wed, 28 Oct 1998 11:26:46 -0500 (EST)
From: "David L. Gants" <dgants@english.uga.edu>
Subject: Partnership / simulation / prototype

>> From: "dominique dutoit" <MEMODATA@wanadoo.fr>

Hello,

Before finalising its products, our company is looking for three companies
to carry out three qualification trials of the Semiographe.
The companies selected may be either:
- experienced NLP users with a team available
- companies active in the NLP sector.

The qualification trials consist of building working prototypes in the
following areas; the precise definition can be adapted to the partner's
needs:

- Polysemy: determining the sense of a word in context;
possible application: choosing a translation.
- Text summarisation / indexing / selective dissemination of information:
determining the thematically important keywords of a text;
possible application: disambiguated and enriched full-text indexing.
- Selective dissemination of information / access to nomenclatures;
possible application: routing, automatic indexing against a classification
scheme, access to nomenclatures.

The functional design of these three applications will be fairly similar
for us, so we can carry out all three within the same time frame.

The functions used by the partners will be the Semiographe's Java APIs for
these applications, together with the management tool for the Integral
Dictionary (also written in Java).

The Semiographe will be finalised in the course of November. A
five-language version exists, though it is less rich than the French one.
For French, the current version handles 185,000 word senses.

Where the partner is a potential end customer of the system, it will be
eligible for special terms of sale.

We send friendly greetings to those on this list with whom we have already
worked, and our best regards to the others.

Dominique Dutoit
Mail: dutoit@info.unicaen.fr

The INTEGRAL DICTIONARY(tm) DICOLOGIC(tm) The largest world model for linguistic computing and semantic analysis.
LEXIDIOM(tm)
SEMIOGRAPH(tm) Computational semantics for five languages (French, English, German, Italian, Spanish)
and... Phonetics (French, English), morphology (French, English, German, Italian, Spanish), syntactic parsing (French)....
Full-text indexing (Bibiotext(tm)), resources for search engines...

Address: MEMODATA - 17 rue Dumont d'Urville - 14000 CAEN (FRANCE)
Tel: (33)02.31.35.75.21 - fax: (33)02.31.35.75.28

-------------------------------------------------------------------------
Humanist Discussion Group
Information at <http://www.kcl.ac.uk/humanities/cch/humanist/>
<http://www.princeton.edu/~mccarty/humanist/>
=========================================================================