11.0251 LDC releases; New Chorus

Humanist Discussion Group (humanist@kcl.ac.uk)
Sat, 30 Aug 1997 08:42:15 +0100 (BST)

Humanist Discussion Group, Vol. 11, No. 251.
Centre for Computing in the Humanities, King's College London
<http://www.princeton.edu/~mccarty/humanist/>
<http://www.kcl.ac.uk/humanities/cch/humanist/>

[1] From: "David L. Gants" <dgants@parallel.park.uga.edu> (45)
Subject: New Corpus from the Linguistic Data Consortium

[2] From: "David L. Gants" <dgants@parallel.park.uga.edu> (53)
Subject: New Collection from the Linguistic Data Consortium

[3] From: "David L. Gants" <dgants@parallel.park.uga.edu> (23)
Subject: New Chorus

--[1]------------------------------------------------------------------
Date: Fri, 29 Aug 1997 18:54:33 -0400 (EDT)
From: "David L. Gants" <dgants@parallel.park.uga.edu>
Subject: New Corpus from the Linguistic Data Consortium

>> From: LDC Office <ldc@unagi.cis.upenn.edu>

Announcing a NEW RELEASE from the
LINGUISTIC DATA CONSORTIUM

Boston University Radio Speech Corpus

The Boston University Radio Speech Corpus was collected by Mari
Ostendorf of Boston University, primarily to support research in
text-to-speech synthesis, particularly generation of prosodic
patterns. The corpus consists of professionally read radio news data,
including speech and accompanying annotations, suitable for speech and
language research.

The corpus includes speech from seven (4 male, 3 female) FM radio news
announcers associated with WBUR, a public radio station. The main
radio news portion of the corpus consists of over seven hours of news
stories recorded in the WBUR radio studio during broadcasts over a two
year period. The announcers were also recorded in a
laboratory at Boston University. In this, the lab news portion, the
announcers read a total of 24 stories from the radio news portion.
The announcers were first asked to read the stories in their non-radio
style and then, 30 minutes later, to read the same stories in their
radio style.

Each story read by an announcer was digitized in paragraph-sized
units, which typically include several sentences. The files were
digitized at a 16 kHz sample rate using a 16-bit A/D converter. The paragraphs
were annotated with the orthographic transcription, phonetic
alignments, part-of-speech tags and prosodic markers. The
orthographic transcripts were generated by hand and indicate
where the speaker took a breath. The phonetic
alignments and part-of-speech tags were generated automatically and
hand corrected. The prosodic labels were marked by hand and are
available only for a subset of the corpus.
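
As a rough guide to what the figures above imply for storage, the
following sketch estimates the size of the raw audio. It assumes
single-channel, uncompressed PCM, which the announcement does not
state; the function name pcm_bytes is hypothetical and used only for
illustration.

```python
# Figures taken from the announcement.
SAMPLE_RATE_HZ = 16_000   # 16 kHz sample rate
BITS_PER_SAMPLE = 16      # 16-bit A/D

# Assumption (not stated in the announcement): mono, uncompressed PCM.
CHANNELS = 1


def pcm_bytes(seconds, sample_rate=SAMPLE_RATE_HZ,
              bits=BITS_PER_SAMPLE, channels=CHANNELS):
    """Return the size in bytes of raw PCM audio of the given duration."""
    bytes_per_sample = bits // 8
    return int(seconds * sample_rate * bytes_per_sample * channels)


# "Over seven hours" of radio news: use seven hours as a lower bound.
seven_hours_in_seconds = 7 * 60 * 60
size = pcm_bytes(seven_hours_in_seconds)
print(f"{size / 1e6:.1f} MB")  # prints "806.4 MB"
```

Under these assumptions, the main radio news portion alone would come
to at least about 800 MB of sample data. The size of the corpus as
actually distributed may differ, for example if the files are
compressed or carry headers and annotations.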

Institutions that have membership in the LDC for either the 1996 or
1997 Membership Year will be able to receive the BU Radio Corpus
at no additional charge, in the same manner as all other speech
corpora published by the LDC.

Nonmembers can receive a copy of this corpus, for research purposes
only, for a fee of US$400. If you would like to order a copy of this
corpus, please email your request to ldc@unagi.cis.upenn.edu. If you
need additional information before placing your order, or would like
to inquire about membership in the LDC, please send email or call
(215) 898-0464.

Further information about the LDC and its available corpora can be
accessed on the Linguistic Data Consortium WWW Home Page at URL
http://www.ldc.upenn.edu/. Information is also available via ftp at
ftp.cis.upenn.edu under pub/ldc; for ftp access, please use
"anonymous" as your login name, and give your email address when
asked for password.

--[2]------------------------------------------------------------------
Date: Fri, 29 Aug 1997 18:56:15 -0400 (EDT)
From: "David L. Gants" <dgants@parallel.park.uga.edu>
Subject: New Collection from the Linguistic Data Consortium

>> From: LDC Office <ldc@unagi.cis.upenn.edu>

Announcing a NEW RELEASE from the
LINGUISTIC DATA CONSORTIUM

CALLFRIEND Collection in 12 Languages
and 3 Dialect Comparisons

The CALLFRIEND project supports the development of language
identification technology. Calls were collected in the following
languages: American English, Canadian French, Egyptian Arabic, Farsi,
German, Hindi, Japanese, Korean, Mandarin, Spanish, Tamil, and
Vietnamese. Two major dialect groups were collected for English,
Mandarin, and Spanish. The dialect comparison groups include: southern
vs. non-southern American English, Caribbean Spanish vs. non-Caribbean
Spanish, and Mainland Mandarin (China) vs. Mandarin as spoken in
Taiwan.

Each CALLFRIEND language corpus consists of 60 unscripted telephone
conversations, lasting between 5 and 30 minutes. The corpora also
include documentation describing speaker information (sex, age,
education, callee telephone number) and call information (channel
quality, number of speakers).

For each conversation, both the caller and callee are native speakers
of the designated language. All calls are domestic and were placed
inside the continental United States, Canada, Puerto Rico, or the
Dominican Republic.

Institutions that have membership in the LDC for either the 1996 or
1997 Membership Year will be able to receive the CALLFRIEND materials
at no additional charge, in the same manner as all other speech
corpora published by the LDC.

Nonmembers can purchase CALLFRIEND materials for research purposes
only. The cost of the CALLFRIEND collection is $600 per language or
per dialect. If you would like to order any of these corpora, please
email your request to ldc@unagi.cis.upenn.edu. If you need additional
information before placing your order, or would like to inquire about
membership in the LDC, please send email or call (215) 898-0464.

Further information about the LDC and its available corpora can be
accessed on the Linguistic Data Consortium WWW Home Page at URL
http://www.ldc.upenn.edu/. Information is also available via ftp
at ftp.cis.upenn.edu under pub/ldc; for ftp access, please use
"anonymous" as your login name, and give your email address when asked
for password.

LDC96S46 CALLFRIEND American English-Non-Southern Dialect
LDC96S47 CALLFRIEND American English-Southern Dialect
LDC96S48 CALLFRIEND Canadian French
LDC96S49 CALLFRIEND Egyptian Arabic
LDC96S50 CALLFRIEND Farsi
LDC96S51 CALLFRIEND German
LDC96S52 CALLFRIEND Hindi
LDC96S53 CALLFRIEND Japanese
LDC96S54 CALLFRIEND Korean
LDC96S55 CALLFRIEND Mandarin Chinese-Mainland Dialect
LDC96S56 CALLFRIEND Mandarin Chinese-Taiwan Dialect
LDC96S57 CALLFRIEND Spanish-Caribbean Dialect
LDC96S58 CALLFRIEND Spanish-Non-Caribbean Dialect
LDC96S59 CALLFRIEND Tamil
LDC96S60 CALLFRIEND Vietnamese

--[3]------------------------------------------------------------------
Date: Fri, 29 Aug 1997 19:00:00 -0400 (EDT)
From: "David L. Gants" <dgants@parallel.park.uga.edu>
Subject: New Chorus

>> From: "Todd J. B. Blayone (media north)" <todd@cyberjunkie.com>

Announcement

A reconstituted Chorus has opened at its new home: the College Writing
Programs, University of California, Berkeley.

Chorus is a rich, WWW-based publication that explores new media in the
arts and humanities. Developed and maintained by an independent committee
of scholars and new-media professionals, it features essays and reviews
related to computer-assisted language learning, textual analysis of the
Bible, citation management and electronic research, and information
management. A new "Mixed Reviews" section will give special attention to
electronic publishing and the adaptation of literary and artistic culture
in electronic media. Finally, a writing and composition section is under
development.

Chorus is mirrored by Cycor, Canada and archived by the National Library
of Canada as part of their Electronic Publications Project. We invite your
feedback, especially during this beta-testing phase of development. Please
visit us at:

http://www-writing.berkeley.edu/chorus

Todd J. B. Blayone
Founding Editor, Chorus
http://www-writing.berkeley.edu/chorus
media north, web publishing
http://medianorth.simplenet.com

-------------------------------------------------------------------------
Humanist Discussion Group
Information at <http://www.kcl.ac.uk/humanities/cch/humanist/>
<http://www.princeton.edu/~mccarty/humanist/>
=========================================================================