6.0447 Workshop on Very Large Corpora (June 93) (1/65)

Elaine Brennan (EDITORS@BROWNVM.BITNET)
Thu, 21 Jan 1993 16:01:00 EST

Humanist Discussion Group, Vol. 6, No. 0447. Thursday, 21 Jan 1993.

Date: Mon, 18 Jan 93 09:40:27 +0100
From: ide@grtc.cnrs-mrs.fr (Nancy Ide)
Subject: WORKSHOP ON VERY LARGE CORPORA

From: yarowsky@unagi.cis.upenn.edu (David Yarowsky)



WORKSHOP ON VERY LARGE CORPORA:
ACADEMIC AND INDUSTRIAL PERSPECTIVES

Call for Papers

WHEN: Tuesday, June 22, 1993 (just before ACL-93)
WHERE: Ohio State University

Sponsored by the Association for Computational Linguistics (ACL),
Chemical Abstracts, Mead Data Central (MDC), and the Online Computer
Library Center (OCLC).

Corpus linguistics is a hot topic, and for good reason. Text is more
available than ever before, and consequently it is easier to use
corpus data effectively than it was in the 1950s, the last time
that empiricism was in fashion. All of this data provides a great
opportunity, as evidenced by the recent activity in Europe,
Asia and America.

How large is ``large''? Large can mean anything from about 10^4 words
to 10^9 words. This workshop will bring together people working at
many different points along this scale. We expect to
hear from industrialists who routinely deliver products based on tens
of billions of words of text, and from academics who will tell us
about recent advances in text analysis. We hope the discussion will
push the academics to think about even larger corpora, and the
industrialists to think about somewhat more ambitious analysis
techniques.

Authors should submit three copies of a full-length paper (5-10 pages)
to the program chair by April 1, 1993. Hard-copy submissions are strongly
preferred over electronic submissions. Notifications of acceptance or
rejection will be sent out by May 1, 1993. Relevant topics include
(but are not limited to):

Text Analysis Techniques:
- ``robust'' parsing
- part of speech tagging
- sense tagging
- identification of phrases
- collocation
- morphology
- discourse structure

Applications:
- Information Retrieval (IR)
- Recognition: Speech, OCR, handwriting, etc.
- Spelling Correction
- Translation
- Lexicography

Program Chair:
Kenneth Ward Church
AT&T Bell Laboratories, 2b422
600 Mountain Ave
Murray Hill, NJ 07974
USA
tel: 908-582-5325
fax: 908-582-7550
email: kwc@research.att.com