8.0454 Penn Treebank (1/59)
Elaine Brennan (EDITORS@BROWNVM.BITNET)
Sun, 9 Apr 1995 23:26:41 EDT
Humanist Discussion Group, Vol. 8, No. 0454. Sunday, 9 Apr 1995.
Date: Fri, 07 Apr 1995 13:15:23 EDT
From: LDC Office <ldc@pine.ling.upenn.edu>
Subject: Penn Treebank, Release 2
Announcing a NEW RELEASE from the
LINGUISTIC DATA CONSORTIUM:
THE PENN TREEBANK PROJECT
Release 2
The Penn Treebank Project Release 2 CDROM features the new Penn
Treebank II bracketing style, which is designed to allow the
extraction of simple predicate/argument structure. Over one million
words of text are provided with this bracketing applied, along with
a complete style manual explaining the bracketing, and new versions
of tools for searching and treating bracketed data.
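As a rough illustration of what working with bracketed data involves, the following is a minimal Python sketch, not the tgrep tool or any software shipped on the CDROM. It reads one parenthesized tree and pulls out the words under a given label. The sample sentence is invented for illustration and simplified; the function names (parse_bracketed, find, leaves) are hypothetical, and the style manual remains the authority on the actual label set and file layout.

```python
# Illustrative sketch only: a tiny reader for parenthesized trees.
# SAMPLE is a made-up sentence, not text taken from the corpus.
SAMPLE = "(S (NP-SBJ (DT The) (NN board)) (VP (VBD approved) (NP (DT the) (NN plan))))"


def parse_bracketed(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        # Tolerate an unlabeled bracket by giving it an empty label.
        if pos < len(tokens) and tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        else:
            label = ""
        children = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())      # nested constituent
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets")
        pos += 1  # consume ')'
        return (label, children)

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing material after the tree")
    return tree


def leaves(tree):
    """Return the words of a tree, left to right."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


def find(tree, label):
    """Yield every subtree whose label equals `label`, in preorder."""
    if tree[0] == label:
        yield tree
    for child in tree[1]:
        if isinstance(child, tuple):
            yield from find(child, label)


tree = parse_bracketed(SAMPLE)
subjects = [" ".join(leaves(t)) for t in find(tree, "NP-SBJ")]
verbs = [" ".join(leaves(t)) for t in find(tree, "VBD")]
print(subjects, verbs)
```

Looking up the subject noun phrase and the verb in this way is the kind of simple predicate/argument extraction that the bracketing is meant to support; a real search tool would match on tree patterns and not only on single labels.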
This CDROM also contains all the annotated text material from the
earlier Treebank Preliminary Release, including the Brown Corpus.
While these materials have not all been converted to the newer
bracketing style, they have been cleaned up to remove problems that
had appeared in the earlier release.
The contents of Treebank Release 2 are as follows:
* 1 million words of 1989 Wall Street Journal material annotated in
Treebank II style.
* A small sample of ATIS-3 material annotated in Treebank II style.
* A 300-page style manual for Treebank II bracketing, as well as the
part-of-speech tagging guidelines.
* Tools for processing Treebank data, including a new version of
tgrep (a tree-searching and manipulation package).
* The contents of the previous Treebank CDROM (Version 0.5), with
cleaner versions of the WSJ, Brown Corpus, and ATIS material
(annotated in Treebank I style).
In addition, the Penn Treebank Project will be providing updates,
announcements and a discussion forum for users. A file of updates and
further information is available via anonymous ftp from
ftp.cis.upenn.edu, in pub/treebank/doc/update.cd2. This file will
also contain pointers to a gradually expanding body of relatively
technical suggestions on how to extract certain information from the
corpus.
Detailed questions about the corpus may be sent to
treebank@unagi.cis.upenn.edu, while questions and requests for
obtaining Treebank Release 2 should be sent to
ldc@unagi.cis.upenn.edu.