7.0033 Text Software Initiative (1/110)

Elaine Brennan (EDITORS@BROWNVM.BITNET)
Tue, 1 Jun 1993 19:39:36 EDT

Humanist Discussion Group, Vol. 7, No. 0033. Tuesday, 1 Jun 1993.

Date: Fri, 28 May 93 09:57:25 +0200
From: ide@grtc.cnrs-mrs.fr (Nancy Ide)
Subject: For publication: Text Software Initiative



The Text Software Initiative
----------------------------
An international effort to promote
the development and use of free text software


The widespread availability of large amounts of electronic text and
linguistic data in recent years has dramatically increased the need
for generally available, flexible text software. Commercial software
for text analysis and manipulation covers only a fraction of
research needs, and it is often expensive and hard to adapt or
extend to fit a particular research problem. Software developed by
individual researchers and labs is often experimental and hard to
get, hard to install, under-documented, and sometimes unreliable.
Above all, most of this software is incompatible.

As a result, it is not at all uncommon for researchers to develop
tailor-made systems that replicate much of the functionality of
other systems and in turn create programs that cannot be re-used by
others, and so on in an endless software waste cycle. The
reusability of data is a much-discussed topic these days; similarly,
we need "software reusability", to avoid the re-inventing of the
wheel characteristic of much language-analytic research in the past
three decades.

The Text Software Initiative (TSI) is committed to solving this
problem by working to

o establish and publish guidelines and standards for the
  development of text software;

o promote and coordinate the development of free
  TSI-conformant software.

The scope of the TSI covers all areas of analysis and manipulation
of all kinds of texts (written or spoken, monolingual or
multilingual parallel, etc.), including markup of physical and
logical text features, linguistic analysis and annotation, browsing
and retrieval, statistical analysis, and other text-related tasks in
research in computational linguistics, humanities computing,
terminology and lexicography, speech, etc.

The TSI software development effort is distributed: anyone can
contribute on a voluntary basis. This means that tools will be
developed according to the contributors' priorities; however, the
TSI is ultimately working towards the development of a comprehensive
text handling system.

To ensure software compatibility and reusability and enable
distributed development, the TSI will:

o design and publish program interface conventions
o determine and publish guidelines for programming style and
  documentation
o stress separation of code and linguistic data to ensure
  (natural) language independence
o emphasize breaking high-level text-handling tasks into
  more primitive, reusable functions
o provide a library of primitive text-handling tools
o maintain a task list and set priorities
o circulate information such as progress reports, revisions to
  the standard, availability of new software, etc.
o set up a mechanism for testing and evaluation
o maintain mailing lists for comments, bug reports,
  suggestions, etc.
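
As an illustration only, two of the principles above (separating code
from linguistic data, and breaking a high-level task into primitive,
reusable functions) might be sketched as follows. This is not TSI
software and follows no published TSI convention; every name in it
(ENGLISH, tokenize, remove_stopwords, count_frequencies,
word_frequencies) and the tiny stopword list are assumptions made up
for the example.

```python
import re
from collections import Counter

# Hypothetical language data, kept apart from the code. In practice it
# would be loaded from a file; the entries here are illustrative only.
ENGLISH = {
    "token_pattern": r"[a-z]+(?:'[a-z]+)?",
    "stopwords": {"the", "of", "and", "a"},
}


def tokenize(text, token_pattern):
    """Primitive: split text into lower-cased tokens using a supplied pattern."""
    return re.findall(token_pattern, text.lower())


def remove_stopwords(tokens, stopwords):
    """Primitive: drop tokens that appear in the supplied stopword set."""
    return [token for token in tokens if token not in stopwords]


def count_frequencies(tokens):
    """Primitive: count how often each token occurs."""
    return Counter(tokens)


def word_frequencies(text, language):
    """High-level task built only from the primitives above.

    Nothing language-specific is hard-coded here: the token pattern and
    the stopword list both come from the `language` data.
    """
    tokens = tokenize(text, language["token_pattern"])
    tokens = remove_stopwords(tokens, language["stopwords"])
    return count_frequencies(tokens)
```

Under these assumptions, supporting another language would mean
supplying a different data table, not changing the functions, and each
primitive could be reused in other tools (a concordancer, for
instance) without modification.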

The TSI works in cooperation with other standardization groups,
notably the Text Encoding Initiative and the Expert Advisory Group on
Language Engineering Standards (EAGLES).

All TSI software is free in the sense defined in the Free Software
Foundation's General Public License, which guarantees the freedom to
copy, redistribute, and modify software, and protects this freedom
by requiring those who pass on the software to pass on as well the
rights to redistribute it further and to see and change the code.

TSI software is distributed in cooperation with other dissemination
groups such as the Free Software Foundation, RELATOR, and the
Linguistic Data Consortium. The TSI does not provide technical
support, but it organizes a network of volunteer consultants and
support people.


PROJECT COORDINATORS

Nancy Ide, Vassar College, Poughkeepsie, New York, USA
ide@cs.vassar.edu

Jean Veronis, Universite de Provence/CNRS, Aix-en-Provence, France
veronis@grtc.cnrs-mrs.fr


GENERAL ADVISORY BOARD

Susan Armstrong, ISSCO, Geneva
Mark Liberman, Linguistic Data Consortium, University of Pennsylvania
Makoto Nagao, Kyoto University
Mark Olsen, ARTFL Project, University of Chicago
Richard Stallman, Free Software Foundation, Cambridge, Massachusetts
Donald Walker, Bellcore, Morristown, New Jersey
Antonio Zampolli, Istituto di Linguistica Computazionale, Pisa


The TSI also includes a TECHNICAL ADVISORY BOARD of software
developers.