3.1305 Summary of Rutgers-Princeton Conference (290)

Elaine Brennan & Allen Renear (EDITORS@BROWNVM.BITNET)
Thu, 19 Apr 90 18:44:55 EDT

Humanist Discussion Group, Vol. 3, No. 1305. Thursday, 19 Apr 1990.

Date: Wed, 18 Apr 90 23:44:41 -0400
From: Robert Hollander <bobh@phoenix.Princeton.EDU>
Subject: Chauncey Conference Summary

SUMMARY
Conference on A National Center
for
Machine-Readable Texts in the Humanities
Chauncey Center, Princeton, N.J.
March 14-16, 1990



On March 14-16, 1990 approximately 50 academicians, librarians,
publishers and computing professionals gathered at the Chauncey Center in
Princeton, N.J. to attend an invitational conference on a National Center
for Machine-Readable Texts in the Humanities. The conference was
organized by Marianne Gaunt, Rutgers University, and Robert Hollander,
Princeton University, as part of a one-year planning grant funded by the
National Endowment for the Humanities and awarded to Rutgers University in
July 1989. This grant proposed that Rutgers University and Princeton
University collaborate on the development of a model for such a
national center. The purpose of the conference was to provide experts in
the field a forum for discussion which would lend guidance to the
continued planning efforts of the project staff. A working draft document
was distributed in advance to conference participants and served as an
outline for discussion at the conference, which included some 45
participants.


Participants gathered the evening of March 14 for a brief overview of
the conference agenda and a review of planning that has taken place thus
far. Joanne Euster, Vice President for Information Services and
University Librarian, welcomed the participants on behalf of Rutgers
University. Provost Paul Benacerraf of Princeton University made
welcoming remarks on Thursday morning.

The concept of a National Center for Machine-Readable Texts in the
Humanities has been discussed in many forums for some time. However, it
wasn't until three years ago that Rutgers and Princeton decided that
similar interests in humanities computing and experience gained through
national humanities computing projects might provide the impetus for a
combined effort to bring the concept to the fore again. With expressed
interest from the state in the form of a grant from the New Jersey
Committee for the Humanities, the two major research institutions in the
state proposed a planning grant to NEH. Funding from NEH and from the
Mellon Foundation paid for the conference and for continuing cataloging
activities.

During these last six months, project staff have conducted site visits
to the Inter-University Consortium for Political and Social Research, the
Oxford Text Archive, and the University of Pisa, to obtain first-hand
information on text production and archiving, and to establish
communication and cooperative links in the formation of a new center.
Other site visits need to be completed. In addition, project staff have
followed discussions on electronic bulletin boards and in professional
journals, and contacted individuals and institutions with major projects, to
gather information which could be used in planning the center. Of
concern to the project staff is the development of a mechanism by which
the Center, once established, would be self-sustaining after an initial
start-up period. Overall goals include the preparation of another
document which presents the final concept, with associated costs, which
could go to funding agencies in early Fall 1990. The working document
distributed describes the parameters of the proposed center based on data
gathered thus far. It was arranged by broad topic/function and the
agenda for the conference followed that outline. The purpose of the
discussions for the two days was to critique the draft and provide
guidance as we continue to define the Center's mission and operations.

Although separate break-out rooms were provided, it was decided to begin
as a discussion of the whole until such time as separate groups would be
needed. We did not, however, break into smaller groups. Over the
course of the two days of discussion a wide divergence of opinion was
expressed on many of the issues. Several discussion topics were
inextricably linked to other topics so that from time to time discussion
moved in many directions. Candid and thoughtful opinions were expressed
on all topics, for which the project staff are extremely grateful. While
at no time did project staff hear any indication that a National Center
should not exist, opinions split on the broad range of activities the
Center could undertake.

The broadest role for the Center was described on the last day of the
conference. A need was expressed for the coordination of text-based
humanities computing activities in North America (including Canada and
Mexico) so that a Center could act as the North American contact and
facilitator for international humanities computing relations, especially
as they relate to the exchange of data, telecommunications links,
standards, resources, etc. The Center could fill an educational and
clearinghouse role by acting as a resource for information on projects,
data production, software, scanning, expertise, training, advice and
consultation. This was not envisioned to take the place of the local
institution in the training and educational role. The Center would fill
its networking mission by providing an inventory and catalog of existing
machine-readable texts and making this available in the most
cost-effective and accessible manner. It would also collect texts and provide
access to them. It would promote encoding standards and be an advocate
for standardization with individuals and societies. While these functions
are laudable and undoubtedly useful, project staff need to consider the
feasibility of achieving success in filling these roles, as well as the
costs involved in doing so.

Inventory and Cataloging Function:

All participants agreed that it is essential to maintain an inventory of
existing machine-readable texts and to make the catalog records
available in an easily accessible manner. Questions related to this
topic included: does LC MARC (Library of Congress Machine-Readable
Cataloging Record) provide sufficient information for a practitioner to
identify the text he/she needs; is there a place for a "quality"
statement in the record; can the record be searched and retrieved by a
variety of useful elements; can the record include information on
retrieval software; are encoding features included in the record; can
new editions of the same text be recorded; how do you get inventory
information; how many texts exist; cataloging is expensive, will you
catalog everything; what about individuals not on RLIN, how will they
get inventory information?

Gaunt reviewed the inventory and cataloging process and addressed the
questions raised. The LC MARC record has a sufficient number of fields
to include all information that is of potential use to a researcher.
However, designated fields do not exist for some kinds of information,
such as encoding details, even though that information will be recorded
in the record. This means that the searching/retrieval software may not
be able to locate an item by certain search terms. Catalog records by
their nature do not include a "quality" statement. That must be
ascertained by the individual using the item. However, it may be
apparent in the record in notes which indicate what the compiler has or
has not done to the text. As for editions, we can add new editions as
compilers report them. A new edition is a substantial change from the
original edition. While cataloging is expensive, the project has not
been overwhelmed by inventory information. Records which have the most
complete information will be cataloged first. Others with incomplete
information will be cataloged in brief as project staff contact the
compiler for more information. Our intention from the beginning was to
inventory all that we could, and let the user make the judgment of what
was useful, and to use existing standards rather than create our own
formats.
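
To illustrate the point about designated fields, here is a minimal sketch
in Python. It is not the RLIN software or a real MARC record: the record
layout, the key names ("title", "author", "notes") and both helper
functions are assumptions made up for this example. It shows only why
information kept in a free-text note can be present in the record yet
missed by a search on a designated field.

```python
# Hypothetical, simplified catalog records. The keys are illustrative
# only and are not actual MARC tags.
records = [
    {
        "title": "Text A",
        "author": "Compiler One",
        "notes": ["Encoded with SGML markup; proofread against the printed edition."],
    },
    {
        "title": "Text B",
        "author": "Compiler Two",
        "notes": ["Plain ASCII; not proofread."],
    },
]


def fielded_search(records, field, term):
    """Return titles of records whose designated field contains the term."""
    term = term.lower()
    return [r["title"] for r in records if term in str(r.get(field, "")).lower()]


def note_search(records, term):
    """Return titles of records with the term in any free-text note."""
    term = term.lower()
    return [
        r["title"]
        for r in records
        if any(term in note.lower() for note in r["notes"])
    ]
```

In this sketch a search on a hypothetical "encoding" field finds nothing,
because no such field exists, while a scan of the notes finds Text A. The
same notes are where a user would look for what the compiler has or has
not done to the text.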

Gathering inventory information is a continual problem. Individuals
with fewer than ten texts will usually respond to a survey instrument,
provided the individual can be identified. Centers and projects require
special attention. The Mellon Foundation has provided a grant to catalog
the Oxford Text Archive's holdings. Rutgers will be creating records on
the RLIN database. This is one method for dealing with a large repository.
Mike Neuman, Georgetown University, has compiled a list of
projects/centers which he has shared with project staff. This is being
used to conduct follow-up surveys to ascertain details on the individual
text level. Antonio Zampolli, University of Pisa, is working on a major
survey to be sent to 4000-5000 individuals involved in humanities
computing. The results will be shared with project staff for inventory
information. He also thinks 80% of European texts are in 5-6 large
centers. Gaunt thinks that there is more in European centers/projects than
in the US; also, they are easier to identify. The inventory process will
also work with professional societies, as many of these are polling their
members for just such information now.

Dissemination of the inventory is initially on RLIN (the Research
Libraries Information Network). RLIN is available on the Internet,
through individual accounts, and through RLG libraries. Tapes are also
given to OCLC, another major online network. The database will be
searched by project staff for any interested individuals. Disks can
also be made available. Dial-in access to the Rutgers catalog will be
available soon. This is another option. While not everyone is on
Bitnet, it could be used as another point of access.


Data Collection/Archiving:

The original proposal for the center included an archival and
dissemination function for texts which would otherwise become unavailable.
The center would also accept texts for deposit. It would work with
publishers to determine if their tapes could be archived and disseminated.
Issues raised in this discussion included: what constitutes a humanities
text; is there a measure of quality in accepting a text; who determines
quality; is there a need to collect texts when online access may be
available?

It was suggested that our definitions of a humanities text remain broad
and that as the center gains experience with usage the definitions may
emerge. If we are polling the humanities computing community, one could
assume that we have placed some parameters on the materials which will
be used by the same. Accept what is deposited and formulate more
specific guidelines as experience is gained. The center should not be
judgmental of texts, but report impartially on the materials it archives.
It was suggested that the funding agencies require that all projects
resulting in a machine-readable text deposit it with the center. While
online access may be available for many texts, it is not available for
all. There is certainly no need to archive texts which are available
from the compiler directly or online. There may be ways to work with
publishers for the deposit of texts with the center and their
availability for individual research.


Compilation of Texts/Copyright Issues:

These two issues relate to each other and can be treated independently
as well. There was considerable debate regarding the need for the Center
to compile texts through scanning as well as copyright issues concerning
the dissemination of texts in the archive. Issues raised include: will
the center undertake the verification of texts; can the center employ
experts in all areas for text production; can the Center scan a work
under copyright; is there a need to scan at the Center when OCR
equipment is readily available?

While copyright issues are clearly a problem, Kahin pointed out that
there are certain benefits under the copyright laws to the Center acting
as a library. This does not mitigate all problems but provides a more
liberal environment for scholarly use of materials. OCR equipment is
more readily available and the Center may not wish to undertake scanning
by contract, but the center staff should be knowledgeable about the
technical details and also be able to refer users to other appropriate
sources where the specific expertise (language, discipline, scripts,
etc.) is available.


Organizational Issues/Staffing/Budget:

Specific details of staffing and organization were not to be determined
at this stage in the planning process; however, general information
concerning the division of responsibilities between Princeton and
Rutgers was outlined in the draft. Gaunt explained that the staffing
and budget were weak sections of the draft because the planning had not
progressed to that level. The Advisory Board and its functions were
delineated. Issues raised included: priorities for Princeton and Rutgers
staff; costs too low to support functions; staffing levels too low to
support activities; separate executive functions from policy functions.

Gaunt suggested that the staffing and budget should be determined
following agreement on the activities of the center. Consideration would
be given to more realistic figures, since these appeared to be low by all
estimates. Governing Board and Advisory Board functions would be
separated. Members of the publishing industry would be included,
especially from the Society for Scholarly Publishing and the Association
of American University Presses. Representatives from Rutgers and Princeton
would not be present
on the policy making board, nor would the Center's Director chair the
Board. It was clear that Rutgers and Princeton faculty/staff would not
receive preferential treatment if the Center wished to remain a North
American Center of activities. It is important that this not be seen as a
local project.


Other:

The remaining sections to be discussed following the outline included
technical issues related to computing resources and facilities. Gaunt
suggested that these issues not be discussed in the time remaining as
they relate to specifics of the final operation. Instead, the time
should be spent on reviewing the role of the center, discussion of
issues which require refinement and other items which have not been
covered. Topics for discussion included: what's the basis for the
inventory; how would the consortial approach work; what happens next?

It was generally agreed that the inventory base should be broad to
elicit as much information as possible. The areas of humanities text
information include: spoken texts, textual materials (individual texts,
collections, corpora), lexical data (machine-readable dictionaries,
lexical databases).

The consortial approach using the ICPSR model was determined to be a
viable option for a self-sustaining financial base. Gaunt explained
that a pay-as-you-go operation would hardly keep the inventory/cataloging
operation going. It is difficult to determine the use of the Center
without experience. Membership in the center which would guarantee
access to all materials provided by the center plus a role in the
direction of the Center's activities seemed a good approach. Many
libraries would consider this an opportunity to provide information
resources to their user community. Many, like Rutgers, now pay the
ICPSR membership on their campus. Most agreed that the Center needed a
product to sell in order to support the Center financially. Although
the ICPSR model is based on the social sciences, which are organized
differently than the humanities, it is not the only model. RLG and the
Center for Research Libraries are consortial models based on the
provision of goods and services for the group which are beyond the
finances of any individual institution. Support on campus by individual
faculty members was mentioned as leverage for institutional
support. Membership would be open to all institutions with a scaled
membership fee based on size of the institution.

Project staff will now meet to evaluate the comments and suggestions
received during this conference and prepare a revised proposal. We will
be doing this in the immediate future. Minutes of this conference were
distributed to all participants for review. Summary minutes will go on
the Listserver and to various publications. Many thanks to all for their
time and participation. Any additional comments may be sent to Gaunt or
Hollander at any time.