This is not the final version of the section in the Mellon report on the
deliberations of the policy committee, but it does reflect my revision of
the text. It includes the questions we have been considering, though I have
revised them a bit. I'd like to use the revised list of questions for our
next meeting, perhaps jumping to the end of the list to deal with
bibliographic description. I suggest jumping to this because I think we have
already covered it to some extent under identity issues, and I'd like to
finish it while that discussion is still fresh.
1. Policy committee report
The primary charge of the policy committee is to recommend policy for
libraries collecting digital works. The committee began its deliberations
with an explicit distinction between traditional and digital collecting,
based on the widely held assumption that technical differences between
analog and digital publications require changes in library methods and
policy in order to fulfill traditional library objectives and responsibilities.
We first looked at the activities and objectives associated with
traditional library collecting (discovery, selection, acquisition,
preservation, description, access, control, and deselecting) and asked if
any or all of them are relevant in digital collecting, or if there are new
activities that need to be added. The preliminary conclusion is that all of
the traditional activities and objectives remain relevant. Indeed, much of
the methodology for discovery and deselection is only slightly changed
(e.g., the discovery process should include Internet accessible resources).
Major differences are to be found, however, in the techniques employed to
administer or manage digital works, their computer representations
(standards such as XML, JPEG, TIFF, MPEG, and the like), and the file or
files that embody and fix the works. We have no firm conclusions as of yet,
but we tentatively believe that selection, acquisition, description,
access, and control need substantially different policies and methods.
1.1. Current issues
We are currently examining the more difficult administrative issues. These
include physical and bibliographic control, work identification and
integrity, authenticity, persistence, versioning, and copyright. The
remainder of this section will discuss these issues and the key questions
we are asking.
1.1.1. Control and collecting
Traditional collecting involves assuming specific obligations towards works
collected and the responsibility to reliably fulfill these obligations. The
core obligations are ongoing preservation and access to the works
collected. The Library will have essentially the same obligation and
responsibility for digital collections.
The first and essential requirement of digital collecting is asserting
physical control over the file or files comprising works. Without physical
control, the library cannot realistically and responsibly assume any
obligations, especially obligations to provide ongoing preservation and
access. Simply linking to resources controlled by others thus does not
constitute collecting, nor does it carry any of the obligations associated
with collecting.
Such physical control may be delegated to a trusted outside agent.
1.1.2. Work identification and integrity
How do we identify works? Identifying works in traditional media has long
been recognized as an intractable philosophical problem. The library
literature abounds with articles and books exploring the challenging
epistemological problems. The library profession, however, is ultimately
practical, and so the library community identified relatively stable
characteristics of traditional media on which they established criteria and
methods for identifying and controlling works.
First and foremost, traditional media come in discrete physical forms that
present clear boundaries: books, journals, issues of journals, CDs, and the
like. Taken together with the title page (or title pages in the case of
multipart monographs and serials), the physical boundaries of traditional
publications provide a reasonably recognizable unitary identity. Libraries
generally create a bibliographic surrogate (or record) for each discrete
work so identified. Discrete works are generally recognized at the "macro"
level. Articles, chapters, individual poems, illustrations, and the like,
contained in the works are not described in the surrogate. The rationale
behind this "macro" approach is largely economic, as detailed analysis of
the contents of works is generally beyond the means of the library
community. To compensate for this approach, libraries frequently subscribe
to abstracting and indexing services that provide detailed analysis of
works, and thus complement the bibliographic descriptions provided by
libraries. While librarians recognize that the approach taken is not
perfect, it has been proven to be a workable methodology that provides
acceptable access, description, and control.
Digital publications have so far proven less susceptible to a practical
approach similar to that taken with traditional publications. Digital
publications do not, with the exception of discretely packaged objects on
portable storage media (such as CD-ROMs), have explicit physical
characteristics that can help. A work can be made up of more than one file,
a file can contain more than one work, or multiple works can straddle
multiple files. A digital publication may not even have an easily
identified starting point.
Identifying the principal work a publication embodies may be insufficient.
A given digital publication may be a collection of works. As noted above,
catalogers have traditionally cataloged at the "book-level" or
"serial-level," without providing detailed analysis of content. The library
community established practical limits on the depth of analysis of
traditional media. Digital publications lack the clear boundaries and
commonplace internal characteristics of traditional media that made
establishing such limits practical. It may be difficult and perhaps
impossible to determine, for example, whether a given text is intended to
be an independent work or simply a dependent part of a larger
work. Digital publications have yet to establish (and may never establish)
predictable and widely adopted conventions that would serve as a foundation
on which to build consistent and affordable bibliographic description.
In the absence of such digital publication conventions, it will be
necessary for the library to impose obligations on the creators to supply
information to assist librarians in the identification of the contents of
digital publications. In particular, it will be necessary for creators to
explicitly identify and provide information about the "starting point" of
digital publications, and information identifying the principal work, and
if present, subworks represented in the publication.
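As an illustration only, the information a creator might be asked to supply
could be recorded in a small manifest. The sketch below is in Python. The
class names, field names, and file names are all hypothetical; they are not
drawn from any committee decision or existing standard. It simply shows that
a declared starting point, a principal work, and subworks can be captured
even when works and files stand in a many-to-many relation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Work:
    work_id: str                 # hypothetical local identifier
    title: str
    files: list[str]             # files that embody this work, wholly or in part
    parent: str | None = None    # id of the containing work, if this is a subwork


@dataclass
class PublicationManifest:
    starting_point: str          # file a reader should open first
    principal_work: str          # id of the principal work
    works: dict[str, Work] = field(default_factory=dict)

    def add(self, work: Work) -> None:
        self.works[work.work_id] = work

    def subworks_of(self, work_id: str) -> list[str]:
        return sorted(w.work_id for w in self.works.values() if w.parent == work_id)

    def works_in_file(self, filename: str) -> list[str]:
        # One file may carry several works, so this can return more than one id.
        return sorted(w.work_id for w in self.works.values() if filename in w.files)

    def problems(self) -> list[str]:
        # Report missing declarations a librarian would need before cataloging.
        issues = []
        if self.principal_work not in self.works:
            issues.append("principal work is not declared")
        declared_files = {f for w in self.works.values() for f in w.files}
        if self.starting_point not in declared_files:
            issues.append("starting point is not among the declared files")
        for w in self.works.values():
            if w.parent is not None and w.parent not in self.works:
                issues.append(f"{w.work_id}: unknown parent {w.parent}")
        return issues


# Hypothetical publication: an anthology whose poems straddle two files.
manifest = PublicationManifest(starting_point="index.xml", principal_work="anthology")
manifest.add(Work("anthology", "Collected Poems",
                  ["index.xml", "poems-1.xml", "poems-2.xml"]))
manifest.add(Work("poem-a", "First Poem", ["poems-1.xml"], parent="anthology"))
manifest.add(Work("poem-b", "Second Poem",
                  ["poems-1.xml", "poems-2.xml"], parent="anthology"))
```

In this example, asking which works appear in "poems-1.xml" returns the
anthology and both poems, the kind of overlap that physical boundaries ruled
out for traditional media.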
1.2. Other issues
1.2.1. Copyright
This is a large and complex issue that the committee intends to take up in
more detail soon. But some of the questions that we are starting to ask are
the following:
· Is all or part of the digital work explicitly protected by the creator's
copyright?
· Is all or part of the digital work explicitly protected by copyrights of
someone besides the creator?
· Does the creator have digital rights?
· Does the creator have exclusive rights?
· What are the library's responsibilities for tracking and enforcing copyright?
1.2.2. Authenticity
When a library collects a book or journal, it is generally most interested
in the information content, in the words on the pages. Given this interest,
the physical book itself is generally not considered significant, with
exceptions made for rare and special materials. A book, for example, is
still considered authentic if the library replaces a soft cover with a hard
cover. A digital
work's authenticity, however, may be inextricably intertwined with the
software that indexes and renders it, and the specific aesthetics of the
rendering. A carefully written stylesheet, for example, can give a very
specific look and feel to data and can demonstrate scholarly analysis and
criticism. Changing the specific look and feel may destroy the authenticity
of the creation. The work as a whole, both its content and the rendering of
those contents, may have intrinsic value. This leads to several difficult
questions:
· Does the work as a whole have intrinsic value? That is, is there value in
both the data (information content) and the data behaviors (look and feel)?
· When the content and rendering are both considered to have value, are there
circumstances under which it would be acceptable to collect only the
content? And others under which it would not?
· Or, can the data be collected independently of the data behaviors and
new, library-determined, behaviors be associated with it?
· Does the creator, publisher, or library have the right to make this
determination? Should it be negotiated between all three?
1.2.3. Persistence
Maintaining the persistence of the content and behavior of digital works,
and the persistence of references and links among works (both within and
outside the collection) presents many technical and policy challenges. The
committee is operating under the assumption that if the content of digital
works, and their associated behaviors and relations, are maintained in
standard forms, then preserving them over an extended time will be
manageable. References from works within the collection to works outside of
the collection (and therefore outside the control of the library), and
references to works within the collection from works outside of the
collection, present a particularly difficult challenge because of the lack
of control over one or the other end of a relation, and the lack of a global
solution to the problem of persistent identifiers and addresses. Among other
questions, the committee intends to address the following:
· Is the data and/or the data behaviors comprising the work based on open,
· Are all files comprising the work controlled and transferable by the creator?
· Are the links between resources comprising the work based on standards?
Are the links embedded in the data representation? Or are the links
· If links to a work-in-progress exist in other collected works, is the
library responsible for maintaining versions in order to maintain the
link's integrity? What about links to works that are not collected and
therefore not under the library's control?
· If a work is collected in successive editions and is linked by other
collected works, is the library responsible for persistent identification
of these works (or sub-works and sub-addresses) in other collected
projects? What about other non-collected projects that link to it?
· If successive editions of a work are collected, are the editions
interrelated? If successive editions of a work that has sub-works (i.e.,
editions of sub-works) are collected, are those sub-works interrelated?
1.2.4. Versioning
Works that are stable and relatively unchanging present many difficult
administrative challenges. But not all works are stable and relatively
unchanging. Many digital works are databases undergoing constant revision.
Large text and image collections are developed over many years, but are
sufficiently useful during development to be published. Many of the complex
challenges discussed above and below are made even more challenging when
works are undergoing constant or relatively frequent revision. Among the
questions to be addressed by the committee in this area are the following:
· Is the work complete and stable?
· Or is the work undergoing constant or frequent revision?
· If undergoing revision, do users need real-time access?
· If undergoing revision, will the production methods support controlled
publication of stable editions or versions?
1.2.5. Bibliographic, administrative, and structural control
Digital work control involves control of works (intellectual description,
access, and rights), of the files embodying works, and of the relations
between files that are essential for reproducing works on demand. These
areas of control have been identified as bibliographic, administrative, and
structural data. Each requires data over and above the content of the work
itself, so-called metadata.
· Is the library responsible for providing descriptive cataloging of all
· Should the creator provide descriptive data? If so, what descriptive
standards and practices will be required, and how will creators be trained
to supply this data?
· Who is responsible for creating, identifying, and maintaining
administrative data such as file inventories, creation and standards
information (context of creation), and copyright information?
· Who is responsible for creating structural data, such as the data needed
to reproduce a "book" from multiple page image files?
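To make the last question concrete, here is a minimal Python sketch of the
kind of structural data needed to reassemble a "book" from page image
files. The record layout, the function name, and the file names are
assumptions made for illustration, not a proposed standard. The point is
only that some party must record the order of the files explicitly, since
nothing in the files themselves guarantees it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageImage:
    filename: str     # hypothetical file name of one page image
    sequence: int     # physical order of the page within the book, from 1
    label: str = ""   # printed page label, e.g. "iv" or "12"; may be empty


def reading_order(pages):
    """Return file names in sequence order, rejecting gaps and duplicates."""
    ordered = sorted(pages, key=lambda p: p.sequence)
    expected = list(range(1, len(ordered) + 1))
    actual = [p.sequence for p in ordered]
    if actual != expected:
        raise ValueError(
            f"sequence numbers must run 1..{len(ordered)}, got {actual}"
        )
    return [p.filename for p in ordered]


# Hypothetical page images, listed out of order as they might arrive on disk.
pages = [
    PageImage("scan_0003.tif", 3, "1"),
    PageImage("scan_0001.tif", 1, "i"),
    PageImage("scan_0002.tif", 2, "ii"),
]
book = reading_order(pages)
```

A missing or duplicated sequence number raises an error instead of silently
producing a "book" with pages lost or repeated. That check is one reason
responsibility for creating the structural data has to be assigned
explicitly.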
Daniel V. Pitti Project Director
Institute for Advanced Technology in the Humanities
Alderman Library University of Virginia Charlottesville, Virginia 22903
Phone: 434 924-6594 Fax: 434 982-2363 Email: dpitti@Virginia.edu
AREA CODE IS NEW EFFECTIVE JUNE 2001