SDS Report

From: Daniel Pitti (dpitti@virginia.edu)
Date: Thu Jan 17 2002 - 11:31:31 EST

    All,

    This is not the final version of the section in the Mellon report on the
    deliberations of the policy committee, but does reflect my revision of
    text. It includes the questions we have been considering, but I have
    revised them a bit. I'd like to use the revised list of questions for our
    next meeting, perhaps jumping to the end of the list to deal with
    bibliographic description. I suggest jumping to this because I think we
    have already covered it to some extent under identity issues, and I'd like
    to finish it while that discussion is still fresh.

    Thanks,
    Daniel

    1. Policy committee report

    The primary charge of the policy committee is to recommend policy for
    libraries collecting digital works. The committee began its deliberations
    with an explicit distinction between traditional and digital collecting,
    based on the widely held assumption that technical differences between
    analog and digital publications require changes in library methods and
    policy in order to fulfill traditional library objectives and responsibilities.

    We first looked at the activities and objectives associated with
    traditional library collecting (discovery, selection, acquisition,
    preservation, description, access, control, and deselection) and asked if
    any or all of them are relevant in digital collecting, or if there are new
    activities that need to be added. The preliminary conclusion is that all of
    the traditional activities and objectives remain relevant. Indeed, much of
    the methodology for discovery and deselection is only slightly changed
    (e.g., the discovery process should include Internet accessible resources).

    Major differences are to be found, however, in the techniques employed to
    administer or manage digital works, their computer representations
    (standards such as XML, JPEG, TIFF, MPEG, and the like), and the file or
    files that embody and fix the works. We have no firm conclusions as of yet,
    but we tentatively believe that selection, acquisition, description,
    access, and control need substantially different policies and methods.

    1.1. Current issues

    We are currently examining the more difficult administrative issues. These
    include physical and bibliographic control, work identification and
    integrity, authenticity, persistence, versioning, and copyright. The
    remainder of this section will discuss these issues and the key questions
    we are asking.

    1.1.1. Control and collecting

    Traditional collecting involves assuming specific obligations towards works
    collected and the responsibility to reliably fulfill these obligations. The
    core obligations are ongoing preservation and access to the works
    collected. The Library will have essentially the same obligation and
    responsibility for digital collections.

    The first and essential requirement of digital collecting is asserting
    physical control over the file or files comprising works. Without physical
    control, the library cannot realistically and responsibly assume any
    obligations, especially obligations to provide ongoing preservation and
    access. Simply linking to resources controlled by others thus does not
    constitute collecting, nor does it entail any of the obligations associated
    with collecting.
    Such physical control may be delegated to a trusted outside agent.

    1.1.2. Work identification and integrity

    How do we identify works? Identifying works in traditional media has long
    been recognized as an intractable philosophical problem. The library
    literature abounds with articles and books exploring the challenging
    epistemological problems. The library profession, however, is ultimately
    practical, and so the library community identified relatively stable
    characteristics of traditional media on which they established criteria and
    methods for identifying and controlling works.

    First and foremost, traditional media come in discrete physical forms that
    present clear boundaries: books, journals, issues of journals, CDs, and the
    like. Taken together with the title page (or title pages in the case of
    multipart monographs and serials), the physical boundaries of traditional
    publications provide a reasonably recognizable unitary identity. Libraries
    generally create a bibliographic surrogate (or record) for each discrete
    work so identified. Discrete works are generally recognized at the "macro"
    level. Articles, chapters, individual poems, illustrations, and the like,
    contained in the works are not described in the surrogate. The rationale
    behind this "macro" approach is largely economic, as detailed analysis of
    the contents of works is generally beyond the means of the library
    community. To compensate for this approach, libraries frequently subscribe
    to abstracting and indexing services that provide detailed analysis of
    works, and thus complement the bibliographic descriptions provided by
    libraries. While librarians recognize that the approach taken is not
    perfect, it has been proven to be a workable methodology that provides
    acceptable access, description, and control.

    Digital publications have so far proven to be less susceptible to a
    practical approach similar to that taken with traditional publications.
    Digital publications do not, with the exception of discretely packaged
    objects on portable storage media (such as CD-ROMs), have explicit physical
    characteristics that can help. A work can be made up of more than one file,
    a file can contain more than one work, or multiple works can straddle
    multiple files. A digital publication may not even have an easily
    identified starting point.
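
    The many-to-many relation between works and files described above can be
    sketched as a small two-way index. This is only an illustration: the work
    identifiers, file names, and the index_embodiments helper are hypothetical
    and do not come from the committee's deliberations.

```python
from collections import defaultdict

# Hypothetical embodiment records: (work_id, file_name) pairs.
# Each pair says "this file carries at least part of this work".
EMBODIMENTS = [
    ("essay-a", "part1.xml"),      # one work spread over two files
    ("essay-a", "part2.xml"),
    ("poem-b", "anthology.xml"),   # two works sharing one file
    ("poem-c", "anthology.xml"),
]


def index_embodiments(pairs):
    """Build both directions of the work/file relation."""
    files_by_work = defaultdict(set)
    works_by_file = defaultdict(set)
    for work_id, file_name in pairs:
        files_by_work[work_id].add(file_name)
        works_by_file[file_name].add(work_id)
    return dict(files_by_work), dict(works_by_file)


files_by_work, works_by_file = index_embodiments(EMBODIMENTS)
```

    Neither direction of the lookup is guaranteed to return a single value,
    which is why file boundaries alone cannot stand in for work boundaries.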

    Identifying the principal work a publication embodies may be insufficient.
    A given digital publication may be a collection of works. As noted above,
    catalogers have traditionally cataloged at the "book-level" or
    "serial-level," without providing detailed analysis of content. The library
    community established practical limits on the depth of analysis of
    traditional media. Digital publications lack the clear boundaries and
    commonplace internal characteristics of traditional media that made
    establishing such limits practical. It may be difficult and perhaps
    impossible to determine, for example, whether a given text is intended to
    be an independent work or simply a dependent part of a larger
    work. Digital publications have yet to establish (and may never establish)
    predictable and widely adopted conventions that would serve as a foundation
    on which to establish consistent and affordable bibliographic description
    and control.

    In the absence of such digital publication conventions, it will be
    necessary for the library to impose obligations on the creators to supply
    information to assist librarians in the identification of the contents of
    digital publications. In particular, it will be necessary for creators to
    explicitly identify and provide information about the "starting point" of
    digital publications, and information identifying the principal work and,
    if present, the subworks represented in the publication.
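
    One way to picture such a creator-supplied statement is as a simple record
    that the library checks on receipt. The following is a minimal sketch under
    stated assumptions: the PublicationDeclaration record, its field names, and
    the missing_information check are hypothetical and are not a format the
    committee has adopted.

```python
from dataclasses import dataclass, field


@dataclass
class PublicationDeclaration:
    """Hypothetical creator-supplied statement about one digital publication."""
    starting_point: str                            # file a reader should open first
    principal_work: str                            # title of the main work
    files: list = field(default_factory=list)      # all files supplied
    subworks: list = field(default_factory=list)   # subwork titles; may be empty


def missing_information(decl):
    """Return a list of problems that would block identification."""
    problems = []
    if not decl.principal_work.strip():
        problems.append("principal work is not identified")
    if not decl.starting_point.strip():
        problems.append("starting point is not identified")
    elif decl.starting_point not in decl.files:
        problems.append("starting point is not among the supplied files")
    return problems


complete = PublicationDeclaration(
    starting_point="index.xml",
    principal_work="Example Archive",
    files=["index.xml", "letters.xml"],
    subworks=["Letters"],
)
incomplete = PublicationDeclaration(starting_point="", principal_work="")
```

    Subworks are deliberately optional in this sketch, since the text requires
    them only "if present".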

    1.2. Other issues

    1.2.1. Copyright

    This is a large and complex issue that the committee intends to take up in
    more detail soon. But some of the questions that we are starting to
    formulate include:

    · Is all or part of the digital work explicitly protected by the creator's
    copyright?
    · Is all or part of the digital work explicitly protected by copyrights of
    someone besides the creator?
    · Does the creator have digital rights?
    · Does the creator have exclusive rights?
    · What are the library's responsibilities for tracking and enforcing copyright?

    1.2.2. Authenticity

    When a library collects a book or journal, it is generally most interested
    in the information content, in the words on the pages. Given this interest,
    the book itself is generally not considered of interest, with exceptions
    made for rare and special materials. A book, for example, is still considered
    authentic if the library replaces a soft cover with a hard cover. A digital
    work's authenticity, however, may be inextricably intertwined with the
    software that indexes and renders it, and the specific aesthetics of the
    rendering. A carefully written stylesheet, for example, can give a very
    specific look and feel to data and can demonstrate scholarly analysis and
    criticism. Changing the specific look and feel may destroy the authenticity
    of the creation. The work as a whole, both its content and the rendering of
    those contents, may have intrinsic value. This leads to several difficult
    issues.

    · Does the work as a whole have intrinsic value? That is, is there value in
    both the data (information content) and the data behaviors (look and feel)?
    · When the content and rendering are both considered to have value, are there
    circumstances under which it would be acceptable to collect only the
    content? And others under which it would not?
    · Or, can the data be collected independently of the data behaviors and
    new, library-determined, behaviors be associated with it?
    · Does the creator, publisher, or library have the right to make this
    determination? Should it be negotiated between all three?

    1.2.3. Persistence

    Maintaining the persistence of the content and behavior of digital works,
    and the persistence of references and links among works (both within and
    outside the collection) presents many technical and policy challenges. The
    committee is operating under the assumption that if content, and associated
    behaviors and relations of digital works are maintained in standard forms,
    then preserving them over an extended time will be manageable. References
    from works within the collection to works outside of the collection (and
    therefore outside the control of the library), and references to works
    within the collection from works outside of the collection, present a
    particularly difficult challenge because of the lack of control of one or the
    other end of a relation, and the lack of a global solution to the problem
    of persistent identifiers and addresses. Among other questions, the
    committee intends to address the following:

    · Are the data and/or the data behaviors comprising the work based on open,
    public standards?
    · Are all files comprising the work controlled and transferable by the creator?
    · Are the links between resources comprising the work based on standards?
    Are the links embedded in the data representation? Or are the links
    maintained separately?
    · If links to a work-in-progress exist in other collected works, is the
    library responsible for maintaining versions in order to maintain the
    link's integrity? What about links to works that are not collected and
    therefore not under the library's control?
    · If a work is collected in successive editions and is linked to by other
    collected works, is the library responsible for persistent identification
    of these works (or sub-works and sub-addresses) in other collected
    projects? What about other non-collected projects that link to it?
    · If successive editions of a work are collected, are the editions
    interrelated? If successive editions of a work that has sub-works (i.e.,
    editions of sub-works) are collected, are those sub-works interrelated?
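
    The distinction drawn above between links whose ends the library does or
    does not control can be made concrete with a small classification. This is
    an illustrative sketch only: the classify_link function, its category
    names, and the sample identifiers are hypothetical.

```python
def classify_link(source_id, target_id, collected_ids):
    """Classify a link by which of its ends the library controls."""
    source_in = source_id in collected_ids
    target_in = target_id in collected_ids
    if source_in and target_in:
        return "internal"   # library controls both ends
    if source_in:
        return "outbound"   # target is outside library control
    if target_in:
        return "inbound"    # source is outside library control
    return "external"       # neither end is collected


# Hypothetical identifiers for works held in the collection.
COLLECTED = {"work-1", "work-2"}
```

    Only the "internal" case leaves both ends of the relation under library
    control; the "outbound" and "inbound" cases are the ones the questions
    above single out as difficult.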

    1.2.4. Editions/versions

    Works that are stable and relatively unchanging present many difficult
    administrative challenges. But not all works are stable and relatively
    unchanging. Many digital works are databases undergoing constant revision.
    Large text and image collections are developed over many years, but are
    sufficiently useful during development to be published. Many of the complex
    challenges discussed above and below are made even more challenging when
    works are undergoing constant or relatively frequent revision. Among the
    questions to be addressed by the committee in this area are the following:

    · Is the work complete and stable?
    · Or is the work undergoing constant or frequent revision?
    · If undergoing revision, do users need real-time access?
    · If undergoing revision, will the production methods support controlled
    publication of stable editions or versions?

    1.2.5. Bibliographic, administrative, and structural control

    Digital work control involves control of works (intellectual description,
    access, and rights), of the files embodying works, and of the relations
    between files that are essential for reproducing works on demand. These
    areas of control have been identified as bibliographic, administrative, and
    structural data. Each requires data over and above the content of the work
    itself, so-called metadata.

    · Is the library responsible for providing descriptive cataloging of all
    works collected?
    · Should the creator provide descriptive data? If so, what descriptive
    standards and practices will be required, and how will creators be trained
    to supply this data?
    · Who is responsible for creating, identifying, and maintaining
    administrative data such as file inventories, creation and standards
    information (context of creation), and copyright information?
    · Who is responsible for creating structural data, such as the data needed
    to reproduce a "book" from multiple page image files?
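
    As an illustration of the structural data mentioned in the last question,
    the sketch below records the reading order of page image files so that a
    "book" can be reassembled. The file names, field names, and the
    reading_order helper are hypothetical; an actual project would more likely
    use an established structural metadata format.

```python
# Hypothetical structural data: one entry per page image file.
# "order" is the position in the reading sequence; "label" is the
# page number as printed, which need not match the sequence.
STRUCTURE = [
    {"file": "img_0003.tif", "order": 2, "label": "1"},
    {"file": "img_0001.tif", "order": 1, "label": "i"},
    {"file": "img_0007.tif", "order": 3, "label": "2"},
]


def reading_order(structure):
    """Return the image files in reading order, rejecting ambiguous data."""
    orders = [entry["order"] for entry in structure]
    if len(set(orders)) != len(orders):
        raise ValueError("duplicate page order values")
    ordered = sorted(structure, key=lambda entry: entry["order"])
    return [entry["file"] for entry in ordered]


pages = reading_order(STRUCTURE)
```

    The file names alone do not reveal the sequence here, which is the reason
    structural data has to be recorded separately from the files themselves.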

    ----------
    Daniel V. Pitti Project Director
    Institute for Advanced Technology in the Humanities
    Alderman Library University of Virginia Charlottesville, Virginia 22903
    Phone: 434 924-6594 Fax: 434 982-2363 Email: dpitti@Virginia.edu
    http://jefferson.village.virginia.edu
    AREA CODE IS NEW EFFECTIVE JUNE 2001


