June 11 Meeting--Reading

From: Daniel Pitti (dpitti@virginia.edu)
Date: Thu May 16 2002 - 14:55:49 EDT

    All,

    After reading a lot of the latest literature on digital repositories, I
    have made some progress in getting us moving once again on developing draft
    digital library policies.

    In preparation for our meeting in June, I would like all of you to read
    carefully Attributes of a Trusted Digital Repository (ATDR), an RLG-OCLC
    report: http://www.rlg.org/longterm/attributes01.pdf

    This joint RLG/OCLC report is inspired by the "Reference Model for an Open
    Archival Information System (OAIS)," [not to be confused with OAI] a
    framework for digital repositories developed by the space community (NASA
    and others). Though OAIS was developed by the space community, it has been
    well received by the archive, library, and museum communities, and with
    support from them, is nearing approval as an ISO standard. OAIS is very
    dense, but if you are feeling ambitious or suffering from insomnia, you will
    find the latest draft (July 2001) at
    http://www.ccsds.org/documents/pdf/CCSDS-650.0-R-2.pdf

    The ATDR uses OAIS terminology (which is quickly becoming the standard
    terminology for discussing digital repositories). It defines the OAIS
    terms, at least minimally, and so you can read it without reading OAIS.

    In addition to the readings, I have also drafted an outline organized
    around the statements of Repository Responsibilities in the ATDR report, in
    particular the "responsibility relies on" lists given under each major
    responsibility. I have tried to organize these around the three principal
    areas of responsibility and activity outlined in OAIS, which are
    submission, archiving, and dissemination. In front of these is simply a
    list of "policy areas" given in ATDR (I). The policy areas overlap with the
    submission, archiving, and dissemination categories (II-IV). I nevertheless
    included this section because I wanted to make sure that we did not
    overlook anything in the "policy areas" that might not come up under the
    lists.

    My intention is that we go over each responsibility and what the
    responsibility relies on, and then begin to develop draft policies for
    each. As you can see, I am trying to approach this systematically. I think
    the first thing we will notice is that while the categories in the ATDR
    report are useful, they are insufficiently detailed. And so developing them
    will be the first order of business.

    After the "Organization of Digital Collecting Policy" below is a list of
    working assumptions that seem to me to provide a context and some guidance
    in our deliberations. These are, of course, open for debate, and, in fact,
    should be debated. And added to as well.

    That's all for now,
    Daniel

    Organization of Digital Collecting Policy

    Trusted Digital Repository
    I. Policy

    Follows documented policies and procedures that ensure the information is
    preserved against all reasonable contingencies and enables the information
    to be disseminated as authenticated copies of the original or as traceable
    to the original.
    A. Policies for collections development (e.g., selection and
    retention) that link to technical procedures about how and at what level
    materials are preserved and how access is provided both short and long term.

    B. Policies for access control to ensure all parties are protected,
    including authentication of users and disseminated materials.

    C. Policies for storage of materials, including service-level
    agreements with external suppliers.

    D. Policies that define the repository's designated community and
    describe its knowledge base.

    E. A rigorous system for updating policies and procedures in accordance
    with changes in technology and in the repository's designated community.

    F. Explicit links between these policies and procedures, allowing for
    easy application across heterogeneous collections.

    II. SIP/submission information package/intake or receipt

    A. Works closely with the repository's designated community to
    advocate the use of good and (where possible) standard practice in the
    creation of digital resources; this may include an outreach program for
    potential depositors.

    B. Negotiates for and accepts appropriate information from information
    producers and rights holders.
    a. Well-documented and agreed-on policies about what is selected for
    deposit, including, where appropriate, specific required formats.

    b. Effective procedures and workflows for obtaining copyright
    clearance for both short-term and immediate access, as necessary, and
    preservation.

    c. A comprehensive metadata specification and agreed-on standards for
    its implementation. This is critical for federated or networked
    repositories and includes standards for the provision of rights metadata
    from content providers and for representing technical metadata.

    d. Procedures and systems for ensuring the authenticity of submitted
    materials.

    e. Initial assessment of the completeness of the submission.

    f. Effective record keeping of all transactions, including ongoing
    relationships, with content providers.
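
    The intake checks under II.B (required formats, a metadata
    specification, and an initial assessment of completeness) could be
    sketched roughly as follows. This is only an illustration: the accepted
    extensions, the required metadata fields, and the function name are all
    hypothetical and come from neither ATDR nor OAIS.

```python
from pathlib import PurePosixPath

# Hypothetical policy values; a real repository would set these in its
# collection development and metadata policies.
ACCEPTED_EXTENSIONS = {".xml", ".tif", ".txt"}
REQUIRED_METADATA = {"title", "creator", "rights"}


def assess_submission(filenames, metadata):
    """Return a list of problems found in a submission (empty if none)."""
    problems = []
    if not filenames:
        problems.append("submission contains no files")
    # Completeness: every required metadata field must be supplied.
    for name in sorted(REQUIRED_METADATA - set(metadata)):
        problems.append(f"missing metadata field: {name}")
    # Format policy: every file must be in an accepted format.
    for filename in filenames:
        if PurePosixPath(filename).suffix.lower() not in ACCEPTED_EXTENSIONS:
            problems.append(f"format not accepted: {filename}")
    return problems
```

    An empty list means the submission passes these particular checks; it
    says nothing about authenticity or rights clearance, which need their
    own procedures.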

    III. AIP/archive information package/care and feeding
    Obtains sufficient control of the information provided to support long-term
    preservation:

    A. Detailed analysis of an object or class of objects to assess its
    significant properties. Analysis should be automated as much as possible
    and informed by the collections management policy, rights clearances, the
    designated community's knowledge base, and policy restrictions on specific
    file formats.

    B. Verification and creation of bibliographic and technical metadata
    and documentation to support the long-term preservation of the digital
    object according to its significant properties and underlying technology or
    abstract form, with monitoring and updating of metadata as necessary to
    reflect changes in technology or access arrangements. This involves
    understanding how strategies for continuing access, such as migration and
    emulation, influence the creation of preservation metadata.

    C. A robust system of unique identification.

    D. A reliable method for encapsulating the digital object with its
    metadata in the archive.

    E. A reliable archival storage facility, including an ongoing program
    of media refreshment; a program of monitoring media; geographically
    distributed backup systems; routine authenticity and integrity checking of
    the stored object; disaster preparedness, response, and recovery policies
    and procedures; and security.
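
    To make items C through E a little more concrete, here is a minimal
    sketch of an archived object that carries a unique identifier, its
    metadata, and a checksum recorded at ingest that can be re-verified
    later. The class and field names are hypothetical, and the choice of
    SHA-256 is an assumption; neither ATDR nor OAIS prescribes a particular
    algorithm or structure.

```python
import hashlib
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ArchivalPackage:
    """Hypothetical stand-in for an archived object plus its metadata."""
    identifier: str   # unique identifier (item C)
    content: bytes    # the digital object itself
    metadata: dict    # descriptive and technical metadata (item D)
    checksum: str = field(init=False)

    def __post_init__(self):
        # Record the checksum once, at ingest.
        self.checksum = sha256_hex(self.content)

    def is_intact(self) -> bool:
        # Routine integrity check (item E): recompute and compare.
        return sha256_hex(self.content) == self.checksum
```

    A routine audit would call is_intact() on every stored package and flag
    any that return False for recovery from a backup copy.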

    IV. DIP/dissemination information package/delivery
    A. Determines, either by itself or with others, the users that make up
    its designated community, which should be able to understand the
    information provided. Analysis and documentation of the repository's
    designated community; for federated or cooperating repositories, a shared
    understanding of the designated community.

    B. Ensures that the information to be preserved is "independently
    understandable" to the designated community; that is, that the community
    can understand the information without needing the assistance of experts.
    a. Well-maintained and documented technical metadata that is kept
    aligned with the knowledge base of the designated community and with
    changing technologies.

    b. A "technology watch" to manage the risk as technology evolves and
    to provide continuing access and updated methods of access as necessary,
    such as new migrations or emulators.

    C. Makes the preserved information available to the designated
    community.
    a. A system for discovery of resources.

    b. Appropriate mechanisms for authentication of the digital materials.

    c. Access control mechanisms in accordance with licenses and laws, and
    an "access rights watch."

    d. Mechanisms for managing electronic commerce.

    e. User support programs.

    ---------------------------------------------------------------------
    ---------------------------------------------------------------------

    Assumption: (from TDR/RLG/OCLC)
    Preservation requires active management that begins at creation ... [p.18]
    ----------------------------
    Assumption:

    Digital collecting or long-term preservation and access of digital
    resources (hereafter referred to as digital preservation and access, or
    DPA) is and will be a responsibility shared with other repositories
    (libraries, archives, museums, and related non-profit and for-profit
    organizations and institutions). This assumption is based on three
    interrelated assumptions:

             1) the long-term preservation and access of digital resources will
    be expensive, requiring that the burden of remembering the digital cultural
    artifacts deemed worth remembering be shared. No one repository will be
    able to effectively remember all that is worth remembering.

             2) the authenticity and reliability of the "remembered" (the
    accuracy of our memory) and our judgement with respect to what is to be
    remembered will necessarily be subjected to evaluation. Such evaluation
    will necessarily be conducted by an authoritative body or bodies that
    arise from the cultural heritage repository communities, and, while users
    will rely in large part on the evaluation of the authoritative bodies,
    users will also, ultimately, be the arbiters of the quality of our memory,
    and the extent to which we can be trusted.

             3) remembering, especially shared remembering, will necessarily
    require a large number of hardware, software, communication, and
    intellectual and procedural standards. In that standards are necessarily
    the product of communities sharing common interests and objectives, digital
    collecting will necessarily involve participating in the development of
    standards and mastering them.

    Assumption:

    In order to take responsibility for DPA, the repository must "control" that
    which is collected. In other words, the repository must have control over
    the files (both content and, if necessary, software) in order to be able to
    manage the DPA. Therefore access to digital content that is licensed, or
    licensed access software, cannot be "collected." As a long-term strategy,
    the repository needs to work with other repositories and with licensed
    content providers on a strategy for the development of "DPA-friendly"
    content, and for arrangements for transfer of control of such content to a
    trusted repository. (See e-journal Mellon project at
    www.clir.org/diglib/preserve/ejp.htm)

    Assumption:

    There are no existing, proven methods for DPA. There are several competing
    theoretical models that are being tested. No one of these may emerge as THE
    method, and a combination of methods may well emerge, with different
    production methods, technology and standards (or lack thereof), and
    publication content and functional objectives and different known and
    anticipated user requirements being taken into account.

    Assumption:

    The mutability of the technology, the growing interdependency of the
    various participants in scholarly communication (creators,
    producers-publishers, repositories, and users), and the lack of an
    effective political infrastructure to promote and develop cooperation and
    collaboration among them lead to economic uncertainty, but also to the
    need to develop policy that reflects both what is known and understood,
    and what is uncertain and changing.

    Assumption:

    The Reference Model for an Open Archival Information System (OAIS),
    originally developed by the space research community, has gained wide
    international acceptance as a "framework" for DPA. OAIS is currently being
    considered by the International Organization for Standardization (ISO),
    largely at the urging of the international archive and library
    communities. Virginia, as a member of the international library community,
    will set policy within the broad framework of OAIS, and will work within
    and participate in the ongoing international application of OAIS to the
    cultural heritage repository communities. At the level of repository
    administration, Virginia needs in particular to follow the OCLC/RLG report
    "Attributes of a Trusted Digital Repository."

    Assumption:

    Inspired in part by OAIS, there are several metadata initiatives which need
    to be followed. Some of these initiatives deal only with semantics, but
    others with both semantics and syntax. In the semantic category, particular
    attention needs to be paid to two reports by the OCLC/RLG Working Group on
    Preservation Metadata, "A Recommendation for Preservation Description
    Information," and "A Recommendation for Content Information." These two
    metadata initiatives are addressing essential OAIS requirements.

    Addressing descriptive data: Metadata Object Description Schema (MODS), an
    initiative led by the Library of Congress.

    METS ...

    Assumption:

    The current DL literature reflects two implicit (and sometimes almost
    explicit) assumptions: 1) for large scale collections, digital publications
    collected will be relatively simple (or discrete or close to it: one file,
    or at most only a "few" files), and either created in or migrated to a
    handful of representations (or formats). It is assumed that large, complex
    publications, with many interrelations between objects and/or many
    significant functional properties, will be too expensive for
    archives/libraries to collect. It is assumed that the complexity of
    collecting the complex will be best addressed by emulation. SDS does not
    share this assumption. SDS, with its emphasis on behaviors, is "banking" on
    declarative standards (such as XSL and XQuery) as making the replication of
    behaviors over time affordable. This will need (humble) justification and
    argument.

    -----------