All,
After reading a lot of the latest literature on digital repositories, I
have made some progress in getting us moving once again on developing draft
digital library policies.
In preparation for our meeting in June, I would like all of you to read
carefully "Attributes of a Trusted Digital Repository" (ATDR), an RLG-OCLC
report: http://www.rlg.org/longterm/attributes01.pdf
This joint RLG/OCLC report is inspired by the "Reference Model for an Open
Archival Information System (OAIS)" [not to be confused with OAI], a
framework for digital repositories developed by the space community (NASA
and others). Despite its origins, OAIS has been well received by the
archive, library, and museum communities and, with their support, is
nearing approval as an ISO standard. OAIS is very dense, but if you are
feeling ambitious or suffering from insomnia, you will find the latest
draft (July 2001) at
http://www.ccsds.org/documents/pdf/CCSDS-650.0-R-2.pdf
The ATDR uses OAIS terminology (which is quickly becoming the standard
terminology for discussing digital repositories). It defines the OAIS
terms, at least minimally, and so you can read it without reading OAIS.
In addition to the readings, I have also drafted an outline organized
around the statements of Repository Responsibilities in the ATDR report, in
particular the "responsibility relies on" lists given under each major
responsibility. I have tried to organize these around the three principal
areas of responsibility and activity outlined in OAIS: submission,
archiving, and dissemination. These are preceded by the list of "policy
areas" given in ATDR (I). The policy areas overlap with the submission,
archiving, and dissemination categories (II-IV); I nevertheless included
this section to make sure that we do not overlook anything in the "policy
areas" that might not come up under the other lists.
My intention is that we go over each responsibility and what the
responsibility relies on, and then begin to develop draft policies for
each. As you can see, I am trying to approach this systematically. I think
the first thing we will notice is that while the categories in the ATDR
report are useful, they are insufficiently detailed. And so developing them
will be the first order of business.
Following the "Organization of Digital Collecting Policy" outline below is a
list of working assumptions that seem to me to provide context and some
guidance for our deliberations. These are, of course, open for debate; in
fact, they should be debated, and added to as well.
That's all for now,
Daniel
Organization of Digital Collecting Policy
Trusted Digital Repository
I. Policy
Follows documented policies and procedures that ensure the information is
preserved against all reasonable contingencies and enables the information
to be disseminated as authenticated copies of the original or as traceable
to the original.
A. Policies for collections development (e.g., selection and
retention) that link to technical procedures about how and at what level
materials are preserved and how access is provided over both the short and
the long term.
B. Policies for access control to ensure all parties are protected,
including authentication of users and disseminated materials.
C. Policies for storage of materials, including service-level
agreements with external suppliers.
D. Policies that define the repository's designated community and
describe its knowledge base.
E. A rigorous system for updating policies and procedures in accordance
with changes in technology and in the repository's designated community.
F. Explicit links between these policies and procedures, allowing for
easy application across heterogeneous collections.
II. SIP/submission information package/intake or receipt
A. Works closely with the repository's designated community to
advocate the use of good and (where possible) standard practice in the
creation of digital resources; this may include an outreach program for
potential depositors.
B. Negotiates for and accepts appropriate information from information
producers and rights holders.
a. Well-documented and agreed-on policies about what is selected for
deposit, including, where appropriate, specific required formats.
b. Effective procedures and workflows for obtaining copyright
clearance for both short-term and immediate access, as necessary, and
preservation.
c. A comprehensive metadata specification and agreed-on standards for
its implementation. This is critical for federated or networked
repositories and includes standards for the provision of rights metadata
from content providers and for representing technical metadata.
d. Procedures and systems for ensuring the authenticity of submitted
materials.
e. Initial assessment of the completeness of the submission.
f. Effective record keeping of all transactions, including ongoing
relationships, with content providers.
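For illustration only, items e and f above might be supported by something like the following minimal Python sketch. It assumes the depositor supplies a manifest mapping each file name to a SHA-256 checksum; the manifest convention, the function names assess_submission and record_transaction, and the report fields are all hypothetical, not taken from ATDR or OAIS. For brevity the received files are held in memory and the transaction log is simply a list of JSON lines.

```python
import hashlib
import json
from datetime import datetime, timezone


def assess_submission(sip_id: str,
                      received: dict[str, bytes],
                      manifest: dict[str, str]) -> dict:
    """Initial completeness check of a SIP against the depositor's manifest.

    received: file name -> file content as delivered (held in memory here).
    manifest: file name -> expected SHA-256 hex digest (hypothetical convention).
    """
    expected, got = set(manifest), set(received)
    # Files present on both sides whose content does not match the manifest.
    mismatched = sorted(
        name for name in expected & got
        if hashlib.sha256(received[name]).hexdigest() != manifest[name]
    )
    missing = sorted(expected - got)      # promised but not delivered
    unexpected = sorted(got - expected)   # delivered but not listed
    return {
        "sip": sip_id,
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "missing": missing,
        "unexpected": unexpected,
        "checksum_mismatch": mismatched,
        "complete": not (missing or unexpected or mismatched),
    }


def record_transaction(report: dict, log_lines: list[str]) -> None:
    """Item f: append the intake report to a transaction log (a list of JSON lines)."""
    log_lines.append(json.dumps(report, sort_keys=True))
```

An incomplete submission is reported rather than rejected outright, leaving the decision about follow-up with the depositor to policy rather than to the software.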
III. AIP/archival information package/care and feeding
Obtains sufficient control of the information provided to support long-term
preservation:
A. Detailed analysis of an object or class of objects to assess its
significant properties. Analysis should be automated as much as possible
and informed by the collections management policy, rights clearances, the
designated community's knowledge base, and policy restrictions on specific
file formats.
B. Verification and creation of bibliographic and technical metadata
and documentation to support the long-term preservation of the digital
object according to its significant properties and underlying technology or
abstract form, with monitoring and updating of metadata as necessary to
reflect changes in technology or access arrangements. This involves
understanding how strategies for continuing access, such as migration and
emulation, influence the creation of preservation metadata.
C. A robust system of unique identification.
D. A reliable method for encapsulating the digital object with its
metadata in the archive.
E. A reliable archival storage facility, including an ongoing program
of media refreshment; a program of monitoring media; geographically
distributed backup systems; routine authenticity and integrity checking of
the stored object; disaster preparedness, response, and recovery policies
and procedures; and security.
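Items C through E can be made a little more concrete with a minimal Python sketch, assuming SHA-256 as the fixity algorithm and UUID-based URNs as unique identifiers. The ArchivalPackage class and audit function are hypothetical; a real AIP would be wrapped in a standard packaging format (METS, for example) and kept on refreshed, geographically distributed media, none of which is modeled here.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class ArchivalPackage:
    """Hypothetical, much-simplified AIP: content kept together with its
    metadata (item D), a unique identifier (item C), and a fixity value
    recorded at ingest (item E)."""
    content: bytes
    metadata: dict
    identifier: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")
    fixity_sha256: str = ""

    def __post_init__(self) -> None:
        # Record the fixity value once, at the moment of ingest.
        if not self.fixity_sha256:
            self.fixity_sha256 = hashlib.sha256(self.content).hexdigest()

    def verify(self) -> bool:
        """Routine integrity check: recompute the digest and compare."""
        return hashlib.sha256(self.content).hexdigest() == self.fixity_sha256


def audit(packages: list[ArchivalPackage]) -> list[str]:
    """Return the identifiers of packages that fail the integrity check."""
    return [p.identifier for p in packages if not p.verify()]
```

One design question worth debating is where the fixity value lives: here it travels inside the package, so an independently stored copy of the digests would be needed to detect deliberate, rather than accidental, alteration.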
IV. DIP/dissemination information package/delivery
A. Determines, either by itself or with others, the users that make up
its designated community, which should be able to understand the
information provided. Analysis and documentation of the repository's
designated community; for federated or cooperating repositories, a shared
understanding of the designated community.
B. Ensures that the information to be preserved is "independently
understandable" to the designated community; that is, that the community
can understand the information without needing the assistance of experts.
a. Well-maintained and documented technical metadata that is kept
aligned with the knowledge base of the designated community and with
changing technologies.
b. A "technology watch" to manage the risk as technology evolves and
to provide continuing access and updated methods of access as necessary,
such as new migrations or emulators.
C. Makes the preserved information available to the designated
community.
a. A system for discovery of resources.
b. Appropriate mechanisms for authentication of the digital materials.
c. Access control mechanisms in accordance with licenses and laws, and
an "access rights watch."
d. Mechanisms for managing electronic commerce.
e. User support programs.
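As a rough sketch of how a delivered copy might remain "traceable to the original" (section I) and be checked by users (IV.C.b), the Python below attaches a provenance record to each dissemination copy. The function names and record fields are hypothetical. Matching digests show integrity and traceability, not authenticity in the strong sense: guarding against deliberate tampering would require digital signatures or an independently held register of digests.

```python
import hashlib


def make_dip(aip_id: str, aip_content: bytes, derivative: bytes, note: str) -> dict:
    """Package a delivery copy with a provenance record pointing back to its source AIP."""
    return {
        "content": derivative,
        "content_sha256": hashlib.sha256(derivative).hexdigest(),
        "source_aip": aip_id,
        "source_sha256": hashlib.sha256(aip_content).hexdigest(),
        "derivation": note,  # free-text account of how the copy was produced
    }


def check_dip(dip: dict, aip_id: str, aip_content: bytes) -> bool:
    """User-side check: the copy is intact and traceable to the named original."""
    intact = hashlib.sha256(dip["content"]).hexdigest() == dip["content_sha256"]
    traceable = (dip["source_aip"] == aip_id
                 and dip["source_sha256"] == hashlib.sha256(aip_content).hexdigest())
    return intact and traceable
```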
---------------------------------------------------------------------
---------------------------------------------------------------------
Assumption: (from ATDR, RLG/OCLC)
Preservation requires active management that begins at creation ... [p.18]
----------------------------
Assumption:
Digital collecting, or the long-term preservation of and access to digital
resources (hereafter "digital preservation and access," or DPA), is and
will be a responsibility shared with other repositories (libraries,
archives, museums, and related non-profit and for-profit organizations and
institutions). This assumption rests on three interrelated assumptions:
1) the long-term preservation of and access to digital resources will
be expensive, requiring that the burden of remembering the digital cultural
artifacts deemed worth remembering be shared. No one repository will be
able to remember effectively all that is worth remembering.
2) the authenticity and reliability of the "remembered" (the
accuracy of our memory) and our judgement with respect to what is to be
remembered will necessarily be subjected to evaluation. Such evaluation will
necessarily be conducted by an authoritative body or bodies arising from
the cultural heritage repository communities; and while users will rely in
large part on the evaluation of these authoritative bodies, users will
also, ultimately, be the arbiters of the quality of our memory and of the
extent to which we can be trusted.
3) remembering, especially shared remembering, will necessarily
require a large number of hardware, software, communication, and
intellectual and procedural standards. Because standards are necessarily
the product of communities sharing common interests and objectives, digital
collecting will necessarily involve participating in the development of
standards and mastering them.
Assumption:
In order to take responsibility for DPA, the repository must "control" that
which is collected. In other words, the repository must have control over
the files (both content and, if necessary, software) in order to be able to
manage the DPA. Therefore, access to digital content that is licensed, or
licensed access software, cannot be "collected." As a long-term strategy,
the repository needs to work with other repositories and with licensed
content providers on developing "DPA-friendly" content and on arrangements
for transferring control of such content to a trusted repository. (See the
Mellon e-journal project at www.clir.org/diglib/preserve/ejp.htm)
Assumption:
There are no existing, proven methods for DPA. There are several competing
theoretical models that are being tested. No single one of these may emerge
as THE method; a combination of methods may well emerge instead, taking into
account different production methods, technologies and standards (or the
lack thereof), publication content and functional objectives, and known and
anticipated user requirements.
Assumption:
The mutability of the technology, the growing interdependency of the
various participants in scholarly communication (creators,
producers-publishers, repositories, and users), and the lack of an effective
political infrastructure to promote and develop cooperation and
collaboration among them lead not only to economic uncertainty but also to
the need to develop policy that reflects both what is known and understood
and what is uncertain and changing.
Assumption:
The Reference Model for an Open Archival Information System (OAIS),
originally developed by the space research community, has gained wide
international acceptance as a "framework" for DPA. OAIS is currently being
considered by the International Organization for Standardization (ISO),
largely at the urging of the international archive and library
communities. Virginia, as a member of the international library community,
will work within the broad framework of OAIS in its policy, and will work
within, and participate in, the ongoing international application of OAIS
to the cultural heritage repository communities. At the level of repository
administration, Virginia in particular needs to follow the OCLC/RLG report
"Attributes of a Trusted Digital Repository."
Assumption:
Several metadata initiatives, inspired in part by OAIS, need to be
followed. Some of these initiatives deal only with semantics, while others
deal with both semantics and syntax. In the semantic category, particular
attention needs to be paid to two reports by the OCLC/RLG Working Group on
Preservation Metadata, "A Recommendation for Preservation Description
Information" and "A Recommendation for Content Information." These two
metadata initiatives address essential OAIS requirements.
For descriptive metadata: the Metadata Object Description Schema (MODS), an
initiative led by the Library of Congress.
METS ...
Assumption:
The current DL literature reflects two implicit (and sometimes almost
explicit) assumptions: 1) for large-scale collections, the digital
publications collected will be relatively simple (discrete or close to it:
one file, or at most only a "few" files) and either created in or migrated
to a handful of representations (or formats); and 2) large, complex
publications, with many interrelations between objects and/or many
significant functional properties, will be too expensive for
archives/libraries to collect. It is further assumed that the complexity of
collecting the complex will be best addressed by emulation. SDS does not
share this assumption. SDS, with its emphasis on behaviors, is "banking" on
declarative standards (such as XSL and XQuery) to make the replication of
behaviors over time affordable. This will need (humble) justification and
argument.
-----------
This archive was generated by hypermail 2b30 : Thu May 16 2002 - 14:56:43 EDT