[ICA-EGAD-RiC] Artefactual response to RiC-CM Draft

Wed Jan 25 15:59:10 EST 2017

The following feedback has been collectively prepared by Artefactual
<https://www.artefactual.com> staff in response to the Records in Context
Conceptual Model (RiC-CM
<http://www.ica.org/en/egad-ric-conceptual-model-ric-cm-01pdf>) 0.1 Draft
released by the ICA's Expert Group on Archival Description (EGAD
<http://www.ica.org/en/about-egad>).

-----

We would like to begin by thanking EGAD for all its work on RiC to date.
We have also enjoyed reading the excellent feedback from members of the
archival community. We are excited that EGAD has chosen a linked data
approach to modelling archival description, to better represent the complex
relationships between archival materials and the contexts in which they
are created, managed and disseminated.

We agree with much of the feedback that has already been shared publicly,
and will try to avoid repetition here. As developers of open source
archival management software, we wanted to share some of the internal
discussions and questions that arose as we reviewed the RiC-CM draft with
an eye to systems implementation. As a model whose primary expression will
be linked data, RiC necessarily assumes implementation in some kind of
networked descriptive system, which raises several immediate considerations
for us.

General implementation challenges

Through our experience with dozens of data migrations over many years, we
are all too aware of how many institutions still rely on Word documents,
XML authoring tools or bespoke databases as the basis of their finding
aids, and how many have yet to adopt any content standard to guide their
local descriptive practices. RiC will require even greater technical
proficiency to implement properly, because it incorporates technologies
that are still novel to the archival community and that cannot readily be
implemented outside of a software system (see also section 3.4 of the
InterPARES Trust response
<https://interparestrust.com/2016/12/11/interpares-trust-responds-to-egad-ric/>).
With this radical shift, many small and medium-sized archives risk being
left behind. How does the ICA intend to support adoption of the new
standard? Will the ICA continue to maintain the existing four standards for
archives that may be unable or unwilling to make the move to linked data?

The role of content standards, data interoperability and harmonization

EGAD intends, in its final version of RiC, to create a “two-part standard:
a conceptual model for archival description (RiC-CM), and an ontology
(RiC-O)” (RiC-CM Consultation Draft v.01, p. 1). The flexibility of the
first draft of RiC-CM leaves much room for interpretation in
implementation: the same data could be modeled in a number of different
ways. For example, the draft’s own example diagram on page 93 does not use
the top-level Date entity, instead relying on date attributes of other
entities or relations to bound time. Moreover, a conceptual model and an
ontology both fulfill roles very different from that of a content standard,
which aims to ensure consistency in descriptive fields and
interoperability across space and time. Section I.5 of ISAD(G)’s
introduction emphasizes these points:

This set of general rules for archival description is part of a process
that will

a. ensure the creation of consistent, appropriate, and self explanatory
descriptions;

b. facilitate the retrieval and exchange of information about archival
material;

c. enable the sharing of authority data; and

d. make possible the integration of descriptions from different locations
into a unified information system.

While the ontology and conceptual model might provide enough of a framework
for consistent modeling of descriptions across space and time, they do not
seem to address the specific descriptive practices to be followed within
free-text descriptive attributes such as scope and content. What role do
the ICA and EGAD see the existing ICA content standards playing in the
future? Will subsequent versions of RiC provide further specificity to
ensure consistent descriptive practices across domains and jurisdictions,
as ISAD(G) previously sought to provide?

Chris Hurley has pointed out the vast number of relationship types between
entities (792 in the current draft), and we agree that these should be
constrained to better ensure consistent application. The InterPARES Trust
response rightly points out that the list of relations could easily be
simplified and halved by removing the confusing notion of past vs. present
tense from the relations, relying instead on the existing date attributes
to bound time. On the other hand, we wonder whether it is necessary for
EGAD to enumerate all possible combinations of subject, relationship and
object, rather than simply providing the relationships as predicates and
allowing users to determine what kinds of connections to make with them
(using metadata application profiles; see below). For example, is it
necessary for the model to list all the different entities that the
relationship “associated with” (in both its present and past forms) can be
used to link together? We would be interested to hear other commenters’
thoughts on this, since we are not certain that others would agree the
detailed list is unnecessary.
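
To illustrate the InterPARES Trust suggestion, here is a minimal Python
sketch. It assumes a hypothetical Relation record with optional start and
end dates; these names are ours, not RiC's. A single tenseless "associated
with" predicate is stored once, and whether the relation is "past" or
"present" is derived from its dates at query time rather than from a
separate past-tense predicate.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Relation:
    # Hypothetical relation shape: one tenseless predicate, bounded by dates.
    subject: str
    predicate: str                  # e.g. "associated with"; no "was associated with"
    obj: str
    start: Optional[date] = None    # None: start unknown or unbounded (assumption)
    end: Optional[date] = None      # None: assumed still in effect (assumption)

    def is_current(self, on: date) -> bool:
        """Derive 'present' vs 'past' from the dates rather than the predicate."""
        started = self.start is None or self.start <= on
        not_ended = self.end is None or on <= self.end
        return started and not_ended


ended = Relation("Agent A", "associated with", "Record set B",
                 start=date(1990, 1, 1), end=date(2001, 12, 31))
ongoing = Relation("Agent A", "associated with", "Record set C",
                   start=date(2005, 6, 1))

today = date(2017, 1, 25)
print(ended.is_current(today))    # False: a "past" relation
print(ongoing.is_current(today))  # True: a "present" relation
```

Under this approach each tensed pair of relations collapses into one
predicate, which is where the halving of the list comes from.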

We also hope EGAD will consider the role that metadata application profiles
will play in implementation and interoperability, and would like to know
what guidance the ICA could provide on this. Two illustrative examples come
to mind: METS and PCDM. The Metadata Encoding and Transmission Standard
(METS <http://www.loc.gov/standards/mets/>) was developed to facilitate data
exchange and transmission between repositories and tools. However, it is an
extremely flexible and permissive standard, which makes data exchange
difficult without a shared application profile: the METS generated by one
system can rarely be parsed by another without intervention. The Portland
Common Data Model (PCDM <https://github.com/duraspace/pcdm/wiki>) was
similarly developed to provide a common mechanism for data interoperability
between Hydra implementers, though it has since grown beyond a
Hydra-specific model. In early versions, however, the community found that
the model was so general and flexible that multiple interpretations of the
same data, each valid within the model, prevented interoperability anyway.
They have since set out more specific parameters and a formalized way of
documenting a specific application profile (see the PCDM Profile Template
<https://gist.github.com/anarchivist/981d25acfc1b92ac93a7cf2a9049b4c8>).
RiC might benefit from this lesson and consider testing this kind of
scenario in advance. In some cases, constraints will produce data that can
be more readily combined and shared, giving it greater utility. Perhaps
EGAD considers this to be the responsibility of implementers; if so, then
the role of the ICA in standards development should be interrogated: is it
not still to ensure consistency across space and time, and to facilitate
exchange and reuse? For developers to implement RiC while still supporting
exchange and interoperability, we will need consistent implementation
guidelines, so that systems can be designed to exchange data with one
another easily.
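
As a rough illustration of what we mean, the following Python sketch treats
an application profile as a table of allowed (subject type, object type)
pairs for each predicate, and checks relations against it. The entity
types, identifiers, predicates and profile contents are all hypothetical
and are not drawn from RiC-CM. The point is only that two systems sharing
the same profile would accept and reject the same data, even where the
underlying model would allow more.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical profile: predicate -> allowed (subject type, object type) pairs.
Profile = Dict[str, FrozenSet[Tuple[str, str]]]

PROFILE: Profile = {
    "associated with": frozenset({("Agent", "Record"), ("Agent", "RecordSet")}),
    "has creator": frozenset({("Record", "Agent"), ("RecordSet", "Agent")}),
}

# Hypothetical entity registry: identifier -> entity type.
ENTITY_TYPES: Dict[str, str] = {
    "agent/1": "Agent",
    "recordset/7": "RecordSet",
    "place/3": "Place",
}


def violations(triples: List[Tuple[str, str, str]], profile: Profile,
               types: Dict[str, str]) -> List[str]:
    """Return one message for every triple the profile does not allow."""
    problems = []
    for subj, pred, obj in triples:
        allowed = profile.get(pred)
        if allowed is None:
            problems.append(f"unknown predicate: {pred!r}")
            continue
        # Unregistered identifiers get the placeholder type "?".
        pair = (types.get(subj, "?"), types.get(obj, "?"))
        if pair not in allowed:
            problems.append(f"{subj} {pred} {obj}: {pair} not in profile")
    return problems


data = [
    ("agent/1", "associated with", "recordset/7"),  # allowed by the profile
    ("agent/1", "associated with", "place/3"),      # valid predicate, not profiled
]
for message in violations(data, PROFILE, ENTITY_TYPES):
    print(message)
```

The model itself stays permissive; the shared profile is what narrows it
enough for exchange.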

Additionally, we are somewhat surprised by M. Clavaud’s response on the
ICA-EGAD listserv (2016-10-04) eschewing the reuse of existing ontologies.
While there are certainly areas in which this may be appropriate, as a
wholesale approach it strikes us as contrary to the linked data best
practice of reusing standard vocabularies when possible (see for example
the W3C Best Practices <https://www.w3.org/TR/ld-bp/#VOCABULARIES>), and it
represents an enormous maintenance burden for the ICA. The W3C’s SKOS is a
perfect example: is it truly necessary for RiC-O’s Concept/Thing entity to
repeat this work so completely? We urge EGAD to consider a more balanced
approach in what it chooses to reuse vs. what is designed anew. If the
approach taken by RiC is informed by metadata application profiles, then
RiC’s role becomes simpler: offering implementation guidelines for data
consistency by reusing existing vocabularies and ontologies, along with
helpful extensions where existing ontologies do not meet the specific needs
of archivists.

Missing entities

Since RiC-CM and RiC-O seem aimed at providing resources for the management
of all archival functions and activities, we note several other possible
entities that do not seem to be covered by the proposed model. Namely, we
ask EGAD to consider the role that Rights, Accessions, and Physical storage
play in the management of archival information. Greg Bak has previously
pointed out that more might be needed to capture dependency information,
and we note that EGAD itself has acknowledged that fields capturing the
role of the archivist in shaping the record are still lacking. Rights are a
crucial element if data are to be exchanged and reused. Conditions of
access and conditions of use are listed as properties of record-related
entities, but might it not be desirable to declare how the rights relate
to an agent acting as the rights-holder of the records in question? Why
reduce a complex entity with its own properties from a thing to a string,
making it less machine-actionable in the future and more likely to be
implemented inconsistently? Similarly, while different jurisdictions and
institutions will handle accessions and physical storage information
differently (or exclude them entirely), we still see them represented in
archival data often enough to need a consistent method for expressing them
within the RiC models.

We would also point out that some entities seem to be missing important
properties: for example, we see no clear way to indicate that a Date might
be approximate or uncertain, a key feature of archival description. We also
strongly support the TS-DACS response
<https://docs.google.com/document/d/1XoQmrT-kdj5fCKNcg0umghsWFORrKRf7rMsB4OsnY5o/edit>
on RiC-P36 gender, and on identity in general.

Record vs Record-set

While we understand conceptually why EGAD has proposed the concept of the
record set, our experience suggests that implementing this distinction in
an archival management system will be a hindrance over time. Unexpected
changes may bring a new record into a record set, thereby invalidating any
shared properties of the record set (3.5 and 3.6 in RiC-CM). Further,
granularity may grow over time: for example, a box that is described as a
record (an item) may have its contents described at a later date, and
suddenly our item-level box record must become a record set. If a record
set and a record are fundamentally different entities in the data model,
with different attributes and relationships, then switching between
entities will be difficult to implement and may lead to the loss of data
that is not valid for the new entity.

In AtoM’s data model, all records are simply “information objects” with the
same available properties, some of which may be inherited automatically
from higher levels of description. We believe a more flexible approach such
as this might ultimately benefit systems implementors: it keeps the data
model simpler, thereby ensuring more consistency in implementation, and
makes all properties available to all records regardless of type or level.
A record may still describe an aggregation; the way its properties are
used would make this clear.
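
The following Python sketch approximates the approach described above,
under our own simplifying assumptions. It uses a single hypothetical
InformationObject class (not AtoM's actual code) in which every level of
description has the same properties, and a small, hypothetical set of
properties is inherited from the nearest ancestor that sets them.
Describing the contents of an item-level box then only requires adding
children: the box remains the same kind of entity and none of its data has
to be dropped.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical: properties a child falls back to its ancestors for.
INHERITABLE = {"repository", "access_conditions"}


@dataclass
class InformationObject:
    title: str
    level: str                                   # e.g. "Fonds", "File", "Item"
    properties: Dict[str, str] = field(default_factory=dict)
    # Excluded from repr/eq to avoid infinite recursion through parent/children.
    parent: Optional["InformationObject"] = field(default=None, repr=False,
                                                  compare=False)
    children: List["InformationObject"] = field(default_factory=list, repr=False)

    def add_child(self, child: "InformationObject") -> "InformationObject":
        child.parent = self
        self.children.append(child)
        return child

    def get(self, name: str) -> Optional[str]:
        """Own value first; for inheritable properties, walk up the hierarchy."""
        node: Optional[InformationObject] = self
        while node is not None:
            if name in node.properties:
                return node.properties[name]
            if name not in INHERITABLE:
                return None
            node = node.parent
        return None


fonds = InformationObject("Smith family fonds", "Fonds",
                          {"repository": "City Archives",
                           "access_conditions": "Open"})
box = fonds.add_child(InformationObject("Box 12", "Item", {"extent": "1 box"}))

# Later, the box's contents are described: same class, no conversion needed.
box.level = "File"
letter = box.add_child(InformationObject("Letter to J. Smith", "Item"))

print(letter.get("repository"))  # City Archives (inherited from the fonds)
print(box.get("extent"))         # 1 box (unchanged by the re-levelling)
```

Here the level of description is just a property value, so moving from
item to aggregation is an edit rather than a change of entity type.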

Next steps

Overall we are impressed with the work of EGAD to date and are excited to
see steps being taken to represent archival description as linked data.
However, as we have mentioned above, we worry about the ability of
under-resourced institutions to take advantage of the standard and its
accompanying ontology when they are finalized, given that many of these
institutions may have spent years achieving a basic level of compliance
with existing ICA standards. As software developers, we also have a vested
interest in making sure that any new standard can be practically
implemented in software. We hope, therefore, that EGAD keeps
implementation considerations in mind as it begins work on the next
iteration of the model. We also hope that the ICA does not plan to cease
its standard-related activities once RiC-CM and RiC-O are finalized, as
publishing a new standard is only the first step toward making the standard
usable by practitioners world-wide.

-----

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc <http://www.artefactual.com/>.
604-527-2056
@accesstomemory <https://twitter.com/accesstomemory>