[tei-council] Magic tokens, private URIs, and canonical references
Martin Holmes
mholmes at uvic.ca
Mon Feb 27 14:42:00 EST 2012
The final topic I'm supposed to cover during the Telco relates to the
big mess surrounding several attributes:
@cRef
@key
@lemma
@loc
metSym/@value
The current tickets are these:
<http://purl.org/TEI/BUGS/3480650> ("@cRef is a mess")
<http://purl.org/TEI/BUGS/3413346> ("Deprecation of data.key and
data.word attributes")
The basic issue is this:
Historically, we have encouraged the use of what might be termed "magic
tokens" in attributes such as @key:
<name key="bloggs_f">Fred Bloggs</name>
where the @key value is used to look up or point to a more detailed
record for the thing being referred to. The syntax or format of @key has
not been prescribed:
"@key provides an externally-defined means of identifying the entity (or
entities) being named, using a coded value of some kind."
and it has been left up to encoders to devise their own schemes for
dereferencing their @key values (usually turning them into
fully-qualified URLs, XPaths, database queries etc.), and to document
them in their own way.
In some cases, such as @cRef, a more formal methodology for
dereferencing has been prescribed. Here is how @cRef is defined on <term>:
<quote>
@cRef identifies the associated gloss element using a canonical
reference from a scheme defined in a refsDecl element in the TEI header
Status: Optional
Datatype: data.pointer
Values the result of applying the algorithm for the resolution of
canonical references (described in section 16.2.5 Canonical References)
should be a valid URI reference that resolves to a gloss element
Note
The refsDecl to use may be indicated with the decls attribute.
</quote>
The element <cRefPattern> is available for documenting the resolution
algorithm.
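To make the idea concrete, here is a minimal sketch of pattern-based resolution in the spirit of <cRefPattern>, which pairs a regular expression (@matchPattern) with a replacement string (@replacementPattern). It is a simplification, not the full algorithm from section 16.2.5. The patterns, the file name "glossary.xml", and the function name resolve_cref are all invented for illustration.

```python
import re

# Hypothetical (matchPattern, replacementPattern) pairs, tried in order.
# Neither the reference syntax nor "glossary.xml" comes from a real project.
PATTERNS = [
    (r"^gloss (\d+)\.(\d+)$", "glossary.xml#g$1_$2"),
    (r"^gloss (\d+)$", "glossary.xml#g$1"),
]

def resolve_cref(cref, patterns=PATTERNS):
    """Return a URI reference for a canonical reference, or None if no pattern matches."""
    for match_pattern, replacement in patterns:
        m = re.match(match_pattern, cref)
        if m:
            # Translate $1-style group references into Python's \g<1> form.
            template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
            return m.expand(template)
    return None
```

The first matching pattern wins, so more specific patterns should be listed before more general ones.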
Recently we have become rightly uncomfortable with "magic token" values,
and have been looking for more formal ways for people to specify this
kind of pointer. The most commonly advanced approach is to use what's
called a "private URI scheme", which essentially means an idiosyncratic,
unregistered prefix followed by a unique identifier. For instance, in
our project "Map of London", we use private URIs like this:
<name type="person" ref="mol:HOLM3">Martin Holmes</name>
The advantage of private URI schemes is that they can be used in
attributes such as @ref and @target, which expect one or more URIs.
However, it's pretty obvious that these are still actually magic tokens.
There's no way for anyone to know what "mol:HOLM3" points at without
proper documentation and/or an algorithm for dereferencing it.
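As a rough illustration of what such a dereferencing algorithm might look like, the sketch below expands a private URI into an absolute URL using a lookup table of prefixes. The table, the example.org URL template, and the function name dereference are assumptions made up for this example; they do not describe how the Map of London project actually resolves its pointers.

```python
# Hypothetical prefix table: each private prefix maps to a URL template.
# The example.org address is a placeholder, not a real project URL.
PREFIXES = {
    "mol": "https://example.org/mol/{id}.xml",
}

def dereference(private_uri, prefixes=PREFIXES):
    """Expand a private URI such as 'mol:HOLM3' into an absolute URL.

    Returns None when the value has no prefix or the prefix is undocumented.
    """
    prefix, sep, ident = private_uri.partition(":")
    if not sep or prefix not in prefixes:
        return None
    return prefixes[prefix].format(id=ident)
```

Without the table, "mol:HOLM3" stays opaque; with it, any processor can turn the value into something it can fetch.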
It's my belief that we should:
1. Provide a formal mechanism by which private URIs can be documented,
and where an algorithm for dereferencing them (converting them to some
sort of universal pointer such as an absolute web URI) can be specified.
2. Once we have done this, rewrite relevant parts of the guidelines to
encourage the use of attributes such as @ref and @target in place of the
old "magic token" attributes such as @key.
3. Deprecate some or all of these attributes.
4. Address the mess which is @cRef (see the relevant ticket). It will be
easier to do this after 1-3 are completed.
During the Telco, I'd like to get a sense of whether people agree with
me on #1. If they don't (that is, if everyone is happy for private URIs
to be used as magic tokens have historically been used, without any
formal recommendation for documentation), then the second ticket can be
closed. The issue of @cRef would still be problematic, but we could even
choose to ignore that if we wish. If there is some agreement with #1,
then I think we should put together a small working group to create a
recommendation.
Cheers,
Martin
--
Martin Holmes
University of Victoria Humanities Computing and Media Centre
(mholmes at uvic.ca)