[tei-council] Magic tokens, private URIs, and canonical references

Martin Holmes mholmes at uvic.ca
Mon Feb 27 14:42:00 EST 2012


The final topic I'm supposed to cover during the Telco relates to the 
big mess surrounding several attributes:

@cRef
@key
@lemma
@loc
metSym/@value

The current tickets are these:

<http://purl.org/TEI/BUGS/3480650> ("@cRef is a mess")

<http://purl.org/TEI/BUGS/3413346> ("Deprecation of data.key and 
data.word attributes")

The basic issue is this:

Historically, we have encouraged the use of what might be termed "magic 
tokens" in attributes such as @key:

<name key="bloggs_f">Fred Bloggs</name>

where the @key value is used to look up or point to a more detailed 
record for the thing being referred to. The syntax or format of @key has 
not been prescribed:

"@key provides an externally-defined means of identifying the entity (or 
entities) being named, using a coded value of some kind."

and it has been left up to encoders to devise their own schemes for 
dereferencing their @key values (usually turning them into 
fully-qualified URLs, XPaths, database queries etc.), and to document 
them in their own way.

In some cases, such as @cRef, a more formal methodology for 
dereferencing has been prescribed. Here is how @cRef is defined on <term>:

<quote>
@cRef: identifies the associated gloss element using a canonical 
reference from a scheme defined in a refsDecl element in the TEI header
Status: Optional
Datatype: data.pointer
Values: the result of applying the algorithm for the resolution of 
canonical references (described in section 16.2.5 Canonical References) 
should be a valid URI reference that resolves to a gloss element
Note: The refsDecl to use may be indicated with the decls attribute.
</quote>

The element <cRefPattern> is available for documenting the resolution 
algorithm.
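
To make the resolution step concrete, here is a minimal sketch in Python 
of how a processor might apply a <cRefPattern>-style rule: a regular 
expression (as in @matchPattern) is tested against the @cRef value, and 
the captured groups are substituted for $1, $2, etc. in a replacement 
template (as in @replacementPattern). The pattern, the example.org URL 
and the function name are assumptions made up for illustration; they 
are not taken from any real project or from the Guidelines.

```python
import re

# Hypothetical (matchPattern, replacementPattern) pairs, standing in for
# the <cRefPattern> elements of a <refsDecl>. Invented for illustration.
CREF_PATTERNS = [
    (r"(\w+) (\d+):(\d+)", "http://example.org/texts/$1.xml#c$2v$3"),
]


def resolve_cref(cref, patterns=CREF_PATTERNS):
    """Return the URI for a canonical reference, or None if no rule matches."""
    for match_pattern, replacement in patterns:
        m = re.fullmatch(match_pattern, cref)
        if m is None:
            continue  # try the next rule in document order
        # Replace each $N in the template with the N-th captured group.
        return re.sub(r"\$(\d)", lambda g: m.group(int(g.group(1))), replacement)
    return None


print(resolve_cref("Matt 5:7"))
```

The first rule that matches wins, so more specific patterns would 
need to be listed before more general ones.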

Recently we have become rightly uncomfortable with "magic token" values, 
and have been looking for more formal ways for people to specify this 
kind of pointer. The most commonly-advanced approach is to use what's 
called a "private URI scheme", which essentially means an idiosyncratic, 
unregistered prefix, followed by a unique identifier. For instance, in 
our project "Map of London", we use private URIs like this:

<name type="person" ref="mol:HOLM3">Martin Holmes</name>

The advantage of private URI schemes is that they can be used in 
attributes such as @ref and @target, which expect one or more URIs.

However, it's pretty obvious that these are still actually magic tokens. 
There's no way for anyone to know what "mol:HOLM3" points at without 
proper documentation and/or an algorithm for dereferencing it.
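
As an illustration of what such an algorithm could look like, the 
following Python sketch splits a private URI at the first colon and 
looks the prefix up in a documented table of URL templates. The table, 
the example.org URL and the function name are hypothetical; nothing 
here describes how the Map of London project actually resolves its 
"mol:" values.

```python
# Hypothetical prefix table: each private prefix maps to a URL template
# in which {id} stands for the part after the colon. Invented for
# illustration only.
PREFIXES = {
    "mol": "http://example.org/mol/people.xml#{id}",
}


def dereference(private_uri, prefixes=PREFIXES):
    """Turn a private URI such as 'mol:HOLM3' into an absolute URL."""
    prefix, sep, ident = private_uri.partition(":")
    if not sep or prefix not in prefixes:
        # Without a documented prefix the value is just a magic token.
        raise KeyError("undocumented prefix in %r" % private_uri)
    return prefixes[prefix].format(id=ident)


print(dereference("mol:HOLM3"))
```

The point of the sketch is that the table itself is the 
documentation: anyone holding it can dereference the pointer, and 
anyone without it cannot.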

It's my belief that we should:

1. Provide a formal mechanism by which private URIs can be documented, 
and where an algorithm for dereferencing them (converting them to some 
sort of universal pointer such as an absolute web URI) can be specified.

2. Once we have done this, rewrite relevant parts of the Guidelines to 
encourage the use of attributes such as @ref and @target in place of the 
old "magic token" attributes such as @key.

3. Deprecate some or all of these attributes.

4. Address the mess which is @cRef (see the relevant ticket). It will be 
easier to do this after 1-3 are completed.


During the Telco, I'd like to get a sense of whether people agree with 
me on #1. If they don't -- if everyone is happy for private URIs to be 
used just as magic tokens have historically been used, without any 
formal recommendation for documentation -- then the second ticket can be 
closed. The @cRef issue would remain problematic, but we could even 
choose to ignore that if we wish. If there is some agreement with #1, 
then I think we should put together a small working group to create a 
recommendation.

Cheers,
Martin


-- 
Martin Holmes
University of Victoria Humanities Computing and Media Centre
(mholmes at uvic.ca)

