[tei-council] Proposal: <idno> coverage (SF 2493417)

Wed Jan 21 16:44:24 EST 2009

I have just added the following comment to SF feature request 2493417. I
am cc'ing it to Syd and Kevin, as their postings show some interest in
the issue.

Peter

SF 2493417 consists of two parts. The first part asks for some extra
examples showing that <idno> values are not necessarily numeric. Syd
provided some examples in SF bug 2457147.

The second part of the feature request asks, regarding <idno>, that we
'extend its scope so that it can treat unique identifiers for core
components of a bibliographical reference, in particular, authors (it
should thus be part of the content model of <author> among others)'. The
rest of this comment discusses that second request.

There are clear advantages to identifying scholarly authors uniquely:
finding an author’s other articles, finding an author’s current
affiliation and relating non-article publications (weblog entries, etc.)
all require a more robust way of identifying a person than by name.
TePaske-King and Richert [1] illustrate the point: the Mathematical
Reviews author database contains 32 authors called "Wang, Wei" with no
additional names. For further literature, see [2, 3, 4].

It should therefore be possible to identify scholarly authors by
something other than their name. There are, perhaps unfortunately,
several initiatives to assign unique IDs to scholarly authors, such as
ResearcherID (http://www.researcherid.com/) and the Digital Author
Identifier (http://www.surffoundation.nl/smartsite.dws?ch=ENG&id=13480).
Others have argued that researchers should be identified through their
OpenID accounts (http://openid.net/). National libraries maintain their
own (overlapping) authority files. An ISO standard for identifying
names/entities is also on its way: the International Standard Name
Identifier (http://www.isni.org/). Elsevier, finally, has its own Scopus
IDs.

It should be possible to store these author identifiers in (TEI)
bibliographies. There are several ways to achieve this:
(1) use @key on an <author>’s <name>
(2) use @ref on an <author>’s <name>
(3) add <author> to att.canonical and use @key or @ref on <author>
(4) create a new element <authorid> and add it to <author>’s content model
(5) extend the scope of the existing element <idno> and add <idno> to 
<author>’s content model

Any solution will, however, have to cater for the fact that an author
may have several digital identifiers, each belonging to a different
scheme. For example:
- an International Standard Name Identifier might one day look like 
urn:isni:12341234
- a ResearcherID looks like C-1234-2008 or
http://www.researcherid.com/rid/C-1234-2008
- a Dutch DAI looks like: info:eu-repo/dai/nl/12456454
- an open id might look like: https://me.yahoo.com/johndoe61

This means that any attribute-based solution will either have to store
the identification scheme somewhere in the attribute value, or rely on
software parsing the value to guess which scheme applies. @key has the
added problem that by definition it holds only one value, so even if
    key="researcherid:C-1234-2008"
were acceptable, the same attribute could not also hold the researcher’s
International Standard Name Identifier. @ref can hold multiple values,
but they must be URIs; we could have
    ref="info:eu-repo/dai/nl/12456454 https://me.yahoo.com/johndoe61"
but software would then have to guess which scheme each URI belongs to.

A robust solution therefore needs a repeatable element that records the
identifier’s scheme in a type or scheme attribute, and its value either
as text content or in a value attribute. We can either create a new
element for the purpose, e.g. <authorid>, or reuse an existing one.

The proposal here is to use the existing <idno> element. The need to
identify authors is exactly analogous to the need to identify
bibliographic elements such as articles or monographs, the element
already has an appropriately generic name, and I see no reason not to
use it. This does not involve, as Syd wrote on the TEI in Libraries
mailing list
(https://listserv.indiana.edu/cgi-bin/wa-iub.exe?A2=ind0901B&L=TEILIB-L&T=0&F=&S=&P=2774),
a ‘semantic shift’: <idno> would keep the meaning it has always had; it
would simply be used inside new elements.

This would involve:
- changing the definition of <idno> from ‘supplies any standard or
non-standard number used to identify a bibliographic item’ to e.g. 
‘supplies any standard or non-standard number used to identify 
bibliographic elements’
- adding <idno> to <author>’s content model, presumably as its first 
element.

We could then have, for example:
<author>
    <idno type="nldai">info:eu-repo/dai/nl/12456454</idno>
    <idno type="openid">https://me.yahoo.com/johndoe61</idno>
    John Doe
</author>
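As a sketch of how software could consume this encoding, the Python below reads each identifier's scheme straight from @type instead of guessing it from the value, and accepts any number of identifiers per author. It uses only the standard library's xml.etree.ElementTree; the function names are hypothetical, and the 'unspecified' fallback for a missing @type is an assumption of this sketch, not part of the proposal.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# The proposed <author> markup. The TEI namespace is omitted for brevity;
# real TEI documents would need namespace-qualified names in findall().
AUTHOR_XML = """
<author>
    <idno type="nldai">info:eu-repo/dai/nl/12456454</idno>
    <idno type="openid">https://me.yahoo.com/johndoe61</idno>
    John Doe
</author>
"""


def author_identifiers(author: ET.Element) -> Dict[str, List[str]]:
    """Group identifier values by scheme, read directly from each idno/@type."""
    ids: Dict[str, List[str]] = {}
    for idno in author.findall("idno"):
        scheme = idno.get("type", "unspecified")  # assumed fallback
        ids.setdefault(scheme, []).append((idno.text or "").strip())
    return ids


def author_name(author: ET.Element) -> str:
    """Return the author's name: the text content outside the child elements."""
    parts = [author.text or ""] + [child.tail or "" for child in author]
    return " ".join("".join(parts).split())


author = ET.fromstring(AUTHOR_XML.strip())
print(author_identifiers(author))
# {'nldai': ['info:eu-repo/dai/nl/12456454'], 'openid': ['https://me.yahoo.com/johndoe61']}
print(author_name(author))  # John Doe
```

Because <idno> is repeatable, recording an ISNI later would only mean adding another child such as <idno type="isni"> (a hypothetical type value); the reading code would not change.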

[1] TePaske-King, B. and Richert, N. (2001), 'The identification of 
authors in the Mathematical Reviews Database', Issues in Science and 
Technology Librarianship, 31.
[2] Bourne, Philip E. and Fink, J. Lynn (2008), 'I Am Not a Scientist, I 
Am a Number', PLoS Computational Biology, 4 (12), e1000247.
[3] Danskin, Alan, et al. (2008), 'A review of the current landscape in 
relation to a proposed Name Authority Service for UK repositories of 
research outputs', (JISC).
[4] Cals, J. W. L. and Kotz, D. (2008), 'Researcher identification: the 
right needle in the haystack', The Lancet, 371 (9631), 2152-53.