[tei-council] idno, xml:lang, ref and att.pointing

Martin Holmes mholmes at uvic.ca
Thu Sep 15 10:59:20 EDT 2011

I see nothing controversial in standardizing examples of <idno>, as long 
as we're only amending the recommendations in the Guidelines. But there 
are a LOT of examples of <idno> scattered about:


One looks like it should be <biblScope>:

  <idno type="vol">1.2</idno>
  <idno type="ISSN">0 345 6789</idno>

In some cases, the <idno> seems to contain an entire bibliographic reference:

<idno type="cbeta">Taisho Tripitaka Vol. T08, No. 230</idno>

There's inconsistency over whether, when a protocol is used at the 
beginning of the idno, the protocol should be specified in @type:

<idno type="DOI">doi:10.1000/123</idno>
<idno type="URL">http://authority.nzetc.org/463/</idno>

Should the second be @type="http"? If not, what dictates whether @type = 
protocol, or @type = something else?
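
To make the inconsistency concrete, here is a minimal Python sketch that reports, for each <idno>, whether its content begins with a URI scheme and whether @type matches that scheme case-insensitively. The function name scheme_report and the un-namespaced <bibl> wrapper are assumptions made for illustration; real TEI files use the TEI namespace, and nothing here is prescribed by the Guidelines.

```python
import re
import xml.etree.ElementTree as ET

# Scheme syntax as in RFC 3986: a letter, then letters, digits, "+", "-" or "."
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


def scheme_report(xml_fragment):
    """Return a (type, scheme, matches) tuple for each <idno> in the fragment.

    'scheme' is the lower-cased URI scheme at the start of the element
    content, or None if the content does not start with one. 'matches' is
    True only when @type equals that scheme, compared case-insensitively.
    """
    root = ET.fromstring(xml_fragment)
    rows = []
    for idno in root.iter("idno"):
        text = (idno.text or "").strip()
        found = SCHEME_RE.match(text)
        scheme = found.group(1).lower() if found else None
        idno_type = idno.get("type")
        matches = (
            scheme is not None
            and idno_type is not None
            and idno_type.lower() == scheme
        )
        rows.append((idno_type, scheme, matches))
    return rows


# Hypothetical wrapper around the two examples quoted above.
sample = """<bibl>
  <idno type="DOI">doi:10.1000/123</idno>
  <idno type="URL">http://authority.nzetc.org/463/</idno>
</bibl>"""

for row in scheme_report(sample):
    print(row)
```

On the two examples above, the DOI entry is reported as matching its scheme and the URL entry is not, which is exactly the inconsistency in question.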

With ISBNs and ISSNs, should spaces be included or avoided?
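
Whichever convention is chosen, processors will probably need to compare identifiers that were keyed with different spacing. A minimal sketch of one possible normalization follows; it assumes that spaces and hyphens are purely presentational and can be dropped for comparison. The helper name compact_identifier is hypothetical, and the sketch does not validate check digits.

```python
import re


def compact_identifier(value):
    """Strip whitespace and hyphens and upper-case any trailing 'x' check
    character, so that differently spaced spellings compare equal."""
    return re.sub(r"[\s-]+", "", value).upper()


# The spaced ISSN example above and a hyphenated spelling of the same digits.
print(compact_identifier("0 345 6789") == compact_identifier("0345-6789"))
```

This only addresses comparison; it says nothing about which form the Guidelines should recommend for display.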


On 11-09-14 10:42 PM, Stuart A. Yeates wrote:
> I'm currently doing some work with automatic language detection (as
> per my thesis), and am seeing interesting features in headers. The
> features are worst (or perhaps most consistent) with the idno tag.
> The underlying problem is that this tag is most commonly used with
> non-linguistic text (i.e. URLs, ISBNs, DOIs, etc), but current TEI
> practice doesn't include using xml:lang="" (meaning unknown) or
> xml:lang="zxx" (meaning non-linguistic content) for such text. The
> character string "http:" (for example) is arguably English, but when
> it appears in text in a script which doesn't include the letter 'h'
> it is clearly wrong and ends up corrupting the language model I'm
> building.
> Supplemental issues are: (a) that URLs are being used with no
> indication of whether they're being URL-encoded and (b) that the ref
> and idno tags are used in practice to do very similar things, but idno
> doesn't have access to att.pointing.
> I would like to suggest that the definition of idno be updated to
> (I)
> make it clear that
> <idno type="XXX">XXX:YYYYYYY</idno>
> is syntactic sugar for
> <ref url="XXX:YYYYYYY"/>
> when XXX (matched case insensitively) is a standard or commonly used
> URI scheme. See
> https://secure.wikimedia.org/wikipedia/en/wiki/URI_scheme
> Ideally I'd like to switch to:
> <idno type="XXX"><ref url="XXX:YYYYYYY"/></idno>  or
> <idno type="XXX" url="XXX:YYYYYYY"/>
> for representing this. But that may be a little disruptive.
> (II)
> Recommend that ISBNs, ISSNs, etc, be represented as URNs and fit
> within the above. See
> https://secure.wikimedia.org/wikipedia/en/wiki/Uniform_Resource_Name
> (III)
> Recommend the use of xml:lang="" or xml:lang="zxx" for content whose
> language is unknown and for non-linguistic content respectively.
> cheers
> stuart

Martin Holmes
University of Victoria Humanities Computing and Media Centre
(mholmes at uvic.ca)
