[tei-council] idno, xml:lang, ref and att.pointing

Kevin Hawkins kevin.s.hawkins at ultraslavonic.info
Thu Sep 15 15:01:42 EDT 2011

I prefer Stuart's (I) and (II) over (III), which seems needlessly obtuse 
to me.

Martin raises good questions to which I'm not sure of the answers at this point.

Related, I think, to this question is this ticket:


Lou's comment from late April, during the Dublin Council meeting, says, 
"Council will discuss recommendations for use of URIs in general." 
And the Dublin minutes say we agreed "to deprecate @key and make clear 
that @ref can be used."  However, no comment on this ticket in 
SourceForge indicates that any action was taken.


On 9/15/2011 10:59 AM, Martin Holmes wrote:
> I see nothing controversial in standardizing examples of <idno>, as long
> as we're only amending the recommendations in the Guidelines. But there
> are a LOT of examples of <idno> scattered about:
> <http://www.tei-c.org/release/doc/tei-p5-doc/en/html/examples-idno.html>
> One looks like it should be <biblScope>:
> <seriesStmt>
>    <title>《印度文學研究》的電腦可讀文件</title>
>    <respStmt>
>     <resp>編者</resp>
>     <name>珍.崗妲</name>
>    </respStmt>
>    <idno type="vol">1.2</idno>
>    <idno type="ISSN">0 345 6789</idno>
> </seriesStmt>
> In some cases, the <idno> seems to contain an entire bibliographic
> reference:
> <idno type="cbeta">Taisho Tripitaka Vol. T08, No. 230</idno>
> There's inconsistency over whether, when a protocol is used at the
> beginning of the idno, the protocol should be specified in @type:
> <idno type="DOI">doi:10.1000/123</idno>
> <idno type="URL">http://authority.nzetc.org/463/</idno>
> Should the second be @type="http"? If not, what dictates whether @type =
> protocol, or @type = something else?
> With ISBNs and ISSNs, should spaces be included or avoided?
> Cheers,
> Martin
> On 11-09-14 10:42 PM, Stuart A. Yeates wrote:
>> I'm currently doing some work with automatic language detection (as
>> per my thesis), and am seeing interesting features in headers. The
>> features are worst (or perhaps more consistent) with the idno tag.
>> The underlying problem is that this tag is most commonly used with
>> non-linguistic text (i.e. URLs, ISBNs, DOIs, etc), but current TEI
>> practice doesn't include using xml:lang="" (meaning unknown) or
>> xml:lang="zxx" (meaning non-linguistic content) for such text. The
>> character string "http:" (for example) is arguably English, but when
>> it appears in a script that doesn't include the letter 'h' it is clearly
>> wrong and ends up corrupting the language model I'm building.
>> Supplemental issues are: (a) that URLs are being used with no
>> indication of whether they're being URL-encoded and (b) that the ref
>> and idno tags are used in practice to do very similar things, but idno
>> doesn't have access to att.pointing.
>> I would like to suggest that the definition of idno be updated to
>> (I)
>> make it clear that
>> <idno type="XXX">XXX:YYYYYYY</idno>
>> is syntactic sugar for
>> <ref url="XXX:YYYYYYY"/>
>> when XXX (matched case insensitively) is a standard or commonly used
>> URI scheme. See
>> https://secure.wikimedia.org/wikipedia/en/wiki/URI_scheme
>> Ideally I'd like to switch to:
>> <idno type="XXX"><ref url="XXX:YYYYYYY"/></idno>   or
>> <idno type="XXX" url="XXX:YYYYYYY"/>
>> for representing this. But that may be a little disruptive.
>> (II)
>> Recommend that ISBNs, ISSNs, etc, be represented as URNs and fit
>> within the above. See
>> https://secure.wikimedia.org/wikipedia/en/wiki/Uniform_Resource_Name
>> (III)
>> Recommend the use of xml:lang="" or xml:lang="zxx" for content of
>> unknown language and for non-linguistic content, respectively.
>> cheers
>> stuart
>> .
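
To make Stuart's (I) and (II) concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not anything Council has agreed: the function names idno_as_uri and to_urn are hypothetical, the scheme list is a small illustrative subset, and no check digits are validated. The first function treats an <idno> as a URI only when its content begins with the scheme named in @type (matched case-insensitively); the second rewrites an ISBN or ISSN as a URN, dropping spaces and hyphens from the input.

```python
import re

# Assumption: a small illustrative subset of URI schemes, not an authoritative list.
KNOWN_SCHEMES = {"http", "https", "ftp", "doi", "urn"}


def idno_as_uri(idno_type, content):
    """Return content as a URI if it starts with the scheme named by @type, else None."""
    text = content.strip()
    scheme, sep, rest = text.partition(":")
    if not sep or not rest:
        return None  # no "scheme:" prefix at all
    if scheme.lower() != idno_type.lower():
        return None  # @type does not name the scheme (e.g. type="URL" with http:)
    if scheme.lower() not in KNOWN_SCHEMES:
        return None  # scheme is not in the illustrative list
    return text


def to_urn(idno_type, content):
    """Rewrite an ISBN or ISSN as a URN; spaces and hyphens in the input are ignored."""
    kind = idno_type.lower()
    chars = re.sub(r"[\s-]", "", content).upper()
    if kind == "issn":
        # Eight characters, the last of which may be the check character X.
        if not re.fullmatch(r"\d{7}[\dX]", chars):
            return None
        return "urn:issn:" + chars[:4] + "-" + chars[4:]
    if kind == "isbn":
        # Ten characters (last may be X) or thirteen digits; check digit not verified.
        if not re.fullmatch(r"\d{9}[\dX]|\d{13}", chars):
            return None
        return "urn:isbn:" + chars
    return None


if __name__ == "__main__":
    print(idno_as_uri("DOI", "doi:10.1000/123"))
    print(idno_as_uri("URL", "http://authority.nzetc.org/463/"))
    print(to_urn("ISSN", "0 345 6789"))
```

With Martin's examples, the DOI passes through as doi:10.1000/123, the type="URL" example returns None because "URL" is not the scheme of its content, and the ISSN from the <seriesStmt> example becomes urn:issn:0345-6789. The type="URL" case is exactly the inconsistency Martin asks about; the sketch does not settle it, it only shows what a strict reading of (I) would do.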

