[tei-council] idno, xml:lang, ref and att.pointing

Martin Holmes mholmes at uvic.ca
Thu Sep 15 10:59:20 EDT 2011


I see nothing controversial in standardizing examples of <idno>, as long 
as we're only amending the recommendations in the Guidelines. But there 
are a LOT of examples of <idno> scattered about:

<http://www.tei-c.org/release/doc/tei-p5-doc/en/html/examples-idno.html>

One, the type="vol" value in the example below, looks like it should be <biblScope>:

<seriesStmt>
  <title>《印度文學研究》的電腦可讀文件</title>
  <respStmt>
   <resp>編者</resp>
   <name>珍.崗妲</name>
  </respStmt>
  <idno type="vol">1.2</idno>
  <idno type="ISSN">0 345 6789</idno>
</seriesStmt>

In some cases, the <idno> seems to contain an entire bibliographic 
reference:

<idno type="cbeta">Taisho Tripitaka Vol. T08, No. 230</idno>

There's inconsistency over whether, when a protocol is used at the 
beginning of the idno, the protocol should be specified in @type:

<idno type="DOI">doi:10.1000/123</idno>
<idno type="URL">http://authority.nzetc.org/463/</idno>

Should the second be @type="http"? If not, what dictates whether @type = 
protocol, or @type = something else?
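
To see how widespread the inconsistency is, something like the following rough Python sketch could be run over the examples. It assumes that "protocol" means a leading URI scheme in the RFC 3986 sense, and the function name scheme_vs_type is made up for illustration; it is not part of any TEI tooling.

```python
import re
import xml.etree.ElementTree as ET

# RFC 3986 scheme syntax: a letter followed by letters, digits, "+", "-" or "."
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


def scheme_vs_type(idno_xml):
    """Return (type, scheme, match) for one <idno> element given as a string.

    Hypothetical helper: 'match' is True only when the content starts with a
    URI scheme and that scheme equals @type, compared case-insensitively.
    """
    elem = ET.fromstring(idno_xml)
    idno_type = elem.get("type")
    found = SCHEME_RE.match((elem.text or "").strip())
    scheme = found.group(1) if found else None
    match = (
        scheme is not None
        and idno_type is not None
        and scheme.lower() == idno_type.lower()
    )
    return idno_type, scheme, match


examples = [
    '<idno type="DOI">doi:10.1000/123</idno>',
    '<idno type="URL">http://authority.nzetc.org/463/</idno>',
]
results = [scheme_vs_type(x) for x in examples]
for row in results:
    print(row)
```

On the two examples above, the first reports a match between @type and the scheme and the second does not, which is exactly the inconsistency in question.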



With ISBNs and ISSNs, should spaces be included or avoided?
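
If we did decide to avoid spaces, normalizing existing values would be straightforward. A minimal Python sketch, assuming the policy is simply "no whitespace, no hyphens" (that policy and the function name normalize_identifier are only assumptions for illustration; nothing has been decided):

```python
import re


def normalize_identifier(value):
    """Remove whitespace and hyphens; upper-case so an 'x' check digit becomes 'X'."""
    return re.sub(r"[\s-]+", "", value).upper()


# The ISSN value from the <seriesStmt> example above.
normalized = normalize_identifier("0 345 6789")
print(normalized)
```

This only changes the surface form; it does not check that the result is a valid ISBN or ISSN.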

Cheers,
Martin




On 11-09-14 10:42 PM, Stuart A. Yeates wrote:
> I'm currently doing some work with automatic language detection (as
> per my thesis), and am seeing interesting features in headers. The
> features are worst (or perhaps most consistent) with the idno tag.
>
> The underlying problem is that this tag is most commonly used with
> non-linguistic text (i.e. URLs, ISBNs, DOIs, etc), but current TEI
> practice doesn't include using xml:lang="" (meaning unknown) or
> xml:lang="zxx" (meaning non-linguistic content) for such text. The
> character string "http:" (for example) is arguably English, but when
> it appears in text in a script which doesn't include the letter 'h',
> it is clearly out of place and ends up corrupting the language model
> I'm building for that language.
>
> Supplemental issues are: (a) that URLs are being used with no
> indication of whether they're being URL-encoded and (b) that the ref
> and idno tags are used in practice to do very similar things, but idno
> doesn't have access to att.pointing.
>
> I would like to suggest that the definition of idno be updated to
>
> (I)
>
> make it clear that
>
> <idno type="XXX">XXX:YYYYYYY</idno>
>
> is syntactic sugar for
>
> <ref url="XXX:YYYYYYY"/>
>
> when XXX (matched case insensitively) is a standard or commonly used
> URI scheme. See
> https://secure.wikimedia.org/wikipedia/en/wiki/URI_scheme
>
> Ideally I'd like to switch to:
>
> <idno type="XXX"><ref url="XXX:YYYYYYY"/></idno>  or
> <idno type="XXX" url="XXX:YYYYYYY"/>
>
> for representing this. But that may be a little disruptive.
>
> (II)
>
> Recommend that ISBNs, ISSNs, etc, be represented as URNs and fit
> within the above. See
> https://secure.wikimedia.org/wikipedia/en/wiki/Uniform_Resource_Name
>
> (III)
>
> Recommend the use of xml:lang="" for content whose language is
> unknown and xml:lang="zxx" for non-linguistic content.
>
> cheers
> stuart
>
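
For what it's worth, suggestion (III) in the quoted message could be applied mechanically. The following is only a sketch in Python, assuming every <idno> without its own xml:lang is non-linguistic (which would not hold for something like the "cbeta" example) and assuming un-namespaced elements as in the snippets above; in a real TEI file the element name would need the TEI namespace. The function name mark_idno_non_linguistic is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def mark_idno_non_linguistic(root):
    """Set xml:lang="zxx" on every <idno> lacking xml:lang; return how many changed."""
    count = 0
    for idno in root.iter("idno"):
        if XML_LANG not in idno.attrib:
            idno.set(XML_LANG, "zxx")
            count += 1
    return count


sample = ET.fromstring(
    '<seriesStmt><idno type="ISSN">0 345 6789</idno>'
    '<idno type="URL" xml:lang="en">http://authority.nzetc.org/463/</idno>'
    "</seriesStmt>"
)
changed = mark_idno_non_linguistic(sample)
print(changed, ET.tostring(sample, encoding="unicode"))
```

An <idno> that already carries xml:lang is left alone, so any deliberate tagging would survive.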

-- 
Martin Holmes
University of Victoria Humanities Computing and Media Centre
(mholmes at uvic.ca)

