[tei-council] idno, xml:lang, ref and att.pointing
kevin.s.hawkins at ultraslavonic.info
Thu Sep 15 15:01:42 EDT 2011
I prefer Stuart's (I) and (II) over (III), which seems needlessly obtuse
Martin raises good questions to which I'm not sure of the answer at this
Related, I think, to this question is this ticket:
Lou's comment from late April, during the Dublin Council meeting,
says, "Council will discuss recommendations for use of URIs in general."
And the Dublin minutes say we agreed "to deprecate @key and make clear
that @ref can be used." However, no comment on this ticket indicates
that any action was taken in SourceForge.
On 9/15/2011 10:59 AM, Martin Holmes wrote:
> I see nothing controversial in standardizing examples of <idno>, as long
> as we're only amending the recommendations in the Guidelines. But there
> are a LOT of examples of <idno> scattered about:
> One looks like it should be <biblScope>:
> <idno type="vol">1.2</idno>
> <idno type="ISSN">0 345 6789</idno>
> In some cases, the <idno> seems to contain an entire bibliographic
> <idno type="cbeta">Taisho Tripitaka Vol. T08, No. 230</idno>
> There's inconsistency over whether, when a protocol is used at the
> beginning of the idno, the protocol should be specified in @type:
> <idno type="DOI">doi:10.1000/123</idno>
> <idno type="URL">http://authority.nzetc.org/463/</idno>
> Should the second be @type="http"? If not, what dictates whether @type =
> protocol, or @type = something else?
> With ISBNs and ISSNs, should spaces be included or avoided?
> On 11-09-14 10:42 PM, Stuart A. Yeates wrote:
>> I'm currently doing some work with automatic language detection (as
>> per my thesis), and am seeing interesting features in headers. The
>> features are worst (or perhaps more consistent) with the idno tag.
>> The underlying problem is that this tag is most commonly used with
>> non-linguistic text (i.e. URLs, ISBNs, DOIs, etc), but current TEI
>> practice doesn't include using xml:lang="" (meaning unknown) or
>> xml:lang="zxx" (meaning non-linguistic content) for such text. The
>> character string "http:" (for example) is arguably English, but when
>> it appears in a script which doesn't include the letter 'h' it is clearly
>> wrong and ends up corrupting the language model of the language I'm
>> Supplemental issues are: (a) that URLs are being used with no
>> indication of whether they're being URL-encoded and (b) that the ref
>> and idno tags are used in practice to do very similar things, but idno
>> doesn't have access to att.pointing.
>> I would like to suggest that the definition of idno be updated to
>> make it clear that
>> <idno type="XXX">XXX:YYYYYYY</idno>
>> is syntactic sugar for
>> <ref url="XXX:YYYYYYY"/>
>> when XXX (matched case insensitively) is a standard or commonly used
>> URI scheme. See
>> Ideally I'd like to switch to:
>> <idno type="XXX"><ref url="XXX:YYYYYYY"/></idno> or
>> <idno type="XXX" url="XXX:YYYYYYY"/>
>> for representing this. But that may be a little disruptive.
>> Recommend that ISBNs, ISSNs, etc, be represented as URNs and fit
>> within the above. See
>> Recommend the use of xml:lang="" or xml:lang="zxx" for content whose
>> language is unknown and for non-linguistic content, respectively.
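To make the proposal quoted above concrete, here is a rough Python sketch of
how a processor might derive a URI from an <idno>: if the content already
starts with a known URI scheme it is used as-is, and ISBN/ISSN values are
wrapped as URNs. Everything named here is an assumption for illustration
only: the function name idno_to_uri, the KNOWN_SCHEMES and URN_TYPES sets,
and the choice to drop whitespace from ISBN/ISSN values (the thread leaves
the question of spaces open). It is not taken from the Guidelines.

```python
# Assumption: a small illustrative subset of URI schemes, not an official list.
KNOWN_SCHEMES = {"http", "https", "ftp", "doi", "urn", "mailto"}

# Assumption: identifier types that could be wrapped as urn:isbn: / urn:issn:.
URN_TYPES = {"isbn", "issn"}


def idno_to_uri(idno_type, content):
    """Return a URI string for an <idno>, or None if no mapping is obvious."""
    text = content.strip()

    # Case 1: the content already begins with a known scheme,
    # matched case-insensitively (e.g. "doi:10.1000/123").
    scheme, sep, _rest = text.partition(":")
    if sep and scheme.lower() in KNOWN_SCHEMES:
        return text

    # Case 2: ISBN/ISSN values are wrapped as URNs.
    # Assumption: whitespace is simply dropped; hyphens are left untouched.
    if idno_type.lower() in URN_TYPES:
        compact = "".join(text.split())
        return "urn:" + idno_type.lower() + ":" + compact

    # Anything else (volume numbers, free-text citations) is left alone.
    return None


if __name__ == "__main__":
    # Sample values are the <idno> examples quoted in this thread.
    print(idno_to_uri("DOI", "doi:10.1000/123"))
    print(idno_to_uri("URL", "http://authority.nzetc.org/463/"))
    print(idno_to_uri("ISSN", "0 345 6789"))
    print(idno_to_uri("vol", "1.2"))
```

Under these assumptions the DOI and URL examples pass through unchanged, the
ISSN example becomes a urn:issn: value with the spaces removed, and the "vol"
example yields no URI, which is consistent with the observation that it looks
more like a <biblScope> than an identifier.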