[tei-council] Internationalised domains

Martin Holmes mholmes at uvic.ca
Tue Sep 20 18:21:54 EDT 2011


I agree. I think punycode is a temporary solution to problems with 
Internet infrastructure and user-agent limitations; if it's to be used, 
it should be generated during output processing, rather than being part 
of the core document. TEI XML should be in UTF-8, I think.
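
To make "generated during output processing" concrete, here is a minimal sketch of what such a step might look like, assuming Python and its standard-library "idna" codec (which implements the older IDNA 2003 rules, so a production pipeline might prefer something else). The function name to_ascii_url is hypothetical, and the sketch ignores any user:password@ part of the URL. The source document keeps the UTF-8 form, and only the emitted link is converted.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Return url with its host name converted to punycode (RFC 3492) labels.

    Hypothetical helper for an output-processing step; the stored
    document is assumed to keep the original UTF-8 host name.
    """
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        # No host component (e.g. a relative reference): nothing to convert.
        return url
    # The stdlib "idna" codec applies ToASCII to each dot-separated label;
    # labels that are already ASCII pass through unchanged.
    netloc = host.encode("idna").decode("ascii")
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    # Assumption: no userinfo in the URL; this sketch would drop it.
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    # The example URL from Stuart's proposed note.
    print(to_ascii_url("http://موقع.وزارة-الاتصالات.مصر/"))
```

Going the other way (for display) would be host.encode("ascii").decode("idna"), so nothing is lost by keeping the readable form in the TEI file.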

Cheers,
Martin

On 11-09-20 03:12 PM, Kevin Hawkins wrote:
> I guess what I'm saying is that Punycode is prescribed for use with the
> Domain Name System, but our TEI documents might outlive DNS or be used
> in a system that doesn't use DNS.  After all, even URIs (as
> prescribed in RFC 3986) give DNS as an example of a name registry
> mechanism, not the only one.
>
> We tie ourselves to a few external standards (maintained by the W3C)
> which may become obsolete at some point, but I'm not sure whether we
> should add systems maintained by ICANN to the list.
>
> --Kevin
>
> On 9/20/2011 2:31 PM, Stuart A. Yeates wrote:
>> Punycode is already required (and happens automatically with modern
>> tools and formats) for URIs. View the source of the (UTF-8) web page
>> of my example website to see what I mean.
>>
>> The issue is when people put URIs in free-text fields where the
>> tools are unaware that these are URIs and expect them to 'just work'.
>>
>> cheers
>> stuart
>>
>>
>>
>> On Wed, Sep 21, 2011 at 1:26 AM, Kevin Hawkins
>> <kevin.s.hawkins at ultraslavonic.info>   wrote:
>>> I'm not sure about prescribing use of RFC 3492.  This seems to me like
>>> prescribing use of US-ASCII with character entity references instead of
>>> UTF-8 within XML documents to ensure that we can use our documents with
>>> a full range of software tools -- something that fewer and fewer people
>>> support doing.
>>>
>>> On 9/20/2011 4:49 AM, Stuart A. Yeates wrote:
>>>> Currently domain names in TEI can occur in typed fields (such as
>>>> data.pointer) or in many other fields where type checking is more
>>>> relaxed (or non-existent). I would like to propose the following note
>>>> to appear somewhere in the standard (I'm thinking the data.pointer
>>>> page, but I'm open to suggestions). The URL in the example is perhaps
>>>> the best-known punycode URL (see
>>>> http://en.wikipedia.org/wiki/Masr_%28domain_name%29 ), but if Arabic
>>>> script causes problems in the publishing process I can probably find a
>>>> more Latin-esque one.
>>>>
>>>> cheers
>>>> stuart
>>>>
>>>> ----
>>>>
>>>> Internationalised domains containing non-ASCII characters should
>>>> always be escaped using RFC 3492 syntax ("punycode"). Thus
>>>> http://موقع.وزارة-الاتصالات.مصر/ is written
>>>> http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/ Such escaping
>>>> permits internationalised domains to be used with a full range of
>>>> software tools.
>>>>
>>>> ----
>>>> _______________________________________________
>>>> tei-council mailing list
>>>> tei-council at lists.village.Virginia.EDU
>>>> http://lists.village.Virginia.EDU/mailman/listinfo/tei-council
>>>>
>>>> PLEASE NOTE: postings to this list are publicly archived

-- 
Martin Holmes
University of Victoria Humanities Computing and Media Centre
(mholmes at uvic.ca)

