[tei-council] Internationalised domains
kevin.s.hawkins at ultraslavonic.info
Tue Sep 20 18:12:52 EDT 2011
I guess what I'm saying is that Punycode is prescribed for use with the
Domain Name System, but our TEI documents might outlive DNS or be used
in a system that doesn't use DNS. After all, even URIs (as
prescribed in RFC 3986) give DNS as an example of a name registry
mechanism, not the only one.
We tie ourselves to a few external standards (maintained by the W3C)
which may become obsolete at some point, but I'm not sure whether we
should add systems maintained by ICANN to the list.
On 9/20/2011 2:31 PM, Stuart A. Yeates wrote:
> Punycode is already required (and happens automatically with modern
> tools and formats) for URIs. View the source of the (UTF-8) web page
> of my example website to see what I mean.
> The issue is when people put URIs in free text fields where the
> tools are unaware that these are URIs and expect them to 'just work'.
> On Wed, Sep 21, 2011 at 1:26 AM, Kevin Hawkins
> <kevin.s.hawkins at ultraslavonic.info> wrote:
>> I'm not sure about prescribing use of RFC 3492. This seems to me like
>> prescribing use of US-ASCII with character entity references instead of
>> UTF-8 within XML documents to ensure that we can use our documents with
>> a full range of software tools -- something that fewer and fewer people
>> support doing.
>> On 9/20/2011 4:49 AM, Stuart A. Yeates wrote:
>>> Currently domain names in TEI can occur in typed fields (such as
>>> data.pointer) or in many other fields where type checking is more
>>> relaxed (or non-existent). I would like to propose the following note
>>> to appear somewhere in the standard (I'm thinking the data.pointer
>>> page, but I'm open to suggestions). The URL in the example is perhaps
>>> the best-known punycode URL (see
>>> http://en.wikipedia.org/wiki/Masr_%28domain_name%29 ), but if Arabic
>>> script causes problems in the publishing process I can probably find a
>>> more Latin-esque one.
>>> Internationalised domains containing non-ASCII characters should
>>> always be escaped using RFC 3492 syntax ("punycode"). Thus
>>> http://موقع.وزارة-الاتصالات.مصر/ is written
>>> http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/ Such escaping
>>> permits internationalised domains to be used with a full range of
>>> software tools.
>>> tei-council mailing list
>>> tei-council at lists.village.Virginia.EDU
>>> PLEASE NOTE: postings to this list are publicly archived
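For readers who want to see what the escaping proposed in the quoted note amounts to in practice, here is a minimal sketch in Python. It is not part of the thread or of any TEI tooling. The function names to_ascii_host and escape_host are made up for illustration. It relies on Python's built-in "idna" codec, which implements the older IDNA 2003 rules (RFC 3490, on top of RFC 3492), so results may differ from IDNA 2008 tools for some labels. Only the host part of the URI is touched; userinfo, if any, is dropped in this sketch.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ascii_host(host: str) -> str:
    """Return the ASCII ("xn--") form of a host name.

    Uses Python's built-in "idna" codec (IDNA 2003), which applies
    Punycode (RFC 3492) label by label. ASCII hosts pass through as-is.
    """
    return host.encode("idna").decode("ascii")


def escape_host(url: str) -> str:
    """Rewrite only the host part of a URL into its ASCII form.

    Hypothetical helper for illustration: it keeps scheme, port, path,
    query and fragment, and discards any userinfo.
    """
    parts = urlsplit(url)
    if parts.hostname is None:
        # No host to escape (for example a relative reference).
        return url
    netloc = to_ascii_host(parts.hostname)
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    # The example URL from the proposed note.
    print(escape_host("http://موقع.وزارة-الاتصالات.مصر/"))
```

The escaping is reversible: decoding the ASCII form with the same codec (bytes.decode("idna")) gives back the Unicode labels, so a document could store either form and derive the other when needed.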