[tei-council] Internationalised domains

Stuart A. Yeates syeates at gmail.com
Tue Sep 20 23:27:58 EDT 2011


The situations I am trying to avoid are:

<name sameas="urn:example:%D9%85%D9%88%D9%82%D8%B9.%D9%88%D8%B2%D8%A7%D8%B1%D8%A9-%D8%A7%D9%84%D8%A7%D8%AA%D8%B5%D8%A7%D9%84%D8%A7%D8%AA.%D9%85%D8%B5%D8%B1"
copyOf="urn:example:xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c"
corresp="http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/"
key="http://موقع.وزارة-الاتصالات.مصر/">Egyptian Ministry of
Communication and Information Technology</name>

and

<idno>http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/</idno>
vs <idno>http://موقع.وزارة-الاتصالات.مصر/</idno>

etc

URLs already use punycode in the domain part and percent-escaping in
the path part (at least when they're used in data.pointer), and XML
depends heavily on URLs, so neither can be prohibited without serious
consequences.
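
To make the split between the two escaping schemes concrete, here is a minimal sketch in Python. It assumes the standard library's "idna" codec (which implements the older IDNA 2003 rules) is good enough for illustration; the function name to_ascii_url is hypothetical, and ports and userinfo are ignored for brevity.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def to_ascii_url(url: str) -> str:
    """Return an ASCII-only form of url (hypothetical helper, not a TEI API).

    The host is converted label by label to punycode via the stdlib
    "idna" codec; the path is percent-escaped as UTF-8 bytes.
    """
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError("URL has no host part: %r" % url)
    # Domain part: punycode, with the "xn--" prefix added by the codec.
    host = parts.hostname.encode("idna").decode("ascii")
    # Path part: percent-escape; "%" is kept safe so existing escapes
    # are not escaped a second time.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))

print(to_ascii_url("http://موقع.وزارة-الاتصالات.مصر/"))
```

A tool that knows a field holds a URL can do this at output time; the trouble starts in free-text fields where nothing knows to apply it.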

Both punycode and percent-encoding are reversible encodings of Unicode
text (percent-encoding works on its UTF-8 bytes), so they can be
converted back and forth without loss. They are not violations of the
"use UTF-8" rule.
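
A rough way to check the reversibility claim, again as a sketch using Python's standard library rather than anything TEI-specific. The helper names are made up for this example, and the "idna" codec may normalise some input (case folding, for instance), so the round trip is only exact for already-normalised names such as the one above.

```python
from urllib.parse import quote, unquote

def host_survives_round_trip(host: str) -> bool:
    """Encode a host name to punycode and decode it again."""
    ascii_host = host.encode("idna")  # ASCII-only bytes
    return ascii_host.decode("idna") == host

def path_survives_round_trip(path: str) -> bool:
    """Percent-escape a path (as UTF-8 bytes) and unescape it again."""
    escaped = quote(path, safe="/")
    return unquote(escaped) == path

host = "موقع.وزارة-الاتصالات.مصر"
print(host.encode("idna").decode("ascii"))
print(host_survives_round_trip(host), path_survives_round_trip("/ملف/صفحة"))
```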

cheers
stuart


On Wed, Sep 21, 2011 at 10:21 AM, Martin Holmes <mholmes at uvic.ca> wrote:
> I agree. I think punycode is a temporary solution to problems with
> Internet infrastructure and user-agent limitations; if it's to be used,
> it should be generated during output processing, rather than being part
> of the core document. TEI XML should be in UTF-8, I think.
>
> Cheers,
> Martin
>
> On 11-09-20 03:12 PM, Kevin Hawkins wrote:
>> I guess what I'm saying is that Punycode is prescribed for use with the
>> Domain Name System, but our TEI documents might outlive DNS or be used
>> in a system that doesn't use DNS.  After all, even URIs (as
>> prescribed in RFC 3986) give DNS as an example of a name registry
>> mechanism, not the only one.
>>
>> We tie ourselves to a few external standards (maintained by the W3C)
>> which may become obsolete at some point, but I'm not sure whether we
>> should add systems maintained by ICANN to the list.
>>
>> --Kevin
>>
>> On 9/20/2011 2:31 PM, Stuart A. Yeates wrote:
>>> Punycode is already required (and happens automatically with modern
>>> tools and formats) for URIs. View the source of the (UTF-8) web page
>>> of my example website to see what I mean.
>>>
>>> The issue is when people put URIs in free text fields where the
>>> tools are unaware that these are URIs and expect them to 'just work'.
>>>
>>> cheers
>>> stuart
>>>
>>>
>>>
>>> On Wed, Sep 21, 2011 at 1:26 AM, Kevin Hawkins
>>> <kevin.s.hawkins at ultraslavonic.info>   wrote:
>>>> I'm not sure about prescribing use of RFC 3492.  This seems to me like
>>>> prescribing use of US-ASCII with character entity references instead of
>>>> UTF-8 within XML documents to ensure that we can use our documents with
>>>> a full range of software tools -- something that fewer and fewer people
>>>> support doing.
>>>>
>>>> On 9/20/2011 4:49 AM, Stuart A. Yeates wrote:
>>>>> Currently domain names in TEI can occur in typed fields (such as
>>>>> data.pointer) or in many other fields where type checking is more
>>>>> relaxed (or non-existent). I would like to propose the following note
>>>>> to appear somewhere in the standard (I'm thinking the data.pointer
>>>>> page, but I'm open to suggestions). The URL in the example is perhaps
>>>>> the best-known punycode URL (see
>>>>> http://en.wikipedia.org/wiki/Masr_%28domain_name%29 ), but if Arabic
>>>>> script causes problems in the publishing process I can probably find a
>>>>> more Latin-esque one.
>>>>>
>>>>> cheers
>>>>> stuart
>>>>>
>>>>> ----
>>>>>
>>>>> Internationalised domains containing non-ASCII characters should
>>>>> always be escaped using RFC 3492 syntax ("punycode"). Thus
>>>>> http://موقع.وزارة-الاتصالات.مصر/ is written
>>>>> http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/ Such escaping
>>>>> permits internationalised domains to be used with a full range of
>>>>> software tools.
>>>>>
>>>>> ----
>>>>> _______________________________________________
>>>>> tei-council mailing list
>>>>> tei-council at lists.village.Virginia.EDU
>>>>> http://lists.village.Virginia.EDU/mailman/listinfo/tei-council
>>>>>
>>>>> PLEASE NOTE: postings to this list are publicly archived
>
> --
> Martin Holmes
> University of Victoria Humanities Computing and Media Centre
> (mholmes at uvic.ca)