[tei-council] Internationalised domains

Martin Holmes mholmes at uvic.ca
Thu Sep 22 17:03:13 EDT 2011


I think I see the source of the confusion. Older W3C drafts seem to have 
explicitly addressed the issue of encoding URIs in US-ASCII:

<http://www.w3.org/TR/2001/WD-charmod-20010126/#sec-URIs>

but that section seems to have disappeared from the current draft:

<http://www.w3.org/TR/charmod/>

which, on a quick reading, leaves me with the impression that UTF-8, 
UTF-16 etc. are acceptable encodings.
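
As an aside, the two ASCII forms discussed further down the thread can
both be derived mechanically from the Unicode form, which is part of why
generating them at output time seems workable. A minimal sketch in
Python, assuming the standard library's "idna" codec (which implements
the older IDNA 2003 rules) is adequate for this domain; the function
names are hypothetical and chosen only for illustration:

```python
from urllib.parse import quote, unquote

# Example host taken from the thread below.
HOST = "موقع.وزارة-الاتصالات.مصر"


def to_punycode_host(host: str) -> str:
    """Return the ASCII ("xn--") form of a host name, label by label."""
    # The "idna" codec applies nameprep and RFC 3492 Punycode to each label.
    return host.encode("idna").decode("ascii")


def from_punycode_host(ascii_host: str) -> str:
    """Recover the Unicode form of a host from its ASCII form."""
    return ascii_host.encode("ascii").decode("idna")


def percent_encode(text: str) -> str:
    """Percent-encode the UTF-8 bytes of text (RFC 3986 escaping)."""
    return quote(text, safe="")


if __name__ == "__main__":
    ascii_host = to_punycode_host(HOST)
    escaped = percent_encode(HOST)
    print(ascii_host)
    print(escaped)
    # Both conversions are reversible for this example.
    assert from_punycode_host(ascii_host) == HOST
    assert unquote(escaped) == HOST
```

Note that nameprep folds case and some compatibility characters, so the
round trip is only guaranteed for hosts that are already in normalised
form, as this one is.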

Cheers,
Martin

On 11-09-22 12:25 PM, Stuart A. Yeates wrote:
> I was under the impression that non-latin-1 wasn't allowed in
> data.pointer (and looking through the relevant standards I still can't
> see how it is), but such things seem to validate, so I guess you are.
>
> So I'd like to apologize for my misunderstanding and withdraw
> my suggestion.
>
> cheers
> stuart
>
> On Thu, Sep 22, 2011 at 4:03 AM, Kevin Hawkins
> <kevin.s.hawkins at ultraslavonic.info>  wrote:
>> I still don't see why Stuart wouldn't simply put this in the TEI:
>>
>> <name sameAs="http://موقع.وزارة-الاتصالات.مصر/">Egyptian Ministry of
>> Communication and Information Technology</name>
>>
>> <idno>http://موقع.وزارة-الاتصالات.مصر/</idno>
>>
>> and be done with it.  Generation of percent encoding and Punycode would
>> be done by XSLT that produces whatever is used by the delivery system.
>>
>> --Kevin
>>
>> On 9/20/2011 11:27 PM, Stuart A. Yeates wrote:
>>> The situations I am trying to avoid are:
>>>
>>> <name sameas="urn:example:%D9%85%D9%88%D9%82%D8%B9.%D9%88%D8%B2%D8%A7%D8%B1%D8%A9-%D8%A7%D9%84%D8%A7%D8%AA%D8%B5%D8%A7%D9%84%D8%A7%D8%AA.%D9%85%D8%B5%D8%B1"
>>> copyOf="urn:example:xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c"
>>> corresp="http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/"
>>> key="http://موقع.وزارة-الاتصالات.مصر/">Egyptian Ministry of
>>> Communication and Information Technology</name>
>>>
>>> and
>>>
>>> <idno>http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/</idno>
>>> vs <idno>http://موقع.وزارة-الاتصالات.مصر/</idno>
>>>
>>> etc
>>>
>>> URLs are already using punycode in the domain part and percent
>>> escaping in the file part (at least when they're used in data.pointer
>>> ), and XML has some pretty strong dependencies on URLs, so neither can
>>> be prohibited without serious consequences.
>>>
>>> Both punycode and percent encoding are mappings of UTF-8, they can be
>>> converted back and forth with a 1:1 mapping. They are not violations
>>> of the "use UTF-8" rule.
>>>
>>> cheers
>>> stuart
>>>
>>>
>>> On Wed, Sep 21, 2011 at 10:21 AM, Martin Holmes <mholmes at uvic.ca> wrote:
>>>> I agree. I think punycode is a temporary solution to problems with
>>>> Internet infrastructure and user-agent limitations; if it's to be used,
>>>> it should be generated during output processing, rather than being part
>>>> of the core document. TEI XML should be in UTF-8, I think.
>>>>
>>>> Cheers,
>>>> Martin
>>>>
>>>> On 11-09-20 03:12 PM, Kevin Hawkins wrote:
>>>>> I guess what I'm saying is that Punycode is prescribed for use with the
>>>>> Domain Name System, but our TEI documents might outlive DNS or be used
>>>>> in a system that doesn't use DNS.  After all, even URIs (as
>>>>> prescribed in RFC 3986) give DNS as an example of a name registry
>>>>> mechanism, not the only one.
>>>>>
>>>>> We tie ourselves to a few external standards (maintained by the W3C)
>>>>> which may become obsolete at some point, but I'm not sure whether we
>>>>> should add systems maintained by ICANN to the list.
>>>>>
>>>>> --Kevin
>>>>>
>>>>> On 9/20/2011 2:31 PM, Stuart A. Yeates wrote:
>>>>>> Punycode is already required (and happens automatically with modern
>>>>>> tools and formats) for URIs. View the source of the (UTF-8) web page
>>>>>> of my example website to see what I mean.
>>>>>>
>>>>>> The issue is when people put URIs in free text fields where the
>>>>>> tools are unaware that these are URIs and expect them to 'just work'.
>>>>>>
>>>>>> cheers
>>>>>> stuart
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Sep 21, 2011 at 1:26 AM, Kevin Hawkins
>>>>>> <kevin.s.hawkins at ultraslavonic.info>       wrote:
>>>>>>> I'm not sure about prescribing use of RFC 3492.  This seems to me like
>>>>>>> prescribing use of US-ASCII with character entity references instead of
>>>>>>> UTF-8 within XML documents to ensure that we can use our documents with
>>>>>>> a full range of software tools -- something that fewer and fewer people
>>>>>>> support doing.
>>>>>>>
>>>>>>> On 9/20/2011 4:49 AM, Stuart A. Yeates wrote:
>>>>>>>> Currently domain names in TEI can occur in typed fields (such as
>>>>>>>> data.pointer) or in many other fields where type checking is more
>>>>>>>> relaxed (or non-existent). I would like to propose the following note
>>>>>>>> to appear somewhere in the standard (I'm thinking the data.pointer
>>>>>>>> page, but I'm open to suggestions). The URL in the example is perhaps
>>>>>>>> the best-known punycode URL (see
>>>>>>>> http://en.wikipedia.org/wiki/Masr_%28domain_name%29 ), but if Arabic
>>>>>>>> script causes problems in the publishing process I can probably find a
>>>>>>>> more Latin-esque one.
>>>>>>>>
>>>>>>>> cheers
>>>>>>>> stuart
>>>>>>>>
>>>>>>>> ----
>>>>>>>>
>>>>>>>> Internationalised domains containing non-ASCII characters should
>>>>>>>> always be escaped using RFC 3492 syntax ("punycode"). Thus
>>>>>>>> http://موقع.وزارة-الاتصالات.مصر/ is written
>>>>>>>> http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/ Such escaping
>>>>>>>> permits internationalised domains to be used with a full range of
>>>>>>>> software tools.
>>>>>>>>
>>>>>>>> ----
>>>>>>>> _______________________________________________
>>>>>>>> tei-council mailing list
>>>>>>>> tei-council at lists.village.Virginia.EDU
>>>>>>>> http://lists.village.Virginia.EDU/mailman/listinfo/tei-council
>>>>>>>>
>>>>>>>> PLEASE NOTE: postings to this list are publicly archived
>>>>
>>>> --
>>>> Martin Holmes
>>>> University of Victoria Humanities Computing and Media Centre
>>>> (mholmes at uvic.ca)

-- 
Martin Holmes
University of Victoria Humanities Computing and Media Centre
(mholmes at uvic.ca)

